Predictive Thursdays: Is Your Machine Learning Implementation Debt Free?
Debt of any kind—if not addressed—is a time bomb waiting to explode. We can easily relate to this with reference to finance.
The comparison between technical complexity and debt was first drawn in 1992. In an experience report, Ward Cunningham alerted the industry to the problem and, in doing so, coined the term “Technical Debt.”
“Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise”. — Ward Cunningham, 1992
Do Machine Learning Models Experience Technical Debt Too?
A recent paper, “Hidden Technical Debt in Machine Learning Systems,” by Sculley et al. (2015), suggests so¹. The authors explain that machine learning systems incur hidden technical debt on top of the technical debt introduced during ordinary software development.
There is a crucial difference between ordinary technical debt and hidden debt. Ordinary technical debt can be addressed by refactoring code, removing dead code, reducing dependencies, introducing abstractions for easier maintainability, and so on. Hidden debt is more dangerous because it compounds silently: according to Sculley et al., it sits at the system level rather than the code level, so the usual code-level remedies do not reach it.
The following are broad categories under which hidden debt has been identified in machine learning implementations:
- Boundary erosion
- Data dependencies
- Impact of dealing with changes in the real world
The practices of encapsulation and modular design in software engineering create strong abstraction boundaries to help maintain code. Therefore, code can be easily extended for enhancements without the need to modify the existing code.
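Unfortunately, it’s difficult to enforce abstraction boundaries for machine learning systems by defining a specific intended behaviour. Sculley et al. attribute this to three effects: entanglement (changing one input changes how the model uses all the others), correction cascades (models stacked on top of other models to patch their output), and undeclared consumers (other systems that silently depend on a model's predictions).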
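Entanglement is easier to see with numbers. The following is a minimal sketch, not taken from the paper: the toy data and the helper names `fit_ols_1d` and `fit_ols_2d` are made up for illustration. It fits a least-squares model on two correlated features, then drops one of them and refits.

```python
def fit_ols_1d(x, y):
    """Least-squares weight for y ~ w * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_ols_2d(x1, x2, y):
    """Least-squares weights for y ~ w1 * x1 + w2 * x2 (no intercept),
    solved directly from the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return w1, w2


# Hypothetical toy data: x2 is correlated with x1, and y = 2*x1 + 1*x2.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 1.0, 2.0, 2.0]
y = [2 * a + b for a, b in zip(x1, x2)]

w1_full, w2_full = fit_ols_2d(x1, x2, y)  # weights with both features
w1_alone = fit_ols_1d(x1, y)              # weight of x1 after x2 is removed

print(f"x1 weight with x2 present: {w1_full:.2f}")
print(f"x1 weight after dropping x2: {w1_alone:.2f}")
```

Nothing about `x1` itself changed, yet its weight moves once `x2` is removed, because the model has to absorb the missing signal somewhere. In a production model with hundreds of inputs, the same effect makes it hard to reason about any single feature in isolation.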
According to Morgenthaler et al. in their paper on managing technical debt at Google (2012)², dependency debt is an important factor contributing to code complexity and technical debt in software development. Thankfully, modern-day compilers and linkers are able to detect such code dependencies and help fix them.
Data dependencies have a similar impact in machine learning systems but are much harder to detect. Unstable data dependencies, underutilized data dependencies, and the difficulty of statically analysing data dependencies are some of the data-related reasons why hidden debt is created in machine learning systems.
In a machine learning system, the code dedicated to training a model and making predictions is a small fraction of the total. Most of the remainder is supporting code. Take, for example, glue code (where several otherwise incompatible components are quickly put together into a single implementation), or dead experimental code paths (where code written for rapid prototyping, to gain quick turnaround times, is left behind in the implementation).
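One mitigation that Sculley et al. describe for unstable data dependencies is to pin the model to a versioned copy of each input signal. The sketch below shows what a simple check of such pins might look like. The signal names, the version labels, and the `check_dependencies` helper are all hypothetical.

```python
def check_dependencies(declared, available):
    """Compare pinned signal versions against what upstream currently offers.

    declared:  dict mapping signal name -> pinned version string
    available: dict mapping signal name -> set of versions served upstream
    Returns a list of human-readable problems; an empty list means every
    pinned version is still available.
    """
    problems = []
    for name, version in sorted(declared.items()):
        if name not in available:
            problems.append(f"{name}: signal is missing upstream")
        elif version not in available[name]:
            offered = sorted(available[name])
            problems.append(f"{name}: pinned {version} is not among {offered}")
    return problems


# Hypothetical declaration of the model's inputs and their pinned versions.
declared = {"topic_clusters": "v3", "user_country": "v1"}

# Hypothetical snapshot of what the upstream producers serve today.
available = {"topic_clusters": {"v3", "v4"}, "user_country": {"v2"}}

problems = check_dependencies(declared, available)
for problem in problems:
    print(problem)
```

Run as part of a build or deployment step, a check like this plays a role loosely similar to the one a linker plays for code: a silent upstream change becomes a visible failure instead of a slow drift in model quality.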
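Sculley et al. suggest wrapping black-box packages behind a common API as one way to keep glue code contained. Here is a minimal sketch of that idea. The two "vendor" classes are hypothetical stand-ins for third-party packages with incompatible interfaces, and the `Predictor` protocol is an assumed in-house convention.

```python
from typing import Protocol, Sequence


class Predictor(Protocol):
    """The single interface the rest of the system is allowed to call."""

    def predict(self, features: Sequence[float]) -> float: ...


class VendorAScorer:
    """Hypothetical package that scores a dict of named inputs."""

    def score(self, inputs: dict) -> float:
        return 0.1 * inputs["x0"] + 0.2 * inputs["x1"]


class VendorBClassifier:
    """Hypothetical package that only accepts batches of rows."""

    def run_batch(self, rows):
        return [max(row) for row in rows]


class VendorAAdapter:
    """Translates a plain feature vector into vendor A's named inputs."""

    def __init__(self, scorer: VendorAScorer):
        self._scorer = scorer

    def predict(self, features: Sequence[float]) -> float:
        named = {f"x{i}": value for i, value in enumerate(features)}
        return self._scorer.score(named)


class VendorBAdapter:
    """Wraps a single feature vector in a one-row batch for vendor B."""

    def __init__(self, classifier: VendorBClassifier):
        self._classifier = classifier

    def predict(self, features: Sequence[float]) -> float:
        return self._classifier.run_batch([list(features)])[0]


def evaluate(model: Predictor, rows):
    """Downstream code depends only on Predictor, not on any vendor."""
    return [model.predict(row) for row in rows]


rows = [[1.0, 2.0], [3.0, 0.5]]
scores_a = evaluate(VendorAAdapter(VendorAScorer()), rows)
scores_b = evaluate(VendorBAdapter(VendorBClassifier()), rows)
```

The translation logic still exists, but it lives in one small adapter per package rather than being scattered through the pipeline, so swapping a package means rewriting one class.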
Models created using machine learning algorithms are consumed in business applications that interact directly with the real world. It follows that the unstable nature of the real world is another reason why hidden debt is induced in machine learning systems.
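One monitoring signal that Sculley et al. recommend for this situation is prediction bias: in a healthy system, the distribution of predicted labels should roughly match the distribution of observed labels. Below is a minimal sketch of such a check for a binary classifier. The function names, the toy data, and the 0.05 tolerance are assumptions chosen for illustration.

```python
def prediction_bias(predicted_probs, observed_labels):
    """Mean predicted probability minus the observed positive rate."""
    if not predicted_probs or len(predicted_probs) != len(observed_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_predicted = sum(predicted_probs) / len(predicted_probs)
    observed_rate = sum(observed_labels) / len(observed_labels)
    return mean_predicted - observed_rate


def bias_alert(predicted_probs, observed_labels, tolerance=0.05):
    """True when the gap between predictions and reality exceeds tolerance."""
    return abs(prediction_bias(predicted_probs, observed_labels)) > tolerance


# Hypothetical slice of traffic: the model predicts about 25% positives.
predictions = [0.2, 0.3, 0.25, 0.25]

labels_as_expected = [0, 1, 0, 0]  # world still matches the model
labels_after_shift = [1, 1, 0, 1]  # world has moved; model has not

print(bias_alert(predictions, labels_as_expected))
print(bias_alert(predictions, labels_after_shift))
```

A check like this does not explain why the world changed, but it turns a silent mismatch into an alert that someone can investigate.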
Such situations in machine learning implementations warrant a trusted partner like SAP. At SAP, we provide the capabilities to help prevent the non-obvious, hidden debt that is created unintentionally on a predictive journey. In this way, a data-driven organization can embark unencumbered on the path of digital transformation.
¹D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. Hidden Technical Debt in Machine Learning Systems. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS’15), C. Cortes, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). MIT Press, Cambridge, MA, USA, 2503-2511.
²David Morgenthaler, Misha Gridnev, Raluca Sauciuc, and Sanjay Bhansali. 2012. Searching for Build Debt: Experiences Managing Technical Debt at Google. In Proceedings of the Third International Workshop on Managing Technical Debt, June 5, 2012, Zurich, Switzerland, 1-6.