86% of data science decision makers across the Global 2000 believe machine learning impacts their industries today. However, many enterprises are concerned that only a fraction of their ML projects will have business impact. In some cases, investments in ML projects are questioned and projects are abandoned when the implementation does not match the vision (ref).
The ML industry is beginning to understand the need for more engineering discipline around ML. “Just as humans built buildings and bridges before there was civil engineering, humans are proceeding with the building of societal-scale ML systems. Just as early buildings and bridges sometimes fell to the ground, many of our early societal-scale ML systems are already exposing serious flaws. What we’re missing is an engineering discipline with its principles of analysis and design.” -- Prof. Michael Jordan, UC Berkeley
Identify the ML project’s stage and plan accordingly
Assess economic value
Build an economic model of the expected value from the project. Use it to provide context to inform project decisions, thus moving the focus from the ML technology to its impact on business. Doing so at the beginning of a project can dramatically change the direction and focus of the project.
Elicit key business drivers or constraints that the model must meet (such as, “must be at least as accurate as the current process,” or “must provide transparency into how decisions are being made”). These constraints become requirements for the ML system, risks to be managed, or decision criteria on whether the model is sufficiently good to proceed. Whether a model is sufficient to support the business case might be a higher bar than whether the model is a good model.
Assess the cost of errors. Given the speed and volume at which ML models operate, existing human intervention and oversight may be removed. What is the cost of the resulting errors? If there is a cost for each error, how much tolerance is there before the economic model ceases to be positive? If model drift occurs, the number of errors might increase. How serious a problem is that?
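The error-tolerance question can be made concrete with a simple expected-value calculation. A minimal sketch, assuming a fixed per-decision value and per-error cost (all figures and names here are hypothetical, not from the source):

```python
def project_expected_value(decisions_per_year: float,
                           value_per_correct: float,
                           cost_per_error: float,
                           accuracy: float,
                           annual_run_cost: float) -> float:
    """Net annual value of running the model, under simple assumptions:
    every correct decision earns value_per_correct, every error costs
    cost_per_error, and running the system costs annual_run_cost."""
    correct = decisions_per_year * accuracy
    errors = decisions_per_year * (1 - accuracy)
    return correct * value_per_correct - errors * cost_per_error - annual_run_cost

# Sweeping accuracy downward until the result turns negative gives the
# break-even point: how much model drift the business case can tolerate.
```

Sweeping the accuracy parameter downward turns "how much tolerance is there?" into a number the steering committee can discuss.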
Quantify the cost of errors by assigning a dollar value to each of the four outcomes: false positive (FP), false negative (FN), true positive (TP), and true negative (TN). Use these values to change model behavior itself. If the costs of different kinds of errors, such as false negatives and false positives, are widely different, that information can be used to train a model with more desirable outcomes (ref). For example, the differential cost of errors changes the ML model used to predict breast cancer, producing fewer false negatives (undesirable and expensive) at the cost of more false positives, while still producing a cheaper model overall.
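One common way to act on such a cost matrix is class weighting, as a sketch (the dollar values and the weighting scheme are illustrative assumptions, not the method used in the cited breast cancer work):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Hypothetical dollar cost of each prediction outcome.
COSTS = {"TP": 5.0, "FP": 50.0, "FN": 500.0, "TN": 0.0}

def total_error_cost(y_true, y_pred):
    """Dollar cost of a batch of predictions under the cost matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return (tp * COSTS["TP"] + fp * COSTS["FP"]
            + fn * COSTS["FN"] + tn * COSTS["TN"])

# Weighting the positive class by the FN/FP cost ratio pushes training
# to trade extra false positives for fewer (expensive) false negatives.
model = LogisticRegression(class_weight={0: 1.0, 1: COSTS["FN"] / COSTS["FP"]})
```

Comparing `total_error_cost` for the weighted and unweighted models quantifies whether the trade-off actually improves the economics.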
Assess data quality
A significant portion of the research component of ML projects is to assess the data quality and whether it’s appropriate for the problem.
Initial research is frequently performed on cleaned and possibly enriched data extracted from a data lake, for convenience and speed of access. The implicit assumption is that the data operated on in production will be the same, and can be provided quickly enough to act on. This assumption should be tested to ensure the ML model will work as expected.
The more data sources involved, the more disparate the sources to be merged, and the more data transformation steps required, the more complex the data quality challenge becomes.
There are two ways to ensure that the model’s production performance is similar to its development performance: (1) compare statistics of the source input data to the data the ML model actually trained on; (2) validate the model against unclean data inputs.
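The first check can be sketched as a per-feature distribution comparison; here using a two-sample Kolmogorov-Smirnov test (the function name and the significance threshold are illustrative choices, not prescribed by the source):

```python
from scipy.stats import ks_2samp

def feature_drifted(train_values, prod_values, alpha=0.05):
    """True if a feature's production distribution likely differs from the
    training distribution (two-sample Kolmogorov-Smirnov test).
    alpha is a tunable significance threshold, not a magic number."""
    _stat, p_value = ks_2samp(train_values, prod_values)
    return p_value < alpha
```

Run it per numeric feature; for categorical or bounded columns, simpler statistics (missing-value rate, category frequencies, min/max) serve the same purpose.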
Data scientists: Background in math, statistics, and advanced analytics. They provide statistical and ML specialty knowledge. Rigorous experimental design is critical, particularly for companies with large user bases or in highly regulated industries.
Engineers (data/application/infrastructure): Background in programming; they specialize in big data technologies. They perform data acquisition and ETL (extract, transform, load) and build data pipelines. Application engineers integrate the model into an application and use its inferences in the context of a business process.
Steering committee: Business stakeholders and the financial owner of benefits and risks. They can bring in external specialists (HR/legal/PR) as needed to manage risks.
Use scorecards to report on progress
Project environment scorecard
Data quality scorecard
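A data quality scorecard can start as a small per-column summary table; a minimal sketch (the metric selection is an illustrative assumption, extend it with whatever the steering committee needs to see):

```python
import pandas as pd

def data_quality_scorecard(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: completeness, cardinality, and storage type."""
    return pd.DataFrame({
        "pct_missing": df.isna().mean(),   # fraction of missing values
        "n_unique": df.nunique(),          # distinct non-null values
        "dtype": df.dtypes.astype(str),    # storage type as reported by pandas
    })
```

Snapshotting this table for each data delivery turns "is the data getting worse?" into a diff between two small tables.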
Move from research to production
The code in the researcher’s Jupyter notebook is generally not production quality. Reengineering the researcher’s code is frequently required to make it fit for a production environment.
Unfortunately, requirements are frequently communicated to the development team by handing over the researcher’s Jupyter or Zeppelin notebook, or a set of Python or R scripts. If the development team redevelops and optimizes the code for production while the research team continues from their base notebook, you have the problem of versioning the code and identifying changes.
All usual software engineering and management practices must still be applied, including security, logging and monitoring, task management, end-to-end A/B testing, API versioning (if multiple versions of the model are used), and so on.