
Managing Machine Learning Projects

5/18/2020

86% of data science decision makers across the Global 2000 believe machine learning impacts their industries today. However, many enterprises are concerned that only a fraction of their ML projects will have business impact. In some cases, investments made in ML projects are questioned, and projects are abandoned when the implementation does not match the vision (ref).

The ML industry is beginning to understand the need for more engineering discipline around ML. “Just as humans built buildings and bridges before there was civil engineering, humans are proceeding with the building of societal-scale ML systems. Just as early buildings and bridges sometimes fell to the ground, many of our early societal-scale ML systems are already exposing serious flaws. What we’re missing is an engineering discipline with its principles of analysis and design.” -- Prof. Michael Jordan, UC Berkeley

Identify the ML project’s stage and plan accordingly
Research stage: 
  • Most often done by data scientists
  • Exploratory in nature
  • Questions are framed as, “Is it possible to ...?” or “Can we use this data to ...?”
  • The answer is unknown, as is the appropriate method for achieving results
  • Outcome of this effort is either “yes, it is possible, and here’s one way to do it that supports that assertion” or “no, we do not believe it is possible; here are all the things we’ve tried that did not work.”
  • A Kanban approach is well suited to this stage’s iterative nature (it allows for longer time blocks)

Development stage: 
  • Most often done by data engineers and software engineers
  • Method for solving the problem is now known
  • Questions shift to:
      ◦ How to implement at scale?
      ◦ How to pipe the data into the model in a timely fashion?
      ◦ How to collect, store, and transform data so models can be retrained consistently?
      ◦ Can predictions be generated within the required SLA?
      ◦ How to build an A/B testing environment?
  • Agile methods like Scrum and XP are more appropriate

Assess economic value
Build an economic model of the expected value from the project. Use it to provide context to inform project decisions, thus moving the focus from the ML technology to its impact on business. Doing so at the beginning of a project can dramatically change the direction and focus of the project.
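
As a minimal sketch (all figures below are hypothetical placeholders to be replaced with real estimates), the economic model can start as a simple net-expected-value calculation:

```python
# A minimal sketch of an economic model for an ML project.
# All figures are hypothetical placeholders, to be replaced with real estimates.

def expected_annual_value(volume, value_per_correct, cost_per_error,
                          accuracy, build_cost, annual_run_cost):
    """Net first-year value, in dollars, of deploying the model."""
    gross = (volume * accuracy * value_per_correct
             - volume * (1 - accuracy) * cost_per_error)
    return gross - annual_run_cost - build_cost

# Example: 100k predictions/year, $2 per correct decision, $15 per error.
print(expected_annual_value(volume=100_000, value_per_correct=2.0,
                            cost_per_error=15.0, accuracy=0.92,
                            build_cost=60_000, annual_run_cost=25_000))
```

Even a crude model like this makes visible which inputs (accuracy, cost per error, volume) the business case is most sensitive to, and can redirect the project early.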

Elicit key business drivers or constraints that the model must meet (such as, “must be at least as accurate as the current process,” or “must provide transparency into how decisions are being made”). These constraints become requirements for the ML system, risks to be managed, or decision criteria on whether the model is sufficiently good to proceed. Whether a model is sufficient to support the business case might be a higher bar than whether the model is a good model.
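
As a sketch of how such a constraint can become an explicit decision criterion (the baseline figure is hypothetical):

```python
# Sketch: the constraint "must be at least as accurate as the current process"
# expressed as a go/no-go gate. The baseline accuracy is a hypothetical figure.
BASELINE_ACCURACY = 0.87  # measured accuracy of the existing process

def sufficient_to_proceed(model_accuracy: float) -> bool:
    """Decision criterion: does the model clear the business bar?"""
    return model_accuracy >= BASELINE_ACCURACY

print(sufficient_to_proceed(0.91))  # True: the model may proceed
```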

Assess the cost of errors. Given the speed and volume at which ML models operate, existing human intervention and oversight may be removed. What is the cost of the resulting errors? If there is a cost for each error, how much tolerance is there before the economic model ceases to be positive? If model drift occurs, the number of errors might increase. How serious a problem is that?

Quantify the cost of errors by assigning a dollar value to each of the four outcomes: false positive (FP), false negative (FN), true positive (TP), and true negative (TN). Use these values to change model behavior itself. If the costs of different kinds of errors, such as false negatives versus false positives, are widely different, that information can be used to train a model with more desirable outcomes (ref). For example, differential error costs change the ML model used to predict breast cancer, producing fewer false negatives (undesirable and expensive) at the cost of more false positives, while still producing a cheaper model overall.
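
As an illustration (the dollar costs are invented and the dataset is synthetic), one simple way to act on per-outcome costs is to pick the decision threshold that minimizes total dollar cost on validation data:

```python
# Sketch: choose the classification threshold that minimizes total dollar cost,
# given per-outcome costs. All cost values below are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

COST = {"TP": 10.0, "FP": 50.0, "FN": 500.0, "TN": 0.0}  # a missed case (FN) is costliest

X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]

def total_cost(threshold):
    tn, fp, fn, tp = confusion_matrix(y_val, probs >= threshold).ravel()
    return tp * COST["TP"] + fp * COST["FP"] + fn * COST["FN"] + tn * COST["TN"]

best = min(np.linspace(0.05, 0.95, 19), key=total_cost)
print(f"cheapest threshold: {best:.2f} (total cost ${total_cost(best):,.0f})")
```

Because false negatives are priced far above false positives here, the cheapest threshold sits well below 0.5, trading more false positives for fewer false negatives, mirroring the breast cancer example above.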

Verify assumptions
  • Variables relevant to the problem are captured and available in the data
  • The sample data is similar to the real data (same error sources and treatments)
  • Training, validation, and test data captured at a point in time remain valid
  • An appropriate model was chosen
  • Rare cases are sufficiently well represented (a simple check is sketched after this list)
  • The correct statistical analysis was performed
  • For transfer learning: the source model is appropriate and the learning is indeed transferable
  • The correlations found are relevant
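
A minimal sketch of the rare-case check mentioned above (the column name and threshold are hypothetical):

```python
# Sketch: flag rare categories that may be under-represented in the training data.
# The column name "segment" and the minimum count are hypothetical choices.
import pandas as pd

def underrepresented(df: pd.DataFrame, column: str, min_count: int = 50) -> pd.Series:
    counts = df[column].value_counts()
    return counts[counts < min_count]

df = pd.DataFrame({"segment": ["a"] * 990 + ["b"] * 8 + ["c"] * 2})
print(underrepresented(df, "segment"))  # "b" and "c" have too few examples
```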

Assess data quality
A significant portion of the research component of ML projects is assessing data quality and whether the data is appropriate for the problem.

Initial research is frequently performed on cleaned and possibly enriched data extracted from a data lake, for convenience and speed of access. The implicit assumption is that the data operated on in production will be the same, and can be provided quickly enough to act on. This assumption should be tested to ensure the ML model will work as expected.

The more data sources that are involved, the more disparate the data sources that are to be merged, and the more data transformation steps that are involved, the more complex the data quality challenge becomes.

Two ways to ensure that the model’s production performance is similar to its development performance: (1) compare statistics of the source input data with the data the ML model was actually trained on; (2) validate the model against unclean data inputs.
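
A minimal sketch of check (1), comparing per-feature distributions with a two-sample Kolmogorov-Smirnov test (the feature, data, and significance threshold are illustrative):

```python
# Sketch: detect distribution shift between training data and live production data.
# The feature name, synthetic data, and p-value cutoff are all illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train = {"age": rng.normal(40, 10, 10_000)}
prod = {"age": rng.normal(45, 12, 2_000)}  # simulated drift in the upstream feed

for feature in train:
    res = ks_2samp(train[feature], prod[feature])
    if res.pvalue < 0.01:
        print(f"{feature}: shift detected (KS={res.statistic:.3f}, p={res.pvalue:.1e})")
```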

Staff appropriately
Data scientists: background in math, statistics, and advanced analytics. They provide statistical and ML specialty knowledge. Rigorous experimental design is critical, particularly for companies with large user bases or in highly regulated industries.

Engineers (data/application/infrastructure): background in programming, with specialization in big data technologies. They perform data acquisition and ETL (extract, transform, load) and build data pipelines. Application engineers integrate the model into an application and use its inferences in the context of a business process.

Steering committee: business stakeholders and the financial owner of the benefits and risks. They can bring in external specialists (HR/legal/PR) as needed to manage risks.

Use scorecards to report on progress
Project environment scorecard 
  • Ethics (weapons systems, predictive policing)
  • Consequential decisions (denying access to loans)
  • Privacy (HIPAA, GDPR)
  • Bias (race identified as a loan risk factor)
  • PR (a self-driving car harms someone)
  • Need for transparency and auditability
  • Closed-world development vs. open-world deployment (robot in a lab vs. robot in a house with pets)

Financial scorecard
  • Potential upside return
  • Potential downside risk
  • Worst-case downside
  • Liability
  • Cost of building model
  • Cost of maintaining model
  • Quality of model predictions vs expectations
  • Uncertainty in model predictions

Data quality scorecard
  • Test & production data have same characteristics (outliers discarded for both model & production)
  • Input data accuracy (sensor values estimated to be ±5% of actual; a validation sketch follows this list)
  • Data volumes & duration (model data only available for 3 months, but business cycle is 1 year)
  • Data sources & pre-processing validated (some data was discovered to be flawed; re-training required)
  • Data change over time (upstream system changes logic & meaning of its input to model)
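
As a sketch of how the input-accuracy line can be enforced in the pipeline (field names and ranges are hypothetical), incoming records can be validated against expected types and ranges before they reach the model:

```python
# Sketch: validate incoming records against expected types and ranges.
# The fields and bounds below are hypothetical examples.
RULES = {
    "sensor_temp": lambda v: isinstance(v, (int, float)) and -40 <= v <= 125,
    "device_id": lambda v: isinstance(v, str) and v.strip() != "",
}

def invalid_fields(record: dict) -> list:
    """Return the names of fields that are missing or out of spec."""
    return [f for f, ok in RULES.items() if f not in record or not ok(record[f])]

print(invalid_fields({"sensor_temp": 300, "device_id": "d-17"}))  # ['sensor_temp']
```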

Move from research to production
The code in the researcher’s Jupyter notebook is generally not production quality. Reengineering is frequently required to make it a good fit for a production environment.

Unfortunately, requirements are frequently communicated to the development team by handing over the researcher’s Jupyter or Zeppelin notebook, or a set of Python or R scripts. If the development team redevelops and optimizes the code for production while the research team continues from their base notebook, you have the problem of versioning the code and identifying changes.

All usual software engineering and management practices must still be applied, including security, logging and monitoring, task management, end-to-end A/B testing, API versioning (if multiple versions of the model are used), and so on.
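
As one example of combining logging with model versioning (the hashing scheme and names are illustrative, not a standard), every prediction can be tagged with a version derived from the model artifact itself:

```python
# Sketch: tag each logged prediction with a version hash of the model artifact,
# so production behavior can be traced back to a specific model. Illustrative only.
import hashlib, json, logging, pickle, time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

logging.basicConfig(level=logging.INFO)

X, y = make_classification(random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
version = hashlib.sha256(pickle.dumps(model)).hexdigest()[:12]  # short, stable model ID

def predict_and_log(features):
    pred = int(model.predict([features])[0])
    logging.info(json.dumps({"ts": time.time(), "model_version": version,
                             "prediction": pred}))
    return pred

predict_and_log(list(X[0]))
```

Tying each logged prediction to a specific artifact lets the team reconcile production behavior with the exact model the researchers handed over, even as both sides keep iterating on their own copies.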
