The Machine Learning Workflow - a concept and its application

Luise, 22 April 2021, 5 min read

There is no guarantee that a machine learning project will succeed, but we show you how to put your project on the right track and improve its chances of reaching the finish line. By following the machine learning workflow, or lifecycle, presented here (which Uber uses to ensure scalable, applied machine learning), you set the right focus in each project phase and address every sub-aspect at the appropriate time.

The presented definition covers not only the technical aspects but also the effectiveness of the project as a whole, closing the gap to the business context. This blog article gives an overview of the phases of the machine learning (ML) workflow and shows how you can use it yourself.

The machine learning workflow can be divided into four phases. Note that the "Prototype" phase is itself an iterative process.

Define - Understand the business problem and define goals.
Prototype - Demonstrate the feasibility of a model approach.
Production - Deploy the model in the production environment.
Measure - Measure, monitor and control the application.

Figure: The ML lifecycle, adapted from the Uber ML stack

Define

The first phase focuses on the business problem to be addressed. The aim is to look at the issue as a whole and to involve all team members. Once a common understanding of the initial situation has been established, the project expectations and the deliverables are defined. The basic form of the solution (e.g., a real-time application) and the model type become apparent, while the details are deliberately left open. Initial ideas for the implementation usually emerge here as well. Design Thinking is a recommended workshop method for this phase. The following guiding questions are answered:

What are the model results needed for?
Are there already alternatives or manual approaches?
What is expected from a realistic solution?

By the end of this phase, the team has made sure that the right problem is being addressed and the framework is set. For more information on the possibilities of machine learning and application ideas for common business challenges, we recommend our whitepaper "How to advance your business through Artificial Intelligence and Machine Learning".

Prototype

The second phase is about the proof of concept. Key activities are data acquisition, data preparation for the analysis, and an initial model selection. The data experts now have almost unlimited freedom to experiment with the design of the model and the underlying data. New influencing variables are iteratively extracted from the data, and their effect on the model is analyzed. Because of this freedom, the working environment is often called a Data Lab.

At the end of this experimentation phase, a working prototype of a model should emerge. The prototype represents a suitable, but not necessarily perfect, solution to the problem. Elaborate optimizations beyond the fourth decimal place rarely decide whether a model is suitable in practice. This phase primarily serves to answer the following questions:

Are the quantity and quality of the data sufficient?
Can the problem be solved with a machine learning model?
Is it worth replacing the current process?

As soon as the model design meets the requirements, you can move on to the next phase, since the model will be optimized iteratively later on anyway.
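The question of whether replacing the current process is worthwhile can be made concrete by comparing the prototype with the existing approach on held-out data. The following is a minimal sketch in Python, assuming a regression-style problem. The function names, the toy data and the 10% improvement threshold are illustrative assumptions, not part of the workflow itself.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute deviation between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def is_prototype_worthwhile(actual, baseline_pred, prototype_pred,
                            min_improvement=0.10):
    """Return True if the prototype reduces the baseline error by at least
    `min_improvement` (a hypothetical threshold, to be agreed with the business)."""
    baseline_error = mean_absolute_error(actual, baseline_pred)
    prototype_error = mean_absolute_error(actual, prototype_pred)
    if baseline_error == 0:
        return False  # the current process is already exact on this sample
    improvement = (baseline_error - prototype_error) / baseline_error
    return improvement >= min_improvement


# Toy hold-out data, invented for illustration only
actual = [100, 120, 90, 110]
baseline_pred = [110, 110, 110, 110]   # e.g. the current manual rule of thumb
prototype_pred = [102, 117, 93, 108]   # output of the prototype model

print(is_prototype_worthwhile(actual, baseline_pred, prototype_pred))
```

A relative threshold is used here so that the check does not depend on the scale of the target variable. In a real project, the error measure and the threshold would follow from the expectations set in the Define phase.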


Production

In the transition to the Production phase, the decision is made for or against the machine learning approach. If the model proves promising, it can be transferred to a production environment. If the prototypes cannot demonstrate feasibility, you should stop your machine learning project here. The effort already invested is a sunk cost and should in no way distort this decision. It can be helpful to bring in an external person to avoid a wrong decision. Even if the project is stopped, the previous results are not completely useless.

While going live, the model must overcome further organizational and cultural hurdles. In this phase, measures should be taken to increase acceptance among users. On the technical side, the deployment benefits from a machine learning pipeline that is as continuous and integrated as possible between the data lab and the runtime environment. Depending on the nature of the pipeline, the use of state-of-the-art technologies may be limited because the latest model types are not immediately supported. As part of the automation, on-demand data transformation is also added to the pipeline.
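One way to keep the data lab and the runtime environment consistent is to bundle the data transformation and the model in a single pipeline object, so that the same steps run in both places. The following is a minimal sketch in plain Python. The `Pipeline` class, the scaling step and the placeholder model are hypothetical stand-ins for illustration, not the API of a specific library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Bundles transformation steps and a model so that lab and production
    apply exactly the same logic (hypothetical helper for illustration)."""
    steps: List[Callable[[List[float]], List[float]]]
    model: Callable[[List[float]], float]

    def predict(self, raw_record):
        features = list(raw_record)
        for step in self.steps:          # on-demand data transformation
            features = step(features)
        return self.model(features)


def scale_to_unit(features, max_value=100.0):
    """Scale raw values into [0, 1]; the value range is an assumption."""
    return [min(max(x / max_value, 0.0), 1.0) for x in features]


def average_score(features):
    """Placeholder 'model': the mean of the transformed features."""
    return sum(features) / len(features)


pipeline = Pipeline(steps=[scale_to_unit], model=average_score)
print(pipeline.predict([50, 100]))
```

Because the transformation travels with the model, a raw record can be scored on request in production without re-implementing the preparation logic from the data lab.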

The following questions, among others, are clarified in this project phase:

Is the model accepted by the user community?
Which system architecture is suitable (cloud, cluster, on-premise)?
In what form should the results be made available (via API, in a database, ...)?

At the end of the production deployment, the model is ready for use under realistic conditions. The deployment of the model is often accompanied by a change of responsibilities. Problems during this handover arise especially with innovative approaches in model design. MLOps provides important structures for collaboration between data scientists and operations experts.

Measure

The fourth phase ensures that your project delivers sustainable added value for your company. To do this, the performance of the model is monitored during regular operation. In a fast-changing environment, many changes occur that can degrade the quality of the model's results, such as shifting trends or the migration of market shares. For this reason, the results should be critically examined at regular intervals and, if possible, compared with a benchmark. The intervals depend, among other things, on the business significance of the model and on how quickly results are produced.

The assumptions of the model must also be reviewed. If they are still valid, retraining the model with fresh data can boost its performance. In the case of profound changes, a structural adjustment of the model or the addition of a new data source will be necessary. In summary, this phase deals with the following questions:
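The comparison with a benchmark can be automated in a simple form. The sketch below keeps a rolling window of recent prediction errors and flags the model for review once the average error exceeds the benchmark by a tolerance factor. The class name, the window size and the tolerance of 1.2 are assumptions chosen for illustration.

```python
from collections import deque


class QualityMonitor:
    """Tracks recent absolute errors and compares their average with a
    benchmark error, e.g. the error measured at go-live (assumed setup)."""

    def __init__(self, benchmark_error, window_size=50, tolerance=1.2):
        self.benchmark_error = benchmark_error
        self.tolerance = tolerance                 # allowed factor above benchmark
        self.errors = deque(maxlen=window_size)    # old errors drop out automatically

    def record(self, actual, predicted):
        self.errors.append(abs(actual - predicted))

    def current_error(self):
        if not self.errors:
            return None
        return sum(self.errors) / len(self.errors)

    def needs_review(self):
        error = self.current_error()
        return error is not None and error > self.benchmark_error * self.tolerance


monitor = QualityMonitor(benchmark_error=2.0, window_size=3)
for actual, predicted in [(10, 11), (10, 12), (10, 9)]:
    monitor.record(actual, predicted)
print(monitor.needs_review())   # errors are still within the tolerance

monitor.record(10, 20)          # a large miss enters the window
print(monitor.needs_review())
```

The window size plays the role of the review interval: a short window reacts quickly to shifts but is noisier, while a long window is more stable and slower to raise a flag.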

Is the quality of results still adequate?
Are the basic assumptions of the model fulfilled?
Have the requirements changed?

As the model is continuously improved, versioning the models becomes important. It is of central importance whenever the results of a model must be reproducible at any time, for example because of legal requirements in the case of a credit inquiry.

The machine learning workflow is characterized by its flexibility. Instead of following fixed goals from the start, the feasibility study reveals what is possible. Until then, it remains uncertain which results are achievable with the available data and which requirements the implementation entails. Thus, each phase holds its own challenges and opportunities. NextLytics consultants are happy to guide you through parts of the workflow or all of it and to unlock its full potential with their practical experience.
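Reproducibility requires knowing which model version, with which parameters and training data, produced a given result. A minimal sketch of this idea follows, using an in-memory registry that stores content hashes. The `ModelRegistry` class and its methods are hypothetical; a real setup would persist this information, for example with a dedicated model registry tool.

```python
import hashlib
import json


def fingerprint(obj):
    """Stable SHA-256 hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ModelRegistry:
    """In-memory registry of model versions (illustrative only)."""

    def __init__(self):
        self._versions = {}

    def register(self, params, training_data):
        """Store parameters plus hashes and return the new version number."""
        version = len(self._versions) + 1
        self._versions[version] = {
            "params": params,
            "params_hash": fingerprint(params),
            "data_hash": fingerprint(training_data),
        }
        return version

    def get(self, version):
        return self._versions[version]


registry = ModelRegistry()
v1 = registry.register({"threshold": 0.5}, [[1, 0], [0, 1]])
v2 = registry.register({"threshold": 0.6}, [[1, 0], [0, 1], [1, 1]])  # retrained
print(v1, v2, registry.get(v1)["params"])
```

Storing the version number alongside every delivered result makes it possible to look up later exactly which parameters and which data snapshot were behind a decision.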

If you need support in the planning and realization of machine learning projects, please do not hesitate to contact us. Our consultants have different areas of expertise and complement your machine learning team with the desired competencies.

Luise

Luise Wiesalla joined NextLytics AG in 2019 as a working student / student consultant in the field of data analytics and machine learning. She has experience with full-stack data science projects and using the open-source workflow management solution Apache Airflow. She likes to spend her free time exploring her surroundings and being on the move.