Data Science Project Mastery through KPIs | by Janik and Patrick Tinz | May, 2023


Use of key performance indicators in the controlling of data science projects


Project controlling is essential in data science projects because the time required and the costs are often difficult to estimate. Clients want to implement their use cases efficiently and successfully with machine learning methods. Meaningful key performance indicators (KPIs) support consistent project control. According to some studies, 85% of all data science projects fail, which makes it all the more important to identify obstacles early.

In classical IT projects, many KPIs exist for project controlling, but these are not sufficient for data science projects. For this reason, in this article we present basic KPIs for IT projects and analyse them with regard to data science projects. As a result, we will see that specific KPIs can make the project controlling of data science projects more transparent.

The article first deals with the basics of project controlling and presents some KPIs from the software industry. Next, we analyse the presented KPIs with regard to data science projects using the CRISP-DM process model (Cross-Industry Standard Process for Data Mining). The analysis also draws on our personal experience from data science projects.

If you want to learn more about CRISP-DM, we recommend reading our article on CRISP-DM.

Project control is a central component of project management and is used to support project decisions. It has the following objectives (cf. [1], p. 334):

  • Coordination of project objectives
  • Supporting project management and identifying deviations from the plan and their causes
  • Evaluation of risks and initiation of measures to control or reduce risks

Project controllers use instruments such as a project cockpit to control the project. IT projects are usually managed on the basis of a project order. A project description usually defines the scope of the project and the project goals the client expects. However, the project description often contains only rough descriptions, which is why refinement is necessary. From the refined description, we then derive a roadmap with milestones as well as resource and cost planning. The milestones are essential in project controlling because they determine the timing of reviews and thus make it easier to place the current work status within the overall project.

The KPIs to support project controlling are the main focus of this article. Experienced project managers usually rely on clearly defined and easily interpretable KPIs. In the next section we present a concept for deriving meaningful KPIs for IT projects.

The Balanced Scorecard (BSC) has proven itself in practice as a holistic approach to developing KPIs. The BSC is a balanced system of KPIs. Essential to the concept are a concrete definition of what success means and the integration of non-monetary indicators (cf. [1], p. 374). The concept considers the following four perspectives (cf. [1], p. 375):

  • Finance: Which financial aspects do we have to consider? Which key figures are needed?
  • Customer: What is the customer’s expectation and attitude towards the project?
  • Internal processes: How efficient are internal processes regarding costs, time and quality?
  • Learning and development: How purposefully are employees deployed?

These four perspectives form the framework of the Balanced Scorecard, but they can also be extended, for example, by a project results perspective (cf. [1], p. 374).

First, we have to derive the perspectives for the project. Then we can define the project goals and requirements. Subsequently, we derive the following agreements for each perspective of the Balanced Scorecard (cf. [1], p. 376):

  • Definition of the objectives pursued by the project under the respective perspective
  • Assignment of key figures
  • Definition of targets for each key figure
  • Measures in case of deviations

Finally, we can derive key figures for each perspective based on critical success factors and strategic goals.
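The four agreements per perspective (objective, key figure, target, measure in case of deviation) can be captured in a small data structure. The following is only a minimal sketch: the class names, the field names and the example values are our own illustrative assumptions and are not taken from [1].

```python
from dataclasses import dataclass, field


@dataclass
class KeyFigure:
    """One key figure of a perspective (hypothetical structure)."""
    name: str
    unit: str
    target: float             # target value agreed for this key figure
    higher_is_better: bool    # direction in which the target is read
    measure_on_deviation: str  # agreed measure if the target is missed

    def deviates(self, actual: float) -> bool:
        """Return True if the actual value misses the target."""
        if self.higher_is_better:
            return actual < self.target
        return actual > self.target


@dataclass
class Perspective:
    """One Balanced Scorecard perspective with its objective and key figures."""
    name: str
    objective: str
    key_figures: list[KeyFigure] = field(default_factory=list)


# Illustrative example for the customer perspective.
customer = Perspective(
    name="Customer",
    objective="Keep the client satisfied with the project work",
    key_figures=[
        KeyFigure(
            name="Customer satisfaction index",
            unit="percent",
            target=98.0,
            higher_is_better=True,
            measure_on_deviation="Schedule a feedback workshop with the client",
        )
    ],
)

# A hypothetical survey result of 95% misses the target and triggers the measure.
for key_figure in customer.key_figures:
    if key_figure.deviates(actual=95.0):
        print(f"{key_figure.name}: {key_figure.measure_on_deviation}")
```

Keeping the measure next to the target makes it explicit what should happen when a key figure deviates, which is the point of the fourth agreement.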

Critical success factors from the financial perspective are, for example, project budget, project costs and project benefits (ROI of the IT project). From the customer perspective, these are, for example, customer satisfaction, acceptance of the project work by the stakeholders and use of the software solution by the customer. From the internal processes perspective, critical success factors are, for example, project progress (milestones), productivity of the activities and innovation support of the processes. The critical success factors from the learning and development perspective can be, for example, the satisfaction of the project employees, the qualification of the project employees and team development.

Key Performance Indicators

In this section, we define some important KPIs for each perspective.

Finance:

KPI: Budget compliance (cf. [1], p. 378)

  • Goal: Deviation from the target budget under 10%
  • Unit: Number in percent
  • Formula: Actual budget / target budget

KPI: Return on investment (ROI of the project) (cf. [1], p. 381)

  • Goal: To transparently present and communicate the benefits of the IT project.
  • Unit: Absolute number
  • Formula: Determination according to the utility value analysis (criteria x weight)
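The two finance KPIs above can be sketched in a few lines of Python. This is an illustration under our own assumptions: we read "deviation under 10%" as the absolute distance of the budget ratio from 100%, and the criteria, scores and weights of the utility value analysis are invented for the example.

```python
def budget_compliance(actual_budget: float, target_budget: float) -> float:
    """Actual budget divided by target budget, expressed in percent."""
    return actual_budget * 100 / target_budget


def budget_within_tolerance(
    actual_budget: float, target_budget: float, tolerance_pct: float = 10.0
) -> bool:
    """Assumption: the goal is met if the ratio is less than 10 points off 100%."""
    deviation = abs(budget_compliance(actual_budget, target_budget) - 100)
    return deviation < tolerance_pct


def utility_value(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Utility value analysis: sum of criterion score times criterion weight."""
    return sum(scores[criterion] * weights[criterion] for criterion in weights)


# Hypothetical project figures.
print(budget_compliance(actual_budget=108_000, target_budget=100_000))
print(budget_within_tolerance(actual_budget=108_000, target_budget=100_000))

# Hypothetical criteria, scored from 1 (poor) to 10 (excellent); weights sum to 1.
scores = {"cost_savings": 8, "process_speed": 6, "data_transparency": 7}
weights = {"cost_savings": 0.5, "process_speed": 0.3, "data_transparency": 0.2}
print(round(utility_value(scores, weights), 2))
```

The weights are where the client's priorities enter the calculation, so they should be agreed with the stakeholders before the scores are collected.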

Customer:

KPI: Customer Satisfaction Index (cf. [1], p. 381)

  • Goal: Satisfaction index of at least 98%
  • Unit: Number in percent
  • Formula: Result from customer survey

Internal processes:

KPI: Adherence to deadlines in the project (cf. [1], p. 382)

  • Goal: Keep to 99% of agreed deadlines
  • Unit: Number in percent
  • Formula: Actual project duration/forecast project duration

KPI: Project plan deviation (cf. [2])

  • Goal: Complete project on schedule
  • Unit: Absolute number
  • Formula: Achieved value – Planned value

KPI: Realisation value (cf. [2])

  • Goal: Achieve a positive realisation value
  • Unit: Absolute number
  • Formula: Number of planned hours – number of paid hours
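The three internal process KPIs are simple ratios and differences. The sketch below implements the formulas exactly as listed; the function names and the example numbers are hypothetical, and the reading of the signs in the comments is our interpretation.

```python
def deadline_adherence(actual_duration: float, forecast_duration: float) -> float:
    """Actual project duration divided by forecast duration, in percent."""
    return actual_duration * 100 / forecast_duration


def project_plan_deviation(achieved_value: float, planned_value: float) -> float:
    """Achieved value minus planned value.

    Our reading: a negative result suggests the project is behind plan.
    """
    return achieved_value - planned_value


def realisation_value(planned_hours: float, paid_hours: float) -> float:
    """Planned hours minus paid hours.

    Our reading: a positive result means fewer hours were used than planned.
    """
    return planned_hours - paid_hours


# Hypothetical status at a milestone review.
print(deadline_adherence(actual_duration=110, forecast_duration=100))
print(project_plan_deviation(achieved_value=40, planned_value=50))
print(realisation_value(planned_hours=400, paid_hours=380))
```

Durations, values and hours each need one consistent unit (for example days, work packages and person-hours) so that the results stay comparable from one review to the next.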

Learning and development:

KPI: Staff satisfaction index (cf. [1], p. 381)

  • Goal: Ensure high satisfaction of project staff
  • Unit: Number in percent
  • Formula: Result from employee survey

In this section, we discussed some KPIs from a classic IT project. The list is not complete, because every IT project has different KPIs. Next, we will analyse the KPIs with regard to a data science project and introduce new KPIs.

In this section, we analyse which KPIs we can use in data science projects compared to classic IT projects. Furthermore, we present additional KPIs for data science projects. Data science projects often run according to the CRISP-DM process model. For this reason, we use this process model to analyse which KPIs can be used at which point in the data science process.

The phases of the CRISP-DM process model (see figure) are Business Understanding, Data Understanding, Data Preparation, Modelling/Evaluation and Deployment.

CRISP-DM Process Model (see [3])

Business Understanding

At the beginning of a data science project, it is often not clear what potential a data analysis has for the client. The first step is to find a suitable use case. In practice, an initial workshop is often held at the beginning of a project to identify use cases with high business potential. The question and the data must also fit together. Data scientists should therefore be involved at this point, as they can better assess the potential based on the available data sources. Once you have found the use case, you can determine KPIs to measure the project's success. Here it is essential to consider not only classic controlling KPIs but also the real business value (ROI), such as a reduction of machine downtime by 30%. By the end of this phase, the use case, the associated goals and the utility value we want to achieve should be clear. The goals become measurable through the definition of KPIs, so they should also be set in relation to the machine learning model to be developed. You can control the expected model quality using various metrics (e.g. accuracy, precision, recall, F1 score, AUC score, confusion matrix or lift factor) and set a minimum requirement for them at this stage. We refer to these KPIs as Data Science KPIs in this article, and you can use them in addition to the classical KPIs.
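To make the idea of a minimum requirement concrete, the following sketch derives some of the Data Science KPIs from the four cells of a binary confusion matrix and checks them against agreed minimums. The counts and the thresholds are hypothetical; in a real project they would come from the evaluation data and the agreement with the client.

```python
def classification_kpis(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def unmet_requirements(
    kpis: dict[str, float], minimums: dict[str, float]
) -> list[str]:
    """Return the names of all KPIs that fall below their agreed minimum."""
    return [name for name, minimum in minimums.items() if kpis[name] < minimum]


# Hypothetical evaluation result: 80 true positives, 10 false positives,
# 20 false negatives, 90 true negatives.
kpis = classification_kpis(tp=80, fp=10, fn=20, tn=90)

# Hypothetical minimum requirements set during Business Understanding.
minimums = {"precision": 0.80, "recall": 0.85}

print({name: round(value, 3) for name, value in kpis.items()})
print("Unmet requirements:", unmet_requirements(kpis, minimums))
```

Which metric gets a minimum depends on the use case: if missed failures are expensive, recall is usually the more important requirement, while costly false alarms argue for precision.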

Data Understanding

The data basis provides the foundation for a successful data science project. In this step, a data scientist must obtain an overview of the data and assess its quality. But what is meant by data quality? Data quality describes how well the data is suited to the use case. A data scientist checks the data quality using criteria such as correctness, relevance, completeness, consistency and availability. The focus of a data scientist should always be on data quality and not only on data quantity. After an initial assessment by a data scientist, there is usually an estimation of effort and budget planning.
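Some of the data quality criteria can be turned into simple numbers early on. The sketch below covers only two aspects, completeness per column and the share of exact duplicate records, using plain Python. The sample records and the function names are invented for illustration; criteria such as correctness and relevance still need domain knowledge and cannot be checked this mechanically.

```python
# Hypothetical raw sensor records; None marks a missing value.
records = [
    {"machine_id": "M1", "temperature": 71.5, "status": "ok"},
    {"machine_id": "M2", "temperature": None, "status": "ok"},
    {"machine_id": "M3", "temperature": 69.0, "status": None},
    {"machine_id": "M1", "temperature": 71.5, "status": "ok"},
]


def completeness(rows: list[dict], column: str) -> float:
    """Share of rows in which the given column has a value."""
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def duplicate_share(rows: list[dict]) -> float:
    """Share of rows that are exact duplicates of another row."""
    # Sort items by column name so that key order does not matter.
    unique_rows = {
        tuple(sorted(row.items(), key=lambda item: item[0])) for row in rows
    }
    return 1 - len(unique_rows) / len(rows)


for column in ("machine_id", "temperature", "status"):
    print(column, completeness(records, column))
print("duplicate share:", duplicate_share(records))
```

Figures like these give the effort estimate a factual basis, because a low completeness or a high duplicate share points to more work in data preparation.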

Data Preparation

In data preparation, we create a final data set for analysis from the raw data. Data preparation typically takes up about 50–70% of the total project duration, so the project controlling has to account for it explicitly. During data preparation, close communication with the client is necessary so that domain-specific aspects can be taken into account. For example, it must be clarified how to deal with missing values or outliers. In this phase, special attention should be paid to the KPI realisation value, as the workload is usually very high. You should ensure that the time spent does not exceed a certain upper limit of, for example, 60% of the total hours. However, you have to choose this upper limit specifically for the project. Furthermore, you should ensure that the project plan is adhered to (KPI project plan deviation).
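The upper limit for data preparation can be monitored with a simple share calculation. In this sketch the booked hours per phase are hypothetical, and the 60% limit is just the example value from the text, not a recommendation.

```python
def phase_share(hours_by_phase: dict[str, float], phase: str) -> float:
    """Share of all booked hours that fall on the given phase."""
    return hours_by_phase[phase] / sum(hours_by_phase.values())


def exceeds_limit(
    hours_by_phase: dict[str, float], phase: str, upper_limit: float
) -> bool:
    """Return True if the phase uses more than the agreed share of hours."""
    return phase_share(hours_by_phase, phase) > upper_limit


# Hypothetical hours booked so far, grouped by CRISP-DM phase.
booked_hours = {
    "business_understanding": 40,
    "data_understanding": 60,
    "data_preparation": 260,
    "modelling_evaluation": 80,
    "deployment": 0,
}

share = phase_share(booked_hours, "data_preparation")
print(f"Data preparation share: {share:.1%}")
print("Limit exceeded:", exceeds_limit(booked_hours, "data_preparation", 0.60))
```

The share here is measured against the hours booked so far. Measuring it against the total planned hours instead is equally plausible, as long as the project team agrees on one definition.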

Modeling / Evaluation

In the modelling phase, appropriate analysis methods are selected and implemented. The analysis methods can be simple statistical procedures or complex machine learning procedures. Subsequently, the analysis results are evaluated and compared with the objectives. These phases are iterative, as you can always switch back to the modelling phase to adjust the model based on the evaluation metrics. Together, these two phases serve as a feasibility study: at this point, it becomes clear whether the data science project can achieve the set goals. In this phase, you should pay attention to the Data Science KPIs defined in the Business Understanding phase.

Deployment

In this phase, the developed model is deployed in an IT infrastructure, provided that it delivers operational added value for the company. Deployment is the key to using data profitably in the long term!

The CRISP-DM process model shows that there are several other KPIs in data science projects. In data science projects, the result of the evaluation of the models is decisive for the project’s success.

In this section, we discuss the following thesis based on the analysis results and personal experiences:

“Conventional key performance indicators from the IT environment are not sufficient in data science projects”.

Currently, the media describe data science as a “magic bullet” that can solve many problems. That is also the reason why many service companies want to expand in this area. The service providers advertise their competence in the data science field and enter into negotiations with potential customers. First, the service provider conducts a potential analysis, which shows whether the project generates added value for the client. If the potential analysis is successful, a data science project starts. From our experience, many data science projects use the CRISP-DM process model. Experienced project managers from the IT industry often lead these projects and run them like classical IT projects. For project controlling, they use the KPIs with which they have had good experience.

From our experience, these projects can succeed or fail. When they fail, the reason is often that the project managers do not understand the specific data science KPIs. We think that specific data science KPIs make the project more transparent for all stakeholders. It is essential to define quantifiable goals so that you can measure the success of a data science project. From our experience, it is also helpful if the project manager already has an understanding of data science, because communication and controlling then become more transparent. Furthermore, it is useful to divide data science projects into phases. You can use the CRISP-DM process model for this, so that it is clear where you are in the project.

The analysis showed that you should still use KPIs from the IT environment in data science projects. Examples are budget compliance and project benefit (ROI). It is essential to define the ROI so that you can measure the real benefit of the project. From our experience, it is always helpful to quickly implement a prototype that already demonstrates the ROI. Such a prototype reassures the client, and the motivation to optimise it further increases on all sides. In addition, it creates a better working atmosphere, which increases the staff satisfaction index. However, we have also learned from our previous projects that you should not manage a data science project like a classical IT project, because other aspects also play a role. You have to integrate data science KPIs into the controlling process, as they ensure greater transparency in the development. In addition, the project manager should be familiar with the data science KPIs. We therefore agree with the thesis: conventional KPIs alone are not sufficient, and data science projects need specific data science KPIs in addition.

This article illustrates the use of KPIs in data science projects using the CRISP-DM process model. The use of KPIs is essential in data science and in IT projects. In addition, it is helpful for data science projects if the project manager has a data science background. There are some new KPIs in data science, especially related to model quality.

Finally, it should be noted that project managers in data science projects are still in the process of finding the best KPIs. Their experience and successful project implementations will lead to further KPIs that can contribute to the successful project controlling of data science projects.
