Software Engineering Best Practices for Writing Maintainable ML Code | by Hennie de Harder | Aug, 2023


A data scientist who is lost in a forest full of code. Related to the second and last tip. Image created with Midjourney by the author.

Advanced coding tips for data scientists

Compared with traditional software engineering projects, ML codebases tend to lag behind in code quality due to their complex and evolving nature, which leads to increased technical debt and difficulties in collaboration. Prioritizing maintainability is important for creating robust ML solutions that can adapt, scale, and deliver value over time.

In recent years, machine learning has taken the world by storm, transforming industries from healthcare to finance and more. As more organizations jump on the ML bandwagon to discover new possibilities and insights, writing maintainable and robust ML code becomes crucial. By crafting ML code that’s easy to work with and stands the test of time, teams can collaborate better and improve their chances of success as models and projects grow and adapt. The following sections show common examples from ML codebases and explain how to handle them properly.

This tip is probably irrelevant for you, but it’s written for the single person who is not aware of this (until now)!

Monolithic scripts, a.k.a. a single script for the whole project, may arise when you reuse your experimental code in production. Copy, paste, done! Putting a whole project in one single script is almost always a bad idea. It’s difficult to read (even for the writer), hard to debug and inefficient to work with: you can’t easily add new features or modify the code, because every change means the whole thing has to run again. Adding unit tests is practically impossible as well, because the monolith is ‘the whole unit’.

Another problem with a single script is reusability. You can’t easily reuse the code in other projects, because no part of it can be imported on its own and it’s so hard to read.

There is only one reason to write a monolith: you don’t like the colleague who takes over your work. If you want to get this person frustrated, it’s an easy way to accomplish that.

What to do instead? Write modules and classes. Create different code files that have one specific purpose. Every file should contain functions or classes and methods. By doing this, the code becomes way easier to read, debug, reuse and test. In the next…
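As a rough illustration of the advice above, here is a minimal sketch of what "one purpose per file" could look like. Everything in it is an assumption made for the example: the file names (preprocessing.py, model.py, test_preprocessing.py), the function names and the trivial MeanBaseline class are hypothetical and do not come from a real project. The sections are shown in one block only to keep the example short and runnable.

```python
# Hypothetical layout: in a real project each section below would live in its
# own file (the file names in the comments are illustrative, not prescribed).

from statistics import mean


# --- preprocessing.py: one purpose, cleaning raw input ---
def drop_missing(rows):
    """Return only the rows (dicts) that contain no None values."""
    return [row for row in rows if None not in row.values()]


def scale_feature(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature has no spread to scale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# --- model.py: one purpose, fitting and predicting ---
class MeanBaseline:
    """A deliberately trivial model that always predicts the training mean."""

    def fit(self, targets):
        self.mean_ = mean(targets)
        return self

    def predict(self, n_rows):
        return [self.mean_] * n_rows


# --- test_preprocessing.py: each function can be tested in isolation ---
def test_scale_feature():
    assert scale_feature([2, 4, 6]) == [0.0, 0.5, 1.0]
    assert scale_feature([3, 3]) == [0.0, 0.0]


def test_drop_missing():
    rows = [{"age": 30, "income": 10}, {"age": None, "income": 20}]
    assert drop_missing(rows) == [{"age": 30, "income": 10}]
```

Because each function does one thing, a test can call it directly without running the rest of the pipeline, and another project could import scale_feature without pulling in the model code. With the sections split into real files, a test runner such as pytest would pick up the test_ functions on its own.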


