# Okay, You’ve Trained the Best Machine Learning Model. What’s Next? | by Albers Uzila | Jun, 2023

Table of Contents

- Initialize a Repository
- Migrate Your Codebase
  - config/config.py
  - config/args.json
  - tagolym/utils.py
  - tagolym/data.py
  - tagolym/train.py
  - tagolym/predict.py
  - tagolym/evaluate.py
  - tagolym/main.py
- Package Your Codebase
- Setup Data Source Credential
- Run Your Pipeline
- Miscellaneous
- Push Your Project to GitHub
- Wrapping Up

Let’s say you’re building a data science project, whether for work, college, a portfolio, or a hobby. You’ve spent days working on a problem statement and experimenting in Jupyter notebooks. Now you’re wondering, “How do I deploy my work as a useful product?”

To be concrete, assume you have a website that hosts forums. Users can add tags to a thread to make it easier to navigate between forums on different topics. You want to improve the user experience by suggesting predefined tags, which give context to what the discussion is about.

The forum can be anything, so let’s be more specific: every thread starts with a post explaining a math problem, followed by thoughts, questions, hints, or answers around it. For example, a thread might carry 3 tags: induction, combinatorics unsolved, and combinatorics.

At this point, you’ve done everything in your notebooks, from understanding the problem statement, defining metrics, querying data, cleaning it, preprocessing, EDA, building a model, to evaluating and optimizing the model.

You notice that the posts carry a huge number of distinct tags. To simplify, you keep only 10 tags. The models you develop are simple linear classifiers (SVM, logistic regression, etc.) preceded by TF-IDF vectorization, and you train them with Stochastic Gradient Descent (SGD).
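As a rough illustration of the tag-filtering step, here is a minimal sketch using only the standard library. It assumes that "filtering" means keeping the most frequent tags and dropping posts that end up with no tags; the article doesn't specify either choice. The function name `keep_top_tags` and the field names `text` and `tags` are hypothetical.

```python
from collections import Counter


def keep_top_tags(posts, k=10):
    """Keep only the k most frequent tags (an assumed selection rule).

    Posts left without any tag after filtering are dropped,
    which is also an assumption made for this sketch.
    """
    # Count how often each tag appears across all posts.
    counts = Counter(tag for post in posts for tag in post["tags"])
    top = {tag for tag, _ in counts.most_common(k)}

    filtered = []
    for post in posts:
        kept = [tag for tag in post["tags"] if tag in top]
        if kept:
            # Copy the post, replacing its tag list with the kept tags.
            filtered.append({**post, "tags": kept})
    return filtered, top


# Toy data; the field names "text" and "tags" are hypothetical.
posts = [
    {"text": "Prove by induction that ...", "tags": ["induction", "combinatorics"]},
    {"text": "Count the lattice paths ...", "tags": ["combinatorics"]},
    {"text": "Find all primes p such that ...", "tags": ["number theory"]},
]

# k=1 keeps the toy example small; the scenario above would use k=10.
filtered, top = keep_top_tags(posts, k=1)
```

Restricting the label set this way turns an open-ended tagging problem into a small multi-label classification task, which is easier to evaluate with simple linear models.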

While notebooks are great and can help you run experiments very fast, they’re not production-ready and are sometimes hard to maintain. So, you need to migrate your code into individual Python files, then add other utilities step by step while collaborating with your team members.

This story will guide you through exactly that in simple, bite-sized steps. Before that, you might want to refresh your memory of linear models, TF-IDF, and SGD.

First, let’s create a new repo on GitHub called tagolym-ml, complete with README.md, .gitignore, and LICENSE.

To work with the repo, do these steps:

1. Clone the repo; this creates a folder named tagolym-ml.
2. Change the working directory into this folder.
3. Build a virtual environment called venv.
4. Activate the environment.
5. Upgrade pip.
6. Optionally, check the packages currently installed in your environment using pip list. You should see only pip and setuptools.
7. Create a new git branch called code_migration and switch to it.
8. Create a file setup.py.
9. Make some new folders named config, tagolym, and credentials.
10. Create files config.py and args.json inside the config folder.
11. Create files main.py, utils.py, data.py, train.py, evaluate.py, and predict.py inside the tagolym folder.

If you’re not sure how to do these, don’t worry. Here are all the commands you need, which you can run in your favorite terminal:

    $ git clone https://github.com/dwiuzila/tagolym-ml.git
    $ cd tagolym-ml
    $ python3 -m venv venv
    $ source venv/bin/activate
    $ python3 -m pip install --upgrade pip
    $ pip list
    Package    Version
    ---------- -------
    pip        23.1.2
    setuptools 58.0.4
    $ git checkout -b code_migration
    $ touch setup.py
    $ mkdir config tagolym credentials
    $ touch config/config.py config/args.json
    $ cd tagolym
    $ touch main.py utils.py data.py train.py evaluate.py predict.py
    $ cd ..

You now have a local git repo connected to the remote repo on GitHub. The local directory structure will look like this:

    config/
    ├── args.json        - preprocessing/training parameters
    └── config.py        - configuration setup
    credentials/         - keys and passwords
    tagolym/
    ├── data.py          - data processing components
    ├── evaluate.py      - evaluation components
    ├── main.py          - training/optimization pipelines
    ├── predict.py       - inference components
    ├── train.py         - training components
    └── utils.py         - supplementary utilities
    venv/                - virtual environment
    .gitignore           - files/folders that git will ignore
    LICENSE              - project license
    README.md            - longform description of the project
    setup.py             - code packaging

Almost all of these files are empty for now. You’ll fill them in one by one, starting with the folder config.

There are two main folders within your project: config and tagolym. You want to copy the necessary code from your notebooks into the files inside these folders. Let’s do it.

## config/config.py

Here, you define global variables related to seeds, directories, experiment tracking, preprocessing, and label names.