How Python Enumerations Make Data Configuration Elegant | by Eirik Berge, PhD | May, 2023


Photo by Zac Durant on Unsplash

Python Enums provide a more elegant solution for storing configuration information. Enums (short for enumerations) are essentially a way to define a set of named constants. The best way to understand Enums quickly is to adapt the machine learning configuration code from the previous section. The previous code you had was:

# Configuration for RandomForestClassifier
N_ESTIMATORS_RANDOM_FOREST_CLASSIFIER = 100
MAX_DEPTH_RANDOM_FOREST_CLASSIFIER = 8
N_JOBS_RANDOM_FOREST_CLASSIFIER = 3

# Configuration for AdaBoostClassifier
N_ESTIMATORS_ADA_BOOST_CLASSIFIER = 50
LEARNING_RATE_ADA_BOOST_CLASSIFIER = 1.0

It can now be written as:

from enum import Enum

class RandomForest(Enum):
"""An Enum for tracking the configuration of random forest classifiers."""
N_ESTIMATORS = 100
MAX_DEPTH = 8
N_JOBS = 3

class AdaBoost(Enum):
"""An Enum for tracking the configuration of Ada boost classifiers."""
N_ESTIMATORS = 50
LEARNING_RATE = 1.0

Some Advantages

Let‘s quickly see what advantages Python Enums give us:

  • Each parameter (e.g., MAX_DEPTH) is now stored hierarchically within the model they are used for. This makes sure that nothing is overwritten when more configuration code is introduced. Hence there is also no need for overly long variable names.
  • The different parameters used in RandomForestClassifier are now grouped in the RandomForest Enum. Thus they can be iterated over and collectively analyzed for type safety.
  • Since Enums are classes, they can have docstrings as I have illustrated above. While this might not be strictly needed for this example, in other examples this might clarify what the enumeration is referring to. Having this as a docstring that is coupled with the class rather than a free-flowing comment is a lot better. For one thing, automatic documentation software will now pick up that the docstring belongs to the enumeration. For the free-flowing comment, this would probably just be lost.

If you now want to access the configuration code further down in the script, you can simply write:

from sklearn.ensemble import RandomForestClassifier

RandomForestClassifier(
n_estimators=RandomForest.N_ESTIMATORS.value,
max_depth=RandomForest.MAX_DEPTH.value,
n_jobs=RandomForest.N_JOBS.value
)

This looks clean and is easy to read 😍

Some Python features of Enums

Let’s illustrate some simple features of Python enumerations by looking at the toy example:

from enum import Enum
class HTTPStatusCodes(Enum):
"""An Enum that keeps track of status codes for HTTP(s) requests."""
OK = 200
CREATED = 201
BAD_REQUEST = 400
NOT_FOUND = 404
SERVER_ERROR = 500

Clearly, the information contained within the enumeration is related; they are all about HTTP(s) status codes. Given this, you can now use the following simple code to extract the names and values:

print(HTTPStatusCodes.OK.name)
>>> OK

print(HTTPStatusCodes.OK.value)
>>> 200

Python Enums also allow you to go backward: Given that a status code value is 404, you can find the status code name by simply writing:

print(HTTPStatusCodes(200).name)
>>> OK

You can collectively work with the names/values pairs in an Enum by e.g., using the list constructor list():

print(list(HTTPStatusCodes))
>>> [
<HTTPStatusCodes.OK: 200>,
<HTTPStatusCodes.CREATED: 201>,
<HTTPStatusCodes.BAD_REQUEST: 400>,
<HTTPStatusCodes.NOT_FOUND: 404>,
<HTTPStatusCodes.SERVER_ERROR: 500>
]

Finally, you can pickle and unpickle enumerations in Python. To do this, simply use the pickle model as you would with other familiar objects in Python:

from pickle import dumps, loads
print(HTTPStatusCodes is loads(dumps(HTTPStatusCodes)))
>>> True

This is especially important for configuration code for machine learning models. In that case, there are also well-known libraries like MLflow that saves configuration code elegantly for you.

Sensitive Configuration Code

When using sensitive configuration code (like passwords or access keys), you should NEVER write them explicitly in the code. They should be in a separate file that is ignored by the version control system you are using, and imported into the script.

That does not mean that they can not be stored in Enums within the code. Enums are about the organization, and even censored information can be organized. As an example, here is an enumeration representing the connection to a storage account in Microsoft Azure:

from enum import Enum
import os

class StorageAccount(Enum):
ACCOUNT_NAME = "my account name"
ACCESS_KEY = os.environ.get('ACCESS_KEY')
CONTAINER_NAME = "my container name"

Here the StorageAccount enumeration is fetching the ACCESS_KEY from an environment variable. This can be set in e.g., a .env file. Notice that within the Python script, no sensitive information is revealed. Nevertheless, all the information about the storage account is neatly organized into an enumeration.



Source link

Leave a Comment