Ludwig — A “Friendlier” Deep Learning Framework


The Ludwig API allows you to architect fairly complex and customisable models declaratively. Ludwig does this through a .yaml file. Now, I appreciate many data scientists reading this might not have used .yaml files, but generally in software development these are used for configuration. The files might appear scary at first glance, but they are quite friendly. Let’s step through the main parts of the file I created to build the model.

Image by Author: Model architecture

Before we delve into the configurations, it’s worth briefly introducing the architecture at the heart of Ludwig’s deep learning framework: the Encoder, Combiner, and Decoder. Most of the models you configure in Ludwig will predominantly adhere to this architecture. Understanding this can simplify the process of stacking components to quickly build your deep learning models.

Declaring your Model

Right at the top of the file you declare the model type. Ludwig provides two options, tree-based models and deep neural networks, and I chose the latter.

model_type: ecd

Declaring your data splits

You can split data sets natively by declaring your split percentages, the type of split, and the column or variable you’re splitting on. For my purposes I wanted to ensure that a store could only appear in one of the data sets, and hash splitting was perfect for that.

For best practice, I would probably advise constructing a holdout set outside of the Ludwig API, especially where you are doing some initial feature engineering like one-hot encoding or normalisation. This should help prevent data leakage.

model_type: ecd
split:
  type: hash
  column: Store_id
  probabilities:
    - 0.7
    - 0.15
    - 0.15
#...omitted sections...
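
To illustrate the earlier point about constructing a holdout set outside of the Ludwig API, here is a minimal sketch of carving one out at the store level before any feature engineering is fitted. The file name and the 10% holdout fraction are illustrative assumptions, not part of the original pipeline.

import numpy as np
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical path to the raw training data

# Hold out a random 10% of stores so that no store appears in both sets
rng = np.random.default_rng(42)
stores = df["Store_id"].unique()
holdout_stores = rng.choice(stores, size=int(0.1 * len(stores)), replace=False)

holdout_df = df[df["Store_id"].isin(holdout_stores)]
train_df = df[~df["Store_id"].isin(holdout_stores)]

# Fit any one-hot encoders or scalers on train_df only, then hand train_df
# to Ludwig and keep holdout_df untouched for a final, leak-free evaluation.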

Declaring the Model Inputs

You declare inputs by name, type, and encoder. Depending on the type of input, you have a variety of encoders to choose from. Essentially, encoders are a way of transforming your inputs so that they can be interpreted by the model. The choice of encoder really depends on the data and the modelling task.

model_type: ecd
split:
  type: hash
  column: Store_id
  probabilities:
    - 0.7
    - 0.15
    - 0.15
input_features:
  - name: Sales
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: Order
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: Discount
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: DayOfWeek
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: MonthOfYear
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: Holiday
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: Store_Type
    type: category
    encoder: dense
  - name: Location_Type
    type: category
    encoder: dense
  - name: Region_Code
    type: category
    encoder: dense
#...omitted sections...

Declaring the Combiner

Combiners, as the name suggests, amalgamate the outputs of your encoders. The Ludwig API offers an array of different combiners, each with its own specific use case. The choice of combiner can depend on the structure of your model and the relationships between your features. For instance, you might use a ‘concat’ combiner if you want to simply concatenate the outputs of your encoders, or a ‘sequence’ combiner if your features have a sequential relationship.

model_type: ecd
split:
  type: hash
  column: Store_id
  probabilities:
    - 0.7
    - 0.15
    - 0.15
input_features:
  - name: Sales
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: Order
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  # ... omitted sections ...
  - name: Location_Type
    type: category
    encoder: dense
  - name: Region_Code
    type: category
    encoder: dense
combiner:
  type: sequence
  main_sequence_feature: Order
  reduce_output: null
  encoder:
    # ... omitted sections ...

As with many aspects of deep learning, the optimal choice of combiner often depends on the specifics of your dataset and problem, and may require some experimentation.

Declaring the Model Outputs

Finalising your network is as simple as declaring your outputs, which are just your labels. My pet peeve with Ludwig for time series is that you can’t (yet) declare time-series outputs. As I mentioned previously, you have to “hack” it by declaring each point in your time series separately, which left me with thirty separate declarations and, in all honesty, looks very messy. For each output you can also specify the loss function, adding further configurability. Ludwig has a myriad of options pre-built for different output types; however, I don’t know whether you can implement custom loss functions as you can with PyTorch.

model_type: ecd
split:
  type: hash
  column: Store_id
  probabilities:
    - 0.7
    - 0.15
    - 0.15
input_features:
  - name: Sales
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: Order
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  # ...omitted sections...
  - name: Location_Type
    type: category
    encoder: dense
  - name: Region_Code
    type: category
    encoder: dense
combiner:
  type: sequence
  main_sequence_feature: Order
  reduce_output: null
  encoder:
    type: parallel_cnn
output_features:
  - name: Order_sequence_label_2019-05-02
    type: number
    loss:
      type: mean_absolute_error
  - name: Order_sequence_label_2019-05-03
    type: number
    loss:
      type: mean_absolute_error
  #...omitted sections...
  - name: Order_sequence_label_2019-05-30
    type: number
    loss:
      type: mean_absolute_error
  - name: Order_sequence_label_2019-05-31
    type: number
    loss:
      type: mean_absolute_error
#...omitted sections...
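
Because the output block is so repetitive, one option is to generate those thirty declarations programmatically and write the result back out as .yaml. The snippet below is just a sketch of that idea, assuming pandas and PyYAML are available and that the rest of the config is merged in separately; the output file name is hypothetical.

import pandas as pd
import yaml

# One output feature per day in the forecast horizon (30 days in total)
dates = pd.date_range("2019-05-02", "2019-05-31", freq="D")

output_features = [
    {
        "name": f"Order_sequence_label_{d.date()}",
        "type": "number",
        "loss": {"type": "mean_absolute_error"},
    }
    for d in dates
]

# Merge into the remaining config (inputs, combiner, trainer) before dumping
config = {"model_type": "ecd", "output_features": output_features}
with open("model.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)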

Declaring the Trainer

The trainer configuration in Ludwig, while optional due to Ludwig’s provision of sensible defaults, allows for a high degree of customisation. It gives you control over the specifics of how your model is trained. This includes the ability to specify the type of optimiser used, the number of training epochs, the learning rate, and criteria for early stopping, among other parameters.

model_type: ecd
split:
  type: hash
  column: Store_id
  probabilities:
    - 0.7
    - 0.15
    - 0.15
input_features:
  - name: Sales
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  - name: Order
    type: sequence
    encoder: stacked_cnn
    reduce_output: null
  # ...omitted sections...
  - name: Location_Type
    type: category
    encoder: dense
  - name: Region_Code
    type: category
    encoder: dense
combiner:
  type: sequence
  main_sequence_feature: Order
  reduce_output: null
  encoder:
    type: parallel_cnn
output_features:
  - name: Order_sequence_label_2019-05-02
    type: number
    loss:
      type: mean_absolute_error
  - name: Order_sequence_label_2019-05-03
    type: number
    loss:
      type: mean_absolute_error
  #...omitted sections...
  - name: Order_sequence_label_2019-05-30
    type: number
    loss:
      type: mean_absolute_error
  - name: Order_sequence_label_2019-05-31
    type: number
    loss:
      type: mean_absolute_error
trainer:
  epochs: 200
  learning_rate: 0.0001
  early_stop: 20
  evaluate_training_set: true
  validation_metric: mean_absolute_error
  validation_field: Order_sequence_label_2019-05-31

For your particular use case, you might find it beneficial to define these parameters yourself. For instance, you might want to adjust the learning rate or the number of epochs based on the complexity of your model and the size of your dataset. Similarly, early stopping can be a useful tool to prevent overfitting by halting the training process if the model’s performance on a validation set stops improving.

Train your Model

Training your model can be done easily with Ludwig’s Python experiment API. See the script example below:
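
A minimal sketch of that call, assuming the configuration above has been saved as model.yaml and the prepared data sits in train.csv (both file names are placeholders):

import pandas as pd
from ludwig.api import LudwigModel

df = pd.read_csv("train.csv")

# Build the model from the declarative config
model = LudwigModel(config="model.yaml")

# experiment() trains the model and then evaluates it on the test split,
# writing results and checkpoints to an output directory
results = model.experiment(dataset=df)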

Other Configurations

Outside of those I mentioned, Ludwig has a myriad of possible configurations. They are all very well documented and well structured. I would advise having a read of their documentation to familiarise yourself.


