6 Embarrassing Sklearn Mistakes You May Be Making And How to Avoid Them | by Bex T. | Jun, 2023

2️⃣. Judging Model Performance Only By Test Scores

You got a test score over 0.85 — should you be celebrating? Big, fat NO!

Even though high test scores generally mean robust performance, there are important caveats to interpreting test results. First and most importantly, regardless of its value, a test score should only be judged relative to the score you get on the training set.

The only time you should be happy with your model is when the training score is higher than the test score, and both are high enough to satisfy the expectations of your unique case. However, this does not imply that the larger the difference between train and test scores, the better.

For example, a training score of 0.85 and a test score of 0.8 suggest a good model that is neither overfit nor underfit. But if the training score is over 0.9 and the test score is 0.8, your model is overfitting. Instead of generalizing during training, the model memorized some of the training data, resulting in a much lower test score.
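To make the comparison concrete, here is a minimal sketch that fits a model and reports both scores along with the gap between them. The helper name `train_test_gap`, the synthetic dataset, and the choice of `LogisticRegression` are assumptions made for illustration; swap in your own data and estimator.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_test_gap(model, X_train, X_test, y_train, y_test):
    """Fit the model and return (train_score, test_score, gap).

    Hypothetical helper, not part of scikit-learn.
    """
    model.fit(X_train, y_train)
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    return train_score, test_score, train_score - test_score


# Synthetic data, used only so the example is self-contained.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

train_score, test_score, gap = train_test_gap(
    LogisticRegression(max_iter=1000), X_train, X_test, y_train, y_test
)
print(f"train={train_score:.3f}  test={test_score:.3f}  gap={gap:.3f}")
```

A small positive gap is what you hope to see. How large a gap is acceptable depends on your problem, so treat any fixed threshold as a rule of thumb.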

You will often see such cases with tree-based and ensemble models. For example, algorithms such as Random Forest tend to achieve very high training scores if their tree depth is not controlled, leading to overfitting. You can read this discussion on StackExchange to learn more about the difference between train and test scores.
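As a rough illustration, the sketch below trains two random forests on the same synthetic data, one with unlimited tree depth and one with `max_depth=3`, and prints the train and test scores of each. The dataset, the amount of label noise (`flip_y=0.2`), and the depth value are assumptions chosen for the demo, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with label noise (flip_y) so that memorization shows up.
X, y = make_classification(
    n_samples=1000, n_features=20, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = {}
for name, max_depth in [("unlimited depth", None), ("max_depth=3", 3)]:
    forest = RandomForestClassifier(
        n_estimators=100, max_depth=max_depth, random_state=0
    )
    forest.fit(X_train, y_train)
    # Store (train_score, test_score) for each configuration.
    results[name] = (
        forest.score(X_train, y_train),
        forest.score(X_test, y_test),
    )

for name, (train_score, test_score) in results.items():
    print(f"{name:>16}: train={train_score:.3f}  test={test_score:.3f}")
```

The unconstrained forest is free to fit the noisy labels, so its training score should sit well above that of the depth-limited one. Whether limiting depth also improves the test score depends on the data.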

There is also the case where the test score is higher than the training score. If the test score is higher than the training score even in the slightest, feel alarmed because you most likely made a mistake! The major cause of such scenarios is data leakage, and we discussed an example of that in the last section.
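Before hunting for leakage, it can help to check whether the pattern holds across several splits rather than just one. The sketch below uses `cross_validate` with `return_train_score=True` to count the folds in which the test score beats the training score. The dataset and estimator are placeholders chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)

scores = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=5, return_train_score=True
)

# Positive gap: train score above test score (the expected direction).
gaps = scores["train_score"] - scores["test_score"]
n_test_higher = int((gaps < 0).sum())

for fold, gap in enumerate(gaps, start=1):
    print(f"fold {fold}: train - test = {gap:+.3f}")
print(f"folds where test > train: {n_test_higher} of {len(gaps)}")
```

If the test score comes out ahead in most or all folds, that is a strong hint to re-check how the data was split and preprocessed.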

Sometimes, it is also possible to get a good training score and an extremely low test score. When the difference between train and test scores is huge, the problem will often be associated with the test set rather than overfitting. This can happen when you use different preprocessing steps for the train and test sets or simply forget to apply preprocessing to the test set.
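One way to guard against this is to bundle preprocessing and the model into a single pipeline, so the same fitted transformer is applied to any data the model scores. The sketch below contrasts that with the mistake of scaling only the training set. The synthetic data and the artificial shift applied to the features are assumptions made so the effect is visible.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
X = X * 100 + 10_000  # illustrative: put features on a large, shifted scale
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Mistake: the scaler is applied to the training set only.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
forgot_test_score = model.score(X_test, y_test)  # raw, unscaled test features

# Safer: the pipeline scales whatever it is asked to score.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
pipe_test_score = pipe.score(X_test, y_test)

print(f"forgot to scale test set: {forgot_test_score:.3f}")
print(f"pipeline:                 {pipe_test_score:.3f}")
```

With a pipeline, there is no separate transform step to forget, which removes this class of mistake rather than relying on discipline.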

In summary, always examine the gap between train and test scores closely. Doing so will tell you whether you should apply regularization to overcome overfitting, look for possible mistakes you made during preprocessing, or, in the best-case scenario, prepare the model for final evaluation and deployment.
