Can Synthetic Data Boost Machine Learning Performance? | by John Adeojo | Jul, 2023

We assess each model by plotting its precision-recall curve on the holdout dataset.

Precision-Recall Curve

The Precision-Recall curve plots Precision (on the y-axis) against Recall (on the x-axis) across varying classification thresholds, much as the ROC curve plots the true positive rate against the false positive rate. It serves as a robust diagnostic tool for evaluating model performance under significant class imbalance, of which our credit card fraud detection use case is a prime example.

The top-right corner of the plot represents the “ideal” point: a precision of one and a recall of one. A skilled model should reach this point or come close to it, so a larger area under the curve (AUC-PR) suggests a superior model.
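To make the curve concrete, here is a minimal pure-Python sketch of how its points and the area under it can be computed from true labels and model scores. The function names `precision_recall_points` and `average_precision` and the four-sample toy data are illustrative assumptions, not the code behind the article's plots. The area is approximated step-wise, by weighting each precision value by the gain in recall at that threshold.

```python
def precision_recall_points(y_true, scores):
    """Return (recall, precision) pairs, one per distinct score threshold.

    Hypothetical helper: y_true holds 0/1 labels, scores holds model scores
    where a higher score means "more likely positive".
    """
    total_pos = sum(y_true)
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lowering the threshold to this score flags every sample tied at it.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area under the curve: precision weighted by recall gained."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Toy data (assumed for illustration): two negatives, two positives.
y_true = [0, 0, 1, 1]
imperfect_scores = [0.1, 0.4, 0.35, 0.8]  # one negative outranks a positive
perfect_scores = [0.1, 0.2, 0.8, 0.9]     # every positive outranks every negative

imperfect_area = average_precision(precision_recall_points(y_true, imperfect_scores))
perfect_area = average_precision(precision_recall_points(y_true, perfect_scores))
print(f"imperfect ranking: {imperfect_area:.3f}")
print(f"perfect ranking:   {perfect_area:.3f}")
```

A ranking that places every positive above every negative keeps precision at one until recall reaches one, so its curve passes through the top-right corner and its area is one. Any misranked negative pulls the curve, and the area, down.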

No Skill Predictor

A “no skill” predictor is a naïve model that makes predictions randomly. On a Precision-Recall plot, the no skill line is a horizontal line at a height equal to the proportion of positive instances in the dataset. This is because, if the model predicts the positive class at random, the instances it flags are a random sample of the data, so its precision equals the proportion of positive instances regardless of the threshold.
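The baseline can be checked with a short simulation. The sketch below is illustrative only: the dataset size, the 1% positive rate, the seed, and the function name `random_predictor_precision` are all assumptions chosen for the example, not figures from the fraud dataset. It flags samples at random with several different probabilities and measures the resulting precision.

```python
import random


def random_predictor_precision(y_true, flag_probability, rng):
    """Precision of a predictor that flags each sample with a fixed probability.

    Hypothetical helper: the predictor ignores the features entirely.
    """
    tp = fp = 0
    for label in y_true:
        if rng.random() < flag_probability:
            if label == 1:
                tp += 1
            else:
                fp += 1
    return tp / (tp + fp)


# Assumed toy dataset: 2,000 positives among 200,000 samples (1% positive).
y_true = [1] * 2_000 + [0] * 198_000
positive_rate = sum(y_true) / len(y_true)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
precisions = {
    p: random_predictor_precision(y_true, p, rng) for p in (0.1, 0.5, 0.9)
}

print(f"positive class proportion: {positive_rate:.4f}")
for p, precision in precisions.items():
    print(f"flag probability {p}: precision {precision:.4f}")
```

However aggressively the random predictor flags samples, its precision stays close to the positive class proportion, while its recall rises with the flag probability. Tracing those points across all flag probabilities gives the horizontal no skill line.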
