Can Synthetic Data Boost Machine Learning Performance? | by John Adeojo | Jul, 2023

We assess each model's performance by plotting its precision-recall curve on the holdout dataset.

Precision-Recall Curve

The Precision-Recall curve plots Precision (y-axis) against Recall (x-axis) as the classification threshold varies, and is analogous to the ROC curve. It is a particularly useful diagnostic when classes are heavily imbalanced, as in our credit card fraud detection use case, because both precision and recall focus on the rare positive class rather than on the abundant true negatives.
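As a minimal sketch of how such a curve is traced (not necessarily how the curves in this post were produced), the hypothetical helper `precision_recall_points` below treats each distinct model score as a threshold, predicts "positive" for every instance scoring at or above it, and records the resulting precision and recall. It assumes binary labels with 1 as the positive class; the labels and scores are made-up toy values. In practice, scikit-learn's `precision_recall_curve` in `sklearn.metrics` performs this sweep for you.

```python
def precision_recall_points(y_true, scores):
    """Hypothetical helper: (threshold, precision, recall) for each distinct score.

    An instance is predicted positive when its score >= threshold.
    Assumes labels are 0/1 with at least one positive.
    """
    total_pos = sum(y_true)
    points = []
    # Sweep thresholds from strictest (highest score) to most lenient.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        precision = tp / (tp + fp)  # never zero: at least one score equals t
        recall = tp / total_pos
        points.append((t, precision, recall))
    return points


# Hypothetical toy data: 3 positives among 8 instances.
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]

for t, p, r in precision_recall_points(y_true, scores):
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

Lowering the threshold can only raise recall, while precision may move up or down, which is why PR curves are often jagged rather than smooth.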

The top-right corner of the plot represents the “ideal” point: a precision of one and a recall of one. A skilled model's curve bows toward this corner, so a larger area under the curve (AUC-PR) generally indicates a better model.
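One common way to summarise the area under a PR curve is average precision, AP = Σₙ (Rₙ − Rₙ₋₁) · Pₙ, which weights the precision at each threshold by the gain in recall at that threshold. This is the step-wise definition used by scikit-learn's `average_precision_score`; a trapezoidal area with linear interpolation is an alternative that can be overly optimistic. The sketch below is a hypothetical pure-Python version on made-up toy data, not necessarily the metric used in this post.

```python
def average_precision(y_true, scores):
    """Hypothetical helper: step-wise AUC-PR, sum of (R_n - R_{n-1}) * P_n.

    Each distinct score is one threshold (score >= threshold => positive).
    Assumes labels are 0/1 with at least one positive.
    """
    total_pos = sum(y_true)
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        precision = tp / (tp + fp)
        recall = tp / total_pos
        # Only thresholds that recover new positives add area.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical toy data: 3 positives among 8 instances.
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
print(f"AP = {average_precision(y_true, scores):.4f}")

# A perfect ranking (all positives scored above all negatives) reaches AP = 1.
print(average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
```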

No Skill Predictor

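A “no skill” predictor is a naïve model that makes predictions at random, ignoring the input features. On a Precision-Recall plot, its curve is a horizontal line at a height equal to the proportion of positive instances in the dataset. This is because the instances a random predictor flags are a random sample of the data, so the fraction of them that are truly positive (its precision) matches the positive-class proportion no matter how many instances it flags. For a heavily imbalanced dataset, this baseline therefore sits close to zero.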
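A quick simulation can make the no-skill baseline concrete. The sketch below assumes a hypothetical dataset with a 2% positive class and a hypothetical helper `random_predictor_pr` that flags each instance as positive with a fixed probability, ignoring the labels. Whatever the flag rate, precision should hover around the positive-class proportion, while recall roughly tracks the flag rate, tracing out the horizontal no-skill line.

```python
import random


def random_predictor_pr(y_true, flag_rate, seed=0):
    """Hypothetical helper: precision and recall of a predictor that flags
    each instance as positive with probability flag_rate, at random."""
    rng = random.Random(seed)
    flags = [rng.random() < flag_rate for _ in y_true]
    tp = sum(1 for y, f in zip(y_true, flags) if f and y == 1)
    flagged = sum(flags)
    precision = tp / flagged if flagged else 0.0
    recall = tp / sum(y_true)
    return precision, recall


# Hypothetical imbalanced dataset: 2,000 positives among 100,000 instances.
n, n_pos = 100_000, 2_000
y_true = [1] * n_pos + [0] * (n - n_pos)
prevalence = n_pos / n  # 0.02

for rate in (0.1, 0.5, 0.9):
    p, r = random_predictor_pr(y_true, rate)
    print(f"flag_rate={rate:.1f}  precision={p:.4f}  recall={r:.3f}  prevalence={prevalence}")
```

Any model whose PR curve stays near this line is doing little better than guessing, however high its accuracy looks on imbalanced data.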

