3 Silent Pandas Mistakes You Should Be Aware Of | by Soner Yıldırım | Aug, 2023


And how they can cause hidden failures

Photo by Malik Earnest on Unsplash

“The mistakes of the fool are known to the world, but not to himself. The mistakes of the wise man are known to himself, but not to the world.” — Charles Caleb Colton

Not knowing the mistakes we make in programming does not necessarily make us fools. However, it may lead to undesired consequences.

Some mistakes shine like a diamond and can be recognized from miles away. Even if you don’t notice them, the compiler (or interpreter) tells you about them by raising an error.

On the other hand, there exist some “silent” mistakes that are hard to notice but have the potential to cause serious issues.

They don’t raise any errors, but they make a function or operation behave differently from what you expect. Hence, the outcome changes without you noticing.

We’ll learn about three such issues.

You’re a data analyst working at a retail company. You’ve been asked to analyze the results of a recently run series of promotions. One of the tasks in this analysis is calculating the total sales quantities for each promotion and the grand total.

Let’s say the promotion data is stored in a DataFrame that looks like the following (definitely not this small in real life):

promotion DataFrame (image by author)

And here is the Pandas code to create this DataFrame if you’d like to follow along and do the examples on your own:

import pandas as pd

promotion = pd.DataFrame(
    {
        "promotion_code": ["A2", "A1", "A2", "B1", "A2", None, "A2", "B1", None, "A1"],
        "sales_qty": [34, 32, 26, 71, 44, 27, 64, 33, 45, 90],
        "price": [24.5, 33.1, 64.9, 52.0, 29.0, 47.5, 44.2, 25.0, 42.5, 30.0],
    }
)

Calculating the total sales quantity per promotion code is a piece of cake. You just need to use the groupby method:

promotion.groupby("promotion_code").agg(…
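The call above is cut off in the source, so what follows is only a sketch of what a per-code total might look like, assuming a sum over sales_qty written with named aggregation. The names total_sales_qty, per_code, and with_missing are illustrative and do not come from the original. The sketch also checks one behavior that matters for this particular data: by default, groupby excludes rows whose grouping key is missing (dropna=True), so the per-group totals do not have to add up to the grand total.

```python
import pandas as pd

# Same DataFrame as defined earlier in the article.
promotion = pd.DataFrame(
    {
        "promotion_code": ["A2", "A1", "A2", "B1", "A2", None, "A2", "B1", None, "A1"],
        "sales_qty": [34, 32, 26, 71, 44, 27, 64, 33, 45, 90],
        "price": [24.5, 33.1, 64.9, 52.0, 29.0, 47.5, 44.2, 25.0, 42.5, 30.0],
    }
)

# Assumed aggregation: total sales quantity per promotion code.
# "total_sales_qty" is a hypothetical output column name.
per_code = promotion.groupby("promotion_code").agg(
    total_sales_qty=("sales_qty", "sum")
)

# Compare the sum of the per-group totals with the overall total.
grouped_total = int(per_code["total_sales_qty"].sum())
grand_total = int(promotion["sales_qty"].sum())

# dropna=False (available since pandas 1.1) keeps rows with a missing key
# as their own group instead of dropping them.
with_missing = promotion.groupby("promotion_code", dropna=False).agg(
    total_sales_qty=("sales_qty", "sum")
)

print(per_code)
print("sum of per-group totals:", grouped_total)
print("grand total:", grand_total)
print(with_missing)
```

With this data, the two rows that have no promotion code are left out of the default grouping, so comparing the sum of the per-group totals against the column total is a cheap sanity check before reporting either number.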


