The Angry Guide To Pandas Code (Dummy Edition, Part 1)

data science
pandas code
non-pessimization
Author

DS Rants

Published

April 28, 2025

Alright! You apparently stumbled by accident on this page after wrongly clicking on the banner: “There are single people in your area”.

Well now, you have at least one chance to do something productive today! So put on your best nerd reading glasses and listen.

Today, you’re going to learn how to write pandas code that does not look like the experimentation of a toddler smashing on the keyboard because it prints stuff on the screen. You’re going to move one inch closer to the golden age of civilization. So buckle up, stop thinking for your own good, and read carefully!

Actually, before we even start with code… NO FUCKING FOR LOOPS!

  • “But…”
  • “Did I stutter? NO FUCKING FOR LOOPS”
  • “But what if …”
  • “You inept little twerp, are you trying to aggravate me from the start?”

Anyway, we will start with the usual suspects, and a stupid-ass dataframe:

import pandas as pd
import numpy as np

original_df = pd.DataFrame(
    [
        [1, 1, 1, "2024_01"],
        [1, 2, 3, "2024_02"],
        [5, 0, 1, "2024_03"],
    ],
    columns=[0, 1, 3, "year_week"],
)

Looks stupid enough? Good, I don’t want you distracted one bit by any type of complexity. Imagine you have been working on it. What you’re probably seeing in your average notebook probably looks a lot like the following. I am sure of it because I captured this in the wild:

The Original Sin Of Pandas

# fmt: off
weekly_chart_df = original_df.reset_index()
weekly_chart_df.rename(columns={0 : 'category_A', 1 : 'category_B', 3 : 'category_C'}, inplace=True)
weekly_chart_df.year_week = weekly_chart_df.year_week.astype(str)
weekly_chart_df['total_produced'] = weekly_chart_df.category_A + weekly_chart_df.category_B + weekly_chart_df.category_C # total number of produced goods
weekly_chart_df['total_tested'] = weekly_chart_df.category_B + weekly_chart_df.category_C # total number of tested goods
weekly_chart_df['prop_tested'] = np.round(weekly_chart_df.total_tested / weekly_chart_df.total_produced *100,2) # proportion of goods tested
weekly_chart_df['prop_category_B'] = np.round(weekly_chart_df.category_B / weekly_chart_df.total_produced *100,2) # proportion of category_B
weekly_chart_df['prop_category_C'] = np.round(weekly_chart_df.category_C / weekly_chart_df.total_produced *100,2) # proportion of category_C
weekly_chart_df = weekly_chart_df[weekly_chart_df["total_tested"] < 5]
# fmt: on
# fmt: off
df = original_df.reset_index()
df.rename(columns={0 : 'category_A', 1 : 'category_B', 3 : 'category_C'}, inplace=True)
df.year_week = df.year_week.astype(str)
df['total_produced'] = df.category_A + df.category_B + df.category_C # total number of produced goods
df['total_tested'] = df.category_B + df.category_C # total number of tested goods
df['prop_tested'] = np.round(df.total_tested / df.total_produced *100,2) # proportion of goods tested
df['prop_category_B'] = np.round(df.category_B / df.total_produced *100,2) # proportion of category_B
df['prop_category_C'] = np.round(df.category_C / df.total_produced *100,2) # proportion of category_C
df = df[df["total_tested"] < 5]
weekly_chart_df = df
# fmt: on

Looks familiar? Do you feel this conundrum where either your names are long and things are a pain to write? Or you abuse the ignominious df name all over the place and everything is called the same?

Well, I am sorry to tell you that both examples up there are pure garbage! Actually, I am not even sorry. This stuff has been vomited by someone without any consideration for their fellow humans. You don’t believe me?

Did you by any chance happen to notice the little cabalistic signs around the code: fmt: off / fmt: on? Do you know what it means? It means I have to purposefully block my formatter from touching this pile of intellectual dump, otherwise, it becomes so unreadable that its ungodly nature becomes clear to all. Let me prove it to you:

weekly_chart_df = original_df.reset_index()
weekly_chart_df.rename(
    columns={0: "category_A", 1: "category_B", 3: "category_C"}, inplace=True
)
weekly_chart_df.year_week = weekly_chart_df.year_week.astype(str)
weekly_chart_df["total_produced"] = (
    weekly_chart_df.category_A
    + weekly_chart_df.category_B
    + weekly_chart_df.category_C
)  # total number of produced goods is the lot
weekly_chart_df["total_tested"] = (
    weekly_chart_df.category_B + weekly_chart_df.category_C
)  # total number of tested goods is the lot
weekly_chart_df["prop_tested"] = np.round(
    weekly_chart_df.total_tested / weekly_chart_df.total_produced * 100, 2
)  # proportion of goods tested on the lot
weekly_chart_df["prop_category_B"] = np.round(
    weekly_chart_df.category_B / weekly_chart_df.total_produced * 100, 2
)  # proportion of category_B in the lot
weekly_chart_df["prop_category_C"] = np.round(
    weekly_chart_df.category_C / weekly_chart_df.total_produced * 100, 2
)  # proportion of category_C in the lot
weekly_chart_df = weekly_chart_df[weekly_chart_df["total_tested"] < 5]

Understood? Now that the pretended pseudo-manual structure is gone, there is nothing left but chaos? So do yourself a favor, just install ruff or black, because if you write something, format it, and it looks worse, there is a 99.999% chance that what you wrote should be covered with Napalm and set on fire.

Now that we agree on the horror it actually is, it is time to fix it.

How To See The Light In Darkness

First, you’re going to remove ALL inplace=True from your whole code base! Did I stutter? I said all of them! This wretched abomination should have never seen the sunlight:

For those who are to lazy to click

Unfortunately, pandas is almost 20 years old, and mistakes were made during the young years. Sadly enough, this mistake still haunts us to this day because some people are clearly not able to perform a simple Google search. We are still stuck with this because removing it would break half of the world’s code bases, because of people like you. People who think they are being clever, when in fact they’re just performing a major enshittification of the code at a large scale.

No inplace=True EVER! I mean it!

So, the first line becomes:

weekly_chart_df = original_df.reset_index().rename(
    columns={0: "category_A", 1: "category_B", 3: "category_C"}
)

See? It’s not beautiful yet but it is certainly a step in the right direction?

The next thing you will do is transform the astype call to use it at the dataframe level, in the shape: astype({"column": <type>}), because we are not beasts. Not so goddamn fast you inconsequential stardust residue! I am sure you were about to type something ludicrous like:

p_chart_weekly = p_chart_weekly.astype({"year_week": str})

You, you’re trying to make me mad, aren’t you? Put that shit with the previous calls, and lose the name of the dataframe you don’t need it.

weekly_chart_df = original_df.reset_index().rename(
    columns={0: "category_A", 1: "category_B", 3: "category_C"}
).astype({"year_week": str})

See this shit? This is called method chaining, now repeat after me: method chaining. This is the sacred beauty that was sent upon us mere mortals to allow writing data transformations that do not look like complete crap. You will only write your pandas this way from now on! Did you hear me? And I will show you later how other languages have managed to do way better than Python and pandas in that regard. And these people were rightfully laughing at your clown ass, when you were happy writing some stupid shit like df = df.some_fucking_transformation Do you start to understand? This is why I forbade your sorry brain to ever use the option inplace=True, because it breaks the… you’re goddamn right, method chaining!

From this point, you are one keyboard shortcut away from poetry. You don’t believe me? Did you install an automatic code formatter as I so generously told you earlier? You, infected chromosomal deficiency! Ok, it appears you need another round of scolding. You need a formatter like ruff or black because your inept brain will not be able to remain consistent to format your whole project in a standard way. You don’t want to waste any part of your limited brain power on stupid shit like this. Just use the goddamn tools! Some talented person might pretend: “It looks prettier when I do it myself, and I manage to remain completely consistent across my whole project”, but you are not some person…

weekly_chart_df = (
    original_df.reset_index()
    .rename(columns={0: "category_A", 1: "category_B", 3: "category_C"})
    .astype({"year_week": str})
)

See how beautiful this is? Now you’re starting to believe. Ok, it’s time to introduce you to a concept you have never encountered: (). Those are called parentheses, and they are your new best friend!

Whenever you want to write a transformation, BEFORE you even type the name of your stupid dataframe, which is probably stupid df anyway, the absolute FIRST THING you will do is open a pair of parentheses. Why? Because each time you’re done with one call for a transformation, you can just press enter, the code will auto-indent, and you can continue whatever sorry exploration you’re attempting. Is it so much better than your typical awful backslashes, don’t lie, I know you have been abusing them…

(
    data
    .transformation_1()
    .filtering()
    .create_columns()
    .groupby()
    .agg()
)

Understood? Good!

The Unavoidable Truth Of Data Transformation

“But, sir Rants, how can I possibly continue chaining when I want to create a column?” you ask?

How about a good ol’: Read The Fucking Manual? What? Too old school? Ok, you can probably ask your favorite ChatGPT / copilot / gemini, or whatever shit was the weekly trash hype on LinkedIn this week… But you’re going to do something, you’re going to treat that piece of shit-sorry-for-lying-unreliably-hallucinating LLM like the moronic tool it is! You’re going to prompt it rightfully by explicitly writing:

Using method-chaining, and only method-chaining in pandas, how do I …

And you’re going to insult every generation of GPU that was used to breed that useless LLM until it regurgitates what you want! And do you know why? Because 95% or more people code like you. So the training dataset for good pandas code used by the LLMs is at least an order of magnitude smaller than the crap we have been seeing so far.

Regardless, you should end up with a bingo: .assign(). But you’re going to read the documentation, no matter what:

Behold, the glorious assign method!

Do you notice what I subtly highlighted for your limited brain capabilities? The first use case is a dictionary associating a string with a Callable. For those who skipped 7th grade English, it means a plain stupid function!

Now read the underlined sentence! Read it again! Now look at the examples. Read it again! It means if you use a function in your kwargs, pandas will automatically call this function on the dataframe that generated the assign call, and said dataframe will be the first argument! Do you understand how powerful this is? It means that you just gained a shortcut to reference your current dataframe! How, you ask? Do I really have to explain everything to your sorry ass?

Have you heard of the keyword lambda? What are those? You’re goddamn right: functions! And what do you use functions for? Referencing the current dataframe!

Ding! Ding! Ding!

weekly_chart_with_proportions = (
    original_df.reset_index()
    .rename(columns={0: "category_A", 1: "category_B", 3: "category_C"})
    .astype({"year_week": str})
    .assign(
        total_produced=lambda df: df.category_A + df.category_B + df.category_C,
        total_tested=lambda df: df.category_B + df.category_C,
        prop_category_B=lambda df: (df.category_B / df.total_produced * 100).round(2),
        prop_tested=lambda df: (df.total_tested / df.total_produced * 100).round(2),
        prop_category_C=lambda df: (df.category_C / df.total_produced * 100).round(2),
    )
)

See how much NICER this looks? This is starting to look like someone who knows their job. You have subject, verb one, verb two plus complements, and verb three plus complements. I bet your over-complexing anxiety-riddled brain would have never trusted me if I told you to write nice efficient code, you just need 6th-grade grammar!

Oh, and do you know the AWESOME BONUS I just gave you for free? No more SettingWithCopyWarning. EVER! PERIOD! Because we are never modifying the dataframe in-place, we’re always returning a new instance. And that’s totally fine because we don’t have tons of stupid-ass intermediate variables like df, df1, df2, temp, temp2 that serve no purpose outside supporting your limping transformation logic and eating up all your RAM.

Oh, and by the way, I took the liberty to remove your stupid calls to numpy because guess what? 99% of the time, there is a pandas equivalent that is within reach by just pressing a dumb dot ., so be intelligently lazy!

Now the last part, let’s take care of the filter. You were going to keep the original thing, weren’t you?

weekly_chart_df = weekly_chart_df[weekly_chart_df["total_tested"] < 5]

There are 2 ways to filter dynamically a dataframe in pandas. One is using the cursed sin of arbitrary string execution, and the other provides autocompletion and syntax highlighting! Clearly, only a moronic ape high on cocaine, or an agile coach brainwashed on scrum/SAFe bullshit, same difference, would choose the first over the second. Therefore, we will never mention the first abomination again!

Alright, back to the code! Do you know what those brackets [] stand for? If you say getitem you’re on the right track, but in pandas, they are used to call the function loc[]. Yes, squared-brackets here because the pandas API is fundamentally broken for those who experienced at least another language. But we will come back to this in another post.

For now, just read the documentation for the loc[] method. Is there anything that looks familiar? For those with a lightbulb that requires a constant perfusion of coffee to produce an ersatz of a thought, we are looking for a way to filter the current dataframe. Did you see it? The word callable? Are you beginning to understand? Everything I told you so far serves a glorious purpose!

Now, we can finish our business here:

weekly_chart_with_proportions = (
    original_df.reset_index()
    .rename(columns={0: "category_A", 1: "category_B", 3: "category_C"})
    .astype({"year_week": str})
    .assign(
        total_produced=lambda df: df.category_A + df.category_B + df.category_C,
        total_tested=lambda df: df.category_B + df.category_C,
        prop_category_B=lambda df: (df.category_B / df.total_produced * 100).round(2),
        prop_tested=lambda df: (df.total_tested / df.total_produced * 100).round(2),
        prop_category_C=lambda df: (df.category_C / df.total_produced * 100).round(2),
    )
    .loc[lambda df: df.total_tested < 6]
)

Now we have finally entered the age of civilization! Are you feeling proud of yourself? Not so damn fast! Time to see what other people in the R-world have been doing:

weekly_chart_with_proportions <- original_df %>%
  rename(
    category_A = `0`,
    category_B = `1`,
    category_C = `3`
  ) %>%
  mutate(
    year_week = as.character(year_week),
    total_produced = category_A + category_B + category_C,
    total_tested = category_B + category_C,
    prop_tested = round(total_tested / total_produced * 100, 2),
    prop_category_B = round(category_B / total_produced * 100, 2),
    prop_category_C = round(category_C / total_produced * 100, 2)
  ) %>%
  filter(total_tested < 5)

Do you feel the fantastic power of free pipelining with %>% granted by excellent package magrittr? (Python packages really need to get their shit together, their names suck in comparison to R!) Can you understand absolute beauty that was bestowed upon us mere mortals, by Hadley Wickham the great creator of the dplyr and ggplot2 packages? Do you notice the absence of quotation marks around column names thanks to the holiness of lazy evaluation? The R people have been doing this routinely for more than 10 years now!

Do you feel your absolute insignificance with your idiotic transformations riddled with the ubiquitous use of df = df[...]? Are you understanding why, in their eyes, you looked like a babbling, screaming monkey? Good! Time to blow your mind one last time. Have you heard of the method .pipe() in pandas? It’s limited, but it allows you to do some things like the pros! But do not abuse its power!

Besides, look! Look how easy it is to wrap your head around a language you never learned when there is little to no syntactic crap.

Alright, time to wrap up for now.

Take-Home Messages For Dummies

  1. NO FUCKING FOR LOOPS!
  2. Never use inplace=True. Only morons do so!
  3. Do not use backslashes \ like a caveman. Just use a pair of brackets (), type the name of your dataframe, and press Enter.

If you do not follow these simple rules, I will find you, and I will unscrew your head. I will go out of my way to attack you. Even if your dead ass is sitting in the middle of the ocean, I will swim out into the middle of the ocean and friggin eat you! And then, I will bang your tuna boyfriend/girlfriend.

  1. Inside of a chain, you can often reference your current dataframe inside a lambda expression.

    • Perfect for creating new columns with assign().
    • It works really well with loc[] to filter rows on the fly.
    • There are other methods that accept this too.
  2. There is a special place in hell for people who use methods that evaluate generic strings like .query(). A special place! Right next to child molesters and people who talk during movies!

  3. How long can you even chain?

    The R wizards tell us: “You can generally chain 5 to 10 transformation steps”. Beyond that, there is a higher risk for code to become less readable and more convoluted. Then you can just save the result in a nicely named variable describing exactly what your data is representing! Some person says you can do more without tampering with readability, I disagree, and then again, you are not some person…

That’s it for today. We will see more advance shit later. For now, I need to go choke on a bottle of Xanax!

Disclaimer

This rant was largely inspired by an awesome guide for Metal Gear Rising: Revengeance by Vesperas. Its reading was a great source of banter and chuckles. My dear swearing fellow, you have my most sincere gratitude for helping a sucker like me progress at such a great game!