Would have made my life a lot easier when I was learning Pandas.
Would also be cool to have a Polars version of this.
One suggestion:
A lot of folks come to Pandas from SQL. It might be handy to have a couple of "the equivalent of this SQL statement, but in Pandas" examples.
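To make the suggestion concrete, here is a minimal sketch of what one such entry could look like. The `orders` table, its columns, and the query are made up for illustration; only the pandas calls themselves are real.

```python
import pandas as pd

# Hypothetical data, invented for this example.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cy", "bob"],
    "amount": [20.0, 5.0, 35.0, 12.0, 50.0],
})

# SQL being translated:
#   SELECT customer, SUM(amount) AS total
#   FROM orders
#   WHERE amount > 10
#   GROUP BY customer
#   ORDER BY total DESC;
result = (
    orders[orders["amount"] > 10]             # WHERE amount > 10
    .groupby("customer", as_index=False)      # GROUP BY customer
    .agg(total=("amount", "sum"))             # SUM(amount) AS total
    .sort_values("total", ascending=False)    # ORDER BY total DESC
    .reset_index(drop=True)                   # tidy row labels after sorting
)
print(result)
```

Each SQL clause maps to one method call in the chain, which is probably the easiest way for someone coming from SQL to read it.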
The data world owes a lot to pandas, but it has plenty of sharp edges and using it can sometimes involve pretty close knowledge of how things like indexing/slicing/etc work under the hood.
If I get stuck in polars, it's almost always just a "what's the name of the function to use?" type problem rather than needing lots of knowledge about how things are working under the hood.
Pandas, on the other hand, has been open source for almost two decades and is supported by many companies. It has a governance board and an active community. The risk of it going off the rails into corporate nonsense is much lower.
- Pandas is interwoven into downstream projects. So it will be here to stay for a long time. This is good for maintenance and stability. Advantage: Pandas.
- OTOH, the Pandas experience is awful; this was obvious to many from the outset, and yet it persisted. I haven't tracked the history. But my guess would be the competition from Polars was a key pressure for improvement. Edge: Polars.
- Lots of Python projects are moving to Rust-backed tooling: uv, Polars, etc. Front-end users get the convenience of Python and tool-developers get the confidence & capabilities of Rust. Edge: Polars.
- Pandas has a governance structure not tied to one company. Polars does not. (comment above said this) Advantage: Pandas.
But this could change. Polars users could be (and may already be) pressing for company-independent governance.
Wes came to my local PUG in like 2011 or 2012 and did a talk showing it off. It was also the first time I saw Jupyter, which was the IPython notebook at the time. I remember being blown away by both, but I so rarely need dataframes that I'm sure I don't know what I'm missing. I mostly only ever use it for manipulating time series data.
When you are still figuring out things step by step, pandas does a lot of heavy lifting for you so you don't have to think about it.
E.g. I don't have to think about timeseries alignment, pandas handles that for me implicitly because dataframes can be indexed by timestamps. Polars has timeseries support, but I need to write a paragraph of extra code to deal with it.
Pandas was never meant to be a technologist's tool. It was meant to be a researcher's tool and was unfortunately co-opted to be a technical solution as well. It has not fully escaped its roots.
Pandas is fantastic for doing iterative and interactive research on semi-structured data. It has a lot of QoL facilities and utility functions for seamlessly dealing with exploratory timeseries analytics for in-core data, i.e. data that fits into memory.
For example, I can take two time series and calculate their product:
ts3 = ts1 * ts2
This one line does a huge amount of heavy lifting by automatically aligning the timestamps and columns between the two inputs so that I'm not accidentally multiplying two entries that have the same ordinal but not the same timestamp or column label.
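A small sketch of what that alignment does, using made-up series with partially overlapping dates. The second half spells out the same result by hand with an explicit join on a timestamp column, still in pandas; that it resembles what you would write in a library without an index is my assumption, not a Polars example.

```python
import pandas as pd

# Made-up series: they overlap only on 2024-01-02 and 2024-01-03.
ts1 = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
ts2 = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Implicit alignment: values are matched by timestamp, not by position.
# Timestamps present in only one input come out as NaN.
ts3 = ts1 * ts2
print(ts3)

# The same thing done explicitly: move the timestamps into a column,
# outer-join on it, then multiply the matched columns.
left = ts1.rename("a").rename_axis("ts").reset_index()
right = ts2.rename("b").rename_axis("ts").reset_index()
joined = left.merge(right, on="ts", how="outer").sort_values("ts").set_index("ts")
explicit = joined["a"] * joined["b"]
print(explicit)
```

A purely positional multiply would have paired 2024-01-01 with 2024-01-02 and so on, which is exactly the mistake the index-based version avoids.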
Can I do the same with Polars? Yes, but it comes with far more cognitive overhead. And this is just one example.
Pandas is ultimately a flawed product, as its origins go back more than a decade to when R's data frame was cutting edge. A lot of innovation has happened since then, and the API and internals of Pandas mean that certain choices that were made early on are nontrivial to change.
This doesn't change the fact that Pandas is still immensely useful. Eventually perhaps Polars will come close to it, but so far the focus hasn't been on interactive-use ergonomics, unfortunately.
As it stands, I use pandas for research and polars for production systems.
Polars seems to be the most prominent competitor in the Python DataFrame space, and DuckDB appears to pursue an approach similar to SQLite, but columnar.
I am personally working on a solution to a broader problem, which can also be viewed as an alternative to Pandas [2].
[1] https://wesmckinney.com/blog/apache-arrow-pandas-internals/
[2] https://github.com/ronfriedhaber/autark
That being said, if I were to start a new project requiring that kind of work today, I would probably try Polars first. Their greenfield implementation allowed them to get rid of many of the crusty edges of pandas.