datapythonista blog - Tag pandas

How fast can we process a CSV file

Marc Garcia | Thu 22 February 2024

Introduction: Comma-separated values (CSV) files are an extremely popular format for storing tabular data because of their simplicity and how easy it is to write them. The file can be directly read by a human, as opposed to more efficient binary formats like...


pandas 2.0 and the Arrow revolution (part I)

Marc Garcia | Fri 17 February 2023

Introduction: At the time of writing this post, we are in the process of releasing pandas 2.0. The project has a large number of users, and it's used in production quite widely by personal and corporate users. This large user base forces us to be...


pandas with hundreds of millions of rows

Marc Garcia | Thu 22 September 2022

The problem: We want to find out which are the top 5 American airports with the largest average (mean) delay on domestic flights. Data: We will be using the Data Expo 2009: Airline on time data dataset from the Harvard Dataverse. The data consists...
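As a rough illustration of the problem stated above, here is a minimal pandas sketch. It is not the approach taken in the post: the sample rows are made up, the column names `Origin` and `DepDelay` are assumptions about the dataset layout, the helper `top_delayed_airports` is hypothetical, and "delay" is assumed to mean departure delay grouped by origin airport.

```python
import pandas as pd

# Made-up sample rows for illustration only; the column names are
# assumptions, not taken from the post.
flights = pd.DataFrame({
    "Origin": ["AAA", "AAA", "BBB", "BBB", "CCC", "DDD", "EEE", "FFF"],
    "DepDelay": [10.0, 20.0, 5.0, None, 40.0, 0.0, 30.0, 12.0],
})


def top_delayed_airports(df, n=5):
    """Return the n airports with the largest mean delay (hypothetical helper)."""
    # mean() skips missing delays by default; nlargest() sorts descending.
    return df.groupby("Origin")["DepDelay"].mean().nlargest(n)


top5 = top_delayed_airports(flights)
print(top5)
```

On a dataset with hundreds of millions of rows, the same group-and-aggregate logic applies, but loading everything into memory at once may not be feasible, which is presumably where the difficulty lies.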


An update on the pandas documentation

Marc Garcia | Thu 28 November 2019

Some context: This post is mainly a technical one on the status of the pandas documentation. But let me provide a bit of context on where this comes from. It's a personal opinion, but I think pandas is one of the clearest examples of how...


New pandas workflow

Marc Garcia | Sun 17 November 2019

Some exciting news. After some years of organizing sprints and maintaining open source, I've been thinking about a more efficient workflow for projects with a high volume of activity, like pandas. An exaggerated example would be that I want to create...


Dataframe summit @ EuroSciPy write up

Marc Garcia | Wed 11 September 2019

Last week, EuroSciPy 2019 took place in Bilbao, Spain. This year we introduced the maintainers track, a room dedicated to discussions among maintainers. The idea is similar to the birds of a feather or unconference sessions of other conferences....


pandas: The two cultures

Marc | Mon 22 July 2019

Leo Breiman was a distinguished statistician at UC Berkeley, known among other things for his major contributions to CART (decision trees) and to ensemble techniques, mainly bootstrap aggregation. Combining both, he was able to define one of the...


#pandasSprint write-up

Marc | Thu 22 March 2018

#pandasSprint took place this past 10th of March. To the best of my knowledge, it was an unprecedented kind of event, where around 500 people worked together on improving the documentation of the popular pandas library. As one of the people involved in...
