Filtering

Pandas

There are many ways to do filtering in pandas, the most common way for me is as follows:

df = df[df.country_txt == "United States"]
df.to_csv("python_output.csv")

Rust

To do filtering in Rust, we can refer to the docs for vector in Rust https://doc.rust-lang.org/std/vec/struct.Vec.html

There is a large umbrella of methods for Vector filtering, with many nightly features that are going to be great for data manipulation when they ship. For this use case, I used the retain method as it fitted my need perfectly:

    records.retain(|x| &x.country_txt.unwrap() == "United States");
    let mut wtr =
        csv::Writer::from_path("output_rust_filter.csv")?;

    for record in &records {
        wtr.serialize(record)?;
    }

One big difference between Pandas and Rust is that Rust filtering uses Closures (eq. lambda function in python) whereas Pandas filtering uses Pandas API based on columns. Rust can therefore make more complex filters compared to Pandas. It also adds in readability.

Performance

Time(s)Mem Usage(Gb)
Pandas3.0s2.5Gb
Rust1.6s πŸ”₯ -50%1.7Gb πŸ”₯ -32%

Even though we’re using Pandas API for filtering, we get significantly better performance using Rust.

On Filtering, Rust seems to be more capable and faster. πŸš