Conclusion

Performance overall

                                Time      Speedup vs. Pandas
Native Rust (single thread)     24 s      3.3x
Native Rust (multithread)       13.7 s    5.8x
Polars (single thread)          30 s      2.6x
Polars (multithread)            17 s      4.7x
Polars (lazy, multithread)      16.5 s    4.8x
Pandas                          80 s      baseline

Because reading the data is I/O bound, I also wanted a benchmark of pure compute performance, with the read step excluded.

Performance without Reading

                                Time      Speedup vs. Pandas
Native Rust (single thread)     12 s      3.3x
Native Rust (multithread)       1.7 s     23x
Polars (single thread)          10 s      4x
Polars (multithread)            11 s      3.6x
Polars (lazy, multithread)      11 s      3.6x
Pandas                          40 s      baseline

Overall takeaway

  • Use Polars if you want a great API.
  • Use Polars for merging and group by.
  • Use Polars for single instruction, multiple data (SIMD) operations.
  • Use native Rust if you’re already familiar with Rust’s generic heap-allocated structures, such as vectors and hash maps.
  • Use native Rust for linear passes over the data with map and fold. These scale as O(n) and can be parallelized with very little extra code using rayon.
  • Use pandas when performance, scalability, and memory usage do not matter.
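
To make the map-and-fold point concrete, here is a minimal sketch of a single linear pass that sums values per key, followed by a chunked parallel version. It is not the benchmark code: the `Row` type, its fields, and both function names are hypothetical, and the parallel version uses the standard library's `std::thread::scope` as a stand-in for rayon so that the example has no external dependencies. With rayon, the same shape is usually written as `par_iter()` followed by `fold` and `reduce`.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical record type standing in for one row of a dataset.
#[derive(Debug, Clone)]
pub struct Row {
    pub key: String,
    pub value: u64,
}

/// Sequential pass: `map` projects each row to a (key, value) pair,
/// `fold` accumulates the per-key sums in a hash map. One pass, O(n).
pub fn sum_by_key(rows: &[Row]) -> HashMap<String, u64> {
    rows.iter()
        .map(|r| (r.key.as_str(), r.value))
        .fold(HashMap::new(), |mut acc, (k, v)| {
            *acc.entry(k.to_string()).or_insert(0) += v;
            acc
        })
}

/// Parallel pass (std-only stand-in for rayon): fold each chunk on its
/// own scoped thread, then merge the partial maps into one result.
pub fn sum_by_key_parallel(rows: &[Row], n_threads: usize) -> HashMap<String, u64> {
    let n_threads = n_threads.max(1);
    // Ceiling division so that at most `n_threads` chunks are produced.
    let chunk_size = ((rows.len() + n_threads - 1) / n_threads).max(1);

    let partials: Vec<HashMap<String, u64>> = thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = rows
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || sum_by_key(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Merge step: the same fold pattern, applied to the partial results.
    partials.into_iter().fold(HashMap::new(), |mut acc, part| {
        for (k, v) in part {
            *acc.entry(k).or_insert(0) += v;
        }
        acc
    })
}
```

Calling `sum_by_key_parallel(&rows, 4)` returns the same map as `sum_by_key(&rows)`. The parallel version works because the per-chunk folds are independent and their partial sums can be merged in any order, which is also the property rayon relies on.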

For me, both Polars and native Rust make a lot of sense for data between 1 GB and 1 TB.

I invite you to form your own opinion. The code is available here: https://github.com/haixuanTao/dataframe-python-rust
