Conclusion

As reading is IO bound, I wanted to make a benchmark of pure performance.

‌

Use Polars if you want a great API.
Use Polars for merging and group by.
Use Polars for single instruction multiple data(SIMD) operation.
Use Native Rust if you’re already familiar with rust generic heap structure like vectors and hashmap.
Use Native Rust for linear mutation of the data with map and fold. You’ll get O(n) scalability that can be parallelized almost instantly with rayon.
Use pandas when performance, scalability, memory usage does not matter.

For me, both Polars and native Rust makes a lot of sense for data between 1Go and 1To.

I’ll invite you to make your own opinion. The code is available here: https://github.com/haixuanTao/dataframe-python-rust