Batch inference: Running BERT on 10k phrases.

At work, we often develop deep learning models that run on large batches of data.

To see whether Rust can improve this use case, I trained a BERT-like model and ran inference on 10k phrases in both Python and Rust.
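For reference, here is a minimal sketch of what the Rust side of such a pipeline can look like. It assumes the HuggingFace `tokenizers` crate, the `onnxruntime` crate, and a model exported as `model.onnx` with a single `input_ids` input; the actual pipeline in the repo linked below may differ:

```rust
use ndarray::Array2;
use onnxruntime::{environment::Environment, tensor::OrtOwnedTensor, GraphOptimizationLevel};
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // Booting: create the ONNX Runtime environment and load the model.
    let environment = Environment::builder().with_name("bert").build()?;
    let mut session = environment
        .new_session_builder()?
        .with_optimization_level(GraphOptimizationLevel::All)?
        .with_model_from_file("model.onnx")?;

    // Encoding: tokenize the whole batch of phrases in one call.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;
    let phrases: Vec<String> = (0..10_000).map(|i| format!("phrase number {}", i)).collect();
    let encodings = tokenizer.encode_batch(phrases, true)?;

    // Pack the token ids into one (batch, seq_len) i64 tensor, zero-padded.
    let seq_len = encodings.iter().map(|e| e.get_ids().len()).max().unwrap_or(0);
    let mut input_ids = Array2::<i64>::zeros((encodings.len(), seq_len));
    for (row, encoding) in encodings.iter().enumerate() {
        for (col, &id) in encoding.get_ids().iter().enumerate() {
            input_ids[[row, col]] = id as i64;
        }
    }

    // DL inference: this single call is where the bulk of the time goes.
    let outputs: Vec<OrtOwnedTensor<f32, _>> = session.run(vec![input_ids])?;
    println!("got {} output tensor(s)", outputs.len());
    Ok(())
}
```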

Performance

| 10k phrases  | Python | Rust    |
|--------------|--------|---------|
| Booting      | 4s     | 1s      |
| Encoding     | 0.7s   | 0.3s    |
| DL Inference | 75s    | 75s     |
| Total        | 80s    | 76s     |
| Memory usage | 1 GiB  | 0.7 GiB |

As DL inference takes the vast majority of the time, Rust only marginally improves overall performance.

This is an example of a bad use case for Rust: most of the time is spent inside the DL runtime's C API, which is unaffected by the language calling it.
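To make the table above concrete, here is a small, self-contained sketch of how such per-stage timings can be collected in Rust; the empty closures are stand-ins for the real booting, encoding, and inference steps, not code from the repo:

```rust
use std::time::Instant;

// Run a stage, print how long it took, and return its result.
fn time_stage<T>(label: &str, stage: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = stage();
    println!("{:<12} {:?}", label, start.elapsed());
    result
}

fn main() {
    // Each closure stands in for the corresponding pipeline stage.
    let _session = time_stage("booting", || { /* load the ONNX model */ });
    let _tokens = time_stage("encoding", || { /* tokenize 10k phrases */ });
    let _logits = time_stage("inference", || { /* session.run(...) */ });
}
```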

You can check out the code for this benchmark at: https://github.com/haixuanTao/bert-onnx-rs-pipeline
