## Batch inference: Running BERT on 10k phrases
At work, we often develop deep learning models to run on large batches of data. To see whether Rust can improve this use case, I trained a BERT-like model and ran inference on 10k phrases in both Python and Rust.
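The Rust side of the benchmark boils down to three stages: booting ONNX Runtime, tokenizing the batch, and running the model. Here is a minimal sketch of that pipeline, assuming the `onnxruntime` and `tokenizers` crates; the file names `model.onnx` and `tokenizer.json`, and the single `input_ids` model input, are assumptions for illustration, and the real code in the linked repo differs in detail:

```rust
// Assumed dependencies: onnxruntime = "0.0.14", tokenizers = "0.13", ndarray = "0.15"
use ndarray::Array2;
use onnxruntime::{environment::Environment, tensor::OrtOwnedTensor, GraphOptimizationLevel};
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "Booting": create the ONNX Runtime environment and load the model.
    let environment = Environment::builder().with_name("bert").build()?;
    let mut session = environment
        .new_session_builder()?
        .with_optimization_level(GraphOptimizationLevel::Basic)?
        .with_model_from_file("model.onnx")?; // hypothetical file name

    // "Encoding": tokenize all 10k phrases in one batch.
    // Assumes tokenizer.json has padding configured so every row has the same length.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;
    let phrases: Vec<String> = (0..10_000).map(|i| format!("phrase number {}", i)).collect();
    let encodings = tokenizer.encode_batch(phrases, true)?;

    // Pack the token ids into an (n_phrases, seq_len) matrix.
    let seq_len = encodings[0].get_ids().len();
    let ids: Vec<i64> = encodings
        .iter()
        .flat_map(|e| e.get_ids().iter().map(|&id| id as i64))
        .collect();
    let input_ids = Array2::from_shape_vec((encodings.len(), seq_len), ids)?;

    // "DL Inference": run the whole batch through the model.
    let outputs: Vec<OrtOwnedTensor<f32, _>> = session.run(vec![input_ids])?;
    println!("output shape: {:?}", outputs[0].shape());
    Ok(())
}
```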
### Performance
| 10k phrases | Python | Rust |
|---|---|---|
| Booting | 4 s | 1 s |
| Encoding | 0.7 s | 0.3 s |
| DL Inference | 75 s | 75 s |
| Total | 80 s | 76 s |
| Memory usage | 1 GiB | 0.7 GiB |
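The per-stage split in the table comes from timing each stage separately with wall-clock timers. A minimal sketch of that measurement using `std::time::Instant`; the `boot`/`encode`/`infer` functions are hypothetical stand-ins for the three stages of the pipeline sketched above:

```rust
use std::time::Instant;

// Hypothetical stand-ins for the three stages of the pipeline sketched above.
fn boot() { /* create the environment, load model.onnx */ }
fn encode() { /* tokenize the 10k phrases */ }
fn infer() { /* session.run(...) on the batch */ }

fn main() {
    let t = Instant::now();
    boot();
    println!("Booting: {:?}", t.elapsed());

    let t = Instant::now();
    encode();
    println!("Encoding: {:?}", t.elapsed());

    let t = Instant::now();
    infer();
    println!("DL Inference: {:?}", t.elapsed());
}
```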
Since DL inference takes the vast majority of the time, Rust only marginally improves the end-to-end numbers. This is an example of a poor use case for Rust: both languages spend most of their time inside the same ONNX Runtime C API, so switching to Rust does not touch the bottleneck.
You can check out the code for this benchmark at: https://github.com/haixuanTao/bert-onnx-rs-pipeline