Batch inference: Running BERT on 10k phrases.
At work, we often develop Deep Learning models that are run on large batches of data.
To see whether Rust can improve this use case, I trained a BERT-like model and ran inference on 10k phrases in both Python and Rust.
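The Rust side boils down to three phases, which the table below breaks out: booting (building the runtime and loading the ONNX model), encoding (tokenizing the phrases), and DL inference. Here is a minimal sketch assuming the `onnxruntime` and `tokenizers` crates; the file names, the model inputs (`input_ids` and `attention_mask`), and the one-phrase-at-a-time loop are simplifying assumptions, not the exact pipeline from the repo linked below.

```rust
use onnxruntime::{
    environment::Environment, ndarray::Array2, tensor::OrtOwnedTensor, GraphOptimizationLevel,
};
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "Booting": build the ONNX Runtime environment and load the model.
    let environment = Environment::builder().with_name("bert").build()?;
    let mut session = environment
        .new_session_builder()?
        .with_optimization_level(GraphOptimizationLevel::Basic)?
        .with_model_from_file("model.onnx")?; // placeholder path

    // "Encoding": tokenize all phrases in one batch.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?; // placeholder path
    let phrases: Vec<String> = (0..10_000).map(|i| format!("some phrase {}", i)).collect();
    let encodings = tokenizer.encode_batch(phrases, true)?;

    // "DL Inference": run the model on each encoded phrase.
    for encoding in &encodings {
        let len = encoding.get_ids().len();
        let ids: Vec<i64> = encoding.get_ids().iter().map(|&i| i as i64).collect();
        let mask: Vec<i64> = encoding.get_attention_mask().iter().map(|&m| m as i64).collect();
        let inputs = vec![
            Array2::from_shape_vec((1, len), ids)?,
            Array2::from_shape_vec((1, len), mask)?,
        ];
        let outputs: Vec<OrtOwnedTensor<f32, _>> = session.run(inputs)?;
        let _logits = &outputs[0]; // e.g. classification logits
    }
    Ok(())
}
```

Note that `session.run` is a thin wrapper over the ONNX Runtime C API, which is why the inference row in the table below is identical for both languages.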
Performance
| 10k phrases | Python | Rust |
|---|---|---|
| Booting | 4s | 1s |
| Encoding | 0.7s | 0.3s |
| DL Inference | 75s | 75s |
| Total | 80s | 76s |
| Memory usage | 1 GiB | 0.7 GiB |
As DL inference takes the vast majority of the time (75 of the 80 seconds), Rust can only marginally improve end-to-end performance: even if the non-inference parts took zero time, the speedup would be capped at about 80/75 ≈ 1.07x.
This is an example of a bad use case for Rust, as the time is spent inside the underlying C API (here, ONNX Runtime), which runs at the same speed no matter which language calls it.
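If you want to verify where the time goes in a pipeline like this, wrapping each phase in a timer is enough. A minimal sketch using `std::time::Instant`; the demo workload is a placeholder for the booting, encoding, and inference phases from the table above:

```rust
use std::time::Instant;

/// Run a closure, print how long it took, and return its result.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    println!("{}: {:?}", label, start.elapsed());
    result
}

fn main() {
    // Stand-in workload; in the real pipeline these would be the
    // booting / encoding / inference phases.
    let sum = timed("demo phase", || (0..10_000_000u64).sum::<u64>());
    println!("sum = {}", sum);
}
```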
You can check out the code for this benchmark at: https://github.com/haixuanTao/bert-onnx-rs-pipeline