Hardware

Initially, onnxruntime-rs did not support GPU / CUDA, even though the ONNX Runtime C API it wraps does.

By tweaking onnxruntime-rs, I was able to call the GPU-enabled C API and run deep learning inference on the GPU.

I opened a PR (https://github.com/nbigaouette/onnxruntime-rs/pull/87) that adds CUDA support for Linux and Windows.

With similar work, most of the other hardware accelerators that ONNX Runtime supports as execution providers could probably be added as well.

GPU Support

To enable GPU support, I had to:

  • Add two header files to bindgen's wrapper.h (the CPU and CUDA provider factories, skipped on macOS):
#include "onnxruntime_c_api.h"
#if !defined(__APPLE__)
  #include "cpu_provider_factory.h"
  #include "cuda_provider_factory.h"
#endif
  • Add a cuda feature flag in Cargo.toml:
[features]
cuda = []
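  • Add a safe API on top of the newly generated bindings:
    /// Set the session to use the CPU execution provider.
    /// `use_arena` is forwarded to the C API (non-zero enables the arena allocator).
    #[cfg(feature = "cuda")]
    pub fn use_cpu(self, use_arena: i32) -> Result<SessionBuilder<'a>> {
        unsafe {
            // Note: the returned OrtStatus* is discarded, so a failure here is not reported.
            sys::OrtSessionOptionsAppendExecutionProvider_CPU(self.session_options_ptr, use_arena);
        }
        Ok(self)
    }

    /// Set the session to use the CUDA execution provider on GPU `device_id`.
    #[cfg(feature = "cuda")]
    pub fn use_cuda(self, device_id: i32) -> Result<SessionBuilder<'a>> {
        unsafe {
            // Note: as above, the returned OrtStatus* (e.g. for an invalid device id) is not checked.
            sys::OrtSessionOptionsAppendExecutionProvider_CUDA(self.session_options_ptr, device_id);
        }
        Ok(self)
    }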
  • Generate bindings for Linux:
>>> cargo build --package onnxruntime-sys --features "generate-bindings cuda" --target x86_64-unknown-linux-gnu
  • Generate bindings for Windows through a Windows VM:
>>> cargo build --features "generate-bindings cuda" --target x86_64-pc-windows-msvc
  • Update the GitHub CI workflow so that the CUDA builds are tested automatically:
      - name: Download prebuilt archive (GPU, x86_64-unknown-linux-gnu)
        uses: actions-rs/cargo@v1
        with:
          command: build
          args: --target x86_64-unknown-linux-gnu --features cuda
      - name: Verify prebuilt archive downloaded (GPU, x86_64-unknown-linux-gnu)
        run: ls -lh target/x86_64-unknown-linux-gnu/debug/build/onnxruntime-sys-*/out/onnxruntime-linux-x64-gpu-1.*.tgz
      # ******************************************************************
      - name: Download prebuilt archive (GPU, x86_64-pc-windows-msvc)
        uses: actions-rs/cargo@v1
        with:
          command: build
          args: --target x86_64-pc-windows-msvc --features cuda
      - name: Verify prebuilt archive downloaded (GPU, x86_64-pc-windows-msvc)
        run: ls -lh target/x86_64-pc-windows-msvc/debug/build/onnxruntime-sys-*/out/onnxruntime-win-gpu-x64-1.*.zip
  • Update the documentation.
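In the safe API wrappers above, the OrtStatus* returned by each C call is dropped, so a failure such as an invalid device id goes unreported at that point. Below is a self-contained sketch of how a wrapper could turn that status into a Rust error instead. It does not link against ONNX Runtime: RawStatus, append_cuda_stub, ProviderError, status_to_result and this SessionBuilder are hypothetical stand-ins, not onnxruntime-rs types, and the cuda feature gate is left off so the sketch compiles on its own.

```rust
use std::fmt;

/// Stand-in for the C API's OrtStatus*: None plays the role of a null
/// pointer (success), Some(msg) a non-null status carrying an error message.
type RawStatus = Option<&'static str>;

/// Hypothetical stub for OrtSessionOptionsAppendExecutionProvider_CUDA.
/// It pretends that only GPU 0 exists, so any other id fails.
fn append_cuda_stub(device_id: i32) -> RawStatus {
    if device_id == 0 {
        None
    } else {
        Some("invalid CUDA device id")
    }
}

/// Error surfaced to Rust callers instead of a silently dropped status.
#[derive(Debug, PartialEq)]
pub struct ProviderError(pub String);

impl fmt::Display for ProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to append execution provider: {}", self.0)
    }
}

impl std::error::Error for ProviderError {}

/// Convert a raw status into a Result: "null" means Ok, anything else is an error.
fn status_to_result(status: RawStatus) -> Result<(), ProviderError> {
    match status {
        None => Ok(()),
        Some(msg) => Err(ProviderError(msg.to_string())),
    }
}

/// Simplified, hypothetical builder that records the providers it appended.
#[derive(Debug, Default)]
pub struct SessionBuilder {
    pub providers: Vec<String>,
}

impl SessionBuilder {
    /// Append the CUDA provider, propagating any failure with `?`.
    pub fn use_cuda(mut self, device_id: i32) -> Result<Self, ProviderError> {
        status_to_result(append_cuda_stub(device_id))?;
        self.providers.push(format!("cuda:{device_id}"));
        Ok(self)
    }
}
```

With this pattern, SessionBuilder::default().use_cuda(0) returns Ok with "cuda:0" recorded, while use_cuda(3) returns an Err that the caller can log before falling back to the CPU provider.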

Performance

                 Time per phrase   Speedup
Rust ONNX CPU    ~125 ms           baseline
Rust ONNX GPU    ~10 ms            x12 🔥

Note: I have a six-core CPU and a GTX 1050 GPU.

As expected, the GPU drastically reduced inference time, from about 125 ms to about 10 ms per phrase.

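However, I did not find a significant speed difference between ONNX Runtime in Rust and ONNX Runtime in Python. This is most likely because both bindings call into the same native ONNX Runtime library, where most of the inference time is spent.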
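Per-phrase timings like the ones in the table can be measured with a small harness. Below is a minimal sketch, not the code used for the numbers above: infer is a hypothetical closure standing in for a real session run, and the single untimed warm-up call is an assumption, added to keep one-off setup costs out of the average. The speedup is the ratio of the two averages, e.g. 125 ms / 10 ms = 12.5, reported above as x12.

```rust
use std::time::{Duration, Instant};

/// Average wall-clock time of `infer` per phrase, after one untimed warm-up call
/// (a first run often pays one-off costs such as allocations or GPU context setup).
fn mean_latency<F: FnMut(&str)>(phrases: &[&str], mut infer: F) -> Duration {
    assert!(!phrases.is_empty(), "need at least one phrase");
    infer(phrases[0]); // warm-up, not timed
    let start = Instant::now();
    for &p in phrases {
        infer(p);
    }
    start.elapsed() / phrases.len() as u32
}

/// Speedup of `fast` relative to `slow` (e.g. CPU latency / GPU latency).
fn speedup(slow: Duration, fast: Duration) -> f64 {
    slow.as_secs_f64() / fast.as_secs_f64()
}
```

Running mean_latency once with a CPU session and once with a CUDA session over the same phrases, then passing both results to speedup, gives a figure comparable to the table.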