A ggml-based C++ implementation of Voxtral Realtime 4B.
Download the pre-converted GGUF model from Hugging Face:
# Default: Q4_0 quantization
./tools/download_model.sh Q4_0

Build the project using CMake:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

The model expects 16-bit PCM WAV files at 16 kHz (mono). You can use ffmpeg to convert your audio files:
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

Run the model on the converted file:
./build/voxtral \
--model models/voxtral/Q4_0.gguf \
--audio path/to/input.wav \
--threads 8

You can quantize an existing GGUF file using the native quantizer:
./build/voxtral-quantize \
models/voxtral/voxtral.gguf \
models/voxtral/voxtral-q6_k.gguf \
Q6_K \
8

The test suite runs over samples/*.wav files.
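
Because the model expects 16-bit PCM mono WAV at 16 kHz, it can help to check sample files before running them. The following is a minimal sketch using Python's standard `wave` module; the helper name `check_wav_format` is hypothetical and not part of this repository.

```python
import glob
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width_bytes=2):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    # wave.open raises wave.Error for files it cannot parse (e.g. non-PCM data).
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"{8 * wav.getsampwidth()}-bit samples, expected {8 * sample_width_bytes}-bit"
            )
    return problems


if __name__ == "__main__":
    # Report any sample file that does not match the expected input format.
    for path in sorted(glob.glob("samples/*.wav")):
        for problem in check_wav_format(path):
            print(f"{path}: {problem}")
```

Files flagged by this check can be re-encoded with the ffmpeg command shown earlier.
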
To verify numeric parity against the reference implementation:
python3 tests/test_voxtral_reference.py

You can override the comparison tolerances and thread count via environment variables:
- VOXTRAL_TEST_ATOL (default: 1e-2)
- VOXTRAL_TEST_RTOL (default: 1e-2)
- VOXTRAL_TEST_THREADS
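
To illustrate how absolute and relative tolerances usually combine, here is a minimal sketch. It assumes the common convention `|actual - expected| <= atol + rtol * |expected|` (the one used by `numpy.allclose`); whether the test script applies exactly this rule is an assumption, and the helper names are hypothetical.

```python
import os


def read_tolerance(name, default):
    """Read a float from the environment, falling back to the default."""
    raw = os.environ.get(name)
    return float(raw) if raw else default


def within_tolerance(actual, expected, atol, rtol):
    """Check |a - e| <= atol + rtol * |e| for every pair of values."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    # Defaults mirror the values documented above.
    atol = read_tolerance("VOXTRAL_TEST_ATOL", 1e-2)
    rtol = read_tolerance("VOXTRAL_TEST_RTOL", 1e-2)
    print(within_tolerance([1.0, 2.0], [1.005, 2.01], atol, rtol))
```

Under this convention, lowering a tolerance (for example setting `VOXTRAL_TEST_ATOL=1e-3` in the environment before running the test command) makes the comparison stricter.
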