Running whisper.cpp on Windows

Whisper is a great tool for transcribing audio, but it has some drawbacks: the large model is too big to fit in the video RAM of a typical consumer GPU, and it is painfully slow on an ordinary CPU.

This is where quantization comes into the picture. In my previous article, I covered the installation of whisper-ctranslate2, which offloads the processing to the GPU using a quantized model. Here I will cover how CPUs and non-Nvidia GPUs can be used with the whisper.cpp framework.

Preparing the environment

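I assume you already have git, curl and Anaconda installed; if not, there are great resources on the Internet explaining how to set them up. The commands below also call cmake and msbuild, so you need CMake and Visual Studio 2022 with the C++ build tools. The last line of the block below runs vcvars64.bat, a batch file that loads the Visual C++ environment variables so that the build tools can be used from the command line.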
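Before building, it can help to confirm that the command-line tools used in this article are reachable from the current prompt. The following is a minimal Python sketch and not part of whisper.cpp; the tool list and the `missing_tools` helper are my own illustration, and it only checks that an executable of that name is on PATH, not that its version is suitable.

```python
import shutil

# Tools this walkthrough invokes; adjust to the backends you plan to build.
REQUIRED_TOOLS = ["git", "curl", "cmake", "msbuild", "ffmpeg"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Run it from the same prompt in which you ran vcvars64.bat, since msbuild is usually only on PATH after that batch file has been executed.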

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"

Building for CPU

cmake . --fresh
msbuild ALL_BUILD.vcxproj /p:Configuration=Release
copy bin\Release\whisper.dll ..
copy bin\Release\main.exe ..\whisper_cpp.exe

Building with OpenBLAS

Download OpenBLAS from https://github.com/xianyi/OpenBLAS/releases. Extract the release into a folder in your source path (OpenBLAS-0.3.23-x64 at the time of writing).

set OPENBLAS_PATH=OpenBLAS-0.3.23-x64
cmake . --fresh -DWHISPER_OPENBLAS=ON -DBLAS_LIBRARIES=OpenBLAS-0.3.23-x64\lib\libopenblas.lib
msbuild ALL_BUILD.vcxproj /p:Configuration=Release
copy OpenBLAS-0.3.23-x64\bin\libopenblas.dll ..\libopenblas.exp.dll
copy bin\Release\whisper.dll ..
copy bin\Release\main.exe ..\whisper_cpp.exe
copy bin\Release\quantize.exe ..

Building with CLBlast

You will need the OpenCL libraries for your architecture. I am targeting the Intel Iris Xe GPU built into the i7; the same build should also support the Arc family. For Intel hardware, you can download the OpenCL SDK from Intel. Once downloaded, extract it and install it to the recommended location.

Download CLBlast from https://github.com/CNugteren/CLBlast/releases. Extract the release directory into your source path. (1.6.1 at the time of writing).

Edit the CLBlast-1.6.1-windows-x64\lib\cmake\CLBlast\CLBlastConfig.cmake file and change the part in the INTERFACE_INCLUDE_DIRECTORIES line from ;C:/vcpkg/packages/opencl_x64-windows/include to the path you installed the SDK in (C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk\include).

INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include;C:/Program Files (x86)/IntelSWTools/system_studio_2020/OpenCL/sdk/include/"
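If you rebuild on several machines, the include-path edit above can be scripted. This is a minimal Python sketch of that one replacement; the helper names are hypothetical, the two paths are the ones quoted in this article, and the script assumes the file still contains the vcpkg path exactly as shown.

```python
from pathlib import Path

# Values taken from the article; change them to match your installation.
OLD_INCLUDE = "C:/vcpkg/packages/opencl_x64-windows/include"
NEW_INCLUDE = "C:/Program Files (x86)/IntelSWTools/system_studio_2020/OpenCL/sdk/include/"


def patch_include_dirs(text, old=OLD_INCLUDE, new=NEW_INCLUDE):
    """Replace the hard-coded vcpkg OpenCL include path with the SDK path."""
    if old not in text:
        raise ValueError("expected include path not found; edit the file by hand")
    return text.replace(old, new)


def patch_file(path):
    """Rewrite a CMake config file in place, keeping a .bak copy of the original."""
    config = Path(path)
    original = config.read_text(encoding="utf-8")
    config.with_name(config.name + ".bak").write_text(original, encoding="utf-8")
    config.write_text(patch_include_dirs(original), encoding="utf-8")
```

Calling `patch_file` with the path of CLBlastConfig.cmake would apply the change; the function refuses to write anything if the expected path is missing, so a file that was already edited is left alone.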

Edit the CLBlast-1.6.1-windows-x64\lib\cmake\CLBlast\CLBlastConfig-release.cmake file and change the IMPORTED_LINK_INTERFACE_LIBRARIES_RELEASE line to point to "C:/Program Files (x86)/IntelSWTools/system_studio_2020/OpenCL/sdk/lib/x64/OpenCL.lib".

set CLBlast_DIR=CLBlast-1.6.1-windows-x64\lib\cmake
cmake . --fresh -DWHISPER_CLBLAST=ON
msbuild ALL_BUILD.vcxproj /p:Configuration=Release
copy CLBlast-1.6.1-windows-x64\lib\clblast.dll ..
copy bin\Release\whisper.dll ..
copy bin\Release\main.exe ..\whisper_cpp.exe
copy bin\Release\quantize.exe ..

I fixed the issue with the kernel (a Windows line-encoding issue), and the fix is already included upstream.

Building for NVIDIA

To build with CUDA support using MSVC, you need to install MS Visual Studio as well as the CUDA Toolkit on your computer (not just within Conda).
Download the toolkit from https://developer.nvidia.com/cuda-toolkit-archive

All you really need are the CUDA\Runtime\Libraries, CUDA\Development and CUDA\Visual Studio Integration components, so you can choose the custom install and untick everything else.

cmake . --fresh -DWHISPER_CUBLAS=ON
msbuild ALL_BUILD.vcxproj /p:Configuration=Release
copy bin\Release\whisper.dll ..
copy bin\Release\main.exe ..\whisper_cpp.exe
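The four builds above differ only in the options passed to cmake. As a summary, here is a small Python sketch that maps each backend to the flags used in this article; the backend labels and function names are my own, and the OpenBLAS library path is the relative one used above, so adjust it if you extracted the release elsewhere.

```python
# CMake options used in the sections above, keyed by a backend label of my choosing.
BACKEND_FLAGS = {
    "cpu": [],
    "openblas": [
        "-DWHISPER_OPENBLAS=ON",
        r"-DBLAS_LIBRARIES=OpenBLAS-0.3.23-x64\lib\libopenblas.lib",
    ],
    "clblast": ["-DWHISPER_CLBLAST=ON"],
    "cublas": ["-DWHISPER_CUBLAS=ON"],
}


def configure_command(backend):
    """Return the cmake configure command for one backend as an argument list."""
    if backend not in BACKEND_FLAGS:
        raise KeyError(f"unknown backend: {backend!r}")
    return ["cmake", ".", "--fresh", *BACKEND_FLAGS[backend]]


def build_command():
    """Return the msbuild command, which is the same for every backend here."""
    return ["msbuild", "ALL_BUILD.vcxproj", "/p:Configuration=Release"]
```

Both functions return argument lists, so they could be handed to `subprocess.run(..., check=True)` from a prompt where vcvars64.bat has already been run.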

Preparing your model

Switch to your whisper directory and create a directory to hold your models:

cd ..
md models

Download your favourite model using the following commands. If an earlier copy exists, delete it first to make sure you download the entire model. The download script in whisper.cpp/models uses PowerShell to fetch the file, which is extremely slow, so you may want to use curl instead if you are in a hurry.

set model=large
curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%model%.bin -o models\ggml-%model%.bin
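The advice about deleting a stale copy can also be handled by downloading to a temporary name and renaming the file only once the transfer has finished. The following Python sketch illustrates that idea with the standard library; the function names and the `.part` suffix are my own choices, and the URL pattern is the one used in the curl command above.

```python
import os
import urllib.request

BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"


def model_url(model):
    """Build the download URL for a ggml model name such as 'large'."""
    return f"{BASE_URL}/ggml-{model}.bin"


def download_model(model, models_dir="models"):
    """Download to a temporary name and rename only after the transfer completes."""
    os.makedirs(models_dir, exist_ok=True)
    target = os.path.join(models_dir, f"ggml-{model}.bin")
    partial = target + ".part"
    # urlretrieve raises ContentTooShortError if fewer bytes arrive than announced.
    urllib.request.urlretrieve(model_url(model), partial)
    os.replace(partial, target)  # overwrites any stale copy of the model
    return target
```

An interrupted download then leaves only a `.part` file behind, so a truncated model is never picked up under the real name.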

Quantize the model to make it smaller. You can use q4_0, q4_1, q5_0, q5_1 or q8_0 as the quantization type.

quantize.exe models\ggml-large.bin models\ggml-large-q4_0.bin q4_0
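To give a feel for what the quantizer does to the weights, here is a simplified Python sketch of 4-bit block quantization. It is an illustration only and not the actual ggml implementation: the block size of 32, the rounding rule and the 2-byte scale per block are assumptions chosen to keep the example short.

```python
BLOCK_SIZE = 32  # weights per block, each block sharing one scale factor


def quantize_block(values):
    """Map floats to 4-bit integers (0..15) plus a per-block scale."""
    peak = max(abs(v) for v in values)
    scale = peak / 7.0 if peak > 0 else 1.0
    # Shift by 8 so that the signed range -7..7 is stored as 1..15.
    quants = [min(15, max(0, round(v / scale) + 8)) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the 4-bit integers."""
    return [(q - 8) * scale for q in quants]


def bits_per_weight(quant_bits, extra_bytes_per_block, block_size=BLOCK_SIZE):
    """Storage cost per weight, counting the per-block metadata."""
    return quant_bits + extra_bytes_per_block * 8 / block_size


if __name__ == "__main__":
    weights = [0.01 * (i - 16) for i in range(BLOCK_SIZE)]
    scale, quants = quantize_block(weights)
    restored = dequantize_block(scale, quants)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"scale={scale:.5f} worst-case error={worst:.5f}")
    # A 4-bit value plus an assumed 2-byte scale per 32 weights:
    print(f"bits per weight: {bits_per_weight(4, 2)}")
```

Each restored weight is off by at most half a quantization step, which is why the output stays close to the original model while the file shrinks to a fraction of its size.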

Run on existing sound files

Use ffmpeg to convert your original to a 16 kHz, mono, 16-bit PCM WAV file:

ffmpeg -i INPUT.MP3 -ar 16000 -ac 1 -c:a pcm_s16le OUTPUT.WAV
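If you are unsure whether a file already has the format produced by the ffmpeg command above, the header can be checked with Python's standard `wave` module. This is a small sketch; the function name is my own, and it tests only the three properties set by the ffmpeg flags (sample rate, channel count and sample width).

```python
import wave


def is_whisper_ready(path):
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000
            and wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # bytes per sample, i.e. 16 bits
        )
```

Note that `wave.open` raises `wave.Error` for files that are not uncompressed PCM WAV, so an MP3 renamed to .wav is reported as an error rather than as False.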

Transcribing your files

whisper_cpp.exe --help

usage: whisper_cpp.exe [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [2      ] number of best candidates to keep
  -bs N,     --beam-size N       [-1     ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -su,       --speed-up          [false  ] speed up audio by x2 (reduced accuracy)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
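For batch jobs it is convenient to assemble the command line in a script. The Python sketch below builds an argument list from a handful of the options listed in the help text; the function name and its defaults are my own choices, and only flags that appear in the listing above are used.

```python
def transcribe_command(wav_path, model_path, language="auto", threads=4,
                       processors=1, srt=True, exe="whisper_cpp.exe"):
    """Assemble a whisper_cpp.exe command line from options in the help text."""
    cmd = [
        exe,
        "-m", model_path,
        "-l", language,
        "-t", str(threads),
        "-p", str(processors),
    ]
    if srt:
        cmd.append("-osrt")  # write the result as an .srt subtitle file
    cmd += ["-f", wav_path]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, for example `transcribe_command("OUTPUT.WAV", r"models\ggml-large-q4_0.bin", threads=6, processors=3)`.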

Results

Transcribing a 10-minute sound file with the same model parameter size produced similar output with every implementation, but the whisper.cpp builds were much faster.

Implementation                                        Time
whisper (standard)                                    11 033 s
whisper-ctranslate2                                   1 623 s
whisper.cpp CPU (q4_0), 1 processor, 4 threads        1 001 s
whisper.cpp CPU (q4_0), 3 processors, 6 threads       886 s
whisper.cpp CLBlast (q4_0), 1 processor, 4 threads    743 s
whisper.cpp CLBlast (q4_0), 3 processors, 6 threads   577 s
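To put the timings listed above in perspective, the short Python sketch below computes the speedup over standard whisper and the ratio of processing time to audio length. The numbers are copied from the results; the dictionary keys and function names are my own.

```python
AUDIO_SECONDS = 10 * 60  # length of the test recording

# Wall-clock times in seconds, copied from the results above.
TIMINGS = {
    "whisper (standard)": 11033,
    "whisper-ctranslate2": 1623,
    "whisper.cpp CPU (q4_0), 1 processor, 4 threads": 1001,
    "whisper.cpp CPU (q4_0), 3 processors, 6 threads": 886,
    "whisper.cpp CLBlast (q4_0), 1 processor, 4 threads": 743,
    "whisper.cpp CLBlast (q4_0), 3 processors, 6 threads": 577,
}


def speedup(name, baseline="whisper (standard)"):
    """How many times faster `name` ran than the baseline."""
    return TIMINGS[baseline] / TIMINGS[name]


def realtime_factor(name):
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    return TIMINGS[name] / AUDIO_SECONDS


if __name__ == "__main__":
    for name in TIMINGS:
        print(f"{name}: {speedup(name):.1f}x faster, "
              f"{realtime_factor(name):.2f}x real time")
```

On these figures, only the CLBlast build with 3 processors and 6 threads finishes in less time than the recording lasts.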

13 thoughts on “Running whisper.cpp on Windows”

  1. Hi,

    When I run main.exe, there is no result displayed.

    Also when I run your app in the command mode, there is a popup window and the whole app crashes.

    Any idea why?

    1. Hi,

      First of all, I cannot take any credit for the app; all I did was list the steps to build the native Whisper implementation by the legendary G. Gerganov (https://github.com/ggerganov/whisper.cpp). I did the writeup mostly for myself, so that the build is repeatable across a number of my machines.

      Unfortunately, I can’t help you, as I have zero information on what machine you are running, what build you made, what the popup looks like, etc.

      main.exe MUST be run from the command line, and there is no way it would return without a response if you built it for the proper environment.

      Try building for the CPU first; that’s the simplest. Once you’re done with that, you can start building for your proper environment. OpenBLAS should run on most architectures. CLBlast needs CPU-specific libraries; of those I only got Intel’s, as all my machines run Intel, so if you have AMD, you’ll need to research it yourself. If you have an Nvidia GPU, you may try the CUDA version.

      If you find a particular issue, please let me know, so I can update the article!

  2. Any idea how to fix cublas build?
    cuda 12.2 and 11.6
    VS 2022 community
    PS E:\AUDIO_AI\whisper.cpp> cmake . --fresh -DWHISPER_CUBLAS=ON
    -- Building for: Visual Studio 17 2022
    CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
    Compatibility with CMake < 3.5 will be removed from a future version of
    CMake.
    Update the VERSION argument <min> value or use a ...<max> suffix to tell
    CMake that the project does not need compatibility with older versions.
    -- The C compiler identification is MSVC 19.37.32824.0
    -- The CXX compiler identification is MSVC 19.37.32824.0
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/cl.exe - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/cl.exe - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.42.0.windows.2")
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - not found
    -- Found Threads: TRUE
    -- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.6/include (found version "11.6.55")
    -- cuBLAS found
    CMake Error at C:/Program Files/CMake/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:753 (message):
    Compiling the CUDA compiler identification source file
    "CMakeCUDACompilerId.cu" failed.
    Compiler:
    Build flags:
    Id flags: --keep;--keep-dir;tmp -v
    The output was:
    1
    MSBuild version 17.7.2+d6990bcfa for .NET Framework
    Build started 24/09/2023 00:05:00.
    Project
    "E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj"
    on node 1 (default targets).
    PrepareForBuild:
    Creating directory "Debug\".
    Creating directory "Debug\CompilerIdCUDA.tlog\".
    InitializeBuildStatus:
    Creating "Debug\CompilerIdCUDA.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.
    Touching "Debug\CompilerIdCUDA.tlog\unsuccessfulbuild".
    AddCudaCompileDeps:
    C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\bin\HostX64\x64\cl.exe /E /nologo /showIncludes /TP /D__CUDACC__ /D__CUDACC_VER_MAJOR__=11 /D__CUDACC_VER_MINOR__=6 /D_MBCS /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" /I. /FIcuda_runtime.h /c E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CMakeCUDACompilerId.cu
    Project
    "E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj"
    (1) is building
    "E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj"
    (1:2) on node 1 (CudaBuildCore target(s)).
    CudaBuildCore:
    Compiling CUDA source file CMakeCUDACompilerId.cu...
    C:\Program Files\Microsoft Visual
    Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA
    11.6.targets(790,9): error MSB3686: Unable to create Xaml task.
    Compilation failed.
    [E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]
    C:\Program Files\Microsoft Visual
    Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA
    11.6.targets(790,9): error MSB3686: Source file
    'C:\windows\TEMP\zww0gwsc.0.cs' could not be found
    [E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]
    C:\Program Files\Microsoft Visual
    Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA
    11.6.targets(790,9): error MSB3686:
    [E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]
    C:\Program Files\Microsoft Visual
    Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA
    11.6.targets(790,9): error MSB4175: The task factory "XamlTaskFactory"
    could not be loaded from the assembly "Microsoft.Build.Tasks.v4.0,
    Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a". Object
    reference not set to an instance of an object.
    [E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]
    Done Building Project
    "E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj"
    (CudaBuildCore target(s)) -- FAILED.
    Done Building Project
    "E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj"
    (default targets) -- FAILED.
    Build FAILED.
    "E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj"
    (default target) (1) ->
    "E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj"
    (CudaBuildCore target) (1:2) ->
    (CudaBuildCore target) ->
    C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 11.6.targets(790,9): error MSB3686: Unable to create Xaml task. Compilation failed. [E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]
    C:\Program Files\Microsoft Visual
    Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA
    11.6.targets(790,9): error MSB3686: Source file
    'C:\windows\TEMP\zww0gwsc.0.cs' could not be found
    [E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]
    C:\Program Files\Microsoft Visual
    Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA
    11.6.targets(790,9): error MSB3686:
    [E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]
    C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 11.6.targets(790,9): error MSB4175: The task factory "XamlTaskFactory" could not be loaded from the assembly "Microsoft.Build.Tasks.v4.0, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a". Object reference not set to an instance of an object. [E:\AUDIO_AI\whisper.cpp\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]
    0 Warning(s)
    2 Error(s)
    Time Elapsed 00:00:01.58
    Call Stack (most recent call first):
    C:/Program Files/CMake/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
    C:/Program Files/CMake/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
    C:/Program Files/CMake/share/cmake-3.27/Modules/CMakeDetermineCUDACompiler.cmake:307 (CMAKE_DETERMINE_COMPILER_ID)
    CMakeLists.txt:178 (enable_language)
    -- Configuring incomplete, errors occurred!
    PS E:\AUDIO_AI\whisper.cpp>

  3. Thanks for writing this. I was struggling to compile whisper.cpp at all until I found this page

    I have an update to share:

    Looks like the large models have been versioned so the curl statement no longer works. Here is a download url for one of the large model versions:

    https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3.bin

    Notes for other users:

    If whisper_cpp.exe gives you a FileNotFound error, install ffmpeg.
    I used "scoop install ffmpeg".

    PowerShell notes:
    – If using PowerShell: use $env:OPENBLAS_PATH = "your_path"
    – PowerShell: replace %model% with your model
    – Some arguments needed to be quoted in PowerShell

  4. Similarly to build llama-cpp-python python package with OpenBLAS


    "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
    set CMAKE_ARGS=-DLLAMA_BLAS=ON -DBLAS_LIBRARIES=...\OpenBLAS-0.3.23-x64\lib\libopenblas.lib -DBLAS_INCLUDE_DIRS=...\OpenBLAS-0.3.23-x64\include
    pip install llama-cpp-python[server] --force-reinstall --no-deps --no-cache --verbose

    or CLBlast

    "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
    set CLBlast_DIR=...\CLBlast-1.6.1-windows-x64\lib\cmake\
    set CMAKE_ARGS=-DLLAMA_CLBLAST=ON -DBLAS_LIBRARIES=...\CLBlast-1.6.1-windows-x64\lib\clblast.lib
    pip install llama-cpp-python[server] --force-reinstall --no-deps --no-cache --verbose

  5. the article is very confusing and does not even show what dependencies to install on windows (like cmake), what if someone is starting from scratch?, its also unclear what to do with this command:
    “C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat”

    1. I am sorry to disappoint you. I am not in a position to write detailed descriptions of how to set up a basic Windows development environment from scratch.
      There are great tutorials by great educators for setting up the basics all over the Internet. If you need help with that, however, this article is not for you.

      1. i didnt ask for “how to set up a basic Windows development”, i asked: “what is the basic env that you expect me to setup”? what dependencies does it contain? i can then go figure the “how” myself.
        you did not answer my question on what does this line even mean: “C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat”, its just a path not even a command.
        it seems like you are explaining things to yourself, not to other ppl (self notes), sorry i dont mean to be rude but the article is useless.

        1. “C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat”, its just a path not even a command.

          This is a batch file (so a command, executable from the command line) that loads the Visual C variables (vcvars) from an installed Visual Studio 2022 Community edition. That’s all there in the name. This is required to use the build tools from the command line.

          This guide is for advanced users, who chose to build stuff for themselves, not an all-in-one walkthrough. This may require further research on your behalf.

          I believe you can already download these programs as prebuilts from several sources now, so there is no need to build it on your own.

  6. Hi,
    great article, I’m quite a newbie with AI, and dev in general but after some search I ended up getting it to work :o)

    I’m not sure to understand what quantizing models exactly do, in fact i’m sure I don’t !
    I’ve made some tests and got some results ( 5_1 seems to work the best for me),
    Can you recommend a source of information on this subject?

    1. There are great tutorials on YouTube; I learned mostly from the ollama, langchain and llama.cpp GitHub wikis.
      In short, you take each model parameter and map its value from a high-definition float (32 bits per model weight) to a low-definition 4-, 5- or 8-bit value. It loses a lot of clarity, but with so many parameters it is still a good approximation of the model.
      You will be able to fit the model in memory, and the results will be quite close to the high-definition ones.

  7. very nice tutorial.
    Managed to compile fully with no error the NVIDIA version CUDA 12.5, VS 2022.
    However, when I execute whisper.exe in the terminal, I get 0 error, 0 info. Just release the command after few seconds…strange. Any idea ?
