
unable to load runner: error waiting for runner to be ready: llama.cpp terminated unexpectedly #479

@mathieu-benoit

Description


I'm getting this error:

Failed to generate a response: error response: status=500 body=unable to load runner: error waiting for runner to be ready: llama.cpp terminated unexpectedly: llama.cpp exit status: exit status 0xc0000005

I'm on Docker Desktop (DD) on Windows, using WSL2 (ECI, Enhanced Container Isolation, disabled in this case).
DMR (Docker Model Runner) is enabled via Docker Desktop.
I'm able to run docker model pull commands, and docker model list shows:

MODEL NAME           PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED       CONTEXT  SIZE
gemma3               3.88 B      MOSTLY_Q4_K_M   gemma3        a353a8898c9d  2 months ago           2.31 GiB
qwen2.5:0.5B-F16     494.03 M    F16             qwen2         3e1aad67b4cc  7 months ago           942.43 MiB
smollm2:360M-Q4_K_M  361.82 M    IQ2_XXS/Q4_K_M  llama         354bf30d0aa3  8 months ago           256.35 MiB

But when running docker model run ai/qwen2.5:0.5B-F16 "who are you", I get the error described above: unable to load runner: error waiting for runner to be ready: llama.cpp terminated unexpectedly.
Also, docker model status reports:

Docker Model Runner is running

Status:
llama.cpp: running llama.cpp latest-cpu (sha256:ea16f02ab4b7ce60f05a2cc3d08d2643e53f2c7bb9187c6644fbf108d898739d) version: unknown
vllm: not installed
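
For what it's worth, the same request can also be sent directly to DMR's OpenAI-compatible endpoint, which might help show whether the 500 comes from the docker model CLI or from the backend itself. A minimal PowerShell sketch, assuming host-side TCP support is enabled on the default port 12434 (port and path are assumptions on my side):

# Sketch only: assumes DMR's host-side TCP access on port 12434 and the
# /engines/v1/chat/completions path; adjust if your setup differs.
$body = @{
    model    = "ai/qwen2.5:0.5B-F16"
    messages = @(@{ role = "user"; content = "who are you" })
} | ConvertTo-Json -Depth 5
Invoke-RestMethod -Uri "http://localhost:12434/engines/v1/chat/completions" `
    -Method Post -ContentType "application/json" -Body $body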

Looking at docker model logs -f, I'm seeing this:

-------------------------------------------------------------------------------->8
[2025-11-20T20:49:58.776194000Z][inference] Running on system with 32265 MB RAM
[2025-11-20T20:49:58.780206900Z][inference.model-manager] Successfully initialized store
[2025-11-20T20:49:58.780206900Z][inference] 2 backends available
-------------------------------------------------------------------------------->8
[2025-11-20T20:56:13.809536300Z][inference] Running on system with 4094 MB VRAM
[2025-11-20T20:56:13.834988700Z][inference] Running on system with 32265 MB RAM
[2025-11-20T20:56:13.836031600Z][inference.model-manager] Successfully initialized store
[2025-11-20T20:56:13.836578100Z][inference] 2 backends available
[2025-11-20T20:56:17.663784000Z][inference] Reconciling service state on initialization
[2025-11-20T20:56:17.664289000Z][inference] Reconciling service state on settings change
[2025-11-20T20:56:17.664289000Z][inference.inference-llama.cpp] downloadLatestLlamaCpp: latest, cpu, C:\Program Files\Docker\Docker\resources\model-runner\bin, <HOME>\.docker\bin\inference\com.docker.llama-server.exe
[2025-11-20T20:56:18.991409200Z][inference.inference-llama.cpp][W] could not get llama.cpp version: exit status 0xc0000005
[2025-11-20T20:56:18.991409200Z][inference.inference-llama.cpp] failed to ensure latest llama.cpp: bundled llama.cpp version is up to date, no need to update
[2025-11-20T20:56:19.174425300Z][inference.inference-llama.cpp][W] Failed to determine if llama-server is built with GPU support: exit status 0xc0000005
[2025-11-20T20:56:19.174425300Z][inference.inference-llama.cpp] installed llama-server with gpuSupport=false
[2025-11-20T20:56:19.174425300Z][inference][W] Backend installation failed for vllm: not implemented
[2025-11-20T20:56:36.211617900Z][inference.model-manager] Listing available models
[2025-11-20T20:56:36.212180800Z][inference.model-manager] Successfully listed models, count: 0

And this:

[2025-12-02T15:57:10.302145800Z][inference.model-manager] Successfully listed models, count: 3
[2025-12-02T15:57:15.206081100Z][inference.model-manager] Getting model by reference: ai/qwen2.5:0.5B-F16
[2025-12-02T15:57:18.599097500Z][inference.model-manager] Getting model by reference: ai/qwen2.5:0.5B-F16
[2025-12-02T15:57:18.600749200Z][inference.model-manager] Getting model by reference: ai/qwen2.5:0.5B-F16
[2025-12-02T15:57:18.602932000Z][inference.model-manager] Checking model by reference: sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26
[2025-12-02T15:57:23.575350900Z][inference] Loading sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26, which will require 1563 MB RAM and 129 MB VRAM on a system with 32265 MB RAM and 4094 MB VRAM
[2025-12-02T15:57:23.575350900Z][inference] Loading llama.cpp backend runner with model sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26 in completion mode
[2025-12-02T15:57:23.575350900Z][inference.openai-recorder][W] SetConfigForModel called with nil config for model sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26
[2025-12-02T15:57:23.583948900Z][inference.inference-llama.cpp] llama.cpp args: [-ngl 999 --metrics --model C:\\Users\\<USER>\\.docker\\models\\bundles\\sha256\\3e1aad67b4cc8e3dca660fe65f9f73edb598474284256f...[truncated] --host C:\\Users\\<USER>\\AppData\\Local\\Docker\\run\\inference-0.sock --ctx-size 4096 --jinja]
[2025-12-02T15:57:24.117776700Z][inference][W] Backend llama.cpp running model ai/qwen2.5:0.5B-F16 exited with error: llama.cpp terminated unexpectedly: llama.cpp exit status: exit status 0xc0000005
[2025-12-02T15:57:24.577909900Z][inference.model-manager] Getting model by reference: sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26
[2025-12-02T15:57:24.581195900Z][inference.openai-recorder][W] No records found for model: sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26
[2025-12-02T15:57:24.581195900Z][inference][W] Initialization for llama.cpp backend runner with model sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26 in completion mode failed: llama.cpp terminated unexpectedly: llama.cpp exit status: exit status 0xc0000005
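
Note that the version probe already fails with 0xc0000005 (a Windows access violation) before any model is loaded, so it might be worth checking whether the bundled binary crashes the same way when run outside DMR. A rough PowerShell sketch, using the binary path from the log above and assuming the bundled server accepts llama.cpp's usual --version flag:

# Assumption: com.docker.llama-server.exe accepts llama.cpp's --version flag;
# the path below is the one reported in docker model logs.
& "$env:USERPROFILE\.docker\bin\inference\com.docker.llama-server.exe" --version
"exit code: 0x{0:x}" -f $LASTEXITCODE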

More info about my setup:

Processor	Intel(R) Core(TM) Ultra 7 155H (1.40 GHz)
System type	64-bit operating system, x64-based processor
