I'm getting this error:
Failed to generate a response: error response: status=500 body=unable to load runner: error waiting for runner to be ready: llama.cpp terminated unexpectedly: llama.cpp exit status: exit status 0xc0000005
I'm on Docker Desktop (DD) on Windows, using the WSL2 backend (ECI disabled in this case).
DMR is enabled via DD.
I'm able to run docker model pull commands, and docker model list shows:
MODEL NAME PARAMETERS QUANTIZATION ARCHITECTURE MODEL ID CREATED CONTEXT SIZE
gemma3 3.88 B MOSTLY_Q4_K_M gemma3 a353a8898c9d 2 months ago 2.31 GiB
qwen2.5:0.5B-F16 494.03 M F16 qwen2 3e1aad67b4cc 7 months ago 942.43 MiB
smollm2:360M-Q4_K_M 361.82 M IQ2_XXS/Q4_K_M llama 354bf30d0aa3 8 months ago 256.35 MiB
But when running docker model run ai/qwen2.5:0.5B-F16 "who are you", I'm getting the error described above: unable to load runner: error waiting for runner to be ready: llama.cpp terminated unexpectedly.
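In case it helps narrow this down: the logs further below show the same 0xc0000005 when DMR itself queries the binary's version, so a minimal manual check (assuming <HOME> in the log is %USERPROFILE%, and that the bundled com.docker.llama-server.exe accepts --version like upstream llama-server) would be, from PowerShell:
# Run the bundled runner binary directly and check whether the version query
# alone already crashes with the access violation (0xc0000005).
& "$env:USERPROFILE\.docker\bin\inference\com.docker.llama-server.exe" --version
# 0xc0000005 shows up here as -1073741819 (the signed 32-bit form of the NTSTATUS code).
$LASTEXITCODE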
Also, docker model status is saying:
Docker Model Runner is running
Status:
llama.cpp: running llama.cpp latest-cpu (sha256:ea16f02ab4b7ce60f05a2cc3d08d2643e53f2c7bb9187c6644fbf108d898739d) version: unknown
vllm: not installed
Looking at docker model logs -f, I'm seeing this:
-------------------------------------------------------------------------------->8
[2025-11-20T20:49:58.776194000Z][inference] Running on system with 32265 MB RAM
[2025-11-20T20:49:58.780206900Z][inference.model-manager] Successfully initialized store
[2025-11-20T20:49:58.780206900Z][inference] 2 backends available
-------------------------------------------------------------------------------->8
[2025-11-20T20:56:13.809536300Z][inference] Running on system with 4094 MB VRAM
[2025-11-20T20:56:13.834988700Z][inference] Running on system with 32265 MB RAM
[2025-11-20T20:56:13.836031600Z][inference.model-manager] Successfully initialized store
[2025-11-20T20:56:13.836578100Z][inference] 2 backends available
[2025-11-20T20:56:17.663784000Z][inference] Reconciling service state on initialization
[2025-11-20T20:56:17.664289000Z][inference] Reconciling service state on settings change
[2025-11-20T20:56:17.664289000Z][inference.inference-llama.cpp] downloadLatestLlamaCpp: latest, cpu, C:\Program Files\Docker\Docker\resources\model-runner\bin, <HOME>\.docker\bin\inference\com.docker.llama-server.exe
[2025-11-20T20:56:18.991409200Z][inference.inference-llama.cpp][W] could not get llama.cpp version: exit status 0xc0000005
[2025-11-20T20:56:18.991409200Z][inference.inference-llama.cpp] failed to ensure latest llama.cpp: bundled llama.cpp version is up to date, no need to update
[2025-11-20T20:56:19.174425300Z][inference.inference-llama.cpp][W] Failed to determine if llama-server is built with GPU support: exit status 0xc0000005
[2025-11-20T20:56:19.174425300Z][inference.inference-llama.cpp] installed llama-server with gpuSupport=false
[2025-11-20T20:56:19.174425300Z][inference][W] Backend installation failed for vllm: not implemented
[2025-11-20T20:56:36.211617900Z][inference.model-manager] Listing available models
[2025-11-20T20:56:36.212180800Z][inference.model-manager] Successfully listed models, count: 0
And this:
[2025-12-02T15:57:10.302145800Z][inference.model-manager] Successfully listed models, count: 3
[2025-12-02T15:57:15.206081100Z][inference.model-manager] Getting model by reference: ai/qwen2.5:0.5B-F16
[2025-12-02T15:57:18.599097500Z][inference.model-manager] Getting model by reference: ai/qwen2.5:0.5B-F16
[2025-12-02T15:57:18.600749200Z][inference.model-manager] Getting model by reference: ai/qwen2.5:0.5B-F16
[2025-12-02T15:57:18.602932000Z][inference.model-manager] Checking model by reference: sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26
[2025-12-02T15:57:23.575350900Z][inference] Loading sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26, which will require 1563 MB RAM and 129 MB VRAM on a system with 32265 MB RAM and 4094 MB VRAM
[2025-12-02T15:57:23.575350900Z][inference] Loading llama.cpp backend runner with model sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26 in completion mode
[2025-12-02T15:57:23.575350900Z][inference.openai-recorder][W] SetConfigForModel called with nil config for model sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26
[2025-12-02T15:57:23.583948900Z][inference.inference-llama.cpp] llama.cpp args: [-ngl 999 --metrics --model C:\\Users\\<USER>\\.docker\\models\\bundles\\sha256\\3e1aad67b4cc8e3dca660fe65f9f73edb598474284256f...[truncated] --host C:\\Users\\<USER>\\AppData\\Local\\Docker\\run\\inference-0.sock --ctx-size 4096 --jinja]
[2025-12-02T15:57:24.117776700Z][inference][W] Backend llama.cpp running model ai/qwen2.5:0.5B-F16 exited with error: llama.cpp terminated unexpectedly: llama.cpp exit status: exit status 0xc0000005
[2025-12-02T15:57:24.577909900Z][inference.model-manager] Getting model by reference: sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26
[2025-12-02T15:57:24.581195900Z][inference.openai-recorder][W] No records found for model: sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26
[2025-12-02T15:57:24.581195900Z][inference][W] Initialization for llama.cpp backend runner with model sha256:3e1aad67b4cc8e3dca660fe65f9f73edb598474284256ffdd9ba460b5b35ff26 in completion mode failed: llama.cpp terminated unexpectedly: llama.cpp exit status: exit status 0xc0000005
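If it's useful, the crash can presumably also be reproduced outside DMR by launching the same binary with a trimmed-down version of the args from the log above. This is just a sketch: <model-path> stands for the bundle path that is truncated in the log, and a plain TCP host/port replaces the socket path DMR passes via --host:
# Hypothetical manual repro: same binary and model, plain HTTP listener instead
# of DMR's socket; -ngl, --model, --ctx-size, --jinja, --host and --port are
# standard llama-server options.
& "$env:USERPROFILE\.docker\bin\inference\com.docker.llama-server.exe" `
  -ngl 999 --model "<model-path>" --ctx-size 4096 --jinja --host 127.0.0.1 --port 8080
$LASTEXITCODE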
More info about my setup:
Processor: Intel(R) Core(TM) Ultra 7 155H (1.40 GHz)
System type: 64-bit operating system, x64-based processor