Qwen 3 Coder on Docker Hub would be a good model to test this with: https://github.com/ggml-org/llama.cpp/pull/17570

The maximum context size a MacBook Pro with 36 GB of memory can handle is 65536 tokens:

llama-server -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M -c 65536
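
Once the server is up, a quick smoke test might look like the following. This is a sketch assuming llama-server's defaults (host 127.0.0.1, port 8080, and its OpenAI-compatible chat endpoint); the prompt is just an illustration:

# Send one chat completion request to the locally running server
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a hello world in Go."}]}'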