
Implement and test Anthropic Messages API #475

@ericcurtin

Description


Qwen 3 Coder on Docker Hub would be a good model to test this with

ggml-org/llama.cpp#17570

The maximum context size a 36 GB VRAM MacBook Pro can handle is:

```sh
llama-server -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M -c 65536
```
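Once the server is up, a minimal request in the Anthropic Messages API shape could look like the sketch below. This assumes the linked llama.cpp change exposes an Anthropic-compatible `/v1/messages` endpoint on llama-server's default port (8080); the model name and the dummy API key header are placeholders, not confirmed values.

```sh
# Sketch of a test request against a local llama-server, assuming it serves
# an Anthropic-style /v1/messages endpoint on port 8080 (assumption, see
# ggml-org/llama.cpp#17570). Model name and x-api-key value are placeholders.
curl -s http://localhost:8080/v1/messages \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "x-api-key: placeholder" \
  -d '{
    "model": "Qwen3-Coder-30B-A3B-Instruct",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Write hello world in Go."}
    ]
  }'
```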
