Add SGLang backend #477
Conversation
Summary of Changes: Hello @denyszhak, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
Hey there - I've reviewed your changes - here's some feedback:
Blocking issues:
- Detected non-static command inside Command. Audit the input to 'exec.Command'. If unverified user data can reach this call site, this is a code injection vulnerability. A malicious actor can inject a malicious script to execute arbitrary code. (link)
- Detected non-static command inside Command. Audit the input to 'exec.Command'. If unverified user data can reach this call site, this is a code injection vulnerability. A malicious actor can inject a malicious script to execute arbitrary code. (link)
General comments:
- The SGLang Docker integration seems inconsistent: `initFromDocker` detects installation solely via `/opt/sglang-env/sglang`, but the Dockerfile only sets up a venv and version file without creating that binary, so either the detection logic should rely on the venv/version file or the image should install a wrapper at `/opt/sglang-env/sglang` to match the runtime expectations.
- The TCP configuration for SGLang is currently hard-coded to `127.0.0.1` and a fixed base port in `runner.go`/`sglang_config.go`; consider feeding the host/port via configuration or environment so multiple instances or non-local setups can avoid port clashes and have more flexible networking.
- Right now the SGLang backend is always created and registered in `main.go` regardless of platform, with `Run` failing at runtime when `SupportsSGLang` is false; you may want to gate backend construction/registration on `platform.SupportsSGLang()` to avoid picking an unusable backend on unsupported OSes.
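The platform-gating suggestion in the last point can be sketched as below. The registry map and the `supportsSGLang` helper are illustrative stand-ins for the real registration code in `main.go` and for `platform.SupportsSGLang()` (assumed here to mean Linux-only); they are not the PR's actual API.

```go
package main

import (
	"fmt"
	"runtime"
)

// Backend is a minimal stand-in for the real inference.Backend interface.
type Backend interface {
	Name() string
}

type sglangBackend struct{}

func (sglangBackend) Name() string { return "sglang" }

// supportsSGLang is an illustrative stand-in for platform.SupportsSGLang();
// here we assume SGLang is only supported on Linux.
func supportsSGLang() bool {
	return runtime.GOOS == "linux"
}

// registerBackends only registers backends the current platform can run,
// so the scheduler never selects a backend whose Run would fail at runtime.
func registerBackends() map[string]Backend {
	backends := make(map[string]Backend)
	if supportsSGLang() {
		backends["sglang"] = sglangBackend{}
	}
	return backends
}

func main() {
	backends := registerBackends()
	_, ok := backends["sglang"]
	fmt.Printf("sglang registered: %v (GOOS=%s)\n", ok, runtime.GOOS)
}
```

With this shape, an unsupported OS simply never sees the backend, instead of constructing one that fails later in `Run`.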
## Individual Comments
### Comment 1
<location> `pkg/inference/backends/sglang/sglang.go:101-110` </location>
<code_context>
+func (s *sglang) initFromDocker() error {
</code_context>
<issue_to_address>
**issue (bug_risk):** Docker-based installation path and `binaryPath` do not align, so the "docker" path is effectively never used
In the Dockerfile we only create a venv at `/opt/sglang-env` and install the `sglang` package, but we never create an executable at `/opt/sglang-env/sglang`. `initFromDocker` uses the presence of `binaryPath()` (`/opt/sglang-env/sglang`) to detect a Docker-provided binary, so in the current image this check always fails with `fs.ErrNotExist` and the code always falls back to `initFromHost`, even in the container. That also means the `sandboxPath` branch in `Run` that depends on `binaryPath()` is never exercised.
To fix this, either point `binaryPath()` to the venv Python (e.g. `/opt/sglang-env/bin/python3`) and keep `-m sglang.launch_server`, or create a launcher at `/opt/sglang-env/sglang` in the Dockerfile so the current detection logic works as intended.
</issue_to_address>
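The fallback described above can be sketched as a small selection helper. The two paths mirror the ones named in the comment, but the `launchCommand` function and the injected `exists` probe are illustrative, not the PR's actual code (the real implementation would stat the filesystem directly).

```go
package main

import "fmt"

// launchCommand prefers a Docker-provided launcher if present, otherwise
// falls back to the venv interpreter with "-m sglang.launch_server".
// The exists probe is injected so the selection logic is testable.
func launchCommand(exists func(string) bool) (bin string, args []string) {
	const launcher = "/opt/sglang-env/sglang"
	const venvPython = "/opt/sglang-env/bin/python3"
	if exists(launcher) {
		return launcher, nil
	}
	return venvPython, []string{"-m", "sglang.launch_server"}
}

func main() {
	// Simulate the current image, where only the venv python exists.
	exists := func(p string) bool { return p == "/opt/sglang-env/bin/python3" }
	bin, args := launchCommand(exists)
	fmt.Println(bin, args)
}
```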
### Comment 2
<location> `pkg/inference/scheduling/loader_test.go:50-51` </location>
<code_context>
return m.usesExternalModelMgmt
}
+func (m *mockBackend) UsesTCP() bool {
+ return false
+}
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests that cover the new TCP vs Unix-socket behavior in the runner
Right now the only test usage of `Backend.UsesTCP()` is this mock always returning `false`, so the TCP branch in `runner.run` isn’t exercised. Please add tests (e.g. in `runner_test.go`) that:
- use a backend mock with `UsesTCP() == true` and assert the transport dials `tcp` to `127.0.0.1:<base+slot>`;
- use a backend mock with `UsesTCP() == false` and assert it still dials `unix` using `RunnerSocketPath`;
- cover an error from `RunnerSocketPath` when `UsesTCP()` is false.
This will validate the new connection logic for both TCP-based (SGLang) and existing backends.
Suggested implementation:
```golang
func (m *mockBackend) UsesTCP() bool {
return m.usesTCP
}
```
To fully implement your review comment you will also need to:
1. Update the `mockBackend` struct definition in this file to include a `usesTCP bool` field:
- This allows tests to construct backends with either `UsesTCP() == true` or `UsesTCP() == false`.
2. In `runner_test.go` (or the appropriate runner test file), add tests that:
- Create a backend mock with `usesTCP: true` and assert that the runner dials `tcp` with `127.0.0.1:<base+slot>`.
- Create a backend mock with `usesTCP: false` and assert that the runner dials `unix` using `RunnerSocketPath`.
- Use a backend mock with `usesTCP: false` and a `RunnerSocketPath` implementation that returns an error, and assert that the runner surfaces that error.
3. Ensure those tests construct `mockBackend` instances with the new `usesTCP` field set appropriately, and that any existing test helpers/constructors for `mockBackend` are updated to set `usesTCP` to `false` by default to preserve current behavior.
These changes, together with the edit above, will let the tests cover both TCP-based and Unix-socket-based behavior in the runner without altering production code paths.
</issue_to_address>
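The dial-address selection these tests would exercise can be sketched as a pure function. The `runnerAddress` name, the base-port arithmetic, and the hard-coded `127.0.0.1` follow the description above but are assumptions about the real `runner.go`, which also handles errors from `RunnerSocketPath`.

```go
package main

import "fmt"

// runnerAddress returns the network and address the runner should dial:
// TCP backends get 127.0.0.1:<base+slot>, everything else keeps the
// unix socket path.
func runnerAddress(usesTCP bool, basePort, slot int, socketPath string) (network, addr string) {
	if usesTCP {
		return "tcp", fmt.Sprintf("127.0.0.1:%d", basePort+slot)
	}
	return "unix", socketPath
}

func main() {
	fmt.Println(runnerAddress(true, 30000, 1, ""))
	fmt.Println(runnerAddress(false, 30000, 1, "/run/model-runner.sock"))
}
```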
### Comment 3
<location> `pkg/inference/backends/sglang/sglang_config_test.go:48-57` </location>
<code_context>
+ expected []string
+ expectError bool
+ }{
+ {
+ name: "empty safetensors path should error",
+ bundle: &mockModelBundle{
+ safetensorsPath: "",
+ },
+ mode: inference.BackendModeCompletion,
+ config: nil,
+ expected: nil,
+ expectError: true,
+ },
+ {
+ name: "basic args without context size",
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for invalid socket format to exercise the error path in GetArgs
Right now `TestGetArgs` only covers a valid socket (`"127.0.0.1:30000"`), but `GetArgs` returns an error when `net.SplitHostPort(socket)` fails. Please add a table entry with an invalid socket (e.g. `"127.0.0.1"` or `"30000"`) and wire it so that malformed input is passed into `GetArgs`, asserting `expectError: true` to cover this error path.
Suggested implementation:
```golang
{
name: "empty safetensors path should error",
bundle: &mockModelBundle{
safetensorsPath: "",
},
mode: inference.BackendModeCompletion,
config: nil,
expected: nil,
expectError: true,
},
{
name: "invalid socket format should error",
bundle: &mockModelBundle{
safetensorsPath: "/path/to/model/model.safetensors",
},
mode: inference.BackendModeCompletion,
config: &inference.BackendConfiguration{
// TODO: adjust field name to match the actual socket/host:port field used by GetArgs.
// This should be set to an invalid value (no port) so that net.SplitHostPort fails.
Socket: "127.0.0.1",
},
expected: nil,
expectError: true,
},
{
name: "basic args without context size",
```
1. Adjust the `Socket` field name in the `inference.BackendConfiguration` literal to match your actual configuration type (e.g. it might be `Socket`, `Address`, `ListenAddr`, `GRPCHostPort`, etc.), ensuring this is the field that `GetArgs` passes to `net.SplitHostPort`.
2. Confirm that `TestGetArgs` actually constructs and passes the `config` field from the table entry into `GetArgs`. If not, update the test loop to do so, so that the invalid socket value flows into `GetArgs`.
3. If `GetArgs` requires additional mandatory configuration fields, add them to this test case as needed so that the only failure cause is the malformed socket format.
</issue_to_address>
### Comment 4
<location> `pkg/inference/backends/sglang/sglang_config_test.go:239` </location>
<code_context>
+ }
+}
+
+func TestGetContextLength(t *testing.T) {
+ tests := []struct {
+ name string
</code_context>
<issue_to_address>
**suggestion (testing):** Consider basic integration-style tests for the sglang backend Install/Run flows
The configuration helpers (`GetArgs`, `GetContextLength`) are well covered, but the higher-level behavior in `sglang.go` (Docker vs host Python detection, `Status()` contents, and how `Run` chooses between the binary vs `python3 -m sglang.launch_server`) isn’t exercised. It would be valuable to add tests that, using fakes for `platform.SupportsSGLang`, filesystem checks, and `exec.Command`, verify that:
- `Install` sets `status` and errors correctly for unsupported platforms, missing `python3`, or missing the `sglang` package.
- `Run` fails clearly if no binary and no `pythonPath` are available, and that it passes `Args` and `socket` correctly to `backends.RunBackend`.
These can be shallow unit tests over the key branches rather than full end-to-end tests.
</issue_to_address>
### Comment 5
<location> `pkg/inference/backends/sglang/sglang.go:138` </location>
<code_context>
if err := exec.Command(pythonPath, "-c", "import sglang").Run(); err != nil {
</code_context>
<issue_to_address>
**security (go.lang.security.audit.dangerous-exec-command):** Detected non-static command inside Command. Audit the input to 'exec.Command'. If unverified user data can reach this call site, this is a code injection vulnerability. A malicious actor can inject a malicious script to execute arbitrary code.
*Source: opengrep*
</issue_to_address>
### Comment 6
<location> `pkg/inference/backends/sglang/sglang.go:144` </location>
<code_context>
output, err := exec.Command(pythonPath, "-c", "import sglang; print(sglang.__version__)").Output()
</code_context>
<issue_to_address>
**security (go.lang.security.audit.dangerous-exec-command):** Detected non-static command inside Command. Audit the input to 'exec.Command'. If unverified user data can reach this call site, this is a code injection vulnerability. A malicious actor can inject a malicious script to execute arbitrary code.
*Source: opengrep*
</issue_to_address>
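One way to satisfy these findings is to validate the interpreter path before it ever reaches `exec.Command`. The `validateInterpreter` helper and its directory allowlist below are an illustrative sketch, not part of the PR; the point is that only an absolute path to an existing regular file in a known location should be executed.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// validateInterpreter rejects interpreter paths that are relative, missing,
// not regular files, or outside an allowlist of directories, so that
// unverified data cannot steer exec.Command to an arbitrary binary.
func validateInterpreter(path string, allowedDirs []string) error {
	if !filepath.IsAbs(path) {
		return fmt.Errorf("interpreter path must be absolute: %q", path)
	}
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if !info.Mode().IsRegular() {
		return fmt.Errorf("interpreter is not a regular file: %q", path)
	}
	dir := filepath.Dir(path)
	for _, allowed := range allowedDirs {
		if dir == allowed {
			return nil
		}
	}
	return fmt.Errorf("interpreter outside allowed directories: %q", path)
}

func main() {
	err := validateInterpreter("python3", []string{"/usr/bin"})
	fmt.Println(err != nil) // true: relative path rejected
}
```

Note that `exec.Command` itself does not invoke a shell, so the `-c "import sglang"` arguments are not the injection surface; the variable `pythonPath` is.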
```go
func (s *sglang) initFromDocker() error {
	sglangBinaryPath := s.binaryPath()

	if _, err := os.Stat(sglangBinaryPath); err != nil {
		return err
	}

	versionBytes, err := os.ReadFile(sglangVersionFile)
	if err != nil {
		s.log.Warnf("could not get sglang version: %v", err)
```
```go
	s.pythonPath = pythonPath

	if err := exec.Command(pythonPath, "-c", "import sglang").Run(); err != nil {
```
```go
		return ErrSGLangNotFound
	}

	output, err := exec.Command(pythonPath, "-c", "import sglang; print(sglang.__version__)").Output()
```
Code Review
This pull request introduces sglang as a new inference backend, which is a great addition. The implementation includes creating a new Docker image variant, adding the necessary backend logic, and integrating it with the scheduler. A new UsesTCP method has been added to the Backend interface to support sglang's communication protocol, and this has been implemented across all existing backends.
My review focuses on a few key areas to improve the robustness and security of the new backend:
- Ensuring proper GPU support in the Dockerfile.
- Correctly handling unimplemented memory estimation to prevent scheduler issues.
- Enabling sandboxing for security.
- Expanding platform support to include macOS as intended.
Overall, the changes are well-structured. Addressing these points will make the sglang backend more robust and secure.
```go
	return inference.RequiredMemory{
		RAM:  1,
		VRAM: 1,
	}, nil
```
Returning a sentinel value of 1 for RAM and VRAM is dangerous as it signals to the scheduler that the model requires almost no memory. This can lead to the scheduler overallocating models, causing out-of-memory errors. Since memory estimation is not yet implemented for this backend, it's safer to return an error, similar to how the mlx backend handles it. This will prevent the scheduler from attempting to manage memory for this backend until the feature is implemented.
```go
return inference.RequiredMemory{}, ErrNotImplemented
```

```dockerfile
# Install uv and SGLang as modelrunner user
RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
    && ~/.local/bin/uv venv --python /usr/bin/python3 /opt/sglang-env \
    && ~/.local/bin/uv pip install --python /opt/sglang-env/bin/python "sglang==${SGLANG_VERSION}"
```
The current pip install command for sglang will only install the base package, which may not include dependencies for GPU acceleration (like CUDA). To ensure sglang can leverage NVIDIA GPUs, you should install it with the [all] extra, which includes all necessary dependencies for features like FlashAttention and CUDA.
```dockerfile
&& ~/.local/bin/uv pip install --python /opt/sglang-env/bin/python "sglang[all]==${SGLANG_VERSION}"
```
```go
		return fmt.Errorf("sglang: no docker binary at %s and no python runtime configured; did you forget to call Install?", binaryPath)
	}
	binaryPath = s.pythonPath
	sandboxPath = ""
```
By setting `sandboxPath = ""`, you are disabling the sandboxing mechanism for the sglang backend when it runs from a Python installation (which is the case even inside the Docker container). Running model inference without a sandbox can pose a security risk, as it allows the model code broader access to the system. To maintain proper isolation, the sandbox should remain enabled. The `sglangDir` (`/opt/sglang-env`) can be used as the sandbox root.
ericcurtin left a comment
LGTM. Have a look through the bot comments to see if they make sense. One Nvidia/x86_64 test could be helpful if somebody would like to give @denyszhak a hand
I can test it on Linux with NVIDIA GPU.
@denyszhak try and make the sourcery build green if you can; it has a certain tolerance threshold and you have not met it. Consider what it's saying and try to fix a few. If you feel sourcery is ultimately being harsh after considering all its comments, let us know.
Hey @denyszhak, I gave this PR a go and I spotted a few things missing:
With my changes, you can now test it (even the CPU-only variant) by using:
or
I notice it doesn't work, though. I get the same error for both CPU and CUDA variants. I hope this helps. Thanks! 🙌
Thank you! I will go over this soon.
@denyszhak I checked out doringeman's branch (feature/add-sglang-backend) and can confirm that the docker image builds successfully, the SGLang backend installs correctly, and the server starts fine; I did this on an H200 cluster where I cloned @doringeman's branch and ran the server in one terminal. However, I tried running a model with the following command in a separate terminal with the same model, but got the following message:
Seems like the SGLang backend only supports safetensors-formatted models and the test model is in GGUF format (for llama) - is there a compatible model that I should use for testing?
@dougyster welcome! The DMR way would be to use the models with the -vllm suffix from here: https://hub.docker.com/u/ai?page=1&search=-vllm - they are the safetensors ones.
Awesome! Will try this out and thanks for the warm welcome :)
When this is tested end to end, if you have the sglang backend selected, it would be just: and you should have a chatbot that can talk to sglang (in this codebase, the daemon and client are called model-runner and model-cli, so we can often run this for testing: ./model-cli run ai/smollm2-vllm)
Change:
Adds SGLang as a new inference backend.
Testing:
Closes #431