
Commit aff56ac

copy over
Signed-off-by: Brian Yu <[email protected]>
1 parent f67fa48 commit aff56ac


8 files changed: +608, -2 lines changed


docs/about/concepts/key-terminology.md

Lines changed: 2 additions & 2 deletions
@@ -85,8 +85,8 @@ Online vs Offline Training
 Multi-turn
 Conversations spanning multiple exchanges where context and state persist across turns.

-Multi-step
-Complex tasks requiring models to break problems into sequential steps, often using tools and intermediate reasoning.
+Multi-step
+Complex tasks requiring agents to break problems into sequential steps, often using tools and intermediate reasoning.

 Tool Use / Function Calling
 Models invoking external capabilities (APIs, calculators, databases) to accomplish tasks beyond text generation.

docs/index.md

Lines changed: 10 additions & 0 deletions
@@ -171,6 +171,16 @@ how-to-faq.md
 reference/cli-commands.md
 ```

+```{toctree}
+:caption: Training
+:hidden:
+:maxdepth: 1
+
+training/index
+training/rl-framework-integration/index.md
+```
+
 ```{toctree}
 :caption: Reference
 :hidden:

docs/training/index.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
(training-index)=

# Training with NeMo Gym

Conceptual guides for training with NeMo Gym.

---

::::{grid} 1 1 1 1
:gutter: 1 1 1 2

:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` Integrate Gym into RL frameworks
:link: training-framework-integration
:link-type: ref
Implement NeMo Gym integration into a new training framework.
+++
{bdg-primary}`training` {bdg-secondary}`infra`
:::

::::

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
(generation-backend-and-openai-compatible-http-server)=

# Generation Backend

Gym requires an OpenAI-compatible HTTP server to handle model generations during training. This page covers the server requirements and existing implementations across popular RL frameworks.

## OpenAI-Compatible Server Requirements

Gym communicates with generation backends using the OpenAI HTTP API specification. Your generation server must implement endpoints compatible with one of these reference implementations:

```{list-table}
:header-rows: 1
:widths: 30 70

* - Provider
  - Documentation
* - OpenAI API
  - [Responses API Reference](https://platform.openai.com/docs/api-reference/responses/create)
* - Gemini
  - [OpenAI Compatibility](https://ai.google.dev/gemini-api/docs/openai)
* - vLLM
  - [OpenAI-Compatible Server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server/)
* - SGLang
  - [OpenAI-Compatible APIs](https://docs.sglang.io/basic_usage/openai_api.html)
* - TGI
  - [OpenAI Messages API](https://huggingface.co/docs/text-generation-inference/en/reference/api_reference#openai-messages-api)
```
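
To sanity-check that a server speaks this dialect, you can point the standard `openai` Python client at it. The snippet below is a minimal sketch, assuming a locally served vLLM or SGLang endpoint at `http://localhost:8000/v1`; the API key and model name are placeholders.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local generation server.
# The base URL, API key, and model name are placeholders for illustration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-4B-Instruct-2507",  # whatever model the server is serving
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```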

## Generation in RL Training

Most RL frameworks that support policy optimization algorithms (PPO, GRPO) require online on-policy model generations. Integrating generation backends into the RL training loop introduces several challenges:

- **Refit**: Synchronizing model weights between training and generation
- **Off-policyness**: Ensuring generations reflect the current policy state
- **Latency**: Minimizing generation overhead during training iterations

## Existing Framework Implementations

The following table shows how popular RL frameworks implement generation backends.

:::{tip}
If your framework uses vLLM or SGLang, you can reference these implementations when adding OpenAI HTTP server support.
:::

```{list-table}
:header-rows: 1
:widths: 25 25 50

* - Framework
  - Generation Backend
  - Reference Implementation
* - NeMo RL
  - vLLM
  - [vllm_generation.py](https://github.com/NVIDIA-NeMo/RL/blob/a99bc262e5cde92575538c31ccacde27c60c3681/nemo_rl/models/generation/vllm/vllm_generation.py)
* - VeRL
  - HF, vLLM, SGLang
  - [hf_rollout.py](https://github.com/volcengine/verl/blob/fd893c788dbdb967c6eb62845b09a02e38819ac1/verl/workers/rollout/hf_rollout.py), [vLLM rollout](https://github.com/volcengine/verl/tree/fd893c788dbdb967c6eb62845b09a02e38819ac1/verl/workers/rollout/vllm_rollout), [SGLang rollout](https://github.com/volcengine/verl/tree/fd893c788dbdb967c6eb62845b09a02e38819ac1/verl/workers/rollout/sglang_rollout)
* - TRL
  - vLLM, HF
  - [grpo_trainer.py (vLLM)](https://github.com/huggingface/trl/blob/cbd90d4297a877587a07bdcd82f8fc87338efe5b/trl/trainer/grpo_trainer.py#L557), [grpo_trainer.py (HF)](https://github.com/huggingface/trl/blob/cbd90d4297a877587a07bdcd82f8fc87338efe5b/trl/trainer/grpo_trainer.py#L661)
* - Slime
  - SGLang
  - [sglang_engine.py](https://github.com/THUDM/slime/blob/0612652a8e6ed7fd670ecc29101d4ca877490bf6/slime/backends/sglang_utils/sglang_engine.py#L87)
* - OpenPIPE ART
  - vLLM
  - [vLLM module](https://github.com/OpenPipe/ART/tree/6273a6fa5457e87e696b1c3a5820292826684370/src/art/vllm)
```

NeMo RL, VeRL, Slime, and OpenPIPE ART all expose OpenAI-compatible HTTP server endpoints.
## Integration Guidelines
72+
73+
### Frameworks Using vLLM or SGLang
74+
75+
If your training framework already uses vLLM or SGLang but does not expose an OpenAI-compatible HTTP server:
76+
77+
1. Reference the implementations listed above
78+
2. Add server endpoints that follow the OpenAI API specification
79+
3. Test your implementation using the [vLLM HTTP server tests from NeMo RL](https://github.com/NVIDIA-NeMo/RL/blob/a99bc262e5cde92575538c31ccacde27c60c3681/tests/unit/models/generation/test_vllm_generation.py#L1079-L1247)
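
Before running the NeMo RL test suite, a quick smoke test over plain HTTP can catch basic wiring issues. This sketch assumes the server is reachable at `http://localhost:8000` and exposes the standard `/v1/models` and `/v1/chat/completions` routes that vLLM and SGLang provide.

```python
import requests

BASE = "http://localhost:8000"  # placeholder; point at your generation server

# The server should list at least one served model.
models = requests.get(f"{BASE}/v1/models", timeout=10).json()
assert models["data"], "no models reported by /v1/models"
model_id = models["data"][0]["id"]

# A minimal chat completion should return a choice with message content.
payload = {
    "model": model_id,
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 16,
}
reply = requests.post(f"{BASE}/v1/chat/completions", json=payload, timeout=60).json()
assert reply["choices"][0]["message"]["content"]
print("OpenAI-compatible endpoints look healthy")
```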

### Frameworks Using Other Backends

If your training framework does not use vLLM or SGLang as a generation backend, you may need significant refactoring to achieve proper Gym integration. Consider:

- Migrating to vLLM or SGLang for generation
- Implementing an adapter layer that exposes OpenAI-compatible endpoints (see the sketch below)
- Evaluating the complexity of maintaining a custom generation backend
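
The adapter-layer option can be fairly small if your backend already exposes a callable generate function. The sketch below is illustrative only: `my_backend.generate(...)` is a hypothetical entry point, FastAPI is one possible HTTP layer, and a production adapter would also need sampling parameters, token IDs, and streaming to be faithful to the specification.

```python
# Hypothetical adapter exposing a custom backend as /v1/chat/completions.
import time
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

import my_backend  # assumption: your framework's existing generation entry point

app = FastAPI()


class Message(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int = 256
    temperature: float = 1.0


@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    # Delegate to the existing backend; prompt templating is elided here.
    text = my_backend.generate(
        [m.model_dump() for m in req.messages],
        max_tokens=req.max_tokens,
        temperature=req.temperature,
    )
    # Return the minimal OpenAI-shaped response a client expects.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```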

## Related Topics

After setting up your generation backend, proceed to:

- {doc}`openai-compatible-http-server-on-policy-correction` - Required fixes for multi-step and multi-turn scenarios
- {doc}`gym-integration-footprint-and-form-factor` - Full integration component breakdown

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
(gym-integration-footprint-and-form-factor)=

# Integration Footprint

This page provides a reference for the components required to integrate Gym into your training framework. Each component includes links to the NeMo RL reference implementation and corresponding tests.

## Integration Components

A complete Gym integration consists of five components, implemented in sequence:

```{list-table}
:header-rows: 1
:widths: 5 25 35 35

* -
  - Component
  - Implementation
  - Tests
* - 1
  - **OpenAI-Compatible HTTP Server**
  - [vllm_worker_async.py:264](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/nemo_rl/models/generation/vllm/vllm_worker_async.py#L264)
  - [test_vllm_generation.py:1107](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/tests/unit/models/generation/test_vllm_generation.py#L1107)
* - 2
  - **On-Policy Token ID Fixes**
  - [vllm_worker_async.py:40](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/nemo_rl/models/generation/vllm/vllm_worker_async.py#L40)
  - [test_vllm_generation.py:1250](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/tests/unit/models/generation/test_vllm_generation.py#L1250)
* - 3
  - **Gym Spinup and Integration**
  - [nemo_gym.py](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/nemo_rl/environments/nemo_gym.py)
  - [test_nemo_gym.py](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/tests/unit/environments/test_nemo_gym.py)
* - 4
  - **Rollout Orchestration**
  - [rollouts.py:975](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/nemo_rl/experience/rollouts.py#L975)
  - [test_rollouts.py:754](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/tests/unit/experience/test_rollouts.py#L754)
* - 5
  - **GRPO Train Loop Integration**
  - [grpo.py:1157](https://github.com/NVIDIA-NeMo/RL/blob/64ab08df3edf25131959fc474b44ed5e36a1600b/nemo_rl/algorithms/grpo.py#L1157)
  - End-to-end tests in progress
```

:::{note}
As of December 8, 2025, end-to-end tests for GRPO train loop integration are still being implemented in the NeMo RL repository.
:::

## Component Details

### 1. OpenAI-Compatible HTTP Server

**Purpose**: Expose your generation backend as an OpenAI-compatible endpoint.

**Prerequisites**: vLLM or SGLang generation backend.

**Reference**: Refer to {doc}`generation-backend-and-openai-compatible-http-server` for implementation guidance.

### 2. On-Policy Token ID Fixes

**Purpose**: Prevent train-generation mismatch in multi-step and multi-turn scenarios.

**Prerequisites**: OpenAI-compatible HTTP server.

**Reference**: Refer to {doc}`openai-compatible-http-server-on-policy-correction` for technical details.
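
To make the mismatch concrete: in a multi-turn or multi-step rollout, the next prompt embeds the assistant's previous reply, and re-tokenizing that text can produce different token IDs than the ones the policy actually sampled. The sketch below is a conceptual illustration only; the `response` object and its fields are hypothetical, and the real fix lives in the `vllm_worker_async.py` reference above.

```python
# Conceptual sketch: prefer the sampled token IDs over re-tokenized text
# when building the training sequence for the next turn.

def extend_training_sequence(history_ids, response):
    """Append one assistant turn to the token-ID history.

    `response` is a hypothetical object holding both the generated text
    and the token IDs the generation server actually sampled.
    """
    # Risky: re-tokenizing the text can silently diverge from what was sampled,
    # because the same string may merge or split into different tokens.
    # new_ids = tokenizer.encode(response.text, add_special_tokens=False)

    # Safe: carry forward the exact sampled IDs so the log-probs computed at
    # training time correspond to the tokens the policy actually generated.
    new_ids = response.sampled_token_ids
    return history_ids + list(new_ids)
```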

### 3. Gym Spinup and Integration

**Purpose**: Initialize and connect to Gym training environments.

**Key responsibilities**:

- Environment configuration loading
- Connection management
- State synchronization

### 4. Rollout Orchestration

**Purpose**: Coordinate rollout collection between the policy and Gym environments.

**Key responsibilities**:

- Batch rollout management
- Multi-step and multi-turn handling
- Token ID tracking for on-policy corrections
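
In shape, the orchestration is an agent loop per rollout. The outline below is a simplified, hypothetical sketch: the `env.reset()`/`env.step()` interface and the OpenAI-style client are stand-ins rather than Gym's or NeMo RL's actual APIs, and it only shows where multi-step handling and token-ID tracking fit.

```python
# Simplified, hypothetical rollout loop for a single episode.
def collect_rollout(client, env, model_id, max_steps=8):
    messages = env.reset()          # initial chat messages from the environment
    trajectory, total_reward = [], 0.0
    for _ in range(max_steps):
        completion = client.chat.completions.create(
            model=model_id,
            messages=messages,
            max_tokens=512,
        )
        assistant = completion.choices[0].message
        # Record what was generated; with the on-policy fixes in place, the
        # sampled token IDs travel alongside the text for training.
        trajectory.append({"prompt": list(messages), "assistant": assistant})

        # The environment handles tool execution and next-turn construction.
        messages, reward, done = env.step(assistant)
        total_reward += reward
        if done:
            break
    return trajectory, total_reward
```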

### 5. GRPO Train Loop Integration

**Purpose**: Integrate Gym rollouts into the policy optimization training loop.

**Key responsibilities**:

- Rollout scheduling within training iterations
- Loss calculation with Gym-generated experiences
- Weight synchronization between training and generation
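
At a high level, the train loop alternates between collecting Gym rollouts through the generation server, updating the policy, and refitting the server with the new weights. The skeleton below is a hedged outline under those assumptions; `collect_rollouts`, `compute_grpo_loss`, and `refit_generation_server` are placeholder names, not NeMo RL APIs (see the `grpo.py` reference above for the real implementation).

```python
# Hypothetical skeleton of a GRPO-style loop driven by Gym rollouts.
for step in range(num_training_steps):
    # 1. Rollout scheduling: gather a batch of grouped rollouts from Gym
    #    environments via the OpenAI-compatible generation server.
    groups = collect_rollouts(gym_envs, generation_client, prompts_per_step)

    # 2. Loss calculation: group-relative advantages over each prompt's
    #    rollouts, then the policy loss on the sampled token IDs.
    loss = compute_grpo_loss(policy, groups)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # 3. Refit: push updated weights to the generation backend so the next
    #    batch of rollouts stays on-policy.
    refit_generation_server(policy, generation_client)
```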

## Implementation Checklist

Use this checklist to track your integration progress:

- [ ] OpenAI-compatible HTTP server implemented and tested
- [ ] On-policy token ID fixes implemented and tested
- [ ] Gym spinup and environment connection working
- [ ] Rollout orchestration handling multi-step/multi-turn scenarios
- [ ] GRPO (or equivalent) train loop integration complete

## Related Topics

- {doc}`gym-rl-framework-integration-success-criteria` - Validate your integration
- {doc}`generation-backend-and-openai-compatible-http-server` - Generation backend setup

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
(gym-rl-framework-integration-success-criteria)=

# Success Criteria

Use these criteria to validate that your Gym integration is working correctly. A successful integration must pass all validation benchmarks.

:::{tip}
These success criteria may evolve as new integration challenges are discovered. Check this page for updates when troubleshooting integration issues.
:::

## Validation Checklist

### 1. Component Form Factor

Verify that your integration implements all required components as specified in {doc}`gym-integration-footprint-and-form-factor`:

- [ ] OpenAI-compatible HTTP server
- [ ] On-policy token ID fixes
- [ ] Gym spinup and integration
- [ ] Rollout orchestration
- [ ] Training loop integration

### 2. Environment Configuration

Verify that your integration can load and run arbitrary Gym training environments through configuration:

- [ ] Environment configuration loads from YAML
- [ ] Multiple environments can be selected at runtime
- [ ] Environment parameters are configurable without code changes
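
One way to check the last point is that switching environments amounts to swapping a YAML path on the command line rather than editing framework code. The snippet below is a generic, hypothetical illustration; the `--env-config` flag and the way the parsed config is consumed are placeholders, not Gym's actual schema (the DAPO17k YAML linked in the next section is an example of a real environment config).

```python
# Hypothetical illustration: environment choice driven purely by configuration.
import argparse

import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--env-config", required=True, help="path to a Gym environment YAML")
args = parser.parse_args()

with open(args.env_config) as f:
    env_cfg = yaml.safe_load(f)

# The trainer only sees a parsed dict; no environment-specific code paths here.
print(f"Launching training with environment config: {args.env_config}")
print(f"Top-level keys: {sorted(env_cfg)}")
```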

### 3. Math Reasoning Benchmark

Train on the DAPO17k math training environment and verify model improvement on AIME24.

```{list-table}
:header-rows: 1
:widths: 25 75

* - Parameter
  - Value
* - Training environment
  - [DAPO17k math environment](https://github.com/NVIDIA-NeMo/Gym/blob/299e8c04f4a3bbf0f6069139092225f2fe3aa70f/resources_servers/math_with_judge/configs/bytedtsinghua_dapo17k.yaml)
* - Base model
  - [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
* - Minimum training steps
  - 1,000
* - Validation set
  - AIME24 (included with training environment)
* - Target accuracy
  - ≥85%
```

### 4. Workplace Assistant Benchmark

Train on the workplace assistant environment and verify validation set improvements.

```{list-table}
:header-rows: 1
:widths: 25 75

* - Parameter
  - Value
* - Training environment
  - [Workplace assistant environment](https://github.com/NVIDIA-NeMo/Gym/tree/299e8c04f4a3bbf0f6069139092225f2fe3aa70f/resources_servers/workplace_assistant)
* - Base model
  - [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
* - Minimum training steps
  - 100
* - Success criterion
  - Observable validation set improvement
```

## Troubleshooting

If your integration fails to meet the success criteria:

1. **Training crashes**: Check for off-policy issues. Refer to {doc}`openai-compatible-http-server-on-policy-correction`
2. **No improvement**: Verify rollout orchestration is correctly tracking token IDs
3. **Environment errors**: Verify OpenAI-compatible HTTP server endpoints match the specification

## Related Topics

- {doc}`gym-integration-footprint-and-form-factor` - Required integration components
- {doc}`openai-compatible-http-server-on-policy-correction` - On-policy training fixes
