We are using the Qwen3-32B model for RL training. The model natively supports a 32K context length, and its window can be extended to 64K or even 128K via the YaRN method.
During the preceding SFT stage we already enabled YaRN for long-context training.
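To illustrate the relationship between the native window and the extended one, here is a minimal sketch assuming the common convention that the YaRN `factor` is the ratio of the target context length to the native window (the helper name `yarn_factor` is ours, not part of ROLL or Qwen):

```python
def yarn_factor(target_len: int, native_len: int = 32768) -> float:
    """Scaling factor needed to stretch the native 32K window to target_len.

    Assumes factor = target / native, never below 1.0 (no scaling needed
    when the target fits in the native window).
    """
    return max(1.0, target_len / native_len)

print(yarn_factor(65536))   # 64K target
print(yarn_factor(131072))  # 128K target
```

With this convention, 64K corresponds to a factor of 2.0 and 128K to 4.0 for Qwen3-32B's 32K native window.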
We hope ROLL can support rope_scaling configuration, specifically including:
Training stage (Trainer): support passing a rope_scaling configuration when loading the model
Inference stage (Rollout, vLLM/SGLang): apply the matching RoPE scaling configuration during generation
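For concreteness, a sketch of what the two sides might receive, assuming the Hugging Face style `rope_scaling` dict that Qwen3 model cards describe; the actual ROLL config keys for wiring this through are not defined here, and the model-loading call is shown only as a comment:

```python
import json

# YaRN config in the HF transformers convention (assumption: ROLL would
# forward this dict unchanged to the trainer and the rollout engine).
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                              # 32K * 4 = 128K effective context
    "original_max_position_embeddings": 32768,  # Qwen3-32B native window
}

# Trainer side (transformers), sketched:
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B",
#                                              rope_scaling=rope_scaling)

# Rollout side: vLLM accepts the same dict as a JSON string via --rope-scaling.
vllm_args = [
    "--rope-scaling", json.dumps(rope_scaling),
    "--max-model-len", "131072",
]
print(vllm_args)
```

The key point of the request is that both stages consume the same `rope_scaling` dict, so training and rollout stay consistent with the YaRN setup used during SFT.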