We are using the Qwen3-32B model for RL training. The model natively supports a 32K context length, and its window can be extended to 64K or even 128K via the YaRN method.
During the preceding SFT stage we already enabled YaRN for long-context training.
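To illustrate the relationship between the native window and the extended one, here is a minimal sketch assuming the common convention that the YaRN `factor` is the ratio of the target context length to the native window (the helper name `yarn_factor` is ours, not part of ROLL or Qwen):

```python
def yarn_factor(target_len: int, native_len: int = 32768) -> float:
    """Scaling factor needed to stretch the native 32K window to target_len.

    Assumes factor = target / native, never below 1.0 (no scaling needed
    when the target fits in the native window).
    """
    return max(1.0, target_len / native_len)

print(yarn_factor(65536))   # 64K target
print(yarn_factor(131072))  # 128K target
```

With this convention, 64K corresponds to a factor of 2.0 and 128K to 4.0 for Qwen3-32B's 32K native window.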
We hope ROLL can support rope_scaling configuration, specifically including:
Training stage (Trainer): support passing a rope_scaling configuration when loading the model
Inference stage (Rollout, vLLM/SGLang): apply the matching RoPE scaling configuration during generation
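For concreteness, a sketch of what the two sides might receive, assuming the Hugging Face style `rope_scaling` dict that Qwen3 model cards describe; the actual ROLL config keys for wiring this through are not defined here, and the model-loading call is shown only as a comment:

```python
import json

# YaRN config in the HF transformers convention (assumption: ROLL would
# forward this dict unchanged to the trainer and the rollout engine).
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                              # 32K * 4 = 128K effective context
    "original_max_position_embeddings": 32768,  # Qwen3-32B native window
}

# Trainer side (transformers), sketched:
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B",
#                                              rope_scaling=rope_scaling)

# Rollout side: vLLM accepts the same dict as a JSON string via --rope-scaling.
vllm_args = [
    "--rope-scaling", json.dumps(rope_scaling),
    "--max-model-len", "131072",
]
print(vllm_args)
```

The key point of the request is that both stages consume the same `rope_scaling` dict, so training and rollout stay consistent with the YaRN setup used during SFT.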