Labels: type:bug (Something isn't working)
Description
Expected Behavior
Evaluation should work consistently regardless of the value of TRAIN_MICRO_BATCH_SIZE.
Actual Behavior
When TRAIN_MICRO_BATCH_SIZE = 1 and the internal GRPO micro-batch size becomes smaller than num_generations, GRPO grouping breaks silently.
This causes:
- Pre-training evaluation: 0% accuracy
- Post-training evaluation: all metrics at 0% (format_accuracy = 0%, partial_accuracy = 0%, accuracy = 0%)
This occurs even though the model generates valid CURE-style completions. The same code works correctly whenever the effective micro-batch size is >= num_generations, even if TRAIN_MICRO_BATCH_SIZE itself is 1.
Root cause: The issue is not "micro-batch size = 1", but that the effective batch slicing becomes smaller than num_generations, causing GRPO to reshape rewards incorrectly.
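The grouping arithmetic behind this can be illustrated with a minimal NumPy sketch. This is an illustration only, not Tunix's actual implementation; the function name group_relative_advantages and the 1e-6 epsilon are assumptions:

```python
import numpy as np

def group_relative_advantages(rewards, num_generations):
    # GRPO-style grouping: completions for the same prompt are assumed to be
    # contiguous, num_generations per prompt.
    groups = rewards.reshape(-1, num_generations)   # (num_prompts, num_generations)
    mean = groups.mean(axis=1, keepdims=True)
    std = groups.std(axis=1, keepdims=True) + 1e-6  # avoid division by zero
    return ((groups - mean) / std).reshape(-1)

# A full slice groups correctly: two prompts, two generations each.
full = group_relative_advantages(np.array([1.0, 0.0, 0.5, 0.5]), num_generations=2)
# Advantages are relative within each prompt's group of generations.

# A micro-batch slice of size 1 cannot form a group of num_generations=2:
# reshape(-1, 2) on a single reward raises, and any fallback that shrinks the
# group size to 1 normalizes every advantage to exactly 0 (a silent no-op update).
try:
    group_relative_advantages(np.array([1.0]), num_generations=2)
except ValueError:
    pass  # the slice is smaller than one reward group
```

This matches the reported symptom: training "succeeds" without errors while the learning signal degrades, and the problem only surfaces at evaluation time.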
Steps to Reproduce
1. Run Tunix GRPO training with:
   TRAIN_MICRO_BATCH_SIZE = 1
   NUM_GENERATIONS = 2  # (default)
2. Train normally (training runs without any errors).
3. Run evaluation:
   evaluate(test_dataset, sampler, **GENERATION_CONFIGS["greedy"])
4. Observe that all evaluation metrics are:
   accuracy = 0%
   partial_accuracy = 0%
   format_accuracy = 0%
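One way to surface this misconfiguration before training starts would be a guard on the effective slice size. This is a hypothetical helper, not part of Tunix's API; the name and error message are assumptions:

```python
def check_grpo_batching(effective_batch_size, num_generations):
    # Hypothetical sanity check: each micro-batch slice must contain whole
    # reward groups, i.e. a multiple of num_generations completions.
    if effective_batch_size % num_generations != 0:
        raise ValueError(
            f"effective batch size {effective_batch_size} is not a multiple of "
            f"num_generations={num_generations}; GRPO reward groups would be "
            "split across slices and advantages computed over incomplete groups."
        )

check_grpo_batching(4, 2)  # fine: two complete groups per slice
# check_grpo_batching(1, 2) would raise, flagging the configuration above
```

A loud failure here would be preferable to the current silent 0%-accuracy behavior.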
Environment
- OS: Kaggle TPU environment (Debian-based)
- Tunix: google-tunix[prod]==0.1.3
- JAX: Kaggle TPU default
- TPU: v3-8
- Python: Kaggle default kernel
- Model: Gemma 3 1B IT
- Training: GRPO + LoRA + QWIX
- Notebook Environment: Kaggle Notebooks
Checklist
- I have searched the existing issues for a similar bug report.
- I have provided all the required information in the "Environment" section.
- I have provided a minimal, reproducible example.
Would you like to help us fix it?
Yes, I can provide:
- A minimal reproducible Kaggle Notebook
- Logs and evaluation outputs
- Sample completions showing correct CURE structure
- Any additional debugging information needed
