GRPO Evaluation Fails When TRAIN_MICRO_BATCH_SIZE < num_generations #816

@aaghaazkhan

Description

Expected Behavior

Evaluation should work consistently regardless of the value of TRAIN_MICRO_BATCH_SIZE.

Actual Behavior

When TRAIN_MICRO_BATCH_SIZE = 1 and the internal GRPO micro-batch size becomes smaller than num_generations, GRPO grouping breaks silently.

This causes:

  • Pre-training evaluation: 0% accuracy
  • Post-training evaluation: all metrics 0%
    • format_accuracy = 0%
    • partial_accuracy = 0%
    • accuracy = 0%

This occurs even though the model generates valid CURE-style completions. The same code works correctly when the effective micro-batch size is ≥ num_generations, even if TRAIN_MICRO_BATCH_SIZE itself is 1.

Root cause: The issue is not "micro-batch size = 1" as such, but that the effective batch slice becomes smaller than num_generations, which causes GRPO to reshape rewards incorrectly.
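To illustrate the grouping problem described above, here is a minimal pure-Python sketch of group-relative reward handling. It is not Tunix's actual implementation: `group_rewards` and `group_relative_advantages` are hypothetical helpers, and the mean/std normalization is the commonly described GRPO formulation, assumed here for illustration. The point is that a flat reward slice can only be split into per-prompt groups when its length is a multiple of num_generations, so a slice of size 1 with num_generations = 2 has no valid grouping.

```python
from statistics import mean, pstdev


def group_rewards(rewards, num_generations):
    """Split a flat reward list into per-prompt groups of num_generations.

    Hypothetical helper: raises instead of silently mis-grouping when the
    slice length is not a multiple of num_generations.
    """
    if len(rewards) % num_generations != 0:
        raise ValueError(
            f"slice of {len(rewards)} rewards cannot be split into "
            f"groups of {num_generations}"
        )
    return [
        rewards[i:i + num_generations]
        for i in range(0, len(rewards), num_generations)
    ]


def group_relative_advantages(rewards, num_generations, eps=1e-6):
    """Normalize each reward against the mean/std of its own group."""
    advantages = []
    for group in group_rewards(rewards, num_generations):
        mu = mean(group)
        sigma = pstdev(group)  # population std within the group
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Two prompts, two generations each: grouping is well defined.
ok = group_relative_advantages([1.0, 0.0, 0.5, 0.5], num_generations=2)

# A slice smaller than num_generations cannot be grouped at all.
try:
    group_relative_advantages([1.0], num_generations=2)
    slice_error = None
except ValueError as err:
    slice_error = str(err)
```

An explicit check like this would turn the silent failure into an immediate error at the point where the slice is formed, assuming the mismatch does originate in the reward reshape.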

Steps to Reproduce

  1. Run Tunix GRPO training with:

     TRAIN_MICRO_BATCH_SIZE = 1
     NUM_GENERATIONS = 2   # (default)

  2. Train normally (training runs without any errors).

  3. Run evaluation:

     evaluate(test_dataset, sampler, **GENERATION_CONFIGS["greedy"])

  4. Observe that all evaluation metrics are:

     accuracy = 0%
     partial_accuracy = 0%
     format_accuracy = 0%
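Because the failure is silent, a small guard around the evaluation result can make it visible while reproducing. The sketch below is an assumption, not part of Tunix or the notebook: `check_metrics_not_all_zero` is a hypothetical helper, and it assumes the metrics are available as a dict of metric name to percentage (the actual return format of `evaluate` is not shown in this report).

```python
def check_metrics_not_all_zero(metrics):
    """Raise if every reported metric is exactly zero.

    Hypothetical helper; `metrics` is assumed to be a dict such as
    {"accuracy": 0.0, "partial_accuracy": 0.0, "format_accuracy": 0.0}.
    """
    if metrics and all(value == 0 for value in metrics.values()):
        names = ", ".join(sorted(metrics))
        raise RuntimeError(
            f"all evaluation metrics are 0% ({names}); "
            "check the micro-batch size against num_generations"
        )
    return metrics


# Healthy run: metrics pass through unchanged.
healthy = check_metrics_not_all_zero(
    {"accuracy": 12.5, "partial_accuracy": 30.0, "format_accuracy": 80.0}
)

# Degenerate run matching the symptoms in this report.
try:
    check_metrics_not_all_zero(
        {"accuracy": 0.0, "partial_accuracy": 0.0, "format_accuracy": 0.0}
    )
    degenerate_error = None
except RuntimeError as err:
    degenerate_error = str(err)
```

All-zero metrics can also be legitimate for an untrained model on a hard task, so a check like this is only a debugging aid for this specific symptom.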

Environment

  • OS: Kaggle TPU environment (Debian-based)
  • Project Version:
    • Tunix: google-tunix[prod]==0.1.3
    • JAX: Kaggle TPU default
  • TPU: v3-8
  • Python: Kaggle default kernel
  • Model: Gemma 3 1B IT
  • Training: GRPO + LoRA + QWIX
  • Notebook Environment: Kaggle Notebooks

Checklist

  • I have searched the existing issues for a similar bug report.
  • I have provided all the required information in the "Environment" section.
  • I have provided a minimal, reproducible example.

Would you like to help us fix it?

Yes, I can provide:

  • A minimal reproducible Kaggle Notebook
  • Logs and evaluation outputs
  • Sample completions showing correct CURE structure
  • Any additional debugging information needed

Labels: type:bug (Something isn't working)
