Labels: type:bug (Something isn't working)
Description
Expected Behavior
Evaluation should work consistently regardless of the value of TRAIN_MICRO_BATCH_SIZE.
Actual Behavior
When TRAIN_MICRO_BATCH_SIZE = 1 and the internal GRPO micro-batch size becomes smaller than num_generations, GRPO grouping breaks silently.
This causes:
- Pre-training evaluation: 0% accuracy
- Post-training evaluation: all metrics at 0% (format_accuracy = 0%, partial_accuracy = 0%, accuracy = 0%)
This occurs even though the model generates valid CURE-style completions. The same code works correctly whenever the effective micro-batch size is >= num_generations, even if TRAIN_MICRO_BATCH_SIZE itself is 1.
Root cause: The issue is not "micro-batch size = 1", but that the effective batch slicing becomes smaller than num_generations, causing GRPO to reshape rewards incorrectly.
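The grouping arithmetic behind this can be illustrated with a minimal NumPy sketch. This is an illustration only, not Tunix's actual implementation; the function name group_relative_advantages and the 1e-6 epsilon are assumptions:

```python
import numpy as np

def group_relative_advantages(rewards, num_generations):
    # GRPO-style grouping: completions for the same prompt are assumed to be
    # contiguous, num_generations per prompt.
    groups = rewards.reshape(-1, num_generations)   # (num_prompts, num_generations)
    mean = groups.mean(axis=1, keepdims=True)
    std = groups.std(axis=1, keepdims=True) + 1e-6  # avoid division by zero
    return ((groups - mean) / std).reshape(-1)

# A full slice groups correctly: two prompts, two generations each.
full = group_relative_advantages(np.array([1.0, 0.0, 0.5, 0.5]), num_generations=2)
# Advantages are relative within each prompt's group of generations.

# A micro-batch slice of size 1 cannot form a group of num_generations=2:
# reshape(-1, 2) on a single reward raises, and any fallback that shrinks the
# group size to 1 normalizes every advantage to exactly 0 (a silent no-op update).
try:
    group_relative_advantages(np.array([1.0]), num_generations=2)
except ValueError:
    pass  # the slice is smaller than one reward group
```

This matches the reported symptom: training "succeeds" without errors while the learning signal degrades, and the problem only surfaces at evaluation time.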
Steps to Reproduce
1. Run Tunix GRPO training with:
   TRAIN_MICRO_BATCH_SIZE = 1
   NUM_GENERATIONS = 2  # (default)
2. Train normally (training runs without any errors).
3. Run evaluation:
   evaluate(test_dataset, sampler, **GENERATION_CONFIGS["greedy"])
4. Observe that all evaluation metrics are:
   accuracy = 0%
   partial_accuracy = 0%
   format_accuracy = 0%
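One way to surface this misconfiguration before training starts would be a guard on the effective slice size. This is a hypothetical helper, not part of Tunix's API; the name and error message are assumptions:

```python
def check_grpo_batching(effective_batch_size, num_generations):
    # Hypothetical sanity check: each micro-batch slice must contain whole
    # reward groups, i.e. a multiple of num_generations completions.
    if effective_batch_size % num_generations != 0:
        raise ValueError(
            f"effective batch size {effective_batch_size} is not a multiple of "
            f"num_generations={num_generations}; GRPO reward groups would be "
            "split across slices and advantages computed over incomplete groups."
        )

check_grpo_batching(4, 2)  # fine: two complete groups per slice
# check_grpo_batching(1, 2) would raise, flagging the configuration above
```

A loud failure here would be preferable to the current silent 0%-accuracy behavior.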
Environment
- OS: Kaggle TPU environment (Debian-based)
- Tunix: google-tunix[prod]==0.1.3
- JAX: Kaggle TPU default
- TPU: v3-8
- Python: Kaggle default kernel
- Model: Gemma 3 1B IT
- Training: GRPO + LoRA + QWIX
- Notebook Environment: Kaggle Notebooks
Checklist
- I have searched the existing issues for a similar bug report.
- I have provided all the required information in the "Environment" section.
- I have provided a minimal, reproducible example.
Would you like to help us fix it?
Yes, I can provide:
- A minimal reproducible Kaggle Notebook
- Logs and evaluation outputs
- Sample completions showing correct CURE structure
- Any additional debugging information needed
