Description
Hi,
I have reported these bugs in the IsaacLab GitHub issue tracker, but I believe the Stage-3 finetuning issue may actually originate from the rsl-rl side rather than from IsaacLab.
In addition to testing with the example locomotion code, I also implemented Stage-3 finetuning in my own RL environments, and I consistently observe the same behaviour: the distilled student policy does not continue learning from the Stage-2 weights but instead appears to learn from scratch.
Here are the training curves from my environments (the policies eventually become good, but the curves match training-from-scratch behaviour):
Please note that in Stage 3 only the network parameters are loaded. I did not load the optimizer state, since the distillation stage and the finetuning stage use different training objectives and therefore should not share the same optimizer state (please correct me if this assumption is incorrect). A minimal sketch of what I mean by "loading only the network parameters" is included below.
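For reference, this is roughly what my weight-loading step looks like. It is a minimal sketch, not rsl-rl's own API: the checkpoint path and the "model_state_dict" key name are assumptions based on my setup, so adjust them to your checkpoint layout.

```python
import torch

# Hypothetical path to the Stage-2 (distillation) checkpoint -- adjust as needed.
STAGE2_CKPT = "logs/distillation/model_final.pt"

def load_policy_weights_only(actor_critic: torch.nn.Module, ckpt_path: str) -> None:
    """Load only the network parameters from a Stage-2 checkpoint,
    leaving the Stage-3 optimizer state freshly initialized."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # The "model_state_dict" key is an assumption -- inspect ckpt.keys() for your setup.
    state_dict = ckpt.get("model_state_dict", ckpt)
    missing, unexpected = actor_critic.load_state_dict(state_dict, strict=False)
    # If many keys are missing or unexpected, the weights were not actually restored,
    # and Stage 3 would silently train from scratch.
    print(f"missing keys: {missing}")
    print(f"unexpected keys: {unexpected}")

# Usage (names hypothetical): call after building the runner but before training,
# then create a fresh optimizer for the finetuning objective.
# load_policy_weights_only(runner.alg.actor_critic, STAGE2_CKPT)
# optimizer = torch.optim.Adam(runner.alg.actor_critic.parameters(), lr=1e-4)
```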
I am wondering whether there are any recommendations or known tips for ensuring that Stage 3 truly finetunes the distilled policy rather than unintentionally re-initializing learning.
All the best,
Yijiong (Bourne)