Description
Hi,
I have reported these bugs in the IsaacLab GitHub issue tracker, but I believe the Stage-3 finetuning issue may actually originate from the rsl-rl side rather than from IsaacLab.
In addition to testing with the example locomotion code, I also implemented Stage-3 finetuning in my own RL environments, and I consistently observe the same behaviour: the distilled student policy does not continue learning from the Stage-2 weights but instead appears to learn from scratch.
Here are the training curves from my environments (the policies eventually become good, but the curves match training-from-scratch behaviour):
Please note that in Stage 3 only the network parameters are loaded. I did not load the optimizer state, since the distillation stage and the finetuning stage use different training objectives and therefore should not share the same optimizer state (please correct me if this assumption is incorrect). A minimal sketch of what I mean by "loading only the network parameters" is included below.
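For reference, this is roughly what my weight-loading step looks like. It is a minimal sketch, not rsl-rl's own API: the checkpoint path and the "model_state_dict" key name are assumptions based on my setup, so adjust them to your checkpoint layout.

```python
import torch

# Hypothetical path to the Stage-2 (distillation) checkpoint -- adjust as needed.
STAGE2_CKPT = "logs/distillation/model_final.pt"

def load_policy_weights_only(actor_critic: torch.nn.Module, ckpt_path: str) -> None:
    """Load only the network parameters from a Stage-2 checkpoint,
    leaving the Stage-3 optimizer state freshly initialized."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # The "model_state_dict" key is an assumption -- inspect ckpt.keys() for your setup.
    state_dict = ckpt.get("model_state_dict", ckpt)
    missing, unexpected = actor_critic.load_state_dict(state_dict, strict=False)
    # If many keys are missing or unexpected, the weights were not actually restored,
    # and Stage 3 would silently train from scratch.
    print(f"missing keys: {missing}")
    print(f"unexpected keys: {unexpected}")

# Usage (names hypothetical): call after building the runner but before training,
# then create a fresh optimizer for the finetuning objective.
# load_policy_weights_only(runner.alg.actor_critic, STAGE2_CKPT)
# optimizer = torch.optim.Adam(runner.alg.actor_critic.parameters(), lr=1e-4)
```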
I am wondering whether there are any recommendations or known tips for ensuring that Stage 3 truly finetunes the distilled policy rather than unintentionally re-initializing learning.
All the best,
Yijiong (Bourne)