
Stage 3 student policy fine-tuning does not work as expected. #147

@yijionglin

Description


Hi,

I originally reported these bugs on the IsaacLab GitHub issue tracker, but I believe the Stage-3 fine-tuning issue may actually originate on the rsl-rl side rather than in IsaacLab.

In addition to testing with the example locomotion code, I also implemented Stage-3 fine-tuning in my own RL environments, and I consistently observe the same behaviour: the distilled student policy does not continue learning from the Stage-2 weights, but instead appears to learn from scratch.

Here are the training curves from my environments (they eventually learn good policies, but the curves match training-from-scratch behaviour):

[Image: training curves from Stage-3 fine-tuning runs]

Please note that in Stage 3 only the network parameters are loaded. I did not load the optimizer state, since the distillation and fine-tuning stages use different training objectives and therefore should not share the same optimizer state (please correct me if this assumption is incorrect).
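
For illustration, here is a minimal sketch of how I load the Stage-2 weights without touching the optimizer state (plain PyTorch; the checkpoint key, function name, and `policy` object are placeholders for my setup, not the actual rsl-rl runner API):

```python
import torch

def load_stage2_weights(policy, checkpoint_path, device="cpu"):
    """Load only the network parameters from a Stage-2 (distillation) checkpoint.

    The optimizer state is intentionally not restored, since Stage 3 uses a
    different training objective than the distillation stage.
    """
    checkpoint = torch.load(checkpoint_path, map_location=device)
    # "model_state_dict" is a placeholder key; adjust it to whatever key the
    # Stage-2 checkpoint actually stores the policy weights under.
    state_dict = checkpoint.get("model_state_dict", checkpoint)
    missing, unexpected = policy.load_state_dict(state_dict, strict=False)
    # Any missing/unexpected keys mean some weights were silently skipped,
    # which could make Stage 3 behave like training from scratch.
    print(f"missing keys: {missing}")
    print(f"unexpected keys: {unexpected}")
    return policy
```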

I am wondering whether there are any recommendations or known tips for ensuring that Stage 3 truly fine-tunes the distilled policy rather than unintentionally re-initializing learning.

All the best,
Yijiong (Bourne)
