Only add parameters with grads to parameter group #7869
Conversation
Signed-off-by: Ma, Guokai <guokai.ma@intel.com>
A new test is added; it will fail on master with the exact error described and pass with this fix.
Hi @PKUWZP, this PR is ready for review again. A test is added to expose the issue it intends to fix.
This PR fixes a bug when the Muon optimizer is used while training only part of the model parameters.
When part of the model parameters are trained (and all others are frozen), it can happen that all trainable parameters use the Muon optimizer and none of them use AdamW, or vice versa. In that case one of `muon_params` and `non_muon_params` contains only non-trainable parameters, which eventually causes the following failure. A reasonable fix is to only add parameters with grads to `muon_params` and `non_muon_params`, so the case above causes one of the parameter groups to be empty and it gets filtered out immediately.
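The fix described above can be sketched as follows. This is a minimal illustration, not the actual PR diff: the helper name `split_param_groups` and the rule "2-D weights go to Muon, everything else to AdamW" are assumptions for the example; the key point is the `requires_grad` check, which keeps frozen parameters out of both groups so an all-frozen group comes out empty and can be dropped.

```python
import torch

def split_param_groups(model):
    """Partition trainable parameters into Muon vs. AdamW groups.

    Hypothetical sketch: only parameters with requires_grad=True are
    considered, so a group that would hold only frozen parameters ends
    up empty and is filtered out instead of being passed to the optimizer.
    """
    muon_params, non_muon_params = [], []
    for p in model.parameters():
        if not p.requires_grad:  # skip frozen parameters (the fix)
            continue
        # Assumed routing rule: Muon handles 2-D weight matrices,
        # everything else (biases, norms, ...) falls back to AdamW.
        (muon_params if p.ndim == 2 else non_muon_params).append(p)
    # Empty groups are dropped immediately rather than kept around.
    groups = []
    if muon_params:
        groups.append({"params": muon_params, "use_muon": True})
    if non_muon_params:
        groups.append({"params": non_muon_params, "use_muon": False})
    return groups
```

With this filtering, freezing every non-Muon parameter simply yields a single Muon group rather than an AdamW group full of frozen tensors.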