
Add support for Qwen3-Omni-30B-A3B-Thinking #677

Draft
ajrasane wants to merge 19 commits into main from ajrasane/qwen3-omni-30B

Conversation

@ajrasane
Contributor

What does this PR do?

Type of change:
Model support

Overview:

Usage

python hf_ptq.py \
    --pyt_ckpt_path Qwen/Qwen3-Omni-30B-A3B-Thinking \
    --qformat fp8 \
    --calib_size 512 \
    --export_path ./qwen3_omni_30b_fp8 \
    --trust_remote_code \
    --batch_size 2 \
    --attn_implementation flash_attention_2

Testing

Able to quantize the model and generate output.

example outputs before ptq: ['<think>\nGot it, which states are we talking about? Wait, the user didn\'t list any states. Oh, maybe the problem is missing the list? Wait, no, maybe this is a standard question where the options are implied? Wait, no, the user probably forgot to include the options. Wait, but maybe in the original context, there were states listed, but here it\'s cut off. Wait, no, looking back: the user says "Which of these states is farthest north?" but didn\'t provide the "these states" part. Oh, maybe this is a common question where the options are like Maine, Florida, etc. Wait, but maybe the user made a mistake. Wait, no, maybe in the problem, the states are implied by the context. Wait, no, let\'s think: the farthest north state in the US is Alaska, but if it\'s contiguous US, it\'s Minnesota or North Dakota? Wait, no, North Dakota is farther north than Minnesota. Wait, but maybe the options are different. Wait, but the user didn\'t list the states. Wait, maybe this is a trick question where the answer is Alaska, but let\'s check. Wait, no, the user probably forgot to include the options. Wait, but maybe in the original problem, the states are given, but here it\'s missing. Wait, no, maybe the user is referring to a standard set. Wait, let\'s think: common states for such questions: Alaska, Maine, North Dakota, Minnesota, etc. Alaska is the northernmost state, with its northernmost point at 71°23\' N latitude. The contiguous US has North Dakota as the northernmost, but Alaska is a state. So if Alaska is an option, it\'s Alaska. But since the user didn\'t list the states, maybe they expect Alaska. Wait, but maybe the question is from a specific set. Wait, no, the user probably made a mistake, but in standard US geography, the northernmost state is Alaska. Let\'s confirm: Alaska\'s northernmost point is Cape Prince of Wales at 71°23\' N, while the contiguous US has North Dakota at about 49° N, so Alaska is way farther north. So if Alaska is one of the options, it\'s Alaska. Since the user didn\'t list the states, but this is a common question, the answer is Alaska.\n</think>\n\nTo determine which state is farthest north, we analyze the **geographic latitude** of U.S. states. Among all U.S. states, **Alaska** is the northernmost. Its northernmost point (Cape Prince of Wales) lies at approximately **71°23′ N latitude**, far surpassing the northern limits of contiguous states like North Dakota (≈49° N). Even if the question refers to contiguous states only, North Dakota is the northernmost, but since Alaska is a state and the question does not specify "contiguous," **Alaska** is the correct answer.  \n\n**Answer:** Alaska']
--------
example outputs after ptq: ['<think>\nGot it, ```json\n{\n  "question": "Which of these states is farthest north?",\n  "answer": "Alaska"\n}\n```\n</think>\n\nAlaska']

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

@ajrasane ajrasane self-assigned this Dec 11, 2025
@copy-pr-bot

copy-pr-bot bot commented Dec 11, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

"qwen3omni only supports one dataset for calibration, can extend this in the future"
)
assert processor is not None, "The processor must be set for qwen3omni model."
dataset_name = args.dataset[0] if args.dataset else "scienceqa"
Collaborator

do we still recommend scienceqa as the default calib dataset?

Contributor Author

Changed this to cnn_dailymail
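
For reference, a minimal sketch of the updated fallback, assuming args.dataset stays a list argument as in the snippet above:

# hypothetical one-line change: default to cnn_dailymail instead of scienceqa
dataset_name = args.dataset[0] if args.dataset else "cnn_dailymail"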

num_samples=args.calib_size[0],
)
elif model_type == "qwen3omni":
assert len(args.calib_size) == 1, (
Collaborator

for this part, I think we may want to host it in a model-specific Python file/module, e.g. llm_ptq/models/qwen3omni.py.

@shengliangxu WDYT?
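
As a rough sketch of what such a module could hold (file path from the comment above; function names are illustrative only, not a settled interface):

# llm_ptq/models/qwen3omni.py -- hypothetical layout, names illustrative only


def build_calib_dataloader(processor, dataset_name: str, calib_size: int, batch_size: int):
    """Qwen3-Omni-specific calibration data handling would live here (sketch only)."""
    raise NotImplementedError


def adjust_quant_cfg(quant_cfg: dict) -> dict:
    """Model-specific quant-config tweaks (e.g. skipping the talker) would live here (sketch only)."""
    return quant_cfg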

Contributor

We do not need to do it for now; I'll come up with a full design doc and then we can convert the whole repo afterwards. Even if we separate things out now, we may still refactor these anyway.

# if args.verbose:
# mtq.print_quant_summary(full_model)

import contextlib
Collaborator

move to the top

Contributor Author

Done

@@ -283,7 +283,8 @@ def _get_free_gpu_mem():

free_mem_before, max_allocated_before = _get_free_gpu_mem()
is_enc_dec = model_type_is_enc_dec(model)
Collaborator

can we merge this into _model_requires_generate?

Contributor Author

Done
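
For reference, a minimal sketch of the merged check; only the enc-dec condition comes from the diff context above, and any other conditions the real helper has are omitted (its exact body is an assumption):

def _model_requires_generate(model) -> bool:
    # encoder-decoder models need .generate() for the calibration forward
    # pass, so the enc-dec check folds into this helper
    return model_type_is_enc_dec(model)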

self.tokenizer = tokenizer
# Handle invalid device values that can come from multi-GPU models with device_map="auto"
if device is None or str(device) in ("auto", "meta", "cpu"):
device = "cuda"
Collaborator

maybe print a warning?

And does it mean if "cuda" not in str(device): device="cuda"?

Contributor Author

I have removed this

model_is_already_quantized = is_quantized(model)

model_type = get_model_type(model)
if model_type == "qwen3omni" and os.environ.get("DISABLE_TALKER", "0") == "1":
Collaborator

I think we probably need to find a better way for configurations like this

Contributor Author

I have disabled the talker quantization by default
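
A minimal sketch of what that looks like, reusing the wildcard-override pattern already used for "*self_attn.q*" in hf_ptq.py; the "*talker*" pattern string is an assumption about the sub-module names:

# skip the talker (speech) sub-model unconditionally instead of gating it
# behind a DISABLE_TALKER environment variable
quant_cfg["quant_cfg"]["*talker*"] = {"enable": False}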

@ajrasane ajrasane force-pushed the ajrasane/qwen3-omni-30B branch 2 times, most recently from 7f80e6f to 0c4b38f Compare December 17, 2025 08:32
@ajrasane ajrasane force-pushed the ajrasane/qwen3-omni-30B branch from 04b3dc6 to 732e686 Compare January 22, 2026 01:22
ajrasane and others added 14 commits February 2, 2026 18:54
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Comment out import and registration of Qwen3OmniMoe classes.

Signed-off-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/qwen3-omni-30B branch from 2725797 to aa77565 Compare February 2, 2026 19:28
@coderabbitai
Contributor

coderabbitai bot commented Feb 2, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ajrasane ajrasane force-pushed the ajrasane/qwen3-omni-30B branch from 79ac487 to 0208cc6 Compare February 2, 2026 20:12
@ajrasane ajrasane force-pushed the ajrasane/qwen3-omni-30B branch from 0208cc6 to 3e775ea Compare February 2, 2026 20:16
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/qwen3-omni-30B branch from 3e775ea to 3f12551 Compare February 3, 2026 00:00
quant_cfg["quant_cfg"]["*self_attn.q*"] = {"enable": False}
quant_cfg["quant_cfg"]["*self_attn.kv*"] = {"enable": False}

if model_type == "qwen3omni":
Collaborator

I feel this level of qformat is too detailed. Can you recommend one and use it for Qwen3 Omni?

Contributor Author

The basic nvfp4 format works fine, we can use that for now. I will add these formats in a separate document for later reference.
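
For reference, the corresponding hf_ptq.py invocation would just swap the qformat (the export path here is illustrative):

python hf_ptq.py \
    --pyt_ckpt_path Qwen/Qwen3-Omni-30B-A3B-Thinking \
    --qformat nvfp4 \
    --calib_size 512 \
    --export_path ./qwen3_omni_30b_nvfp4 \
    --trust_remote_code \
    --batch_size 2 \
    --attn_implementation flash_attention_2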

# See the License for the specific language governing permissions and
# limitations under the License.

"""Script to pre-generate processed video dataset for Qwen3-Omni quantization."""
Collaborator

is this generation script qwen3_omni specific?

Contributor Author

Yes, I don't think we need to merge this into our codebase. Will document this separately.

"nvfp4_mlp_only": mtq.NVFP4_MLP_ONLY_CFG,
"nvfp4_svdquant": mtq.NVFP4_SVDQUANT_DEFAULT_CFG,
"mxfp8": mtq.MXFP8_DEFAULT_CFG,
"qwen3_nvfp4_qkv_disabled": mtq.NVFP4_DEFAULT_CFG,
Collaborator

If possible I would recommend we don't introduce these qformats.
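
If we drop the entry, the same effect can be had in code; a minimal sketch, assuming the standard modelopt import and starting from the stock NVFP4 recipe:

import copy

import modelopt.torch.quantization as mtq

# start from the stock NVFP4 config and disable the attention quantizers in
# code, mirroring the "*self_attn.q*" / "*self_attn.kv*" overrides in this PR
quant_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)
quant_cfg["quant_cfg"]["*self_attn.q*"] = {"enable": False}
quant_cfg["quant_cfg"]["*self_attn.kv*"] = {"enable": False}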

# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Script to load and run a quantized Qwen3Omni model from export_hf_checkpoint."""
Collaborator

qq: why do we need this example?

print(f" Copied {fname}")


def main():
Collaborator

do we need this file? Does vllm serve work?
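
E.g., can the exported checkpoint be served directly with something like the following (the path is illustrative, and exact flags depend on the vLLM version)?

vllm serve ./qwen3_omni_30b_fp8 --trust-remote-code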

f"Running optimization on language model with fake_input shape: {fake_input.shape}"
)
language_model(fake_input)
with set_quantizer_by_cfg_context(model, {"*": {"enable": False}}):
Collaborator

qq: why do we need this?

Contributor Author

Removed

"This is required for requantization/resmoothing optimization. "
"Please ensure the model architecture is supported or file an issue."
)
elif "qwen3omni" in model_type:
Collaborator

can we update get_language_model_from_vl to cover the following logic inside it?

Contributor Author

Done
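
For reference, a minimal sketch of the shape of that branch; the thinker/talker attribute names are assumptions about the HF Qwen3-Omni model layout, not confirmed API:

# hypothetical branch inside get_language_model_from_vl; the attribute path
# (model.thinker.model) is an assumption about the HF Qwen3-Omni layout
if "qwen3omni" in model_type:
    language_model = model.thinker.model  # text decoder of the thinker sub-model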

sub_module, quantizer_attrs.weight_quantizer
)

# Skip export if weight quantizer is disabled or has no amax (not calibrated)
Collaborator

qq: what error will we see if this logic is not added?

Contributor Author

This won't be required.

dtype: The data type for weight conversion.
is_modelopt_qlora: Whether the model is a modelopt-trained QLoRA model.
If True, modules with base_layer attribute are skipped.
pack_weights: Whether to pack quantized weights.
Collaborator

why do we need this flag?

Contributor Author

I was initially trying to export the checkpoint without packing the weights. But this won't be required, as vLLM also expects the model to have packed weights.

@cjluo-nv cjluo-nv requested a review from Edwardf0t1 February 3, 2026 07:33
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
