
@yiguanxianyu

This PR fixes the following:

  • Ensures video metadata is returned correctly by Qwen3VLProcessor when return_metadata=True is passed to processor.apply_chat_template().

@github-actions
Contributor

github-actions bot commented Dec 9, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_vl

@Rocketknight1
Member

cc @zucchini-nlp @molbap @yonigozlan

Member

@zucchini-nlp zucchini-nlp left a comment


Oh, interesting! I didn't know that BatchFeature doesn't work well with non-array-like structures. I don't think the current solution works well, though: it pushes us to update all models, and the output type differs depending on whether metadata is returned.

I recommend fixing it in the as_tensor() function in BatchFeature, so that it doesn't try to convert everything to torch/np and instead checks the type of each input to see whether it can be converted.
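
To make the suggestion concrete, here is a minimal sketch of a type-checked conversion. It is not the actual BatchFeature code: `is_array_like`, `convert_convertible`, and `FakeVideoMetadata` are hypothetical names used only for illustration, and the converter is passed in as a plain callable (in practice it would be something like `torch.tensor` or `np.asarray`). The check is deliberately simplified and does not detect ragged nested lists.

```python
from dataclasses import dataclass
from numbers import Number
from typing import Any, Callable, Dict


@dataclass
class FakeVideoMetadata:
    """Hypothetical stand-in for a non-array-like metadata object."""
    fps: float
    total_num_frames: int


def is_array_like(value: Any) -> bool:
    """Return True for numbers and (nested) lists/tuples of numbers.

    Simplified assumption: ragged nesting is not checked here.
    """
    if isinstance(value, Number):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_array_like(item) for item in value)
    return False


def convert_convertible(
    data: Dict[str, Any], as_tensor: Callable[[Any], Any]
) -> Dict[str, Any]:
    """Convert only array-like values; pass everything else through unchanged."""
    return {
        key: as_tensor(value) if is_array_like(value) else value
        for key, value in data.items()
    }


# Usage with a dummy converter that just tags the value it received.
batch = {
    "pixel_values_videos": [[0.1, 0.2], [0.3, 0.4]],
    "video_metadata": [FakeVideoMetadata(fps=24.0, total_num_frames=48)],
}
converted = convert_convertible(batch, as_tensor=lambda v: ("tensor", v))
```

With this shape, the metadata list is returned as-is while the numeric entries go through the converter, so the output type of each key would not depend on whether metadata was requested.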

