-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Alternative glTF coordinate conversion #20394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Alternative glTF coordinate conversion #20394
Conversation
…s`, which has separate options for nodes, scene, and meshes. Scene conversion is not implemented yet.
…t both meshes and joints were converted, but we're only converting meshes now. Also moved it to a clearer `ConvertInverseCoordinates` trait.
… the vertex conversion, so spawned scenes are visually correct.
…weaked documentation.
…or of exposing a conversion `Mat4` and letting the skin code handle it.
…e of the conversion transform.
janhohenheim
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not test this in-depth with all scenes, but the explanation and the code make sense to me. This is a well thought out improvement on the situation on main and I feel comfortable merging this, provided we untangle the question of whether it should convert by default or not.
I'm in favor of still nudging the user to opt-in so we can gather feedback before changing the default, but that decision is outside the scope of this PR.
|
Changed from draft to ready for review. I've done some more testing, added a migration guide for 0.18, and expanded the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the code. Didn't test asset importing with this PR.
…ser should consider.
| /// coordinate system. This means cameras and lights will match | ||
| /// `Transform::forward` even if conversion is disabled. | ||
| #[derive(Copy, Clone, Default, Debug, Serialize, Deserialize)] | ||
| pub struct GltfConvertCoordinates { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me like the big downside of this new approach is that local transforms within a scene (that correspond to nodes) will now no longer have "correct" forward directions. This relies on nesting the scene underneath a top level entity that is "uncorrected", which we use to call "forward" on. Trying to do that within the scene will not produce the expected "forward" behavior.
This is likely to be problematic in the world of BSN "scene inheritance", where rather than spawning the gltf scene root underneath a higher level "bevy scene root", we would instead treat the gltf scene root as the root, and write our changes directly on top. I believe this is the better model when compared to the current "add another root on top" approach, as it allows developers to think about and interact with GLTF scenes exactly as if they were bevy scenes (ex: if the root node has a property, then when we inherit from it, we can directly read/write that property on the root).
I believe this will also come into play if developers try to animate the root transform of the scene, which will snap our corrections back to whatever the animation thinks is correct.
The benefit of course is that the scene is "locally consistent". But within that space, everything is still "uncorrected". If a "player" node exists inside a larger GLTF "world" scene, that player scene's forward will be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got one thing to check before I reply in full - does BSN require a single scene root entity or does it support multiple roots? Asking because glTF supports multiple roots, so if BSN doesn't then that could spoil things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, gonna reply now so we can keep moving forward - I'll assume that BSN does support multiple roots.
There's something I should have clarified in the PR: dropping support for node conversion in the short-term doesn't stop us adding it back in later. There are some awkward questions around cameras (which the 0.17 conversion has to address too), but if they're solved then it's feasible to support "scene conversion" and "node conversion" as mutually exclusive options. I've edited the PR to add more detail on this - see the "Future" section at the bottom.
Having two options might seem more complex than necessary. But I think it reflects that different users will have different ideas of what the correct semantics are, and which parts of the scene they prefer to modify.
So right now the 0.17 conversion could be framed as "support one use case, but with bugs". While this PR is "support a different use case, without bugs". Both choices have paths to supporting the other's use case. But I feel this PR is the least worst choice in the short-term, and solves the more common use case of current users.
Another path forward is to try solving the camera problems first, then come back to scene conversion versus node conversion later. I can expand on this solution if desired.
This is likely to be problematic in the world of BSN "scene inheritance", where rather than spawning the gltf scene root underneath a higher level "bevy scene root", we would instead treat the gltf scene root as the root, and write our changes directly on top.
That behaviour can be supported by this PR if the correction is baked into the glTF root nodes instead of the current scene root.
I believe this will also come into play if developers try to animate the root transform of the scene, which will snap our corrections back to whatever the animation thinks is correct.
That issue can be solved by this PR if the animation of root nodes is corrected. Note that 0.17 and node conversion in general already requires correcting animations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll assume that BSN does support multiple roots.
Yes and no. BSN currently has two forms: bsn! and bsn_list!. These are separate things because bsn! can then be a single root / directly inheritable / directly writeable on top and bsn_list! can be spawned flat as many roots or as a relationship. We're also discussing "BSN sets", which would be collections of multiple named BSN entries.
I think we'll want to latch on to one of the two models for glTF assets (or perhaps load it as a BSN set and force developers to select individual scenes from it).
Having two options might seem more complex than necessary. But I think it reflects that different users will have different ideas of what the correct semantics are, and which parts of the scene they prefer to modify.
Another path forward is to try solving the camera problems first, then come back to scene conversion versus node conversion later. I can expand on this solution if desired.
I don't object to supporting both paths, provided they are clearly labeled. I think node conversion is likely the "endgame" and I think I actually prefer leaving the current node conversion code in, but putting it under an experimental flag with docs explaining the situation and limitations. No need to throw it out entirely / start from scratch. It seems like the limitations are a result of not doing some things, rather than doing things incorrectly.
That behaviour can be supported by this PR if the correction is baked into the glTF root nodes instead of the current scene root.
I don't think thats true in all cases. The current corrective behavior only works because we have a root node that is "uncorrected / identity" that we call forward on. Doing the correction at the root removes that, so forward would still be mismatched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think thats true in all cases. The current corrective behavior only works because we have a root node that is "uncorrected / identity" that we call forward on. Doing the correction at the root removes that, so forward would still be mismatched.
I think I disagree, but I might be just misunderstanding how BSN works. The "scene conversion" option in this PR is basically "do whatever is necessary to make it visually correct but without changing node semantics". I think that can work however BSN organizes the hierarchy, even if some cases are more complex (like needing animation fixup). But maybe this discussion should wait until BSN lands and I can understand it better.
I don't object to supporting both paths, provided they are clearly labeled. I think node conversion is likely the "endgame" and I think I actually prefer leaving the current node conversion code in, but putting it under an experimental flag with docs explaining the situation and limitations. No need to throw it out entirely / start from scratch. It seems like the limitations are a result of not doing some things, rather than doing things incorrectly.
On node conversion being the endgame, I don't agree. It's a useful option, but I think some users will also want the option of whole scene conversion that doesn't change node semantics. Maybe that's how their assets ended up, or maybe they prefer those semantics, or maybe they feel more comfortable with a more minimal option.
Moving forward, I've sketched out a PR that might more agreeable: greeble-dev#1. This keeps the current node conversion and adds a mutually exclusive scene conversion option. I don't like this, because I think it's risky to continue shipping a quirky node conversion when users might come to depend on the quirks, and the path to reliable node conversion remains unclear. But I can follow it through to a real PR if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "scene conversion" option in this PR is basically "do whatever is necessary to make it visually correct but without changing node semantics". I think that can work however BSN organizes the hierarchy
The point is that "do whatever" currently relies on adding an additional layer of hierarchy when spawning in Bevy to provide the "correct" forward direction. If we remove that, then we can't "do whatever" to fix it without changing the node semantics. This isn't a problem currently, but it will surface the moment we remove that added layer of hierarchy (and I strongly think that we should).
Just want to note that I fully agree with what @greeble-dev says here
I'm cool with "no node conversion, reliable scene conversion" path for 0.18. I just expect us to very quickly feel the need to reopen this conversation (and reintroduce essentially all of the existing node conversion code) when BSN lands and we drop the extra hierarchy layer.
Semantics: "scene conversion"
I don't love this framing for the current implementation. I think a more functional description like rotate_scene_root would help people conceptualize this better than something like "scene conversion" which could be interpreted to be a more holistic conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm still confused about the hierarchy issues, but I'll assume we don't have to resolve that now.
I'm happy to develop greeble-dev#1 into an alternative PR if a maintainer thinks that might help move things forward or make decisions clearer.
Semantics: "scene conversion"
I don't love this framing for the current implementation. I think a more functional description like
rotate_scene_rootwould help people conceptualize this better than something like "scene conversion" which could be interpreted to be a more holistic conversion.
I'm reluctant to use the word "root" as (pre-BSN at least) that could be misinterpreted as the glTF root node(s) or the entity with the SceneRoot component.
I could call it rotate_scenes. But the catch is that the separate mesh and (future) node conversions are more complex, and arguably a functional description won't fit in their names - they've got more going on than a single rotation. So we'd probably end up with an awkward struct GltfCoordinateConversion { rotate_scenes, meshes, nodes }.
Or we add rotate to all of them, even if it's a bit approximate. So I favor the current state, but I'm fine to change it to rotate_scenes, rotate_meshes if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed a proposed rotate_scene_entity (this matches the terminology in the docs and solves the "root ambiguity" issue) and rotate_meshes rename. I really don't think high level / general purpose names are appropriate here. Open to alternatives though!
I also added some additional clarity to the rotate_scene_entity docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm down to merge this for 0.18 if y'all are.
Objective
Change glTF coordinate conversion to satisfy some common use cases while dodging the more controversial aspects. This fixes #20621, but at the cost of removing one feature.
Summary
The Bevy glTF loader can optionally convert nodes and meshes from glTF's "+Z forward" semantics to Bevy's "-Z forward". But the current implementation has issues, particularly with cameras and lights. It might also cause problems for users who want to re-orient the scene as a whole while preserving the original node semantics.
This PR replaces node conversion with a simpler correction to the scene root and mesh entities. The new approach satisfies many use cases and fixes the issues with cameras and lights. But it could be a regression for some users.
Background
There's been confusion over how glTF behaves and what users might want from coordinate conversion. This section recaps the basic concepts, glTF's semantics, the current loader behaviour, and some potential user stories. Or you can skip to the next section if you want to get straight to the changes.
Click to expand
Coordinate Systems and Semantics
3D coordinate systems can have semantics assigned to their axes. These semantics are often defined as a forward axis, an up axis, and a handedness - the side axis is implicit in the other choices.
Bevy's standard semantics are "-Z = forward, +Y = up, right handed". This standard is codified by the
forwardandupmethods ofTransformandGlobalTransform, and by the renderer's interpretation of camera and light transforms. There are debates about the standard and whether users should be able to choose different semantics. This PR does not account for those debates, and assumes that users want to follow the current standard.Other engines, DCCs, and file formats can have different semantics. Unlike Bevy, some vary their semantics by object type - a camera's forward axis may not be the same as a light's. Some only specify an up axis, leaving the forward and side axes unspecified.
Assets might not follow the standard semantics of their file format. Static mesh hierarchies and skeletal animation rigs may even have per-node or per-joint semantics - a character rig could be +Y forward in the scene, while the head joint is +Z forward. One character rig might have both feet +X forward, while another rig might have the left foot +X forward and the right foot -X forward. This creates complexity, but also creates jobs, so no-one can say if it's good or bad.
Asset Loaders And Coordinate Conversion
Bevy currently has a glTF loader, and I'm assuming it will get in-repo FBX and USD loaders at some point. These loaders are likely to follow a common pattern:
Meshassets and skinned meshes.Transform.Scenewith an entity hierarchy that tries to match the file's node hierarchy.Users may want asset loaders that convert assets to Bevy's standard semantics, so
Transform::forwardmatches the asset. But the details of conversion can be contentious - users may want some parts of the scene to be converted differently from other parts, and assets may have ambiguities than can only be resolved by the user. There will never be a simple "it just works" option, although there could be a least worst default that satisfies the biggest group of users.Converting in the loader is not the only option. The user could edit the assets themselves or run a conversion script in DCC. But that's a pain - particularly for users who rely on asset packs and don't have DCC experience. Another option is to implement an asset transform that does coordinate conversion. But having the options right there in the loader is convenient.
User Stories
For coordinate conversion in the loader, some user stories might be:
SceneRoot(load("my.gltf"))and have it visually match the entity'sTransform::forward(), and cameras and lights should do the right thing.Transform::forward().Mesh3d(load("mesh.gltf#Mesh0"))and wants it to match the entity's forward.glTF Semantics
glTF scene semantics are "+Z = forward, +Y = up, right handed". This is almost the same as Bevy, except that scene forward is +Z instead of Bevy's -Z.
Some glTF assets do not follow the spec's scene semantics. The Kenney asset packs use a mix of +Z and -Z forward. At least one of the Khronos sample assets uses +X forward. That said, the majority of Kenney assets and almost all the Khronos sample assets I tested do follow the spec.
glTF camera node and light node semantics are different from glTF scene semantics - they're -Z forward, same as Bevy.
The glTF spec doesn't explicitly say if non-camera/light nodes and mesh buffers have semantics. I'm guessing that some users will have nodes and meshes that follow the spec's scene semantics, and might want them converted to Bevy semantics. But as noted in the user stories, it's likely that other users will have different needs.
glTF and Bevy allow a single node/entity to be both a mesh and a camera or a light. This only makes sense if the user intends the mesh to have the same semantics as cameras and lights. I think it's very unlikely that significant numbers of users will want support for this combination - many other DCCs, file formats and engines don't support it at all.
How The Bevy glTF Loader Works
The loader maps glTF nodes to Bevy entities. It also adds entities for two cases:
SceneRootcomponent - the scene root entity is a child of that entity.A single branch of the resulting scene hierarchy might look like this:
SceneRootcomponent.Mesh3dcomponent)Mesh3dcomponent).glTF Loader Changes In 0.17
In Bevy 0.16, the only user story supported by the glTF loader was "no conversion". During the 0.17 cycle, #19633 and some follow up PRs implemented an option that converts nodes, meshes and animation tracks.
The changes do satisfy some user stories, including the common "convert scene semantics" (mostly) and "convert mesh semantics". But there's some problems (#20621):
Solution
The big change in this PR is the removal of node conversion. Instead, corrective transforms are applied to the scene root entity and mesh primitive entities.
Before this PR:
After this PR:
The result is visually the same even though the scene internals are different. Cameras and lights now work correctly, including when animated.
The new conversion is also simpler. There's no need to convert animations, and the scene part of the conversion only changes a single entity:
Removing node conversion might be a regression for some users. My guess is that most users just want to spawn a scene with the correct orientation and don't worry about individual node transforms, so on balance this PR will be win. But I don't have much evidence to back that up. There might also be a path to adding node conversion back in as an option - see the "Future" section below.
The previous conversion option -
GltfPlugin::use_model_forward_direction- has been split into two separate options for scene and mesh conversion.struct GltfPlugin { ... - use_model_forward_direction: bool, + convert_coordinates: GltfConvertCoordinates, }This might be turn out to be unnecessary flexibility, but I think it's the safer option for now in case users have unexpected needs. Both options are disabled by default.
Testing
I've tested various examples and glTFs with each combination of options, including glTFs with animated cameras and lights.
Future
Click to expand
This PR removes node conversion, which is a desirable feature for some users. There are a couple of ways it could be added back as an option.
The difficult part of node conversion is how to support camera and light nodes. glTF's camera/light semantics already match Bevy's -Z forward, so simply converting every node from +Z to -Z forward will leave camera and light nodes facing the wrong direction.
The obvious solution is to special case camera/light node transforms - this is what the 0.17 conversion tries to do. But it's surprisingly complex to get right due to animation, child nodes, and nodes that can be meshes and cameras and lights. E.g. children of cameras and lights need a counter-conversion applied to their transform and animation tracks.
For cameras, an alternative would be to split them multiple entities. The existing entity would correspond to the glTF node and be converted like every other node. But the Bevy
Cameracomponent would be on a new child entity and have a corrective transform.Before:
Cameracomponent and animated transform.After:
Cameracomponent and corrective transform.Lights are already set up this way, so they only need the corrective transform.
This approach is simpler since nodes are treated uniformly. And it's arguably a better reflection of the glTF format - glTF cameras are kind of a separate thing from nodes, and can be given a name that's different to their node's name. So it could be better for some users.
The downside is that the glTF node entity might have the wrong semantics from the perspective of some users (although not all). And it will be annoying for users who currently assume the
Cameracomponent is on the node entity.Alternatives
Click to expand
What About The Forward Flag Proposal?
There's a proposal to allow per-transform semantics, aka the "forward flag". This means the axis of
Transform::forward()and others would depend on a variable in theTransform. In theory the forward flag might avoid the need for coordinate conversion in the loader. But whether that works in practice is unclear, and the proposal appears to be stalled.What Do Other Engines Do?
Godot's semantics are the same as the glTF standard. Godot doesn't offer any conversion options.
Unreal's default semantics are "+X forward, +Z up, left handed", except meshes are typically "+Y forward, +Z up". Their glTF importer converts nodes and meshes to Unreal's mesh semantics - this is done by swapping the Y and Z axes, which implicitly flips the X for handedness. So Unreal's approach is actually closer to the current main approach of node + mesh conversion, versus this PR's scene + mesh conversion. The Unreal importer also supports a custom scene/mesh rotation and translation that's applied after normal conversion. There's no option to disable conversion.