[FLINK-39150][runtime] Fix join operator crashes jobs when using custom types or custom type serializers #27697
noorall wants to merge 3 commits into apache:master
| } catch (ClassNotFoundException | IOException e) { | ||
| return false; | ||
| throw new RuntimeException( | ||
| "Failed to deserialize AdaptiveJoin instance. " |
When would the flink-table-planner-loader.jar not be on the class path?
Some thoughts:
- I would think this issue is more about not being able to load a class that the deserialization requires. Have I misunderstood?
- The ClassNotFoundException seems to occur when the class of a serialized object cannot be found during readObject on the stream. Are we expecting that class to be in the jar?
- Do all the IOExceptions also imply that flink-table-planner-loader.jar is not on the class path?
Typically this situation should not happen. The existing workflow assumes that the current thread context ClassLoader or the table planner ClassLoader can resolve all required classes (which is a wrong assumption). Therefore, once this exception occurs, the most common cause is a missing flink-table-planner-loader.jar. The message is mainly meant to be user-friendly and to aid troubleshooting, and it does not imply this is the only possible root cause.
- Your understanding is correct.
- The missing classes are not necessarily all contained in the planner-related jars; they can also come from the user jar.
- As explained above, a missing flink-table-planner-loader.jar is the most common scenario, so this log message should be treated as a suggestive troubleshooting hint rather than a definitive diagnosis.
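To illustrate the class-loading point discussed above, here is a minimal, JDK-only sketch of deserializing with an explicitly supplied class loader and failing loudly instead of returning `false`. It is not the code from this PR: `LoaderAwareDeserializer`, `LoaderAwareObjectInputStream`, and the exception message are hypothetical names chosen for illustration, and the sketch assumes plain Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of Flink): deserializes with a caller-supplied class loader. */
final class LoaderAwareDeserializer {

    /** Resolves classes through the supplied loader rather than the JDK's default choice. */
    private static final class LoaderAwareObjectInputStream extends ObjectInputStream {
        private final ClassLoader classLoader;

        LoaderAwareObjectInputStream(InputStream in, ClassLoader classLoader) throws IOException {
            super(in);
            this.classLoader = classLoader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                return Class.forName(desc.getName(), false, classLoader);
            } catch (ClassNotFoundException e) {
                // Default resolution also covers primitive type names such as "int".
                return super.resolveClass(desc);
            }
        }
    }

    /** Serializes a value to bytes; IOExceptions are rethrown unchecked to keep callers simple. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes with the given loader and surfaces failures instead of swallowing them. */
    static Object deserialize(byte[] data, ClassLoader userClassLoader) {
        try (ObjectInputStream in =
                new LoaderAwareObjectInputStream(new ByteArrayInputStream(data), userClassLoader)) {
            return in.readObject();
        } catch (ClassNotFoundException | IOException e) {
            // Illustrative message only; the cause is kept so the real root cause stays visible.
            throw new RuntimeException(
                    "Failed to deserialize instance. A required class may be missing from "
                            + "the supplied class loader.",
                    e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize("custom-type-placeholder");
        Object restored = deserialize(data, LoaderAwareDeserializer.class.getClassLoader());
        System.out.println(restored);
    }
}
```

Passing the loader as a parameter makes the choice explicit at the call site, so a caller can hand in the user-code class loader rather than relying on whatever the thread context loader happens to be. Chaining the original exception keeps the hint in the message from hiding the actual cause.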
Looks like the test I provided only verifies MiniCluster behavior, where user classes are mixed with Flink classes. Even after applying your patch, Flink's JobManager still crashes with an exception after the job JAR is uploaded:
I’ve completed the fix and verified it locally. Could you please try again on your side?
Thanks, test job passed 👍 Would it be possible to add a proper regression test for this, i.e. one that actually tests with a user JAR?
What is the purpose of the change
The PlannerComponentClassLoader uses a strict whitelist-based routing strategy and does not automatically fall back to the parent ClassLoader for packages that are not whitelisted.
- … org.apache.flink.*): follow the configured lookup order (e.g., parent-first/component-first) and can fall back accordingly.
- … table-planner-*.jar (component), deserialization fails with ClassNotFoundException.

Required fix:
- … ClassLoader. This ensures custom user types can be resolved during the AdaptiveJoin deserialization.
- … UserClassLoader instead of the current thread context ClassLoader.

Brief change log
Does this pull request potentially affect one of the following parts:
- … @Public(Evolving): (yes / no)

Documentation