[tmva][sofie] Add ONNX Gelu operator support #20876
AdityaDRathore wants to merge 2 commits into root-project:master
Conversation
Implement the Gelu operator for the SOFIE inference engine following the ONNX Opset 20 specification. The implementation uses the exact Gelu formula y = 0.5 * x * (1 + erf(x / sqrt(2))) with compile-time hexfloat constants for 1/sqrt(2) and 0.5 to ensure bit-exact reproducibility. RModelParser_ONNX is updated to register the Gelu operator node; it explicitly rejects the approximate attribute (tanh approximation), which is not yet supported. Verified against SciPy/ONNX using a custom Opset-20 graph constructed via onnx.helper (bypassing the PyTorch export decomposition), achieving bit-exact results (max error 0.0) on the test interval [-3.0, 3.0].
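For illustration, a minimal sketch of the element-wise kernel that the generated code is intended to reduce to; the function and tensor names are illustrative, not the exact SOFIE-generated identifiers:

```cpp
#include <cmath>
#include <cstddef>

// Exact Gelu: y = 0.5 * x * (1 + erf(x / sqrt(2))), with 1/sqrt(2) and 0.5
// baked in as compile-time hexfloat constants so the emitted kernel performs
// no runtime division or sqrt.
void ApplyGelu(const float *x, float *y, std::size_t n)
{
   constexpr double kInvSqrt2 = 0x1.6a09e667f3bcdp-1; // 1/sqrt(2)
   constexpr double kHalf     = 0x1p-1;               // 0.5
   for (std::size_t i = 0; i < n; ++i)
      y[i] = static_cast<float>(kHalf * x[i] * (1.0 + std::erf(x[i] * kInvSqrt2)));
}
```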
@lmoneta @sanjibansg I saw there's an older PR #20788 for GELU, so I wanted to point out a few differences in my approach for your review:
Hi @AdityaDRathore — thanks for the detailed comparison. I’ve pushed updates to my PR (#20788) incorporating your suggestions:
Thanks again — this made the implementation more robust.
- Validates generation of bit-exact hexfloat constants and shape inference.
- Ensures no runtime division is performed in the generated kernel.
- Golden data validation from scipy.special.erf.
- Type/shape inference and error handling tests.
Hi @Shlok-Saxena, glad the parser suggestions were helpful for robustness. However, my primary concern regarding numeric stability remains addressed only in this PR (#20876). For HEP physics validation, we typically require strict bit-exactness across platforms. Relying on x / std::sqrt(2) introduces runtime division and platform-dependent rounding differences depending on the compiler's FPU flags. This PR uses compile-time hexfloats to lock in the precision. I have added a GTest suite (TestSofieGELU.cxx) that not only checks the numeric output against SciPy golden data but also inspects the generated C++ code to enforce the presence of these hexfloat constants.
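A minimal sketch of the kind of generated-code inspection described above; the GeneratedGeluCode() stand-in is hypothetical, and in the actual test the string would come from SOFIE's code generation for a Gelu model:

```cpp
#include <gtest/gtest.h>
#include <string>

// Stand-in for the generated inference code; in the real TestSofieGELU.cxx it
// would be produced by parsing a Gelu ONNX model with SOFIE's code generator.
static std::string GeneratedGeluCode()
{
   return "tensor_Y[id] = 0x1p-1 * tensor_X[id] * "
          "(1 + std::erf(tensor_X[id] * 0x1.6a09e667f3bcdp-1));";
}

TEST(SofieGelu, GeneratedCodeIsBitExact)
{
   const std::string code = GeneratedGeluCode();
   // The compile-time constant for 1/sqrt(2) must appear verbatim ...
   EXPECT_NE(code.find("0x1.6a09e667f3bcdp-1"), std::string::npos);
   // ... and no runtime division by std::sqrt(2) should be emitted.
   EXPECT_EQ(code.find("/ std::sqrt(2)"), std::string::npos);
}
```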
This Pull request:
This PR implements the GELU operator (Opset 20) for TMVA/SOFIE.
Changes or fixes:
This PR implements the Gelu operator for the SOFIE inference engine, targeting the ONNX Opset 20 specification. This addition extends SOFIE's support to modern transformer architectures which rely on the Gaussian Error Linear Unit.

Technical Approach

- Uses compile-time hexfloat constants (0x1.6a09e667f3bcdp-1 and 0.5) to implement the exact formula y = 0.5 * x * (1 + erf(x / sqrt(2))).
- RModelParser_ONNX registers the Gelu node and explicitly validates attributes to reject unsupported approximations (like tanh), ensuring no silent numerical mismatches. A sketch of this check follows below.
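As a sketch of the attribute validation mentioned above; the function name and the protobuf header path are assumptions, and only the standard ONNX NodeProto accessors are relied on:

```cpp
#include <stdexcept>
#include <string>

#include "onnx_proto3.pb.h" // ONNX protobuf definitions (header name may vary with the build)

// Illustrative check: only the default exact formula (approximate == "none")
// is accepted; the tanh approximation is rejected with a clear error instead
// of silently producing different numerics.
void CheckGeluAttributes(const onnx::NodeProto &nodeproto)
{
   std::string approximate = "none";
   for (int i = 0; i < nodeproto.attribute_size(); ++i) {
      if (nodeproto.attribute(i).name() == "approximate")
         approximate = nodeproto.attribute(i).s();
   }
   if (approximate != "none")
      throw std::runtime_error("TMVA::SOFIE ONNX parser: Gelu attribute approximate='" +
                               approximate + "' is not supported");
}
```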
Key Features:

- Bit-Exact Reproducibility:
  - Uses hexfloat constants ($1/\sqrt{2}$) to guarantee identical floating-point behavior across all IEEE-754 platforms (x86/ARM/PPC).
  - Substitutes 0x1.6a09e667f3bcdp-1 for a std::sqrt(2) division, which is susceptible to compiler-specific fast-math optimizations.
- Safety:
  - Parser explicitly rejects approximate='tanh' (unsupported) to prevent silent model degradation.
Validation (New):

- Added TestSofieGELU.cxx (GTest).

Validation

Validation was performed using a custom Opset-20 ONNX graph constructed via onnx.helper to prevent the operator decomposition often produced by torch.onnx.export.

- Golden data generated from SciPy's erf implementation.
- Bit-exact results (max error 0.0) against the ground truth on the test interval [-3.0, 3.0].

Checklist:
This PR fixes #
cc: @lmoneta @sanjibansg @omazapa