Conversation

@coderfender coderfender commented Jan 31, 2026

Which issue does this PR close?

Closes: #3209

This PR adds a new environment variable, ENABLE_COMET_WRITER, which enables the native Comet writer when running the Spark tests.

Three kinds of tests have been ignored:

  1. 11 tests in the SQL suite that failed or exposed corner cases with the Comet writer (INSERT OVERWRITE / CTAS statements)
  2. SQL tests for the charvarchar.sql files
  3. Parquet metadata encoding tests (Spark writes metadata such as the Spark version into the file footer, while Comet doesn't)

Rationale for this change

Now that we have a native Comet writer, this PR enables the Spark tests with it, ignoring the ones that fail.

What changes are included in this PR?

Diff-file changes to enable the Comet native writer in the Spark tests.

How are these changes tested?

@coderfender coderfender force-pushed the enable_spark_tests_parquet_writer branch from b5cea4f to 951b394 on January 31, 2026 01:35
 cd apache-spark
 rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
-ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ${{ matrix.config.scan-env }} ENABLE_COMET_LOG_FALLBACK_REASONS=${{ github.event.inputs.collect-fallback-logs || 'false' }} \
+ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ENABLE_COMET_WRITER=true ${{ matrix.config.scan-env }} ENABLE_COMET_LOG_FALLBACK_REASONS=${{ github.event.inputs.collect-fallback-logs || 'false' }} \
Contributor

We also need to enable

--conf spark.comet.operator.DataWritingCommandExec.allowIncompatible=true
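For reference, this is an ordinary Spark conf, so it can be set wherever the session is built. A hedged PySpark sketch (assuming the Comet jar is on the classpath; the option names other than the allowIncompatible flag above come from Comet's docs, not from this PR):

```python
from pyspark.sql import SparkSession

# Illustrative only: build a session with the Comet plugin enabled and allow
# the DataWritingCommandExec operator, which Comet marks as incompatible,
# so the native writer can handle INSERT / CTAS paths.
spark = (
    SparkSession.builder
    .config("spark.plugins", "org.apache.spark.CometPlugin")
    .config("spark.comet.exec.enabled", "true")
    .config("spark.comet.operator.DataWritingCommandExec.allowIncompatible", "true")
    .getOrCreate()
)
```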


@coderfender coderfender Jan 31, 2026


Sure


@coderfender coderfender Feb 1, 2026


We currently don't have that config available in the Spark tests through ENABLE_COMET_WRITER=true. Perhaps you meant enabling it in the TPC-H / TPC-DS benchmark suites as well?

codecov-commenter commented Jan 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.99%. Comparing base (f09f8af) to head (6c6a142).
⚠️ Report is 918 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3351      +/-   ##
============================================
+ Coverage     56.12%   59.99%   +3.86%     
- Complexity      976     1464     +488     
============================================
  Files           119      175      +56     
  Lines         11743    16165    +4422     
  Branches       2251     2681     +430     
============================================
+ Hits           6591     9698    +3107     
- Misses         4012     5114    +1102     
- Partials       1140     1353     +213     



coderfender commented Jan 31, 2026

@comphead, we have a bunch of failures, largely due to hitting edge cases with the Comet writer (CTAS and INSERT OVERWRITE, which is likely a bug). I am thinking of two possible options here:

  1. Keep the changes minimal and ignore the failing I/O tests for now
  2. Create a separate Comet writer test suite which ignores all the tests relevant to the Comet native writer

@coderfender

@comphead, please kick off CI whenever you get a chance.

@coderfender

My diff files for Spark versions 3.4 and 3.5 missed disabling the Comet writer for CTAS statements (while it was set for Spark 4), causing the above test failures. However, there are now only 2 failed tests, down from 13.

@coderfender coderfender changed the title from "feat : enable_spark_tests_comet_native_writer" to "feat: enable_spark_tests_comet_native_writer" on Feb 1, 2026
@coderfender

It seems there were issues applying the diff file for Spark 4.

@coderfender

@comphead, all the tests passed. Please take a look whenever you get a chance.

@coderfender

All tests passed, and the following tests are ignored:

  Common across all Spark versions (3.4, 3.5, 4.0):

  1. empty file should be skipped while write to file
  2. INSERT INTO TABLE - complex type but different names
  3. Insert overwrite table command should output correct schema: basic
  4. parquet timestamp conversion
  5. SPARK-29174 Support LOCAL in INSERT OVERWRITE DIRECTORY to data source
  6. SPARK-33901: ctas should should not change table's schema
  7. SPARK-37160: CREATE TABLE AS SELECT with CHAR_AS_VARCHAR
  8. SPARK-38336 INSERT INTO statements with tables with default columns: positive tests
  9. SPARK-38811 INSERT INTO on columns added with ALTER TABLE ADD COLUMNS: Positive tests
  10. SPARK-43071: INSERT INTO from queries whose final operators are not projections
  11. write path implements onTaskCommit API correctly
  12. Write Spark version into Parquet metadata

Additional tests which had to be ignored for Spark 4:

  1. ctas with union
  2. SPARK-48817: test multi inserts

  Total: 12 tests (Spark 3.4/3.5) | 14 tests (Spark 4.0)


Successfully merging this pull request may close these issues:

Run Spark tests with Comet native writer