Fix: FlyteDirectory fails for multiple files with the same name#3325
Fix: FlyteDirectory fails for multiple files with the same name#3325mys007 wants to merge 4 commits intoflyteorg:masterfrom
Conversation
dcbfe6a to
dadc642
Compare
|
Bito Automatic Review Failed - Technical Failure |
b824f2d to
8241a82
Compare
|
Bito Automatic Review Failed - Technical Failure |
Signed-off-by: Martin Simonovsky <martin.simonovsky@aleph-alpha-ip.ai>
Signed-off-by: Martin Simonovsky <martin.simonovsky@aleph-alpha-ip.ai>
Signed-off-by: Martin Simonovsky <martin.simonovsky@aleph-alpha-ip.ai>
Signed-off-by: Martin Simonovsky <martin.simonovsky@aleph-alpha-ip.ai>
bde858a to
3d4e564
Compare
|
Bito Automatic Review Failed - Technical Failure |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3325 +/- ##
==========================================
- Coverage 85.26% 75.76% -9.50%
==========================================
Files 386 215 -171
Lines 30276 22504 -7772
Branches 2969 2954 -15
==========================================
- Hits 25814 17051 -8763
- Misses 3615 4583 +968
- Partials 847 870 +23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I'm not sure why the single test of |
Tracking issue
Why are the changes needed?
Persisting
FlyteDirectorydoesn't work correctly when there are multiple files with the same name in the directory.Let's have two files:
And a trivial workflow:
Running with
pyflyte run ... --fd /home/user/my_dirresults in onlyfile.txtbeing uploaded in the flyte S3 bucket. The filesubdir/file.txtis skipped. This is of course not as expected.In addition, note that if the files have a different content, this raises an exception in
get_upload_signed_url!Our current workaround is to pack a directory into an archive and use
FlyteFile.What changes were proposed in this pull request?
This bugfix is based on the observation that
get_upload_signed_urlis only provided file basename and file content hash to determine the URL. This leads to the both files ending up in the same location (the directory is flatten). To fix that, I add the path relative to the local root to the filename. This seems to work and both files are uploaded.Disclaimer: I have no idea if there are any side effects of this fix. It can be that some other functionality gets broken. Or there might be a better fix. I would appreciate a review from the maintainers:).
How was this patch tested?
Empirical test: For the minimal example above this works, the call to
super()._putinFlyteFS._put()returns two objects:['s3://flyte/flytesnacks/development/QLGCX23Y7HX3FI2LIEGVF5OW7Q======/subdir/file.txt', 's3://flyte/flytesnacks/development/QLGCX23Y7HX3FI2LIEGVF5OW7Q======/file.txt']. This works both when the file content is identical and when it's different.Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This pull request fixes a bug in the FlyteDirectory persistence mechanism that prevented the upload of multiple files with the same name and content. By including the relative path of files in the upload process, it enhances the Flyte framework's ability to manage directories with duplicate files effectively.