Parallelize backups and restore file operations #4023
elangelo wants to merge 8 commits into apache:main
Conversation
…to 1 thread, allow overriding with system properties
epugh
left a comment
I don't have the multi-thread chops to approve this, but reading through it, it looks good. I wanted a change to the variable name. Do we need any new tests for this capability, or do the existing ones cover it well enough?
 * SOLR_BACKUP_MAX_PARALLEL_UPLOADS}.
 */
private static final int DEFAULT_MAX_PARALLEL_UPLOADS =
    EnvUtils.getPropertyAsInteger("solr.backup.maxParallelUploads", 1);
The pattern we are using now is dot cased, so solr.backup.maxparalleluploads, or maybe, if we had multiple properties, solr.backup.paralleluploads.max...
Good use of EnvUtils, we need them everywhere.
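For reference, here is a minimal sketch of how a property read via EnvUtils could size an upload executor. The property name, default of 1, and `EnvUtils.getPropertyAsInteger` come from the diff above; the executor helpers (`ExecutorUtil.newMDCAwareFixedThreadPool`, `SolrNamedThreadFactory`) are the usual Solr utilities and are assumed here for illustration, not taken from this patch:

```java
import java.util.concurrent.ExecutorService;

import org.apache.solr.common.util.EnvUtils;
import org.apache.solr.common.util.ExecutorUtil;
import org.apache.solr.common.util.SolrNamedThreadFactory;

public class BackupUploadExecutorSketch {

  // Defaults to 1 thread so existing single-threaded behaviour is preserved
  // unless the system property (or matching environment variable) overrides it.
  private static final int DEFAULT_MAX_PARALLEL_UPLOADS =
      EnvUtils.getPropertyAsInteger("solr.backup.maxParallelUploads", 1);

  // Fixed-size, MDC-aware pool named after the work it does.
  static ExecutorService newUploadExecutor() {
    return ExecutorUtil.newMDCAwareFixedThreadPool(
        DEFAULT_MAX_PARALLEL_UPLOADS, new SolrNamedThreadFactory("backup-upload"));
  }
}
```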
Backup and restore operations can transfer multiple index files in parallel to improve throughput, especially when using cloud storage repositories like S3 or GCS where latency is higher.

The parallelism is controlled via system properties or environment variables:

`solr.backup.maxParallelUploads`::
solr.backup.maxparalleluploads ?
I think the current tests actually cover everything already. Note that I did change the gcsrepository and s3repository tests to have some parallelism. Unfortunately I was limited to only 2 threads, as with more I got an OutOfMemoryException. But I think it still covers what needs covering.
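For illustration only, a test could cap the parallelism through the same property. The snippet below is a hypothetical sketch of such a setup, not the actual test change in this PR:

```java
// Hypothetical test setup, assuming the property name from the diff above.
// Cap backup parallelism at 2 threads, matching the OutOfMemoryException limit noted above.
System.setProperty("solr.backup.maxParallelUploads", "2");
try {
  // ... run the backup/restore round trip against the repository under test ...
} finally {
  System.clearProperty("solr.backup.maxParallelUploads");
}
```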
epugh
left a comment
LGTM. I'd love another committer who is more comfortable with this code base and especially the multithreaded nature of it to review as well.
… would be such a bottleneck
…was referred to by the non-canonical name `ExecutorUtil.MDCAwareThreadPoolExecutor.CallerRunsPolicy`
Description
This PR uses multiple threads to create and restore backups, which yields a considerable speedup when using cloud storage such as S3.
For comparison, a backup of 1.8 TiB to S3 takes roughly 16 minutes with this code, while a 340 GiB collection takes roughly 50 minutes with the old code.
Restoring the same collection took 7 minutes instead of 1 hour and 20 minutes (on a 6-node cluster).
Solution
As the previous implementation already had a loop over all files that needed to be backed up to the backup repository, I simply wrapped that loop in a ThreadPoolExecutor (see the sketch below).
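A minimal sketch of that approach, not the exact patch: `copyFileToRepo` is a placeholder name standing in for the real per-file copy into the BackupRepository, and the `ExecutorUtil`/`SolrNamedThreadFactory` helpers are assumed from Solr's common utilities:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

import org.apache.solr.common.util.ExecutorUtil;
import org.apache.solr.common.util.SolrNamedThreadFactory;

class ParallelBackupSketch {

  void backUpFilesInParallel(List<String> filesToBackUp, int maxParallelUploads) throws Exception {
    ExecutorService uploadExecutor =
        ExecutorUtil.newMDCAwareFixedThreadPool(
            maxParallelUploads, new SolrNamedThreadFactory("backup-upload"));
    try {
      List<Future<?>> futures = new ArrayList<>();
      // Same per-file loop as before, but each copy now runs as a task on the pool.
      for (String fileName : filesToBackUp) {
        futures.add(uploadExecutor.submit(() -> copyFileToRepo(fileName)));
      }
      // Wait for every upload and surface the first failure instead of swallowing it.
      for (Future<?> f : futures) {
        f.get();
      }
    } finally {
      ExecutorUtil.shutdownAndAwaitTermination(uploadExecutor);
    }
  }

  // Placeholder for the real call that copies one index file into the BackupRepository.
  private void copyFileToRepo(String fileName) {}
}
```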
Tests
I have run this code locally on a SolrCloud cluster.
Checklist
Please review the following and check all that apply:
I have developed this patch against the `main` branch.
I have run `./gradlew check`.