Bugfix: Grouped store operations by partition key #7
petero-dk wants to merge 4 commits into CoreHelpers:master
|
This is a wonderful extension, but I'm worried about performance with huge data sets, meaning several tens of thousands of rows. That is why I could see a parameter that enables this operation. What do you think? |
|
I am not sure what you mean by "parameter that enables this operation". Either the caller of the library does this manually, or the library does it. Any batch operation that spans multiple partitions will fail with an error. I can see that the LINQ GroupBy clause could be a performance pain point for very large datasets, so I could take a stab at swapping it for a manual grouping. Do you think the performance increase would be worth the loss in code readability? |
|
You are right that either the caller or the library should do this. I would prefer to have two APIs: one that does not group automatically, fails with an error on mixed partitions, and is optimized for huge datasets, and another, more of a comfort function, that does the grouping. That way the caller can decide. It could also be encoded in a storage operation flag or similar. |
|
I see your point now, let me think for a second |
|
I will have an update on this early next week. I did a little testing and refactoring and found one error in my PR. With the latest code I got a sevenfold increase in speed on the HugeDemoTest. I just need to clean it up. |
|
Please note that I have NOT removed the timer code yet. I just wanted to let you see the progress. Testing has shown that the LINQ GroupBy clause has no measurable effect on performance. However, the foreach loop that actually makes the connections to table storage can be parallelized, and that gives a huge performance boost. My test case with 20,000 rows (up from 2,000) |
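The parallelization described in this comment could look roughly like the sketch below. It is a language-neutral illustration in Python, not the code from this PR: `submit_batches_in_parallel`, the `submit` callback, and `max_workers` are hypothetical names, and it assumes every batch is independent of the others (for example, one batch per partition key).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def submit_batches_in_parallel(
    batches: Sequence[List[dict]],
    submit: Callable[[List[dict]], None],
    max_workers: int = 8,
) -> None:
    """Send independent batches concurrently instead of one after another.

    `submit` is a hypothetical callback that performs the actual
    network call for a single batch.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the result iterator, so an exception raised
        # by any submit() call is re-raised here instead of being lost.
        list(pool.map(submit, batches))
```

Because each call mostly waits on the network, a thread pool is usually enough to overlap the round trips. The right `max_workers` value depends on the storage account's throttling limits and is an assumption here.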
Batch operations fail when they contain entities from different partitions. This is more of an issue when using custom filters.
This bugfix will simply separate the batch operations into different batches based on the partition key.
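
As an illustration of the idea, here is a minimal, language-neutral sketch in Python of splitting operations by partition key. It is not the code in this PR: `split_into_batches` and the dictionary-based entity shape are hypothetical. The chunk size reflects the Azure Table Storage limit of 100 entities per batch transaction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

# Azure Table Storage accepts at most 100 entities per batch transaction.
MAX_BATCH_SIZE = 100


def split_into_batches(
    entities: Iterable[dict], max_batch_size: int = MAX_BATCH_SIZE
) -> Iterator[List[dict]]:
    """Yield batches that each contain entities of a single partition key."""
    by_partition: Dict[str, List[dict]] = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)

    for rows in by_partition.values():
        # Chunk each partition so no batch exceeds the size limit.
        for start in range(0, len(rows), max_batch_size):
            yield rows[start:start + max_batch_size]
```

Each yielded batch can then be sent as its own transaction, so a mixed input no longer produces a single cross-partition batch that the service rejects.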