-
Notifications
You must be signed in to change notification settings - Fork 29
Open
Description
Several functions on a DictStore are written in a way where they have O(n^2) time complexity, I believe. In any case, a DictStore-backed ZGroup with a large number of files can become quite slow – more than one would expect if the runtime would just scale linearly with the number of files. I've seen this happening so far with consolidate_metadata, zopen, and (repeated) zgroup/zcreate.
From what I can tell, zgroup and zcreate won't be super easy to fix without changing how DictStore itself works. I haven't looked more deeply into zopen yet. But consolidate_metadata's complexity can be improved to O(n) by defining this method:
function consolidate_metadata(s::DictStore, d, prefix)
# Make the prefix have a trailing slash and pre-escape any `\Q` and `\E`
_prefix = isempty(prefix) ? "" : replace(rstrip(prefix, '/'),
"\\E" => "\\\\E", "\\Q" => "\\\\Q") * '/'
# Matches any `.zattrs`, `.zarray`, or `.zgroup` below `prefix`
regex = Regex("^\\Q$_prefix\\E(?:.*/)?(?:\\.zattrs|\\.zarray|\\.zgroup)\$")
for (k, v) in s.a
occursin(regex, k) || continue
d[k] = JSON.parse(String(copy(v)); dicttype = Dict{String, Any})
end
return d
endMetadata
Metadata
Assignees
Labels
No labels