Skip to content

Polynomial runtime with DictStore #227

@publik-void

Description

@publik-void

Several functions on a DictStore are written in a way where they have O(n^2) time complexity, I believe. In any case, a DictStore-backed ZGroup with a large number of files can become quite slow – more than one would expect if the runtime would just scale linearly with the number of files. I've seen this happening so far with consolidate_metadata, zopen, and (repeated) zgroup/zcreate.

From what I can tell, zgroup and zcreate won't be super easy to fix without changing how DictStore itself works. I haven't looked more deeply into zopen yet. But consolidate_metadata's complexity can be improved to O(n) by defining this method:

function consolidate_metadata(s::DictStore, d, prefix)
  # Make the prefix have a trailing slash and pre-escape any `\Q` and `\E`
  _prefix = isempty(prefix) ? "" : replace(rstrip(prefix, '/'),
    "\\E" => "\\\\E", "\\Q" => "\\\\Q") * '/'

  # Matches any `.zattrs`, `.zarray`, or `.zgroup` below `prefix`
  regex = Regex("^\\Q$_prefix\\E(?:.*/)?(?:\\.zattrs|\\.zarray|\\.zgroup)\$")

  for (k, v) in s.a
    occursin(regex, k) || continue
    d[k] = JSON.parse(String(copy(v)); dicttype = Dict{String, Any})
  end

  return d
end

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions