v2.3.0 hdr histograms, performance (and a very small breaking change)

@anton-povarov anton-povarov released this 03 May 11:48
· 137 commits to master since this release

This release is an effort to improve performance by introducing hdr histograms (forked from https://github.com/HdrHistogram/HdrHistogram_c) and optimising internal timer aggregation and histogram machinery.

Currently, in production at Badoo, we're seeing up to 5 million timers/sec (each with 10 to 30 tags) per instance.
Some heavily loaded reports (the very non-specific ones that aggregate almost the entire stream) are hitting the 100% cpu mark, so this release aims to improve that.

Breaking change

  • added 'timers_skipped_by_bloom' field to the 'active' report. This is breaking because the field was added 'in the middle', right after the 'timers_aggregated' field, so every field that follows it shifts by one position.
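
To illustrate why a field added 'in the middle' is breaking, here is a minimal sketch, assuming a client that reads 'active' report rows by position. Only 'timers_aggregated' and 'timers_skipped_by_bloom' are real field names taken from this release; `col_before`, `col_after`, the sample values and the helper `row_to_dict` are hypothetical placeholders, not the actual report layout.

```python
# Hypothetical column layouts: only the two 'timers_*' names come from the
# release notes, the other names are placeholders for illustration.
COLUMNS_V22 = ["col_before", "timers_aggregated", "col_after"]
COLUMNS_V23 = ["col_before", "timers_aggregated", "timers_skipped_by_bloom", "col_after"]


def row_to_dict(columns, row):
    """Pair each value with its column name so lookups do not depend on order."""
    if len(columns) != len(row):
        raise ValueError("column list and row have different lengths")
    return dict(zip(columns, row))


# Made-up sample rows, as a client might receive them before and after upgrade.
row_v22 = (1, 5000, 42)
row_v23 = (1, 5000, 1200, 42)

# Positional access: index 2 silently changes meaning after the upgrade.
positional_before = row_v22[2]   # value of 'col_after'
positional_after = row_v23[2]    # now the value of 'timers_skipped_by_bloom'

# Name-based access keeps returning the same field in both versions.
named_before = row_to_dict(COLUMNS_V22, row_v22)["col_after"]
named_after = row_to_dict(COLUMNS_V23, row_v23)["col_after"]
```

Clients that select fields by name (rather than relying on column order) should not be affected by this change.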

Release highlights

  • histograms now use hdr_histogram-like machinery internally
    • percentiles (at the end of histogram interval) become coarser and might shift slightly
    • percentiles (at the start of histogram interval) become more precise
    • histograms will use slightly more memory on average (if your workload is anything like ours)
    • performance should improve in most cases
    • it's now possible and feasible to have histograms with 1 microsecond resolution (which is nice if you measure some short on-cpu functions, for example). They'll use more cpu (~2x for 1us vs 1ms histograms).
  • performance enhancements
    • 'request' reports are now significantly faster (and use less memory) both in aggregation and selects (they were converted to be very similar to timer reports internally). Use case: aggregating stats from nginx (response codes, etc.)
    • added bloom filters for individual timers in the packet (fast skip for timers that the report is definitely not interested in)
    • coordinator thread now uses considerably less resources (5M timers/sec are transcoded into internal format using ~1 cpu core).
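
The precision trade-off described for hdr-style histograms (more precise percentiles at the start of the interval, coarser ones at the end) follows from log-linear bucketing. Below is a minimal sketch of that general idea, not the actual pinba or HdrHistogram_c code: the names `bucket_index`, `bucket_bounds`, `LogLinearHistogram` and the choice of 5 mantissa bits are assumptions made for illustration.

```python
import math

# Assumed parameter: 5 mantissa bits, i.e. values below 32 get exact buckets.
MANTISSA_BITS = 5
EXACT_LIMIT = 1 << MANTISSA_BITS          # 32
HALF = EXACT_LIMIT // 2                   # 16 sub-buckets per power-of-two range


def bucket_index(value: int) -> int:
    """Map a non-negative integer (e.g. microseconds) to a bucket index."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < EXACT_LIMIT:
        return value                      # small values: one bucket per value
    shift = value.bit_length() - MANTISSA_BITS
    top = value >> shift                  # top bits, always in [HALF, EXACT_LIMIT)
    return EXACT_LIMIT + (shift - 1) * HALF + (top - HALF)


def bucket_bounds(index: int) -> tuple:
    """Return the half-open value range [low, high) covered by a bucket."""
    if index < EXACT_LIMIT:
        return (index, index + 1)
    k = index - EXACT_LIMIT
    shift = k // HALF + 1
    low = (HALF + k % HALF) << shift
    return (low, low + (1 << shift))      # bucket width doubles with each range


class LogLinearHistogram:
    def __init__(self):
        self.counts = {}                  # bucket index -> number of values
        self.total = 0

    def record(self, value: int) -> None:
        idx = bucket_index(value)
        self.counts[idx] = self.counts.get(idx, 0) + 1
        self.total += 1

    def percentile(self, p: float) -> int:
        """Return the lower bound of the bucket holding the p-th percentile."""
        if self.total == 0:
            raise ValueError("empty histogram")
        target = max(1, math.ceil(self.total * p / 100))
        seen = 0
        for idx in sorted(self.counts):
            seen += self.counts[idx]
            if seen >= target:
                return bucket_bounds(idx)[0]
        raise AssertionError("unreachable: counts sum to total")
```

With these assumed parameters, a bucket is never wider than 1/16 of its lower bound, so the relative error stays bounded while the absolute bucket width grows with the value. That is why low percentiles land in narrow (or exact) buckets and high percentiles land in wide ones.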