Skip to content

Comments

Fix background thread initialization race#68

Merged
guangli-dai merged 1 commit intofacebook:devfrom
puzpuzpuz:background-thread-race
Feb 21, 2026
Merged

Fix background thread initialization race#68
guangli-dai merged 1 commit intofacebook:devfrom
puzpuzpuz:background-thread-race

Conversation

@puzpuzpuz
Copy link

The race condition happens when per-cpu arenas and background threads features are enabled:

  1. Thread A calls malloc() first -> starts malloc_init_hard()
  2. While Thread A is still in malloc_init_hard(), Threads B, C, D also call malloc()
  3. These threads trigger arena creation (with percpu_arena, each CPU gets its own arena)
  4. arena_init() -> arena_new_create_background_thread() -> background_thread_create(arena_ind=12, etc.)
  5. Thread A hasn't reached line 2232 (background_thread_create(tsd, 0)) yet
  6. So a non-zero arena can mark thread slot 0 as "started" before arena 0 does

The initialization has locks, but arena creation for different arenas can proceed in parallel as soon as malloc_init_state = initialized and init_lock is released, and the background_thread_create(tsd, 0) call happens late in malloc_init_hard() (after arenas are already being created by other threads).

This bug won't be noticeable for applications that have single-threaded initialization path, but in case of JVM it's easily reproducible. When background thread initialization silently fails due to the race, RSS grows infinitely until the process gets OOM killed.

@meta-cla
Copy link

meta-cla bot commented Jan 10, 2026

Hi @puzpuzpuz!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla
Copy link

meta-cla bot commented Jan 10, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla meta-cla bot added the cla signed label Jan 10, 2026
@puzpuzpuz puzpuzpuz force-pushed the background-thread-race branch from e52a78e to 05ebbdd Compare January 12, 2026 11:05
@puzpuzpuz
Copy link
Author

The build seems to be timing out on ARM. Not sure if that has anything to do with the patch.

@lexprfuncall
Copy link

This bug won't be noticeable for applications that have single-threaded initialization path, but in case of JVM it's easily reproducible. When background thread initialization silently fails due to the race, RSS grows infinitely until the process gets OOM killed.

Thank you for the patch! These kinds of initialization bugs can be very tricky to root cause. I saw a similar one recently where the page size at compile time did not match the page size at runtime caused a multi-threaded Python process to deadlock when it called mallctl just to see if jemalloc was in use.

I will take a closer look a the fix alongside the team next week. If possible, I would like to add a test for this. (Might require a follow-up commit.) As is, multi-threaded initialization does not have as much coverage as it deserves.

@lexprfuncall lexprfuncall self-assigned this Jan 16, 2026
@puzpuzpuz puzpuzpuz force-pushed the background-thread-race branch 2 times, most recently from e52a78e to 3c2d2da Compare January 20, 2026 10:24
@puzpuzpuz
Copy link
Author

If possible, I would like to add a test for this. (Might require a follow-up commit.) As is, multi-threaded initialization does not have as much coverage as it deserves.

I've added some multi-threaded tests in 3c2d2da. PTAL

@lexprfuncall
Copy link

lexprfuncall commented Feb 18, 2026

If possible, I would like to add a test for this. (Might require a follow-up commit.) As is, multi-threaded initialization does not have as much coverage as it deserves.

I've added some multi-threaded tests in 3c2d2da. PTAL

Thank you. The Win32 and macOS tests are failing. I think the fix might be as simple as adding wrapping your new tests in a #ifdef JEMALLOC_HAVE_BACKGROUND_THREAD. Could you give that a try? I think we can land your fix once the CI turns green.

@puzpuzpuz
Copy link
Author

Thank you. The Win32 and macOS tests are failing. I think the fix might be as simple as adding wrapping your new tests in a #ifdef JEMALLOC_HAVE_BACKGROUND_THREAD. Could you give that a try? I think we can land your fix once the CI turns green.

@lexprfuncall done in 56087ee

@guangli-dai
Copy link

Thank you for the contribution! Would you mind squashing commits 3c2d2da and 56087ee into one? Because sometimes people bisect and separating them will lead to unexpected test failures if the first commit is included but not the second one.

@puzpuzpuz puzpuzpuz force-pushed the background-thread-race branch from 56087ee to 13119bd Compare February 20, 2026 05:56
@puzpuzpuz
Copy link
Author

puzpuzpuz commented Feb 20, 2026

Thank you for the contribution! Would you mind squashing commits 3c2d2da and 56087ee into one? Because sometimes people bisect and separating them will lead to unexpected test failures if the first commit is included but not the second one.

@guangli-dai done that. PTAL

@puzpuzpuz puzpuzpuz force-pushed the background-thread-race branch from 13119bd to c05272c Compare February 20, 2026 05:58
@guangli-dai guangli-dai merged commit bee30a9 into facebook:dev Feb 21, 2026
155 checks passed
@puzpuzpuz
Copy link
Author

Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants