🐛 Bug Report: Intermittent Docker Initialization Failure on Self-Hosted Runner #4107

@dylan-class101

Description

Describe the bug

We run a self-hosted GitHub Actions runner whose tests use Testcontainers to start a Redis container. Recently, CI runs began failing intermittently with "Docker Initialize failed" errors. The error appears to be caused by the Docker daemon not being detected or not finishing initialization.

Hypothesis
The Docker daemon on the self-hosted runner may not be running, or may not have finished initializing, when a job starts.

What to Investigate / Next Steps
1. Check Docker daemon logs (journalctl -u docker or equivalent) around the time of the failure.
2. Validate that the runner service properly starts Docker before workflows begin.
3. Confirm there’s no resource contention (e.g., concurrent jobs exhausting Docker socket or system resources).
4. Check for network connectivity problems or registry rate limiting that could interrupt image pulls during container startup.
5. Consider adding a pre-check step in the workflow to confirm Docker daemon readiness before running tests.
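
As a rough illustration of step 5, the sketch below polls for Docker readiness before the tests run. It assumes the `docker` CLI is on the runner's PATH and that a successful `docker info` is an adequate readiness signal. `DockerReadiness`, `waitUntilReady`, and `dockerInfoSucceeds` are hypothetical names chosen for this example; they are not part of Testcontainers or the Actions runner, and the attempt count and delay are arbitrary.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of Testcontainers or the runner.
final class DockerReadiness {

    private DockerReadiness() {}

    /**
     * Calls the probe until it returns true or maxAttempts is reached.
     * Returns the 1-based attempt that succeeded, or -1 if none did.
     */
    static int waitUntilReady(BooleanSupplier probe, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (probe.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    /** Assumed probe: `docker info` exits with 0 only when the daemon is reachable. */
    static boolean dockerInfoSucceeds() {
        try {
            Process process = new ProcessBuilder("docker", "info")
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!process.waitFor(10, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            // Typically means the docker binary could not be started.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // 30 attempts, 1 second apart: illustrative values only.
        int attempt = waitUntilReady(DockerReadiness::dockerInfoSucceeds, 30, 1000);
        if (attempt < 0) {
            System.err.println("Docker daemon not ready after 30 attempts");
            System.exit(1);
        }
        System.out.println("Docker daemon ready on attempt " + attempt);
    }
}
```

Saved as `DockerReadiness.java`, this could run as a workflow step ahead of the test step (for example with `java DockerReadiness.java` on Java 11 or later), so that a daemon that is slow to start fails the job early with a clear message instead of surfacing as an `ExceptionInInitializerError` inside the tests.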

Environment
• GitHub Actions Self-Hosted Runner
• CI uses Testcontainers (Redis)
• Container registry: ECR Public (previously Docker Hub)
• Issue frequency: Intermittent / Non-deterministic

To Reproduce
Steps to reproduce the behavior:

  1. Run the Actions runner on Kubernetes (k8s).
  2. Initialize the Docker daemon.

Expected behavior

Docker daemon should initialize consistently and allow containers to start reliably across all runs.

Runner Version and Platform

Version of your runner?
2.328.0

OS of the machine running the runner? OSX/Windows/Linux/...
Amazon Linux

What's not working?

Please include error messages and screenshots.

Occasional "Docker Initialize failed" errors occur, seemingly without any code or configuration changes.

java.lang.ExceptionInInitializerError
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1160)
	at java.base/java.lang.reflect.Constructor.acquireConstructorAccessor(Constructor.java:549)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at java.base/java.util.Optional.orElseGet(Optional.java:364)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Caused by: java.lang.IllegalStateException: Could not find a valid Docker environment. Please see logs and check configuration
	at org.testcontainers.dockerclient.DockerClientProviderStrategy.lambda$getFirstValidStrategy$7(DockerClientProviderStrategy.java:274)
	at java.base/java.util.Optional.orElseThrow(Optional.java:403)
	at org.testcontainers.dockerclient.DockerClientProviderStrategy.getFirstValidStrategy(DockerClientProviderStrategy.java:265)
	at org.testcontainers.DockerClientFactory.getOrInitializeStrategy(DockerClientFactory.java:154)
	at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:196)
	at org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:108)
	at com.github.dockerjava.api.DockerClientDelegate.authConfig(DockerClientDelegate.java:109)
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:321)
	at net.class101.media.config.TestRedisServerConfig.<clinit>(TestRedisServerConfig.java:21)
	... 7 more

Job Log Output

If applicable, include the relevant part of the job / step log output here. All sensitive information should already be masked out, but please double-check before pasting here.

The job log only shows a test failure because Docker was not initialized.

Runner and Worker's Diagnostic Logs

If applicable, add relevant diagnostic log information. Logs are located in the runner's _diag folder. The runner logs are prefixed with Runner_ and the worker logs are prefixed with Worker_. Each job run correlates to a worker log. All sensitive information should already be masked out, but please double-check before pasting here.
