
Conversation

@jbtrystram
Member

Allow skipping the node image tests. This unblocks the node image pipeline when the QEMU artifact for the latest RHCOS build is not available.
Patch best viewed without whitespace changes.

@dustymabe
Member

the QEMU artifact for the latest RHCOS build is not available.

In what case does that happen? Did someone run with EARLY_ARCH_JOBS set?

@aaradhak
Member

aaradhak commented Dec 3, 2025

@dustymabe Looks like we have encountered this issue in the build-node-image job run today - https://jenkins-rhcos--prod-pipeline.apps.int.prod-stable-spoke1-dc-iad2.itup.redhat.com/job/build-node-image/1235/

@dustymabe
Member

dustymabe commented Dec 3, 2025

https://jenkins-rhcos--prod-pipeline.apps.int.prod-stable-spoke1-dc-iad2.itup.redhat.com/job/build/2573/parameters/ had EARLY_ARCH_JOBS set by @sdodson, which is why
https://jenkins-rhcos--prod-pipeline.apps.int.prod-stable-spoke1-dc-iad2.itup.redhat.com/job/build-node-image/1235/ ended up failing.

Rather than skipping tests here, we can simply rerun the build-node-image job later. I don't see much value in skipping tests.

@dustymabe
Member

Actually, it looks like the real problem with https://jenkins-rhcos--prod-pipeline.apps.int.prod-stable-spoke1-dc-iad2.itup.redhat.com/job/build-node-image/1235/ is that x86_64 is trying to download different images than the other arches (again, because of EARLY_ARCH_JOBS). EARLY_ARCH_JOBS isn't really the entire problem here, though; it just exposes it.

I think the real problem is that we're not enforcing that we are running the test against the same RHCOS for all arches.

We probably should enforce that we download the same RHCOS qemu as the node image is based on.

i.e. we need to update

cosa buildfetch \
--arch=$arch --artifact qemu --url=s3://${s3_dir}/builds \
--aws-config-file \${AWS_BUILD_UPLOAD_CONFIG} --find-build-for-arch
to replace --find-build-for-arch with --build=$BUILDID, where $BUILDID is the RHCOS build ID that was used to build the node image.
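
For illustration, here's a minimal sketch of what that invocation could look like, assuming a BUILDID variable has already been resolved to the RHCOS build the node image was built from (the variable name and how it gets resolved are placeholders, not something this PR defines):

# Sketch only: BUILDID is assumed to already hold the RHCOS build ID
# the node image was derived from; resolving it is not shown here.
cosa buildfetch \
    --arch=$arch --artifact qemu --url=s3://${s3_dir}/builds \
    --aws-config-file \${AWS_BUILD_UPLOAD_CONFIG} --build=$BUILDID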

@jbtrystram
Member Author

We probably should enforce that we download the same RHCOS qemu as the node image is based on.

i.e. we need to update

cosa buildfetch \
--arch=$arch --artifact qemu --url=s3://${s3_dir}/builds \
--aws-config-file \${AWS_BUILD_UPLOAD_CONFIG} --find-build-for-arch
to replace --find-build-for-arch with --build=$BUILDID, where $BUILDID is the RHCOS build ID that was used to build the node image.

The issue @Roshan-R hit when working on this is that we upload incomplete builds for RHCOS, so you'd have to parse the meta file first to find a build with all the arches.
We're rebasing to the container image for the tests, so the build we boot with does not matter too much.
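
For context, a rough sketch (not part of this patch) of the kind of lookup that would be needed, assuming the standard coreos-assembler builds.json layout where each build entry lists its arches and the newest build comes first; the bucket path, the aws CLI usage, and the arch list are illustrative:

# Sketch only: print the newest build ID from builds.json that has all of
# the arches we care about. Assumes entries look like
# {"id": "...", "arches": [...]} and are ordered newest-first.
aws s3 cp "s3://${s3_dir}/builds/builds.json" - |
  jq -r --argjson want '["x86_64","aarch64","s390x","ppc64le"]' \
    'first(.builds[] | select((.arches // []) | ($want - . == [])) | .id)'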

@dustymabe
Member

The issue @Roshan-R hit when working on this is that we upload incomplete builds for rhcos. So you'd have to parse the meta file first to find a build with all the arches.

If the build-node-image job is running, it was (typically) triggered by a release job for the RHCOS base image the node image is being derived from. If we use the same RHCOS that we used to build the node image on top of (by using buildfetch --build=<build>), then we know the build isn't incomplete (i.e. the release job wouldn't have run if it was incomplete).

We're rebasing to the container image for the tests so the build we boot with does not matter too much.

👍

@sdodson

sdodson commented Dec 4, 2025

I don't have the understanding of the pipeline that anyone else here does, but it looks like a change triggered ART automation to kick off build-node-image independent of the rhel-9.6 build, build-arch, and release job completion. The build-node-image jobs that were subsequently triggered by the successful release job were just fine. Could the build, build-arch, and release jobs relevant to downstream build-node-image jobs just hold a lock that prevents starting new instances until complete?

Leveraging the early arch builds flag is highly valuable as it trims the overall pipeline duration by at least an hour.

dustymabe added a commit to dustymabe/fedora-coreos-pipeline that referenced this pull request Dec 4, 2025
…uild

This ensures we don't somehow pick up a different base qemu image than
what we were built on. It also eliminates some awkward race conditions
where a newer in progress RHCOS build was causing node image tests to
fail. xref: coreos#1268
@dustymabe
Member

I opened #1279

dustymabe added a commit that referenced this pull request Dec 5, 2025
…uild

This ensures we don't somehow pick up a different base qemu image than
what we were built on. It also eliminates some awkward race conditions
where a newer in progress RHCOS build was causing node image tests to
fail. xref: #1268
@jbtrystram
Member Author

If the build-node-image job is running it was (typically) triggered by a release job for the RHCOS base image the node image is being derived from.

Not necessarily. ART triggers the build-node-image job multiple times a day.

If we use the same RHCOS that we used to build the node image on top of (by using buildfetch --build=<build> then we know the build isn't incomplete (i.e. the release job wouldn't have run if it was incomplete).

Yes, but back when this job was written we were allowing incomplete builds to be released, which is why we had to use --find-build-for-arch.
