Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 1, 2025

  • Explore and understand the codebase
  • Add a new notification message for consecutive failed builds (MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES) in readthedocs/projects/notifications.py
  • Create a signal receiver to check for consecutive failed builds after build completion in readthedocs/builds/signals_receivers.py
  • Add a helper method to count consecutive failed builds for the default version
  • When 50+ consecutive failures occur: attach notification to Project and disable builds (skip=True)
  • Add documentation explaining this behavior in docs/user/builds.rst
  • Add tests for the new functionality in readthedocs/builds/tests/test_signals_receivers.py
  • Run code review and fix any issues
  • Run CodeQL security check - no vulnerabilities found
  • Address review feedback:
    • Move "Automatic disabling of builds" section before "Build resources"
    • Simplify documentation text
    • Move RTD_BUILDS_MAX_CONSECUTIVE_FAILURES setting to base.py
    • Update docstring to reference the setting name
    • Keep only the main test for disabling at threshold
    • Convert signal handler to trigger a Celery task (builders don't have DB access)
    • Add comment explaining why project_slug/version_slug check is needed
    • Revert task logic to use groupby solution (the simplified version didn't work correctly)
Original prompt

This section details on the original issue you should resolve

<issue_title>Build: disable webhooks after N failed builds in a row</issue_title>
<issue_description>Analyzing the data we have, I've found that we have a bunch of projects that have been triggering builds where all of them have failed. During months. This consumes resources in our side, but also degrade the UX for valid projects since they have to wait for a build that we already know it's gonna fail, before their project's build is taken.

So, to reduce projects in this scenario, we are thinking about "auto-disable webhooks on projects that have N failed builds in a row" and communicate their maintainers. To re-enable the webhook, they would have to go to the project's admin page and re-activate/re-configure/re-add the webhook.

This is the metabase question for future reference: https://ethicalads.metabaseapp.com/question/309-project-with-all-failed-builds-in-the-last-3-months. It shows the amount of builds and minutes consumed per project.</issue_description>

<agent_instructions>We want to attach a notification to the Project when the active default version fails 50+ times to build. Besides, we want to disable the builds for this project and tell the user to contact support if it was a mistake.

Also, adds documentation about this behavior.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@agjohnson Also, it would be great to have a maintainers site notification specific to this that goes out, so we can re-use this pattern when soft-disabling a project. A hard disable (setting `Project.skip = True`) can be reserved for abusive/spam projects, but that "notification" pattern is quite broken for normal users.

Or perhaps I'm describing two potential notifications -- a specific notification and a generic notification?

We need to probably set two different thresholds here:

  1. N consecutive builds on branch/tags (lower, maybe 25) and disable the webhook completely
  2. N consecutive builds on PR builds (upper, maybe 50) and only disable PR building

I'm trying to say that "builds failing on PRs are more acceptable than failing on branch/tags and should be handled differently"</comment_new>
<comment_new>@agjohnson
Yeah, great point on PR builds. Maybe we are strictly only concerned with failed active versions?

I agree we can rethink notifications in a larger PR. If we add something here it will be using the new notification pattern anyways. Some of the old patterns can eventually be culled, and just adding a new mechanism for disabling projects here would be enough to start that process.</comment_new>
<comment_new>@humitos
@agjohnson

Maybe we are strictly only concerned with failed active versions?

It makes sense to me, yeah.</comment_new>
<comment_new>@humitos
I'd like to think about prioritizing this feature if possible because this will help us a lot to know "how many active/valid non-spam projects do we have in our platform" in an easier way. This data will be useful when migrating/deprecating things as we are doing with "building without a config file" since we will know "how many of the active/valid projects have already migrated".

Many non-spam projects will not migrate to the new configuration file just because they don't need it or have to. Mainly projects that are archived or where created just for testing/development/educational purposes, for example. I assume we have a lot of them.

The work required that I see here is:

  • check for N failed builds on the default version and delete/disable the webhook integrated with that project
  • send an onsite and email notification to all the maintainers explaining the situation
  • write some documentation explaining this behavior (probably in https://docs.readthedocs.io/en/stable/guides/connecting-git-account.html)</comment_new>
    <comment_new>@agjohnson
    Because taking automated actions on projects has been hard to get right -- spam banning, build retry -- maybe we should start fairly conservatively here. I think I'm comfortable saying that any project who has a default version that has 25 consecutive failed builds is probably not monitoring their project.

We also don't have to take any automated action yet, we could start with just a notification or drip campaign.

At least disabling the webho...


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

@read-the-docs-community
Copy link

read-the-docs-community bot commented Dec 1, 2025

Documentation build overview

📚 docs | 🛠️ Build #30553397 | 📁 Comparing 1dc405c against latest (fd3bcfd)


🔍 Preview build

Show files changed (1 files in total): 📝 1 modified | ➕ 0 added | ➖ 0 deleted
File Status
builds.html 📝 modified

Copilot AI changed the title [WIP] Add feature to disable webhooks after N failed builds Build: disable webhooks after 50 consecutive failed builds on default version Dec 1, 2025
Copilot AI requested a review from humitos December 1, 2025 11:48
Copilot finished work on behalf of humitos December 1, 2025 11:48
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with some comments. Once we address this feedback we should be ready to go.

@humitos
Copy link
Member

humitos commented Dec 1, 2025

@copilot can you work on the feedback I gave you?

Copy link
Contributor Author

Copilot AI commented Dec 1, 2025

@copilot can you work on the feedback I gave you?

All feedback addressed in commit 1b8fdcb:

  • Moved "Automatic disabling of builds" section before "Build resources"
  • Simplified documentation text
  • Moved RTD_BUILDS_MAX_CONSECUTIVE_FAILURES setting to base.py
  • Updated docstring to reference the setting name
  • Simplified tests to keep only the main threshold test

Copilot AI requested a review from humitos December 1, 2025 14:21
Copilot finished work on behalf of humitos December 1, 2025 14:21
@humitos humitos changed the title Build: disable webhooks after 50 consecutive failed builds on default version Build: disable builds after 50 consecutive failed builds on default version Dec 1, 2025
@humitos humitos marked this pull request as ready for review December 1, 2025 15:05
@humitos humitos requested review from a team as code owners December 1, 2025 15:05
Copilot AI requested a review from ericholscher December 1, 2025 16:05
Copilot finished work on behalf of ericholscher December 1, 2025 16:05
@humitos
Copy link
Member

humitos commented Dec 2, 2025

This seems like a solid improvement. I wonder if 50 is too many -- I think something like 25 is probably a more reasonable default, but we probably also want a way to have people re-enable the builds manually without emailing us? I think if they are re-engaging with the project they should probably be able to just re-enable them manually. We probably want another way to specify skip that we apply and one that the automations apply as well?

Yes to all of this 😄

I'm using 50 on purpose because they don't currently have a way to undo this mark, so I didn't want to catch projects by mistake. We can reduce to 25 or less once we have a way for users to re-enable their project.

However, I wanted to have a starting point to test this feature and grow from there.

Copilot finished work on behalf of humitos December 2, 2025 09:01
@humitos humitos requested a review from stsewd December 2, 2025 09:25
@humitos
Copy link
Member

humitos commented Dec 2, 2025

I added a field in the Project object to use instead of skip. The user can now re-enable the builds for this project from the project's details form.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and is a nice feature. Will hopefully save our servers.

@humitos humitos changed the title Build: disable builds after 50 consecutive failed builds on default version Build: disable builds after 25 consecutive failed builds on default version Dec 2, 2025
Comment on lines +38 to +39
@receiver(post_save, sender=Project)
def project_n_consecutive_failed_builds(instance, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can cancel the notification just once after the value has been changed from the form. Otherwise, we will run the cancel query every time an active project is saved (probably isn't bad, but still...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but then we need to cancel the notification if we change this value from the Django admin also, and maybe other places. I prefer to follow the same pattern we are following with .skip as it's proven it works fine.

Comment on lines +697 to +699
# Only check for the default version
if version_slug != project.get_default_version():
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wonder if there is a way to know that we are dealing with the default version, so we don't call this task for every version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is tricky because it requires access to the database, so we can't perform this from the build task.

featured = models.BooleanField(_("Featured"), default=False)

skip = models.BooleanField(_("Skip (disable) building this project"), default=False)
n_consecutive_failed_builds = models.BooleanField(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make this a more generic name so it can be used for disabling projects based on other rules (builds_disabled or builds_enabled)? The notification would show the reason to the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that was my initial thought by using Project.skip but we wanted to differentiate something we apply from automated disabled.

I think it's fine for now to have a specific field for this use case. We can refactor later if we consider that a shared field between multiple reason is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, we can also do something like skip with choices of the various reasons, but I think that's a nice to have we can do later if we add more.

@read-the-docs-community
Copy link

read-the-docs-community bot commented Dec 3, 2025

Documentation build overview

📚 dev | 🛠️ Build #30553396 | 📁 Comparing 1dc405c against latest (fd3bcfd)


🔍 Preview build

Show files changed (25 files in total): 📝 1 modified | ➕ 24 added | ➖ 0 deleted
File Status
http-routingtable.html ➕ added
index.html 📝 modified
design/apiv3.html ➕ added
design/better-doc-urls-handling.html ➕ added
design/build-images.html ➕ added
design/embed-api.html ➕ added
design/file-tree-diff.html ➕ added
design/flyout-redesign.html ➕ added
design/future-builder.html ➕ added
design/in-doc-search-ui.html ➕ added
design/index.html ➕ added
design/new-notifications-system.html ➕ added
design/new-search-api.html ➕ added
design/new-sphinx-guides.html ➕ added
design/organizations.html ➕ added
design/pr-builder.html ➕ added
design/privacy-levels.html ➕ added
design/redirects.html ➕ added
design/refactor-remote-repository.html ➕ added
design/secure-api-access-from-builders.html ➕ added
design/sphinx-jquery.html ➕ added
design/system-packages.html ➕ added
design/telemetry.html ➕ added
design/theme-context.html ➕ added
design/yaml-file.html ➕ added

@humitos humitos enabled auto-merge (squash) December 3, 2025 15:51
@github-project-automation github-project-automation bot moved this to Planned in 📍Roadmap Dec 3, 2025
@humitos humitos moved this from Planned to Needs review in 📍Roadmap Dec 3, 2025
@humitos humitos removed this from 📍Roadmap Dec 3, 2025
Comment on lines +149 to +150
# Create failures at the threshold
self._create_builds(settings.RTD_BUILDS_MAX_CONSECUTIVE_FAILURES + 1, success=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have at least one more test case, when the project has at least one successful build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build: disable webhooks after N failed builds in a row

4 participants