-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Build: disable builds after 25 consecutive failed builds on default version #12624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: humitos <[email protected]>
Co-authored-by: humitos <[email protected]>
…g itertools.groupby
humitos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with some comments. Once we address this feedback we should be ready to go.
|
@copilot can you work on the feedback I gave you? |
Co-authored-by: humitos <[email protected]>
All feedback addressed in commit 1b8fdcb:
|
Co-authored-by: humitos <[email protected]>
Yes to all of this 😄 I'm using 50 on purpose because they don't currently have a way to undo this mark, so I didn't want to catch projects by mistake. We can reduce to 25 or less once we have a way for users to re-enable their project. However, I wanted to have a starting point to test this feature and grow from there. |
Co-authored-by: humitos <[email protected]>
|
I added a field in the |
ericholscher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, and is a nice feature. Will hopefully save our servers.
| @receiver(post_save, sender=Project) | ||
| def project_n_consecutive_failed_builds(instance, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can cancel the notification just once after the value has been changed from the form. Otherwise, we will run the cancel query every time an active project is saved (probably isn't bad, but still...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but then we need to cancel the notification if we change this value from the Django admin also, and maybe other places. I prefer to follow the same pattern we are following with .skip as it's proven it works fine.
| # Only check for the default version | ||
| if version_slug != project.get_default_version(): | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wonder if there is a way to know that we are dealing with the default version, so we don't call this task for every version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is tricky because it requires access to the database, so we can't perform this from the build task.
| featured = models.BooleanField(_("Featured"), default=False) | ||
|
|
||
| skip = models.BooleanField(_("Skip (disable) building this project"), default=False) | ||
| n_consecutive_failed_builds = models.BooleanField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make this a more generic name so it can be used for disabling projects based on other rules (builds_disabled or builds_enabled)? The notification would show the reason to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that was my initial thought by using Project.skip but we wanted to differentiate something we apply from automated disabled.
I think it's fine for now to have a specific field for this use case. We can refactor later if we consider that a shared field between multiple reason is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, we can also do something like skip with choices of the various reasons, but I think that's a nice to have we can do later if we add more.
| # Create failures at the threshold | ||
| self._create_builds(settings.RTD_BUILDS_MAX_CONSECUTIVE_FAILURES + 1, success=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have at least one more test case, when the project has at least one successful build.
MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES) inreadthedocs/projects/notifications.pyreadthedocs/builds/signals_receivers.pyskip=True)docs/user/builds.rstreadthedocs/builds/tests/test_signals_receivers.pyOriginal prompt
This section details on the original issue you should resolve
<issue_title>Build: disable webhooks after N failed builds in a row</issue_title>
<issue_description>Analyzing the data we have, I've found that we have a bunch of projects that have been triggering builds where all of them have failed. During months. This consumes resources in our side, but also degrade the UX for valid projects since they have to wait for a build that we already know it's gonna fail, before their project's build is taken.
So, to reduce projects in this scenario, we are thinking about "auto-disable webhooks on projects that have N failed builds in a row" and communicate their maintainers. To re-enable the webhook, they would have to go to the project's admin page and re-activate/re-configure/re-add the webhook.
This is the metabase question for future reference: https://ethicalads.metabaseapp.com/question/309-project-with-all-failed-builds-in-the-last-3-months. It shows the amount of builds and minutes consumed per project.</issue_description>
<agent_instructions>We want to attach a notification to the
Projectwhen the active default version fails 50+ times to build. Besides, we want to disable the builds for this project and tell the user to contact support if it was a mistake.Also, adds documentation about this behavior.</agent_instructions>
Comments on the Issue (you are @copilot in this section)
@agjohnson Also, it would be great to have a maintainers site notification specific to this that goes out, so we can re-use this pattern when soft-disabling a project. A hard disable (setting `Project.skip = True`) can be reserved for abusive/spam projects, but that "notification" pattern is quite broken for normal users.Or perhaps I'm describing two potential notifications -- a specific notification and a generic notification?
<comment_new>@humitos
Yeah, I'm fine with better notifications. Maybe we want to make that part of the work on Consolidate build detail view notification patterns #9279 and Improve build error/warning suggestions #3399
We need to probably set two different thresholds here:
I'm trying to say that "builds failing on PRs are more acceptable than failing on branch/tags and should be handled differently"</comment_new>
<comment_new>@agjohnson
Yeah, great point on PR builds. Maybe we are strictly only concerned with failed active versions?
I agree we can rethink notifications in a larger PR. If we add something here it will be using the new notification pattern anyways. Some of the old patterns can eventually be culled, and just adding a new mechanism for disabling projects here would be enough to start that process.</comment_new>
<comment_new>@humitos
@agjohnson
It makes sense to me, yeah.</comment_new>
<comment_new>@humitos
I'd like to think about prioritizing this feature if possible because this will help us a lot to know "how many active/valid non-spam projects do we have in our platform" in an easier way. This data will be useful when migrating/deprecating things as we are doing with "building without a config file" since we will know "how many of the active/valid projects have already migrated".
Many non-spam projects will not migrate to the new configuration file just because they don't need it or have to. Mainly projects that are archived or where created just for testing/development/educational purposes, for example. I assume we have a lot of them.
The work required that I see here is:
<comment_new>@agjohnson
Because taking automated actions on projects has been hard to get right -- spam banning, build retry -- maybe we should start fairly conservatively here. I think I'm comfortable saying that any project who has a default version that has 25 consecutive failed builds is probably not monitoring their project.
We also don't have to take any automated action yet, we could start with just a notification or drip campaign.
At least disabling the webho...
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.