authorizations: optimize queries & cache data per request #13989
base: dev
This pull request has conflicts, please resolve those before we can evaluate the pull request.
- Resolved conflict by using refactored _bulk_delete_findings function from upstream/dev
- Preserved optimization by using get_authorized_findings_for_queryset instead of get_authorized_findings
- This maintains the queryset-based authorization filtering optimization from the branch
Conflicts have been resolved. A maintainer will review the pull request shortly.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Resolved conflicts:
- docs/content/en/open_source/upgrading/2.55.md: Kept authorization optimization content with updated weight
- unittests/test_importers_performance.py: Kept optimized query counts from authorization optimization branch
- dojo/__init__.py: Added noqa comments for RUF067 on metadata attributes
Updated expected query and async task counts using update_performance_test_counts.py script. Most tests show improvements with slight reductions in queries/tasks. Product grading tests show small increases due to upstream changes in grading logic. All tests verified passing.
Conflicts have been resolved. A maintainer will review the pull request shortly.
The ruff failures will resolve themselves after merging the ruff update PR
Maffooch
Pretty exciting what this will do for the API as well
mtesauro
Approved
Extra context: In Pro currently the per request cache is disabled, reducing the risk there.
In #6388 it was suggested to change the query approach for retrieving (per user) authorizations/authorized objects. Although that would be mostly beneficial for large MySQL-based instances, there's also non-trivial room for improvement for Postgres queries.
Summary
- Tests for the get_authorized_*() query functions (46 new tests in unittests/test_authorization_queries.py)
- Replaced EXISTS correlated subqueries with the IN (Subquery) pattern across 24 functions (~2.4x speedup)
- … Subquery in 3 functions
- Added the @cache_for_request decorator to 22 authorization functions
- Split functions with a queryset parameter into cached and uncached variants (*_for_queryset)
- … dojo/reports/views.py
Background
The Problem
Authorization queries used EXISTS with OuterRef(), which creates correlated subqueries evaluated per row, causing poor performance on large datasets.
Reference: #6388
Old Query Pattern (EXISTS)
New Query Pattern (IN with Subquery)
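A minimal sketch of the two query shapes, written as plain SQL run through Python's sqlite3 module rather than through the Django ORM. The table and column names (finding, product_member, product_id, user_id) are hypothetical and are not DefectDojo's schema. In Django terms, the first shape is what Exists(...) with OuterRef(...) typically produces, and the second is what a filter on field__in=Subquery(...) typically produces.

```python
import sqlite3

# Hypothetical two-table schema: findings belong to a product,
# and users are members of products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE finding (id INTEGER PRIMARY KEY, product_id INTEGER);
    CREATE TABLE product_member (product_id INTEGER, user_id INTEGER);
    INSERT INTO finding (id, product_id) VALUES (1, 10), (2, 10), (3, 20), (4, 30);
    INSERT INTO product_member (product_id, user_id) VALUES (10, 1), (30, 1), (20, 2);
""")

# Old shape: the subquery refers to the outer row (f.product_id),
# so it is correlated with every candidate finding.
EXISTS_SQL = """
    SELECT f.id FROM finding f
    WHERE EXISTS (
        SELECT 1 FROM product_member m
        WHERE m.product_id = f.product_id AND m.user_id = ?
    )
    ORDER BY f.id
"""

# New shape: the subquery is independent of the outer row and yields
# the set of authorized product ids, which the outer query filters on.
IN_SQL = """
    SELECT f.id FROM finding f
    WHERE f.product_id IN (
        SELECT m.product_id FROM product_member m WHERE m.user_id = ?
    )
    ORDER BY f.id
"""


def authorized_finding_ids(sql, user_id):
    """Run one of the two queries and return the matching finding ids."""
    return [row[0] for row in conn.execute(sql, (user_id,))]


exists_ids = authorized_finding_ids(EXISTS_SQL, 1)
in_ids = authorized_finding_ids(IN_SQL, 1)
print(exists_ids == in_ids)
```

Both queries return the same rows; only the plan the database chooses may differ. SQLite is used here only to keep the sketch self-contained, so it says nothing about the PostgreSQL timings reported in this PR.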
Performance Results
Tested on PostgreSQL with ~195,000 findings, using DISCARD ALL between runs to ensure a fair cache state comparison.
Consistent 2.3-2.5x speedup across all test runs.
The improvement comes from:
Caching
Added the @cache_for_request decorator to authorization functions. This caches query results for the duration of a single HTTP request, eliminating redundant database queries when the same authorization check is called multiple times.
Functions with Direct Caching (16)
These functions do not accept a queryset parameter and are directly cached:
- dojo/engagement/queries.py: get_authorized_engagements
- dojo/product_type/queries.py: get_authorized_product_types
- dojo/product/queries.py: get_authorized_products, get_authorized_app_analysis, get_authorized_dojo_meta, get_authorized_languages, get_authorized_engagement_presets, get_authorized_product_api_scan_configurations
- dojo/test/queries.py: get_authorized_tests, get_authorized_test_imports
- dojo/risk_acceptance/queries.py: get_authorized_risk_acceptances
- dojo/jira_link/queries.py: get_authorized_jira_projects, get_authorized_jira_issues
- dojo/tool_product/queries.py: get_authorized_tool_product_settings
- dojo/group/queries.py: get_authorized_groups
- dojo/finding/queries.py: get_authorized_stub_findings
Functions Split into Cached + Uncached (6)
Functions with a queryset parameter were split to support both use cases:
- dojo/finding/queries.py: get_authorized_findings() (cached), get_authorized_findings_for_queryset() (uncached)
- dojo/finding/queries.py: get_authorized_vulnerability_ids() (cached), get_authorized_vulnerability_ids_for_queryset() (uncached)
- dojo/endpoint/queries.py: get_authorized_endpoints() (cached), get_authorized_endpoints_for_queryset() (uncached)
- dojo/endpoint/queries.py: get_authorized_endpoint_status() (cached), get_authorized_endpoint_status_for_queryset() (uncached)
- dojo/finding_group/queries.py: get_authorized_finding_groups() (cached), get_authorized_finding_groups_for_queryset() (uncached)
- dojo/cred/queries.py: get_authorized_cred_mappings() (cached), get_authorized_cred_mappings_for_queryset() (uncached)
Expected Caching Benefits
In a typical finding list page request, authorization functions may be called multiple times:
- get_authorized_findings
- get_authorized_products
The cache is automatically cleared at the end of each HTTP request, ensuring data freshness. This is a pre-existing cache mechanism already used for some of the authorization query results.
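The idea of per-request caching, together with the cached/uncached split, can be sketched roughly as follows. This is not DefectDojo's @cache_for_request implementation: cache_for_request_sketch, end_of_request, the module-level dictionary, and the toy authorization data are all hypothetical stand-ins, and a real application would more likely keep the cache in request-scoped state managed by middleware.

```python
import functools

# Hypothetical stand-in for request-scoped storage.
_request_cache = {}


def cache_for_request_sketch(func):
    """Memoize func by its (hashable) arguments until end_of_request() runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (func.__qualname__, args, tuple(sorted(kwargs.items())))
        if key not in _request_cache:
            _request_cache[key] = func(*args, **kwargs)
        return _request_cache[key]
    return wrapper


def end_of_request():
    """Drop all cached results, as would happen when a request finishes."""
    _request_cache.clear()


# Toy data: which ids a user may see for a given permission.
ALL_IDS = (1, 2, 3, 4)
AUTHORIZED = {("alice", "view"): {1, 3}}
query_log = []  # records each simulated database hit


def authorized_ids_for_queryset(permission, candidate_ids, user="alice"):
    """Uncached variant: filters whatever candidate set the caller passes in."""
    query_log.append(permission)
    allowed = AUTHORIZED.get((user, permission), set())
    return [i for i in candidate_ids if i in allowed]


@cache_for_request_sketch
def authorized_ids(permission, user="alice"):
    """Cached variant: always starts from the full set, so results are reusable."""
    return tuple(authorized_ids_for_queryset(permission, ALL_IDS, user))


authorized_ids("view")  # first call runs the simulated query
authorized_ids("view")  # second call is served from the cache
print(len(query_log))
end_of_request()
```

The split matters because a caller-supplied queryset is neither a stable nor a cheap cache key, so only the variant with fixed, hashable arguments is memoized in this sketch.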
Performance Test Query Count Changes
Total query reduction: 5-10 queries saved per import/reimport operation due to request-level caching of authorization queries.
Tests
All existing and new tests pass:
- unittests.test_authorization_queries - 46 tests ✓
- unittests.authorization.test_authorization.TestAuthorization - 52 tests ✓
- unittests.test_rest_framework.FindingsTest - 24 tests ✓
- unittests.test_rest_framework.ProductTest - 18 tests ✓