First attempt to the ACS onboarding #100
p-rog wants to merge 48 commits into validatedpatterns:main from
Conversation
I had to fix the ACS init secret issue:
- Fix indentation in values-hub.yaml (stackrox namespace)
- Comment out acs-init-bundle secret (not needed for same-cluster deployment)
- RHACS operator auto-generates auth for co-located Central + SecuredCluster

Fixes vault namespace deployment issue.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The secret issue is fixed.
…ndle and integration
This commit resolves two critical issues preventing the ACS Central and SecuredCluster Custom Resources from being deployed:

1. Uncommented extraValueFiles for the acs-central and acs-secured-cluster applications in values-hub.yaml. This enables the Helm charts to receive global configuration values (localClusterDomain, secretStore, etc.) required for proper template rendering.
2. Added an ExternalSecret template for the central-htpasswd admin password. This syncs the admin password from Vault (hub/infra/acs) to the Kubernetes secret expected by the Central CR.

With these fixes, ArgoCD will successfully render and deploy:
- Central CR (Wave 10) with PostgreSQL DB and Scanner components
- Init bundle job (Wave 12) to generate TLS secrets
- OAuth integration job (Wave 13) for OpenShift authentication
- SecuredCluster CR (Wave 15) with Sensor, Collector, and Admission Controller

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
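A sketch of what such an ExternalSecret template might look like, assuming the External Secrets Operator v1beta1 API; the store name, refresh interval, and remoteRef property are illustrative rather than taken from the chart:

```yaml
# Hypothetical ExternalSecret syncing the admin password from Vault
# (hub/infra/acs) into the secret the Central CR expects.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: central-htpasswd
  namespace: stackrox
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend        # assumption: the pattern's ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: central-htpasswd     # secret name the Central CR reads
  data:
    - secretKey: password
      remoteRef:
        key: hub/infra/acs     # Vault path named in the commit message
        property: password     # assumed property name
```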
… the central-cr.yaml and secured-cluster-cr.yaml, removing the perNode duplication, adding explicit scannerV4 configuration to central-cr.yaml
The cluster only has ACM release-2.15 channel available. Changed from release-2.14 to release-2.15 to fix subscription failure. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Two critical fixes to resolve ArgoCD manifest generation errors:

1. Fixed acs-central chart: removed Helm template syntax from a comment in create-cluster-init-bundle.yaml line 4. Helm parses template syntax even in comments, causing an 'invalid value; expected string' error at column 98.
2. Fixed acs-secured-cluster chart: removed quotes from the clusterName override value in values-hub.yaml. The quoted template syntax caused a 'key } has no value' error because ArgoCD was passing literal curly braces to the helm --set command.

These fixes allow both ACS applications to render manifests correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
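The comment problem stems from Helm evaluating {{ ... }} actions anywhere in a template file, including inside YAML # comments. A minimal illustration of the failure mode and the safe alternative (Helm's own comment syntax, which is stripped before parsing); the comment text itself is invented for the example:

```yaml
# BROKEN: even inside a YAML comment, Helm evaluates the action and can
# fail with 'invalid value; expected string':
#   init bundle named after {{ .Values.clusterName }}

{{- /* SAFE: Helm comments are discarded before template evaluation,
       so .Values.clusterName can be mentioned freely here. */}}
```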
Fixed nil pointer error in ExternalSecret template by adding default
secretStore configuration to values.yaml.
Error: 'nil pointer evaluating interface {}.name'
Root cause: global.secretStore.name and global.secretStore.kind were
undefined, causing ExternalSecret template to fail.
Solution: Added default values matching validated patterns convention:
- secretStore.name: vault-backend
- secretStore.kind: ClusterSecretStore
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
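In values.yaml terms, the fix amounts to providing fallbacks so the ExternalSecret template never dereferences a nil map. A sketch, where only the two key/value pairs come from the commit message:

```yaml
# Defaults added to the chart's values.yaml, matching the
# validated patterns convention:
secretStore:
  name: vault-backend
  kind: ClusterSecretStore
```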
Changed Vault secret path from 'hub/infra/acs' to 'hub/infra/acs/acs-central'
to match the actual location where validated patterns framework stores
the secret.
Root cause: Framework creates secrets at {vaultPrefixes}/{name} which
results in hub/infra/acs/acs-central, not hub/infra/acs.
This fixes the error: 'Secret does not exist at hub/infra/acs'
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
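Illustrated on the ExternalSecret's remoteRef (the field layout is assumed; only the two paths are from the commit):

```yaml
# Before -- fails with 'Secret does not exist at hub/infra/acs':
#   remoteRef:
#     key: hub/infra/acs
# After -- the framework writes secrets to {vaultPrefixes}/{name}:
remoteRef:
  key: hub/infra/acs/acs-central
```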
Disabled both scanner V3 and V4 to reduce resource requirements. This allows Central to deploy on resource-constrained clusters. Scanners can be re-enabled later when more resources are available. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This reverts commit 906d5c4.
…e to the central-htpasswd secret
…mplate labels
- Changed adminPasswordSecretRef to adminPasswordSecret (correct API field)
- Added labels to the create-cluster-init-bundle Job template (required by Kubernetes)
- Fixes authentication error preventing init-bundle generation
- Add create-htpasswd-field job to automatically generate a bcrypt htpasswd entry from the plain password in the central-htpasswd secret (sync-wave 6)
- Modify create-cluster-init-bundle job to:
  * Check for existing init bundles with the same cluster name
  * Delete the existing bundle before creating a new one
  * Validate that the API response contains kubectlBundle before attempting to apply
- Fixes authentication issues and init bundle conflicts
- Replace heredoc with printf for the Python script (a heredoc inside a YAML literal block causes parse errors)
- Fix quote escaping in Python one-liners (use single quotes for outer, double for inner)
- Ensures YAML parses correctly in ArgoCD
- Remove output redirection to /dev/null to make errors visible
- Add progress messages to help debug installation issues
- Change image registry from registry.redhat.io to registry.access.redhat.com
- Remove the Sync hook annotation to prevent blocking ArgoCD sync
- The httpd-tools package is available in the ubi-9-appstream-rpms repository
- Bcrypt generates a different hash each time due to the random salt
- Change the logic to check whether a valid bcrypt htpasswd entry exists (starts with admin:$2[aby]$)
- This makes the job idempotent - it exits successfully if a valid entry already exists
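The idempotency check can be sketched as a shell step inside the job container; the secret name, namespace, and surrounding structure are assumptions:

```yaml
command:
  - /bin/bash
  - -c
  - |
    # Re-running bcrypt would produce a new hash every time (random salt),
    # so exit early if the secret already holds a valid entry.
    existing=$(oc get secret central-htpasswd -n stackrox \
      -o jsonpath='{.data.htpasswd}' | base64 -d)
    if echo "$existing" | grep -qE '^admin:\$2[aby]\$'; then
      echo "Valid bcrypt htpasswd entry already present; nothing to do."
      exit 0
    fi
    # ...otherwise generate a new entry and patch the secret...
```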
Root cause analysis revealed three critical issues:
1. The UBI9 base image lacks the kubectl binary
2. The container runs as non-root (UID 1000810000) due to the OpenShift SCC
3. httpd-tools cannot be installed with dnf (requires root privileges)

Solution:
- Use the OpenShift CLI image (has oc/kubectl and python3)
- Replace the htpasswd command with Python's crypt module
- Python crypt.METHOD_BLOWFISH generates valid bcrypt hashes
- Change kubectl to oc (both work; oc is native to the image)
- Set imagePullPolicy to Always for the internal registry

Tested successfully:
- Python crypt generates valid bcrypt: admin:$2b$12$...
- The OpenShift CLI image runs without privilege issues
- The job is now idempotent and works under the restricted SCC

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem:
- The Central CR template was only using central.resources from values.yaml
- Database and Scanner V4 resources were using operator defaults (too high)
- central-db was requesting 4 CPU (exceeds the 3.5 CPU node capacity)
- Scanner V4 DB was requesting 1 CPU
- Pods couldn't be scheduled on standard cluster nodes

Solution:
1. Updated the central-cr.yaml template to include ALL resource specs:
   - central.db.resources
   - scannerV4.indexer.resources, replicas, autoscaling
   - scannerV4.matcher.resources, replicas
   - scannerV4.db.resources
   - Persistence configurations for the databases
2. Reduced resource requirements in values.yaml:
   - central-db: CPU limit 2000m -> 500m
   - scanner-indexer: 1000m/1.5Gi -> 500m/1Gi
   - scanner-matcher: 500m -> 250m CPU
   - scanner-v4-db: CPU limit 2000m -> 500m

New total resource requests: ~1.75 CPU / ~7.7Gi. This fits on nodes with 3.5 CPU / 12Gi (50% CPU, 64% memory); previous requests were ~6+ CPU, exceeding node capacity.

Tested with helm template - generates a correct Central CR with all resource specifications properly configured.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
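The resulting values might look like the following sketch; only the limit and request figures named in the commit are from the source, the rest of the layout is illustrative:

```yaml
central:
  db:
    resources:
      limits:
        cpu: 500m        # was 2000m
scannerV4:
  indexer:
    resources:
      requests:
        cpu: 500m        # was 1000m
        memory: 1Gi      # was 1.5Gi
  matcher:
    resources:
      requests:
        cpu: 250m        # was 500m
  db:
    resources:
      limits:
        cpu: 500m        # was 2000m
```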
The RHACS operator doesn't support explicit persistence configuration for central.db and scannerV4.db - it manages these PVCs automatically. Removing the persistence config prevents reconciliation errors: 'Failed reconciling PVC "central-db". Please remove the storageClassName and size properties from your spec' Keeping only resource specifications for databases, which is supported. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Error: the Central CRD does not support:
- spec.scannerV4.indexer.autoscaling (should be 'scaling')
- spec.scannerV4.indexer.replicas (should be under 'scaling')
- spec.scannerV4.matcher.replicas (should be under 'scaling')

Fixed:
- Changed 'autoscaling' to 'scaling' for the indexer
- Changed 'status: Enabled' to 'autoScaling: Enabled'
- Moved replicas under the 'scaling' section for both indexer and matcher
- Used the correct Central CRD API structure per kubectl explain

Tested with helm template - generates the correct Central CR structure.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
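The corrected shape of the scannerV4 section in the Central CR, per the field names in the commit (the replica counts are illustrative):

```yaml
spec:
  scannerV4:
    indexer:
      scaling:
        autoScaling: Enabled   # was 'status: Enabled' under 'autoscaling'
        replicas: 1            # moved under 'scaling'
    matcher:
      scaling:
        replicas: 1            # moved under 'scaling'
```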
Root Cause:
- Template used: global.clusterName (doesn't exist)
- Pattern provides: global.localClusterName = 'cluster-98djk'
- Result: CLUSTER_NAME was empty string
- ACS API rejected with: 'invalid init bundle name'
Error in pod logs:
'generating new init bundle: invalid init bundle name'
API response: code 13, message about invalid name
Fix:
- Changed template from global.clusterName to global.localClusterName
- Updated comment in values.yaml to reflect correct variable
- Tested: CLUSTER_NAME now correctly evaluates to 'cluster-98djk'
The pattern framework always provides global.localClusterName, not
global.clusterName. The acs-secured-cluster chart uses an explicit
clusterName override ('hub'), but acs-central needs to use the
global.localClusterName variable.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
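The one-line template change, shown in context; the surrounding env-var usage is an assumed example, not the chart's actual line:

```yaml
# Before: global.clusterName is never set, so this renders as an empty
# string and the ACS API rejects the init bundle name:
#   value: {{ .Values.global.clusterName }}
# After: the pattern framework always provides localClusterName:
- name: CLUSTER_NAME
  value: {{ .Values.global.localClusterName }}
```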
OK, I fixed all the errors and issues.
Ready for review :)
The openid scope is mandatory for OIDC authentication. Added scope definition and included it in realm default scopes and ACS client configuration. Also moved offline_access to optional scopes for ACS client. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Changed OIDC mode from "auto" to "query" to use the standard authorization code flow
- Added the offline_access role to the admin user to allow offline token requests
- Prevents "code already used" and "offline tokens not allowed" errors

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add retry loop to wait for Keycloak OIDC discovery endpoint to be available before attempting to create the auth provider. This prevents 404 errors when ACS tries to validate the OIDC configuration during provider creation. Fixes timing issue where create-auth-provider job runs before Keycloak realm is fully imported and ready. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
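The retry loop could be sketched like this inside the job script. The endpoint follows the standard OIDC discovery path (/.well-known/openid-configuration); the hostname variables and retry limits are assumptions:

```yaml
- |
  DISCOVERY_URL="https://${KEYCLOAK_HOST}/realms/${REALM}/.well-known/openid-configuration"
  # Wait until Keycloak serves its discovery document before asking ACS
  # to validate the OIDC configuration, avoiding transient 404s.
  for i in $(seq 1 30); do
    code=$(curl -sk -o /dev/null -w '%{http_code}' "$DISCOVERY_URL")
    [ "$code" = "200" ] && break
    echo "Keycloak discovery not ready (HTTP $code), retry $i/30..."
    sleep 10
  done
```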
- Show the Keycloak discovery endpoint response before creating the provider
- Capture and display HTTP status codes for all API calls
- Show full response bodies for debugging
- Better error messages with HTTP codes

This will help diagnose issues with auth provider creation and role mapping.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add "roles": "roles" to claimMappings so ACS knows to look for the roles claim in the OIDC token. Without this, ACS cannot map Keycloak roles to ACS roles, resulting in "no valid role" error. This is the critical fix for role-based authorization with Keycloak OIDC. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
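In the auth provider configuration, the change amounts to one extra mapping entry; shown here as a YAML sketch, where everything except the claimMappings line is illustrative:

```yaml
config:
  mode: query        # per the earlier OIDC mode change
claimMappings:
  roles: roles       # tell ACS which OIDC token claim carries Keycloak roles
```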
An update: I fixed all of the ACS-Keycloak OIDC integration issues.

Why: ACS can now be automatically deployed as part of the layered-zero-trust pattern, and by default it uses Keycloak OIDC authentication. Let me know if I should document how the Keycloak integration works in the ACS deployment workflow.
Red Hat Advanced Cluster Security (RHACS/StackRox) consists of two main deployment types:

Central Services (Hub Cluster)
- Central: the management hub, exposing the UI and API and storing policy and cluster data.
- Scanner: analyzes container images for known vulnerabilities.

Secured Cluster Services (Per Cluster)
- Sensor: the per-cluster agent that communicates with Central.
- Admission Controller: enforces security policy at workload admission time.
- Collector: gathers runtime activity (process and network data) from each node.
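Mapped onto the operator's CRs, the split above corresponds roughly to these two resources. A minimal sketch: the clusterName value follows the explicit 'hub' override mentioned earlier in this PR, and the resource names are assumptions:

```yaml
apiVersion: platform.stackrox.io/v1alpha1
kind: Central            # hub cluster: Central + Scanner
metadata:
  name: stackrox-central-services
  namespace: stackrox
---
apiVersion: platform.stackrox.io/v1alpha1
kind: SecuredCluster     # per cluster: Sensor, Admission Controller, Collector
metadata:
  name: stackrox-secured-cluster-services
  namespace: stackrox
spec:
  clusterName: hub
```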