
First attempt to the ACS onboarding #100

Open
p-rog wants to merge 48 commits into validatedpatterns:main from p-rog:acs-onboarding

Conversation


p-rog commented Feb 23, 2026

Red Hat Advanced Cluster Security (RHACS/StackRox) consists of two main deployment types:

Central Services (Hub Cluster)

Central:

  • Management console and API server
  • Policy engine and enforcement
  • Centralized data aggregation
  • Vulnerability database management

Scanner:

  • Vulnerability scanning for container images
  • Pulls image layers from registries
  • Identifies installed packages
  • Compares against CVE databases

Secured Cluster Services (Per Cluster)

Sensor:

  • Monitors cluster activity
  • Listens to Kubernetes API events
  • Collects data from Collectors
  • Reports cluster state to Central

Admission Controller:

  • Policy enforcement at deployment time
  • Validates resources before admission
  • Prevents policy violations
  • Configurable bypass options

Collector:

  • Per-node DaemonSet deployment
  • Runtime monitoring and network activity
  • Container activity analysis
  • Sends data to Sensor
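The split above maps to two custom resources reconciled by the RHACS operator. A minimal sketch, assuming the usual `stackrox` namespace and default resource names:

```yaml
# Hub cluster: Central services (minimal illustrative spec;
# the operator fills in defaults for omitted fields)
apiVersion: platform.stackrox.io/v1alpha1
kind: Central
metadata:
  name: stackrox-central-services
  namespace: stackrox
spec: {}
---
# Every secured cluster: Sensor, Admission Controller, Collector
apiVersion: platform.stackrox.io/v1alpha1
kind: SecuredCluster
metadata:
  name: stackrox-secured-cluster-services
  namespace: stackrox
spec:
  clusterName: hub   # the name this cluster reports to Central
```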

p-rog marked this pull request as draft February 23, 2026 16:27

p-rog commented Feb 23, 2026

I have to fix the ACS init secret issue:

  1. Init bundle can ONLY be generated AFTER ACS Central is deployed and running
  2. The Validated Patterns framework processes ALL secrets BEFORE deploying applications
  3. With onMissingValue: error, installation fails if the secret doesn't exist in Vault
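The failing setup can be sketched as a values-secret entry roughly like this (field names are illustrative, assuming the v2 values-secret format):

```yaml
version: "2.0"
secrets:
  - name: acs-init-bundle         # processed by the framework before any app deploys
    vaultPrefixes:
      - hub
    fields:
      - name: init-bundle
        onMissingValue: error     # aborts the install when the value is absent
```

Since the bundle can only be produced by a running Central, the secret can never exist at load time, so the install always fails.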

Przemyslaw Roguski and others added 2 commits February 23, 2026 20:17
- Fix indentation in values-hub.yaml (stackrox namespace)
- Comment out acs-init-bundle secret (not needed for same-cluster deployment)
- RHACS operator auto-generates auth for co-located Central + SecuredCluster

Fixes vault namespace deployment issue.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

p-rog commented Feb 23, 2026

The secret issue is fixed.
I'm working on Vault service creation issue.

Przemyslaw Roguski and others added 24 commits February 24, 2026 13:34
This commit resolves two critical issues preventing ACS Central and
SecuredCluster Custom Resources from being deployed:

1. Uncommented extraValueFiles for acs-central and acs-secured-cluster
   applications in values-hub.yaml. This enables helm charts to receive
   global configuration values (localClusterDomain, secretStore, etc.)
   required for proper template rendering.

2. Added ExternalSecret template for central-htpasswd admin password.
   This syncs the admin password from Vault (hub/infra/acs) to the
   Kubernetes secret expected by the Central CR.

With these fixes, ArgoCD will successfully render and deploy:
- Central CR (Wave 10) with PostgreSQL DB and Scanner components
- Init bundle job (Wave 12) to generate TLS secrets
- OAuth integration job (Wave 13) for OpenShift authentication
- SecuredCluster CR (Wave 15) with Sensor, Collector, and Admission Controller

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
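The wave ordering relies on ArgoCD's sync-wave annotation; an illustrative excerpt:

```yaml
apiVersion: platform.stackrox.io/v1alpha1
kind: Central
metadata:
  name: stackrox-central-services
  annotations:
    argocd.argoproj.io/sync-wave: "10"   # synced before the wave-12 init-bundle Job
```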
… the central-cr.yaml and secured-cluster-cr.yaml, removing the perNode duplication, adding explicit scannerV4 configuration to central-cr.yaml
The cluster only has ACM release-2.15 channel available.
Changed from release-2.14 to release-2.15 to fix subscription failure.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Two critical fixes to resolve ArgoCD manifest generation errors:

1. Fixed acs-central chart: Removed Helm template syntax from comment
   in create-cluster-init-bundle.yaml line 4. Helm parses template
   syntax even in comments, causing 'invalid value; expected string'
   error at column 98.

2. Fixed acs-secured-cluster chart: Removed quotes from clusterName
   override value in values-hub.yaml. The quoted template syntax
   caused 'key } has no value' error because ArgoCD was passing
   literal curly braces to helm --set command.

These fixes allow both ACS applications to render manifests correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
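For context on fix 1: Helm renders template actions before the output is parsed as YAML, so `{{ ... }}` is evaluated even inside a `#` comment. A sketch of the broken and safe forms:

```yaml
# BROKEN: Helm still evaluates the action inside this YAML comment:
#   init bundle for {{ .Values.clusterName }}
{{- /* SAFE: Helm template comments are stripped before rendering */ -}}
```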
Fixed nil pointer error in ExternalSecret template by adding default
secretStore configuration to values.yaml.

Error: 'nil pointer evaluating interface {}.name'
Root cause: global.secretStore.name and global.secretStore.kind were
undefined, causing ExternalSecret template to fail.

Solution: Added default values matching validated patterns convention:
- secretStore.name: vault-backend
- secretStore.kind: ClusterSecretStore

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
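The added defaults can be sketched as a values.yaml fragment (the exact key path is an assumption based on the commit message):

```yaml
# Chart values.yaml: defaults so the ExternalSecret template
# never dereferences a nil secretStore object
secretStore:
  name: vault-backend          # validated patterns convention
  kind: ClusterSecretStore
```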
Changed Vault secret path from 'hub/infra/acs' to 'hub/infra/acs/acs-central'
to match the actual location where validated patterns framework stores
the secret.

Root cause: Framework creates secrets at {vaultPrefixes}/{name} which
results in hub/infra/acs/acs-central, not hub/infra/acs.

This fixes the error: 'Secret does not exist at hub/infra/acs'

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Disabled both scanner V3 and V4 to reduce resource requirements.
This allows Central to deploy on resource-constrained clusters.
Scanners can be re-enabled later when more resources are available.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…mplate labels

- Changed adminPasswordSecretRef to adminPasswordSecret (correct API field)
- Added labels to create-cluster-init-bundle Job template (required by Kubernetes)
- Fixes authentication error preventing init-bundle generation
- Add create-htpasswd-field job to automatically generate bcrypt htpasswd entry
  from the plain password in central-htpasswd secret (sync-wave 6)
- Modify create-cluster-init-bundle job to:
  * Check for existing init bundles with the same cluster name
  * Delete existing bundle before creating new one
  * Validate API response contains kubectlBundle before attempting to apply
- Fixes authentication issues and init bundle conflicts
- Replace heredoc with printf for Python script (heredoc inside YAML literal block causes parse errors)
- Fix quote escaping in Python one-liners (use single quotes for outer, double for inner)
- Ensures YAML parses correctly in ArgoCD
- Remove output redirection to /dev/null to make errors visible
- Add progress messages to help debug installation issues
- Change image registry from registry.redhat.io to registry.access.redhat.com
- Remove Sync hook annotation to prevent blocking ArgoCD sync
- httpd-tools package is available in ubi-9-appstream-rpms repository
- Bcrypt generates different hashes each time due to random salt
- Change logic to check if valid bcrypt htpasswd entry exists (starts with admin:$2[aby]$)
- This makes the job idempotent - exits successfully if valid entry already exists
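The idempotency check described above can be sketched like this (`ENTRY` is a hypothetical stand-in for the value read from the central-htpasswd secret):

```shell
# Hypothetical sketch: skip regeneration when the secret already holds a
# valid bcrypt htpasswd entry ($2a$, $2b$, or $2y$ prefix after "admin:").
ENTRY='admin:$2b$12$N9qo8uLOickgx2ZMRZoMye'   # example value from the secret

if printf '%s' "$ENTRY" | grep -qE '^admin:\$2[aby]\$'; then
  echo "valid bcrypt entry present, nothing to do"
else
  echo "generating new htpasswd entry"
fi
```

Because bcrypt salts are random, comparing hashes directly can never confirm idempotence; checking the entry's shape is the only stable test.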
Root cause analysis revealed three critical issues:
1. UBI9 base image lacks kubectl binary
2. Container runs as non-root (UID 1000810000) due to OpenShift SCC
3. Cannot install httpd-tools with dnf (requires root privileges)

Solution:
- Use OpenShift CLI image (has oc/kubectl and python3)
- Replace htpasswd command with Python's crypt module
- Python crypt.METHOD_BLOWFISH generates valid bcrypt hashes
- Change kubectl to oc (both work, oc is native to image)
- Set imagePullPolicy to Always for internal registry

Tested successfully:
- Python crypt generates valid bcrypt: admin:$2b$12$...
- OpenShift CLI image runs without privilege issues
- Job is now idempotent and works in restricted SCC

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
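Put together, the Job skeleton implied by these fixes looks roughly like this (the image path, names, and command are assumptions, not the chart's actual manifest):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-htpasswd-field
  namespace: stackrox
spec:
  template:
    metadata:
      labels:                      # Kubernetes requires labels on the pod template
        job-name: create-htpasswd-field
    spec:
      restartPolicy: Never
      containers:
        - name: htpasswd
          # OpenShift CLI image: ships oc/kubectl and python3, runs as
          # non-root under the restricted SCC without extra packages
          image: image-registry.openshift-image-registry.svc:5000/openshift/cli:latest  # assumed path
          imagePullPolicy: Always   # required for the internal registry
          command: ["/bin/bash", "-c", "echo 'generate bcrypt entry with python3, apply with oc'"]
```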
Przemyslaw Roguski and others added 5 commits February 27, 2026 11:25
Problem:
- Central CR template was only using central.resources from values.yaml
- Database and Scanner V4 resources were using operator defaults (too high)
- central-db was requesting 4 CPU (exceeds 3.5 CPU node capacity)
- Scanner V4 DB was requesting 1 CPU
- Pods couldn't be scheduled on standard cluster nodes

Solution:
1. Updated central-cr.yaml template to include ALL resource specs:
   - central.db.resources
   - scannerV4.indexer.resources, replicas, autoscaling
   - scannerV4.matcher.resources, replicas
   - scannerV4.db.resources
   - Persistence configurations for databases

2. Reduced resource requirements in values.yaml:
   - central-db: CPU limit 2000m -> 500m
   - scanner-indexer: 1000m/1.5Gi -> 500m/1Gi
   - scanner-matcher: 500m -> 250m CPU
   - scanner-v4-db: CPU limit 2000m -> 500m

New total resource requests: ~1.75 CPU / ~7.7Gi
Fits on nodes with: 3.5 CPU / 12Gi (50% CPU, 64% memory)

Previous requests were: ~6+ CPU (exceeded node capacity)

Tested with helm template - generates correct Central CR with all
resource specifications properly configured.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
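The reductions translate into a values.yaml fragment along these lines (the key layout is an assumption about this chart):

```yaml
central:
  db:
    resources:
      limits:
        cpu: 500m        # was 2000m
scannerV4:
  indexer:
    resources:
      requests:
        cpu: 500m        # was 1000m
        memory: 1Gi      # was 1.5Gi
  matcher:
    resources:
      requests:
        cpu: 250m        # was 500m
  db:
    resources:
      limits:
        cpu: 500m        # was 2000m
```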
The RHACS operator doesn't support explicit persistence configuration
for central.db and scannerV4.db - it manages these PVCs automatically.

Removing the persistence config prevents reconciliation errors:
'Failed reconciling PVC "central-db". Please remove the storageClassName
and size properties from your spec'

Keeping only resource specifications for databases, which is supported.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Error: Central CRD does not support:
- spec.scannerV4.indexer.autoscaling (should be 'scaling')
- spec.scannerV4.indexer.replicas (should be under 'scaling')
- spec.scannerV4.matcher.replicas (should be under 'scaling')

Fixed:
- Changed 'autoscaling' to 'scaling' for indexer
- Changed 'status: Enabled' to 'autoScaling: Enabled'
- Moved replicas under 'scaling' section for both indexer and matcher
- Used correct Central CRD API structure per kubectl explain

Tested with helm template - generates correct Central CR structure.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
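The corrected scannerV4 structure can be sketched as follows (replica counts are illustrative):

```yaml
scannerV4:
  indexer:
    scaling:
      autoScaling: Enabled   # was 'status: Enabled' under 'autoscaling'
      replicas: 2            # moved under 'scaling'
  matcher:
    scaling:
      replicas: 1            # moved under 'scaling'
```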
Root Cause:
- Template used: global.clusterName (doesn't exist)
- Pattern provides: global.localClusterName = 'cluster-98djk'
- Result: CLUSTER_NAME was empty string
- ACS API rejected with: 'invalid init bundle name'

Error in pod logs:
  'generating new init bundle: invalid init bundle name'
  API response: code 13, message about invalid name

Fix:
- Changed template from global.clusterName to global.localClusterName
- Updated comment in values.yaml to reflect correct variable
- Tested: CLUSTER_NAME now correctly evaluates to 'cluster-98djk'

The pattern framework always provides global.localClusterName, not
global.clusterName. The acs-secured-cluster chart uses an explicit
clusterName override ('hub'), but acs-central needs to use the
global.localClusterName variable.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
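The one-line template fix can be sketched like this (the surrounding env-var context is illustrative):

```yaml
env:
  - name: CLUSTER_NAME
    # before: {{ .Values.global.clusterName }} — undefined, rendered as ""
    value: {{ .Values.global.localClusterName | quote }}
```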

p-rog commented Feb 27, 2026

OK, I fixed all the errors and issues:

  • create-htpasswd-field Job (commit 4b2bae6)

  • Corrected the resources configuration and reduced resource requests for testing (commit 2e7d6d4)

  • Fixed the cluster init bundle creation, which used an incorrect variable (commit 661db58)

Final results:

  • All ACS components running successfully
  • Central route accessible
  • Secured cluster components deployed
  • Init bundle created with the correct cluster name
  • ArgoCD applications healthy

Ready for review :)

(Just a side note: the Keycloak integration doesn't work yet; I've already tested it in a new branch and will merge it here once I've confirmed everything works fine.)

p-rog marked this pull request as ready for review February 27, 2026 17:19
Przemyslaw Roguski and others added 15 commits March 2, 2026 11:55
The openid scope is mandatory for OIDC authentication. Added scope definition
and included it in realm default scopes and ACS client configuration.
Also moved offline_access to optional scopes for ACS client.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Changed OIDC mode from "auto" to "query" to use standard authorization code flow
- Added offline_access role to admin user to allow offline token requests
- Prevents "code already used" and "offline tokens not allowed" errors

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add retry loop to wait for Keycloak OIDC discovery endpoint to be available
before attempting to create the auth provider. This prevents 404 errors when
ACS tries to validate the OIDC configuration during provider creation.

Fixes timing issue where create-auth-provider job runs before Keycloak
realm is fully imported and ready.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
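The retry loop can be sketched like this (the URL, realm, and retry budget are placeholders; the real job derives them from the pattern's values):

```shell
# Poll the Keycloak OIDC discovery endpoint until it answers HTTP 200,
# or give up after MAX_RETRIES attempts. The .invalid host is a placeholder.
DISCOVERY="${DISCOVERY:-https://keycloak.example.invalid/realms/demo/.well-known/openid-configuration}"
MAX_RETRIES="${MAX_RETRIES:-3}"

for i in $(seq 1 "$MAX_RETRIES"); do
  CODE=$(curl -sk --max-time 5 -o /dev/null -w '%{http_code}' "$DISCOVERY") || CODE=000
  if [ "$CODE" = "200" ]; then
    echo "OIDC discovery endpoint ready"
    break
  fi
  echo "attempt $i: discovery returned HTTP $CODE, retrying..."
  sleep 1
done
```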
- Show Keycloak discovery endpoint response before creating provider
- Capture and display HTTP status codes for all API calls
- Show full response bodies for debugging
- Better error messages with HTTP codes

This will help diagnose issues with auth provider creation and role mapping.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add "roles": "roles" to claimMappings so ACS knows to look for the roles
claim in the OIDC token. Without this, ACS cannot map Keycloak roles to
ACS roles, resulting in "no valid role" error.

This is the critical fix for role-based authorization with Keycloak OIDC.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
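The resulting auth-provider request body looks roughly like this (a hedged sketch: issuer, client ID, and mode are placeholders, and the field layout approximates the ACS v1/authProviders API):

```json
{
  "name": "keycloak",
  "type": "oidc",
  "config": {
    "issuer": "https://keycloak.example.invalid/realms/demo",
    "client_id": "acs",
    "mode": "query"
  },
  "claimMappings": {
    "roles": "roles"
  }
}
```

With the `roles` mapping in place, ACS reads the `roles` claim from the OIDC token and can match it against its role mappings instead of failing with "no valid role".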

p-rog commented Mar 4, 2026

An update:

I fixed all the ACS-Keycloak OIDC integration issues.

Changes:

  • Added openid client scope definition (mandatory OIDC scope)
  • Added openid to realm defaultDefaultClientScopes
  • Added openid to ACS client defaultClientScopes
  • Moved offline_access from ACS client default scopes to optional scopes
  • Added offline_access role to admin user's realmRoles

Why:

  • The openid scope is required by ACS, which follows the OIDC specification, so the scope must be present in the realm
  • offline_access allows ACS to request refresh tokens
  • Admin user needs offline_access role to prevent "Offline tokens not allowed" error

Now ACS can be deployed automatically as part of the layered-zero-trust pattern and uses Keycloak OIDC authentication by default. Let me know if I should add a description of how the Keycloak integration works to the ACS deployment documentation workflow.
