4 changes: 4 additions & 0 deletions docs/reference/cluster_manifest.md
@@ -48,6 +48,10 @@ Those parameters are grouped under the `metadata` top-level key.
Labels that are set here but not listed as `inherited_labels` in the operator
parameters are ignored.

* **annotations**
A map of annotations to add to the `postgresql` resource. The operator reacts to certain annotations to trigger specific actions:
* `postgres-operator.zalando.org/action: restore-in-place`: When this annotation is present with this value, the operator will trigger an automated in-place restore of the cluster. This process requires a valid `clone` section to be defined in the manifest with a target `timestamp`. See the [user guide](../user.md#automated-restore-in-place-point-in-time-recovery) for more details.
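
A minimal `metadata` snippet triggering an in-place restore looks like this (the cluster name is illustrative; a matching `clone` section with a `timestamp` must also be present under `spec`, see the linked user guide):

```yaml
metadata:
  name: acid-minimal-cluster
  annotations:
    postgres-operator.zalando.org/action: restore-in-place
```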

## Top-level parameters

These parameters are grouped directly under the `spec` key in the manifest.
6 changes: 6 additions & 0 deletions docs/reference/operator_parameters.md
@@ -9,6 +9,7 @@ configuration.
Variable names are underscore-separated words.

### ConfigMaps-based

The configuration is supplied in a
key-value configmap, defined by the `CONFIG_MAP_NAME` environment variable.
Non-scalar values, i.e. lists or maps, are encoded in the value strings using
@@ -25,6 +26,7 @@ operator CRD, all the CRD defaults are provided in the
[operator's default configuration manifest](https://github.com/zalando/postgres-operator/blob/master/manifests/postgresql-operator-default-configuration.yaml)

### CRD-based configuration

The configuration is stored in a custom YAML
manifest. The manifest is an instance of the custom resource definition (CRD)
called `OperatorConfiguration`. The operator registers this CRD during the
@@ -171,6 +173,9 @@ Those are top-level keys, containing both leaf keys and groups.
* **repair_period**
period between consecutive repair requests. The default is `5m`.

* **pitr_backup_retention**
retention time for PITR (point-in-time recovery) state ConfigMaps. The operator cleans up state ConfigMaps older than the configured retention. The value is a [duration string](https://pkg.go.dev/time#ParseDuration), e.g. `"24h"` or `"168h"` (7 days). The default is `168h`.
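
  In the CRD-based configuration the value is set under the top-level
  `configuration` key, as in the operator's default configuration manifest:

  ```yaml
  configuration:
    # keep PITR state ConfigMaps for 7 days
    pitr_backup_retention: 168h
  ```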

* **set_memory_request_to_limit**
Set `memory_request` to `memory_limit` for all Postgres clusters (the default
value is also increased but configured `max_memory_request` can not be
@@ -918,6 +923,7 @@ key.
```yaml
teams_api_role_configuration: "log_statement:all,search_path:'data,public'"
```

The default is `"log_statement:all"`

* **enable_team_superuser**
40 changes: 40 additions & 0 deletions docs/user.md
@@ -891,6 +891,45 @@ original UID, making it possible to retry restoring. However, it is probably
better to create a temporary clone for experimenting or finding out to which
point you should restore.

## Automated restore in place (point-in-time recovery)

The operator supports automated in-place restores, allowing you to restore a database to a specific point in time without changing connection strings on the application side. This feature orchestrates the deletion of the current cluster and the creation of a new one from a backup.

:warning: This is a destructive operation. The existing cluster's StatefulSet and pods will be deleted as part of the process. Ensure you have a reliable backup strategy and have tested the restore process in a non-production environment.

To trigger an in-place restore, you need to add a special annotation and a `clone` section to your `postgresql` manifest:

* **Annotate the manifest**: Add the `postgres-operator.zalando.org/action: restore-in-place` annotation to the `metadata` section.
* **Specify the recovery target**: Add a `clone` section to the `spec`, providing the `cluster` name and the `timestamp` for the point-in-time recovery. The `cluster` name **must** be the same as the `metadata.name` of the cluster you are restoring. The `timestamp` must be in RFC 3339 format and point to a time in the past for which you have WAL archives.

Here is an example manifest snippet:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  annotations:
    postgres-operator.zalando.org/action: restore-in-place
spec:
  # ... other cluster parameters
  clone:
    cluster: "acid-minimal-cluster"  # Must match metadata.name
    uid: "<original_UID>"
    timestamp: "2022-04-01T10:11:12+00:00"
  # ... other cluster parameters
```

When you apply this manifest, the operator will:
* Detect the `restore-in-place` annotation and begin the restore workflow.
* Store the restore request and the new cluster definition in a temporary `ConfigMap`.
* Delete the existing `postgresql` custom resource, which triggers the deletion of the associated StatefulSet and pods.
* Wait for the old cluster to be fully terminated.
* Create a new `postgresql` resource with a new UID but the same name.
* Bootstrap the new cluster from the latest base backup prior to the given `timestamp` and replay WAL files to recover to the specified point in time.

The process is asynchronous. You can monitor the operator logs and the state of the `postgresql` resource to follow the progress. Once the new cluster is up and running, your applications can reconnect.
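
Internally, the operator tracks the restore in a small bookkeeping ConfigMap named `pitr-state-<cluster name>`, labelled with `postgres-operator.zalando.org/pitr-state` whose value moves through `pending`, `in-progress` and `finished`; checking this label is a convenient way to follow the progress. A rough sketch of that object (it is internal and its exact layout may change):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pitr-state-acid-minimal-cluster
  labels:
    postgres-operator.zalando.org/pitr-state: in-progress  # pending -> in-progress -> finished
data:
  spec: |
    # serialized postgresql manifest used to recreate the cluster
```

The ConfigMap is cleaned up after the `pitr_backup_retention` period configured for the operator.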

## Setting up a standby cluster

A standby cluster is a [Patroni feature](https://github.com/zalando/patroni/blob/master/docs/replica_bootstrap.rst#standby-cluster)
@@ -1291,3 +1330,4 @@ As of now, the operator does not sync the pooler deployment automatically
which means that changes in the pod template are not caught. You need to
toggle `enableConnectionPooler` to set environment variables, volumes, secret
mounts and securityContext required for TLS support in the pooler pod.

3 changes: 3 additions & 0 deletions manifests/operatorconfiguration.crd.yaml
@@ -113,6 +113,9 @@ spec:
repair_period:
type: string
default: "5m"
pitr_backup_retention:
type: string
default: "168h"
set_memory_request_to_limit:
type: boolean
default: false
1 change: 1 addition & 0 deletions manifests/postgresql-operator-default-configuration.yaml
@@ -19,6 +19,7 @@ configuration:
min_instances: -1
resync_period: 30m
repair_period: 5m
pitr_backup_retention: 168h
# set_memory_request_to_limit: false
# sidecars:
# - image: image:123
1 change: 1 addition & 0 deletions pkg/apis/acid.zalan.do/v1/operator_configuration_type.go
@@ -266,6 +266,7 @@ type OperatorConfigurationData struct {
Workers uint32 `json:"workers,omitempty"`
ResyncPeriod Duration `json:"resync_period,omitempty"`
RepairPeriod Duration `json:"repair_period,omitempty"`
PitrBackupRetention Duration `json:"pitr_backup_retention,omitempty"`
SetMemoryRequestToLimit bool `json:"set_memory_request_to_limit,omitempty"`
ShmVolume *bool `json:"enable_shm_volume,omitempty"`
SidecarImages map[string]string `json:"sidecar_docker_images,omitempty"` // deprecated in favour of SidecarContainers
70 changes: 65 additions & 5 deletions pkg/cluster/cluster.go
@@ -3,6 +3,7 @@ package cluster
// Postgres CustomResourceDefinition object i.e. Spilo

import (
"context"
"database/sql"
"encoding/json"
"fmt"
@@ -32,6 +33,7 @@ import (
v1 "k8s.io/api/core/v1"
policyv1 "k8s.io/api/policy/v1"
rbacv1 "k8s.io/api/rbac/v1"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/rest"
@@ -431,6 +433,33 @@ func (c *Cluster) Create() (err error) {
c.logger.Errorf("could not list resources: %v", err)
}

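// Mark the PITR state ConfigMap as finished once the cluster resources have been
// created; this is a no-op for regular creations where no such ConfigMap exists.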
if err := c.updatePITRResources(PitrStateLabelValueFinished); err != nil {
return fmt.Errorf("could not update pitr resources: %v", err)
}
return nil
}

// updatePITRResources patches the PITR state ConfigMap of the cluster to the given state label
func (c *Cluster) updatePITRResources(state string) error {
cmName := fmt.Sprintf(PitrConfigMapNameTemplate, c.Name)
cmNamespace := c.Namespace
patchPayload := map[string]any{
"metadata": map[string]any{
"labels": map[string]string{
PitrStateLabelKey: state,
},
},
}

data, _ := json.Marshal(patchPayload)
if _, err := c.KubeClient.ConfigMaps(cmNamespace).Patch(context.TODO(), cmName, types.MergePatchType, data, metav1.PatchOptions{}, ""); err != nil {
// If ConfigMap doesn't exist, this is a normal cluster creation (not a restore-in-place)
if k8serrors.IsNotFound(err) {
return nil
}
c.logger.Errorf("restore-in-place: could not update config map %s label to state %q: %v", cmName, state, err)
return err
}
return nil
}

@@ -1200,6 +1229,33 @@ func syncResources(a, b *v1.ResourceRequirements) bool {
return false
}

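// Restore-in-place bookkeeping: the operator records PITR progress in a per-cluster
// ConfigMap named "pitr-state-<cluster>", labelled with the current state and holding
// the serialized cluster spec under the "spec" data key.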
const (
PitrStateLabelKey = "postgres-operator.zalando.org/pitr-state"
PitrStateLabelValuePending = "pending"
PitrStateLabelValueInProgress = "in-progress"
PitrStateLabelValueFinished = "finished"
PitrConfigMapNameTemplate = "pitr-state-%s"
PitrSpecDataKey = "spec"
)

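// isRestoreInPlace reports whether the PITR state ConfigMap for this cluster exists
// and is still pending, i.e. the upcoming deletion is part of a restore-in-place.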
func (c *Cluster) isRestoreInPlace() bool {
cmName := fmt.Sprintf(PitrConfigMapNameTemplate, c.Name)
cm, err := c.KubeClient.ConfigMaps(c.Namespace).Get(context.TODO(), cmName, metav1.GetOptions{})
if err != nil {
c.logger.Debugf("restore-in-place: could not fetch config map %s before deletion: %v", cmName, err)
return false
}

return cm.Labels[PitrStateLabelKey] == PitrStateLabelValuePending
}

// Delete deletes the cluster and cleans up all objects associated with it (including statefulsets).
// The deletion order here is somewhat significant, because Patroni, when running with the Kubernetes
// DCS, reuses the master's endpoint to store the leader related metadata. If we remove the endpoint
@@ -1211,6 +1267,8 @@ func (c *Cluster) Delete() error {
defer c.mu.Unlock()
c.eventRecorder.Event(c.GetReference(), v1.EventTypeNormal, "Delete", "Started deletion of cluster resources")

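// During a restore-in-place, secrets and services are kept so the recreated cluster
// can reuse them and applications keep their existing connection strings.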
isRestoreInPlace := c.isRestoreInPlace()
c.logger.Debugf("restore-in-place: deleting the cluster, restore-in-place pending: %t", isRestoreInPlace)
if err := c.deleteStreams(); err != nil {
anyErrors = true
c.logger.Warningf("could not delete event streams: %v", err)
@@ -1231,7 +1289,7 @@ func (c *Cluster) Delete() error {
c.eventRecorder.Eventf(c.GetReference(), v1.EventTypeWarning, "Delete", "could not delete statefulset: %v", err)
}

if c.OpConfig.EnableSecretsDeletion != nil && *c.OpConfig.EnableSecretsDeletion {
if c.OpConfig.EnableSecretsDeletion != nil && *c.OpConfig.EnableSecretsDeletion && !isRestoreInPlace {
if err := c.deleteSecrets(); err != nil {
anyErrors = true
c.logger.Warningf("could not delete secrets: %v", err)
@@ -1256,10 +1314,12 @@ func (c *Cluster) Delete() error {
}
}

if err := c.deleteService(role); err != nil {
anyErrors = true
c.logger.Warningf("could not delete %s service: %v", role, err)
c.eventRecorder.Eventf(c.GetReference(), v1.EventTypeWarning, "Delete", "could not delete %s service: %v", role, err)
if !isRestoreInPlace {
if err := c.deleteService(role); err != nil {
anyErrors = true
c.logger.Warningf("could not delete %s service: %v", role, err)
c.eventRecorder.Eventf(c.GetReference(), v1.EventTypeWarning, "Delete", "could not delete %s service: %v", role, err)
}
}
}

120 changes: 120 additions & 0 deletions pkg/cluster/cluster_test.go
@@ -24,7 +24,9 @@ import (
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/client-go/kubernetes/fake"
k8stesting "k8s.io/client-go/testing"
"k8s.io/client-go/tools/record"
)

@@ -94,6 +96,7 @@ func TestCreate(t *testing.T) {
clusterNamespace := "test"

client := k8sutil.KubernetesClient{
ConfigMapsGetter: clientSet.CoreV1(),
DeploymentsGetter: clientSet.AppsV1(),
CronJobsGetter: clientSet.BatchV1(),
EndpointsGetter: clientSet.CoreV1(),
@@ -2202,3 +2205,120 @@ func TestGetSwitchoverSchedule(t *testing.T) {
})
}
}

func TestUpdatePITRResources(t *testing.T) {
clusterName := "test-cluster"
clusterNamespace := "default"

tests := []struct {
name string
state string
cmExists bool
patchFails bool
expectedErr bool
expectedLabel string
}{
{
"successful patch - update label to finished",
PitrStateLabelValueFinished,
true,
false,
false,
PitrStateLabelValueFinished,
},
{
"successful patch - update label to in-progress",
PitrStateLabelValueInProgress,
true,
false,
false,
PitrStateLabelValueInProgress,
},
{
"config map does not exist - no error",
PitrStateLabelValueFinished,
false,
false,
false,
"",
},
{
"patch fails with non-NotFound error",
PitrStateLabelValueFinished,
true,
true,
true,
"",
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
clientSet := fake.NewSimpleClientset()
acidClientSet := fakeacidv1.NewSimpleClientset()

if tt.cmExists {
cmName := fmt.Sprintf(PitrConfigMapNameTemplate, clusterName)
cm := &v1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Name: cmName,
Namespace: clusterNamespace,
Labels: map[string]string{
PitrStateLabelKey: PitrStateLabelValuePending,
},
},
}
_, err := clientSet.CoreV1().ConfigMaps(clusterNamespace).Create(context.TODO(), cm, metav1.CreateOptions{})
if err != nil {
t.Fatalf("could not create configmap: %v", err)
}
}

if tt.patchFails {
clientSet.PrependReactor("patch", "configmaps", func(action k8stesting.Action) (handled bool, ret runtime.Object, err error) {
return true, nil, fmt.Errorf("synthetic patch error")
})
}

client := k8sutil.KubernetesClient{
ConfigMapsGetter: clientSet.CoreV1(),
PostgresqlsGetter: acidClientSet.AcidV1(),
}

pg := acidv1.Postgresql{
ObjectMeta: metav1.ObjectMeta{
Name: clusterName,
Namespace: clusterNamespace,
},
}

cluster := New(
Config{
OpConfig: config.Config{
PodManagementPolicy: "ordered_ready",
},
}, client, pg, logger, eventRecorder)

err := cluster.updatePITRResources(tt.state)

if err != nil {
if !tt.expectedErr {
t.Fatalf("unexpected error: %v", err)
}
} else if tt.expectedErr {
t.Fatalf("expected error, but got none")
}

if tt.cmExists && !tt.patchFails && tt.expectedLabel != "" {
cmName := fmt.Sprintf(PitrConfigMapNameTemplate, clusterName)
updatedCm, err := clientSet.CoreV1().ConfigMaps(clusterNamespace).Get(context.TODO(), cmName, metav1.GetOptions{})
if err != nil {
t.Fatalf("could not get configmap: %v", err)
}
if updatedCm.Labels[PitrStateLabelKey] != tt.expectedLabel {
t.Errorf("expected label %q but got %q", tt.expectedLabel, updatedCm.Labels[PitrStateLabelKey])
}
}
})
}
}