Skip to content

Conversation

@hlinnaka
Copy link
Contributor

@hlinnaka hlinnaka commented Feb 22, 2025

Introduce a new internal-facing interface to neonvm-daemon, which can be used to resize the swap disk and to set the disk quota on the pgdata disk.

This will replace the sudo scripts that compute_ctl currently uses. They are kept around for now for backwards-compatibility, but when the compute image builds have switched over to using the new interface, they can be removed.

One noteworthy thing is that the path controlled by the quota is now fixed, to /var/db/postgres/compute. It's a CLI argument to neonvm-daemon, but the inittab always passes /var/db/postgres/compute. This could be parameterized further if needed, but I don't see the need at the moment.

Adds a README explaining neonvm-daemon's role, and the two interfaces it now has.

@hlinnaka
Copy link
Contributor Author

See neondatabase/neon#10939 for the changes to compute_ctl to use this new interface.

@github-actions
Copy link

github-actions bot commented Feb 22, 2025

No changes to the coverage.

HTML Report

Click to open

@mikhail-sakhnov
Copy link

In general looks good, a couple of questions here:

  • Why do we want to have a separate UDS for internal communication? I think it is fine to just go to the 127.0.0.1: from local (local as VM) machine. Is it to prevent some of the APIs from being called from the externals of the VM?

  • Was it communicated with the team that we would have it like that? I don't remember having it either in our backlog or in discussions recently.

@hlinnaka hlinnaka force-pushed the heikki/neonvmd-handle-swap-quota branch 3 times, most recently from 3bc0aa3 to 0e400f5 Compare February 23, 2025 20:03
@hlinnaka
Copy link
Contributor Author

In general looks good, a couple of questions here:

  • Why do we want to have a separate UDS for internal communication? I think it is fine to just go to the 127.0.0.1: from local (local as VM) machine. Is it to prevent some of the APIs from being called from the externals of the VM?

It allows some basic control of who can access it, by file permissions. Not sure what access controls we'll need on it, currently it's accessible by all users, but it seems like it could come handy.

  • Was it communicated with the team that we would have it like that? I don't remember having it either in our backlog or in discussions recently.

This is the communication and the start of the discussion :-).

Introduces a new internal-facing interface to neonvm-daemon, which can
be used to resize the swap disk and to set the disk quota on the
pgdata disk.

This will replace the sudo scripts that compute_ctl currently
uses. They are kept around for now for backwards-compatibility, but
when the compute image builds have switched over to using the new
interface, they can be removed.

One noteworthy thing is that the path controlled by the quota is now
fixed, to /var/db/postgres/compute. It's a CLI argument to
neonvm-daemon, but the inittab always passes /var/db/postgres/compute.
This could be parameterized further if needed, but I don't see the
need at the moment.

Adds a README explaining neonvm-daemon's role, and the two interfaces
it now has.
@hlinnaka hlinnaka force-pushed the heikki/neonvmd-handle-swap-quota branch from 0e400f5 to 5ba25e4 Compare February 24, 2025 11:41
@mikhail-sakhnov mikhail-sakhnov self-assigned this Feb 24, 2025
@hlinnaka hlinnaka changed the title Allow neonvm-daemon to handle resize-swap and set-disk-quota neonvm-daemon: Handle resize-swap and set-disk-quota Feb 24, 2025
@olegbbtr
Copy link
Contributor

Fixes #958?

@hlinnaka
Copy link
Contributor Author

Fixes #958?

Makes it better at least. If the swap has already been set, neonvmd prints swap was already resized, refusing to resize it again error to the log. The API caller gets an HTTP "409 Conflict" error.

One thing worth noting about this "swap can only be resized once" protection: With this PR, the flag indicating that the swap has already been resized is kept in neonvm-daemon's memory. If neonvm-daemon is restarted for some reason, it gets cleared, and resizing becomes possible again. If that's not acceptable, I can add a marker file to remember that over restarts.

@mikhail-sakhnov mikhail-sakhnov removed their assignment Mar 11, 2025
@mikhail-sakhnov
Copy link

it looks reasonably good to me, but I'd like to have second voice opinion. @neondatabase/autoscaling

@olegbbtr
Copy link
Contributor

Fixes #958?

Makes it better at least. If the swap has already been set, neonvmd prints swap was already resized, refusing to resize it again error to the log. The API caller gets an HTTP "409 Conflict" error.

Sounds as a pretty clear indication, so I think it solves it.

Can it also return the error in the response body?

One thing worth noting about this "swap can only be resized once" protection: With this PR, the flag indicating that the swap has already been resized is kept in neonvm-daemon's memory. If neonvm-daemon is restarted for some reason, it gets cleared, and resizing becomes possible again. If that's not acceptable, I can add a marker file to remember that over restarts.

The reasoning behind having resize script deleted was to handle the situation when compue_ctl restarts mid-execution (e.g. memory pressure). So it is reasonable to protect against the same thing happening to neonvm-daemon. We should have persistent file which would remember that.

Or, alternatively, it can be handled through the same means in compute_ctl, and then we can get rid of this protection in neonvm-daemon totally.

Copy link
Contributor

@olegbbtr olegbbtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments.

# Define the expected values for Project and BlockHardLimit
expected_project="#0"
expected_block_hard_limit="1048576" # 1 GiB
expected_block_hard_limit="3145728" # 3 GiB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep 01-* step the same, and use 3 GiB in the new step?

COPY --from=postgres-exporter /bin/postgres_exporter /bin/postgres_exporter
COPY --from=pgbouncer /usr/local/pgbouncer/bin/pgbouncer /usr/local/bin/pgbouncer
RUN apt update && apt install --no-install-recommends -y curl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline.


func setDiskQuota(logger *zap.Logger, path string, sizeBytes uint64) error {
// setquota -P <project_id> <block-softlimit> <block-hardlimit> <inode-softlimit> <inode-hardlimit> <filesystem>
output, err := exec.Command("setquota", "-P", "0", "0", fmt.Sprintf("%d", sizeBytes), "0", "0", path).CombinedOutput()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original script has size in KB, should it be the same here?

defer r.Body.Close()

size_str := string(body)
size_bytes, err := strconv.ParseUint(size_str, 10, 64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have JSON-based API, for extensibility and multiple arguments.

Should we use https://github.com/OpenAPITools/openapi-generator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for an ad-hoc JSON-based API

::respawn:/neonvm/bin/chronyd -n -f /neonvm/config/chrony.conf -l /var/log/chrony/chrony.log
::respawn:/neonvm/bin/sshd -E /var/log/ssh.log -f /neonvm/config/sshd_config
::respawn:/neonvm/bin/neonvmd --addr=0.0.0.0:25183
::respawn:env PATH=/neonvm/bin neonvmd --addr=0.0.0.0:25183 --quota-path=/var/db/postgres/compute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think quota path should be specified in the request by compute_ctl, same as does now.

My reasoning is this maintains the isolation between neonvm and compute image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants