
Conversation

Contributor

@mmat11 mmat11 commented Nov 27, 2025

Checklist

TODO

  • Documentation


codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 10.99476% with 170 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.87%. Comparing base (5e12ca9) to head (080bacc).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/internal/ebpf/logenricher/logenricher.go 0.00% 159 Missing ⚠️
pkg/ebpf/common/common_linux.go 61.11% 4 Missing and 3 partials ⚠️
pkg/appolly/discover/finder.go 57.14% 2 Missing and 1 partial ⚠️
pkg/appolly/discover/attacher.go 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #942      +/-   ##
==========================================
- Coverage   56.27%   55.87%   -0.40%     
==========================================
  Files         255      257       +2     
  Lines       22180    22357     +177     
==========================================
+ Hits        12482    12493      +11     
- Misses       8852     9015     +163     
- Partials      846      849       +3     
Flag Coverage Δ
integration-test 22.67% <5.23%> (-0.23%) ⬇️
integration-test-arm 0.00% <0.00%> (ø)
integration-test-vm-${ARCH}-${KERNEL_VERSION} 0.00% <0.00%> (?)
k8s-integration-test 2.66% <0.00%> (-0.03%) ⬇️
oats-test 0.00% <0.00%> (ø)
unittests 47.32% <6.28%> (-0.34%) ⬇️

Flags with carried forward coverage won't be shown.


@mmat11 mmat11 force-pushed the matt/trace-log branch 6 times, most recently from 38b68a9 to 2da8e85, on December 4, 2025 16:03
@mmat11 mmat11 force-pushed the matt/trace-log branch 2 times, most recently from 1cf9a17 to 46ecdcc, on December 5, 2025 14:40
@mmat11 mmat11 changed the title from "RFC: wip: trace-log correlation" to "Implement trace-log correlation" on Dec 5, 2025
@mmat11 mmat11 marked this pull request as ready for review December 5, 2025 14:40
@mmat11 mmat11 requested a review from a team as a code owner December 5, 2025 14:40
Signed-off-by: Mattia Meleleo <[email protected]>
}
}

func (p *Tracer) Load() (*ebpf.CollectionSpec, error) {
Contributor

Since we use bpf_probe_write_user, can we please check, like we do in the gotracer, whether this is allowed:

func (p *Tracer) supportsContextPropagation() bool {
	return !ebpfcommon.IntegrityModeOverride && ebpfcommon.SupportsContextPropagationWithProbe(p.log)
}

If possible, when this is not allowed we should not load the program at all and tell people with a WARN that log enrichment will not work because they are on a locked-down kernel.
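
For illustration, a minimal sketch of how the load path could be gated, using the supportsContextPropagation helper quoted above; the WARN wiring, the error handling, and the loadLogenricher loader name are placeholders, not the actual OBI code:

func (p *Tracer) Load() (*ebpf.CollectionSpec, error) {
	// Hypothetical guard: skip loading entirely when bpf_probe_write_user
	// cannot be used (e.g. kernel lockdown) and surface a WARN instead.
	if !p.supportsContextPropagation() {
		p.log.Warn("kernel does not allow bpf_probe_write_user (lockdown?): log enrichment disabled")
		return nil, errors.New("log enrichment not supported on locked-down kernels")
	}
	// loadLogenricher stands in for the bpf2go-generated loader.
	return loadLogenricher()
}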

}
}

f, err := os.OpenFile(filePath, os.O_WRONLY, 0)
Contributor

Are we concerned this might be slow, opening and closing the file every time? Would it make sense to cache file descriptors with some sort of a timeout?

Contributor

Yeah, I believe we need a different approach here - opening and closing it for each event can cause a massive overhead.

Contributor Author

Makes sense, I can add an fd cache.
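
Roughly something like this (illustrative names only; a map of open files with time-based eviction, using just os, sync and time from the standard library):

// fdCache keeps one opened *os.File per resolved log path and evicts entries
// that have not been used for longer than ttl.
type fdCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]*fdEntry
}

type fdEntry struct {
	f        *os.File
	lastUsed time.Time
}

func (c *fdCache) get(path string) (*os.File, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[path]; ok {
		e.lastUsed = time.Now()
		return e.f, nil
	}
	f, err := os.OpenFile(path, os.O_WRONLY, 0)
	if err != nil {
		return nil, err
	}
	c.entries[path] = &fdEntry{f: f, lastUsed: time.Now()}
	return f, nil
}

// evictStale is meant to run periodically, e.g. from a time.Ticker goroutine.
func (c *fdCache) evictStale() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for p, e := range c.entries {
		if time.Since(e.lastUsed) > c.ttl {
			e.f.Close()
			delete(c.entries, p)
		}
	}
}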

Contributor

@rafaelroquetto rafaelroquetto left a comment

Another pass. We should run some scalability tests here before being able to merge this (e.g. deploy it in a big cluster and gather performance data). I am worried about a few points:

  • pressure on the ringbuffer: the shared ringbuffer will now be flooded by lines of logs of every tracked process - that alone is concerning, and it gets worse if we take into account the same ring buffer is used for other kinds of events
  • overhead of context-switching for each _write kprobe / CPU usage and ksoftirq handlers - you can gather performance data using perf for that

The handler for the logs should probably grab the event, and put it into a processing queue to be picked up by another thread or thread pool - doing heavy work like deserialising json or opening and closing files (i.e. I/O) is very slow and will block the ring buffer. That handler should be doing the minimum work possible and return (e.g. copy the event, enqueue it and return).

We should also track the target process stdout (or log sink), open it once for writing, and keep the file descriptor open until either the process or OBI itself terminates, as @grcevski suggested.
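
To illustrate the shape (hypothetical names, not the actual OBI types): the ringbuf callback only copies and enqueues, and a small worker pool does the slow JSON/file work:

// logEvent is a placeholder for a decoded log record coming off the ring buffer.
type logEvent struct {
	payload []byte
}

// handleRingbufRecord is what the ringbuf reader loop would call for every
// record: copy the bytes (the underlying buffer gets reused) and enqueue, nothing else.
func handleRingbufRecord(raw []byte, queue chan<- logEvent) {
	ev := logEvent{payload: append([]byte(nil), raw...)}
	select {
	case queue <- ev:
	default:
		// queue full: drop here rather than blocking the ring buffer consumer
	}
}

// startWorkers runs n goroutines that do the slow part (deserialising JSON,
// enriching, writing to the target file) off the hot path.
func startWorkers(n int, queue <-chan logEvent, process func(logEvent)) {
	for i := 0; i < n; i++ {
		go func() {
			for ev := range queue {
				process(ev)
			}
		}()
	}
}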

Comment on lines +63 to +64
size_t count = BPF_CORE_READ(from, count);
long offset = bpf_core_field_offset(struct iov_iter, count) - 8;
Contributor

could these also be const?

struct iovec iov = {};
bpf_probe_read(&iov, sizeof(iov), (char *)from + offset);

u64 pid_tgid = bpf_get_current_pid_tgid();
Contributor

idem

Comment on lines +91 to +96
if (fd == 0) {
struct path path = BPF_CORE_READ(iocb, ki_filp, f_path);
resolve_path((char *)e->file_path, &path, task);
} else {
e->file_path[0] = '\0';
}
Contributor

would it be possible to add a comment explaining what this block does (or even splitting it into its own function if you think that is feasible)?

if (e->len > 0) {
// from this point on, the responsibility of writing to stdout is on us,
// so if something fails, we must always fallback to writing the original data
long err = bpf_ringbuf_output(
Contributor

I think this can also be const


// reapply bounds for the verifier
const u16 zeroed_count = e->len & k_log_event_max_log_mask;
if (zeroed_count) { // this is never zero, but the verifier is not smart enough
Contributor

I think this is because it's not theoretically guaranteed that e->len hasn't been modified after your first if (e->len > 0) check - it's a pointer into a map element, so in theory userspace on another CPU could modify it before this instruction point is reached.

You can try to store the len on the stack and give this a shot:

const size_t effective_len = count & k_log_event_max_log_mask;
e->len = effective_len;

/* ... use effective_len instead of e->len henceforth ... */

if (effective_len > 0) {
    /* ... */
}

Contributor Author

I think you are right, I'll try next week

Comment on lines +38 to +43
buf[0] = '\0';

if (!(dentry_arr = bpf_map_lookup_elem(&path_resolver_scratch, &(u32){0}))) {
bpf_dbg_printk("resolve_path: could not get path resolver scratch area");
goto out_err;
}
Contributor

Suggested change
-buf[0] = '\0';
-
-if (!(dentry_arr = bpf_map_lookup_elem(&path_resolver_scratch, &(u32){0}))) {
-    bpf_dbg_printk("resolve_path: could not get path resolver scratch area");
-    goto out_err;
-}
+buf[0] = '\0';
+
+dentry_arr = bpf_map_lookup_elem(&path_resolver_scratch, &(u32){0});
+if (!dentry_arr) {
+    bpf_dbg_printk("resolve_path: could not get path resolver scratch area");
+    return false;
+}

rationale: clarity (and no need to set buf[0] twice)

// Resolve a struct path to a string. Returns a boolean indicating whether the
// path construction errored/was truncated or was successful.
static __always_inline bool
resolve_path(char *buf, struct path *path, const struct task_struct *task) {
Contributor

we should try to pass the buf_size here instead of implicitly relying on k_pts_file_path_len_max

// Loop 1, follow the dentry chain (up to a maximum of
// k_path_resolver_max_components) and store pointers to each dentry in
// dentry_arr
for (int i = 0; i < k_path_resolver_max_components; i++) {
Contributor

I wonder if we could simplify this into a single loop by simply walking backwards and writing directly to buf. Something like this (pseudo-code):

static __always_inline size_t
resolve_path(char *buf, size_t buflen, struct path *path, const struct task_struct *task) {
    /* ... setup task_struct, etc ... */

    /* these are the main pointers, we iterate backwards into dentry, and then
     * once we hit the current mnt_root, we swap the pointers, rinse and
     * repeat
     */
    struct vfsmount *mnt;
    struct dentry *dentry;

    char *end = buf + buflen;

    *--end = '\0';

    /* so here we start from the leaf path part */
    mnt = path->mnt;
    dentry = path->dentry;

    for (u32 i = 0; i < k_path_resolver_max_components; ++i) {
        struct dentry *parent;

        /* while we are in the same mnt, we keep prepending the path parts
         * for each dentry
         */

        while (dentry != mnt->mnt_root) {
            int len = dentry->d_name.len;

            /* account for '/' + name */
            if (end - buf < len + 1) {
                /* fail returning 0 bytes written */
                return 0u;
            }

            /* copy name backwards - this is pseudo / regular C, need to read
             * the len, qstr, etc... with bpf_probe_read_kernel() or
             * BPF_CORE_READ()
             */
            end -= len;

            /* read the path part and prepend '/' */
            bpf_probe_read_kernel(end, len, dentry->d_name.name);

            *--end = '/';

            /* move to parent dentry */
            parent = dentry->d_parent;

            if (parent == dentry)
                break;

            dentry = parent;
        }

        /* if we get here, we have reached the current dentry chain's mnt_root -
         * so all we do is rinse and repeat and switch the dentry and mnt
         * pointers to the next level
         */
        if (mnt == mnt->mnt_parent)
            break;

        dentry = mnt->mnt_mountpoint;
        mnt = mnt->mnt_parent;
    }

    // If we didn't write anything but the terminator, path is "/"
    if (*end == '\0') {
        if (end == buf)
            return 0;
        *--end = '/';
    }

    /* return the number of bytes written (including the NUL terminator);
     * you can use 0 to check if anything was written
     */
    return buflen - (end - buf);
}

or something like that - then in userspace you can simply read bytes_written bytes from the end of the buffer to extract the path. This makes it simpler and does without the need for the map to collect dentry pointers.
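
On the userspace side, extracting the path would then reduce to something like this (Go sketch; the buffer layout and bytes-written semantics follow the pseudo-code above and are assumptions):

// extractPath returns the path that the probe wrote into the tail of buf.
// bytesWritten includes the trailing NUL, matching the return value of the
// resolve_path sketch above (names and layout are illustrative only).
func extractPath(buf []byte, bytesWritten int) string {
	if bytesWritten <= 1 || bytesWritten > len(buf) {
		return ""
	}
	start := len(buf) - bytesWritten
	return string(buf[start : len(buf)-1]) // drop the trailing NUL
}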

if (bpf_core_enum_value_exists(enum tty_driver_type__new, TTY_DRIVER_TYPE_PTY)) {
int typ = bpf_core_enum_value(enum tty_driver_type__new, TTY_DRIVER_TYPE_PTY);
return BPF_CORE_READ(tty, driver, type) == typ;
} else {
Contributor

nit: we should remove this else, it's redundant (we should probably enable readability-else-after-return in clang-tidy).

Contributor Author

mmat11 commented Dec 5, 2025

Another pass. We should run some scalability tests here before being able to merge this (e.g. deploy it in a big cluster and gather performance data). I am worried about a few points:

pressure on the ringbuffer: the shared ringbuffer will now be flooded by lines of logs of every tracked process - that alone is concerning, and it gets worse if we take into account the same ring buffer is used for other kinds of events

I am using a dedicated ring buffer (see log_events); we can discuss whether we want its size to be configurable (or auto-adjusted depending on the number of tracked services).
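
If we make it configurable, a rough sketch of the wiring with cilium/ebpf could look like this (log_events is the map from this PR; the option plumbing and function name are hypothetical):

// resizeLogRingbuf overrides the ring buffer size before the collection is
// loaded. sizeBytes would come from a (hypothetical) config option and must be
// a power-of-two multiple of the page size for BPF_MAP_TYPE_RINGBUF.
func resizeLogRingbuf(spec *ebpf.CollectionSpec, sizeBytes uint32) error {
	m, ok := spec.Maps["log_events"]
	if !ok {
		return fmt.Errorf("log_events map not found in spec")
	}
	m.MaxEntries = sizeBytes
	return nil
}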

overhead of context-switching for each _write kprobe / CPU usage and ksoftirq handlers - you can gather performance data using perf for that

Will do that 👍🏻

The handler for the logs should probably grab the event, and put it into a processing queue to be picked up by another thread or thread pool - doing heavy work like deserialising json or opening and closing files (i.e. I/O) is very slow and will block the ring buffer. That handler should be doing the minimum work possible and return (e.g. copy the event, enqueue it and return).

I didn't want to add additional complexity before measuring some metrics or observing actual bottlenecks; maybe we can do this after that step?

We should also track the target process stdout (or log sink), open it once for writing, and keep the file descriptor open until either the process or OBI itself terminates, as @grcevski suggested.

👍🏻
