Implement trace-log correlation #942
Conversation
Codecov Report

@@           Coverage Diff           @@
## main #942 +/- ##
==========================================
- Coverage 56.27% 55.87% -0.40%
==========================================
Files 255 257 +2
Lines 22180 22357 +177
==========================================
+ Hits 12482 12493 +11
- Misses 8852 9015 +163
- Partials 846 849 +3
func (p *Tracer) Load() (*ebpf.CollectionSpec, error) {
Since we use bpf_probe_write_user, can we please check like we do in the gotracer if this is allowed:
func (p *Tracer) supportsContextPropagation() bool {
    return !ebpfcommon.IntegrityModeOverride && ebpfcommon.SupportsContextPropagationWithProbe(p.log)
}

If possible, when this is not allowed we should not load the program at all and tell people with a WARN that log enrichment will not work because they are on a locked-down kernel.
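A minimal sketch of how that guard could look in this tracer's Load() - the Tracer fields, the stubbed capability check, and the elided loader call are assumptions for illustration, not the PR's actual code:

```go
package tracer

import (
	"errors"
	"log/slog"

	"github.com/cilium/ebpf"
)

// Tracer stands in for the PR's tracer type; fields are assumptions.
type Tracer struct {
	log *slog.Logger
}

// supportsContextPropagation mirrors the gotracer check quoted above; the
// real version would consult ebpfcommon.IntegrityModeOverride and
// ebpfcommon.SupportsContextPropagationWithProbe(p.log).
func (p *Tracer) supportsContextPropagation() bool {
	return false // stubbed for the sketch
}

func (p *Tracer) Load() (*ebpf.CollectionSpec, error) {
	if !p.supportsContextPropagation() {
		// Warn loudly: on a locked-down kernel bpf_probe_write_user is not
		// allowed, so log enrichment cannot work and we skip loading entirely.
		p.log.Warn("kernel does not allow bpf_probe_write_user; " +
			"trace-log correlation (log enrichment) will be disabled")
		return nil, errors.New("log enrichment unsupported on locked-down kernel")
	}
	// ... load and return the generated CollectionSpec as before ...
	return nil, errors.New("loader elided in this sketch")
}
```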
f, err := os.OpenFile(filePath, os.O_WRONLY, 0)
Are we concerned this might be slow, opening and closing the file every time? Would it make sense to cache file descriptors with some sort of a timeout?
Yeah, I believe we need a different approach here - opening and closing it for each event can cause a massive overhead.
Makes sense, I can add an fd cache.
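A rough sketch of what such an fd cache might look like on the Go side - a path-keyed map of open handles with a last-used timestamp and TTL-based eviction; all names and the eviction policy are illustrative assumptions, not the PR's design:

```go
package logwriter

import (
	"os"
	"sync"
	"time"
)

// fdCache keeps per-path write handles open so the tracer does not have to
// open and close the target's stdout (or log sink) for every event.
type fdCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]*cachedFile
}

type cachedFile struct {
	f        *os.File
	lastUsed time.Time
}

func newFDCache(ttl time.Duration) *fdCache {
	c := &fdCache{ttl: ttl, entries: map[string]*cachedFile{}}
	go c.evictLoop()
	return c
}

// get returns an open handle for path, opening it only on first use.
func (c *fdCache) get(path string) (*os.File, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[path]; ok {
		e.lastUsed = time.Now()
		return e.f, nil
	}
	f, err := os.OpenFile(path, os.O_WRONLY, 0)
	if err != nil {
		return nil, err
	}
	c.entries[path] = &cachedFile{f: f, lastUsed: time.Now()}
	return f, nil
}

// evictLoop closes handles that have not been used for one TTL period.
func (c *fdCache) evictLoop() {
	for range time.Tick(c.ttl) {
		c.mu.Lock()
		for path, e := range c.entries {
			if time.Since(e.lastUsed) > c.ttl {
				e.f.Close()
				delete(c.entries, path)
			}
		}
		c.mu.Unlock()
	}
}
```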
rafaelroquetto left a comment:
Another pass. We should run some scalability tests here before being able to merge this (e.g. deploy it in a big cluster and gather performance data). I am worried about a few points:
- pressure on the ringbuffer: the shared ringbuffer will now be flooded by lines of logs of every tracked process - that alone is concerning, and it gets worse if we take into account the same ring buffer is used for other kinds of events
- overhead of context-switching for each _write kprobe / CPU usage and ksoftirq handlers - you can gather performance data using perf for that
The handler for the logs should probably grab the event, and put it into a processing queue to be picked up by another thread or thread pool - doing heavy work like deserialising json or opening and closing files (i.e. I/O) is very slow and will block the ring buffer. That handler should be doing the minimum work possible and return (e.g. copy the event, enqueue it and return).
We should also track the target process stdout (or log sink), open it once for writing, and keep the file descriptor open until either the process or OBI itself terminates, as @grcevski suggested.
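A sketch of the handler split being suggested here - the ring buffer callback only copies the sample and enqueues it, while a worker pool does the slow JSON/file work; the event type, queue size, and cilium/ebpf ringbuf usage are assumptions for illustration:

```go
package logevents

import (
	"log/slog"

	"github.com/cilium/ebpf/ringbuf"
)

// rawEvent is a placeholder for the PR's log event record.
type rawEvent struct {
	data []byte
}

// readLoop is the hot path: it must not parse JSON or touch files.
// It copies the sample, enqueues it, and goes back to draining the ring buffer.
func readLoop(rd *ringbuf.Reader, queue chan<- rawEvent, log *slog.Logger) {
	for {
		rec, err := rd.Read()
		if err != nil {
			return
		}
		// copy out of the reader's buffer before enqueueing
		ev := rawEvent{data: append([]byte(nil), rec.RawSample...)}
		select {
		case queue <- ev:
		default:
			// drop rather than block the ring buffer
			log.Warn("log event queue full, dropping event")
		}
	}
}

// startWorkers does the slow work (deserialising, fd lookup, writing) off the hot path.
func startWorkers(n int, queue <-chan rawEvent, handle func(rawEvent)) {
	for i := 0; i < n; i++ {
		go func() {
			for ev := range queue {
				handle(ev)
			}
		}()
	}
}
```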
size_t count = BPF_CORE_READ(from, count);
long offset = bpf_core_field_offset(struct iov_iter, count) - 8;
could these also be const?
struct iovec iov = {};
bpf_probe_read(&iov, sizeof(iov), (char *)from + offset);

u64 pid_tgid = bpf_get_current_pid_tgid();
idem
if (fd == 0) {
    struct path path = BPF_CORE_READ(iocb, ki_filp, f_path);
    resolve_path((char *)e->file_path, &path, task);
} else {
    e->file_path[0] = '\0';
}
would it be possible to add a comment explaining what this block does (or even splitting it into its own function if you think that is feasible)?
if (e->len > 0) {
    // from this point on, the responsibility of writing to stdout is on us,
    // so if something fails, we must always fallback to writing the original data
    long err = bpf_ringbuf_output(
I think this can also be const
// reapply bounds for the verifier
const u16 zeroed_count = e->len & k_log_event_max_log_mask;
if (zeroed_count) { // this is never zero, but the verifier is not smart enough
I think this is because it's theoretically not guaranteed that e->len hasn't been modified after your first check to if (e->len > 0) - it's a pointer to a map element - so theoretically userspace on another CPU could modify this element before this instruction point is reached.
You can try to store the len on the stack and give this a shot:
const size_t effective_len = count & k_log_event_max_log_mask;
e->len = effective_len;

/* ... use effective_len instead of e->len henceforth ... */

if (effective_len > 0) {
    /* ... */
}
I think you are right, I'll try next week
buf[0] = '\0';

if (!(dentry_arr = bpf_map_lookup_elem(&path_resolver_scratch, &(u32){0}))) {
    bpf_dbg_printk("resolve_path: could not get path resolver scratch area");
    goto out_err;
}
Suggested change:

buf[0] = '\0';

dentry_arr = bpf_map_lookup_elem(&path_resolver_scratch, &(u32){0});
if (!dentry_arr) {
    bpf_dbg_printk("resolve_path: could not get path resolver scratch area");
    return false;
}
rationale: clarity (and no need to set buf[0] twice)
// Resolve a struct path to a string. Returns a boolean indicating whether the
// path construction errored/was truncated or was successful.
static __always_inline bool
resolve_path(char *buf, struct path *path, const struct task_struct *task) {
we should try to pass the buf_size here instead of implicitly relying on k_pts_file_path_len_max
// Loop 1, follow the dentry chain (up to a maximum of
// k_path_resolver_max_components) and store pointers to each dentry in
// dentry_arr
for (int i = 0; i < k_path_resolver_max_components; i++) {
I wonder if we could simplify this into a single loop by simply walking backwards and writing directly to buf. Something like this (pseudo-code):
static __always_inline size_t
resolve_path(char *buf, size_t buflen, struct path *path, const struct task_struct *task) {
    /* ... setup task_struct, etc ... */

    /* these are the main pointers, we iterate backwards into dentry, and then
     * once we hit the current mnt_root, we swap the pointers, rinse and
     * repeat
     */
    struct vfsmount *mnt;
    struct dentry *dentry;

    char *end = buf + buflen;
    *--end = '\0';

    /* so here we start from the leaf path part */
    mnt = path->mnt;
    dentry = path->dentry;

    for (u32 i = 0; i < k_path_resolver_max_components; ++i) {
        struct dentry *parent;

        /* while we are in the same mnt, we keep prepending the path parts
         * for each dentry
         */
        while (dentry != mnt->mnt_root) {
            int len = dentry->d_name.len;

            /* account for '/' + name */
            if (end - buf < len + 1) {
                /* fail returning 0 bytes written */
                return 0u;
            }

            /* copy name backwards - this is pseudo / regular C, need to read
             * the len, qstr, etc... with bpf_probe_read_kernel() or
             * BPF_CORE_READ()
             */
            end -= len;

            /* read the path part and prepend '/' */
            bpf_probe_read_kernel(end, len, dentry->d_name.name);
            *--end = '/';

            /* move to parent dentry */
            parent = dentry->d_parent;

            if (parent == dentry)
                break;

            dentry = parent;
        }

        /* if we get here, we reached the current dentry chain's mnt_root -
         * so all we do is rinse and repeat and switch the dentry and mnt
         * pointers to the next level
         */
        if (mnt == mnt->mnt_parent)
            break;

        dentry = mnt->mnt_mountpoint;
        mnt = mnt->mnt_parent;
    }

    // If we didn't write anything but the terminator, path is "/"
    if (*end == '\0') {
        if (end == buf)
            return 0;

        *--end = '/';
    }

    /* return the number of bytes written, you can use 0 to check if anything
     * was written
     */
    return buflen - (end - buf);
}
or something like that - then in userspace you can simply take bytes_written bytes from the end of the buffer to extract the path. This makes it simpler and does without the need for the map to collect dentry pointers.
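For that userspace side, a small sketch of extracting the right-aligned path from the event buffer - the function and field names are assumptions based on the pseudo-code above, not the PR's actual event format:

```go
package logevents

// pathFromEvent extracts the resolved file path from an event whose BPF side
// wrote the path right-aligned into a fixed-size buffer, as in the sketch
// above: the path occupies the last bytesWritten bytes, ending with a NUL.
func pathFromEvent(buf []byte, bytesWritten uint32) string {
	n := int(bytesWritten)
	if n == 0 || n > len(buf) {
		return "" // nothing was resolved (or the count is bogus)
	}
	p := buf[len(buf)-n:]
	if p[len(p)-1] == 0 {
		p = p[:len(p)-1] // strip the trailing terminator
	}
	return string(p)
}
```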
if (bpf_core_enum_value_exists(enum tty_driver_type__new, TTY_DRIVER_TYPE_PTY)) {
    int typ = bpf_core_enum_value(enum tty_driver_type__new, TTY_DRIVER_TYPE_PTY);
    return BPF_CORE_READ(tty, driver, type) == typ;
} else {
nit: we should remove this else, it's redundant (we should probably enable -Wredundant-parens in clang-tidy).
f, err := os.OpenFile(filePath, os.O_WRONLY, 0)
Yeah, I believe we need a different approach here - opening and closing it for each event can cause a massive overhead.
I am using a dedicated ring buffer (see
Will do that 👍🏻
I didn't want to add additional complexity without having measured some metrics or observed some bottlenecks, maybe we can do this after that step?
👍🏻
Checklist
TODO