Add crashdump example and include snapshot/scratch in core dumps#1264
Add crashdump example and include snapshot/scratch in core dumps#1264jsturtevant wants to merge 6 commits intohyperlight-dev:mainfrom
Conversation
|
@jsturtevant I believe you were working on another version of this that explicitly walks through the whole guest virtual address space? |
Yes, been working with @dblnz on getting it working. Should have something today |
ba9bd81 to
4181d57
Compare
Co-authored-by: Doru Blânzeanu <dblnz@pm.me> Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
6ab40fe to
21b7183
Compare
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
The runtime config was passed by reference into set_up_hypervisor_partition then immediately cloned, but no caller needs it afterward so it is now passed by value. The entry point field uses Option<u64> instead of a bare zero default so a missing value is detectable rather than silently producing a bogus AT_ENTRY. Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
generate_crashdump_to_dir accepts the output directory as a parameter instead of requiring callers to set the HYPERLIGHT_CORE_DUMP_DIR environment variable. This removes the need for unsafe std::env::set_var in tests while preserving the existing env-var fallback path for the automatic dump and the no-argument generate_crashdump method. Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
| } else if gpa >= SandboxMemoryLayout::BASE_ADDRESS as u64 { | ||
| } else if gpa >= SandboxMemoryLayout::BASE_ADDRESS as u64 | ||
| && gpa < SandboxMemoryLayout::BASE_ADDRESS as u64 + snap.mem_size() as u64 | ||
| { |
There was a problem hiding this comment.
I remember this was a bug fix for some regions that are not in the snapshot and not in the scratch region either, but in between, but the previous logic returned the snapshot memory
There was a problem hiding this comment.
this makes it explicit that it will look for a GPA in both were as previously it was also looking at values that where past the end of either. Here snap is
hyperlight/src/hyperlight_host/src/mem/layout.rs
Lines 20 to 35 in 7741d51
scratch is hyperlight/src/hyperlight_host/src/mem/layout.rs
Lines 48 to 61 in 7741d51
There was a problem hiding this comment.
Yes, in the case of the snapshot code, which was formerly the only user of this code, this was being handled in the function guest_page, and the entire call to access_gpa was avoided. Now that there are more users of this VA enumeration, and crashdump/gdb are likely to continue to need it, we should see if the interface can be factored out a bit better (made to not rely on exclusive memory access, but to take advantage of it where available; and made to unify scratch/snapshot handling and mapped region handling).
| }); | ||
| } else { | ||
| // GPA not in snapshot/scratch — check dynamic mmap regions | ||
| resolve_from_mmap_regions( |
There was a problem hiding this comment.
I am not 100%, but from what I see, this function creates a region that exactly matches the mapping in size.
I am wondering, wouldn't the inclusion of dynamic mmap regions below take care of this by including the whole mmap region?
Note: I remember that when I was working on this, the dumps wouldn't correctly load without this function call, but I'm just saying, maybe I'm missing something.
There was a problem hiding this comment.
this makes sure the dump preserves the GVA (virt_base, virt_end) instead of the physical address which is what the other loop does. The below mmap regions includes the GVA locations just for full dump but likely won't be used in gdb
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
dblnz
left a comment
There was a problem hiding this comment.
LGTM! Having participated in these changes, I'd feel more comfortable if other people had a look/approved this 😄
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
ludfjig
left a comment
There was a problem hiding this comment.
this is great, i like the added tests!
| /// `usize` host addresses. The crash dump path only reads host memory | ||
| /// through raw pointers, so it never needs the file-mapping metadata | ||
| /// stored in [`HostRegionBase`] on Windows. | ||
| #[cfg(feature = "crashdump")] |
There was a problem hiding this comment.
The new types in memory_region.rs and helpers in mgr.rs use #[cfg(feature = "crashdump")] while most of the rest of the crate the #[cfg(crashdump)] alias. Maybe we can remove the alias since it doens't serve a purpose (I seem to remember it was only usable in debug profile before, but that no longer seems to be the case), OR switch to using the alias everywhere
| /// With `init-paging` enabled, walks the guest page tables to discover | ||
| /// GVA→GPA mappings and translates them to host-backed regions. Also | ||
| /// includes any dynamic mmap regions that may not be covered by the | ||
| /// page table walk, and a sentinel page at the stack top for clean |
There was a problem hiding this comment.
outdated doc since the sentinel page was removed i believe
| if !gdb_is_available() { | ||
| eprintln!("Skipping test: {GDB_COMMAND} not found on PATH"); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Was this needed in CI? Do we not want the test to fail here?
syntactically
left a comment
There was a problem hiding this comment.
The direction of this looks good! I have a few comments, most of which are minor nits or just signposting things I thought were interesting points to discuss; the only thing that I'm actually particularly concerned about with the present state of the code is the mmap regions being present in the dump at VAs where they are not actually present in the sandbox.
At a higher level, it occurs to me that an alternative for crashdumps might be to just take a snapshot, and then write a separate utility that prepares a crashdump from a snapshot. That's probably worse, since in the future we might not be able to take a snapshot at arbitrary times, but only when I/O to the guest is quiet? However, I wanted to mention the idea to see what anyone else thought.
| /// back to the host as `GuestAborted` errors. | ||
| /// | ||
| /// However, some faults are intercepted by the **hypervisor** before the | ||
| /// CPU delivers them to the guest. For example, when the guest writes to |
There was a problem hiding this comment.
Minor nit, but this example is not the hypervisor "intercepting a fault that was originally destined for the guest": the fault was always targeted at the hypervisor, because it was the hypervisor's own page tables that caused it. There are intercepts that allow us to redirect faults to the hv (on amd64), but we don't use a tonne of them.
|
|
||
| // This call triggers a ud2 instruction in the guest. The guest's IDT | ||
| // catches the #UD exception and reports it back to the host as a | ||
| // GuestAborted error via I/O. This does NOT trigger an automatic crash |
There was a problem hiding this comment.
Why do we have this distinction? We should be clear about exactly which exceptions result in dumps and which do not, and whether there is any relationship between that and which faults poison the sandbox.
| /// so GDB can compute the correct load offset for PIE binaries. | ||
| #[cfg(crashdump)] | ||
| pub(crate) entry_point: u64, | ||
| } |
There was a problem hiding this comment.
This is not new to this PR, so feel free to file an issue and defer, but I wanted to see if you had thoughts in this context first. This addition brings to my attention the question: what is the expected lifecycle of SandboxRuntimeConfig? Some parts, like debug_info and guest_core_dump, seem like they are config of a specific sandbox instance (regardless of what guest/snapshot is being run inside of that instance), while things like entry_point and guest_core_dump seem like they are related fundamentally to the snapshot that's underlying the sandbox, and would need to be reset on restore to have sensible behaviour for a single sandbox that urns multiple different guests at different times.
| ) -> Option<(&'a ExclusiveSharedMemory, usize)> { | ||
| let scratch_base = scratch_base_gpa(scratch_size); | ||
| if gpa >= scratch_base { | ||
| if gpa >= scratch_base && gpa < scratch_base + scratch_size as u64 { |
There was a problem hiding this comment.
I think this change makes sense, since it's good for the contract here to be that the returned offset exists in the memory (and an error would happen fast if not).
However, I'm curious if this change was motivated by actually finding a case where it would make a difference or not---it feels like a bug if there is a pointer to above the scratch region (only 1 unmapped page exists there), so if that was the cause of the change I'd be curious where the access was coming from.
| } else if gpa >= SandboxMemoryLayout::BASE_ADDRESS as u64 { | ||
| } else if gpa >= SandboxMemoryLayout::BASE_ADDRESS as u64 | ||
| && gpa < SandboxMemoryLayout::BASE_ADDRESS as u64 + snap.mem_size() as u64 | ||
| { |
There was a problem hiding this comment.
Yes, in the case of the snapshot code, which was formerly the only user of this code, this was being handled in the function guest_page, and the entire call to access_gpa was avoided. Now that there are more users of this VA enumeration, and crashdump/gdb are likely to continue to need it, we should see if the interface can be factored out a bit better (made to not rely on exclusive memory access, but to take advantage of it where available; and made to unify scratch/snapshot handling and mapped region handling).
| if *executable { | ||
| flags |= MemoryRegionFlags::EXECUTE; | ||
| } | ||
| (flags, MemoryRegionType::Code) |
There was a problem hiding this comment.
This MemoryRegionType doesn't seem right, but then again I'm not sure if we're actually using the MemoryRegionTypes for anything, and they were already a bit inconsistent/nonsensical...
| }) | ||
| })???; | ||
|
|
||
| // Include dynamic mmap regions at their GVA addresses as a fallback |
There was a problem hiding this comment.
This doesn't seem quite right. The mmap region guest_regions are GPAs not GVAs, and the rest of the crashdump is (much like a snapshot) handled entirely in GVA space. Code could well be doing something important at some GVA that is equal to a GPA where something entirely different is present, and this would overwrite the former with the contents of the latter as far as I understand it. Regions that are mapped into some guest physical address space, but not mapped from there to anywhere in VA space, are basically inaccessible to any code (unless it bothers to map them in first), since we don't have a complete physmap anyway...
| let host_base = HostGuestMemoryRegion::to_addr(rgn.host_region.start) + offset; | ||
| let host_end = host_base + mapping.len as usize; | ||
|
|
||
| regions.push(CrashDumpRegion { |
There was a problem hiding this comment.
Should the coalesce logic above run here as well?
Core dumps generated by Hyperlight were missing the snapshot and scratch memory regions, making post-mortem debugging with GDB incomplete — register state was present but the guest's code, stack, heap, and page tables were absent. This adds the snapshot and scratch regions to the ELF core dump alongside any dynamically mapped regions so that GDB can show full backtraces, disassemble at the crash site, and inspect guest memory. A new runnable crashdump example demonstrates automatic dumps (VM-level faults), on-demand dumps (guest-caught exceptions), and per-sandbox opt-out, with GDB-based integration tests that validate register and memory content in the generated ELF files. The debugging docs are also updated with practical GDB commands for inspecting crash dumps.