DAOS-17893 vos: check objects before freeing #17293
base: master
|
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/1/execution/node/279/log |
|
Test stage Build on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/1/execution/node/287/log |
|
Test stage Build on EL 9.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/1/execution/node/371/log |
|
Errors are Unable to load ticket data |
|
Test stage Build on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/1/execution/node/405/log |
6bbd198 to
0824861
Compare
|
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/2/execution/node/301/log |
|
Test stage Build on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/2/execution/node/317/log |
|
Test stage Build on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/2/execution/node/411/log |
0824861 to
1508a5e
Compare
|
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/3/execution/node/301/log |
|
Test stage Build on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/3/execution/node/317/log |
Signed-off-by: Jan Michalski <[email protected]>
assert + memory dump for debug. Signed-off-by: Jan Michalski <[email protected]>
1508a5e to
3ad502c
Compare
|
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/4/execution/node/302/log |
|
Test stage Build on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/4/execution/node/318/log |
|
Test stage Build on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/4/execution/node/412/log |
Signed-off-by: Jan Michalski <[email protected]>
|
Test stage Unit Test bdev with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17293/5/testReport/ |
|
Test stage Unit Test on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/5/execution/node/668/log |
|
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17293/5/testReport/ |
- add d_log_memory_ut to utest.yaml
- change D_EMIT to D_FATAL so it is easily visible
- move d_log_memory() call before d_alt_assert() call
Signed-off-by: Jan Michalski <[email protected]>
- fix DTX_ACT_BLOB_MAGIC assert
- remove the ILOG assert (left a comment to avoid making the same mistake again)
Signed-off-by: Jan Michalski <[email protected]>
|
Test stage Unit Test on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17293/6/execution/node/668/log |
|
Test stage Unit Test bdev on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17293/6/testReport/ |
|
Test stage Unit Test bdev with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17293/6/testReport/ |
|
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17293/6/testReport/ |
- fix the magic assert for committed DBDs
Signed-off-by: Jan Michalski <[email protected]>
| * Since the key's structure does not have a magic value and the ilog root (which has | ||
| * a magic value) is already destroyed at this stage there is no way to verify the pointer | ||
| * actually points to a valid data. | ||
| */ |
Why was the assert removed? Was it incorrect?
Yes, it was incorrect. As I wrote in the code comment, at this stage the key's ILOG is already destroyed, so we cannot use it. A pity.
Nasf-Fan
left a comment
Hence, this PR introduces a few asserts into the GC code so whenever it is possible to validate the offset GC is about to free whether it actually points to an object we expect to live there we assert it actually is as expected and we dump its contents if not for further investigation.
The commit message is out of date. Please update it the next time you refresh the patch. Thanks.
| D_FATAL("Assertion '%s' failed: " fmt, #cond, ##__VA_ARGS__); \ | ||
| d_log_memory((uint8_t *)ptr, size); \ | ||
| if (d_alt_assert != NULL) \ | ||
| d_alt_assert(0, #cond, __FILE__, __LINE__); \ |
"else assert(0);" ?
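For reference, the shape of the macro with the suggested fallback could look roughly like the sketch below. This is not the DAOS code: `ASSERT_WITH_DUMP`, `alt_assert_hook`, `log_memory` and `counting_hook` are hypothetical stand-ins for the macro under review, `d_alt_assert` and `d_log_memory`, and `fprintf` replaces `D_FATAL`. It relies on the GNU `##__VA_ARGS__` extension, as the snippet above does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for d_alt_assert: an optional hook set by test code. */
typedef void (*alt_assert_fn)(int, const char *, const char *, int);
static alt_assert_fn alt_assert_hook = NULL;

/* Counters that only exist so the sketch can be observed from a test. */
static int dumped_bytes = 0;
static int hook_calls = 0;

/* Hypothetical stand-in for d_log_memory: hex-dump `size` bytes to stderr. */
static void log_memory(const uint8_t *ptr, size_t size)
{
	for (size_t i = 0; i < size; i++)
		fprintf(stderr, "%02x%s", ptr[i],
			(i % 16 == 15 || i + 1 == size) ? "\n" : " ");
	dumped_bytes += (int)size;
}

/* Example hook that records the failure instead of aborting. */
static void counting_hook(int cond, const char *expr, const char *file, int line)
{
	(void)cond;
	fprintf(stderr, "hook: '%s' failed at %s:%d\n", expr, file, line);
	hook_calls++;
}

/*
 * Report the failed condition, dump the memory first, then hand over to the
 * hook if one is installed, otherwise fall back to a plain assert(0).
 */
#define ASSERT_WITH_DUMP(cond, ptr, size, fmt, ...)                                \
	do {                                                                       \
		if (!(cond)) {                                                     \
			fprintf(stderr, "Assertion '%s' failed: " fmt, #cond,      \
				##__VA_ARGS__);                                    \
			log_memory((const uint8_t *)(ptr), (size));                \
			if (alt_assert_hook != NULL)                               \
				alt_assert_hook(0, #cond, __FILE__, __LINE__);     \
			else                                                       \
				assert(0);                                         \
		}                                                                  \
	} while (0)
```

With the `else` branch, a failed check still terminates the process when no hook is installed (unless `NDEBUG` is defined); without it, execution would simply continue past the failed check.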
DAOS-17893 is a ticket reporting a crash in DAVv2 that occurred while freeing an allocation. The crash happened because the provided offset was not the beginning of a memory block, yet the memory block at hand was not a run. The allocator itself could check for this kind of discrepancy and report it before the process is terminated by a SIGFPE signal. But the higher up the stack the issue is caught, the more information we can potentially recover from the crash.
One place notorious for being involved in this kind of incident is the VOS garbage collector (e.g. DAOS-18049). This is possibly not because it is buggier than any other piece of DAOS, but rather because it is where we enumerate large chunks of the VOS metadata in order to free the requested objects and all their descendants.
Hence, this PR introduces a few asserts into the GC code. Whenever it is possible to validate that the offset the GC is about to free actually points to the object we expect to live there, we assert that it does, and if it does not, we dump its contents for further investigation.
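The idea can be illustrated with a minimal, self-contained sketch. It assumes an object header that begins with a magic value; `struct obj_hdr`, `OBJ_MAGIC` and `gc_check_before_free` are hypothetical names chosen for illustration, not the VOS structures or functions touched by this PR, and the check returns `false` where the real code would assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OBJ_MAGIC 0x0B1EC7u /* hypothetical magic value */

/* Hypothetical on-media header: the magic is the first field. */
struct obj_hdr {
	uint32_t magic;
	uint32_t payload;
};

/*
 * Validate that `off` inside the pool points at a header carrying the
 * expected magic. On mismatch, dump the bytes found there so they can be
 * inspected later, and report failure instead of freeing blindly.
 */
static bool gc_check_before_free(const uint8_t *pool, size_t pool_size,
				 uint64_t off, uint32_t expected)
{
	struct obj_hdr hdr;

	/* Reject offsets that do not leave room for a whole header. */
	if (off > pool_size || pool_size - off < sizeof(hdr)) {
		fprintf(stderr, "offset %llu is out of bounds\n",
			(unsigned long long)off);
		return false;
	}

	/* memcpy avoids unaligned access when the offset is bogus. */
	memcpy(&hdr, pool + off, sizeof(hdr));
	if (hdr.magic == expected)
		return true;

	fprintf(stderr, "bad magic %#x at offset %llu, contents:",
		(unsigned)hdr.magic, (unsigned long long)off);
	for (size_t i = 0; i < sizeof(hdr); i++)
		fprintf(stderr, " %02x", pool[off + i]);
	fprintf(stderr, "\n");
	return false;
}
```

A caller would run the check right before handing the offset to the allocator's free routine, so a corrupted offset is reported together with the bytes it points at rather than surfacing later as a crash inside the allocator.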