[LTS 9.4] hugetlb: CVE-2025-38084, CVE-2025-38085, CVE-2024-57883 #819
Draft: pvts-mat wants to merge 5 commits into ctrliq:ciqlts9_4 from pvts-mat:ciqlts9_4-CVE-batch-18
+100
−29
jira VULN-71578
cve-pre CVE-2025-38084
commit-author James Houghton <[email protected]>
commit b30c14c

PMD sharing can only be done in PUD_SIZE-aligned pieces of VMAs; however, it is possible that HugeTLB VMAs are split without unsharing the PMDs first.

Without this fix, it is possible to hit the uffd-wp-related WARN_ON_ONCE in hugetlb_change_protection [1]. The key there is that hugetlb_unshare_all_pmds will not attempt to unshare PMDs in non-PUD_SIZE-aligned sections of the VMA.

It might seem ideal to unshare in hugetlb_vm_op_open, but we need to unshare in both the new and old VMAs, so unsharing in hugetlb_vm_op_split seems natural.

[1]: https://lore.kernel.org/linux-mm/CADrL8HVeOkj0QH5VZZbRzybNE8CG-tEGFshnA+bG9nMgcWtBSg@mail.gmail.com/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6dfeaff ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: James Houghton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit b30c14c)
Signed-off-by: Marcin Wcisło <[email protected]>
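The alignment argument in the commit message above can be illustrated with a small user-space sketch. It is not the kernel code: the `toy_` names are hypothetical, and the 1 GiB PUD_SIZE is an assumption that matches x86-64 with 4 KiB base pages. Given a VMA and a split address, it reports the PUD_SIZE-aligned piece around the split point that may still have a shared PMD table and would therefore need unsharing before the split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one PMD page table covers 1 GiB (x86-64, 4 KiB base pages). */
#define TOY_PUD_SIZE (1ULL << 30)
#define TOY_PUD_MASK (~(TOY_PUD_SIZE - 1))

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct toy_vma {
    uint64_t start;
    uint64_t end;
};

/*
 * Returns true when splitting the VMA at addr cuts through a PUD_SIZE-aligned
 * piece that lies entirely inside the VMA. Only such a piece can have a
 * shared PMD table, and after the split neither half would cover it fully.
 * On success, [*floor, *ceil) is the range to unshare.
 */
static bool toy_split_needs_unshare(const struct toy_vma *vma, uint64_t addr,
                                    uint64_t *floor, uint64_t *ceil)
{
    /* A PUD_SIZE-aligned split point never cuts a shareable piece. */
    if ((addr & ~TOY_PUD_MASK) == 0)
        return false;

    *floor = addr & TOY_PUD_MASK;
    *ceil = *floor + TOY_PUD_SIZE;

    /* The piece is shareable only if the VMA covers all of it. */
    return *floor >= vma->start && *ceil <= vma->end;
}
```

For a VMA covering 1 GiB to 4 GiB, a split at 2 GiB + 2 MiB yields the range 2 GiB to 3 GiB, while a split exactly at 2 GiB yields nothing to unshare.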
jira VULN-71578
cve CVE-2025-38084
commit-author Jann Horn <[email protected]>
commit 081056d
upstream-diff Used linux-5.15.y backport 366298f2b04d2bf1f2f2b7078405bdf9df9bd5d0 as a base. Modified `hugetlb_unshare_pmds()' to wrap in the `take_lock == true' branches what was there originally. This change is the equivalent of applying upstream 081056d to the `hugetlb_unshare_pmds()' function and linux-5.15.y backport 366298f to everything else.

Currently, __split_vma() triggers hugetlb page table unsharing through vm_ops->may_split(). This happens before the VMA lock and rmap locks are taken - which is too early, it allows racing VMA-locked page faults in our process and racing rmap walks from other processes to cause page tables to be shared again before we actually perform the split.

Fix it by explicitly calling into the hugetlb unshare logic from __split_vma() in the same place where THP splitting also happens. At that point, both the VMA and the rmap(s) are write-locked.

An annoying detail is that we can now call into the helper hugetlb_unshare_pmds() from two different locking contexts:

1. from hugetlb_split(), holding:
    - mmap lock (exclusively)
    - VMA lock
    - file rmap lock (exclusively)
2. hugetlb_unshare_all_pmds(), which I think is designed to be able to call us with only the mmap lock held (in shared mode), but currently only runs while holding mmap lock (exclusively) and VMA lock

Backporting note:
This commit fixes a racy protection that was introduced in commit b30c14c ("hugetlb: unshare some PMDs when splitting VMAs"); that commit claimed to fix an issue introduced in 5.13, but it should actually also go all the way back.

[[email protected]: v2]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 39dde65 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Jann Horn <[email protected]>
Cc: Liam Howlett <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]> [b30c14c: hugetlb: unshare some PMDs when splitting VMAs]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 366298f2b04d2bf1f2f2b7078405bdf9df9bd5d0)
Signed-off-by: Marcin Wcisło <[email protected]>
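The two locking contexts described above are the reason the helper takes a `take_lock' style flag. The following is a minimal user-space sketch of that pattern, not the kernel implementation: every `toy_` name is hypothetical, and the locks are plain flags that assert on misuse instead of blocking. One caller already holds the locks and asks the helper not to take them, the other lets the helper take and release them itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock: a flag that traps double acquire or stray release. */
struct toy_lock {
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);   /* taking a lock we already hold would deadlock */
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

struct toy_vma {
    struct toy_lock vma_lock;
    struct toy_lock rmap_lock;
    int unshare_calls;  /* counts how often the unshare body ran */
};

/* Helper usable from both contexts; take_lock says who owns the locking. */
static void toy_unshare_pmds(struct toy_vma *vma, bool take_lock)
{
    if (take_lock) {
        toy_lock_acquire(&vma->vma_lock);
        toy_lock_acquire(&vma->rmap_lock);
    } else {
        /* The caller promised to hold both locks already. */
        assert(vma->vma_lock.held && vma->rmap_lock.held);
    }

    vma->unshare_calls++;   /* stand-in for the actual unsharing work */

    if (take_lock) {
        toy_lock_release(&vma->rmap_lock);
        toy_lock_release(&vma->vma_lock);
    }
}

/* Context 1: the split path, which runs with the locks already held. */
static void toy_split(struct toy_vma *vma)
{
    toy_lock_acquire(&vma->vma_lock);
    toy_lock_acquire(&vma->rmap_lock);
    toy_unshare_pmds(vma, false);
    toy_lock_release(&vma->rmap_lock);
    toy_lock_release(&vma->vma_lock);
}

/* Context 2: the unshare-all path, which lets the helper do the locking. */
static void toy_unshare_all(struct toy_vma *vma)
{
    toy_unshare_pmds(vma, true);
}
```

Calling the helper with `take_lock == true` from the split path would trip the double-acquire assertion, which is the toy equivalent of the self-deadlock the flag is there to avoid.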
jira VULN-46930
cve CVE-2024-57883
commit-author Liu Shixin <[email protected]>
commit 59d9094
upstream-diff Stable 6.1 backport 02333ac1c35370517a19a4a131332a9690c6a5c7 was used for the actual (clean) cherry pick. Additionally the `atomic_t pt_share_count' field in `include/linux/mm_types.h' was wrapped in RH_KABI_BROKEN_INSERT macro to avoid kABI checker complaints. It's justified, because the inserted field (it's included, as CONFIG_ARCH_WANT_HUGE_PMD_SHARE gets enabled for at least `kernel-x86_64-rhel.config') is placed within a union which already contained a field of the same type `atomic_t pt_frag_refcount', so the size of it cannot change. Moreover this union serves as a scratch space for the subsystems using the struct page. Upon releasing the ownership to buddy allocator the union contents no longer matter. When the page is allocated again the scratch space will be used by the new owner in its own way.

The folio refcount may be increased unexpectedly through try_get_folio() by caller such as split_huge_pages. In huge_pmd_unshare(), we use refcount to check whether a pmd page table is shared. The check is incorrect if the refcount is increased by the above caller, and this can cause the page table leaked:

BUG: Bad page state in process sh pfn:109324
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
page_type: f2(table)
raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
page dumped because: nonzero mapcount
...
CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G B 6.13.0-rc2master+ #7
Tainted: [B]=BAD_PAGE
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
 show_stack+0x20/0x38 (C)
 dump_stack_lvl+0x80/0xf8
 dump_stack+0x18/0x28
 bad_page+0x8c/0x130
 free_page_is_bad_report+0xa4/0xb0
 free_unref_page+0x3cc/0x620
 __folio_put+0xf4/0x158
 split_huge_pages_all+0x1e0/0x3e8
 split_huge_pages_write+0x25c/0x2d8
 full_proxy_write+0x64/0xd8
 vfs_write+0xcc/0x280
 ksys_write+0x70/0x110
 __arm64_sys_write+0x24/0x38
 invoke_syscall+0x50/0x120
 el0_svc_common.constprop.0+0xc8/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x34/0x128
 el0t_64_sync_handler+0xc8/0xd0
 el0t_64_sync+0x190/0x198

The issue may be triggered by damon, offline_page, page_idle, etc, which will increase the refcount of page table.

1. The page table itself will be discarded after reporting the "nonzero mapcount".
2. The HugeTLB page mapped by the page table miss freeing since we treat the page table as shared and a shared page table will not be unmapped.

Fix it by introducing independent PMD page table shared count. As described by comment, pt_index/pt_mm/pt_frag_refcount are used for s390 gmap, x86 pgds and powerpc, pt_share_count is used for x86/arm64/riscv pmds, so we can reuse the field as pt_share_count.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 39dde65 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Liu Shixin <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Ken Chen <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nanyong Sun <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 02333ac1c35370517a19a4a131332a9690c6a5c7)
Signed-off-by: Marcin Wcisło <[email protected]>
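The difference between the two checks can be shown with a toy model. This is a sketch under stated assumptions, not the kernel code: the struct and all `toy_` functions are hypothetical, and plain integers stand in for the atomic counters. It models an unshared PMD table on which some unrelated code takes a temporary reference, and compares the refcount-based test with one based on a dedicated share count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PMD page-table page. */
struct toy_pt_page {
    int refcount;        /* generic page refcount, anyone may bump it briefly */
    int pt_share_count;  /* dedicated count of extra sharers of this table */
};

/* Old-style check: "more than one reference means the table is shared". */
static bool toy_old_is_shared(const struct toy_pt_page *pt)
{
    return pt->refcount > 1;
}

/* New-style check: only real sharers are counted. */
static bool toy_new_is_shared(const struct toy_pt_page *pt)
{
    return pt->pt_share_count > 0;
}

/* An unrelated walker pinning the page for a moment. */
static void toy_transient_get(struct toy_pt_page *pt)
{
    pt->refcount++;
}

/* Another process starting to share the table under the new scheme. */
static void toy_share(struct toy_pt_page *pt)
{
    pt->pt_share_count++;
}
```

With one owner and one transient reference, `toy_old_is_shared()` reports the table as shared although nobody shares it, which corresponds to the misjudgement the commit message describes; `toy_new_is_shared()` is unaffected by the temporary reference.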
jira VULN-71587
cve CVE-2025-38085
commit-author Jann Horn <[email protected]>
commit 1013af4
upstream-diff Stable 6.1 b7754d3aa7bf9f62218d096c0c8f6c13698fac8b was used for the actual (clean) cherry pick

huge_pmd_unshare() drops a reference on a page table that may have previously been shared across processes, potentially turning it into a normal page table used in another process in which unrelated VMAs can afterwards be installed.

If this happens in the middle of a concurrent gup_fast(), gup_fast() could end up walking the page tables of another process. While I don't see any way in which that immediately leads to kernel memory corruption, it is really weird and unexpected.

Fix it with an explicit broadcast IPI through tlb_remove_table_sync_one(), just like we do in khugepaged when removing page tables for a THP collapse.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 39dde65 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit b7754d3aa7bf9f62218d096c0c8f6c13698fac8b)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-46930
cve-bf CVE-2024-57883
commit-author Jane Chu <[email protected]>
commit 14967a9
upstream-diff This commit fixes `mm: hugetlb: independent PMD page table shared count' which was included in ciqlts9_4 by cherry-picking stable-6.1 backport 02333ac1c35370517a19a4a131332a9690c6a5c7 of kernel-mainline 59d9094. Differences between 02333ac and 59d9094 were driving the diffs between this commit and the upstream 14967a9.

include/linux/mm_types.h
    Removed the definition of `ptdesc_pmd_is_shared()' function in alignment with stable-5.15 backport 8410996 (it omits the definition of `ptdesc_pmd_pts_*()' functions family, to which `ptdesc_pmd_is_shared()' belongs).

mm/hugetlb.c
    copy_hugetlb_page_range()
        1. Used CONFIG_ARCH_WANT_HUGE_PMD_SHARE instead of CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING, because the latter was introduced only in the non-backported commit 188cac5.
        2. Since `ptdesc_pmd_is_shared()' was not defined, read the `pt_share_count' field directly, as is done in the stable-5.15 backport 8410996. (Compare changes to `huge_pmd_unshare()' in `mm/hugetlb.c' between upstream 59d9094 and stable-5.15 8410996.)
    huge_pmd_unshare()
        No change to the conditional. It was arguably not needed in the upstream as well, probably introduced only for the sake of clarity in the presence of `ptdesc_pmd_is_shared()' function, which is missing here.

commit 59d9094 ("mm: hugetlb: independent PMD page table shared count") introduced ->pt_share_count dedicated to hugetlb PMD share count tracking, but omitted fixing copy_hugetlb_page_range(), leaving the function relying on page_count() for tracking that no longer works.

When lazy page table copy for hugetlb is disabled, that is, revert commit bcd51a3 ("hugetlb: lazy page table copies in fork()") fork()'ing with hugetlb PMD sharing quickly lockup -

[ 239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
[ 239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
[ 239.446631] Call Trace:
[ 239.446633] <TASK>
[ 239.446636] _raw_spin_lock+0x3f/0x60
[ 239.446639] copy_hugetlb_page_range+0x258/0xb50
[ 239.446645] copy_page_range+0x22b/0x2c0
[ 239.446651] dup_mmap+0x3e2/0x770
[ 239.446654] dup_mm.constprop.0+0x5e/0x230
[ 239.446657] copy_process+0xd17/0x1760
[ 239.446660] kernel_clone+0xc0/0x3e0
[ 239.446661] __do_sys_clone+0x65/0xa0
[ 239.446664] do_syscall_64+0x82/0x930
[ 239.446668] ? count_memcg_events+0xd2/0x190
[ 239.446671] ? syscall_trace_enter+0x14e/0x1f0
[ 239.446676] ? syscall_exit_work+0x118/0x150
[ 239.446677] ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
[ 239.446681] ? clear_bhb_loop+0x30/0x80
[ 239.446684] ? clear_bhb_loop+0x30/0x80
[ 239.446686] entry_SYSCALL_64_after_hwframe+0x76/0x7e

There are two options to resolve the potential latent issue:

1. warn against PMD sharing in copy_hugetlb_page_range(),
2. fix it.

This patch opts for the second option. While at it, simplify the comment, the details are not actually relevant anymore.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 59d9094 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: Jane Chu <[email protected]>
Reviewed-by: Harry Yoo <[email protected]>
Acked-by: Oscar Salvador <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Liu Shixin <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 14967a9)
Signed-off-by: Marcin Wcisło <[email protected]>
[LTS 9.4]
CVE-2025-38084 VULN-71578
CVE-2025-38085 VULN-71587
CVE-2024-57883 VULN-46930
About
This PR is the LTS 9.4 version of #731. While CVE-2025-38084 and CVE-2025-38085 are not interdependent, their fixes appeared upstream in the same branch d3c82f6 and are usually backported together as well (CentOS 9 462b3c3, CentOS 10 42421eb, stable 6.1 b7754d3aa7bf9f62218d096c0c8f6c13698fac8b, stable 5.10 952596b08c74e8fe9e2883d1dc8a8f54a37384ec, stable 5.15 a3d864c901a300c295692d129159fc3001a56185).
Relation to the LTS 9.2 fix
The fix is, for the most part, the same as in LTS 9.2 #731, with some minor differences.
- The CVE-2025-38085 fix ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race") and its prerequisite ("mm: hugetlb: independent PMD page table shared count") were taken from linux-6.1.y instead of linux-5.15.y. This was only to avoid context conflicts, as linux-6.1.y has a history of the hugepages module more similar to ciqlts9_4 than linux-5.15.y does. The commit diffs are practically the same.
- "mm/hugetlb: make detecting shared pte more reliable" wasn't backported as the prerequisite for the bugfix of CVE-2025-38085 fix (the "mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count" commit) because it was already backported to ciqlts9_4 as 643137f.
- No backport of "mm/khugepaged: fix GUP-fast interaction by sending IPI" as prerequisite for CVE-2025-38085 fix ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race") was done, because it was already backported to ciqlts9_4 in f4c1e18.
- The CVE-2025-38084 fix ("mm/hugetlb: unshare page tables during VMA split, not before") was taken from linux-5.15.y to minimize conflicts compared to the upstream, just like in the LTS 9.2 case. However, the end result differs from the linux-5.15.y pick for the LTS 9.2 version ad741c4 - see function hugetlb_unshare_pmds().
CVE-2025-38085 fix discussion
The LTS 9.2 PR #731 raised suspicion over the kABI breakage, which was eventually resolved. The same situation can be found in this patch set. An attempt was made to avoid the "mm: hugetlb: independent PMD page table shared count" commit, which requires the use of RH_KABI_BROKEN_INSERT, as it may not have been strictly required for the fix of CVE-2025-38085. However, it was decided to backport it to LTS 9.4 as well, because only one solution was found - the CentOS 9 fix 12a6db3 - which didn't incorporate this commit as prerequisite. All other analyzed solutions did (CentOS 10 41f7eb5, stable 6.6 fe684290418ef9ef76630072086ee530b92f02b8, stable 6.1 b7754d3aa7bf9f62218d096c0c8f6c13698fac8b, stable 5.15 a3d864c901a300c295692d129159fc3001a56185, stable 5.10 952596b08c74e8fe9e2883d1dc8a8f54a37384ec).
The upstream fix 1013af4 message explicitly mentions
which refers to the line right after the introduced tlb_remove_table_sync_one() call:
In ciqlts9_4 without "mm: hugetlb: independent PMD page table shared count" this line would be
just like it was in CentOS 9 fix 12a6db3. It could not be determined whether these two situations were similar enough to warrant the exact same fix, so the more established solution was used, even though 12a6db3 cherry-picked cleanly.
Additionally, this prerequisite has its own associated CVE, CVE-2024-57883, which will probably have to be fixed eventually anyway.
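The layout argument behind the RH_KABI_BROKEN_INSERT justification can be checked in isolation. The sketch below is an illustration only: the `toy_` types are hypothetical and far simpler than the real struct page, `toy_atomic_t` is an assumed stand-in for atomic_t, and `void *` stands in for the pointer member. It shows that adding a member to a union that already holds a member of the same type changes neither the union's size nor the offset of whatever follows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for the kernel's atomic_t. */
typedef struct {
    int counter;
} toy_atomic_t;

/* Scratch union before the backport (hypothetical, heavily simplified). */
struct toy_page_before {
    unsigned long flags;
    union {
        void *pt_mm;
        toy_atomic_t pt_frag_refcount;
    };
    unsigned long trailing;  /* any field placed after the union */
};

/* The same layout with the extra member inserted into the union. */
struct toy_page_after {
    unsigned long flags;
    union {
        void *pt_mm;
        toy_atomic_t pt_frag_refcount;
        toy_atomic_t pt_share_count;  /* the inserted field */
    };
    unsigned long trailing;
};

/* Compile-time proof that size and following offsets are untouched. */
static_assert(sizeof(struct toy_page_before) == sizeof(struct toy_page_after),
              "inserting into the union must not change the struct size");
static_assert(offsetof(struct toy_page_before, trailing) ==
              offsetof(struct toy_page_after, trailing),
              "fields after the union must keep their offsets");

/* Runtime form of the same check, convenient for a quick test. */
static bool toy_layout_unchanged(void)
{
    return sizeof(struct toy_page_before) == sizeof(struct toy_page_after) &&
           offsetof(struct toy_page_before, trailing) ==
           offsetof(struct toy_page_after, trailing);
}
```

The static assertions make the build fail if the insertion ever changed the layout, which is the property the kABI checker is concerned with; whether the checker accepts the change without the macro is a separate matter and not modelled here.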
Commits
CVE-2025-38084
CVE-2025-38085 (+ CVE-2024-57883)
kABI check: passed
Boot test: passed
boot-test.log
Appendix: Backports Overview