Skip to content

Conversation

@Donny9
Copy link
Contributor

@Donny9 Donny9 commented Dec 30, 2025

Summary

This PR contains four bug fixes and improvements to the NuttX file system layer, specifically focusing on file descriptor management and event handling:

1. Fix fd allocation exceeding OPEN_MAX (fs/inode/fs_files.c)

The OPEN_MAX boundary check in fdlist_extend() was incorrectly using orig_rows instead of (orig_rows + 1) when validating whether the new allocation would exceed limits. This allowed file descriptor allocation to succeed even when it would exceed OPEN_MAX, potentially causing memory corruption or exceeding system resource limits.

Example: With OPEN_MAX=256 and CONFIG_NFILE_DESCRIPTORS_PER_BLOCK=32, when orig_rows=8 (already at 256 fds), the old code would check 32 * 8 > 256 (false, allows extension to 288), while the fixed code correctly checks 32 * 9 > 256 (true, blocks extension).

2. Fix fd tag loss during parent-to-child inheritance (fs/inode/fs_files.c)

When child processes inherited file descriptors from parent via fdlist_copy(), the fd tags (fd_tag_fdsan and fd_tag_fdcheck) were not being preserved. This caused assertion failures when child processes closed inherited fds, as the fdcheck/fdsan subsystems expected matching tags.

The fix adds a copy parameter to fdlist_install() to distinguish between new fd allocation (initialize fresh tags) and fd inheritance (preserve parent tags). This is critical for any application using fork/clone with inherited file descriptors when CONFIG_FDSAN or CONFIG_FDCHECK are enabled.

3. Fix busy loop in eventfd/timerfd (fs/vfs/fs_eventfd.c, fs/vfs/fs_timerfd.c)

Updated semaphore handling to properly advance to the next waiting task, preventing busy loops in event and timer file descriptor operations. Without this fix, certain polling scenarios could cause the system to spin unnecessarily.

Impact

Users:

  • Applications using fork/clone with CONFIG_FDSAN or CONFIG_FDCHECK will no longer crash when closing inherited file descriptors
  • Systems can no longer exceed OPEN_MAX limit, preventing potential memory corruption
  • Debug builds will clearly identify file descriptor leaks instead of obscure failures
  • Event and timer fd operations will be more efficient without busy loops

Build Process:

  • No build process changes
  • All changes are runtime behavior fixes

Hardware:

  • No hardware-specific changes
  • Fixes apply to all architectures

Documentation:

  • No documentation changes required

Security:

  • Prevents potential buffer overflow/memory corruption from exceeding OPEN_MAX
  • Improves system stability and resource management

Compatibility:

  • All changes are backward compatible
  • Fixes broken functionality without changing APIs
  • May expose existing bugs in applications that were exceeding OPEN_MAX undetected

Testing

Test 1: OPEN_MAX Boundary Check

Test Procedure:

// Test program that attempts to open more than OPEN_MAX files
int fds[300];
int count = 0;
for (int i = 0; i < 300; i++) {
    fds[i] = open("/dev/null", O_RDONLY);
    if (fds[i] < 0) {
        printf("Failed at count %d, errno=%d\n", count, errno);
        break;
    }
    count++;
}

Before Fix:

Could allocate up to 288 fds (exceeded OPEN_MAX=256 by 32)
Potential memory corruption
After Fix:

Correctly stops at or before OPEN_MAX (253-256 depending on stdio fds)
Returns -EMFILE when limit reached
No memory corruption

Test 2: Fork with Inherited FDs

Test Procedure:

// Parent opens files
int fd1 = open("/tmp/test.txt", O_RDWR);
int fd2 = socket(AF_INET, SOCK_STREAM, 0);

// Fork child process
pid_t pid = fork();
if (pid == 0) {
    // Child closes inherited fds
    close(fd1);
    close(fd2);
    exit(0);
}

After Fix:

Child successfully closes inherited fds
fd_tag_fdsan and fd_tag_fdcheck properly preserved
No crashes, all tests pass
Test Log:
Ran complete test suite with uv_run_tests_main including:

run_helper_tcp4_echo_server
Multiple fork/close cycles
File descriptor inheritance scenarios
All tests PASSED without assertions

OSTest Verification:

Ran full OSTest suite with all patches applied:

nsh> ostest
...

All tests PASSED

Before this fix, the check for OPEN_MAX was performed using orig_rows
instead of the new row count after extension. This caused the check to
pass even when the allocation would exceed OPEN_MAX limit.

For example:
- OPEN_MAX = 256
- CONFIG_NFILE_DESCRIPTORS_PER_BLOCK = 32
- orig_rows = 8 (8 * 32 = 256, at the limit)

The old code checked: 32 * 8 > 256 (false, allows extension)
The new code checks: 32 * (8 + 1) > 256 (true, correctly blocks)

Without this fix, the system could allocate more file descriptors than
OPEN_MAX allows, potentially causing memory corruption or exceeding
system limits.

This fix ensures the check evaluates the new total count (orig_rows + 1)
before allowing the extension to proceed.

Signed-off-by: dongjiuzhu1 <[email protected]>
…nheritance

When a child process inherits file descriptors from its parent via
fdlist_copy(), the fd tags (fd_tag_fdsan and fd_tag_fdcheck) were not
being copied. This caused assertion failures when the child process
closed inherited file descriptors, because the fdcheck/fdsan subsystems
expected the tags to match.

Root Cause:
----------
The fdlist_install() function was not preserving the fd tags during
fd duplication. When copying fds from parent to child in fdlist_copy(),
the tags were lost, resulting in:
- fd_tag_fdsan: Used for file descriptor ownership tracking (FDSAN)
- fd_tag_fdcheck: Used for fd validity checking (FDCHECK)

Both tags being reset to 0/NULL instead of copied from parent.

Symptom:
--------
Child processes would crash with assertion failure in fdcheck_restore()
when closing inherited file descriptors:

  fdcheck_restore+0x69/0xa0
  fdlist_get2+0x11/0x48
  fdlist_close+0xd/0x94
  close+0x15/0x30

This occurred because fdcheck_restore() validates that the fd_tag_fdcheck
matches the expected value, and the mismatch triggered an assertion.

Solution:
---------
1. Add a 'copy' parameter to fdlist_install() to distinguish between:
   - New fd allocation (copy=false): Initialize fresh tags
   - Fd duplication (copy=true): Preserve tags from source fd

2. Add fdp parameter to fdlist_install() to access source fd tags

3. In fdlist_copy(), pass copy=true to preserve parent's fd tags

4. In fdlist_dup3(), pass copy=false since dup operations should
   create independent fd tracking (not copy parent tags)

Changes:
--------
- fdlist_install(): Added 'fdp' and 'copy' parameters
- When copy=true, preserve fd_tag_fdsan and fd_tag_fdcheck from source
- fdlist_copy(): Pass copy=true to preserve parent tags
- fdlist_dup3(): Pass copy=false for normal dup behavior

Impact:
-------
This fix ensures that file descriptor ownership tracking and validity
checking work correctly across fork/clone operations, preventing
crashes when child processes close inherited file descriptors.

Without this fix, any program using fork() with inherited fds would
crash if CONFIG_FDSAN or CONFIG_FDCHECK were enabled.

Issue backtrace:
backtrace_unwind+0x105/0x108
sched_backtrace+0x6f/0x80
sched_dumpstack+0x33/0x80
_assert+0x229/0x510
arm_syscall+0x81/0x98
up_assert+0xd/0x18
__assert+0x1d/0x24
fdcheck_restore+0x69/0xa0
fdlist_get2+0x11/0x48
fdlist_close+0xd/0x94
close+0x15/0x30
closefd+0x5/0x30
notify_parent_process+0x2b/0x40
run_helper_tcp4_echo_server+0x75/0x114
run_test_part+0x5f/0x68
uv_run_tests_main+0x35/0x60
run_test_part+0x5f/0x68
uv_run_tests_main+0x35/0x60
nxtask_startup+0x13/0x2c
nxtask_start+0x4d/0x64

Signed-off-by: dongjiuzhu1 <[email protected]>
@github-actions github-actions bot added Area: File System File System issues Size: S The size of the change in this PR is small labels Dec 30, 2025
fix bug about busy loop

Signed-off-by: dongjiuzhu1 <[email protected]>
@Donny9 Donny9 requested review from acassis and linguini1 January 2, 2026 00:50
@acassis acassis merged commit 12079a2 into apache:master Jan 2, 2026
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: File System File System issues Size: S The size of the change in this PR is small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants