Skip to content

Conversation

@Donny9
Copy link
Contributor

@Donny9 Donny9 commented Dec 30, 2025

Summary

Bug Description:
The original code always used conn->pollinfo[0] (the first element) to store new poll setup context, regardless of whether that slot was already in use. This caused multiple poll operations on the same CAN socket to overwrite each other's context, leading to:

  • Lost poll waiters when multiple threads poll the same socket
  • Memory corruption in pollfd structures
  • Undefined behavior when poll_teardown tries to clean up

Root Cause:
The code directly assigned info = conn->pollinfo without checking if the slot was available, effectively always using pollinfo[0]. When a second thread called poll() on the same socket, it would overwrite the first thread's poll context.

Solution:

  1. Initialize info to NULL instead of conn->pollinfo
  2. Before setting up poll, iterate through all CONFIG_NET_CAN_NPOLLWAITERS slots to find the first free slot (where fds == NULL)
  3. Return -EBUSY if no free slots are available
  4. During teardown, properly mark the slot as free by setting fds = NULL

Additional Changes:

  • Added CONFIG_NET_CAN_NPOLLWAITERS Kconfig option (default 4) to make the maximum number of concurrent poll waiters configurable
  • Changed hardcoded array size from 4 to CONFIG_NET_CAN_NPOLLWAITERS
  • Fixed lock ordering in teardown to ensure fds is cleared before unlock

Impact:

Impact

  • Enables multiple threads to safely poll the same CAN socket concurrently
  • Prevents poll context corruption in multi-threaded applications
  • Provides proper resource management with -EBUSY when all slots are full
  • Makes the number of supported concurrent pollers configurable per use case

Testing

can_send
can_dump
pass base on qemu

sudo ./prebuilts/qemu/linux-x86_64/bin/qemu-system-arm -L ./prebuilts/qemu/linux-x86_64/share/qemu -M virt,virtualization=on,highmem=off -semihosting -nographic -cpu cortex-r52 -device loader,file=./out/qemu_caros_bmp/nuttx -device loader,file=./out/qemu_caros_bmp/nuttx_user -smp 4 -device virtio-net-device,netdev=b1 -netdev bridge,br=nuttx0,id=b1,helper=/usr/lib/qemu/qemu-bridge-helper -object can-bus,id=canbus0-bus  -object can-host-socketcan,if=can0,canbus=canbus0-bus,id=canbus0-socketcan -device ctucan_pci,canbus0=canbus0-bus,canbus1=canbus0-bus
image

Bug Description:
The original code always used conn->pollinfo[0] (the first element) to store
new poll setup context, regardless of whether that slot was already in use.
This caused multiple poll operations on the same CAN socket to overwrite each
other's context, leading to:
- Lost poll waiters when multiple threads poll the same socket
- Memory corruption in pollfd structures
- Undefined behavior when poll_teardown tries to clean up

Root Cause:
The code directly assigned `info = conn->pollinfo` without checking if the
slot was available, effectively always using pollinfo[0]. When a second
thread called poll() on the same socket, it would overwrite the first
thread's poll context.

Solution:
1. Initialize info to NULL instead of conn->pollinfo
2. Before setting up poll, iterate through all CONFIG_NET_CAN_NPOLLWAITERS
   slots to find the first free slot (where fds == NULL)
3. Return -EBUSY if no free slots are available
4. During teardown, properly mark the slot as free by setting fds = NULL

Additional Changes:
- Added CONFIG_NET_CAN_NPOLLWAITERS Kconfig option (default 4) to make the
  maximum number of concurrent poll waiters configurable
- Changed hardcoded array size from 4 to CONFIG_NET_CAN_NPOLLWAITERS
- Fixed lock ordering in teardown to ensure fds is cleared before unlock

Impact:
- Enables multiple threads to safely poll the same CAN socket concurrently
- Prevents poll context corruption in multi-threaded applications
- Provides proper resource management with -EBUSY when all slots are full
- Makes the number of supported concurrent pollers configurable per use case

Signed-off-by: dongjiuzhu1 <[email protected]>
@github-actions github-actions bot added Area: Networking Effects networking subsystem Size: S The size of the change in this PR is small labels Dec 30, 2025
@xiaoxiang781216 xiaoxiang781216 merged commit b8585b9 into apache:master Dec 30, 2025
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: Networking Effects networking subsystem Size: S The size of the change in this PR is small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants