Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
None
Description
(from inspection, could be mistaken)
This concerns routines sys_mbox_post() and sys_arch_mbox_fetch() in (sdk/)lib/drivers/lwip/src/sys_arch.c
based on assumption that sys_mbox_post() and sys_arch_mbox_fetch() are actually running in different threads (which I currently believe to be the case but have not actually observed.)
Scenario:
There is already at least one event in the mbox list (->ListHead) which has not yet been removed by sys_arch_mbox_fetch().
A call is made to sys_mbox_post() adding a second entry,
ExInterlockedInsertTailList(&mbox->ListHead,
&Container->ListEntry,
&mbox->Lock);
with the adding thread being swapped out after the add is complete (and mbox->Locked released) but BEFORE the immediately following call
KeSetEvent(&mbox->Event, IO_NO_INCREMENT, FALSE);
This allows the main tcpipthread to run, coming into sys_arch_mbox_fetch() once, and then coming in a second time (the thread adding the entry has NOT run again yet), and the second time removes the 2nd and final entry, sees that the list is empty, and thus clears the event
if (IsListEmpty(&mbox->ListHead))
KeClearEvent(&mbox->Event);
At some point after this the thread with sys_mbox_post() continues then executing the
KeSetEvent(&mbox->Event, IO_NO_INCREMENT, FALSE);
thus indicating there are items in the queue, when THERE ARE NO ITEMS.
The main thread can run again, sees the event is set, but will fail to retrieve an entry, and either ASSERT() in debug mode or in release mode, go on and attempt to access the Entry at
Container = CONTAINING_RECORD(Entry, LWIP_MESSAGE_CONTAINER, ListEntry);
presumably failing because there is no Entry.
I believe a solution would be to change this code in sys_mbox_post()
ExInterlockedInsertTailList(&mbox->ListHead,
&Container->ListEntry,
&mbox->Lock);
KeSetEvent(&mbox->Event, IO_NO_INCREMENT, FALSE);
into something like
KeAcquireSpinLock(&mbox->Lock, &OldIrql);
InsertTailList(&mbox->ListHead, &Container->ListEntry);
KeSetEvent(&mbox->Event, IO_NO_INCREMENT, FALSE);
KeReleaseSpinLock(&mbox->Lock, OldIrql);
preventing sys_arch_mbox_fetch() from accessing the list and clearing the event until AFTER sys_mbox_post() had finished with its additem/setevent sequence, thus avoiding the race condition.
***This code appears to be active in current trunk, as well as multiple branches, one of which is the GSoC 2016 lwip-tcpip branch.