Uploaded image for project: 'Core ReactOS'
  1. Core ReactOS
  2. CORE-12928

possible race condition between sys_mbox_post()/sys_arch_mbox_fetch()

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Networking, NTCore
    • None

    Description

      (from inspection, could be mistaken)

      This concerns routines sys_mbox_post() and sys_arch_mbox_fetch() in (sdk/)lib/drivers/lwip/src/sys_arch.c

      based on assumption that sys_mbox_post() and sys_arch_mbox_fetch() are actually running in different threads (which I currently believe to be the case but have not actually observed.)

      Scenario:

      There is already at least one event in the mbox list (->ListHead) which has not yet been removed by sys_arch_mbox_fetch().

      A call is made to sys_mbox_post() adding a second entry,

          ExInterlockedInsertTailList(&mbox->ListHead,
                                      &Container->ListEntry,
                                      &mbox->Lock);

      with the adding thread being swapped out after the add is complete (and mbox->Locked released) but BEFORE the immediately following call

          KeSetEvent(&mbox->Event, IO_NO_INCREMENT, FALSE);

      This allows the main tcpipthread to run, coming into sys_arch_mbox_fetch() once, and then coming in a second time (the thread adding the entry has NOT run again yet), and the second time removes the 2nd and final entry, sees that the list is empty, and thus clears the event

              if (IsListEmpty(&mbox->ListHead))
                  KeClearEvent(&mbox->Event);

      At some point after this the thread with sys_mbox_post() continues then executing the

          KeSetEvent(&mbox->Event, IO_NO_INCREMENT, FALSE);

       thus indicating there are items in the queue, when THERE ARE NO ITEMS.

      The main thread can run again, sees the event is set, but will fail to retrieve an entry, and either ASSERT() in debug mode or in release mode, go on and attempt to access the Entry at

              Container = CONTAINING_RECORD(Entry, LWIP_MESSAGE_CONTAINER, ListEntry);

      presumably failing because there is no Entry.

      I believe a solution would be to change this code in sys_mbox_post()

          ExInterlockedInsertTailList(&mbox->ListHead,
                                      &Container->ListEntry,
                                      &mbox->Lock);

          KeSetEvent(&mbox->Event, IO_NO_INCREMENT, FALSE);

       

      into something like

              KeAcquireSpinLock(&mbox->Lock, &OldIrql);
              InsertTailList(&mbox->ListHead, &Container->ListEntry);

              KeSetEvent(&mbox->Event, IO_NO_INCREMENT, FALSE);        

              KeReleaseSpinLock(&mbox->Lock, OldIrql);   

      preventing sys_arch_mbox_fetch() from accessing the list and clearing the event until AFTER sys_mbox_post() had finished with its additem/setevent sequence, thus avoiding the race condition.

      ***This code appears to be active in current trunk, as well as multiple branches, one of which is the GSoC 2016 lwip-tcpip branch.

      Attachments

        Activity

          People

            bug zilla Bug Zilla
            curiousone curiousone
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: