< Previous by Date Date Index Next by Date >
< Previous in Thread Thread Index Next in Thread >

RE: [reSIProcate] AbstractFifo bug?


Both getNext and getNext(int) look wrong.

Being woken up by the condition is not a guarantee that there is a message
available -- some other thread may have grabbed the element.

Need to check again after the condition returns and loop around the condition.

I'll try and fix it and test...

david



Quoting Scott Godin <slgodin@xxxxxxxxxxxx>:

I?ve noticed this problem too.   I?m pretty sure it?s Windows specific.
There is relatively new code in the Condition class for Windows.  Also
AbstractFifo.getNext(timeout) with a timeout hasn?t been used much inside
the stack itself.



Note:  The while (mFifo.empty()) line you are referencing is from getNext()
with no parameters ? but the assertion happens in getNext(timeout).



The temporary fix I?ve been using is to change line 44 from

If (mFifo.empty())  to

While (mFifo.empty())

This more closely matches what getNext() without a timeout is doing.
Although I?m not entirely sure why getNext is even using a while statement.
Can anyone explain that?



My best guess it that there is a bug a with the windows Condition ? such
that it will signal, even though nothing has been added to the Fifo.



I plan to



look at this closer this week.



Note:  The mSize counter is always adjusted within a Mutex ? so its value
should be properly maintained and protected.



Scott



 _____

From: Micky Kaufmann [mailto:micky@xxxxxxxxxxx]
Sent: Sunday, May 22, 2005 5:16 AM
To: Derek MacDonald; resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] AbstractFifo bug?



I?m working on windows, visual studio 7.1   (Maybe the problem is with the
Mutex being recursive?)

Still what troubles me in the AbstractFifo is the fact that the while
condition is ? while (mFifo.empty()) ? but the assert is not on mFifo. So it
seems that the while condition should be ? while (mSize!=0) ?.

The bug appears when a client tries to reach a non registered contact - the
server tries to skip remaining stages of a transaction and terminate it but
it fails...



Here?s the stack trace:

         repro.exe!_NMSG_WRITE(int rterrnum=10)  Line 195      C

           repro.exe!abort()  Line 44 + 0x7  C

           repro.exe!_assert(const char * expr=0x00a652b0, const char *
filename=0x00a65298, unsigned int lineno=57)  Line 306           C

           repro.exe!resip::AbstractFifo::getNext(int ms=100)  Line 57 +
0x1a            C++

           repro.exe!resip::TimeLimitFifo<resip::Message>::getNext(int
ms=100)  Line 143 + 0xc            C++

           repro.exe!repro::Proxy::thread()  Line 67 + 0xd    C++

           repro.exe!threadWrapper(void * threadParm=0x000e06dc)  Line 34 +
0xd C++

           kernel32.dll!7c80b50b()

           ntdll.dll!7c96e0d4()

           kernel32.dll!7c8399f3()



And here is the debug info produces by repro:



INFO | 20050522-110727.926 | c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724
| proxy.cxx:133 | Inserting new RequestContext
tid=28CA0A6D1F57474F96CCF80478461139 -> RequestContext:  identity=
candidates=[] count=1 final=0

INFO | 20050522-110727.941 | c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724
| locationserver.cxx:50 | LocationServer monkey : no registered target for
sip:33@xxxxxxxxxxxxxxxxxxx send 480

INFO | 20050522-110732.705 | c:\sip\repro\Debug\repro.exe |
RESIP:TRANSACTION | 3356 | TransactionState.cxx:357 | discarding stray
response: SipResp: 200 tid=c0a800190131c9b142904bd40000751c0000000a
cseq=OPTIONS / 10 from(wire)

INFO | 20050522-110732.986 | c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724
| requestcontext.cxx:49 | RequestContext::process(TransactionTerminated)
28CA0A6D1F57474F96CCF80478461139 : RequestContext:  identity= candidates=[]
count=1 final=1

Assertion failed: mSize != 0, file .\os\AbstractFifo.cxx, line 57

INFO | 20050522-110752.699 | c:\sip\repro\Debug\repro.exe |
RESIP:TRANSACTION | 3356 | TransactionState.cxx:357 | discarding stray
response: SipResp: 200 tid=c0a800190131c9b142904be800006a170000000b
cseq=OPTIONS / 11 from(wire)

INFO | 20050522-110812.724 | c:\sip\repro\Debug\repro.exe |
RESIP:TRANSACTION | 3356 | TransactionState.cxx:357 | discarding stray
response: SipResp: 200 tid=c0a800190131c9b142904bfc000056860000000c
cseq=OPTIONS / 12 from(wire)



 _____

From: Derek MacDonald [mailto:derek@xxxxxxxx]
Sent: ? 19 ??? 2005 20:20
To: Micky Kaufmann; resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] AbstractFifo bug?



Can you get a stack trace this? AbstractFifo is used a lot in resip, but it
does have several subclasses.  The size is incremeneted in the add methods
of the subclasses, which lock before adding.



Also, run resiprocate/test/testFifo.cxx on your configuration.



--Derek



PS -- What is your platform/compiler?



CONFIDENTIALITY NOTICE

This email and any files transmitted with it contains proprietary
information and, unless expressly stated otherwise, all contents and
attachments are confidential. This email is intended for the addressee(s)
only and access by anyone else is unauthorized. If you are not an addressee,
any disclosure, distribution, printing or copying of the contents of this
email or its attachments, or any action taken in reliance on it, is
unauthorized and may be unlawful. If you are not an addressee, please inform
the sender immediately and then delete this email and any copies of it.
Thank you for your co-operation.

 _____

From: resiprocate-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:resiprocate-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Micky
Kaufmann
Sent: Thursday, May 19, 2005 6:31 AM
To: resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: [reSIProcate] AbstractFifo bug?



Hi. All



While running repro I go an assert error from the AbstractFifo::getNext
method:



void* AbstractFifo::getNext() {

  Lock lock(mMutex); (void)lock;



  while (mFifo.empty())

  {

     mCondition.wait(mMutex);

  }

  void* firstMessage = mFifo.front();

  mFifo.pop_front();

  assert(mSize != 0);

  mSize--;

  return firstMessage;

}



As I see it the problem is caused probably because non thread safe adding of
elements to the Fifo. Meaning mSize++ is missing somewhere and from what I
investigated the only reason for using mSize instead of using
std::deque::size is because we want to make sure that mSize < mMaxSize.

Is there another reason?



Is it ok to replace the lines:

  assert(mSize != 0);

  mSize--;

with

  mSize = mFifo.size();



Is it ok to replace the line:

  while (mFifo.empty())

with

  while (mSize==0)  //this.empty()



Any other solution?