< Previous by Date Date Index Next by Date >
< Previous in Thread Thread Index Next in Thread >

RE: [reSIProcate] AbstractFifo bug?


I think the mutex being recursive (at least on windows) makes the order of the incrementing of mSize important. (Same thread can lock it more than once)

I also think that the ‘while’ used in non-timed getNext should have been ‘if’ because mCondition’s role is to tell us the Fifo is no longer empty.

The main problem I think is the usage of mFifo.empty().

The whole point for creating AbstaractFifo instead of using std::deque directly is the fact that std::deque is not synchronized, so checking it’s site in conditions of the synchronized version seems wrong.

 

 


From: Scott Godin [mailto:slgodin@xxxxxxxxxxxx]
Sent: ג 24 מאי 2005 17:15
To: Micky Kaufmann; Scott Godin; Derek MacDonald; resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] AbstractFifo bug?

 

I don’t think the while SHOULD be required – but it will cause the code to check mFifo.empty() after the condition is signaled – that’s why it will fix the problem.  It is a temporary solution.  If someone can explain why the while() is used in non-timed getNext and it makes sense – then maybe this will be a permanent solution – I just don’t understand it’s purpose yet (if everything is working as planned).

 

I don’t see how the order of the incrementing of mSize is important – it is incremented within a mutex and later check within a mutex.  See the first line of the fn’s  Lock lock(mMutex);  (void) lock;  - this will cause a mutex to be locked until the variable lock goes out of scope.

 

Scott

 

 


From: Micky Kaufmann [mailto:micky@xxxxxxxxxxx]
Sent: Tuesday, May 24, 2005 11:56 AM
To: Scott Godin; Derek MacDonald; resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] AbstractFifo bug?

 

Thanks for the answer and thanks for pointing me to the right getNext(int ms).

 

However now I’m more confused:

If I understand correctly you’re not asking why there’s a while in getNext() but int your fixed getnext(int ms).

As I see it getNext(int ms) pops an element from the fifo if and only if such an element exists in the timeslot described by ‘ms’.

So the change from ‘if’ to ‘while’ seems wrong because if mfifo is not empty we don’t have to wait for a signal.

If line 44 should be changed from ‘if (mfifo.empty())’ should be changed to ‘if (mSize==0)’ otherwise there’s no point in using ‘mSize’.

(I still think that line 23 should be similarly changed from ‘while (mFifo.empty())’ to while (mSize==0))

Another solution is to make sure ‘mSize++’ should always appear a line before the line that adds an element to ‘mFifo’.

 

I think that even if the bug is in Condition class for windows, the AbstractFifo should also be changed ASAP.

If I’m wrong I’ll be glad to understand where my mistake is.

 

Thanks again,

Micky

 


From: Scott Godin [mailto:slgodin@xxxxxxxxxxxx]
Sent: ג 24 מאי 2005 15:47
To: Micky Kaufmann; Derek MacDonald; resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] AbstractFifo bug?

 

I’ve noticed this problem too.   I’m pretty sure it’s Windows specific.  There is relatively new code in the Condition class for Windows.  Also AbstractFifo.getNext(timeout) with a timeout hasn’t been used much inside the stack itself.

 

Note:  The while (mFifo.empty()) line you are referencing is from getNext() with no parameters – but the assertion happens in getNext(timeout).

 

The temporary fix I’ve been using is to change line 44 from

If (mFifo.empty())  to

While (mFifo.empty())    

This more closely matches what getNext() without a timeout is doing.  Although I’m not entirely sure why getNext is even using a while statement.  Can anyone explain that?

 

My best guess it that there is a bug a with the windows Condition – such that it will signal, even though nothing has been added to the Fifo.

 

I plan to look at this closer this week.

 

Note:  The mSize counter is always adjusted within a Mutex – so its value should be properly maintained and protected.

 

Scott

 


From: Micky Kaufmann [mailto:micky@xxxxxxxxxxx]
Sent: Sunday, May 22, 2005 5:16 AM
To: Derek MacDonald; resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] AbstractFifo bug?

 

I’m working on windows, visual studio 7.1   (Maybe the problem is with the Mutex being recursive?)

Still what troubles me in the AbstractFifo is the fact that the while condition is ‘ while (mFifo.empty()) ’ but the assert is not on mFifo. So it seems that the while condition should be ‘ while (mSize!=0) ’.

The bug appears when a client tries to reach a non registered contact - the server tries to skip remaining stages of a transaction and terminate it but it fails...

 

Here’s the stack trace:

>          repro.exe!_NMSG_WRITE(int rterrnum=10)  Line 195      C

            repro.exe!abort()  Line 44 + 0x7  C

            repro.exe!_assert(const char * expr=0x00a652b0, const char * filename=0x00a65298, unsigned int lineno=57)  Line 306           C

            repro.exe!resip::AbstractFifo::getNext(int ms=100)  Line 57 + 0x1a            C++

            repro.exe!resip::TimeLimitFifo<resip::Message>::getNext(int ms=100)  Line 143 + 0xc            C++

            repro.exe!repro::Proxy::thread()  Line 67 + 0xd    C++

            repro.exe!threadWrapper(void * threadParm=0x000e06dc)  Line 34 + 0xd C++

            kernel32.dll!7c80b50b()

            ntdll.dll!7c96e0d4()        

            kernel32.dll!7c8399f3()  

 

And here is the debug info produces by repro:

 

INFO | 20050522-110727.926 | c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724 | proxy.cxx:133 | Inserting new RequestContext tid=28CA0A6D1F57474F96CCF80478461139 -> RequestContext:  identity= candidates=[] count=1 final=0

INFO | 20050522-110727.941 | c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724 | locationserver.cxx:50 | LocationServer monkey : no registered target for sip:33@xxxxxxxxxxxxxxxxxxx send 480

INFO | 20050522-110732.705 | c:\sip\repro\Debug\repro.exe | RESIP:TRANSACTION | 3356 | TransactionState.cxx:357 | discarding stray response: SipResp: 200 tid=c0a800190131c9b142904bd40000751c0000000a cseq=OPTIONS / 10 from(wire)

INFO | 20050522-110732.986 | c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724 | requestcontext.cxx:49 | RequestContext::process(TransactionTerminated) 28CA0A6D1F57474F96CCF80478461139 : RequestContext:  identity= candidates=[] count=1 final=1

Assertion failed: mSize != 0, file .\os\AbstractFifo.cxx, line 57

INFO | 20050522-110752.699 | c:\sip\repro\Debug\repro.exe | RESIP:TRANSACTION | 3356 | TransactionState.cxx:357 | discarding stray response: SipResp: 200 tid=c0a800190131c9b142904be800006a170000000b cseq=OPTIONS / 11 from(wire)

INFO | 20050522-110812.724 | c:\sip\repro\Debug\repro.exe | RESIP:TRANSACTION | 3356 | TransactionState.cxx:357 | discarding stray response: SipResp: 200 tid=c0a800190131c9b142904bfc000056860000000c cseq=OPTIONS / 12 from(wire)

 


From: Derek MacDonald [mailto:derek@xxxxxxxx]
Sent: ה 19 מאי 2005 20:20
To: Micky Kaufmann; resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] AbstractFifo bug?

 

Can you get a stack trace this? AbstractFifo is used a lot in resip, but it does have several subclasses.  The size is incremeneted in the add methods of the subclasses, which lock before adding.

 

Also, run resiprocate/test/testFifo.cxx on your configuration.

 

--Derek

 

PS -- What is your platform/compiler?

 

CONFIDENTIALITY NOTICE

This email and any files transmitted with it contains proprietary information and, unless expressly stated otherwise, all contents and attachments are confidential. This email is intended for the addressee(s) only and access by anyone else is unauthorized. If you are not an addressee, any disclosure, distribution, printing or copying of the contents of this email or its attachments, or any action taken in reliance on it, is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately and then delete this email and any copies of it. Thank you for your co-operation.


From: resiprocate-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:resiprocate-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Micky Kaufmann
Sent: Thursday, May 19, 2005 6:31 AM
To: resiprocate-devel@xxxxxxxxxxxxxxxxxxx
Subject: [reSIProcate] AbstractFifo bug?

 

Hi. All

 

While running repro I go an assert error from the AbstractFifo::getNext method:

 

void* AbstractFifo::getNext() {

   Lock lock(mMutex); (void)lock;

 

   while (mFifo.empty())

   {

      mCondition.wait(mMutex);

   }

   void* firstMessage = mFifo.front();

   mFifo.pop_front();

   assert(mSize != 0);

   mSize--;

   return firstMessage;

}

 

As I see it the problem is caused probably because non thread safe adding of elements to the Fifo. Meaning mSize++ is missing somewhere and from what I investigated the only reason for using mSize instead of using std::deque::size is because we want to make sure that mSize < mMaxSize.

Is there another reason?

 

Is it ok to replace the lines:

   assert(mSize != 0);

   mSize--;

with

   mSize = mFifo.size();

 

Is it ok to replace the line:

   while (mFifo.empty())

with

   while (mSize==0)  //this.empty()

 

Any other solution?