< Previous by Date | Date Index | Next by Date > |
< Previous in Thread | Thread Index | Next in Thread > |
I think the mutex being recursive (at
least on windows) makes the order of the incrementing of mSize important. (Same
thread can lock it more than once) I also think that the ‘while’ used in
non-timed getNext should have been ‘if’ because mCondition’s role is to tell us
the Fifo is no longer empty. The main problem I think is the usage of
mFifo.empty(). The whole point for creating AbstaractFifo
instead of using std::deque directly is the fact that std::deque is not
synchronized, so checking it’s site in conditions of the synchronized version
seems wrong. From: Scott Godin
[mailto:slgodin@xxxxxxxxxxxx] I don’t think the while SHOULD be required
– but it will cause the code to check mFifo.empty() after the condition is
signaled – that’s why it will fix the problem. It is a temporary
solution. If someone can explain why the while() is used in non-timed
getNext and it makes sense – then maybe this will be a permanent solution – I
just don’t understand it’s purpose yet (if everything is working as planned). I don’t see how the order of the
incrementing of mSize is important – it is incremented within a mutex and later
check within a mutex. See the first line of the fn’s Lock
lock(mMutex); (void) lock; - this will cause a mutex to be locked
until the variable lock goes out of scope. Scott From: Micky Kaufmann
[mailto:micky@xxxxxxxxxxx] Thanks for the answer and thanks for
pointing me to the right getNext(int ms). However now I’m more confused: If I understand correctly you’re not
asking why there’s a while in getNext() but int your fixed getnext(int ms). As I see it getNext(int ms) pops an
element from the fifo if and only if such an element exists in the timeslot
described by ‘ms’. So the change from ‘if’ to ‘while’ seems
wrong because if mfifo is not empty we don’t have to wait for a signal. If line 44 should be changed from ‘if
(mfifo.empty())’ should be changed to ‘if (mSize==0)’ otherwise there’s no
point in using ‘mSize’. (I still think that line 23 should be
similarly changed from ‘while (mFifo.empty())’ to while (mSize==0)) Another solution is to make sure ‘mSize++’
should always appear a line before the line that adds an element to ‘mFifo’. I think that even if the bug is in
Condition class for windows, the AbstractFifo should also be changed ASAP. If I’m wrong I’ll be glad to understand
where my mistake is. Thanks again, Micky From: Scott Godin
[mailto:slgodin@xxxxxxxxxxxx] I’ve noticed this problem too.
I’m pretty sure it’s Windows specific. There is relatively new code in
the Condition class for Windows. Also AbstractFifo.getNext(timeout) with
a timeout hasn’t been used much inside the stack itself. Note: The while (mFifo.empty()) line
you are referencing is from getNext() with no parameters – but the assertion
happens in getNext(timeout). The temporary fix I’ve been using is to
change line 44 from If (mFifo.empty()) to While
(mFifo.empty()) This more closely matches what getNext()
without a timeout is doing. Although I’m not entirely sure why getNext is
even using a while statement. Can anyone explain that? My best guess it that there is a bug a
with the windows Condition – such that it will signal, even though nothing has
been added to the Fifo. I plan to look at this closer this week. Note: The mSize counter is always
adjusted within a Mutex – so its value should be properly maintained and
protected. Scott From: Micky Kaufmann
[mailto:micky@xxxxxxxxxxx] I’m working on windows, visual studio 7.1
(Maybe the problem is with the Mutex being recursive?) Still
what troubles me in the AbstractFifo is the fact that the while condition is ‘ while (mFifo.empty()) ’ but the assert is not on mFifo. So it seems that the while
condition should be ‘ while (mSize!=0)
’. The bug appears when a client tries to
reach a non registered contact - the server tries to skip remaining stages of a
transaction and terminate it but it fails... Here’s the stack trace: >
repro.exe!_NMSG_WRITE(int rterrnum=10) Line
195 C
repro.exe!abort() Line 44 + 0x7 C
repro.exe!_assert(const char * expr=0x00a652b0, const char *
filename=0x00a65298, unsigned int lineno=57) Line
306 C
repro.exe!resip::AbstractFifo::getNext(int ms=100) Line 57 +
0x1a C++
repro.exe!resip::TimeLimitFifo<resip::Message>::getNext(int ms=100)
Line 143 +
0xc C++
repro.exe!repro::Proxy::thread() Line 67 + 0xd C++
repro.exe!threadWrapper(void * threadParm=0x000e06dc) Line 34 + 0xd C++
kernel32.dll!7c80b50b()
ntdll.dll!7c96e0d4()
kernel32.dll!7c8399f3() And here is the debug info produces by
repro: INFO | 20050522-110727.926 |
c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724 | proxy.cxx:133 | Inserting new
RequestContext tid=28CA0A6D1F57474F96CCF80478461139 -> RequestContext:
identity= candidates=[] count=1 final=0 INFO | 20050522-110727.941 |
c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724 | locationserver.cxx:50 |
LocationServer monkey : no registered target for sip:33@xxxxxxxxxxxxxxxxxxx
send 480 INFO | 20050522-110732.705 |
c:\sip\repro\Debug\repro.exe | RESIP:TRANSACTION | 3356 |
TransactionState.cxx:357 | discarding stray response: SipResp: 200
tid=c0a800190131c9b142904bd40000751c0000000a cseq=OPTIONS / 10 from(wire) INFO | 20050522-110732.986 |
c:\sip\repro\Debug\repro.exe | REPRO:APP | 1724 | requestcontext.cxx:49 |
RequestContext::process(TransactionTerminated) 28CA0A6D1F57474F96CCF80478461139
: RequestContext: identity= candidates=[] count=1 final=1 Assertion failed: mSize != 0, file
.\os\AbstractFifo.cxx, line 57 INFO | 20050522-110752.699 |
c:\sip\repro\Debug\repro.exe | RESIP:TRANSACTION | 3356 |
TransactionState.cxx:357 | discarding stray response: SipResp: 200
tid=c0a800190131c9b142904be800006a170000000b cseq=OPTIONS / 11 from(wire) INFO | 20050522-110812.724 |
c:\sip\repro\Debug\repro.exe | RESIP:TRANSACTION | 3356 |
TransactionState.cxx:357 | discarding stray response: SipResp: 200
tid=c0a800190131c9b142904bfc000056860000000c cseq=OPTIONS / 12 from(wire) From: Can you get a stack trace this?
AbstractFifo is used a lot in resip, but it does have several subclasses.
The size is incremeneted in the add methods of the subclasses, which lock
before adding. Also, run resiprocate/test/testFifo.cxx on
your configuration. --Derek PS -- What is your platform/compiler? CONFIDENTIALITY NOTICE This email and any files transmitted with it contains
proprietary information and, unless expressly stated otherwise, all contents
and attachments are confidential. This email is intended for the addressee(s)
only and access by anyone else is unauthorized. If you are not an addressee,
any disclosure, distribution, printing or copying of the contents of this email
or its attachments, or any action taken in reliance on it, is unauthorized and
may be unlawful. If you are not an addressee, please inform the sender
immediately and then delete this email and any copies of it. Thank you for your
co-operation. From:
resiprocate-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:resiprocate-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Micky Kaufmann Hi. All While running repro I go an assert error from the AbstractFifo::getNext
method: void* AbstractFifo::getNext() { Lock
lock(mMutex); (void)lock; while (mFifo.empty()) {
mCondition.wait(mMutex); } void* firstMessage =
mFifo.front();
mFifo.pop_front(); assert(mSize !=
0); mSize--; return firstMessage; } As I see it the problem is
caused probably because non thread safe adding of elements to the Fifo. Meaning
mSize++ is missing somewhere and from what I investigated the only reason for
using mSize instead of using std::deque::size is because we want to make sure
that mSize < mMaxSize. Is there another reason? Is it ok to replace the
lines: assert(mSize !=
0); mSize--; with mSize =
mFifo.size(); Is it ok to replace the
line: while (mFifo.empty()) with while (mSize==0)
//this.empty() Any other solution? |