I found a way to trace leaked TIDs: I added a timestamp to the TransactionState class locally and print an error message if the transaction still exists after more than two minutes.
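Roughly, the local patch works like this (a minimal self-contained sketch; in the real patch the timestamp lives in resip::TransactionState and the sweep walks the stack's transaction maps, and all names here are my own):

#include <ctime>
#include <iostream>
#include <map>
#include <string>

// Each traced transaction records when it was created.
struct TracedTransaction
{
   std::time_t created = std::time(nullptr);
};

// Periodic sweep: report any transaction that has outlived two minutes.
void reportStale(const std::map<std::string, TracedTransaction>& tidMap)
{
   const std::time_t kMaxAgeSecs = 120;
   const std::time_t now = std::time(nullptr);
   for (const auto& entry : tidMap)
   {
      if (now - entry.second.created > kMaxAgeSecs)
      {
         std::cerr << "Possible leaked transaction, tid=" << entry.first << "\n";
      }
   }
}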
I can figure out why the problem occurs and fix my code later on.
Deeply appreciate your help.
Frank Yuan
Byron Campen wrote:
Okay, I will see about adding this feature, but the more pressing concern I have right now is the (apparent) leakage of client TransactionStates.
Client transactions have a fixed lifetime, as opposed to server
transactions. If no response is received in a client transaction, that
transaction is supposed to die after 32s (regardless of what the TU
does). There is only one exception that I know of: when we have sent an INVITE and received a provisional response, there are only two things that will cause the client transaction to be torn down:
1. We receive a final response (in this case, the transaction is
cleaned up after 64*T1, or 32s)
2. The TU sends a CANCEL to the stack (this will cause a 128*T1,
or 64s, timer to be set on the INVITE transaction).
So, if we are not getting final responses, it is up to the TU to decide when it no longer wants to pursue the call. In proxies, this is done with Timer C (see RFC 3261 Sec 16.6 bullet 11, Sec 16.7 bullet 2, and Sec 16.8). In endpoints, it is assumed the human user will eventually give up on the call. I am not aware of any defined mechanism for B2BUAs.
Since you are running under heavy load, it is very likely stuff is getting dropped by the kernel, so it seems likely that we are winding up in a situation where the only thing that can keep us from leaking is a CANCEL from the TU.
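For illustration, a TU-side safety valve along the lines of Timer C might look like the following (a sketch only: scheduleTimer, finalResponseSeen, and sendCancel are hypothetical application helpers, not resip API):

#include <string>

// Hypothetical TU-side "Timer C" (RFC 3261 says it should be > 3 minutes).
static const unsigned int kTimerCSecs = 180;

void onInviteForwarded(const std::string& tid)
{
   // Arm a watchdog when we forward the INVITE.
   scheduleTimer(tid, kTimerCSecs);
}

void onTimerCFired(const std::string& tid)
{
   if (!finalResponseSeen(tid))
   {
      // CANCEL makes the stack set its 128*T1 (64s) cleanup timer on the
      // INVITE client transaction, so the TransactionState cannot leak.
      sendCancel(tid);
   }
}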
Best regards,
Byron Campen
Yes, very useful.
Could you add a debug message flagging the potential leak of a TID if there is no response from the TU when the timer expires?
Could you list the conditions under which TIDs may get leaked in the transaction map?
case 1: For non-INVITE requests (internal and external), no response from the remote side or from its own TU, etc.
case 2: ...............
.............................
Frank Yuan
Emergent-Netsolutions.com
972-359-6600
Byron Campen wrote:
There is not (at present) any way to detect whether particular transactions have been up for an unusually long time. We could add a debug feature that would have the stack start a timer when it sends a request to the TU; if that timer fires and finds that the transaction is still in its initial state (i.e., it hasn't gotten anything from the TU), log information about that transaction, and maybe even poke the TU. Would this be useful?
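Something along these lines, say (a sketch of the proposed feature only; none of this is implemented, and the helper names are hypothetical; WarningLog is the stack's logging macro from rutil/Logger.hxx):

void onRequestSentToTu(const resip::Data& tid)
{
   // Hypothetical: arm a watchdog when the request is handed to the TU.
   startDebugTimer(tid, 32 /* seconds */);
}

void onDebugTimerFired(const resip::Data& tid)
{
   // stillInInitialState() is a hypothetical query: "has the TU sent the
   // stack anything for this transaction yet?"
   if (stillInInitialState(tid))
   {
      WarningLog(<< "TU has not responded to transaction " << tid
                 << "; possible leak");
   }
}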
Best regards,
Byron Campen
Your input is very helpful.
Are there ways for the TU to detect TID leaks and free the lost TIDs?
If yes, how? If not, it is tough to use this stateful stack.
Could the stateless resip stack be used instead, to get rid of the TID issue and the huge memory retention (32 to 64 seconds of holding time for each request)?
Thanks
Frank Yuan
Emergent-Netsolutions.com
972-359-6600
Byron Campen wrote:
I disagree with your first point. If the message gets to the TU, it is the TU's responsibility to handle it. I have in the past proposed something like a SipStack::defer(const resip::Data& tid, MethodType method) that would tell the stack to clean up state for that tid, but this idea did not catch on.
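Usage would have looked something like this (hypothetical: defer() was only ever a proposal and does not exist in the stack; getTransactionId() and h_RequestLine are real resip API):

void MyTu::dropRequest(const resip::SipMessage& request)
{
   // Proposed, never-adopted interface:
   //    void SipStack::defer(const resip::Data& tid, MethodType method);
   mStack.defer(request.getTransactionId(),
                request.header(resip::h_RequestLine).getMethod());
   // The stack would then tear down the server TransactionState instead
   // of waiting indefinitely for a response from the TU.
}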
As for the call volume question, you can set your TU's fifo to have size and/or time-depth limitations (by re-initializing your TU's fifo in the c'tor). If the stack detects that your fifo is overfull, it will not forward new requests to you, but will instead statelessly respond with 503s. Responses and ACKs will still make it through, but if your TU ignores these, it could end up leaking memory itself (since it never saw the response for the request it is holding onto state for).
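Conceptually, the overload check behaves like this (a reconstruction, not the actual stack code; I am assuming the TimeLimitFifo interface from rutil/TimeLimitFifo.hxx, whose constructor takes a maximum time depth in seconds and a maximum size, with 0 meaning unlimited):

// TU inbound queue bounded to 5 seconds of depth or 1000 messages.
resip::TimeLimitFifo<resip::Message> tuFifo(5, 1000);

void onNewRequest(resip::SipMessage* request)
{
   if (!tuFifo.wouldAccept(resip::TimeLimitFifo<resip::Message>::EnforceTimeDepth))
   {
      sendStateless503(request);   // hypothetical helper; no state is kept
      return;
   }
   tuFifo.add(request, resip::TimeLimitFifo<resip::Message>::EnforceTimeDepth);
}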
However, there comes a point when the stack's ability to send out 503s fast enough is overwhelmed, and really bad stuff starts happening then. Also, if we are statelessly sending 503s, any ACKs to those 503s end up being indistinguishable from an ACK/200 (since we never remembered the tid), and therefore get sent up to the TU, further compounding the problem.
Best regards,
Byron Campen
Case 1: if the call volume is very high, some messages may get lost or dropped; the SIP stack should protect itself against this problem.
Alternatively: are there mechanisms for the application to identify dangling TIDs and free them? Does the resip stack have function interfaces for the application to use?
Question: if the call volume is too high, is there any mechanism for the resip stack to detect it and discard new request messages when the computer cannot handle the load?
Thanks
Frank Yuan
Emergent-Netsolutions.com
972-359-6600
Byron Campen wrote:
TU summary: 0 TRANSPORT 0 TRANSACTION 0 CLIENTTX 1998 SERVERTX 10690 TIMERS 0
Transaction summary: reqi 1225266 reqo 1200525 rspi 955555 rspo 1229993
Details: INVi 383069/S322324/F36145 INVo 344348/S322414/F0 ACKi 312462 ACKo 322360 BYEi 507150/S507141/F0 BYEo 307875/S307111/F0 CANi 22517/S507141/F0 CANo 2223/S1550/F593 MSGi 0/S0/F0 MSGo 0/S0/F0 OPTi 0/S0/F0 OPTo 0/S0/F0 REGi 68/S64/F0 REGo 0/S0/F0 PUBi 0/S0/F0 PUBo 0/S0/F0 SUBi 0/S0/F0 SUBo 0/S0/F0 NOTi 0/S0/F0 NOTo 0/S0/F0
Ok, the CLIENTTX and SERVERTX fields in the above logging statement indicate that there are lots and lots of TransactionStates lying around. Further, there are no timers left in the TimerQueue, so we aren't likely to clean any of these up. Let's talk about the server transactions first. There are a couple of likely possibilities:
1. The TU is failing to respond to some of the requests that the stack passes it; the stack will wait indefinitely for a response from the TU. It is the TU's responsibility to respond to EVERY request that is passed to it, no matter how malformed the request might be (see the sketch after this list). The TU should never elect to "quietly" drop a request; doing so is guaranteed to leak exactly one server TransactionState.
2. High load conditions (note the number of
retransmissions) have caused the stack to leak transactions (I will
take a closer look at this)
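To make point 1 concrete, a TU handler should do something like this (Helper::makeResponse and SipStack::send are real resip API; the handler shape and canHandle() are illustrative):

#include <memory>

void MyTu::handleRequest(const resip::SipMessage& request)
{
   if (!canHandle(request))   // malformed, unsupported, overloaded, ...
   {
      // Even a request we cannot process gets a final response.
      std::unique_ptr<resip::SipMessage> resp(
         resip::Helper::makeResponse(request, 400));   // 400 Bad Request
      mStack.send(*resp);
      return;   // never quietly drop it; that leaks a TransactionState
   }
   // ... normal processing ...
}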
As for the client TransactionStates, this worries me
more. There are fewer things that the TU can do wrong that will cause
the stack to leak client TransactionStates. I will try to figure out
what might be happening here.
So, are you using your own TU? If so, try putting in a simple counter that gets incremented for each request that comes from the stack (excepting ACKs) and decremented for every final response sent to the stack. If this counter ends up being non-zero, you have a bug in your TU.
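A minimal version of that counter (plain C++, independent of the resip API; wire it up wherever your TU receives requests and sends responses):

#include <atomic>

class TuAudit
{
public:
   void onRequestFromStack(bool isAck)
   {
      if (!isAck) ++mOutstanding;              // ACKs get no response, skip them
   }
   void onResponseToStack(int statusCode)
   {
      if (statusCode >= 200) --mOutstanding;   // final responses only; 1xx don't count
   }
   long outstanding() const { return mOutstanding.load(); }

private:
   std::atomic<long> mOutstanding{0};          // non-zero after drain => TU bug
};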
Best regards,
Byron Campen
On Sep 21, 2006, at 1:36 PM, FrankYuan wrote:
After the call generator had been stopped for 10 minutes, I found that the resip statistics did not show any problem on these FIFO queues.
So I created a core file and printed the size of the transaction maps.
There are still a lot of TIDs in the transaction map. At the least, that is part of the culprit holding memory.
Should there be a garbage collection pass to free these lost TIDs?
Here are the log files:
20060921-125408.091 | TuSelector.cxx:71 | Stats message
20060921-125408.091 | StatisticsMessage.cxx:153 | RESIP:TRANSACTION
TU summary: 0 TRANSPORT 0 TRANSACTION 0 CLIENTTX 1998 SERVERTX 10690 TIMERS 0
Transaction summary: reqi 1225266 reqo 1200525 rspi 955555 rspo 1229993
Details: INVi 383069/S322324/F36145 INVo 344348/S322414/F0 ACKi 312462 ACKo 322360 BYEi 507150/S507141/F0 BYEo 307875/S307111/F0 CANi 22517/S507141/F0 CANo 2223/S1550/F593 MSGi 0/S0/F0 MSGo 0/S0/F0 OPTi 0/S0/F0 OPTo 0/S0/F0 REGi 68/S64/F0 REGo 0/S0/F0 PUBi 0/S0/F0 PUBo 0/S0/F0 SUBi 0/S0/F0 SUBo 0/S0/F0 NOTi 0/S0/F0 NOTo 0/S0/F0
Retransmissions: INVx 116463 BYEx 105757 CANx 1499 MSGx 0 OPTx 0 REGx 0 finx 0 nonx 0 PUBx 0 SUBx 0 NOTx 0
20060921-125708.084 | TuSelector.cxx:71 | Stats message
20060921-125708.084 | StatisticsMessage.cxx:153 | RESIP:TRANSACTION
TU summary: 0 TRANSPORT 0 TRANSACTION 0 CLIENTTX 1998 SERVERTX 10690 TIMERS 0
Transaction summary: reqi 1225268 reqo 1200525 rspi 955555 rspo 1229995
Details: INVi 383069/S322324/F36145 INVo 344348/S322414/F0 ACKi 312462 ACKo 322360 BYEi 507150/S507141/F0 BYEo 307875/S307111/F0 CANi 22517/S507141/F0 CANo 2223/S1550/F593 MSGi 0/S0/F0 MSGo 0/S0/F0 OPTi 0/S0/F0 OPTo 0/S0/F0 REGi 70/S66/F0 REGo 0/S0/F0 PUBi 0/S0/F0 PUBo 0/S0/F0 SUBi 0/S0/F0 SUBo 0/S0/F0 NOTi 0/S0/F0 NOTo 0/S0/F0
Retransmissions: INVx 116463 BYEx 105757 CANx 1499 MSGx 0 OPTx 0 REGx 0 finx 0 nonx 0 PUBx 0 SUBx 0 NOTx 0
20060921-130008.078 | TuSelector.cxx:71 | Stats message
20060921-130008.085 | StatisticsMessage.cxx:153 | RESIP:TRANSACTION
TU summary: 0 TRANSPORT 0 TRANSACTION 0 CLIENTTX 1998 SERVERTX 10690 TIMERS 0
Transaction summary: reqi 1225270 reqo 1200525 rspi 955555 rspo 1229997
Details: INVi 383069/S322324/F36145 INVo 344348/S322414/F0 ACKi 312462 ACKo 322360 BYEi 507150/S507141/F0 BYEo 307875/S307111/F0 CANi 22517/S507141/F0 CANo 2223/S1550/F593 MSGi 0/S0/F0 MSGo 0/S0/F0 OPTi 0/S0/F0 OPTo 0/S0/F0 REGi 72/S68/F0 REGo 0/S0/F0 PUBi 0/S0/F0 PUBo 0/S0/F0 SUBi 0/S0/F0 SUBo 0/S0/F0 NOTi 0/S0/F0 NOTo 0/S0/F0
Retransmissions: INVx 116463 BYEx 105757 CANx 1499 MSGx 0 OPTx 0 REGx 0 finx 0 nonx 0 PUBx 0 SUBx 0 NOTx 0
(gdb) p (EnSipStack->myStack->mTransactionController->mClientTransactionMap)
warning: can't find class named `resip::SipStack', as given by C++ RTTI
$1 = {mMap = {_M_ht = {_M_node_allocator = {<No data fields>},
      _M_hash = {<No data fields>},
      _M_equals = {<binary_function<resip::Data,resip::Data,bool>> = {<No data fields>}, <No data fields>},
      _M_get_key = {<unary_function<std::pair<const resip::Data, resip::TransactionState*>,const resip::Data>> = {<No data fields>}, <No data fields>},
      _M_buckets = {<_Vector_base<__gnu_cxx::_Hashtable_node<std::pair<const resip::Data, resip::TransactionState*> >*,std::allocator<resip::TransactionState*> >> = {<_Vector_alloc_base<__gnu_cxx::_Hashtable_node<std::pair<const resip::Data, resip::TransactionState*> >*,std::allocator<resip::TransactionState*>,true>> = {_M_start = 0x920bdd10, _M_finish = 0x920c9d14, _M_end_of_storage = 0x920c9d14}, <No data fields>}, <No data fields>},
      _M_num_elements = 1998}}}
(gdb) p (EnSipStack->myStack->mTransactionController->mServerTransactionMap)
warning: can't find class named `resip::SipStack', as given by C++ RTTI
$2 = {mMap = {_M_ht = {_M_node_allocator = {<No data fields>},
      _M_hash = {<No data fields>},
      _M_equals = {<binary_function<resip::Data,resip::Data,bool>> = {<No data fields>}, <No data fields>},
      _M_get_key = {<unary_function<std::pair<const resip::Data, resip::TransactionState*>,const resip::Data>> = {<No data fields>}, <No data fields>},
      _M_buckets = {<_Vector_base<__gnu_cxx::_Hashtable_node<std::pair<const resip::Data, resip::TransactionState*> >*,std::allocator<resip::TransactionState*> >> = {<_Vector_alloc_base<__gnu_cxx::_Hashtable_node<std::pair<const resip::Data, resip::TransactionState*> >*,std::allocator<resip::TransactionState*>,true>> = {_M_start = 0x8cc3e790, _M_finish = 0x8cc567d4, _M_end_of_storage = 0x8cc567d4}, <No data fields>}, <No data fields>},
      _M_num_elements = 10691}}}
Thanks
Frank Yuan
Emergent-Netsolutions.com
972-359-6600
FrankYuan wrote:
I am still working on it and will let you know as soon as I find
anything related.
Thanks
Frank Yuan
Emergent-Netsolutions.com
972-359-6600
Byron Campen wrote:
This code was written long before my time here at resiprocate, so
I do not know. To those who are in the know, is this a relic that can
be safely done away with?
Did you verify whether or not you had a genuine memory leak (this is
something I am very interested to know)?
Best regards,
Byron Campen
My question is why NoSize (0U-1) is used for mSize when the clear() function is called.
mStateMachineFifo.size() may return either 0 or NoSize if the queue is empty.
It should always return 0 if the queue is empty, and NoSize should not be used.
NoSize causes confusion and is error-prone.
Thanks
Frank Yuan
Emergent-Netsolutions.com
972-359-6600
Jason Fischl wrote:
On 9/20/06, Byron Campen <bcampen@xxxxxxxxxxxx> wrote:
As for your questions about AbstractFifo, I am unsure why mSize is needed. Can anyone answer this (or, answer why clear is a no-op)?
The clear method is virtual and gets defined in the subclasses.
I believe that mSize is there as an optimization.
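If that is right, the optimization presumably looks something like this (a reconstructed sketch, not the actual resip code; the idea would be that NoSize marks the cached count as invalid, so size() only locks and counts when the cache is stale):

#include <deque>
#include <mutex>

class FifoSketch
{
public:
   static const unsigned int NoSize = 0U - 1;   // sentinel: cached count invalid

   unsigned int size() const
   {
      if (mSize == NoSize)                      // e.g. after clear()
      {
         std::lock_guard<std::mutex> lock(mMutex);
         return static_cast<unsigned int>(mFifo.size());
      }
      return mSize;                             // fast path, no lock taken
   }

private:
   mutable std::mutex mMutex;
   std::deque<int> mFifo;
   unsigned int mSize = 0;
};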