
Re: [reSIProcate-users] Controlling TCP connection reuse


> But as soon as the last session in that NetworkAssociation is eliminated, the KeepAlive for that connection is cancelled.

Good point.  In this case the next message sent will start the process of determining that the connection is dead, but the transaction will likely time out before the socket detects that it is down.

I originally thought the SO_KEEPALIVE option would help here, but it appears the default idle time is 2 hours, and on some platforms it can only be changed system wide rather than on a per connection basis:  
http://dev.fyicenter.com/Interview-Questions/Socket-4/Why_does_it_take_so_long_to_detect_that_the_peer.html
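
On Linux, at least, the keepalive timers can also be overridden per socket with the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options, so the system-wide default need not apply to every connection. A minimal sketch, assuming a Linux target and direct access to the socket descriptor (which the stack may not expose to the application); enableFastKeepAlive is a hypothetical helper and not part of reSIProcate:

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Enable TCP keepalive on a single socket and shorten its timers.
// TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options.
//   idleSecs     - idle time before the first probe is sent
//   intervalSecs - gap between unanswered probes
//   probeCount   - unanswered probes before the connection is reported dead
// Returns true only if every option was accepted.
bool enableFastKeepAlive(int fd, int idleSecs, int intervalSecs, int probeCount)
{
   int on = 1;
   return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0
       && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idleSecs, sizeof(idleSecs)) == 0
       && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intervalSecs, sizeof(intervalSecs)) == 0
       && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probeCount, sizeof(probeCount)) == 0;
}
```

With values such as enableFastKeepAlive(fd, 30, 5, 3), an unreachable peer would be noticed after roughly 30 + 3 * 5 = 45 seconds of silence instead of 2 hours.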

The only thing I can think of to help this situation is to:
1.  build a keepalive mechanism into the stack itself, rather than using the NetworkAssociation functionality in DUM; or
2.  use the KeepAliveManager::add method directly from your app, so that the keepalives continue even when no dialog exists.
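
The idea behind option 2 can be sketched as an application-owned schedule whose entries live until the app removes them, rather than being dropped when the last session ends. This is only an illustration: PinnedKeepAlives and its methods are hypothetical names and do not use the reSIProcate API; in a real application the due targets would be handed to whatever keepalive call your version of the stack provides.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Hypothetical app-side keepalive schedule. Entries stay registered until the
// application removes them, whether or not any dialog still uses the connection.
class PinnedKeepAlives
{
public:
   using Clock = std::chrono::steady_clock;

   // Register (or re-register) a target; its first keepalive is due one interval after 'now'.
   void add(const std::string& target, std::chrono::seconds interval, Clock::time_point now)
   {
      mEntries[target] = Entry{interval, now + interval};
   }

   void remove(const std::string& target) { mEntries.erase(target); }

   // Return the targets whose keepalive is due at 'now' and reschedule each of them.
   std::vector<std::string> collectDue(Clock::time_point now)
   {
      std::vector<std::string> due;
      for (auto& kv : mEntries)
      {
         if (kv.second.nextDue <= now)
         {
            due.push_back(kv.first);
            kv.second.nextDue = now + kv.second.interval;
         }
      }
      return due;
   }

private:
   struct Entry
   {
      std::chrono::seconds interval;
      Clock::time_point nextDue;
   };
   std::map<std::string, Entry> mEntries;
};
```

The app would call collectDue from its own timer or process loop and send a keepalive to each returned target.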

Scott


On Tue, May 19, 2009 at 4:04 PM, Paul Kurmas <pkurmas@xxxxxxxxxxxxx> wrote:
I understand the rationale behind the TCP connection reuse & won't make any argument against it.  The DUM KeepAliveManager seems to work fine when there is an active session.  But as soon as the last session in that NetworkAssociation is eliminated, the KeepAlive for that connection is cancelled.  This leaves me exposed to the problem still - an open socket connection that fails isn't detected.  Is the argument that the next write on that socket should expose that failure & cause the socket to be reopened?  That's not what happens for my application... our application stalls until the INVITE that is sent expires.

PK
________________________________________
From: slgodin@xxxxxxxxx [mailto:slgodin@xxxxxxxxx] On Behalf Of Scott Godin
Sent: Sunday, May 17, 2009 12:05 PM
To: Paul Kurmas
Cc: resiprocate-users@xxxxxxxxxxxxxxx
Subject: Re: [reSIProcate-users] Controlling TCP connection reuse

Right now the stack will not automatically close any TCP connections, unless it gets an error sending or receiving, or the OS has run out of TCP socket descriptors (TcpBaseTransport.cxx line 153).   Dead TCP connections are not normally detected until you try to send data on them.  Using the KeepAliveManager will help to clean up dead connections, since it will ensure there is some data sent on each connection periodically.  However, on some OSes it can still take up to 2 mins to discover that the connection is dead after attempting to send data on it.  
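
To illustrate how a dead connection only becomes visible through a failed write, here is a minimal sketch of classifying the result of send() on a stream socket. It assumes Linux (MSG_NOSIGNAL) and is not taken from the reSIProcate sources; SendResult and trySend are hypothetical names:

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

enum class SendResult { Ok, Retry, ConnectionDead };

// Try to write to a connected stream socket and classify the outcome.
// MSG_NOSIGNAL (Linux) suppresses SIGPIPE, so a dead peer shows up as an errno value.
SendResult trySend(int fd, const char* data, std::size_t len)
{
   ssize_t n = send(fd, data, len, MSG_NOSIGNAL);
   if (n >= 0)
   {
      return SendResult::Ok;   // accepted by the local socket buffer (possibly partially)
   }
   if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
   {
      return SendResult::Retry;   // transient condition, the connection itself is fine
   }
   // EPIPE, ECONNRESET, ETIMEDOUT and similar: close the descriptor and reconnect.
   return SendResult::ConnectionDead;
}
```

Note that on a real TCP connection the first write to a vanished peer usually succeeds locally, because the data is only queued; the failure tends to surface on a later send or receive, which is why periodic traffic helps to flush out dead connections.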

Closing the TCP connection after each transaction does not sound like a good way to go: it would be difficult to ensure that each TCP connection is used for only one transaction at a time, TCP connections are reasonably expensive, and the suggestion appears to go against RFC 3261:
o In RFC 2543, closure of a TCP connection was made equivalent to a CANCEL. This was nearly impossible to implement (and wrong) for TCP connections between proxies. This has been eliminated, so that there is no coupling between TCP connection state and SIP processing.

I'm not sure that there is a better/faster way to recover from dead TCP connections.  Does anyone else have any ideas?

Scott
On Fri, May 15, 2009 at 11:16 AM, Paul Kurmas <pkurmas@xxxxxxxxxxxxx> wrote:
I'm chasing an issue with stale connections to a remote endpoint that
was shut down incorrectly.  The TCP socket remains open, and there are no
keep-alives (either TCP or application, via DUM's KeepAliveManager).
When the remote endpoint restarts, the 1st INVITE is sent over that
stale connection, and the remote endpoint stack returns a TCP RST.  On
the local endpoint there is no immediate reaction -- the application
stalls until the INVITE expires.  The next request works fine because a
new connection must be opened.

I have activated DUM's KeepAliveManager and it does seem to clear the
connection after some time.  That's good, but I'd prefer something more
responsive.  It seems to me the best solution is to close the connection
after a much shorter period of time.  This could be an immediate closure
of the TCP connection after a transaction is complete or a pathetically
low value for the aging of the cached connections.

I'd appreciate any feedback you could provide.  By the way, we're
running reSIProcate v1.3.4 at this time.
PK