
Re: [reSIProcate] [reSIProcate-users] Transmission to broken TCP connection


I had some off-list discussions with Paul Kurmas, who attempted to make such a change in the stack, but in the end it turned out to be a very difficult problem to solve.  From what I remember, in his particular case the socket write operation reported no error; the socket error came some time later and manifested itself as a read error.  However, once the write succeeds, the transport discards its copy of the message it was asked to send, so by the time the read error is seen there is no way to associate it with any particular message that was written.  Our conclusion was that resip is really doing all that it can here, at least without a major overhaul.

Note:  There is a TCP connection-terminated callback that you can register with the stack to be notified when TCP connections are disconnected.  You may be able to find a way to use this to be more reactive at the application layer.
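
To make the idea concrete, here is a rough sketch of the application-side bookkeeping that such a notification could drive. None of these names are resip APIs: PendingRequestTracker, its methods, and the "host:port" peer key are all made up for illustration, and the sketch assumes the application can tell which peer a request was sent to.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical application-layer helper, not part of resip.
// It remembers which requests are still unanswered per peer, so that a
// connection-terminated notification can fail (or retry) them at once
// instead of waiting for the transaction timeout.
class PendingRequestTracker
{
public:
   // Call when a request is handed to the stack. peerKey is an assumed
   // "host:port" string identifying the TCP peer.
   void onRequestSent(const std::string& peerKey, const std::string& transactionId)
   {
      mPending[peerKey].insert(transactionId);
   }

   // Call on any response (100 or later): the request reached the peer.
   void onResponse(const std::string& peerKey, const std::string& transactionId)
   {
      auto it = mPending.find(peerKey);
      if (it == mPending.end()) return;
      it->second.erase(transactionId);
      if (it->second.empty()) mPending.erase(it);
   }

   // Call from the connection-terminated notification. Returns the
   // transactions that were still unanswered on that connection.
   std::vector<std::string> onConnectionTerminated(const std::string& peerKey)
   {
      std::vector<std::string> affected;
      auto it = mPending.find(peerKey);
      if (it != mPending.end())
      {
         affected.assign(it->second.begin(), it->second.end());
         mPending.erase(it);
      }
      return affected;
   }

private:
   std::map<std::string, std::set<std::string>> mPending;
};
```

The application would then decide, for each returned transaction, whether to resend the request or report the failure upward.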

One quick (potentially off-the-wall) thought I had:  what if we enabled some form of retransmit timer for ClientInviteTransactions on reliable (TCP/TLS) transports?  I know this is out of spec, but in resip's case, if the first send failed due to a bad socket connection and this condition was detected before the retransmit interval elapsed, then the retransmit would cause the transport to form a new connection on the second attempt, and we would recover.  The retransmit interval could be relatively high (e.g. 2000 ms), and the timer would be disabled on reception of a 100 or any other response.
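
A minimal sketch of what such a timer might look like, purely to illustrate the idea. This is not how resip's transaction layer is written: ReliableInviteRetransmit, poll(), and the cap of one retransmit are assumptions of mine, and time is passed in as plain milliseconds to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical retransmit timer for a client INVITE over TCP/TLS.
// Out of spec by design: RFC 3261 does not retransmit on reliable transports.
class ReliableInviteRetransmit
{
public:
   // resend: callback that hands the request to the transport again.
   // intervalMs: deliberately long interval (2000 ms, as suggested above).
   // maxRetransmits: assumed cap so a dead peer is not retried forever.
   explicit ReliableInviteRetransmit(std::function<void()> resend,
                                     std::uint64_t intervalMs = 2000,
                                     int maxRetransmits = 1)
      : mResend(std::move(resend)), mIntervalMs(intervalMs), mMax(maxRetransmits)
   {
   }

   // Arm the timer when the INVITE is first sent.
   void start(std::uint64_t nowMs)
   {
      mDeadlineMs = nowMs + mIntervalMs;
      mSent = 0;
      mArmed = mMax > 0;
   }

   // A 100 or any other response disables the timer.
   void onResponse() { mArmed = false; }

   // Call periodically; returns true if a retransmit was issued.
   bool poll(std::uint64_t nowMs)
   {
      if (!mArmed || nowMs < mDeadlineMs) return false;
      mResend();
      if (++mSent >= mMax) mArmed = false;
      else mDeadlineMs = nowMs + mIntervalMs;
      return true;
   }

private:
   std::function<void()> mResend;
   std::uint64_t mIntervalMs;
   std::uint64_t mDeadlineMs = 0;
   int mMax;
   int mSent = 0;
   bool mArmed = false;
};
```

The recovery relies on the transport having noticed the dead connection before the timer fires, so that the resend opens a fresh connection rather than writing to the broken one.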

Scott

On Thu, Oct 15, 2009 at 11:23 AM, Adam Roach <adam@xxxxxxxxxxx> wrote:
On 10/15/09 03:27, Mats Behre wrote:

So, it seems that when the message is sent the read side detects that the connection is down (with a code that
is unrecognised by TCP, but decoded as WSAECONNRESET in Transport::error), but it doesn't appear as if our
application is notified.
This may be kind of a grey area; RFC 3261 (18.4) specifies that if the result of sending a request
is a connection failure, the transport user SHOULD be informed, and 17.1.4 says the TU SHOULD
be notified. I believe our application takes the role of the TU, and is responsible for further actions.
Is this situation covered by "the result is a connection failure"? I think it can be argued that it is.

The problem from our point of view is that it takes a long time (transaction timeout) before the application
finds out about the failure. Is there anything we can do to avoid this?

[I'm copying the devel list on this, as it has recently come up there as well]

This is a known shortcoming of the current TCP transport design. Ideally, a socket failure on write would cause the TCP transport to re-initiate a connection and re-attempt sending the message that failed (with care not to get into a loop of try/fail/try/fail, ad infinitum). If the second attempt fails, the transport should then inform the TU. I haven't done any analysis to see how much work this would take, but it sounds like a fairly easy fix -- unfortunately, I don't have any cycles to work on it myself.

If you'd like to dig into things and propose a patch, I would start with resip/stack/TcpTransport.{cxx,hxx} and resip/stack/TcpBaseTransport.{cxx,hxx}.
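
As a starting point, the reconnect-and-retry-once behaviour described above might be sketched like this. It is not taken from the resip sources: sendWithOneReconnect, SendResult, and the two callbacks are hypothetical stand-ins for whatever the transport classes actually provide, and it only covers the case where the write itself reports the failure.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Outcome of a send attempt, as seen by the (hypothetical) caller.
enum class SendResult
{
   Sent,           // the message was written, possibly on a new connection
   FailedNotifyTU  // both attempts failed; the TU should be informed
};

// write: returns false when the socket write fails.
// reconnect: tears down the old connection and opens a new one,
//            returning false if the new connection cannot be established.
// Exactly one reconnect is attempted, which rules out a try/fail/try/fail loop.
SendResult sendWithOneReconnect(const std::string& msg,
                                const std::function<bool(const std::string&)>& write,
                                const std::function<bool()>& reconnect)
{
   if (write(msg)) return SendResult::Sent;
   if (!reconnect()) return SendResult::FailedNotifyTU;
   return write(msg) ? SendResult::Sent : SendResult::FailedNotifyTU;
}
```

Note that the transport has to keep its copy of the message until the write result is known, otherwise there is nothing left to resend on the second attempt.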

/a

_______________________________________________
resiprocate-devel mailing list
resiprocate-devel@xxxxxxxxxxxxxxx
https://list.resiprocate.org/mailman/listinfo/resiprocate-devel