< Previous by Date Date Index Next by Date >
  Thread Index Next in Thread >

[reSIProcate-users] TCP Connection Resets in long calls


Hi,

 

  We are going through an acceptance test to check if SIP calls can endure for at least one week in our system.

  Unfortunately, after some hours calls are dropped.

 

  The setup is:

    - A repro Proxy V.1.9.8 running on Windows 2008 Server.

    - UserAgents written around reSIProcate V.1.8.14, running on Windows 7 Pro.

    - repro communicates to a Cisco UCM.

   

  We are using TCP between UAs and repro.

 

  The pattern of call drop is like this:

 

  1. A TCP connection is established by repro with the UA.

  2. A re-INVITE is sent over the new connection and is acknowledged by the UA, (INVITE, 200, ACK)

     (no session-timers were negotiated, just a refreshing re-INVITE by initiative of the remote UA, each 12 hours)

  3. After that, for some time, each 3 minutes UA and repro follow keep-alive sequence:

          repro   <--- Continuation keep-alive with CRLFCRLF <---   UserAgent

          repro   ---- Continuation keep-alive with CRLF     --->   UserAgent

          repro   <--- TCP ACK with 0 payload length         <---   UserAgent

  4. The keepalive sequence above happens 9 times (24 minutes) then UA ‘gives up’ doing keepalives

  5. About 11 hours later a refreshing re-INVITE coming from CUCM is forwarded by repro in that same TCP connection to UA and:

          repro   ---> re-INVITE arrived from CUCM           --->   UserAgent

          repro   <--- TCP RESET with wrong (!=0) ACK number <---   UserAgent

  6. repro stays quiet

  7. 32 seconds after, CUCM sends BYE

  8. repro creates a new TCP connection to UserAgent, sends BYE and that is normally acknowledged.  Call was dropped !

  9. Note that other TCP connections between repro and the same UA remain alive, what allows for registration refreshes.

  

  

  It seems that the UserAgent 'is the culprit'.

  Any insights, suggestions for config and test ?

 

  Could some network box in the middle be cause of such behaviour ?

 

  Below we see patterns of successful and failed keepalives from the reSIProcate log file of the UserAgent.

  Looks as internally reSIProcate keeps trying to do keepalives but fails (we can count 230 failures, one each 3 minutes, almost 12 hours).

 

Thanks and Best regards.

Julio Cabezas.

 

   

Sucessful keepalive: ------------------------------------------------------------------------------------------------

 

18/04/2015 12:52:40 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:52525 TCP target domain=unspecified mFlowKey=4312 ](12)

18/04/2015 12:53:12 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](17)

18/04/2015 12:53:18 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](18)

18/04/2015 12:53:18 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](19)

18/04/2015 12:53:18 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:99 | Refreshing keepalive for id=19: [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ], interval=180s, supportsOutbound=false, refCount=1

18/04/2015 12:53:18 [Debug] RESIP:TRANSACTION | 2532 | TimerQueue.cxx:124 | Adding application timer: KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](19) ms=180000

18/04/2015 12:53:18 [Debug] RESIP:TRANSPORT | 4340 | TransportSelector.cxx:1162 | Transmitting to [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ] tlsDomain= via [ V4 10.225.33.70:5060 TCP target domain=unspecified mFlowKey=0 ]

sigcomp id=

18/04/2015 12:53:18 [Debug] RESIP:TRANSPORT | 4340 | TcpBaseTransport.cxx:263 | Processing write for [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ]

18/04/2015 12:53:18 [Debug] RESIP:TRANSPORT | 4340 | ConnectionManager.cxx:59 | Found fd 4288

18/04/2015 12:53:18 [Debug] RESIP:TRANSPORT | 4340 | ConnectionBase.cxx:115 | In State: NewMessage

18/04/2015 12:53:18 [Debug] RESIP:TRANSPORT | 4340 | Connection.cxx:336 | Received response CRLF (aka pong).

18/04/2015 12:53:18 [Debug] RESIP:TRANSPORT | 4340 | Connection.cxx:297 | Connection::performReads()  read=2

18/04/2015 12:53:18 [Debug] RESIP:TRANSPORT | 4340 | ConnectionBase.cxx:672 | Creating buffer for CONN_BASE: 189DABE0 [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ]

18/04/2015 12:53:18 [Debug] RESIP:DUM | 2532 | DialogUsageManager.cxx:1361 | keepalive pong received from [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ]

18/04/2015 12:53:18 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:153 | Received pong response for keep alive id=19: [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ]

18/04/2015 12:53:41 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ](16)

18/04/2015 12:53:41 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:99 | Refreshing keepalive for id=16: [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ], interval=180s, supportsOutbound=false, refCount=3

18/04/2015 12:53:41 [Debug] RESIP:TRANSACTION | 2532 | TimerQueue.cxx:124 | Adding application timer: KeepAliveTimeout[ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ](16) ms=180000

18/04/2015 12:53:41 [Debug] RESIP:TRANSPORT | 4340 | TransportSelector.cxx:1162 | Transmitting to [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ] tlsDomain= via [ V4 10.225.33.70:5060 TCP target domain=unspecified mFlowKey=0 ]

sigcomp id=

18/04/2015 12:53:41 [Debug] RESIP:TRANSPORT | 4340 | TcpBaseTransport.cxx:263 | Processing write for [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ]

18/04/2015 12:53:41 [Debug] RESIP:TRANSPORT | 4340 | ConnectionManager.cxx:59 | Found fd 4376

18/04/2015 12:53:41 [Debug] RESIP:TRANSPORT | 4340 | ConnectionBase.cxx:115 | In State: NewMessage

18/04/2015 12:53:41 [Debug] RESIP:TRANSPORT | 4340 | Connection.cxx:336 | Received response CRLF (aka pong).

18/04/2015 12:53:41 [Debug] RESIP:TRANSPORT | 4340 | Connection.cxx:297 | Connection::performReads()  read=2

18/04/2015 12:53:41 [Debug] RESIP:TRANSPORT | 4340 | ConnectionBase.cxx:672 | Creating buffer for CONN_BASE: 18FE07C8 [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ]

18/04/2015 12:53:41 [Debug] RESIP:DUM | 2532 | DialogUsageManager.cxx:1361 | keepalive pong received from [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ]

18/04/2015 12:53:41 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:153 | Received pong response for keep alive id=16: [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ]

18/04/2015 12:56:18 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](19)

18/04/2015 12:56:18 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:99 | Refreshing keepalive for id=19: [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ], interval=180s, supportsOutbound=false, refCount=1

xx:70 | Send to TU: TU: DialogUsageManager size=0

 

 

Transition from successful to failed keepalives: -----------------------------------------------------------------------

 

18/04/2015 13:20:18 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](19)

18/04/2015 13:20:18 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:99 | Refreshing keepalive for id=19: [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ], interval=180s, supportsOutbound=false, refCount=1

18/04/2015 13:20:18 [Debug] RESIP:TRANSACTION | 2532 | TimerQueue.cxx:124 | Adding application timer: KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](19) ms=180000

18/04/2015 13:20:18 [Debug] RESIP:TRANSPORT | 4340 | TransportSelector.cxx:1162 | Transmitting to [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ] tlsDomain= via [ V4 10.225.33.70:5060 TCP target domain=unspecified mFlowKey=0 ]

sigcomp id=

18/04/2015 13:20:18 [Debug] RESIP:TRANSPORT | 4340 | TcpBaseTransport.cxx:263 | Processing write for [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ]

18/04/2015 13:20:18 [Debug] RESIP:TRANSPORT | 4340 | ConnectionManager.cxx:59 | Found fd 4288

18/04/2015 13:20:37 [Error] RESIP:TRANSPORT | 4340 | TcpConnection.cxx:61 | Some other error, code = 10054

18/04/2015 13:20:37 [Info] RESIP:TRANSPORT | 4340 | TcpConnection.cxx:65 | Failed read on 4288 Unknown error

18/04/2015 13:20:37 [Info] RESIP:TRANSPORT | 4340 | Transport.cxx:152 | Connection reset

18/04/2015 13:20:37 [Debug] RESIP:TRANSPORT | 4340 | Connection.cxx:302 | Closing connection bytesRead=-1

18/04/2015 13:20:37 [Debug] RESIP:TRANSPORT | 4340 | ConnectionBase.cxx:93 | ConnectionBase::~ConnectionBase 189DABE0

18/04/2015 13:20:37 [Info] RESIP:TRANSACTION | 4340 | TuSelector.cxx:102 | Sending ConnectionTerminated [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ] to TUs

18/04/2015 13:20:37 [Debug] RESIP:DUM | 2532 | DialogUsageManager.cxx:1453 | connection terminated message

18/04/2015 13:20:41 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ](16)

18/04/2015 13:20:41 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:99 | Refreshing keepalive for id=16: [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ], interval=180s, supportsOutbound=false, refCount=3

18/04/2015 13:20:41 [Debug] RESIP:TRANSACTION | 2532 | TimerQueue.cxx:124 | Adding application timer: KeepAliveTimeout[ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ](16) ms=180000

18/04/2015 13:20:41 [Debug] RESIP:TRANSPORT | 4340 | TransportSelector.cxx:1162 | Transmitting to [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ] tlsDomain= via [ V4 10.225.33.70:5060 TCP target domain=unspecified mFlowKey=0 ]

sigcomp id=

18/04/2015 13:20:41 [Debug] RESIP:TRANSPORT | 4340 | TcpBaseTransport.cxx:263 | Processing write for [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ]

18/04/2015 13:20:41 [Debug] RESIP:TRANSPORT | 4340 | ConnectionManager.cxx:59 | Found fd 4376

18/04/2015 13:21:00 [Error] RESIP:TRANSPORT | 4340 | TcpConnection.cxx:61 | Some other error, code = 10054

18/04/2015 13:21:00 [Info] RESIP:TRANSPORT | 4340 | TcpConnection.cxx:65 | Failed read on 4376 Unknown error

18/04/2015 13:21:00 [Info] RESIP:TRANSPORT | 4340 | Transport.cxx:152 | Connection reset

18/04/2015 13:21:00 [Debug] RESIP:TRANSPORT | 4340 | Connection.cxx:302 | Closing connection bytesRead=-1

18/04/2015 13:21:00 [Debug] RESIP:TRANSPORT | 4340 | ConnectionBase.cxx:93 | ConnectionBase::~ConnectionBase 18FE07C8

18/04/2015 13:21:00 [Info] RESIP:TRANSACTION | 4340 | TuSelector.cxx:102 | Sending ConnectionTerminated [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ] to TUs

18/04/2015 13:21:00 [Debug] RESIP:DUM | 2532 | DialogUsageManager.cxx:1453 | connection terminated message

 

Failed keepalive: -----------------------------------------------------------------------------------------------------

 

18/04/2015 13:26:18 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](19)

18/04/2015 13:26:18 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:99 | Refreshing keepalive for id=19: [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ], interval=180s, supportsOutbound=false, refCount=1

18/04/2015 13:26:18 [Debug] RESIP:TRANSACTION | 2532 | TimerQueue.cxx:124 | Adding application timer: KeepAliveTimeout[ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ](19) ms=180000

18/04/2015 13:26:18 [Debug] RESIP:TRANSPORT | 4340 | TransportSelector.cxx:1162 | Transmitting to [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ] tlsDomain= via [ V4 10.225.33.70:5060 TCP target domain=unspecified mFlowKey=0 ]

sigcomp id=

18/04/2015 13:26:18 [Debug] RESIP:TRANSPORT | 4340 | TcpBaseTransport.cxx:263 | Processing write for [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ]

18/04/2015 13:26:18 [Debug] RESIP:TRANSPORT | 4340 | ConnectionManager.cxx:71 | fd 4288 does not exist.

18/04/2015 13:26:18 [Debug] RESIP:TRANSPORT | 4340 | TcpBaseTransport.cxx:290 | Failed to create/get connection: [ V4 10.225.33.6:52549 TCP target domain=unspecified mFlowKey=4288 ]

18/04/2015 13:26:41 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4376 ](16)

18/04/2015 13:26:41 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

KeepAliveTimeout[ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4468 ](20)

18/04/2015 13:26:41 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:99 | Refreshing keepalive for id=20: [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4468 ], interval=180s, supportsOutbound=false, refCount=3

18/04/2015 13:26:41 [Debug] RESIP:TRANSACTION | 2532 | TimerQueue.cxx:124 | Adding application timer: KeepAliveTimeout[ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4468 ](20) ms=180000

18/04/2015 13:26:41 [Debug] RESIP:TRANSPORT | 4340 | TransportSelector.cxx:1162 | Transmitting to [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4468 ] tlsDomain= via [ V4 10.225.33.70:5060 TCP target domain=unspecified mFlowKey=0 ]

sigcomp id=

18/04/2015 13:26:41 [Debug] RESIP:TRANSPORT | 4340 | TcpBaseTransport.cxx:263 | Processing write for [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4468 ]

18/04/2015 13:26:41 [Debug] RESIP:TRANSPORT | 4340 | ConnectionManager.cxx:59 | Found fd 4468

18/04/2015 13:26:41 [Debug] RESIP:TRANSPORT | 4340 | ConnectionBase.cxx:115 | In State: NewMessage

18/04/2015 13:26:41 [Debug] RESIP:TRANSPORT | 4340 | Connection.cxx:336 | Received response CRLF (aka pong).

18/04/2015 13:26:41 [Debug] RESIP:TRANSPORT | 4340 | Connection.cxx:297 | Connection::performReads()  read=2

18/04/2015 13:26:41 [Debug] RESIP:TRANSPORT | 4340 | ConnectionBase.cxx:672 | Creating buffer for CONN_BASE: 0E2945F8 [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4468 ]

18/04/2015 13:26:41 [Debug] RESIP:DUM | 2532 | DialogUsageManager.cxx:1361 | keepalive pong received from [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4468 ]

18/04/2015 13:26:41 [Debug] RESIP:DUM | 2532 | KeepAliveManager.cxx:153 | Received pong response for keep alive id=20: [ V4 10.225.33.6:5060 TCP target domain=10.225.33.6 mFlowKey=4468 ]

18/04/2015 13:29:18 [Debug] RESIP:TRANSACTION | 4340 | TuSelector.cxx:70 | Send to TU: TU: DialogUsageManager size=0

 

-----------------------------------------------------------------------------------------------------------------------