< Previous by Date Date Index Next by Date >
  Thread Index  

Re: [reSIProcate] [repro-users] multiply DNS SRV records


Ah yes - I just tried an address outside my domain and I see the 408 after 32-seconds.  

I'm not sure about the correct way to handle this.  We could certainly change the code to try the next DNS entry after a TimerB expirey.  But not sure what the best practice would be to shorten the failover time.  

Does anyone have any feedback on either of the below 2 items?
1 - triggering a DNS failover after TimerB expirey.
2 - having some form of TCP connection timeout that is lower than Timer B?  Note:  UDP would still be prone to 32 second failover timing though.

Regards,
Scott

On Sat, Mar 21, 2015 at 2:10 PM, Nikolay Shopik <shopik@xxxxxxxxxx> wrote:
Hi Scott,

You've tested in same L2 segment when no machine running. Look for error
code "Connection.cxx:486 | Exception on socket 41 code: 113"

Error code 113 is - No route to host in Linux, and since you are running
in same L2 domain, OS arp (or IPv6 ND) timeout produce such errors.
Which helps quickly detect no machine running by this IP (mostly Linux
feature, Windows net stack doesn't have this).

You can easily check this in Linux just ping any non-existing host in
same L2 domain you will get "Destination Host Unreachable" under 2 seconds.


On 21.03.2015 20:46, Scott Godin wrote:
> Hi Nikolay,
>
> I just tested this on my linux box and the TCP connection error is seen
> pretty much immediately.  Are you sure you don't have some kind of
> intermediate load balancer or reverse proxy in the path that is accepting
> the TCP connection.  Try checking the wireshark trace to confirm.
>
> For reference here are my STACK level log outputs for both attempting an IP
> address for a machine that is running but not listening on the TCP port and
> for an IP address where there is no machine running at all.  Each time a
> Socket exception is received.  Note:  these logs are from an older version
> - however I also tested the latest trunk code with the same results.
>
> Scott
>
>
> *** Machine running - nothing listening on TCP port 5783
>
> .. stuff cut..
 >
> *** No Machine running at 192.168.1.199
>
>
> DEBUG | 20150321-125144.277 | lt-repro | RESIP:TRANSPORT | 3033856832 |
> TransportSelector.cxx:1288 | Transmitting to [ V4 192.168.1.199:5783 TCP
> targetDomain=192.168.1.199 transportKey=3 ] tlsDomain= via [ V4
> 192.168.1.110:5060 TCP targetDomain=192.168.1.199 ]
>
> INVITE sip:9999@192.168.1.199:5783;transport=TCP SIP/2.0
> stuff cut.... 
> DEBUG | 20150321-125144.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> TcpBaseTransport.cxx:308 | Processing write for [ V4 192.168.1.199:5783 TCP
> targetDomain=192.168.1.199 transportKey=3 ]
> DEBUG | 20150321-125144.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> ConnectionManager.cxx:93 | Could not find a connection for [ V4
> 192.168.1.199:5783 TCP targetDomain=192.168.1.199 transportKey=3 ]
> DEBUG | 20150321-125144.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> InternalTransport.cxx:121 | Creating fd=41 V4/TCP
> DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> TcpBaseTransport.cxx:241 | Opening new connection to [ V4 192.168.1.199:5783
> TCP targetDomain=192.168.1.199 transportKey=3 ]
> DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> ConnectionBase.cxx:71 | ConnectionBase::ConnectionBase, who: [ V4
> 192.168.1.199:5783 TCP targetDomain=192.168.1.199 transportKey=3 ] 0x8169ff0
> DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> ConnectionBase.cxx:84 | No compression library available: 0x8169ff0
> INFO | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> Connection.cxx:42 | Connection::Connection: new connection created to who:
> [ V4 192.168.1.199:5783 TCP targetDomain=192.168.1.199 flowKey=41
> transportKey=3 ]
> DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> ConnectionManager.cxx:187 | ConnectionManager::addConnection() 41:[ V4
> 192.168.1.199:5783 TCP targetDomain=192.168.1.199 flowKey=41 transportKey=3
> ], totalConnections=0
> DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> TcpConnection.cxx:18 | Creating TCP connection [ V4 192.168.1.199:5783 TCP
> targetDomain=192.168.1.199 transportKey=3 ] on 41
> STACK | 20150321-125144.329 | lt-repro | RESIP:TRANSACTION | 3033856832 |
.. stuff cut... 
> INFO | 20150321-125147.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> Connection.cxx:486 | Exception on socket 41 code: 113; closing connection

> DEBUG | 20150321-125147.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> ConnectionBase.cxx:115 | ConnectionBase::~ConnectionBase 0x8169ff0
> INFO | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> TuSelector.cxx:102 | Sending ConnectionTerminated [ V4 192.168.1.199:5783
> TCP targetDomain=192.168.1.199 flowKey=41 transportKey=3 ] to TUs
> DEBUG | 20150321-125147.277 | lt-repro | RESIP:DUM | 3008678720 |
> DialogUsageManager.cxx:1451 | connection terminated message
> STACK | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> TransactionState.cxx:733 | Found matching transaction for TransportFailure:
> ca32e8752c0e5101 -> tid=ca32e8752c0e5101 [ ClientInvite/Calling reliable
> target=[ V4 0.0.0.0:0 UNKNOWN_TRANSPORT ]]
> STACK | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> TransactionState.cxx:1159 | TransactionState::processClientInvite:
> TransportFailure: ca32e8752c0e5101 tid=ca32e8752c0e5101 [
> ClientInvite/Calling reliable target=[ V4 0.0.0.0:0 UNKNOWN_TRANSPORT ]]
> DEBUG | 20150321-125147.277 | lt-repro | RESIP:DNS | 3042249536 |
> DnsResult.cxx:187 | Remove vip 192.168.1.199(1)
> INFO | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> TransactionState.cxx:2292 | Try sending request to a different dns result
> STACK | 20150321-125147.277 | lt-repro | RESIP:DNS | 3033856832 |
> DnsResult.cxx:619 | Priming []
> INFO | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> TransactionState.cxx:2314 | No DNS results remain.
> DEBUG | 20150321-125147.277 | lt-repro | RESIP | 3033856832 |
> Helper.cxx:374 | Helper::makeResponse(SipReq:  INVITE
> 9999@192.168.1.199:5783 tid=ca32e8752c0e5101 cseq=2 INVITE contact=
> 1002@192.168.1.117:20750 / 2 from(tu) code=503 reason=
> INFO | 20150321-125147.278 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> TransactionState.cxx:2121 | Ran out of dns entries for 192.168.1.199. Send
> 503
> DEBUG | 20150321-125147.278 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> TuSelector.cxx:70 | Send to TU: TU: Proxy size=0
>
> SIP/2.0 503 Transport failure: no transports left to try
> Via: SIP/2.0/TCP 192.168.1.110:5060
> ;branch=z9hG4bK-524287-1---ca32e8752c0e5101;rport
> Via: SIP/2.0/UDP 192.168.1.117:20750
> ;branch=z9hG4bK-d8754z-e354944e8a5c960f-1---d8754z-;rport=20750
> To: <sip:9999@Domain1>;tag=6c6b864d
> From: <sip:1002@Domain1>;tag=a706ab3b
> Call-ID: NWY2ZGE3OWE3NDA1MjlkMzQ1MDdhNDllMjZiODcwYjI.
> CSeq: 2 INVITE
> Warning: 399 spectrum-ubuntu "No other DNS entries to try (7,113)"
> Content-Length: 0
>
>
>
> On Sat, Mar 21, 2015 at 10:33 AM, <slgodin@xxxxxxxxx> wrote:
>
>> TimerB of 32 seconds is based on  T1 which defaults to 500ms.  TimerB = t1
>> * 64.  T1 is adjustable.   I don't recommend changing these defaults
>> though.  Either way - if resip was properly detecting the tcp connection
>> failure (should be very fast) then it would failover well before TimerB
>> kicked in anyway.  UDP is another story and would take the full 32 seconds.
>>
>> Scott
>>
>>
>>
>> Sent from my iPhone
>>
>>> On Mar 20, 2015, at 6:40 PM, Nikolay Shopik <shopik@xxxxxxxxxx> wrote:
>>>
>>> I've also checked config it seems there is no way to change TimerB
>>> currently.
>>>
>> https://github.com/resiprocate/resiprocate/blob/master/rutil/Timer.cxx#L39
>>>
>>> 32 seconds timeout for invite is too much. And if you have multiply DNS
>>> SRV records and no just 1 of them down it will take forever to
>> switchover.
>>>
>>> It will be nice to have it exposed it via config.
>>>
>>>> On 20.03.2015 23:59, Scott Godin wrote:
>>>> Yes that's a transport error.  If your server is fully down it seems
>> that
>>>> resip may not be properly detecting the TCP connection errors then.
>> Seems
>>>> like this will require a deeper investigation.
>>>>
>>>> Scott
>>>>
>>>>> On Fri, Mar 20, 2015 at 4:48 PM, Nikolay Shopik <shopik@xxxxxxxxxx>
>> wrote:
>>>>>
>>>>> In that case TCP fail to establish (no answer to SYN) isn't that
>>>>> transport error? In my case remote side is down, so TCP never can be
>>>>> established.
>>>>>
>>>>>> On 20.03.2015 23:34, Scott Godin wrote:
>>>>>> It appears there was no actual transport error - it seems the TCP
>>>>>> connection formed fine and that the server just didn't return any SIP
>>>>>> response (we timed out waiting for the response).  If we fail to form
>> the
>>>>>> actual TCP connection to the server it should failover to the other
>> SRV
>>>>>> record correctly.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>> On Fri, Mar 20, 2015 at 4:01 PM, Nikolay Shopik <shopik@xxxxxxxxxx>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is debug level log
>>>>>>>
>>>>>>> https://gist.github.com/nshopik/daadbc9669484a37c10a
>>>>>>>
>>>>>>>> On 20.03.2015 22:38, Scott Godin wrote:
>>>>>>>> I believe that should work fine.  A STACK level (DEBUG level MAY be
>> OK
>>>>> as
>>>>>>>> well) should help shed light on what is going on.
>>>>>>>>
>>>>>>>> Scott
>>>>>>>>
>>>>>>>>> On Fri, Mar 20, 2015 at 3:21 PM, Nikolay Shopik <shopik@xxxxxxxxxx
>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>> On 20.03.2015 22:13, Daniel Pocock wrote:
>>>>>>>>>> Which repro version are you using, is it one of the packages?
>>>>>>>>>>
>>>>>>>>>> Can you see the DNS lookups in the logs?
>>>>>>>>>>
>>>>>>>>>> If you run tcpdump or wireshark, can you see all the DNS lookups?
>>>>>>>>>
>>>>>>>>> This is debian package from wheezy-backports (1.9.7-4~bpo70+2)
>>>>>>>>>
>>>>>>>>> I didn't check logs but DNS cache in repro says it got all DNS
>> records
>>>>>>>>> correctly.
>>>>>>>>> _______________________________________________
>>>>>>>>> repro-users mailing list
>>>>>>>>> repro-users@xxxxxxxxxxxxxxx
>>>>>>>>> https://list.resiprocate.org/mailman/listinfo/repro-users
>>>>
>>
>