[reSIProcate] [repro-users] multiply DNS SRV records

Scott Godin sgodin at sipspectrum.com
Sat Mar 21 13:33:15 CDT 2015


Ah yes - I just tried an address outside my domain and I see the 408 after
32-seconds.

I'm not sure about the correct way to handle this.  We could certainly
change the code to try the next DNS entry after a TimerB expirey.  But not
sure what the best practice would be to shorten the failover time.

Does anyone have any feedback on either of the below 2 items?
1 - triggering a DNS failover after TimerB expirey.
2 - having some form of TCP connection timeout that is lower than Timer B?
Note:  UDP would still be prone to 32 second failover timing though.

Regards,
Scott

On Sat, Mar 21, 2015 at 2:10 PM, Nikolay Shopik <shopik at inblock.ru> wrote:

> Hi Scott,
>
> You've tested in same L2 segment when no machine running. Look for error
> code "Connection.cxx:486 | Exception on socket 41 code: 113"
>
> Error code 113 is - No route to host in Linux, and since you are running
> in same L2 domain, OS arp (or IPv6 ND) timeout produce such errors.
> Which helps quickly detect no machine running by this IP (mostly Linux
> feature, Windows net stack doesn't have this).
>
> You can easily check this in Linux just ping any non-existing host in
> same L2 domain you will get "Destination Host Unreachable" under 2 seconds.
>
>
> On 21.03.2015 20:46, Scott Godin wrote:
> > Hi Nikolay,
> >
> > I just tested this on my linux box and the TCP connection error is seen
> > pretty much immediately.  Are you sure you don't have some kind of
> > intermediate load balancer or reverse proxy in the path that is accepting
> > the TCP connection.  Try checking the wireshark trace to confirm.
> >
> > For reference here are my STACK level log outputs for both attempting an
> IP
> > address for a machine that is running but not listening on the TCP port
> and
> > for an IP address where there is no machine running at all.  Each time a
> > Socket exception is received.  Note:  these logs are from an older
> version
> > - however I also tested the latest trunk code with the same results.
> >
> > Scott
> >
> >
> > *** Machine running - nothing listening on TCP port 5783
> >
> > .. stuff cut..
>
 >

> > *** No Machine running at 192.168.1.199
> >
> >
> > DEBUG | 20150321-125144.277 | lt-repro | RESIP:TRANSPORT | 3033856832 |
> > TransportSelector.cxx:1288 | Transmitting to [ V4 192.168.1.199:5783 TCP
> > targetDomain=192.168.1.199 transportKey=3 ] tlsDomain= via [ V4
> > 192.168.1.110:5060 TCP targetDomain=192.168.1.199 ]
> >
> > INVITE sip:9999 at 192.168.1.199:5783;transport=TCP SIP/2.0
>
> stuff cut....

> > DEBUG | 20150321-125144.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > TcpBaseTransport.cxx:308 | Processing write for [ V4 192.168.1.199:5783
> TCP
> > targetDomain=192.168.1.199 transportKey=3 ]
> > DEBUG | 20150321-125144.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > ConnectionManager.cxx:93 | Could not find a connection for [ V4
> > 192.168.1.199:5783 TCP targetDomain=192.168.1.199 transportKey=3 ]
> > DEBUG | 20150321-125144.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > InternalTransport.cxx:121 | Creating fd=41 V4/TCP
> > DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > TcpBaseTransport.cxx:241 | Opening new connection to [ V4
> 192.168.1.199:5783
> > TCP targetDomain=192.168.1.199 transportKey=3 ]
> > DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > ConnectionBase.cxx:71 | ConnectionBase::ConnectionBase, who: [ V4
> > 192.168.1.199:5783 TCP targetDomain=192.168.1.199 transportKey=3 ]
> 0x8169ff0
> > DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > ConnectionBase.cxx:84 | No compression library available: 0x8169ff0
> > INFO | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > Connection.cxx:42 | Connection::Connection: new connection created to
> who:
> > [ V4 192.168.1.199:5783 TCP targetDomain=192.168.1.199 flowKey=41
> > transportKey=3 ]
> > DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > ConnectionManager.cxx:187 | ConnectionManager::addConnection() 41:[ V4
> > 192.168.1.199:5783 TCP targetDomain=192.168.1.199 flowKey=41
> transportKey=3
> > ], totalConnections=0
> > DEBUG | 20150321-125144.278 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > TcpConnection.cxx:18 | Creating TCP connection [ V4 192.168.1.199:5783
> TCP
> > targetDomain=192.168.1.199 transportKey=3 ] on 41
> > STACK | 20150321-125144.329 | lt-repro | RESIP:TRANSACTION | 3033856832 |
>
.. stuff cut...

> > INFO | 20150321-125147.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > Connection.cxx:486 | Exception on socket 41 code: 113; closing connection
> > DEBUG | 20150321-125147.277 | lt-repro | RESIP:TRANSPORT | 3025464128 |
> > ConnectionBase.cxx:115 | ConnectionBase::~ConnectionBase 0x8169ff0
> > INFO | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> > TuSelector.cxx:102 | Sending ConnectionTerminated [ V4
> 192.168.1.199:5783
> > TCP targetDomain=192.168.1.199 flowKey=41 transportKey=3 ] to TUs
> > DEBUG | 20150321-125147.277 | lt-repro | RESIP:DUM | 3008678720 |
> > DialogUsageManager.cxx:1451 | connection terminated message
> > STACK | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> > TransactionState.cxx:733 | Found matching transaction for
> TransportFailure:
> > ca32e8752c0e5101 -> tid=ca32e8752c0e5101 [ ClientInvite/Calling reliable
> > target=[ V4 0.0.0.0:0 UNKNOWN_TRANSPORT ]]
> > STACK | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> > TransactionState.cxx:1159 | TransactionState::processClientInvite:
> > TransportFailure: ca32e8752c0e5101 tid=ca32e8752c0e5101 [
> > ClientInvite/Calling reliable target=[ V4 0.0.0.0:0 UNKNOWN_TRANSPORT ]]
> > DEBUG | 20150321-125147.277 | lt-repro | RESIP:DNS | 3042249536 |
> > DnsResult.cxx:187 | Remove vip 192.168.1.199(1)
> > INFO | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> > TransactionState.cxx:2292 | Try sending request to a different dns result
> > STACK | 20150321-125147.277 | lt-repro | RESIP:DNS | 3033856832 |
> > DnsResult.cxx:619 | Priming []
> > INFO | 20150321-125147.277 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> > TransactionState.cxx:2314 | No DNS results remain.
> > DEBUG | 20150321-125147.277 | lt-repro | RESIP | 3033856832 |
> > Helper.cxx:374 | Helper::makeResponse(SipReq:  INVITE
> > 9999 at 192.168.1.199:5783 tid=ca32e8752c0e5101 cseq=2 INVITE contact=
> > 1002 at 192.168.1.117:20750 / 2 from(tu) code=503 reason=
> > INFO | 20150321-125147.278 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> > TransactionState.cxx:2121 | Ran out of dns entries for 192.168.1.199.
> Send
> > 503
> > DEBUG | 20150321-125147.278 | lt-repro | RESIP:TRANSACTION | 3033856832 |
> > TuSelector.cxx:70 | Send to TU: TU: Proxy size=0
> >
> > SIP/2.0 503 Transport failure: no transports left to try
> > Via: SIP/2.0/TCP 192.168.1.110:5060
> > ;branch=z9hG4bK-524287-1---ca32e8752c0e5101;rport
> > Via: SIP/2.0/UDP 192.168.1.117:20750
> > ;branch=z9hG4bK-d8754z-e354944e8a5c960f-1---d8754z-;rport=20750
> > To: <sip:9999 at Domain1>;tag=6c6b864d
> > From: <sip:1002 at Domain1>;tag=a706ab3b
> > Call-ID: NWY2ZGE3OWE3NDA1MjlkMzQ1MDdhNDllMjZiODcwYjI.
> > CSeq: 2 INVITE
> > Warning: 399 spectrum-ubuntu "No other DNS entries to try (7,113)"
> > Content-Length: 0
> >
> >
> >
> > On Sat, Mar 21, 2015 at 10:33 AM, <slgodin at gmail.com> wrote:
> >
> >> TimerB of 32 seconds is based on  T1 which defaults to 500ms.  TimerB =
> t1
> >> * 64.  T1 is adjustable.   I don't recommend changing these defaults
> >> though.  Either way - if resip was properly detecting the tcp connection
> >> failure (should be very fast) then it would failover well before TimerB
> >> kicked in anyway.  UDP is another story and would take the full 32
> seconds.
> >>
> >> Scott
> >>
> >>
> >>
> >> Sent from my iPhone
> >>
> >>> On Mar 20, 2015, at 6:40 PM, Nikolay Shopik <shopik at inblock.ru> wrote:
> >>>
> >>> I've also checked config it seems there is no way to change TimerB
> >>> currently.
> >>>
> >>
> https://github.com/resiprocate/resiprocate/blob/master/rutil/Timer.cxx#L39
> >>>
> >>> 32 seconds timeout for invite is too much. And if you have multiply DNS
> >>> SRV records and no just 1 of them down it will take forever to
> >> switchover.
> >>>
> >>> It will be nice to have it exposed it via config.
> >>>
> >>>> On 20.03.2015 23:59, Scott Godin wrote:
> >>>> Yes that's a transport error.  If your server is fully down it seems
> >> that
> >>>> resip may not be properly detecting the TCP connection errors then.
> >> Seems
> >>>> like this will require a deeper investigation.
> >>>>
> >>>> Scott
> >>>>
> >>>>> On Fri, Mar 20, 2015 at 4:48 PM, Nikolay Shopik <shopik at inblock.ru>
> >> wrote:
> >>>>>
> >>>>> In that case TCP fail to establish (no answer to SYN) isn't that
> >>>>> transport error? In my case remote side is down, so TCP never can be
> >>>>> established.
> >>>>>
> >>>>>> On 20.03.2015 23:34, Scott Godin wrote:
> >>>>>> It appears there was no actual transport error - it seems the TCP
> >>>>>> connection formed fine and that the server just didn't return any
> SIP
> >>>>>> response (we timed out waiting for the response).  If we fail to
> form
> >> the
> >>>>>> actual TCP connection to the server it should failover to the other
> >> SRV
> >>>>>> record correctly.
> >>>>>>
> >>>>>> Scott
> >>>>>>
> >>>>>>> On Fri, Mar 20, 2015 at 4:01 PM, Nikolay Shopik <shopik at inblock.ru
> >
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Here is debug level log
> >>>>>>>
> >>>>>>> https://gist.github.com/nshopik/daadbc9669484a37c10a
> >>>>>>>
> >>>>>>>> On 20.03.2015 22:38, Scott Godin wrote:
> >>>>>>>> I believe that should work fine.  A STACK level (DEBUG level MAY
> be
> >> OK
> >>>>> as
> >>>>>>>> well) should help shed light on what is going on.
> >>>>>>>>
> >>>>>>>> Scott
> >>>>>>>>
> >>>>>>>>> On Fri, Mar 20, 2015 at 3:21 PM, Nikolay Shopik <
> shopik at inblock.ru
> >>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>> On 20.03.2015 22:13, Daniel Pocock wrote:
> >>>>>>>>>> Which repro version are you using, is it one of the packages?
> >>>>>>>>>>
> >>>>>>>>>> Can you see the DNS lookups in the logs?
> >>>>>>>>>>
> >>>>>>>>>> If you run tcpdump or wireshark, can you see all the DNS
> lookups?
> >>>>>>>>>
> >>>>>>>>> This is debian package from wheezy-backports (1.9.7-4~bpo70+2)
> >>>>>>>>>
> >>>>>>>>> I didn't check logs but DNS cache in repro says it got all DNS
> >> records
> >>>>>>>>> correctly.
> >>>>>>>>> _______________________________________________
> >>>>>>>>> repro-users mailing list
> >>>>>>>>> repro-users at resiprocate.org
> >>>>>>>>> https://list.resiprocate.org/mailman/listinfo/repro-users
> >>>>
> >>
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://list.resiprocate.org/pipermail/resiprocate-devel/attachments/20150321/ba32d9e1/attachment.htm>


More information about the resiprocate-devel mailing list