
Re: [repro-users] DNS SRV failover

This change was committed today.  Thanks for your patience.


On Fri, Dec 2, 2016 at 9:52 AM, Scott Godin <sgodin@xxxxxxxxxxxxxxx> wrote:
Hi Nikolay,

I know it's a year later, but I've finally had a chance to look into this and I understand why we are not trying the next DNS entry.  I will be committing a solution, likely sometime next week.

As for a standard 408 error that occurs after 32 seconds (typically over UDP): it doesn't make sense for the stack to try another DNS entry, since the 32-second transaction time has already expired.  Stack users expect some form of response within 32 seconds of issuing a request.  For requests that fail with a 408 after 32 seconds, the application is responsible for re-issuing the request if desired.

With my upcoming changes, as long as the TCP connection timeout occurs before the 32-second transaction timeout, the stack will try the next DNS entry.
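
To illustrate the rule above, here is a minimal simulation in Python. It is only a sketch of the idea, not the resiprocate implementation: the names SrvTarget and pick_target and the example hostnames are hypothetical, no real sockets are opened, and SRV weights among equal-priority records are ignored. It assumes a positive connect timeout and the 32-second transaction timeout (64 * T1 with the default T1 of 500 ms from RFC 3261).

```python
from dataclasses import dataclass

# 64 * T1, with the RFC 3261 default T1 of 500 ms.
TRANSACTION_TIMEOUT = 32.0  # seconds


@dataclass
class SrvTarget:
    host: str
    priority: int    # lower value is tried first
    reachable: bool  # stand-in for "TCP connect succeeds" in this simulation


def pick_target(targets, connect_timeout, budget=TRANSACTION_TIMEOUT):
    """Return (host, elapsed) for the first reachable target, or
    (None, elapsed) if none is reached before the budget expires.
    A caller would report the None case as a 408."""
    elapsed = 0.0
    for target in sorted(targets, key=lambda t: t.priority):
        if target.reachable:
            return target.host, elapsed
        # A failed attempt costs one connect timeout of simulated time.
        elapsed += connect_timeout
        if elapsed >= budget:
            # Transaction timer fired first: no time left for another entry.
            return None, budget
    return None, elapsed


targets = [
    SrvTarget("sip1.example.com", priority=10, reachable=False),
    SrvTarget("sip2.example.com", priority=20, reachable=True),
    SrvTarget("sip3.example.com", priority=30, reachable=True),
]

print(pick_target(targets, connect_timeout=5.0))   # ('sip2.example.com', 5.0)
print(pick_target(targets, connect_timeout=40.0))  # (None, 32.0)
```

With a 5-second connect timeout the second record is reached well inside the transaction window; with a connect timeout longer than 32 seconds the transaction expires first and no further entry is tried.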

Best Regards,

On Tue, Dec 1, 2015 at 5:57 AM, Nikolay Shopik <shopik@xxxxxxxxxx> wrote:
Hi Scott,

Any chance you were able to look into this? My further testing confirms
that it is not related to the introduction of the tcpconnecttimeout option;
the issue existed before it. Nobody was able to reproduce it simply because
nobody was actually waiting out the 32-second timeouts before then.

tcpconnecttimeout is awesome, but as a global setting it doesn't fit every
situation; a per-transport option would be much better. So I hope
this can eventually be implemented too.


On 09/10/15 17:27, Scott Godin wrote:
> Thanks for reporting this.  I did not specifically test out DNS failover
> when I added the TCP connect timeout.  This will need to be investigated.
> Unfortunately I'm travelling for the next 1.5 weeks and probably won't have
> any time in the short term to take a look.  Please let us know if you are
> able to troubleshoot this further.
> Thanks,
> Scott
> On Fri, Oct 9, 2015 at 10:14 AM, Nikolay Shopik <shopik@xxxxxxxxxx> wrote:
>> This is a continuation of this thread:
>> http://list.resiprocate.org/archive/repro-users/msg00875.html
>> I'm trying out the tcpconnecttimeout option (thanks Scott for adding this).
>> I have 3 SRV TCP records with different priorities, and the
>> highest-priority (lowest value) peer is always down.
>> But my tcpdump shows that after the tcpconnecttimeout timer expires, I get
>> a 408 Request Timeout, without the next peer even being tried.
>> So I set tcpconnecttimeout to 0, waited for 32 seconds, and still got a
>> 408 Request Timeout after the first peer failed.
>> There is one thing though: the first call always fails for me, but if I
>> redial almost immediately it gets through via the second DNS SRV record.
>> This is repro 1.10, and I've tested on 1.9.7 too with the same results, so
>> this doesn't look like a regression.
>> Debug output:
>> https://gist.github.com/nshopik/8ef091d2e329336227e8