< Previous by Date Date Index Next by Date >
< Previous in Thread Thread Index  

Re: [reSIProcate] TLS error handling glitch / TLS connections dropped



On 30/05/14 17:31, Daniel Pocock wrote:
> 
> 
> I've observed some problems with the TLS code.  Specifically, I've
> noticed this when using SIP over WebSockets (wss://) but it appears
> possible that it is a lower level problem impacting all TLS transports.
> 
> Basically, the client sends some data and then the server disconnects.
> The server log contains something like this:
> 
> ssl/TlsConnection.cxx:418 | Got TLS read ret=-1 error=1
> error:00000001:lib(0):func(0):reason(1)
> 
> This occurs after a call to SSL_read
> 
> error=1 corresponds to the generic SSL_ERROR_SSL
> 
> I notice the TlsConnection::read() method has only ever been making one
> call to SSL_get_error() and so:
> 
> a) it may leave other errors in the error queue.  A subsequent call to
> SSL_get_error that would have returned SSL_ERROR_WANT_READ (possibly for
> some other connection) could actually return the unhandled error.
> 
> b) because it is not checking the error queue, it is not logging what is
> really wrong at the moment the error really happened
> 
> Issue (a) is somewhat mitigated by the fact that various methods invoke
> checkState() which invokes ERR_clear_error() before the actual OpenSSL
> operation.
> 
> However, I'm still left with the fact the WebSocket connection sometimes
> vanishes without giving clues what is wrong.  Wireshark shows that a TLS
> "Encrypted Alert" packet was sent from server to client but it is unable
> to show the exact error details.  Chrome just returns error code 1006
> and doesn't log anything about the error.
> 
> I've committed a fix making the error handling better in
> TlsConnection::read(), to go through all the error queue entries and log
> the details just as it has been done for SSL_write.

After committing this change, I started seeing the errors returned by
SSL_read

There are definitely multiple errors in the error queue:

ssl/TlsConnection.cxx:44 | error:140E0114:SSL
routines:SSL_shutdown:uninitialized
ssl/TlsConnection.cxx:44 | error:140E0114:SSL
routines:SSL_shutdown:uninitialized
ssl/TlsConnection.cxx:48 | Got TLS SSL_read error=1 ret=-1

and as ERR_clear_error had been called in checkState() just prior to
SSL_read, it would initially appear that both of these came from
SSL_read and were not leftover from some previous SSL library call.

Looking more closely, it is also possible that any of these came from
the call to SSL_pending in TlsConnection::read() or the call to
SSL_get_shutdown() in isGood() - I added extra logging around those
other calls to try and rule that out.

I also discovered somebody else saw this error string in the past:

http://list.resiprocate.org/archive/resiprocate-users/msg01896.html

and it is not immediately clear why he was getting the error text in the
original log message whereas I was seeing the more generic error string
in my original email.

Furthermore, the WebSocket connection is not dropping any more after
this change.

The message itself refers to SSL_shutdown (rather than SSL_read) - so
maybe these entries in the error queue relate to some socket other than
that being accessed with SSL_read