Re: [reSIProcate] TLS error handling glitch / TLS connections dropped
On 30/05/14 17:31, Daniel Pocock wrote:
>
>
> I've observed some problems with the TLS code. Specifically, I've
> noticed this when using SIP over WebSockets (wss://) but it appears
> possible that it is a lower level problem impacting all TLS transports.
>
> Basically, the client sends some data and then the server disconnects.
> The server log contains something like this:
>
> ssl/TlsConnection.cxx:418 | Got TLS read ret=-1 error=1
> error:00000001:lib(0):func(0):reason(1)
>
> This occurs after a call to SSL_read
>
> error=1 corresponds to the generic SSL_ERROR_SSL
>
> I notice the TlsConnection::read() method has only ever been making one
> call to SSL_get_error() and so:
>
> a) it may leave other errors in the error queue. A subsequent call to
> SSL_get_error that would have returned SSL_ERROR_WANT_READ (possibly for
> some other connection) could actually return the unhandled error.
>
> b) because it is not checking the error queue, it is not logging what is
> really wrong at the moment the error really happened
>
> Issue (a) is somewhat mitigated by the fact that various methods invoke
> checkState() which invokes ERR_clear_error() before the actual OpenSSL
> operation.
>
> However, I'm still left with the fact the WebSocket connection sometimes
> vanishes without giving clues what is wrong. Wireshark shows that a TLS
> "Encrypted Alert" packet was sent from server to client but it is unable
> to show the exact error details. Chrome just returns error code 1006
> and doesn't log anything about the error.
>
> I've committed a fix making the error handling better in
> TlsConnection::read(), to go through all the error queue entries and log
> the details just as it has been done for SSL_write.
After committing this change, I started seeing the errors returned by
SSL_read
There are definitely multiple errors in the error queue:
ssl/TlsConnection.cxx:44 | error:140E0114:SSL
routines:SSL_shutdown:uninitialized
ssl/TlsConnection.cxx:44 | error:140E0114:SSL
routines:SSL_shutdown:uninitialized
ssl/TlsConnection.cxx:48 | Got TLS SSL_read error=1 ret=-1
and as ERR_clear_error had been called in checkState() just prior to
SSL_read, it would initially appear that both of these came from
SSL_read and were not leftover from some previous SSL library call.
Looking more closely, it is also possible that any of these came from
the call to SSL_pending in TlsConnection::read() or the call to
SSL_get_shutdown() in isGood() - I added extra logging around those
other calls to try and rule that out.
I also discovered somebody else saw this error string in the past:
http://list.resiprocate.org/archive/resiprocate-users/msg01896.html
and it is not immediately clear why he was getting the error text in the
original log message whereas I was seeing the more generic error string
in my original email.
Furthermore, the WebSocket connection is not dropping any more after
this change.
The message itself refers to SSL_shutdown (rather than SSL_read) - so
maybe these entries in the error queue relate to some socket other than
that being accessed with SSL_read