[reSIProcate] TLS error handling glitch / TLS connections dropped



I've observed some problems with the TLS code.  I noticed them when
using SIP over WebSockets (wss://), but this may be a lower-level
problem affecting all TLS transports.

Basically, the client sends some data and then the server disconnects.
The server log contains something like this:

ssl/TlsConnection.cxx:418 | Got TLS read ret=-1 error=1
error:00000001:lib(0):func(0):reason(1)

This occurs after a call to SSL_read().

error=1 corresponds to the generic SSL_ERROR_SSL.
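
To show why the second log line carries so little information, here is a
minimal sketch that unpacks an error code the way the OpenSSL 1.0.x/1.1.x
macros ERR_GET_LIB, ERR_GET_FUNC and ERR_GET_REASON do, assuming that
release line's 8/12/12-bit layout.  The names PackedError, unpackError
and formatUnknownError are made up for illustration and are not part of
OpenSSL or reSIProcate.  One possible reading, not confirmed against the
code, is that the value being formatted is the SSL_get_error() result
(1) rather than an entry from the error queue.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Fields of a packed error code.  Assumed layout (OpenSSL 1.0.x/1.1.x):
// 8 bits library, 12 bits function, 12 bits reason.
struct PackedError
{
   unsigned long lib;
   unsigned long func;
   unsigned long reason;
};

// Hypothetical helper mirroring ERR_GET_LIB / ERR_GET_FUNC / ERR_GET_REASON.
PackedError unpackError(unsigned long code)
{
   PackedError e;
   e.lib = (code >> 24) & 0xffUL;
   e.func = (code >> 12) & 0xfffUL;
   e.reason = code & 0xfffUL;
   return e;
}

// Formats a code in the numeric style seen in the log line above, which is
// the shape used when no human-readable strings are available.
std::string formatUnknownError(unsigned long code)
{
   const PackedError e = unpackError(code);
   char buf[128];
   const int n = std::snprintf(buf, sizeof(buf),
                               "error:%08lX:lib(%lu):func(%lu):reason(%lu)",
                               code, e.lib, e.func, e.reason);
   assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
   return std::string(buf);
}
```

Calling formatUnknownError(1) reproduces the logged string
error:00000001:lib(0):func(0):reason(1), so the line says nothing beyond
"the value was 1".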

I notice the TlsConnection::read() method has only ever been making one
call to SSL_get_error() and so:

a) it may leave other errors in the error queue.  A subsequent call to
SSL_get_error that would have returned SSL_ERROR_WANT_READ (possibly for
some other connection) could actually return the unhandled error.

b) because it does not check the error queue, it does not log the real
cause at the moment the error occurs.

Issue (a) is somewhat mitigated by the fact that various methods invoke
checkState() which invokes ERR_clear_error() before the actual OpenSSL
operation.
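
Issue (a) and its mitigation can be illustrated with a toy model.  This
is a deliberately simplified sketch, not OpenSSL's real logic: the
ErrorQueueModel type, the ReadOutcome enum and classifyFailedRead() are
hypothetical names, and the only behaviour modelled is that a leftover
queue entry makes a failed read look like a fatal SSL error, even when
the transport merely had no data yet.

```cpp
#include <cassert>   // handy for sanity checks when experimenting
#include <deque>

// Simplified stand-in for OpenSSL's per-thread error queue.
struct ErrorQueueModel
{
   std::deque<unsigned long> entries;

   void push(unsigned long code) { entries.push_back(code); }
   void clear() { entries.clear(); }   // plays the role of ERR_clear_error()
   bool empty() const { return entries.empty(); }
};

enum ReadOutcome { WantRead, SslError };

// Model assumption: any queued entry forces the fatal classification.
// With an empty queue, a transport that would block maps to WantRead.
ReadOutcome classifyFailedRead(const ErrorQueueModel& queue,
                               bool transportWouldBlock)
{
   if (!queue.empty())
   {
      return SslError;
   }
   return transportWouldBlock ? WantRead : SslError;
}
```

In this model, pushing one stale entry and then calling
classifyFailedRead(queue, true) yields SslError; calling queue.clear()
first yields WantRead, which is the effect checkState() has by clearing
the queue before each operation.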

However, the WebSocket connection still sometimes vanishes without
giving any clue as to what is wrong.  Wireshark shows that a TLS
"Encrypted Alert" packet was sent from server to client, but it cannot
show the exact error details.  Chrome just returns error code 1006
and doesn't log anything about the error.

I've committed a fix that improves the error handling in
TlsConnection::read(): it now goes through all the error queue entries
and logs the details, just as is already done for SSL_write.
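
The general shape of "go through all the error queue entries" can be
sketched as below.  This is not the committed code: drainErrorQueue and
its two callable parameters are hypothetical, and the OpenSSL calls are
kept out of the block so that it compiles on its own.

```cpp
#include <cassert>   // handy for sanity checks when experimenting
#include <string>
#include <vector>

// Pops entries until the source reports 0 (queue empty) and returns one
// descriptive line per entry, oldest first.
//   nextError: callable returning the next queued code, or 0 when empty
//   describe:  callable turning a code into a loggable string
template <typename NextError, typename Describe>
std::vector<std::string> drainErrorQueue(NextError nextError,
                                         Describe describe)
{
   std::vector<std::string> lines;
   for (unsigned long code = nextError(); code != 0; code = nextError())
   {
      lines.push_back(describe(code));
   }
   return lines;
}
```

With OpenSSL, nextError would wrap ERR_get_error(), which removes and
returns the earliest entry of the calling thread's queue and returns 0
once it is empty, and describe would wrap ERR_error_string_n().  Draining
the queue right after the failed SSL_read both logs the real cause and
leaves nothing behind for a later SSL_get_error() call to trip over.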

Has anybody else noticed anything like this problem with TLS connections
dropping?

If anybody notices new errors appearing after I make this more verbose,
feel free to discuss them on the list.