[reSIProcate] Transport TxFifo draining: timing

Fri Nov 19 17:56:46 CST 2010

Hi,

One of the key issues I'm facing in extending resip/repro to handle many
connections is the way the stack's Transports drain their TxFifo queues. The
flow today is:

   - When data is ready to send from the transaction layer, the message is
   added to the Transport's TxFifo queue.
   - Prior to calling select(), the hasDataToSend() query is run on all
   internal Transports. This is O(#transports).
   - After select returns (with zero timeout), process() is called on all
   internal Transports. Each transport checks its TxFifo queue.

With the epoll() patch in place, these two O(N) traversals for the TxFifo
appear to be driving factors. Aside from performance issues, this is another
obstacle to libevent migration. I can think of a few things to do about
this:

   - Keep a queue of Transports with data to transmit. As data is added to a
   TxFifo, append its Transport to the queue. This would solve the O(N)
   problem. The work is non-trivial low-level queue stuff with a reasonable
   chance of bugs, especially if implemented with performance in mind.
   - InternalTransport::transmit(), which posts data to the queue, could
   also call a method of Transport to do "something" with the data. There are
   two somethings that could happen:
      - Request a callback when the socket is writable, and in that callback
      (from epoll, etc.) try draining the queue. This is what I ended up
      doing with UdpTransport: see checkTransmitQueue().
      - Read from the queue immediately and try writing the message onto the
      wire. If the write fails, stash this "working" message and request a
      writable callback. This approach is the preferred style when working
      with edge-triggered epoll.
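
The first option above (a queue of Transports with pending data) can be
sketched roughly as follows. This is only an illustration under assumptions:
SketchTransport and ReadyList are hypothetical names, not resip classes, and
the "wire" vector stands in for the socket. The point is that posting is O(1)
and draining visits only Transports that actually have data.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resip Transport; not the real class.
struct SketchTransport {
    std::deque<std::string> txFifo;  // pending outbound messages
    bool onReadyList = false;        // true while queued on the ready list
    std::vector<std::string> wire;   // stands in for the socket
};

// Tracks only the transports that currently have data to send.
class ReadyList {
public:
    // Called when the transaction layer posts a message: O(1).
    void post(SketchTransport& t, std::string msg) {
        t.txFifo.push_back(std::move(msg));
        if (!t.onReadyList) {  // avoid duplicate entries for one transport
            t.onReadyList = true;
            mReady.push_back(&t);
        }
    }

    // Called once per loop iteration. Visits only transports with data
    // and returns how many were visited.
    std::size_t drain() {
        std::size_t visited = 0;
        while (!mReady.empty()) {
            SketchTransport* t = mReady.front();
            mReady.pop_front();
            assert(t->onReadyList);  // invariant: listed transports are flagged
            t->onReadyList = false;
            while (!t->txFifo.empty()) {
                t->wire.push_back(std::move(t->txFifo.front()));
                t->txFifo.pop_front();
            }
            ++visited;
        }
        return visited;
    }

    bool empty() const { return mReady.empty(); }

private:
    std::deque<SketchTransport*> mReady;
};
```

The onReadyList flag is what keeps a busy Transport from being appended many
times. A real version would also have to remove a Transport from the list when
it is destroyed, which is part of the "chance of bugs" mentioned above.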

Is there any particular reason not to try writing the data immediately when
posted to the queue? Issues I can think of:

   - Threads. I'm concerned only with the InternalTransports which all run
   within the same thread as the stack itself. As far as I know, no other
   threads post to the transmit queue. Is that true?
   - Fairness. Deferring the write until after returning to the main loop
   would allow some sort of prioritization to take place. But as far as I can
   tell there isn't any code in place to enforce fairness or prioritization.
   - Recursion. Some transports post messages to their own TxFifo; we would
   need to be careful about recursion, and ideally avoid it. I suspect
   recursion safety is relatively simple to implement.
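
To make the immediate-write idea and the recursion concern concrete, here is a
rough sketch. Everything in it is an assumption for illustration:
ImmediateTransport and its members are hypothetical, the tryWrite callable
stands in for a non-blocking send, and for simplicity the unsent message is
left at the head of the queue rather than stashed separately. A "draining"
flag turns a re-entrant transmit() into a plain enqueue that the outer loop
picks up.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical transport sketch; names do not come from resip.
class ImmediateTransport {
public:
    // tryWrite returns true if the message went out, false if the socket
    // would block. It stands in for a non-blocking send().
    explicit ImmediateTransport(std::function<bool(const std::string&)> tryWrite)
        : mTryWrite(std::move(tryWrite)) {}

    // Post a message and try to put it on the wire right away.
    void transmit(std::string msg) {
        mTxFifo.push_back(std::move(msg));
        drain();
    }

    // Called from the event loop when the socket becomes writable.
    void onWritable() {
        assert(!mDraining);  // loop callbacks are not expected mid-drain
        mWantWritable = false;
        drain();
    }

    bool wantsWritableCallback() const { return mWantWritable; }
    std::size_t pending() const { return mTxFifo.size(); }

private:
    void drain() {
        if (mDraining) return;      // re-entrant post: outer loop will send it
        if (mWantWritable) return;  // still blocked: keep messages in order
        mDraining = true;
        while (!mTxFifo.empty()) {
            if (!mTryWrite(mTxFifo.front())) {
                mWantWritable = true;  // message stays at the head of the queue
                break;
            }
            mTxFifo.pop_front();
        }
        mDraining = false;
    }

    std::function<bool(const std::string&)> mTryWrite;
    std::deque<std::string> mTxFifo;
    bool mDraining = false;
    bool mWantWritable = false;
};
```

With this shape the event loop only needs to register for writability when
wantsWritableCallback() is true, so idle Transports cost nothing per
iteration.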

Any other approaches? Preferences? Risks?

Thanks,
Kennard