[reSIProcate] epoll performance results for resip

Kennard White kennard_white at logitech.com
Thu Jan 6 21:27:47 CST 2011


Hi,

Attached are some performance test results for the latest resip stack. Data
is gathered using resip/stack/test/testStackFlavors.py, which runs the
testStack program in the same directory. This is a very simple test that
runs two stacks. The sender stack generates UAC REGISTER transactions and
the receiver stack is the UAS. The testStack program can be configured (via
command line options) with various modes: UDP or TCP, number of ports, epoll
or select, etc. The script runs through various combinations of options.
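
For anyone adapting the script, the "run through combinations" idea can be
sketched roughly as below. This is only an illustration, not the contents of
testStackFlavors.py: the axis names and values are made-up labels, not
testStack's actual command line options, and the sketch only enumerates the
combinations without launching anything.

```python
from itertools import product

# Hypothetical option axes. These are illustrative labels only and are
# NOT the real testStack command line flags.
AXES = {
    "protocol": ["udp", "tcp"],
    "ports": [1, 10000],
    "poller": ["select", "epoll"],
}


def option_combinations(axes):
    """Yield one dict per combination of option values (cartesian product)."""
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    for combo in option_combinations(AXES):
        print(combo)
```

Each yielded dict would then be translated into whatever arguments the test
program really takes.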

The key metric is transactions-per-second (tps). More precisely, the
reported metric is really transaction pairs per second, since each
transaction exercises both the UAC and the UAS side. The intent of the test
is to measure the relative performance of different optimizations of the
stack. The absolute numbers aren't very meaningful, though they are likely
an upper bound on what any real application might achieve.

The comments in the source code testStack.cxx provide a brief explanation of
the different thread modes. I've observed a lot of variation (>20%) in the
tps numbers from run to run, so don't read much into small tps
differences.
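
One rough way to apply that caution when comparing two configurations is
sketched below. The helper names and the 20% default threshold are
assumptions made for illustration (the threshold simply mirrors the
variation mentioned above); this is not part of the test script.

```python
from statistics import mean


def relative_spread(samples):
    """Return (max - min) / mean for repeated tps measurements of one config."""
    return (max(samples) - min(samples)) / mean(samples)


def is_meaningful(tps_a, tps_b, noise=0.20):
    """Return True if the mean tps of two configs differs by more than
    `noise`, measured relative to the larger of the two means.

    `noise` is an assumed threshold, not a measured property of the stack.
    """
    a, b = mean(tps_a), mean(tps_b)
    return abs(a - b) / max(a, b) > noise
```

Feeding it several runs per configuration, rather than a single run each,
gives the comparison a better chance of reflecting a real difference.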

This test data was generated on a Dell PowerEdge R610 with 2 Xeon CPUs @
1.2G (64-bit, 8 cores total), running Ubuntu Linux 2.6.31.

The attachment is an ASCII CSV file; open it with Excel or your favorite
text editor.

Some observations:
* For a single port, event (epoll-based) mode is comparable to the
pre-existing behavior. In this particular test it appears faster, but I've
also seen it perform slightly worse on other machines.
* The performance penalty going from 1 port to 10k ports is between 25% and
50%. My belief is that this penalty is due to the O(logN) searches within
TransportSelector, not epoll itself, but I haven't really investigated. The
first version had 10x penalties, so I'm pretty happy with the current
results.
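
To make the arithmetic in the second observation concrete, here is a small
sketch. The function names are hypothetical, and the lookup estimate assumes
a balanced-tree or binary search, which is only what the O(logN) guess above
would imply, not something confirmed from the TransportSelector code.

```python
import math


def penalty(tps_baseline, tps_scaled):
    """Fractional throughput loss relative to the baseline, e.g. 0.25 for 25%."""
    return 1.0 - tps_scaled / tps_baseline


def lookup_steps(n):
    """Approximate comparisons per lookup among n entries (n >= 1),
    assuming a balanced binary search; an assumption, not a measurement."""
    return math.ceil(math.log2(n))


if __name__ == "__main__":
    # Going from 1 to 10k entries raises the estimated per-lookup cost
    # from 0 to 14 comparisons under this assumption.
    print(lookup_steps(1), lookup_steps(10000))
```

Under that assumption each lookup gets more expensive as ports are added,
but only logarithmically, which is at least consistent with a penalty well
below the 10x seen in the first version.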

One last comment. I have a repro instance using the epoll mode running as a
TCP-UDP gateway, and it shows between 2k and 3k tps throughput for
non-INVITE transactions when handling 50k concurrent TCP connections with
simulated traffic. It is CPU limited. I've just started profiling this. If
anyone has previously profiled repro and/or has good ideas for performance
optimization, please let me know your thoughts.

Finally, I'd like to see similar data for Windows or other platforms. I
don't build or run under Windows myself. Please feel free to modify the test
script if needed in order to get it to run under Windows.

Regards,
Kennard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: misc01-testStack.csv
Type: application/octet-stream
Size: 15655 bytes
Desc: not available
URL: <http://list.resiprocate.org/pipermail/resiprocate-devel/attachments/20110106/afd74400/attachment.obj>

