
Re: [reSIProcate] The Million User Dilemma


Note:  Some work was done a few years back by Justin Matthews to replace the STL streams with a custom "Fast Stream" implementation.  From what I remember this improved performance more on Windows than on other platforms.  It can be enabled by a compile-time flag - see ResipFastStreams.hxx.
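For anyone who hasn't looked at it, the basic idea is a compile-time switch that swaps the STL ostream used for message encoding for a lightweight append-only buffer, avoiding locale and virtual-call overhead.  A rough sketch of the shape of it (the flag and class names below are illustrative only - see ResipFastStreams.hxx for the real ones):

    // Illustrative sketch only -- not the actual resip code.
    #include <cstdio>
    #include <sstream>
    #include <string>

    #ifdef USE_FAST_STREAMS                      // hypothetical flag name
    class FastOStream                            // minimal append-only buffer
    {
    public:
       FastOStream& operator<<(const char* s) { mBuf.append(s); return *this; }
       FastOStream& operator<<(int v)
       {
          char tmp[16];
          snprintf(tmp, sizeof(tmp), "%d", v);
          mBuf.append(tmp);
          return *this;
       }
       const std::string& str() const { return mBuf; }
    private:
       std::string mBuf;
    };
    typedef FastOStream EncodeStreamType;        // used wherever messages are encoded
    #else
    typedef std::ostringstream EncodeStreamType; // default STL path
    #endif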

In my view, the two biggest scalability issues in resip today are:
1.  Scalability when using TCP/TLS - this has been addressed recently by the addition of epoll support for most OSes, but it remains an issue for Windows users (see the sketch after this list).
2.  Resip's inability to take better advantage of multi-core CPUs.
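On point 1, for anyone who hasn't followed the epoll work: the win over select() is that there is no FD_SETSIZE ceiling and the kernel returns only the descriptors that are actually ready, so a server holding tens of thousands of TCP/TLS connections doesn't rescan the whole set every cycle.  A minimal, Linux-only sketch of the wait loop (illustrative, not the actual resip transport code):

    #include <sys/epoll.h>
    #include <unistd.h>
    #include <vector>

    void waitLoop(const std::vector<int>& sockets)
    {
       int ep = epoll_create1(0);
       for (size_t i = 0; i < sockets.size(); ++i)
       {
          epoll_event ev = {};
          ev.events = EPOLLIN;
          ev.data.fd = sockets[i];
          epoll_ctl(ep, EPOLL_CTL_ADD, sockets[i], &ev);
       }

       epoll_event ready[64];
       for (;;)
       {
          // Only ready descriptors come back, no matter how many are registered.
          int n = epoll_wait(ep, ready, 64, 25 /* ms timeout */);
          for (int i = 0; i < n; ++i)
          {
             // hand ready[i].data.fd to the owning transport/connection here
          }
       }
       close(ep);  // unreachable in this sketch
    }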

This is a great discussion!
 
Regards,
Scott

On Mon, Jan 31, 2011 at 12:27 PM, Dan Weber <dan@xxxxxxxxxxxxxx> wrote:
Hi there,


I personally believe that the reactor design is the most scalable
approach, especially when combined with the likes of epoll.  It can be
made concurrent very easily in the case of the stack inside resiprocate:
simply use hash-based concurrency at the transaction layer, keyed on
transaction ids, to distribute work across threads.  This requires no
locks.
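To make that concrete, here is the kind of dispatch I have in mind - the worker type and hash below are placeholders, not existing resip classes:

    // Sketch: shard transactions across N workers by hashing the transaction id.
    // A given transaction always lands on the same worker, so its state needs
    // no locking; only the hand-off queue must be thread-safe (ideally a
    // lock-free MPSC queue drained by the worker's own thread).
    #include <deque>
    #include <string>
    #include <vector>

    struct Worker
    {
       std::deque<std::string> queue;   // stand-in for a thread-safe queue
       void post(const std::string& rawMsg) { queue.push_back(rawMsg); }
    };

    size_t hashTid(const std::string& tid)   // djb2, placeholder hash
    {
       size_t h = 5381;
       for (size_t i = 0; i < tid.size(); ++i)
          h = h * 33 + (unsigned char)tid[i];
       return h;
    }

    void dispatch(std::vector<Worker>& workers, const std::string& tid,
                  const std::string& rawMsg)
    {
       workers[hashTid(tid) % workers.size()].post(rawMsg);
    }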

As for making DUM more parallel, I've been working with someone who is
an expert in Software Transactional Memory (the author of TBoost.STM) to
help build data structures that support the needs of DUM and to build an
in-memory database.  Along with hash-based concurrency, this could
greatly improve the performance of DUM.


Likewise, let's not forget that the two largest costs in resiprocate
right now are still memory consumption (with fragmentation) and
serializing messages to streams.  I believe we can greatly improve
performance in those areas using something like boost::spirit::karma
together with enhanced data structures and a per-SIP-message allocator.
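To sketch the allocator part (nothing below is existing resip code): the idea is a small arena owned by each SIP message, so header values and parser scratch come out of one contiguous block and are released together when the message is destroyed, which removes most of the new/delete churn and fragmentation:

    #include <cstddef>
    #include <cstdlib>
    #include <new>

    // Per-message bump allocator: every allocation for one message comes from
    // the same block; there is no per-object free, everything dies with the
    // arena (and thus with the message).
    class MessageArena
    {
    public:
       explicit MessageArena(size_t capacity = 8 * 1024)
          : mBuf(static_cast<char*>(std::malloc(capacity))),
            mCapacity(capacity),
            mUsed(0)
       {}
       ~MessageArena() { std::free(mBuf); }

       void* allocate(size_t bytes)
       {
          bytes = (bytes + 7) & ~size_t(7);   // keep 8-byte alignment
          if (mUsed + bytes > mCapacity)
          {
             throw std::bad_alloc();          // a real version would chain blocks
          }
          void* p = mBuf + mUsed;
          mUsed += bytes;
          return p;
       }

    private:
       char*  mBuf;
       size_t mCapacity;
       size_t mUsed;
    };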

Dan


On 01/29/2011 09:52 PM, Francis Joanis wrote:
> Hi guys,
>
> I've been using resip + dum for a while, but since I was focusing more
> on building UAs (and not proxies, ...) with it, I've had no performance
> issues so far. In fact, I found that it actually performed better
> than some other SIP application layers, especially when handling
> multiple SIP events at the same time.
>
> The reason it performed better is that DUM (the application
> level) generally uses a single thread for all SIP events, rather than
> one thread per event or event type (imagine what one thread per
> session would do... :( ).
>
> I see the current resip threading code as being like the reactor design
> pattern, where only a single thread is used to "select" and then
> synchronously process events. From my experience, one main advantage of
> this approach is that the stack's general behaviour is "predictable"
> with regards to its performance and the flow of events (i.e. processing
> a single call vs. processing 100+ incoming calls).
>
> However, one downside of the reactor is that it doesn't scale well on
> multicore CPUs since it only has a single thread. To really leverage
> multicore, programs need to become more and more concurrent (truly
> concurrent - i.e. without mutexes and locking) in order to get
> faster. This is probably nothing new for most of us, but it is something
> I've been realizing more and more in practice since being
> exposed to concurrent languages like Erlang.
>
> I think that investing time into making resip (at least the stack and
> DUM parts) multicore aware would be a great way to future-proof it.
>
> To add to what was already said in this thread:
>
> - It does make a lot of sense to leverage libevent or asio or ... to
> ensure the best performance on all platforms. This is a long-term goal,
> but maybe we could start some prep work now (like decoupling stuff and
> laying down foundations). The alternative would be to try to implement
> the best select() substitute for each supported platform, but we might
> then end up rewriting libevent ourselves.
> - Regarding the reactor design pattern, there is also the proactor,
> which uses (unless I'm mistaken) OS-level asynchronous IO (resip
> currently uses synchronous IO). The idea is that the resip transport
> thread would be able to service multiple IO operations at the same time
> through the kernel. I think this is similar to what Kennard mentioned as
> a post-notified system, and it would not be an easy change.
> - Adding more threads where it makes sense (like one per transport or
> ...) might not be good enough if those threads still use locking
> to communicate with each other. I've done a bit of googling about
> lock-free data structures and it is quite interesting. I might try it
> one day to see how much faster things could get just between the stack
> and the DUM (a rough sketch follows this list).
> - It also makes sense to look into code profiling and into ensuring
> that the code is not "wasting cycles".
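> A rough sketch of the kind of lock-free hand-off I mean between the
> stack and the DUM - a single-producer single-consumer ring buffer
> (illustrative only, not existing resip code; a production version needs
> real atomics or memory barriers rather than volatile):
>
>    // SPSC ring: the stack thread pushes, the DUM thread pops.  With one
>    // producer and one consumer, head and tail are each written by only
>    // one thread, so no mutex is needed.
>    template <typename T, unsigned SizePow2>   // SizePow2: a power of two
>    class SpscQueue
>    {
>    public:
>       SpscQueue() : mHead(0), mTail(0) {}
>
>       bool push(const T& item)               // called by the stack thread
>       {
>          unsigned next = (mHead + 1) & (SizePow2 - 1);
>          if (next == mTail) return false;    // full
>          mRing[mHead] = item;
>          mHead = next;                       // publish (needs a release barrier)
>          return true;
>       }
>
>       bool pop(T& item)                      // called by the DUM thread
>       {
>          if (mTail == mHead) return false;   // empty
>          item = mRing[mTail];
>          mTail = (mTail + 1) & (SizePow2 - 1);
>          return true;
>       }
>
>    private:
>       T mRing[SizePow2];
>       volatile unsigned mHead;               // real code: atomics/barriers
>       volatile unsigned mTail;
>    };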
>
> Anyway, I think this is a great idea and I would be happy to help :)
>
> Regards,
> Francis
>

_______________________________________________
resiprocate-devel mailing list
resiprocate-devel@xxxxxxxxxxxxxxx
https://list.resiprocate.org/mailman/listinfo/resiprocate-devel