[reSIProcate] The Million User Dilemma

Mon Jan 31 11:55:06 CST 2011

Note:  Some work was done a few years back by Justin Matthews to replace STL
streams with a custom "Fast Stream" implementation.  From what I remember,
this improved performance on Windows more than on other platforms.  It can be
enabled by a compile-time flag - see ResipFastStreams.hxx.

In my view the two biggest scalability issues in resip today are:
1.  Scalability when using TCP/TLS - this has been addressed recently by the
addition of epoll for most OSes, but it remains an issue for Windows
users.
2.  Resip's inability to take better advantage of multi-core CPUs.

This is a great discussion!

Regards,
Scott

On Mon, Jan 31, 2011 at 12:27 PM, Dan Weber <dan at marketsoup.com> wrote:

> Hi there,
>
>
> I personally believe that the reactor design is the most scalable
> approach, especially when combined with the likes of epoll.  It can be
> made concurrent very easily in the case of the stack inside resiprocate.
> Simply use hash based concurrency at the transaction layer, keyed on
> transaction ids, to distribute work across threads.  This need not
> involve any locks.
>
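
A minimal sketch of the hash based dispatch described above, in plain C++ and not taken from the resip sources. The names workerFor and partitionByTransaction are hypothetical, and the transaction id is assumed to be available as a std::string. Because a given id always lands on the same worker, the state for that transaction is only ever touched by one thread. Handing the messages over to the workers would still need some kind of queue, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: pick the worker that owns a transaction id.
// The same id always maps to the same worker index.
inline std::size_t workerFor(const std::string& tid, std::size_t workers)
{
   assert(workers > 0);
   return std::hash<std::string>{}(tid) % workers;
}

// Hypothetical helper: split a batch of transaction ids into one inbox
// per worker. Each inbox would then be drained by its own thread.
inline std::vector<std::vector<std::string>>
partitionByTransaction(const std::vector<std::string>& tids, std::size_t workers)
{
   std::vector<std::vector<std::string>> inbox(workers);
   for (const auto& tid : tids)
   {
      inbox[workerFor(tid, workers)].push_back(tid);
   }
   return inbox;
}
```

Retransmissions and responses that carry the same id end up in the same inbox, which is what keeps the per-transaction state single-threaded.
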
> As for making DUM more parallel, I've been working with someone who is
> an expert in Software Transactional Memory (author of TBoost.STM), to
> help build data structures to support the needs of DUM and build an in
> memory database.  Along with hash based concurrency, this could greatly
> improve the performance of DUM.
>
>
> Likewise, let's not forget that the two largest costs of resiprocate
> right now are still memory consumption with fragmentation, and
> serializing messages to the streams.  I believe we can greatly improve
> the performance of those areas using something like boost::spirit::karma
> with some enhanced data structures and a per-SIP-message allocator.
>
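
To make the per-message allocator idea concrete, here is a rough sketch of a bump ("arena") allocator that would live and die with one SIP message. MessageArena is a hypothetical class, not something in resip today, and the fixed capacity is an assumption made to keep the example short. Everything the message needs is carved out of one block and released in a single step, which is what avoids the fragmentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical per-message arena: memory is carved from one block and
// released all at once when the arena (and its message) is destroyed.
class MessageArena
{
public:
   explicit MessageArena(std::size_t capacity)
      : mBuffer(new char[capacity]), mCapacity(capacity), mUsed(0) {}

   // Returns nullptr when the block is exhausted; a real allocator would
   // probably fall back to the heap. align must be a power of two and no
   // larger than alignof(std::max_align_t).
   void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t))
   {
      assert(align != 0 && (align & (align - 1)) == 0);
      const std::size_t start = (mUsed + align - 1) & ~(align - 1);
      if (start > mCapacity || bytes > mCapacity - start) return nullptr;
      mUsed = start + bytes;
      return mBuffer.get() + start;
   }

   // Copy a NUL-terminated value (e.g. a header field) into the arena.
   char* copyString(const char* s)
   {
      const std::size_t n = std::strlen(s) + 1;
      char* p = static_cast<char*>(allocate(n, 1));
      if (p) std::memcpy(p, s, n);
      return p;
   }

   std::size_t used() const { return mUsed; }

private:
   std::unique_ptr<char[]> mBuffer;
   std::size_t mCapacity;
   std::size_t mUsed;
};
```

There is deliberately no per-object free: individual headers are never returned to the arena, only the whole block.
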
> Dan
>
>
> On 01/29/2011 09:52 PM, Francis Joanis wrote:
> > Hi guys,
> >
> > I've been using resip + dum for a while, but since I have focused more
> > on building UAs (and not proxies, ...) with it, I've had no performance
> > issues so far. In fact I found that it was actually performing better
> > than some other SIP application layers, especially when handling
> > multiple SIP events at the same time.
> >
> > It performed better because the DUM (application level) generally
> > uses a single thread for all SIP events, rather than using one thread
> > per event or event type (imagine what 1 thread per
> > session would do... :( ).
> >
> > I see the current resip threading code as being like the reactor design
> > pattern, where only a single thread is used to "select" then
> > synchronously process events. From my experience, one main advantage of
> > this approach is that the stack's general behaviour is "predictable"
> > with regards to its performance and the flow of events (i.e. processing
> > a single call VS processing 100+ incoming calls).
> >
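
For readers who have not met the pattern, a single-threaded reactor can be sketched as below. This is not resip's code: the Reactor class and its method names are hypothetical, and it uses POSIX poll() in place of select() to keep the example short. One thread waits for readiness and then runs the handlers one after another, which is where the "predictable" ordering comes from.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>
#include <poll.h>
#include <unistd.h>

// Hypothetical single-threaded reactor built on POSIX poll().
class Reactor
{
public:
   using Handler = std::function<void(int fd)>;

   void watch(int fd, Handler h) { mHandlers[fd] = std::move(h); }
   void unwatch(int fd) { mHandlers.erase(fd); }

   // Wait up to timeoutMs for readable descriptors, then run their
   // handlers synchronously on the calling thread. Returns the number
   // of handlers run, 0 on timeout, or -1 if poll() failed.
   int runOnce(int timeoutMs)
   {
      std::vector<pollfd> fds;
      for (const auto& entry : mHandlers)
      {
         pollfd p;
         p.fd = entry.first;
         p.events = POLLIN;
         p.revents = 0;
         fds.push_back(p);
      }
      const int ready = ::poll(fds.data(), fds.size(), timeoutMs);
      if (ready <= 0) return ready;

      int dispatched = 0;
      for (const auto& p : fds)
      {
         if (!(p.revents & (POLLIN | POLLHUP))) continue;
         auto it = mHandlers.find(p.fd);
         if (it == mHandlers.end()) continue; // removed by an earlier handler
         Handler h = it->second;              // copy so a handler may unwatch itself
         h(p.fd);
         ++dispatched;
      }
      return dispatched;
   }

private:
   std::map<int, Handler> mHandlers;
};
```

A slow handler delays every other descriptor, which is the flip side of the predictability.
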
> > However, one downside of the reactor is that it doesn't scale well on
> > multicore CPUs since it only has a single thread. To really leverage
> > multicore, programs need to become more and more concurrent (that is
> > truly concurrent - i.e. without mutexes and locking) in order to get
> > faster. This is probably nothing new for most of us, but it is something
> > that I've been realizing practically more and more since I've been
> > exposed to concurrent languages like Erlang.
> >
> > I think that investing time into making resip (at least the stack and
> > DUM parts) multicore aware would be a great way to future-proof it.
> >
> > To add to what was already said in this thread:
> >
> > - It does make a lot of sense to leverage libevent or asio or ... to
> > ensure best performance on all platforms. This is a long term goal but
> > maybe we could start some prep work now (like decoupling stuff and
> > laying down foundations). The alternative could be to try to implement
> > the best select() substitute for each supported platform, but we might
> > then end up rewriting libevent ourselves.
> > - Regarding the reactor design pattern, there is also the proactor one
> > which uses (unless I'm mistaken) OS-level asynchronous IO (resip
> > currently uses synchronous IO). The idea is that the resip transport
> > thread would be able to service multiple IO operations at the same time
> > through the kernel. I think this is similar to what Kennard mentioned as
> > a post-notified system and this would not be an easy change.
> > - Adding more threads where it makes sense (like one per transport or
> > ...) might not be good enough if those threads still use locks
> > to communicate with each other. I've done a bit of googling about
> > lock-free data structures and it is quite interesting. I might try it
> > one day to see how much faster it could get just between the stack and
> > the DUM.
> > - It does also make sense to look into code profiling and ensuring that
> > the code is not "wasting cycles"
> >
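
As an illustration of the lock-free idea in the list above, here is a minimal single-producer / single-consumer ring buffer using std::atomic. SpscQueue is a hypothetical name and this is not how the stack and DUM exchange messages today. It assumes exactly one thread calls push() and exactly one other thread calls pop(), for example the stack thread producing and the DUM thread consuming.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical bounded SPSC queue. One slot is always left empty so that
// "full" and "empty" can be told apart without a shared counter.
template <typename T, std::size_t Capacity>
class SpscQueue
{
   static_assert(Capacity >= 2, "one slot is kept empty");

public:
   SpscQueue() : mHead(0), mTail(0) {}

   // Producer thread only. Returns false when the queue is full.
   bool push(const T& value)
   {
      const std::size_t tail = mTail.load(std::memory_order_relaxed);
      const std::size_t next = (tail + 1) % Capacity;
      if (next == mHead.load(std::memory_order_acquire)) return false;
      mSlots[tail] = value;
      mTail.store(next, std::memory_order_release); // publish the slot
      return true;
   }

   // Consumer thread only. Returns false when the queue is empty.
   bool pop(T& out)
   {
      const std::size_t head = mHead.load(std::memory_order_relaxed);
      if (head == mTail.load(std::memory_order_acquire)) return false;
      out = mSlots[head];
      mHead.store((head + 1) % Capacity, std::memory_order_release); // free the slot
      return true;
   }

private:
   T mSlots[Capacity];
   std::atomic<std::size_t> mHead; // next slot to read
   std::atomic<std::size_t> mTail; // next slot to write
};
```

With more than one producer or consumer this design is no longer safe, so several transport threads feeding one DUM thread would need one queue per producer or a different structure.
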
> > Anyway, I think this is a great idea and I would be happy to help :)
> >
> > Regards,
> > Francis
> >
>