Re: [reSIProcate] The Million User Dilemma
Hi Dan,
I realized our discussion mostly focused on the performance/scalability improvements that could be done in resip/dum, but what about repro itself?
I must say I have never used it, but going back to your original post, you mention having a large network of them (kinda like a "cluster", I guess). Since it is a proxy, it might need to cache and replicate data such as user registration contacts across its peers.
For large-scale data replication it might make sense to use a different data store like Basho Riak or CouchDB (I understand those are built for large-scale deployment - it's probably no coincidence that those two are built using Erlang ;) ).
Would you foresee the need to replicate more than that type of information? What about SIP-specific state like dialogs and transactions (for a stateful proxy, maybe)? That way, another proxy could take over in the middle of a transaction... quite an ambitious goal :).
The alternative could be to leverage SIP's "forgiveness". For example, if a transaction fails because of a proxy failure/..., then it would be up to the upstream UA to retry it.
It would be very interesting if resip/dum could keep the data parts of its transactions and dialogs orthogonal to its operational parts (implementations, algorithms). For example, we could then seamlessly store the dialog information in a replicated data store so that multiple DUM instances can use it.
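Just to illustrate what I mean, here is a rough sketch (these names don't exist in resip/dum today, and the backend behind the interface could be Riak, CouchDB or anything else):

// Hypothetical sketch: dialog state as plain data, kept separate from DUM's
// logic so it can be serialized into a replicated store and rebuilt by any
// DUM instance in the cluster.
#include <string>

struct DialogData                       // data only, no behaviour
{
   std::string callId;
   std::string localTag;
   std::string remoteTag;
   unsigned long localCSeq;
   unsigned long remoteCSeq;
   std::string remoteTarget;            // where in-dialog requests get sent
   std::string routeSet;                // serialized Record-Route set
};

class ReplicatedDialogStore             // hypothetical, backend-agnostic interface
{
public:
   virtual ~ReplicatedDialogStore() {}
   virtual void put(const std::string& dialogId, const DialogData& data) = 0;
   virtual bool get(const std::string& dialogId, DialogData& data) = 0;
   virtual void remove(const std::string& dialogId) = 0;
};

The hard part would be teaching DUM to rebuild its Dialog objects from a DialogData record fetched out of the store, instead of assuming they only ever live in local memory.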
Cheers,
Francis
Hi there,
I personally believe that the reactor design is the most scalable
approach, especially when combined with the likes of epoll. In the case
of the stack inside resiprocate, it can be made concurrent very easily:
simply use hash-based concurrency at the transaction layer, hashing on
transaction ids to distribute work across threads. This requires no
locks.
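Something along these lines (illustrative only; none of these names
exist in resiprocate, and the per-thread queue uses a plain mutex for
brevity where a lock-free queue would be the real goal):

// Sketch of hash-based dispatch: a given transaction id always maps to the
// same worker thread, so per-transaction state itself never needs locking.
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

class TransactionDispatcher
{
public:
   explicit TransactionDispatcher(std::size_t numWorkers)
      : mQueues(numWorkers) {}

   void post(const std::string& transactionId, std::function<void()> work)
   {
      // Same id -> same queue -> same worker thread, every time.
      std::size_t i = std::hash<std::string>()(transactionId) % mQueues.size();
      std::lock_guard<std::mutex> guard(mQueues[i].mutex);
      mQueues[i].work.push_back(work);
   }

private:
   struct Queue
   {
      // A plain mutex keeps the sketch short; a lock-free MPSC queue would
      // be the real goal.
      std::mutex mutex;
      std::deque<std::function<void()> > work;
   };
   std::vector<Queue> mQueues;   // one queue, drained by one worker thread each
};

The worker loop that drains each queue is omitted; the point is only
that the hash keeps every event for a given transaction on one thread,
so the transaction state itself is never shared.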
As for making DUM more parallel, I've been working with someone who is
an expert in Software Transactional Memory (the author of TBoost.STM)
to help build data structures to support the needs of DUM and to build
an in-memory database. Along with hash-based concurrency, this could
greatly improve the performance of DUM.
Likewise, let's not forget that the two largest costs of resiprocate
right now are still memory consumption and fragmentation, and
serializing messages to the streams. I believe we can greatly improve
the performance of those areas using something like boost::spirit::karma
together with some enhanced data structures and a per-SIP-message
allocator.
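On the allocator side, I'm picturing something like a per-message arena
(again just a sketch, not existing resip code):

// Sketch of a per-SIP-message arena: every small allocation made while
// parsing or building one message comes out of a few large blocks, and all
// of it is released in one go when the message (and its arena) is destroyed.
#include <cstddef>
#include <deque>
#include <vector>

class MessageArena
{
public:
   explicit MessageArena(std::size_t blockSize = 4096)
      : mBlockSize(blockSize), mUsed(0)
   {
      mBlocks.push_back(std::vector<char>(mBlockSize));
   }

   void* allocate(std::size_t bytes)
   {
      bytes = (bytes + sizeof(void*) - 1) & ~(sizeof(void*) - 1);  // align
      if (mUsed + bytes > mBlocks.back().size())
      {
         // Current block is full: start a new one (bigger if needed).
         mBlocks.push_back(std::vector<char>(bytes > mBlockSize ? bytes : mBlockSize));
         mUsed = 0;
      }
      void* p = &mBlocks.back()[mUsed];
      mUsed += bytes;
      return p;
   }

   // Deliberately no per-allocation free(): the whole arena is freed at once.

private:
   std::size_t mBlockSize;
   std::size_t mUsed;
   std::deque<std::vector<char> > mBlocks;
};

Everything allocated while parsing or building one message dies with
the message, so there is no per-string free() traffic and no long-term
fragmentation from short-lived objects.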
Dan
On 01/29/2011 09:52 PM, Francis Joanis wrote:
> Hi guys,
>
> I've been using resip + dum for a while, but since I was more focusing
> on building UAs (and not proxies, ...) with it I've had no performance
> issue so far. In fact I found that it was actually performing better
> than some other SIP application layers, especially when handling
> multiple SIP events at the same time.
>
> The reason it was performing better is that DUM (the application
> level) generally uses a single thread for all SIP events, rather than
> one thread per event or event type (imagine what one thread per
> session would do... :( ).
>
> I see the current resip threading code as being like the reactor design
> pattern, where only a single thread is used to "select" then
> synchronously process events. From my experience, one main advantage of
> this approach is that the stack's general behaviour is "predictable"
> with regard to its performance and the flow of events (i.e. processing
> a single call VS processing 100+ incoming calls).
>
> However, one downside of the reactor is that it doesn't scale well on
> multicore CPUs since it only has a single thread. To really leverage
> multicore, programs need to become more and more concurrent (that is
> truly concurrent - i.e. without mutexes and locking) in order to get
> faster. This is probably nothing new for most of us, but it is something
> that I've been realizing practically more and more since I've been
> exposed to concurrent languages like Erlang.
>
> I think that investing time into making resip (at least the stack and
> DUM parts) multicore aware would be a great way to future-proof it.
>
> To add to what was already said in this thread:
>
> - It does make a lot of sense to leverage libevent or asio or ... to
> ensure best performance on all platforms. This is a long term goal but
> maybe we could start some prep work now (like decoupling stuff and
> laying down foundations). The alternative could be to try to implement
> the best select() substitute for each supported platform, but we might
> then end up rewriting libevent ourselves.
> - Regarding the reactor design pattern, there is also the proactor one
> which uses (unless I'm mistaken) OS-level asynchronous IO (resip
> currently uses synchronous IO). The idea is that the resip transport
> thread would be able to service multiple IO operations at the same time
> through the kernel. I think this is similar to what Kennard mentioned as
> a post-notified system and this would not be an easy change.
> - Adding more threads where it makes sense (like one per transport or
> ...) might not be good enough if those threads still use thread locking
> to communicate with each other. I've done a bit of googling about
> lock-free data structures and it is quite interesting. I might try it
> one day to see how much faster it could get just between the stack and
> the DUM.
> - It does also make sense to look into code profiling and ensuring that
> the code is not "wasting cycles".
>
> Anyway, I think this is a great idea and I would be happy to help :)
>
> Regards,
> Francis
>