A Scalable Network of XMPP Servers for Smart Home Applications

The work I am going to cover in this post is already two years old and was presented at the PerCol workshop 2013 in San Diego, which I was chairing. I think it is quite appealing work that deserves more attention. Honguk Woo et al. from Samsung R&D Korea elaborate on the problem of how to build a scalable network of XMPP servers as an infrastructure for IoT / Smart Home applications.

Woo, Honguk, Kim, Hongsoo, Kim, Kyusik, Kim, Dongkyoung: A Large Scale Presence Network for Pervasive Social Computing. In: Proc. of the IEEE Conference on Pervasive Computing and Communications (PERCOM Workshops), pp. 145–150, 2013.

Most big XMPP servers today offer a clustering option, e.g. Openfire with its Hazelcast-based clustering. But this is only half of the story, as you do not want to end up with one global data center if you sell your XMPP-enabled IoT products worldwide. Thus, there will typically be regional data centers such as US East or Europe West serving regional users. If you now want your users to have globally unique IDs of the form user123@xyz.com, all your XMPP servers need to share the xyz.com domain, which is only possible if they all join the same cluster. Such a cluster would not perform well, because WAN communication between its servers would slow down the whole thing.

The authors' solution is to build a hierarchically connected network of XMPP data centers, as shown in the following drawing (adapted from the paper cited above):


There are two types of clusters: Regional and inter-regional. A regional cluster consists of a number of XMPP servers and a load balancer. A Session DB holds all (client, server process) pairs of the local cluster. Roster handling is also done locally in this cluster. The inter-regional cluster holds a database of (client, regional cluster) pairs.

They extended the S2S communication of ejabberd to enable hierarchical inter-domain routing. If a message for a client cannot be handled locally by the XMPP server itself, the server first looks up the client in the local Session DB. If the client is not found there, then the global Session DB is consulted to route the request to the respective cluster. Thus, inter-regional device groups are possible, as shown in the figure.
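To make the lookup order concrete, here is a minimal sketch in Python of the three-step routing decision described above. It is not the authors' ejabberd extension: the class names, the route function and the returned labels are all made up for illustration, and the two Session DBs are modelled as plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class RegionalCluster:
    """Hypothetical stand-in for a regional cluster and its Session DB."""
    name: str
    # Local Session DB: client JID -> server process inside this cluster.
    sessions: Dict[str, str] = field(default_factory=dict)


@dataclass
class InterRegionalCluster:
    """Hypothetical stand-in for the global Session DB."""
    # Client JID -> name of the regional cluster the client is connected to.
    locations: Dict[str, str] = field(default_factory=dict)


def route(
    jid: str,
    this_server: str,
    connected_here: Set[str],
    region: RegionalCluster,
    top: InterRegionalCluster,
) -> Tuple[str, Optional[str]]:
    """Return (scope, target) for a stanza addressed to jid."""
    # Step 1: the client is connected to this very server.
    if jid in connected_here:
        return ("local", this_server)
    # Step 2: the client is connected to another server in the same region.
    server = region.sessions.get(jid)
    if server is not None:
        return ("regional", server)
    # Step 3: ask the inter-regional level which region holds the client.
    target_region = top.locations.get(jid)
    if target_region is not None and target_region != region.name:
        return ("inter-regional", target_region)
    # Nobody knows the client; a real server would store or bounce the stanza.
    return ("unreachable", None)
```

With a regional cluster "eu-west" that knows bob@xyz.com and a global DB that maps carol@xyz.com to "us-east", a server in eu-west would route a message for bob within its own region and forward a message for carol to us-east. Because all regions share the xyz.com domain, the sender never needs to know where the recipient is connected.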

The authors deployed the architecture above in two virtual data centers with 5 XMPP servers each in the public cloud. They did extensive load testing with the Tsung traffic generator. Evaluation results show that they are able to serve 60,000 clients per XMPP server with their hierarchical approach.

There are even more interesting things to find in this 6-page paper:

  • They implemented a push notification service to enable energy-saving hibernate mode for mobiles (devices only activate the XMPP connection if a message is waiting for them).
  • They used a local XMPP proxy on the device to enable communication of several XMPP-based apps over one shared XMPP connection.
  • They used XMPP Jingle to connect client devices P2P.
  • They implemented interesting use cases such as real-time collaboration with Android phones and remote control of a vacuum cleaner robot which even sends a live video to the phone.

After reading the paper, I have only one thing to say: Please make products out of this. I will definitely buy such stuff if it is in the stores, especially the vacuum robot with remote control and video. A world of plug-and-play XMPP-controlled smart home products would be kind of a brilliant future…

The authors later wrote a follow-up paper in 2014 focusing on PubSub communication:

Hong, Rankyung, Shin, Sangho, Yoon, Young, Laxmankatole, Atul, Woo, Honguk: Global-Scale Event Dissemination on Mobile Social Channeling Platform. In: Mobile Cloud Computing, Services, and Engineering (MobileCloud), 2014 2nd IEEE International Conference on, pp. 210–219, IEEE 2014.

They did a deep analysis of the ejabberd PubSub implementation and state a few reasons why it does not work well with a distributed network of servers. They developed their own PubSub implementation, which uses a ring topology and consistent hashing to distribute the PubSub processing load. An evaluation with 25 machines in the public cloud shows their approach to be superior to classic XMPP PubSub. ejabberd reaches its maximum performance with 6 machines in the cluster. If more are added, the overall notification rate even drops due to the high synchronization traffic of the replicated databases for subscriptions, topics and events. The ring-based approach scales linearly with the number of machines.
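Since the authors' implementation is not available, the following is only a generic sketch of how consistent hashing can assign PubSub topics to machines on a ring. The HashRing class, the choice of SHA-256 and the use of 64 virtual points per machine are my own assumptions and are not taken from the paper.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring; an illustration, not the paper's design."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 64) -> None:
        self.vnodes = vnodes
        # Sorted list of (ring position, node name).
        self._points: List[Tuple[int, str]] = []
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each machine gets several virtual points to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def owner(self, topic: str) -> str:
        """Return the machine responsible for a topic."""
        if not self._points:
            raise LookupError("ring is empty")
        # Walk clockwise to the first point after the topic's position.
        idx = bisect.bisect_right(self._points, (_hash(topic), ""))
        return self._points[idx % len(self._points)][1]
```

The property that matters for scaling is that adding a machine only moves the topics that the new machine takes over. All other topics stay where they are, so there is no cluster-wide replication of subscriptions, topics and events as in the replicated-database setup described above.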

Before reading the paper, when thinking about XMPP server clusters, I was wondering how this would work with PubSub. Now the paper gives an answer that is interesting to read and written in great detail. However, the actual implementation is not provided, which somewhat contradicts the free software paradigm.