Some colleagues of mine are data analysis geeks who spend the whole day building fancy workflows with KNIME. For those who don’t know it yet: KNIME is a powerful data mining framework with a graphical workflow editor. You can drag and drop ready-made analysis nodes and play around with their parameters. KNIME nodes work on input tables and either modify the content of a column or add one or more output columns.
Since I have a PDF directory with ALL the published XMPP-based research at hand, I built my first KNIME workflow and came up with counting the mentions of XEPs in the papers. It is based on regular expressions and only grabs mentions of the form “XEP-XXXX”, which are often found in the references section. Clearly, this method generates a lot of false negatives, but not a single false positive.
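For readers without KNIME, the counting idea can be roughed out in a few lines of Python. This is only a sketch of the approach, not the actual workflow: it assumes the PDF text has already been extracted into plain strings, and the names `XEP_PATTERN`, `count_xep_mentions` and `sample_texts` are made up for illustration. It counts each XEP at most once per paper.

```python
import re
from collections import Counter

# Matches "XEP-" followed by exactly four digits, e.g. "XEP-0045".
XEP_PATTERN = re.compile(r"\bXEP-(\d{4})\b")


def count_xep_mentions(documents):
    """Count, per XEP, how many documents mention it at least once."""
    counts = Counter()
    for text in documents:
        # Using a set means each paper contributes at most once per XEP.
        counts.update({f"XEP-{num}" for num in XEP_PATTERN.findall(text)})
    return counts


# Hypothetical stand-ins for text extracted from two papers.
sample_texts = [
    "We build on Multi-User Chat [XEP-0045] and XEP-0060. See XEP-0045.",
    "Publish-Subscribe (XEP-0060) is used; multi-user chat is not cited.",
]

counts = count_xep_mentions(sample_texts)
for xep, n in counts.most_common():
    print(xep, n)
```

The second sample text talks about multi-user chat without citing its number, so it is not counted for XEP-0045. That is exactly the kind of false negative mentioned above.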
The counts were written to a CSV file and then fed to the Tagul tag cloud creator. The result looks like this:
As you can see, Multi-User Chat is the winner. It was used (and mentioned correctly) by 22 of our 250 publications. The runners-up are Publish-Subscribe (20 mentions) and Serverless Messaging (13 mentions). Service Discovery, Jingle, Ad-Hoc Commands, BOSH, PEP, DoS Attacks and SI File Transfer complete the top 10. A total of 67 extensions were mentioned in the papers, about half of them only once.
The result is not very surprising, but the visualization is quite nice. It is often hard for non-experts to find their way through the jungle of more than 300 XEPs. A figure like this may help show that some XEPs are more popular than others.