Technical Background

One of the main reasons smartphone messenger apps affect user privacy is the close linkage between phone numbers and messenger identities [1, 2]. This was presumably a deliberate design decision, as it allows users to contact each other instantly, with no cumbersome setup or approval procedures. This convenience, however, requires users to upload their entire address book so that the service can determine which of their contacts are available via a specific messenger service. These address book issues have been one of the main privacy concerns about messenger services in the past. Although these concerns are justified, they may not be the most critical ones. Another concern, which has been widely neglected until now, is that a phone number can be used to query information associated with a particular messenger account, without the user’s authorization and on a permanent basis. This is contrary to the specification of the Extensible Messaging and Presence Protocol (XMPP), an XML-based communication protocol frequently used within smartphone messenger apps, which requires the disclosure of "presence information (..) only to other entities that a user has approved (..) in order to protect the privacy of XMPP users".

To exploit these design flaws manually and query information about a specific person, a user simply needs to add the corresponding phone number to the phone's address book and then launch the messenger app. The app then automatically synchronizes contact details based on all available phone numbers. If any of the uploaded phone numbers is registered with a particular messenger service, all publicly available data associated with those accounts can be explored via the app's user interface. This includes a user's name, status message, profile picture and more. This manual approach has limitations, however: it does not scale with the number of observed users, and it offers no continuity for monitoring users over a longer period of time. Continuous monitoring, in particular, introduces a whole new set of privacy implications, as the collected data provides insights into users' daily routines. This is demonstrated in the Data Summary and User Statistics sections.
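The contact synchronization described above can be illustrated with a toy model. The sketch below is not the API of any real messenger: the `Directory` class, its `sync_contacts` method, the `Profile` fields and the `normalize` helper are hypothetical names chosen for illustration. It only shows why matching on phone numbers alone discloses profile data to anyone who stores a number in their address book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Publicly visible account data (hypothetical fields)."""
    name: str
    status: str


def normalize(phone):
    """Strip formatting so differently written numbers compare equal."""
    return "+" + "".join(ch for ch in phone if ch.isdigit())


class Directory:
    """Toy stand-in for a messenger backend keyed by phone number."""

    def __init__(self):
        self._accounts = {}

    def register(self, phone, profile):
        self._accounts[normalize(phone)] = profile

    def sync_contacts(self, address_book):
        """Return the profile of every uploaded number that is registered.

        The account owner is never asked for approval: knowing the
        number is sufficient, which is the design issue discussed above.
        """
        matches = {}
        for phone in address_book:
            key = normalize(phone)
            if key in self._accounts:
                matches[key] = self._accounts[key]
        return matches
```

In this model, registering a profile under one number and then calling `sync_contacts` with an address book that contains that number, in any formatting, returns the profile, while unregistered numbers are silently dropped.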

Large-Scale User Monitoring

To monitor messenger users on a large scale and over a long period of time, automated approaches are required. Reverse engineering the specific messenger protocol and then repeatedly querying the backend services for available user data may seem straightforward, but in practice more generic and flexible approaches are needed. This is mainly because messenger apps increasingly incorporate client-side security mechanisms that raise the effort required to reverse engineer them. These mechanisms include strong code obfuscation, challenge-response authentication, and the calculation of message authentication codes based on secret keys that are obfuscated within the app. With these techniques, messenger providers are primarily attempting to prevent the creation of third-party apps or alternative desktop clients. Re-implementing app functionality for the sake of user monitoring would, however, also require a full understanding of the specific messenger protocol, which is time-consuming.
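To make the client-side mechanisms named above more concrete, the following sketch shows a generic challenge-response exchange built on a message authentication code. It is not the scheme of any particular messenger: the key value, the function names and the choice of HMAC-SHA256 are assumptions for illustration. The point is that a client without the embedded secret cannot answer the challenge, which is why such keys are obfuscated inside the app.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret; in a real app it would be hidden by obfuscation.
EMBEDDED_KEY = b"example-secret-embedded-in-the-app"


def issue_challenge():
    """Server side: create a fresh random nonce for each login attempt."""
    return secrets.token_bytes(16)


def answer_challenge(key, challenge):
    """Client side: prove knowledge of the key without transmitting it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify_response(key, challenge, response):
    """Server side: recompute the MAC and compare in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

A third-party client would have to recover `EMBEDDED_KEY` from the obfuscated binary before `verify_response` accepts its answer, which is the kind of effort an approach that reuses the original app avoids.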

To facilitate large-scale monitoring of messenger users, we therefore use an "in-app" monitoring approach that completely eliminates the need to reverse engineer the specific messaging protocol. Instead, we retrofit existing messenger apps with monitoring capabilities and tap all relevant user data from within the app endpoint itself. As our approach is based solely on leveraging existing app functionality, monitoring is independent of changes to the underlying protocol. In a sense, we use the app itself as a "decoder" for the messaging protocol.

The actual monitoring is performed using messenger apps running on physical iOS devices (see Figure 1). For this, we developed a Monitoring Library that is injected during the app startup process and extends the messenger app with additional monitoring functionality through runtime patching. All user data and system events, such as users' presence notifications, are monitored on an ongoing basis and delivered to our backend in real time.
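The idea of runtime patching can be sketched in a few lines. The example below is only an analogy in Python, not the Monitoring Library itself, which targets iOS apps: `MessengerClient`, `on_presence` and `install_hook` are hypothetical names. It shows how an existing method can be wrapped at runtime so that every event is copied to a collector while the original behaviour, including all protocol handling, is left untouched.

```python
import functools


class MessengerClient:
    """Hypothetical stand-in for the app's own event handling."""

    def __init__(self):
        self.last_seen = {}

    def on_presence(self, user, state):
        # Original app logic: remember the latest state per user.
        self.last_seen[user] = state
        return state


def install_hook(cls, method_name, collector):
    """Replace cls.method_name with a wrapper that records every call.

    Returns the original function so the patch can be undone later.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)  # app behaves as before
        collector.append((method_name, args))     # observed from inside
        return result

    setattr(cls, method_name, wrapper)
    return original
```

After `install_hook(MessengerClient, "on_presence", events)`, each call to `on_presence` still updates `last_seen` as before and additionally appends an entry to the `events` list. Because the wrapper only sees arguments that the app has already decoded, it does not need to know anything about the wire format.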

Figure 1: Monitoring Messenger Users from within Messenger Apps running on Physical Devices

Robustness and Practical Feasibility

To underline the advantage of this approach, we periodically analyzed the application binary of the WhatsApp messaging app for changes to its core functionality. We found that the application components responsible for managing communication between the app and the WhatsApp backend, as well as the class and method descriptions for retrieving personal user data, have remained mostly unchanged over the last two years. Although WhatsApp has undergone major upgrades and protocol changes in the past few years, particularly with regard to the authentication scheme and transport layer security mechanisms, the app methods we initially identified for tracking user data have so far survived without any changes.

Moreover, we were able to run our monitoring solution against the WhatsApp services from July 2013 to April 2014 without any interruption. Although we monitored the personal information of thousands of users for several months, and thus deviated strongly from normal user behaviour, our monitoring efforts were not inhibited in any way.

Although our approach enables monitoring on a permanent basis and scales better than any manual solution, the number of observed targets is limited, as some messenger apps restrict the maximum number of contacts. For instance, in the early stages of our experiments, in June 2012, we were able to track more than 4,000 WhatsApp accounts simultaneously using a single monitoring instance. This number was reduced over time to about 250, which still reflects today's limit on concurrent sessions.


References

[1] Cheng, Yao, et al. "Bind your phone number with caution: automated user profiling through address book matching on smartphone." Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security. ACM, 2013.

[2] Schrittwieser, Sebastian, et al. "Guess who’s texting you? Evaluating the security of smartphone messaging applications." Proceedings of the 19th Annual Symposium on Network and Distributed System Security. 2012.