Contact Tracing Mobile Apps for COVID-19: Privacy Considerations and Related Trade-offs
Hyunghoon Cho∗ Broad Institute of MIT and Harvard
Daphne Ippolito∗ University of Pennsylvania
Yun William Yu∗ University of Toronto
Abstract
Contact tracing is an essential tool for public health officials and local communities to fight the spread of novel diseases, such as for the COVID-19 pandemic. The Singaporean government just released a mobile phone app, TraceTogether, that is designed to assist health officials in tracking down exposures after an infected individual is identified. However, there are important privacy implications of the existence of such tracking apps. Here, we analyze some of those implications and discuss ways of ameliorating the privacy concerns without decreasing usefulness to public health. We hope in writing this document to ensure that privacy is a central feature of conversations surrounding mobile contact tracing apps and to encourage community efforts to develop alternative effective solutions with stronger privacy protection for the users. Importantly, though we discuss potential modifications, this document is not meant as a formal research paper, but instead is a response to some of the privacy characteristics of direct contact tracing apps like TraceTogether and an early-stage Request for Comments to the community.
Date written: 2020-03-24
Minor correction: 2020-03-30
1 Introduction
The COVID-19 pandemic has spread like wildfire across the globe [1]. Very few countries have managed to keep it well-controlled, but one of the key tools that several such countries use is contact tracing [2]. More specifically, whenever an individual is diagnosed with the coronavirus, every person who had possibly been near that infected individual during the period in which they were contagious is contacted and told to self-quarantine for two weeks [3]. In the early days of the virus, when there were only a few cases, contact tracing could be done manually. With hundreds to thousands of cases surfacing in some cities, contact tracing has become much more difficult [4].
∗Authors listed alphabetically.
Countries have been employing a variety of means to enable contact tracing. In Israel, legislation was passed to allow the government to track the mobile-phone data of people with suspected infection [5]. In South Korea, the government has maintained a public database of known patients, including information about their age, gender, occupation, and travel routes [6]. In Taiwan, medical institutions were given access to patients' travel histories [7], and authorities track phone location data for anyone under quarantine [8]. And on March 20, 2020, Singapore released an app that tracks via Bluetooth when two app users have been in close proximity: when a person reports they have been diagnosed with COVID-19, the app allows the Ministry of Health to determine anyone logged to be near them; a human contact tracer can then call those contacts and determine appropriate follow-up actions.
Solutions that have worked for some countries may not work well in other countries with different societal norms. We believe that in the United States, in particular, the aforementioned measures are unlikely to be widely adopted. On the legal side, publicly revealing patients’ protected health information (PHI) is a violation of the federal HIPAA Privacy Rule [9], and the Fourth Amendment bars the government from requesting phone data without cause [10]. Some of these norms may be suspended during times of crisis: HIPAA has recently been relaxed via enforcement discretion during the crisis to allow for telemedicine [11], and a public health emergency could well be argued to be a valid cause [12]. However, many Americans are wary of sharing location and/or contact data with tech companies or the government, and any privacy concerns could slow adoption of the system [13].
Singapore’s approach of an app, which gives individuals more control over the process, is perhaps the most promising solution for the United States. However, while Singapore’s TraceTogether app protects the privacy of users from each other, it has serious privacy concerns with respect to the government’s access to the data. In this document, we discuss these privacy issues in more detail and introduce approaches for building a contact tracing application with enhanced privacy guarantees, as well as strategies for encouraging rapid and widespread adoption of this system. We do not make explicit recommendations about how one should build a privacy-preserving contact tracing app, as any design implementation should first be carefully vetted by security, privacy, legal, ethics, and public health experts. However, we hope to show that there exist options for preserving several different notions of user privacy while still fully serving public health aims through contact tracing apps.
arXiv:2003.11511v2 [cs.CR] 30 Mar 2020
2 Singapore’s TraceTogether App
On March 20, 2020, the Singaporean Ministry of Health released the TraceTogether app for Android and iOS [14]. It operates by exchanging tokens between nearby phones via a Bluetooth connection. The tokens are also sent to a central server. These tokens are time-varying random strings, associated with an individual for some amount of time before they are refreshed. Should an individual be diagnosed with COVID-19, the health officials will ask* them to release their data on the app, which includes a list of all the tokens the app has received from nearby phones. Because the government keeps a database linking tokens to phone numbers and identities, it can resolve this list of tokens to the users who may have been exposed.
By using time-varying tokens, the app does keep users' identities private from each other. A user has no way of knowing who the tokens stored in their app belong to, except by linking them to the time the token was received. However, the app provides little to no privacy for infected individuals; after an infected individual is compelled to release their data, the Singaporean government can build a list of all the other people they have been in contact with. We describe these several notions of privacy in more detail in Section 3.
*While the health officials ask, it is a crime in Singapore not to assist the Ministry of Health in mapping one’s movements, so ‘ask’ is a bit of a misnomer [15].
3 Desirable Notions of Privacy
Here, we discuss three notions of privacy that are relevant to our analysis of contact-tracing systems: (1) privacy from snoopers, (2) privacy from contacts, and (3) privacy from the authorities. Note that in this document, we do not rigorously define what it means for information to be private, as this is a topic better left for future work; some popular definitions include information theoretic privacy [16], k-anonymity [17], and differential privacy [18]. Furthermore, we discuss only these three notions of privacy to illustrate some of the shortcomings of direct contact-tracing systems. Other recent work has presented a useful taxonomy of the risks and challenges of contact tracing apps [19].
For any contact tracing app that achieves the aim of telling individuals that they might have been exposed to the virus, there is clearly some amount of information that has to be revealed. Even if the only information provided is a binary yes/no to exposure, a simple linkage attack [20] can be performed: if the individual was only near to one person in the last two weeks, then there will be an obvious inference about the infection status of that person. The goal is of course to reduce the amount of information that can be inferred by each of the three parties (snoopers, contacts, and the authorities) while still achieving the public health goal of informing people of potential exposures to help slow the spread of the disease.
Of note, here we use a semi-honest model for privacy [21], where we do not consider the possibility of malicious actors polluting the database or sending malformed queries, but instead analyze only the privacy loss from the information revealed to each party. A nefarious actor could, for example, falsely claim to be infected to spread panic; this is not a privacy violation, though we do consider this further in the Discussion. Alternatively, when a server exposes a public API, queries can be crafted to reveal more information than intended by the system design, which is indeed a privacy violation. We leave a more thorough analysis of safeguards for the malicious model to future work.
3.1 Privacy from Snoopers
Consider the most naïve system for contact tracing, which no reasonable privacy-conscious society would ever use, where the app simply broadcasts the name and phone number of the phone’s owner, and nearby phones log this information. Then, upon diagnosis of COVID-19, the government publishes a public list of those infected, which the app then checks against its list of known recent contacts. This is clearly problematic as a nefarious passive actor (a ‘snooper’) could track the identities of people walking past them on the street.
A slightly more reasonable system would assign a unique user-ID to each individual, which is instead broadcast out. This does not have quite as many immediate security implications, though all it would take is a nefarious actor linking each ID to a user before one runs into the same problem, which is known as a ‘linkage attack.’ Given how easy and common linkage attacks are, this approach also provides insufficient levels of privacy for users [22; 23].
The Singaporean app TraceTogether does better, in that it instead broadcasts random time-varying tokens as temporary IDs. Because these tokens are random and change over time, someone scanning the tokens while walking down the street will not be able to track specific users across different time points, as their tokens are constantly refreshed. Note that the length of time before refreshing a token is an important parameter of the system (too infrequent and users can still be tracked, too frequent and the number of tokens that need to be stored by the server could be huge), but with a reasonable refresh rate, the users are largely protected against attacks by snoopers in public spaces.
3.2 Privacy from Contacts
Here, the term contact is defined as any individual with whom a user has exchanged tokens in the contact tracing app based on some notion of physical proximity. Privacy from contacts is harder to achieve, because the information that needs to be passed along is whether one of the individual’s contacts has been diagnosed with COVID-19, so some information has to be revealed.
The TraceTogether app gives privacy from contacts by instead putting trust in government authorities. When TraceTogether alerts a contact that they have been exposed to COVID-19, the information comes directly from the Singaporean Ministry of Health, and no additional information is shared (to our knowledge) that could identify the individual that was diagnosed. Thus, TraceTogether does protect users’ privacy from each other, except for what can be inferred based on the user’s full list of contacts, as the only information that is revealed to the user is a binary exposure indicator, which is arguably the minimum possible information release for the system to be useful.
3.3 Privacy from the Authorities
Protecting the privacy of the users from the authorities, i.e. whoever is administering the app, whether that is a government agency or a large tech company, is also a challenging task. Clearly, in the absence of a fully decentralized peer-to-peer system, any information sharing among phones with the app installed will have to be mediated by some coordinating servers. Without any protective measures (e.g. based on cryptography), the coordinating servers are given an inordinate amount of knowledge.
TraceTogether does not prioritize this type of privacy, instead making use of relatively high trust in the government in its design. While it does not deliberately gather more information than necessary to build a contact map (for example, it does not use GPS location information, as Bluetooth is sufficient for finding contacts), it also does not try to hide anything from the Singaporean government. When a user is diagnosed with COVID-19 and gives their list of tokens to the Ministry of Health, the government can retrieve the mobile numbers of all individuals that user has been in contact with. Thus, neither the diagnosed user nor the exposed contacts have any privacy from the government.
Furthermore, because the government maintains a database linking together time-varying tokens with mobile numbers, it can also, in theory, track people’s activities without GPS simply by placing Bluetooth receivers in public places. There is no reason to disbelieve the TraceTogether team when they state that they do not attempt to track people’s movements directly; however, the data they have could be employed to do so. Citizens of countries such as the U.S. trust authorities much less than Singaporeans do [24], so the privacy trade-offs that Singaporeans are willing to make may not be the same ones that Americans will accept.
4 Privacy-Enhancing Augmentations to the TraceTogether System
Here, we discuss potential approaches to build upon the TraceTogether model to obtain a contact tracing system with differing privacy characteristics for the users. Though important and highly nontrivial, various technical and engineering challenges behind the exchange of Bluetooth tokens [25] are outside the scope of this document. Our abstraction is that there exists some mechanism for nearby phones to exchange short tokens if the devices come within 6 feet of each other, the estimated radius within which viral transmission is a considerable risk [26]. We are primarily concerned with the construction of those tokens, and how those tokens can be used to perform contact tracing in a privacy-preserving manner.
Table 1: Comparison of contact tracing systems discussed in this document with respect to privacy of the users in the semi-honest model and required computational infrastructure.
System | Privacy from snoopers | Privacy from contacts (exposed user) | Privacy from contacts (diagnosed user) | Privacy from authorities (exposed user) | Privacy from authorities (diagnosed user) | Infrastructure requirements
TraceTogether [14] | Yes | Yes | Yes | No. Exposure status and all tokens revealed. | No. Infection status, all tokens, and all contact tokens revealed. | Minimal
Polling-based* (§4.1) | Yes | Yes | Yes† | Partial. Susceptible to linkage attacks. | Partial. Susceptible to linkage attacks. | Low. Single server.
Polling-based with mixing (§4.3) | Yes | Yes | Yes† | Almost private. Protects against linkage attacks by mixing tokens from different users. | Almost private. Protects against linkage attacks by mixing tokens from different users. | Medium. Multiple servers for mixing.
Public database (§4.4) | Yes | Yes | Partial. Info leaked at time of token exchange. | Yes | Partial. Susceptible to linkage attacks. | Communication cost to phones is high.
Private messaging system (§5) | Yes | Yes | Partial. Info leaked at time of token exchange.‡ | Yes | Yes | High. Multiple servers performing crypto.
* Augmenting with random tokens does not improve privacy.
† However, if contacts are malicious, and they send malformed queries (e.g. a query that includes only a single token), the diagnosed individual only has the same privacy level as in the public database solution. Namely, there’s only partial privacy because information is leaked through knowing the time of token exchange.
‡ This information leakage might be fixable using data aggregation based on multi-key homomorphic encryption, but we do not do so here.
First, we formally describe the TraceTogether system. Let Alice and Bob be users of the app, and let Grace be the government server (or other central authority). Alice generates a series of random tokens $A = \{a_0, a_1, \ldots\}$, one for each time interval, and Bob generates a similar series of tokens $B = \{b_0, b_1, \ldots\}$, all drawn randomly from some space $\{0, 1\}^N$. They also both report their lists of tokens $A$ and $B$, as well as their phone numbers, to Grace. At a time $t$, Alice and Bob encounter each other, exchanging $a_t$ and $b_t$. Alice and Bob keep lists of contact tokens $\hat{A} = \{\hat{a}_0, \hat{a}_1, \ldots\}$ and $\hat{B} = \{\hat{b}_0, \hat{b}_1, \ldots\}$ respectively. These consist of tokens from every person they were exposed to; i.e. $b_t \in \hat{A}$ and $a_t \in \hat{B}$ because Alice and Bob exchanged tokens at time $t$. Five days later, Bob is diagnosed with COVID-19, and sends his list of contact tokens $\hat{B}$, which includes $a_t$, to Grace. Grace then matches each $\hat{b}_i$ to a phone number, reaches out to those individuals, including Alice, and advises them to quarantine themselves because they may have been exposed to the virus.
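The flow just described can be sketched in a few lines of Python. This is only an illustration of the abstract model, not TraceTogether's actual code: the class names `Grace` and `User`, the helper `encounter`, the 16-byte token size, and the placeholder phone strings are all assumptions made for the example.

```python
import secrets

TOKEN_BYTES = 16  # assumed token size (N = 128 bits), not TraceTogether's real parameter


class Grace:
    """Central server that links every token to a phone number."""

    def __init__(self):
        self.owner_of = {}  # token -> phone number

    def register(self, phone, tokens):
        for token in tokens:
            self.owner_of[token] = phone

    def resolve_contacts(self, contact_tokens):
        """Map a diagnosed user's contact tokens to the phone numbers to call."""
        return {self.owner_of[t] for t in contact_tokens if t in self.owner_of}


class User:
    def __init__(self, phone, intervals):
        self.phone = phone
        # One random token per time interval: A = {a_0, a_1, ...}
        self.tokens = [secrets.token_bytes(TOKEN_BYTES) for _ in range(intervals)]
        self.contact_tokens = []  # tokens received from nearby phones


def encounter(u, v, t):
    """At time t, the two phones exchange their current tokens."""
    u.contact_tokens.append(v.tokens[t])
    v.contact_tokens.append(u.tokens[t])


grace = Grace()
alice = User("alice-phone", 10)
bob = User("bob-phone", 10)
carol = User("carol-phone", 10)
for user in (alice, bob, carol):
    grace.register(user.phone, user.tokens)

encounter(alice, bob, t=3)

# Bob is diagnosed and uploads his contact tokens; Grace learns whom to call.
to_notify = grace.resolve_contacts(bob.contact_tokens)
```

The `owner_of` table is the source of the privacy concern: whoever holds it can turn any observed token back into a phone number.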
4.1 Partially Anonymizing via Polling
Instead of having Grace reach out to Alice when Bob reports that he has been diagnosed, a more privacy-conscious alternative is for Alice to “poll” Grace on a regular basis. In this setting, Grace maintains the full database, and Alice asks Grace if she has been exposed. This alternative does not require Alice and Bob to send their phone numbers to Grace. In this setting, there are two reporting choices for when Bob wishes to declare his diagnosis of COVID-19. Bob can send his own tokens $B$ to Grace, or he can send the contact tokens $\hat{B}$ to Grace. In the former case, Alice needs to send Grace her contact tokens $\hat{A}$ to see if any have been diagnosed with COVID-19. In the latter case, Alice needs to send Grace her own tokens $A$ to ask if any of them have been published. Either way, Grace is able to inform Alice that she has been exposed, without revealing Bob’s identity. This presupposes that Alice is honest but curious (semi-honest); if Alice is malicious and crafts a malformed query containing only the token she exchanged with Bob, she may be able to reveal Bob’s identity.
Note that in either version of this system, individuals still have privacy from snoopers and from contacts. However, they additionally gain some amount of privacy from the authorities, as Grace does not have their mobile numbers. Of course, Grace does have some ability to perform linkage attacks. If Bob publishes to Grace his own tokens $B$ upon being diagnosed, and Alice queries Grace with all her contact tokens $\hat{A}$, then Grace can attempt to link those sets of tokens to individuals or geographic areas; further, Grace can also monitor the source of Alice and Bob’s queries (i.e. IP addresses of phones). For example, if Grace has Bluetooth sensors set up in public places, she can then trace Alice and Bob’s geographic movements. That kind of location trace is often sufficient to deanonymize personal identities [23]. Alternatively, the same is true if Bob publishes his contact tokens to Grace and Alice queries Grace with her own tokens. Thus, privacy from the authorities is not perfect, but it is still better than in the original TraceTogether system, at the cost of potentially lower privacy for Bob in the malicious model.
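A minimal sketch of the polling variant in which Bob reports his own tokens and Alice queries with her contact tokens is given below. The class name `PollingServer`, the extra user Dave, and the token sizes are hypothetical. The last two queries illustrate the malformed single-token query from the malicious model.

```python
import secrets


class PollingServer:
    """Grace in the polling variant: stores only tokens, no phone numbers."""

    def __init__(self):
        self.diagnosed_tokens = set()

    def report_diagnosis(self, own_tokens):
        # Variant in which a diagnosed user uploads their own tokens.
        self.diagnosed_tokens.update(own_tokens)

    def poll(self, contact_tokens):
        # Returns a single bit: did any queried token belong to a diagnosed user?
        return any(t in self.diagnosed_tokens for t in contact_tokens)


bob_tokens = [secrets.token_bytes(16) for _ in range(5)]
dave_tokens = [secrets.token_bytes(16) for _ in range(5)]

# Alice met Bob during interval 2 and Dave during interval 4.
alice_contacts = [bob_tokens[2], dave_tokens[4]]

grace = PollingServer()
grace.report_diagnosis(bob_tokens)

# Honest query: one bit for the whole contact list.
exposed = grace.poll(alice_contacts)

# Malformed queries from a malicious Alice: one token each isolates one contact.
bob_is_infected = grace.poll([bob_tokens[2]])
dave_is_infected = grace.poll([dave_tokens[4]])
```

Note that Grace still sees every token Alice submits, which is what enables the linkage attacks discussed above.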
4.2 Ineffectiveness of Adding Spurious Tokens for Further Anonymization
To further anonymize the polling-based system to increase privacy from authorities, there are a number of techniques that can be used to hide Alice and Bob’s identities. Let’s begin with a simple approach (that doesn’t actually work) to give some intuition before moving on to more effective approaches. Consider injecting random noise by augmenting the data with artificial tokens. Whenever Alice and Bob send information to Grace (either in the form of a diagnosis report or a query), they can augment their tokens with random ones. Note that some care has to be taken in deciding which distribution to draw the random tokens from. Not only should the system keep the probability of spurious matches low, but the distributions should also be designed to make inferences by Grace difficult.
For example, assume that Alice and Bob sample their tokens uniformly at random from $\{0, 1\}^N$, where $N$ is chosen to be sufficiently large that accidental collisions between individuals’ tokens are unlikely. Suppose Bob sends to Grace his own tokens $B$ upon being diagnosed, and Alice queries Grace with all her contact tokens $\hat{A}$. In theory, Bob could augment his own tokens with a set of $n$ random tokens $\{r_i\}_{i=1}^{n}$ drawn uniformly from $\{0, 1\}^N$, and send those to Grace as well. Unfortunately, $N$ was chosen to prevent accidental collisions; this means that the probability that the additional random tokens correspond to the tokens broadcast by any individual is vanishingly small. But then, there is actually little to no privacy gained. Grace can just assume that the augmented set of tokens corresponds to Bob, and perform the same linkage analysis that she would with only the correct set of tokens. This does nothing but pollute Grace’s database with extra data, without affording any real privacy gains for Bob. Similarly, Alice also cannot obfuscate her exposure through Bob from Grace, because any extra tokens she sends to Grace will not change the fact that she has Bob’s token as one of her contacts.
The root of the problem is that Grace has access to the universe of all tokens through user queries, and so can simply filter out all of the random tokens generated. Thus, random noise is ineffective for hiding information from Grace.
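To make "vanishingly small" concrete, a union bound gives the chance that any of Bob's $n$ decoy tokens coincides with a token actually in circulation. The symbol $T$ (the number of distinct real tokens) and the numeric values are assumptions chosen only for illustration.

```latex
\Pr\bigl[\exists\, i : r_i \in \text{real tokens}\bigr]
  \;\le\; \sum_{i=1}^{n} \Pr\bigl[r_i \in \text{real tokens}\bigr]
  \;=\; \frac{nT}{2^{N}}.
% Example with assumed values N = 128, n = 10^{3}, T = 10^{12}:
%   nT / 2^{N} = 10^{15} / 2^{128} \approx 2.9 \times 10^{-24}.
```

With numbers of this order, essentially every decoy is a token Grace has never seen in any query or report, so she can discard it.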
4.3 Enhancing Anonymity by Mixing Different Users’ Tokens
Although introducing spurious random tokens into the system achieves little in terms of privacy, as discussed in the previous subsection, a slight modification of this idea leads to meaningful privacy guarantees. The issue is that Grace has access to the entire universe of tokens, as well as both of the sets of tokens corresponding to Alice and Bob, possibly augmented with random noise. Instead of hiding true tokens with random noise, suppose the system includes a set of $M$ honest-but-curious non-colluding “mixing” servers not controlled by Grace that aggregate data before forwarding it on to Grace.
When Bob is diagnosed with COVID-19, he partitions the tokens he wishes to send (depending on the setup of the system, either his own tokens, or those of his contacts) into $M$ groups, and sends each group to one of the mixing servers. The mixing servers then combine Bob’s data with that of other users diagnosed with COVID-19 before forwarding it on to Grace. Similarly, Alice does the same thing for querying, except she also needs to wait on a response from the mixing server for each of the tokens she sends. The linkage problem then becomes much more difficult for Grace, because the valid tokens for individuals have been split up. Similarly, each mixing server only has access to a subset of the tokens corresponding to each individual, making the linkage analysis more difficult for them. Of course, if the mixing servers collude, then the privacy reduces to that of the standard polling-based approach.
Note that this approach can also be simulated without the mixing servers by either Alice or Bob if they have access to a large number of distinct IP addresses. They can simply send their queries and tokens with some time delay from the different IP addresses, preventing Grace from linking all of them together. However, this approach may not be feasible for most users.
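The reporting side of the mixing scheme in this subsection can be sketched as follows. The names `partition` and `MixServer`, the choice of three mixers, and the second diagnosed user Erin are assumptions for the example; a real deployment would also need authenticated, encrypted channels, which are omitted here.

```python
import random
import secrets

_rng = random.SystemRandom()  # OS-backed randomness for shuffling


def partition(tokens, m):
    """Split one user's tokens into m groups, one per mixing server."""
    shuffled = list(tokens)
    _rng.shuffle(shuffled)
    return [shuffled[i::m] for i in range(m)]


class MixServer:
    def __init__(self):
        self.pool = []

    def receive(self, tokens):
        self.pool.extend(tokens)

    def flush(self):
        """Shuffle the pooled tokens so order reveals no sender, then forward."""
        batch, self.pool = self.pool, []
        _rng.shuffle(batch)
        return batch


M = 3
mixers = [MixServer() for _ in range(M)]
bob = [secrets.token_bytes(16) for _ in range(9)]
erin = [secrets.token_bytes(16) for _ in range(9)]

# Each diagnosed user sends one group of tokens to each mixing server.
for user_tokens in (bob, erin):
    for mixer, group in zip(mixers, partition(user_tokens, M)):
        mixer.receive(group)

# Grace receives shuffled batches that interleave tokens from several users.
batches = [mixer.flush() for mixer in mixers]
grace_db = {token for batch in batches for token in batch}
```

Each mixer sees only a third of Bob's tokens, and Grace sees all of them but without knowing which ones belong together.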
4.4 Public Database of Infected Users’ Tokens is Efficient but Less Private
Alternatively, Grace can simply publish the entire database of tokens she receives from infected individuals, including the ones from Bob. If Alice simply downloads the entire database, and locally queries against it, then no information about Alice’s identity is leaked to Grace.
This approach may seem less computationally feasible, especially on mobile devices. In circumstances where the total number of people infected is not very high, this approach works, as evidenced by the South Korean model [6], though the approach may fail as the epidemic reaches a peak. However, the computational and transmission cost can be partially ameliorated by splitting Grace’s database into batches, so that Alice does not have to download the entire thing. For example, in the version where Bob sends his own tokens $B$ to Grace, Alice can download batches corresponding to her contact tokens $\hat{A}$. If each batch has e.g. 50 tokens, then Grace does not know which of those 50 tokens Alice came into contact with.
Unfortunately, it is worth noting that this approach decreases Bob’s privacy from Alice, because Alice knows when she encountered the token Bob sent; she can then limit the number of possible individuals who could have sent the token based on who she was in contact with during the time she encountered Bob’s token. If the token she exchanged with Bob is present in the database, she gets a hint as to the disease status of one of the individuals she was in contact with during the token exchange.
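One way the batching idea from this subsection could be realized is to group published tokens by a short prefix, so that Alice requests only the batches whose prefix matches one of her contact tokens and does the matching locally. The text does not specify how batches are formed; the prefix rule, the names `PublicDatabase`, `bucket_of` and `check_exposure`, and the one-byte prefix are assumptions.

```python
import secrets
from collections import defaultdict

PREFIX_BYTES = 1  # assumed batch key: first byte of the token, giving 256 batches


def bucket_of(token):
    return token[:PREFIX_BYTES]


class PublicDatabase:
    def __init__(self):
        self.batches = defaultdict(set)

    def publish(self, diagnosed_tokens):
        for token in diagnosed_tokens:
            self.batches[bucket_of(token)].add(token)

    def download(self, bucket):
        # Grace sees only which batch was requested, not which token inside it.
        return set(self.batches.get(bucket, ()))


def check_exposure(db, contact_tokens):
    """Download the batches matching the contact tokens and match locally."""
    downloaded = set()
    for bucket in {bucket_of(t) for t in contact_tokens}:
        downloaded |= db.download(bucket)
    return any(t in downloaded for t in contact_tokens)


db = PublicDatabase()
bob_tokens = [secrets.token_bytes(16) for _ in range(20)]
others = [secrets.token_bytes(16) for _ in range(5000)]
db.publish(bob_tokens + others)

alice_contacts = [bob_tokens[7], secrets.token_bytes(16)]
stranger_contacts = [secrets.token_bytes(16) for _ in range(3)]

alice_exposed = check_exposure(db, alice_contacts)
stranger_exposed = check_exposure(db, stranger_contacts)
```

The prefix length trades bandwidth against privacy: a longer prefix means smaller downloads but tells Grace more about which tokens Alice holds.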
5 Privacy from Authorities based on Private Messaging Systems
None of the easy-to-implement augmentation ideas given in Section 4 guarantee full privacy from the authorities. At a cost of more computation, however, we believe that a solution for secure contact tracing can be built using modern cryptographic protocols. In particular, private messaging systems [27; 28; 29] and private set intersection (cardinality) [30; 31; 32; 33] protocols seem especially relevant. The sketch we provide below is based on private messaging systems, though we do not claim this to be an optimal implementation.
We will give the intuition here before going into technical details necessary for an effective implementation. First, we replace the random tokens $(a_t, b_t)$ exchanged by Alice and Bob with random public keys $(pk^A_t, pk^B_t)$ from asymmetric encryption schemes [34]. The matching secret keys are stored locally on each of Alice’s and Bob’s phones. Then, imagine that Grace has established a collection of mailboxes, one for each public key that Alice and Bob exchange. Additionally, we introduce Frank and Fred. Frank forwards messages to/from Fred. Fred forwards messages to/from Grace. They do not tell each other the source of the messages.
At fixed time points after Bob’s contact with Alice (up to some number of days), Bob addresses a message to Alice encrypted using the public key Alice gave Bob. Bob gives the message to Frank, who then forwards it on to Grace (through Fred), who puts it in Alice’s mailbox. The content of the message is Bob’s current infection status, and the reason he sends messages at fixed time points is to prevent Frank from figuring out Bob’s infection status from the fact that he is sending messages. Alice checks all of the mailboxes corresponding to her last several days’ worth of broadcast public keys. In one of the mailboxes, she then receives and decrypts Bob’s message, and learns whether she has been exposed to the virus. Grace cannot decrypt the message Bob sends to Alice because it is protected by asymmetric encryption. Furthermore, to protect Alice’s privacy, she can also access her mailboxes through Frank and Fred, who deliver the messages in Alice’s mailboxes to her without revealing which mailboxes she owns.
[Figure 1 diagram. Panel “At Contact”: Alice and Bob exchange public keys over Bluetooth. Panel “Periodically After Contact”: Bob sends his encrypted infection status (“I am (not) infected.”) to Alice’s mailbox; the proxy servers (Frank and Fred) obfuscate mailbox access patterns; Grace maintains mailboxes, but cannot tell Bob sent a message to Alice; Alice retrieves and decrypts messages in her mailbox.]
Figure 1: Overview of contact tracing based on private messaging systems. When Alice and Bob are near each other they exchange public keys as tokens. They then periodically encrypt (using each other’s public key, followed by the public keys of the proxy servers) a message indicating their infection status, and send it to the proxy server. They also periodically query the proxy server for messages posted to the mailboxes corresponding to their public keys to find out whether they have been exposed to the virus.
Contact tracing can be viewed as a problem of secure communication between pairs of users who came into contact in the physical world. The communication patterns of who is sending messages to whom can reveal each individual’s contact history to the service provider (Grace). This notion is known as metadata privacy leakage in computer security [35], where the metadata associated with a message (e.g. sender/recipient and time) is considered sensitive, in addition to the actual message contents. In the contact tracing case, such metadata could reveal who has been in contact with whom, potentially revealing the users’ sensitive activities. We believe that recent technical advances [36; 27; 29] for designing scalable private messaging systems with metadata privacy present a promising path for developing a similar platform for secure contact tracing.
Following recent works, our idea is to leverage a ‘mix network’ [37], which is a routing protocol that uses a chain of proxy servers (Frank/Fred) that individually shuffle the incoming messages before passing them on to the next server, thereby decoupling the sender of each message from its destination. These types of mix networks are perhaps most well-known for being the basis of the Onion Router/Tor anonymity network [38]. This is a more sophisticated use of mixing servers than described in Section 4.3 for the polling-based solution. When Bob wishes to send his encrypted message to Alice, he first encrypts it multiple times with public keys corresponding to each of the servers in the mix network. Because the messages are encrypted in multiple layers, and each server peels only the outermost layer, the final destination (Alice’s mailbox) is revealed only to the last server, and only Alice can read the content of the message (i.e. infection status). To prevent Grace from learning the identity associated with each mailbox, Alice can also access her mailboxes through the mix network, which shuffles the traffic to decouple the mailboxes from their owners. As long as one of the servers is neither breached nor controlled by the adversary, the final message cannot be linked to a specific sender even if the adversary has full control of the rest of the network. Such a system for private communication could allow the users (Bob) to share their infection status with their recent contacts (Alice) while hiding the metadata of their contact patterns from the service providers. The involvement of non-government entities, such as an academic institution or a hospital, in the mix network may help increase users’ trust in the system and lower the bar for adoption.
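The layering and peeling order can be illustrated with a toy example. To keep the sketch runnable with the standard library alone, it substitutes a symmetric XOR keystream (`toy_cipher`) for the public-key encryption described above; that cipher is insecure and is not what a real system would use. The helper names `add_layer` and `peel_layer` and the mailbox label `mailbox-42` are hypothetical, and batching and shuffling at each proxy are omitted.

```python
import hashlib
import secrets


def toy_cipher(key, data):
    """XOR data with a SHA-256-derived keystream (the same call decrypts).

    NOT secure and NOT public-key: a symmetric stand-in for illustration only.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


def add_layer(key, next_hop, payload):
    """Encrypt (next hop, payload) so only the holder of key can route it."""
    hop = next_hop.encode()
    return toy_cipher(key, bytes([len(hop)]) + hop + payload)


def peel_layer(key, blob):
    """Remove one layer, learning only the next hop and an opaque payload."""
    plain = toy_cipher(key, blob)
    hop_len = plain[0]
    return plain[1:1 + hop_len].decode(), plain[1 + hop_len:]


frank_key, fred_key, alice_key = (secrets.token_bytes(32) for _ in range(3))

# Bob builds the message from the inside out.
status = b"I am not infected."
body = toy_cipher(alice_key, status)                 # readable by Alice only
for_fred = add_layer(fred_key, "mailbox-42", body)   # last proxy sees the mailbox
for_frank = add_layer(frank_key, "fred", for_fred)   # first proxy sees only "fred"

# Each proxy peels exactly one layer and forwards the rest.
hop_after_frank, forwarded = peel_layer(frank_key, for_frank)
hop_after_fred, stored = peel_layer(fred_key, forwarded)

# Alice later fetches the mailbox contents and decrypts them.
delivered = toy_cipher(alice_key, stored)
```

Frank learns that Bob sent something but not where it ends up, and Fred learns the mailbox but not the sender, which is the decoupling the mix network relies on.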
There are several remaining issues that will need to be addressed for this system to be widely adopted. First, if time-varying IDs are used, then the user receiving a token from a nearby person could infer the identity of the sender based on their travel history; i.e. Alice might be able to infer who Bob is based on the time they exchanged the tokens, as described in Section 4.4 in the case where the database is made public. This loss of privacy from contacts can be partially alleviated by choosing a less frequent token refresh, so that with high likelihood, Alice cannot completely identify Bob by the time interval. Actual implementations must decide on the right trade-offs between Alice and Bob’s privacy from each other and from the authorities, as well as contact tracing effectiveness. Another possible way to mitigate this problem would be to aggregate the messages for Alice on the server before making the results available to her. The messages are encrypted under different public keys, but it may be possible to use multi-key homomorphic encryption schemes [39; 40], which allow computation over ciphertexts encrypted with different public keys, to sum up the count of ‘infected’ messages. We defer the details of this approach to future work.
One other issue is that the volume of messages delivered to each user may reveal how socially active each user has been, which could be considered sensitive by some users. Flattening the distribution with dummy messages could alleviate this concern, but may lead to scalability challenges for existing private messaging systems. Though many techniques [36; 27; 29] have been proposed to address this challenge, further discussion among the stakeholders is needed to determine the suitable trade-off between the level of latency that can be tolerated and the level of privacy guarantees desired by the users. Ultimately, though, private messaging systems enable provable privacy from the authorities while still maintaining the usefulness of contact tracing.
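The following sketch shows one simple way the flattening could work, and why it costs bandwidth: every mailbox is padded to the same size with random filler. It assumes that real messages are fixed-length ciphertexts indistinguishable from random bytes, and that the recipient discards fillers because they fail to decrypt; `pad_mailboxes` and the sample mailboxes are hypothetical.

```python
import secrets


def pad_mailboxes(mailboxes: dict[str, list[bytes]], target: int,
                  size: int = 32) -> dict[str, list[bytes]]:
    """Pad every mailbox to exactly `target` messages using random filler."""
    rng = secrets.SystemRandom()
    padded = {}
    for owner, messages in mailboxes.items():
        if len(messages) > target:
            raise ValueError(f"mailbox {owner!r} already exceeds target {target}")
        filler = [secrets.token_bytes(size) for _ in range(target - len(messages))]
        combined = list(messages) + filler
        rng.shuffle(combined)  # avoid leaking the count through message order
        padded[owner] = combined
    return padded


# Stand-ins for encrypted messages: 5 for a busy user, 1 and 0 for others.
real = {
    "alice": [secrets.token_bytes(32) for _ in range(5)],
    "carol": [secrets.token_bytes(32)],
    "dave": [],
}
padded = pad_mailboxes(real, target=8)

real_total = sum(len(m) for m in real.values())
padded_total = sum(len(m) for m in padded.values())
print(real_total, padded_total)
```

Here six real messages turn into twenty-four delivered ones, which hints at the scalability cost: the traffic grows with the number of users times the padding target rather than with the number of actual contacts.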
6 Strategies for Encouraging Widespread Adoption
Contact tracing apps depend on the network effect and critical mass to work. Having the app go ‘viral’ requires that people trust the app enough to install it and are enthusiastic enough to convince their friends to do the same. After all, app adoption must have a higher ‘transmission rate’ than the virus itself in order for it to be effective. Providing strong privacy guarantees would likely encourage voluntary adoption. Any app needs to clearly explain privacy guarantees in ways understandable by the average user, which was our motivation in describing here the different types of privacy (from snoopers, contacts, and the authorities) that the app should be able to provide to users in order to earn their trust.
On that note, we believe it is imperative for any app to be open source and audited by both security professionals and privacy advocates. This is not yet true for TraceTogether, but the app’s creators do claim that they will release the source code soon [41]. Furthermore, open sourcing allows different countries to customize such apps for their particular use cases and cultural preferences.
Also, while in some countries it may be difficult to enforce a government mandate that all residents install an app, it is possible to have this as a requirement for entering certain public places. Such a practice has precedent in so-called implied consent laws, such as agreeing to field sobriety tests when getting a driver’s license [42]. One could imagine grocery stores, schools, and universities requiring installation of a contact tracing app as a precondition for entrance. This does not stop users from uninstalling or turning off the app off-premises, but it would at least be useful in getting people over the initial activation barrier of installation.
Finally, some amount of social pressure may also assist in reaching widespread adoption. Contact tracing apps, by design, know how many other people close by have the app installed. An app could display that number. Given this knowledge, a user may be incentivized to attempt to persuade others nearby to install the app, in the interest of public health.
7 Discussion
In this document, we discuss ways to build an app for contact tracing, based upon the premise that phones can broadcast tokens to all nearby phones. Notably, we do not address the engineering behind applying Bluetooth to enable such a feature. Nor do we address the possibility of location data collection for assisting epidemiologists in forecasting disease spread [43]. We also do not discuss appropriate selection of token refresh interval and frequency at which phones should poll for nearby ones, which are important factors for balancing privacy and efficiency; stale IDs have been seen to permit linkage attacks in other similar contexts [44]. Lastly, we also do not build a full model for privacy of contact tracing, which is a delicate and easy-to-get-wrong task that requires much more careful research. Instead, we focus only on the privacy implications of a dedicated contact tracing app, in the hopes that providing sufficiently strong privacy guarantees would assist an app in gaining the critical mass needed to be effective.
Note that here we only discuss direct contact tracing using Bluetooth proximity networks, without using any location data. Some indirect proposals for contact tracing instead securely log the user’s location history, which is then given to the authorities if a user is diagnosed with COVID-19 [45]. This approach has the benefit of not requiring network effects, because single individuals can track their locations without needing their contacts to have the app. The approach of logging location history is inherently less private than direct contact tracing, but that may possibly be resolved with appropriate safeguards and redactions [45]. Furthermore, hybrid approaches involving both GPS data and Bluetooth proximity networks may prove to be useful to public health officials in modelling disease spread beyond just contact tracing [46].
We first discussed how, with just minor modifications, a polling-based direct contact tracing solution allows for some anonymity from authorities, which is lacking in the Singaporean Ministry of Health app TraceTogether. We believe that this may help an app succeed in countries such as the U.S., where many citizens are loath to give too much data to the government.
Even the polling-based solution still reveals quite a bit of information to the authorities, who could make use of linkage analysis to track individual users. However, utilizing additional mixing servers is relatively practical and does provide additional protection. Alternately, a system can follow the South Korean model of openly publishing data about patients diagnosed with COVID-19, trading off some of their privacy to enhance the privacy of individuals who are trying to determine if they have been exposed.
However, if we are willing to invest in additional computational resources, it is possible to achieve increased privacy from snoopers, contacts, and the authorities, and we propose the beginnings of one approach using private messaging systems, which we hope will be further expanded upon in future work. This is more computationally expensive, but would assure users that they do not have to give up their privacy in order to take part in public contact tracing efforts. Indeed, the chief selling point would be that they would get additional information on their exposure without needing to trust any individual third party with their private location or medical information. We believe that such a guarantee would go a long way towards mass adoption of a contact tracing app in the United States.
Future work remains to actually build such an app, of course, and additional engineering, security, and policy considerations are sure to arise. For example, scalability of the data structures used in the servers may become a major issue when the number of infected individuals rises. One additional concern which we have not addressed is that of nefarious actors seeking to spread panic by falsely claiming to be infected. This could be prevented by allowing only hospital workers to trigger the broadcast of infection status, as in Singapore’s system, where the Ministry of Health directly contacts those exposed, though that of course trades away some of the privacy of diagnosed patients. Alternately, others have proposed cryptographic verification of contact events, which could perhaps be extended to infection event broadcast without giving the authorities direct access to tokens [47]. However, given that some cities are already rationing testing kits and doctors’ visits to only the most serious cases [48; 49], restricting self-reporting might result in many instances of virus spread being missed. Alternately, the system can also be designed to separate self-reports from confirmed reports by simply keeping two databases.
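A minimal sketch of the two-database idea follows, assuming tokens are opaque byte strings and ignoring the privacy-preserving lookup mechanisms discussed earlier (here the check is a plain set intersection). The class `ReportStore` and its methods are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReportStore:
    """Keeps hospital-confirmed and self-reported tokens in separate sets."""
    confirmed: set[bytes] = field(default_factory=set)
    self_reported: set[bytes] = field(default_factory=set)

    def add_confirmed(self, tokens) -> None:
        self.confirmed.update(tokens)

    def add_self_report(self, tokens) -> None:
        self.self_reported.update(tokens)

    def check(self, seen_tokens) -> dict[str, int]:
        """Count matches per database; confirmed reports take precedence."""
        seen = set(seen_tokens)
        unconfirmed = self.self_reported - self.confirmed
        return {
            "confirmed": len(seen & self.confirmed),
            "self_reported": len(seen & unconfirmed),
        }


store = ReportStore()
store.add_confirmed([b"t1", b"t2"])
store.add_self_report([b"t2", b"t3", b"t4"])

# Tokens this user's phone recorded from nearby phones.
result = store.check([b"t2", b"t3", b"t9"])
print(result)
```

Keeping the two sets apart lets an app weight a confirmed exposure differently from an unverified one, so that false self-reports can at most trigger the weaker kind of alert.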
Our goal in writing this document is to start a conversation on (1) what kinds of privacy trade-offs people are willing to endure for the sake of public health, and (2) the fact that with sufficient computational resources and use of cryptographic protocols, app-based contact tracing can be accomplished without completely sacrificing privacy. Because bad early design choices can persist long after rollout, we hope that developers and policy-makers will give privacy considerations careful thought when designing new contact tracing apps.
Acknowledgment
We would like to thank David Rolnick, Adam Sealfon, Noah Daniels, and Michael Wirth for helpful comments.
References
[1] “Novel Coronavirus Map from HealthMap,” March 2020. [Online]. Available: https://www.healthmap.org/covid-19/
[2] K. T. Eames and M. J. Keeling, “Contact tracing and disease control,” Proceedings of the Royal Society of London. Series B: Biological Sciences, vol. 270, no. 1533, pp. 2565–2571, 2003.
[3] D. Normile, “Coronavirus cases have dropped sharply in South Korea. What’s the secret to its success?” https://www.sciencemag.org/news/2020/03/coronavirus-cases-have-dropped-sharply-south-korea-whats-secret-its-success, 2020, accessed: 2020-03-23.
[4] B. Chappell, “Coronavirus: Sacramento County Gives Up On Automatic 14-Day Quarantines,” https://www.npr.org/sections/health-shots/2020/03/10/813990993/coronavirus-sacramento-county-gives-up-on-automatic-14-day-quarantines, 2020, accessed: 2020-03-23.
[5] J. Tidy, “Coronavirus: Israel enables emergency spy powers,” BBC News, March 2020. [Online]. Available: https://www.bbc.com/news/technology-51930681
[6] M. J. Kim and S. Denyer, “A travel log of the times in South Korea: Mapping the movements of coronavirus carriers,” The Washington Post, March 2020. [Online]. Available: https://www.washingtonpost.com/world/asia_pacific/coronavirus-south-korea-tracking-apps/2020/03/13/2bed568e-5fac-11ea-ac50-18701e14e06d_story.html
[7] C. J. Wang, C. Y. Ng, and R. H. Brook, “Response to COVID-19 in Taiwan: Big Data Analytics, New Technology, and Proactive Testing,” JAMA, 2020.
[8] Y. Lee, “Taiwan’s new ‘electronic fence’ for quarantines leads wave of virus monitoring,” March 2020. [Online]. Available: https://www.reuters.com/article/us-health-coronavirus-taiwan-surveillanc-idUSKBN2170SK
[9] “HIPAA Privacy Rule,” December 2000. [Online]. Available: https://www.hhs.gov/hipaa/for-professionals/privacy/index.html
[10] C. J. Roberts, “Carpenter v. United States,” Supreme Court of the United States, no. 16-402, 2018. [Online]. Available: https://www.supremecourt.gov/opinions/17pdf/16-402_h315.pdf
[11] “Notification of Enforcement Discretion for telehealth remote communications during the COVID-19 nationwide public health emergency,” March 2020. [Online]. Available: https://www.hhs.gov/hipaa/for-professionals/special-topics/emergency-preparedness/notification-enforcement-discretion-telehealth/index.html
[12] A. J. Jacobs, “Is state power to protect health compatible with substantive due process rights,” Annals Health L., vol. 20, p. 113, 2011.
[13] R. Pérez-Peña, “Virus Hits Europe Harder Than China. Is That the Price of an Open Society?” New York Times, March 2020. [Online]. Available: https://www.nytimes.com/2020/03/19/world/europe/europe-china-coronavirus.html
[14] “Help speed up contact tracing with TraceTogether,” Singapore Government Blog, March 2020. [Online]. Available: https://www.gov.sg/article/help-speed-up-contact-tracing-with-tracetogether
[15] T. TraceTogether, “Can I say no to uploading my TraceTogether data when contacted by the Ministry of Health?” https://tracetogether.zendesk.com/hc/en-sg/articles/360044860414-Can-I-say-no-to-uploading-my-TraceTogether-data-when-contacted-by-the-Ministry-of-Health-, 2020, accessed: 2020-03-23.
[16] C. E. Shannon, “Communication theory of secrecy systems,” Bell system technical journal, vol. 28, no. 4, pp. 656–715, 1949.
[17] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557–570, 2002.
[18] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of cryptography conference. Springer, 2006, pp. 265–284.
[19] R. Raskar, I. Schunemann, R. Barbar, K. Vilcans, J. Gray, P. Vepakomma, S. Kapa, A. Nuzzo, R. Gupta, A. Berke et al., “Apps gone rogue: Maintaining personal privacy in an epidemic,” arXiv preprint arXiv:2003.08567, 2020.
[20] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
[21] O. Goldreich, S. Micali, and A. Wigderson, “How to solve any protocol problem,” in Proc. of STOC, 1987.
[22] M. M. Merener, “Theoretical results on de-anonymization via linkage attacks,” Transactions on Data Privacy, vol. 5, no. 2, pp. 377–402, 2012.
[23] M. Srivatsa and M. Hicks, “Deanonymizing mobility traces: Using social network as a side-channel,” in Proceedings of the 2012 ACM conference on Computer and communications security, 2012, pp. 628–637.
[24] E. T. Barometer, “January 20, 2019,” 2019. [Online]. Available: https://www.edelman.com/sites/g/files/aatuss191/files/2019-02/2019_Edelman_Trust_Barometer_Global_Report_2.pdf
[25] T. TraceTogether, “How does TraceTogether work?” https://tracetogether.zendesk.com/hc/en-sg/articles/360043543473-How-does-TraceTogether-work-, 2020, accessed: 2020-03-23.
[26] “How COVID-19 spreads,” Centers for Disease Control and Prevention, March 2020. [Online]. Available: https://www.cdc.gov/coronavirus/2019-ncov/prepare/transmission.html
[27] J. Van Den Hooff, D. Lazar, M. Zaharia, and N. Zeldovich, “Vuvuzela: Scalable private messaging resistant to traffic analysis,” in Proceedings of the 25th Symposium on Operating Systems Principles, 2015, pp. 137–152.
[28] N. Tyagi, Y. Gilad, D. Leung, M. Zaharia, and N. Zeldovich, “Stadium: A distributed metadata-private messaging system,” in Proceedings of the 26th Symposium on Operating Systems Principles, 2017, pp. 423–440.
[29] H. Corrigan-Gibbs, D. Boneh, and D. Mazières, “Riposte: An anonymous messaging system handling millions of users,” in 2015 IEEE Symposium on Security and Privacy. IEEE, 2015, pp. 321–338.
[30] M. J. Freedman, K. Nissim, and B. Pinkas, “Efficient private matching and set intersection,” in International conference on the theory and applications of cryptographic techniques. Springer, 2004, pp. 1–19.
[31] L. Kissner and D. Song, “Privacy-preserving set operations,” in Annual International Cryptology Conference. Springer, 2005, pp. 241–257.
[32] E. De Cristofaro and G. Tsudik, “Practical private set intersection protocols with linear complexity,” in International Conference on Financial Cryptography and Data Security. Springer, 2010, pp. 143–159.
[33] E. De Cristofaro, P. Gasti, and G. Tsudik, “Fast and private computation of cardinality of set intersection and union,” in International Conference on Cryptology and Network Security. Springer, 2012, pp. 218–231.
[34] G. J. Simmons, “Symmetric and asymmetric encryption,” ACM Computing Surveys (CSUR), vol. 11, no. 4, pp. 305–330, 1979.
[35] B. Greschbach, G. Kreitz, and S. Buchegger, “The devil is in the metadata - new privacy challenges in decentralised online social networks,” in 2012 IEEE International Conference on Pervasive Computing and Communications Workshops. IEEE, 2012, pp. 333–339.
[36] A. Kwon, D. Lu, and S. Devadas, “XRD: Scalable Messaging System with Cryptographic Privacy,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020, pp. 759–776.
[37] D. L. Chaum, “Untraceable electronic mail, return addresses, and digital pseudonyms,” Communications of the ACM, vol. 24, no. 2, pp. 84–90, 1981.
[38] M. G. Reed, P. F. Syverson, and D. M. Goldschlag, “Anonymous connections and onion routing,” IEEE Journal on Selected areas in Communications, vol. 16, no. 4, pp. 482–494, 1998.
[39] A. López-Alt, E. Tromer, and V. Vaikuntanathan, “On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption,” in Proceedings of the forty-fourth annual ACM symposium on Theory of computing, 2012, pp. 1219–1234.
[40] H. Chen, W. Dai, M. Kim, and Y. Song, “Efficient multi-key homomorphic encryption with packed ciphertexts with application to oblivious neural network inference,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 395–412.
[41] J. Zhang, “620,000 people installed TraceTogether in 3 days, S’pore’s open source contact tracing app,” Mothership, March 2020. [Online]. Available: https://mothership.sg/2020/03/tracetogether-installed-open-source/
[42] A. C. Wagenaar, T. S. Zobeck, G. D. Williams, and R. Hingson, “Methods used in studies of drink-drive control efforts: a meta-analysis of the literature from 1960 to 1991,” Accident Analysis & Prevention, vol. 27, no. 3, pp. 307–316, 1995.
[43] S. Pei, S. Kandula, W. Yang, and J. Shaman, “Forecasting the spatial transmission of influenza in the United States,” Proceedings of the National Academy of Sciences, vol. 115, no. 11, pp. 2752–2757, 2018.
[44] S. E. Sarma, S. A. Weis, and D. W. Engels, “RFID systems and security and privacy implications,” in International Workshop on Cryptographic Hardware and Embedded Systems. Springer, 2002, pp. 454–469.
[45] “Private Kit: Safe Paths - Can we slow the spread without giving up individual privacy?” http://safepaths.mit.edu/, 2020, accessed: 2020-03-23.
[46] “COVID Watch,” https://covid-watch.org/, 2020.
[47] J. Petrie, “Cryptographically Secure Contact Tracing,” March 2020.
[48] J. Dolan and B. Mejia, “L.A. County gives up on containing coronavirus, tells doctors to skip testing of some patients,” Los Angeles Times, March 2020. [Online]. Available: https://www.latimes.com/california/story/2020-03-20/coronavirus-county-doctors-containment-testing
[49] C. Y. Johnson and L. H. Sun, “Health officials in New York, California restrict coronavirus testing to health care workers and people who are hospitalized,” The Philadelphia Inquirer, March 2020. [Online]. Available: https://www.inquirer.com/health/coronavirus/coronavirus-testing-20200321.html