Technical Notes on the NSA's Prism Program
The story about the NSA's Prism program was broken by the U.K.'s Guardian yesterday. Technology companies denied those claims today. These are three short notes on the technical implications:
- The leaked materials show that the NSA has direct access to internal user data at various technology companies. The CEOs of all the implicated companies have denied the existence of backdoors or unfettered access for the NSA. There is one possible scenario in which both are correct. The NSA undoubtedly has access to peering points where Internet traffic is transferred between ISPs. All they would need for unfettered access to user data is the SSL/TLS private key used to secure that traffic. This key is required at all points where SSL/TLS traffic terminates (i.e., many, many load balancers and proxies), and is at most a couple of kilobytes. It's small enough to be transferred on paper. Certificate pinning, elliptic curve cryptography, and large high-entropy keys are all completely useless if the NSA can subpoena or otherwise acquire the private keys and install beam splitters at peering points. There's no need to be a man-in-the-middle when you can be the man-at-the-end. (A short decryption sketch follows these notes.)
- The acknowledged use of phone-call metadata implies the use of a communications graph, possibly to find second-degree nodes connected to "suspicious" nodes, or to infer community structure. These topics have been studied extensively over the last decade (search Google Scholar for semi-supervised learning and community detection in networks). The use of graph-based methods for detection leads to graph-based ideas for avoiding detection as well: it's plausible that knowledge of the detection method will change interaction patterns in ways that make the method less effective. What's worse, however, is the possibility of "metadata attacks", where members of a "suspicious" cell create a pattern of links to a target, causing the target to be classified as suspicious as well (see the second sketch below).
- Link prediction is a well-studied research problem: given a graph, the goal is to predict links that are "missing" in some sense. These could be friendships in a social network (think "people you may know..."), or whatever you would call links in a network of "suspicious" people. One of the simplest approaches is to count the number of friends that two people (who are not friends themselves) have in common: the more common friends, the more likely it is that the two are actually (or should be, or will become) friends. A slightly older but excellent introduction to the problem is the paper by Liben-Nowell and Kleinberg. The most worrying question: what becomes of the false-positive links? (The third sketch below shows the common-neighbors scorer in a few lines.)
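To make the first note concrete: record encrypted traffic at a peering point, obtain the server's private key later, and any session that used RSA key exchange (no forward secrecy) can be decrypted offline. A minimal sketch in Python, assuming a hypothetical key file and an encrypted premaster secret already extracted from a captured handshake (the file names are made up):

```python
# Passive decryption with an acquired TLS private key -- a sketch, not a tool.
# Applies to RSA key exchange; ephemeral Diffie-Hellman handshakes are not
# recoverable from the private key alone.
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import padding

# Hypothetical inputs: the server's private key and the ClientKeyExchange
# blob pulled out of a recorded handshake.
with open("server_key.pem", "rb") as f:
    private_key = serialization.load_pem_private_key(f.read(), password=None)
with open("encrypted_premaster.bin", "rb") as f:
    encrypted_premaster = f.read()

# In RSA key exchange the client encrypts the premaster secret with the
# server's public key; whoever holds the private key recovers it, re-derives
# the session keys, and decrypts the recorded traffic offline.
premaster_secret = private_key.decrypt(encrypted_premaster, padding.PKCS1v15())
print(premaster_secret.hex())
```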
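For the second note, the basic operation the metadata enables is a walk outward from flagged numbers. A toy sketch with networkx; the graph and the two-hop rule are invented for illustration, not a claim about the actual system:

```python
# Two-hop flagging on a toy call graph built from (caller, callee) metadata.
import networkx as nx

calls = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "erin")]
G = nx.Graph(calls)

suspicious = {"alice"}          # seed nodes flagged by other means
flagged = set()
for s in suspicious:
    # everyone within two hops of a suspicious node gets swept in
    flagged |= set(nx.single_source_shortest_path_length(G, s, cutoff=2))
print(flagged)                  # {'alice', 'bob', 'carol'}

# The "metadata attack" from the note: the cell simply places calls to a
# target, and the target now falls within two hops of a flagged node.
G.add_edge("alice", "target")
print("target" in nx.single_source_shortest_path_length(G, "alice", cutoff=2))  # True
```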
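And for the third note, the common-neighbors scorer takes only a few lines. Again a toy graph; in this setting the edges would be calls, co-travel, shared accounts, and so on:

```python
# Common-neighbors link prediction: score non-adjacent pairs by shared neighbors.
import networkx as nx

G = nx.Graph([
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("B", "D"), ("C", "D"), ("D", "E"),
])

scores = {
    (u, v): len(list(nx.common_neighbors(G, u, v)))
    for u, v in nx.non_edges(G)
}

# ('A', 'D') shares two neighbors (B and C), so it is the top predicted link.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```

Every pair this scorer promotes that corresponds to no real relationship is exactly the false-positive link the last note worries about.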