
Big Data Meets Big Brother

June 6, 2013

The Guardian (because really, are the American papers even trying any more?) has published a classified court order forcing Verizon to give the FBI and NSA all telephone metadata from the three months ending July 19th. The metadata allows them to see things like what number the phone connected to, what time the call started and ended, which cell tower handled it (and thus the location of the caller, to within a few hundred meters) and similar data. It would be surprising if Verizon were the only major carrier affected, or if this three-month period were some kind of anomaly; probably the government has had complete access to all your call log information for quite some time regardless of which carrier you use. Isn’t the information age fun?

I’ve seen a few people write, oddly, that this isn’t that bad because the data would be too hard to go through, as if the only way to learn anything from it was to print out all the logs on giant stacks of old-fashioned perforation-separated printer paper with strips of holes along the sides, and to have an army of coffee-stained secretaries leaf through the stacks looking for Middle Eastern surnames. This kind of naivete is really remarkable for 2013. In reality all you need to do some really creepy stuff is a medium-sized computing cluster, available for a rounding error in the NSA’s budget, and a few computer science PhDs with no scruples about massive warrant-free invasions of privacy.

A quick but generous back-of-the-envelope estimate suggests that the country might generate something on the order of a terabyte of cell phone metadata a day. That is to say, about one big laptop hard drive every 24 hours. Over the three months the order covers, we might generate 100 TB of data. That sounds like a lot, but to put it in perspective, if the detectors at the Large Hadron Collider just sent all their data straight to hard drives with no pruning of uninteresting data, they would generate a few petabytes (1 PB = 1000 TB) a second. With the NSA’s much bigger budget, 100 TB is not too bad.

The time it takes to run an operation on the data will depend on what you’re doing to it and how nice a computer you’re using, but to get a rough idea, let’s say that a medium-difficulty operation (one that takes a good hard look at each metadatum but doesn’t look for patterns across all of them) takes about as long as letting Norton Antivirus scan for malware. Virus scanning a large, full laptop hard drive with a single commercial processor takes a few hours; let’s say 10,000 seconds. Of course, the NSA can probably throw a few thousand processors at the job with no problem. Let’s give them 2500 cores (reasonable for a well-funded science research group, probably an underestimate for the NSA), and say that cluster overhead is roughly canceled out by the fact that Norton is commercial software and is probably written to use less of the available computing power than it could. Then we get 10,000 seconds ÷ 2500 cores = 4 seconds of computer time per day of metadata, or about 1,500 seconds (roughly 25 minutes) for a full year. Out of an abundance of caution, let’s more than double that and say that a year’s worth of data will take about an hour to search, using incredibly generous assumptions.

This is assuming a general search over all the data. As long as the data are organized in an intelligent way, more specific searches could be much faster. These data are actually very easy to manipulate; for some searches you could almost do it in real time.
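To make the arithmetic above easy to check, or to rerun with your own guesses, here is a minimal Python sketch. Every constant is one of the post’s rough assumptions (a terabyte a day, 10,000 seconds per drive, 2500 cores); the 90-day span and the 2.5× padding factor are my own stand-ins for “three months” and “more than double,” not figures from the order itself.

```python
# Back-of-the-envelope estimate of how long it takes to scan phone metadata.
# All inputs are rough assumptions from the post, not measurements.

TB_PER_DAY = 1.0              # assumed nationwide cell phone metadata per day
DAYS_IN_ORDER = 90            # stand-in for "the three months the order covers"
SCAN_SECONDS_PER_TB = 10_000  # one core "virus scanning" a full ~1 TB laptop drive
CORES = 2_500                 # assumed cluster size
SAFETY_FACTOR = 2.5           # assumed padding for "more than double that"


def storage_for_order_tb() -> float:
    """Total metadata covered by the order, in terabytes."""
    return TB_PER_DAY * DAYS_IN_ORDER


def seconds_per_day_of_data() -> float:
    """Wall-clock seconds for the whole cluster to scan one day of metadata."""
    return TB_PER_DAY * SCAN_SECONDS_PER_TB / CORES


def minutes_per_year_of_data(safety: float = SAFETY_FACTOR) -> float:
    """Wall-clock minutes to scan a full year of metadata, times a padding factor."""
    return seconds_per_day_of_data() * 365 * safety / 60


if __name__ == "__main__":
    print(f"Order covers ~{storage_for_order_tb():.0f} TB")
    print(f"One day of data: {seconds_per_day_of_data():.1f} s")
    print(f"One year of data, padded: {minutes_per_year_of_data():.0f} min")
```

Run as a script, it prints about 90 TB, 4.0 s per day, and 61 padded minutes per year; calling `minutes_per_year_of_data(1.0)` gives the unpadded figure of roughly 24 minutes.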

But what could you actually do with them? Here’s one very simple thing you could do, just the first thing to come to mind:

  • Make a list of 25 known persons of interest, and their cell phone numbers. Unless the nation’s security apparatus spends all its time sitting on its hands, there are probably hundreds or thousands of these lists just lying around.
  • Search a year’s metadata for all the calls they placed and received, and make a list of all the numbers they communicated with.
  • Turn the list of numbers into a list of names using customer records that must be easy to requisition from the telecom companies once you’re already getting all this stuff.
  • Narrow that down to anyone who talked to more than, say, 10 of the original people of interest over the course of the past year.
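Steps 2 and 4 of that list are a one-hop search over the call graph, and they fit in a couple of dozen lines. The sketch below assumes a hypothetical `CallRecord` row holding only the calling and called numbers (real records would also carry times, durations and towers), and the function name `contacts_of_interest` is likewise made up for illustration. Step 3, the name lookup, is left out, since it would just be a join against whatever subscriber table the carrier hands over.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CallRecord(NamedTuple):
    """Hypothetical minimal metadata row: who called whom, nothing else."""
    caller: str
    callee: str


def contacts_of_interest(
    records: Iterable[CallRecord],
    persons_of_interest: set[str],
    min_distinct: int = 10,
) -> dict[str, int]:
    """Map each number to how many distinct persons of interest it talked with,
    keeping only numbers above the `min_distinct` threshold."""
    # Step 2: for every call touching a watched number, remember which
    # watched number sits on the other end. Sets make repeat calls count once.
    watched_contacts: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        if rec.caller in persons_of_interest:
            watched_contacts[rec.callee].add(rec.caller)
        if rec.callee in persons_of_interest:
            watched_contacts[rec.caller].add(rec.callee)
    # Step 4: keep anyone who talked with more than `min_distinct` of them.
    return {
        number: len(contacts)
        for number, contacts in watched_contacts.items()
        if len(contacts) > min_distinct
    }
```

Counting distinct persons of interest rather than raw calls is the design choice that matters here: someone who phones one activist fifty times is not flagged, while someone who phones eleven different activists once each is.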

Congratulations, you now have an essentially complete list of the leadership and most active members of Occupy Oakland, er, the Fort Worth Tea Party Patriots, er, an Al Qaida cell in Des Moines, IA.

And that’s basically the simplest thing you could do. There are probably cell phone use patterns that are good predictors of political and religious views (gay phone sex hotlines → right-wing evangelical Christian), so you could start guessing whom to spy on before you even look up their names. If you were an NSA employee and you suspected that your husband was cheating on you, you could find out pretty easily with a few keystrokes! If you combined the data with tracking software placed in regular paid ads on popular websites (which, if I were the NSA, I would have started using years ago), you could start associating the phone metadata with the websites people visit and their political and social affiliations.

If you could set up a way to get the data from the carriers in real time (which wouldn’t be too hard, really), you could play all kinds of fun games. Once you’ve associated phone numbers with their owners’ political affiliations, you could watch the location metadata to see whether like-minded people were converging on a single location and head off protests before they start. Think a band of environmental activists is going to disrupt an ExxonMobil stockholder meeting? Use their sudden flurry of cell phone activity to guess that they’re up to something, use their location data to see that they’ve all traveled to Aspen along with the shareholders, and arrest them before they reach the meeting. The possibilities are endless.
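The “converging on a single location” trick needs nothing fancier than bucketing. Here is a rough sketch, assuming a hypothetical `LocationPing` feed of (number, tower, timestamp) rows and a `watched` set of numbers already tagged with some affiliation; the function name, the one-hour window and the five-person threshold are all arbitrary illustrative choices, not anything from the order.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class LocationPing(NamedTuple):
    """Hypothetical row: a number seen at a cell tower at a time (epoch seconds)."""
    number: str
    tower_id: str
    timestamp: int


def converging_groups(
    pings: Iterable[LocationPing],
    watched: set[str],
    window_seconds: int = 3600,
    min_people: int = 5,
) -> dict[tuple[str, int], set[str]]:
    """Find (tower, time-window) buckets where many watched numbers show up together.

    Time is cut into fixed, non-overlapping windows; a bucket is flagged when
    at least `min_people` distinct watched numbers hit the same tower in it.
    """
    buckets: dict[tuple[str, int], set[str]] = defaultdict(set)
    for ping in pings:
        if ping.number in watched:
            # Integer division assigns each ping to one fixed window index.
            window = ping.timestamp // window_seconds
            buckets[(ping.tower_id, window)].add(ping.number)
    return {key: people for key, people in buckets.items() if len(people) >= min_people}
```

Fixed windows are the simplest choice but can split a group that arrives straddling a window boundary; overlapping or sliding windows would catch those at the cost of some double counting.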

And don’t worry about warrants or any of that stuff. The guys writing the Fourth Amendment in the 1780s didn’t anticipate cell phones, so this is all pretty legit as far as the Federal Government’s lawyers are concerned.
