Red Branch Report April 2017

We have begun a monthly report on issues relating to Washington and the tech industry. Our vision: we aim to bring Washington, DC to Silicon Valley. The large Valley firms have K Street lobbyists, lawyers, and public relations professionals. The small and mid-size start-ups and entrepreneurs have nobody. We try to fill that gap. This monthly newsletter offers “news you can use” that may be important to the IT start-up sector. You may not have the resources to track it, but Washington is too important to ignore.

Here is the April 2017 Report.

The NSA and Big Data

This essay was first published by Praeger Security International.

Recent revelations that the National Security Agency (NSA) has been collecting large sets of telephone call meta-data raise a host of questions; the most interesting systemic cybersecurity questions concern how the NSA uses that call-log meta-data. The program is yet another example of the phenomenon known as Big Data: the collection, storage, and analysis of massive databases of information. In this short piece, I want to describe Big Data in general terms and then discuss the NSA program as an example of the phenomenon.

Big Data

Increasingly, in a networked world, technological change has made personal information pervasively available. As the storehouse of available data has grown, so have governmental and commercial efforts to use this personal data for their own purposes. Commercial enterprises use collected information to target their advertising and solicit new customers in order to expand market share. Governments use the data, for example, to identify and target previously unknown terror suspects, the so-called “clean skins” who appear in no intelligence database. This capacity for enhanced data analysis has already proven its utility and holds great promise for the future of both commercial activity and counter-terrorism efforts.

Yet this analytical capacity also comes at a price: the peril of creating an ineradicable trove of information about innocent individuals. That peril is usually thought to stem from misuse; in the government sphere one imagines data mining to identify political opponents, and in the private sector we fear targeted spam. To be sure, that is a danger to be guarded against.

But the dangers of pervasively available data also arise from other factors. Often, for example, the data lack context, and that absence permits, or even compels, inaccurate inferences. Knowing that an individual has a criminal conviction is a bare data point; knowing what the conviction was for, and in what context, allows for a more granular and refined judgment.

The challenges arising from these new forms of analysis have already become the subject of significant political debate. The NSA meta-data disclosures are only the most recent, and the most public, example.

NSA Meta-Data

The NSA apparently collects “meta-data” from Verizon. By “meta-data,” we mean the non-content portions of telephone communications. These include data elements such as the number that originated a call, the number that was called, how long the call lasted, and quite possibly where, geographically, the two endpoints of the call were located. This meta-data is collected for every telephone call within the United States or between the United States and a foreign country. And though the disclosures do not say so directly, there is every reason to suspect that other telecommunications providers (AT&T, Sprint) are subject to similar disclosure orders. Although the contents of the calls are not recorded, this database of meta-data for every call in the United States is a powerful analytic tool.

The data may well serve two purposes. First, it serves as a repository for what we might call “retrospective link analysis.” When, for example, the Tsarnaev brothers were suspected in connection with the Boston Marathon bombing, this repository could be queried to give investigators a picture of whom (if anyone) the Tsarnaevs may have been in contact with before the bombing. With appropriate court orders, the subscriber information for the most commonly called phone numbers might be revealed, and that could very well guide further investigation.
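To make “retrospective link analysis” concrete, here is a minimal Python sketch under stated assumptions. It uses a hypothetical record format holding only the non-content fields described above (calling number, called number, start time, duration) and simply counts whom a target number called, or was called by, before a cutoff date. The `CallRecord` type, the `top_contacts` function, and the phone numbers are illustrative inventions, not anything drawn from the NSA program itself.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical meta-data record: who called whom, when, and for how long (no content)."""
    caller: str
    callee: str
    start: datetime
    duration_s: int


def top_contacts(records, target, before, n=3):
    """Return the n numbers most often in contact with `target` before the cutoff time.

    A call in either direction (target as caller or as callee) counts as one contact.
    """
    counts = Counter()
    for r in records:
        if r.start >= before:
            continue  # only look backward from the cutoff
        if r.caller == target:
            counts[r.callee] += 1
        elif r.callee == target:
            counts[r.caller] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Invented records using fictional 555 numbers.
    sample = [
        CallRecord("555-0100", "555-0199", datetime(2013, 4, 1, 9, 0), 120),
        CallRecord("555-0100", "555-0199", datetime(2013, 4, 2, 9, 0), 60),
        CallRecord("555-0142", "555-0100", datetime(2013, 4, 3, 18, 0), 300),
        CallRecord("555-0100", "555-0177", datetime(2013, 4, 20, 12, 0), 30),
    ]
    print(top_contacts(sample, "555-0100", before=datetime(2013, 4, 15)))
    # prints [('555-0199', 2), ('555-0142', 1)]
```

The most frequent counterparties come back first; in the scenario the essay describes, those are the numbers for which investigators might then seek subscriber information under a court order. Note that the reach of such a query is bounded by how far back the stored records go.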

This use, of course, is only as broad as the database itself. In the absence of a collection program of the sort operated by the NSA, a retrospective look would go back only as far as the service providers retained calling meta-data, and that varies from company to company. If (as some have speculated) the NSA program is more than a half-dozen years old, its database would be rich indeed, and far larger than what the commercial providers would retain on their own account.

The second use is far broader and less narrowly tailored. It would involve using big data analytics for what is called social network analysis. In other words, again starting with a particular subject, we might map out not only whom he is connected to, but also how the people he knows may be connected to each other and to other, as yet unidentified, individuals.
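Social network analysis goes a step beyond a list of contacts. The minimal Python sketch below, again with invented identifiers, treats every call as an undirected link, walks outward two “hops” from a starting subject with a breadth-first search, and then lists the links among the contacts it found. The two-hop limit and the functions `build_graph`, `ego_network`, and `links_among_contacts` are illustrative assumptions, not a description of how the NSA actually queries its data.

```python
from collections import defaultdict, deque


def build_graph(call_pairs):
    """Undirected contact graph: an edge means at least one call in either direction."""
    graph = defaultdict(set)
    for a, b in call_pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def ego_network(graph, seed, hops=2):
    """Breadth-first search: map every node within `hops` links of `seed` to its distance."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def links_among_contacts(graph, reached, seed):
    """Links between the nodes found, excluding the seed: how his contacts know each other."""
    members = set(reached) - {seed}
    return sorted(
        (u, v) for u in members for v in graph.get(u, ()) if v in members and u < v
    )


if __name__ == "__main__":
    # Invented call pairs; "A" is the starting subject.
    calls = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
    g = build_graph(calls)
    reached = ego_network(g, "A", hops=2)
    print(sorted(reached.items()))                 # [('A', 0), ('B', 1), ('C', 1), ('D', 2)]
    print(links_among_contacts(g, reached, "A"))   # [('B', 'C'), ('C', 'D')]
```

Here “D” never spoke to the subject directly but surfaces at two hops through “C,” which is the sense in which such analysis can point to as yet unidentified individuals. It also shows why the method wants comprehensive data: any missing link can hide an entire branch of the network.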

Here the science is clear: large databases are effective in establishing social patterns only to the extent that they are actually comprehensive. If your argument is that we need social network analysis to find terrorist connections, then you need the entire network to provide the grist for the mill, so to speak. That, almost surely, is what Director of National Intelligence James Clapper meant when he said, “The collection is broad in scope because more narrow collection would limit our ability to screen for and identify terrorism-related communications. Acquiring this information allows us to make connections related to terrorist activities over time.”

Social network analysis is difficult to describe in the abstract. For those who want to see how it operates in a real-world context, I recommend the interesting (and amusing) post by Kieran Healy (a sociology professor at Duke), “Using Metadata to Find Paul Revere.” Healy did a very simple form of matrix analysis using only two factors (the name of a person and the names of the political clubs he belonged to) and applied it to the colonial revolutionaries. The names were familiar (e.g., Sam and John Adams), as were the clubs (the North Caucus and the Long Room Club, for example). He used data that David Hackett Fischer collected from historical records and that might well have been available to the British at the time of the revolution.

What he found is quite stunning for those unfamiliar with big data. Perhaps it’s a bit of a spoiler to say so, but it turns out that the data identify one man as the linchpin for a large fraction of the organization of the clubs and the men in Boston: Paul Revere. And while, in historical retrospect, he may not have been THE leader of the revolution, it is pretty clear that he was a significant operative in the revolutionary structure (hence his famous ride). So with just two fields of data and some relatively simple analytics, British counter-intelligence of the era might have learned of his significance. (Note, of course, that more fields of data give even greater granularity and fidelity to the conclusions.)
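The core of this kind of matrix analysis can be sketched in a few lines of Python. Start with a person-by-club membership matrix A of 0s and 1s; multiplying A by its transpose yields a person-by-person matrix whose entries count shared memberships, and from that one can ask who is tied to the most other people. The sketch below uses invented names and memberships (not Fischer’s historical data), and for simplicity it ranks people by a plain count of ties rather than the more refined centrality measures a full analysis would use. The function names are hypothetical.

```python
def person_by_person(affiliation):
    """Project a person-by-club 0/1 matrix onto people (A times its transpose).

    Entry [i][j] is the number of clubs persons i and j share; the diagonal
    holds each person's own membership count.
    """
    return [
        [sum(x * y for x, y in zip(row_i, row_j)) for row_j in affiliation]
        for row_i in affiliation
    ]


def tie_counts(shared):
    """For each person, how many *other* people share at least one club with them."""
    return [
        sum(1 for j, count in enumerate(row) if j != i and count > 0)
        for i, row in enumerate(shared)
    ]


if __name__ == "__main__":
    # Invented toy data, NOT Fischer's records: rows are people, columns are clubs.
    people = ["Ann", "Ben", "Cal", "Dee", "Eve"]
    membership = [
        [1, 0, 0],  # Ann
        [0, 1, 0],  # Ben
        [1, 1, 1],  # Cal belongs to every club
        [0, 0, 1],  # Dee
        [1, 0, 0],  # Eve
    ]
    ties = tie_counts(person_by_person(membership))
    print(dict(zip(people, ties)))  # {'Ann': 2, 'Ben': 1, 'Cal': 4, 'Dee': 1, 'Eve': 2}
```

Even in this toy example, the one person who bridges every club stands out from membership lists alone, which is the intuition behind the Revere result.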

And so we now understand why the NSA was interested in these data sets. Large data sets can, with appropriate manipulation, reveal the organizational details of social structures. Terrorist networks are social structures of that sort. To my mind, there are clearly reasonable grounds to believe that the telephone call meta-data database is relevant to discovering that structure and therefore relevant to an investigation of those terrorists.

Is It Wise?

But what is legal is not always wise. Whatever this program’s legality, the entire order is remarkably overbroad and quite likely unwise. We fear (with some reason) its potential abuse. The technique is, of course, value neutral. It can be used to discover links among other types of groups, and it can be applied to other large data sets. The only limits are the constraints of law; the technology is not self-limiting.

In the end then, for me at least, the value of this program comes down to empirical questions about which I have no data: How effective has the program been in identifying terrorist social structures? How useful has it been in retrospective investigations? How likely is it to be abused and what preventative controls are in place? And how (if at all) will the existence of the program change citizen behaviors in ways we can’t predict?

In the criminal justice world we have an old maxim: “Better that ten guilty go free than that one innocent suffer in jail.” It is, in essence, a mathematical statement of a risk preference: by a margin of about 10 to 1, we would rather suffer the crime that comes from releasing the guilty than the societal harm that comes from imprisoning an innocent. The NSA data collection program poses a similar question today: “Better that X terrorist structures go undiscovered than that Y innocent Americans have their calling data collected,” where both X and Y are unknown.
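One common way to see why the maxim is a statement of risk preference is to read it as a decision threshold. Assume, for illustration only, that jailing an innocent person costs ten times as much as freeing a guilty one, and let p be the probability that a defendant is guilty. Conviction is preferable only when the expected cost of acquitting exceeds the expected cost of convicting:

```latex
% C_G: cost of freeing a guilty person; C_I: cost of jailing an innocent one (assumed C_I = 10 C_G)
\begin{aligned}
p\,C_G &> (1-p)\,C_I, \qquad C_I = 10\,C_G \\
p\,C_G &> 10\,(1-p)\,C_G \\
11p &> 10 \\
p &> \tfrac{10}{11} \approx 0.91
\end{aligned}
```

The same arithmetic would apply to the X-versus-Y question, except that neither X nor Y is known, so no threshold can be computed.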

For myself, in the absence of hard data, it is difficult to imagine a set of facts that would justify collecting all telephony meta-data in America. We live in a changed world after 9/11. But I would have hoped that it has not changed that much.

The NSA’s Surveillance Order — Legal, But Unwise?

The revelation that the National Security Agency (NSA) has secured a court order directing Verizon to provide it with call data has sparked controversy, and rightly so. If the order is genuine (and nobody has denied that it is), it reflects a significant expansion of America’s surveillance apparatus, one that should, at a minimum, be closely examined.

First, some details. The order applies only to “meta-data” of calls: the phone numbers called, the location of the cell phone when the call was made, and the time and duration of the call. So the order does not require Verizon to let the NSA monitor the conversations or other content of the calls.

Also, the order applies both to international calls and to calls occurring wholly within the United States. Verizon is required to update its compliance “on a daily basis.”

Finally, though the order disclosed Wednesday applies only to Verizon, the logic of the request supports an inference that similar orders have been issued to other major telecommunications carriers such as AT&T and Sprint.

In short, the order appears to give the NSA blanket access to the records of Verizon customers’ phone calls, foreign and domestic, made between April 25, when the order was signed, and July 19, when it expires.

Of course, if the order is only the latest in a series of orders (as also seems likely), then the access may go back for quite some time.

To a large degree this revelation is not unexpected. We are a country still at war with Al Qaeda and its affiliates.

As such, we need counterterrorism tools, such as Section 215 of the PATRIOT Act, which was apparently used in this case. And though we don’t yet know the details, it is important to note that since 9/11 these powerful tools have been modified and amended to protect civil liberties to the extent possible.

Here, the FISA court issued an order allowing for telephone calling data only, not the content of any calls. Such data are critical for link analysis — connecting the dots between phone numbers in terrorist investigations.

That is constitutional.

Meta-data are not currently protected under the Fourth Amendment, and the large-scale collection of that meta-data remains lawful.

On the other hand, it is uncertain how the NSA was allowed to collect information on U.S. citizens within the United States.

Historically, both law and policy have limited the NSA to collecting signals intelligence only when it involves foreigners. Presumably some underlying procedural or legal limitation ensures that the NSA’s actions conform to law, but to date we don’t know what that is.

Finally, whatever its legality, the entire order is remarkably overbroad and quite likely unwise.

It is difficult to imagine a set of facts that would justify collecting all telephony meta-data in America. While we do live in a changed world after 9/11, one would hope it has not changed that much.

Cybersecurity and the Chinese Hacker Problem

Earlier this month, I did a podcast on the Chinese hacker problem with Richard Bejtlich. Richard is the Chief Security Officer for Mandiant, the company that published the high-profile report tying Chinese hackers to the Chinese military. Here is a summary of the podcast:

A few weeks ago Mandiant, a private cybersecurity firm, released an explosive report attributing an epidemic of Chinese cyber espionage to the Chinese army. In light of this report and other intelligence findings, the New York Times reports that the Obama Administration has publicly called on the Chinese government to intervene directly to end such cyber attacks from its own military. Richard Bejtlich, the Chief Security Officer for Mandiant, discusses the content of that report. Our other cyber expert, Paul Rosenzweig, joins to discuss what, if anything, the United States should be doing about this problem. This previously recorded conference call is a part of a new Teleforum series on Cybersecurity and Public Policy.

Taming the Cyber Dragon?

While Ben has often mocked the New York Times for its opinions, the Washington Post has mostly escaped our attention.  To a large degree this reflects the level-headedness of its opinions.  So when it slips into an alternate universe of unreality, that likely reflects something important.  Consider yesterday’s editorial opinion calling for the US to do more to “tame the cyber dragon.”

The Post rightly notes that China is stealing us blind: intellectual property and national security secrets are being exfiltrated through cyberspace on an industrial scale. And though China denies any responsibility, its denials are the barest of fig leaves. As the Post says, there is growing evidence that China is behind one of the largest heists in history. And something must be done.

But what?  Here the Post opinion becomes, well, a bit risible.  Their recommendations?  Wait for it …. “speak more firmly to China’s leadership about the problem, perhaps threatening to deny visas or expel those found to be involved in economic espionage.”  Of course, since those involved in the espionage aren’t here in the first place — they operate from China — the threat of expulsion is an empty one.  And I’m sure that speaking firmly to China will cause them to change their ways …. not!  As for denying visas to, say, students and tourists — that punishes our own domestic industries with little harm to the Chinese government.

To be fair, the Post does say that as a further step an “offensive cyber-assault to preemptively disarm adversaries” might be necessary.  But now we’ve leaped from “speaking firmly” to cyber war.  If we have any sense at all, we’ll find a middle ground — some kind of espionage-based response that causes equivalent pain to Chinese interests and that might get their attention.

Here’s one possibility I recently heard discussed that gives you a flavor: since China is interested in maintaining the status quo and uses the Great Firewall to keep destabilizing information out of the hands of its citizens, might we not promote internet freedom and dissuade Chinese intellectual-property theft by initiating a program to poke holes in the firewall? Provocative, to be sure, but far more likely to get their attention than speaking firmly, and a lot less escalatory than cyber assaults. If we aren’t thinking about responses in this general vein, we should be.