Yesterday I wrote about the RealID Act, which will give the DHS the ability to acquire tons of sensitive personal data. But how can all this data be analyzed?
Here are some sophisticated (and scary!) tools that analysts have or soon will have at their disposal.
But first, a disclaimer:
With regard to these technologies I am an interested layman. The inferences I draw below are based on my best understanding, but that understanding may be flawed. Any corrections or additions are more than welcome.
From TechnologyReview.com, May 10:
A new generation of software called Starlight 3.0, developed for the Department of Homeland Security by the Pacific Northwest National Laboratory (PNNL), can unravel the complex web of relationships between people, places, and events. And other new software can even provide answers to unasked questions.
Anticipating terrorist activity requires continually decoding the meaning behind countless emails, Web pages, financial transactions, and other documents, according to Jim Thomas, director of the National Visualization and Analytics Center (NVAC) in Richland, Washington.
Federal agencies participating in terrorism prevention monitor computer networks, wiretap phones, and scour public records and private financial transactions into massive data repositories.
“We need technologies to deal with complex, conflicting, and sometimes deceptive information,” says Thomas at NVAC, which was founded last year to detect and reduce the threats of terrorist attacks.
As an aside, we would do well to keep in mind that the “massive data repositories” need not be single physical installations. Existing data mining technology (pioneered by those friendly folks who call you up at dinnertime) makes it possible to run one query across any number of databases at once. In other words, every database the government can access, public or private, effectively becomes part of one single database.
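For the technically curious, here is a minimal sketch of what "one query across several databases" looks like in practice. It uses Python's built-in sqlite3 module and two throwaway in-memory databases. The database names, tables, columns, and rows are all invented for illustration, and this says nothing about how any government system is actually built.

```python
import sqlite3

# One connection, with two further databases attached to it.
# Each ATTACH of ':memory:' creates a separate in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS bank")
conn.execute("ATTACH DATABASE ':memory:' AS phone")

# Hypothetical tables and rows, made up for this example.
conn.execute("CREATE TABLE bank.transfers (person TEXT, amount REAL)")
conn.execute("CREATE TABLE phone.calls (person TEXT, callee TEXT)")
conn.executemany(
    "INSERT INTO bank.transfers VALUES (?, ?)",
    [("alice", 250.0), ("bob", 9000.0)],
)
conn.executemany(
    "INSERT INTO phone.calls VALUES (?, ?)",
    [("bob", "carol"), ("dave", "alice")],
)


def cross_database_profile(connection):
    """Join records from both databases on the person's name."""
    return connection.execute(
        "SELECT t.person, t.amount, c.callee "
        "FROM bank.transfers AS t "
        "JOIN phone.calls AS c ON c.person = t.person "
        "ORDER BY t.person"
    ).fetchall()


profile = cross_database_profile(conn)
print(profile)
```

The query itself neither knows nor cares that the two tables live in different databases, which is the whole point.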
What exactly does the software do?
In September 2005, NVAC, a division of the PNNL, will release its Starlight 3.0 visual analytics software, which graphically displays the relationships and interactions between documents containing text, images, audio, and video.
The previous generation of software was not fully visual and contained separate modules for different functions. It has been redesigned with an enhanced graphical interface that allows intelligence personnel to analyze larger datasets interactively, discard unrelated content, and add new streams of data as they are received, according to John Risch, a chief scientist at Pacific Northwest National Laboratory.
…“For tracking terrorist networks, you can simultaneously bring in telephone intercepts, financial transactions, and other documents…all into one place, which wasn’t possible before,” Risch says.
…
Visualizations … graphically depict relationships between content by displaying them in a variety of formats, such as a star cluster showing more popular topics as larger stars; topographic maps; or a river of information showing interest in a topic over time. Generating visualizations instead of relying on text-based searches “allows the human mind insight into fuzzy relationships and tries to resolve uncertainty,” says Thomas.
In other words, the software generates a graphical representation of the interrelationships between data objects and data streams, which in turn stand for individual people.
If it works on “terrorists”, imagine how well it would work on citizens.
What kind of performance is this software capable of?
Starlight quadruples the number of documents that can be analyzed at one time — from the previous 10,000 to 40,000 — depending on the type of files. It also permits multiple visualizations to be opened simultaneously, which allows officers for the first time to analyze geospatial data within the program. According to Risch, a user will be able to see not only when but where and in what proximity to each other activities occurred.
Geospatial data: that implies the ability to integrate data from physical surveillance of meatspace movements, and not merely analysis of our electronic trails. But I digress.
40,000 documents at a time: what does this mean? To make a wild-ass guess, it means that they could take as their initial set, say, all active contributors to BooMan Tribune and identify everyone they interact with at work, in their free time – and in the pursuit of their legitimate political activities.
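To make that guess a little more concrete, here is a minimal sketch of the general idea: start from a seed set of people and walk outward through recorded interactions. It is an ordinary breadth-first search over an invented contact list. I have no idea what Starlight does internally, so every name and record here is hypothetical.

```python
from collections import deque

# Hypothetical interaction records (person_a, person_b); all names invented.
interactions = [
    ("poster1", "coworker1"),
    ("poster1", "friend1"),
    ("friend1", "friend2"),
    ("poster2", "friend2"),
    ("friend2", "stranger1"),
]


def build_graph(pairs):
    """Turn interaction pairs into an undirected adjacency map."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def expand(graph, seeds, max_hops):
    """Return {person: hop distance} for everyone within max_hops of a seed."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not look beyond the hop limit
        for neighbor in graph.get(person, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)
    return distance


graph = build_graph(interactions)
reach = expand(graph, ["poster1", "poster2"], max_hops=1)
print(sorted(reach.items()))
```

Raising max_hops from 1 to 2 pulls in "stranger1", who never interacted with either seed directly. That is how a modest starting list can balloon.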
Oh yeah – it’s also Windows-based and uses XML.
This is not the only string to the analysts’ bow. Some others are:
- IN-SPIRE, [a] software that extracts the meaning of large datasets and allows users to pose alternative hypotheses and to see data supporting that scenario… For instance, an analyst could posit that Osama Bin Laden is planning an attack on a European nation at a given time and with a particular weapon. IN-SPIRE will look for relationships between documents validating the hypothesis; for example, the software would look for the most likely nearby locations where such a weapon could be acquired and if secondary or tertiary associates have visited those areas.
- The Department of Defense is using software from Intelligenxia called IxReveal to track online message threads and give “answers to questions that haven’t been asked,” according to Ren Mohan, co-chairman and CTO of the Jacksonville, Florida-based data analysis company… Mohan says that, because “we often don’t know what we don’t know” about terrorist activities, analysts employ the company’s IxReveal to extract the topics that are being discussed most frequently rather than searching for specific items. This approach can overcome analyst bias by exposing all of the important concepts currently being discussed in chat rooms, email, or user groups…
Hmmm, you don’t suppose they mean us, do you?
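For what it is worth, the "answers to questions that haven't been asked" idea can be imitated in a very crude way: instead of searching for a specific term, count which terms show up in the most messages. The sketch below does only that. It is not IxReveal's method, which I know nothing about, and the messages and stopword list are invented.

```python
import re
from collections import Counter

# Hypothetical message thread and stopword list, made up for this example.
STOPWORDS = {"the", "on", "was", "has", "anyone", "near", "off", "a", "of"}
messages = [
    "The RealID Act passed the Senate today",
    "Has anyone read the RealID fine print?",
    "Senate vote on RealID was unanimous",
    "Off topic: good coffee near the station",
]


def top_terms(texts, n):
    """Rank terms by the number of messages they appear in."""
    counts = Counter()
    for text in texts:
        # A set counts each term at most once per message.
        words = set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS
        counts.update(words)
    return counts.most_common(n)


top = top_terms(messages, 2)
print(top)
```

Nobody asked about RealID, yet it comes out on top simply because it is what people are talking about.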