Saturday, September 16, 2006

AOL's search gaffe: not so bad?

If you haven't heard, somebody at AOL released three months of web search records for roughly 1.5% of the U.S. users who use AOL's client software. In the aftermath, AOL officially apologized and followed that up with the firing of three staff members including the Chief Technology Officer. Despite this, criticism of AOL continues, with the EFF leading the complaints.

Now usually I agree completely with the EFF, but in this case I'm not so sure. In the first place, each searcher is associated with a random ID number, so most searchers are impossible to identify--so far as I have seen, only one person has been identified by a third party. In the second place, AOL has been so thoroughly flogged that I don't think they will ever try it again.

And that's my point. Much has been made of the danger of releasing search records--sometimes people search for their own name, or for personal information such as their Social Security number, hoping only to detect its presence online, not to put it there. And, of course, there's the chilling effect that could be caused by people's fear that they might be watched. I'm sure there are other reasons too.

But as AOL says, their intention--or the intention of those who were fired, presumably--was to give academic researchers real-world data to analyze, so that researchers could look for patterns in the data and see what sort of thing real users search for. This is interesting stuff, both to provide us with shock-and-awe stories of sickos online (for example, this and this), and to satisfy our curiosity about what other people search for.

The nice thing about the AOL scandal is that none of those whose searches were revealed knew that their queries would be recorded and broadcast. This obviously makes it a serious privacy incursion, but it also ensures that the searches were completely honest, not constrained or biased by privacy worries.

The fact is, this search information can tell us things that we simply cannot learn any other way. So I, for one, will be genuinely interested to see what sort of observations academics will make from this data (if they dare).

But now that this data is available, I guess we don't really need any more. This data is valuable because of its uniqueness, but a second data leak wouldn't have the same novelty. So if some other dumb search engine decides to take a data dump, that's when I'll consider unleashing my wrath.
