June 29, 2017

A Different Take on the AOL Search Data

I thought I’d take a break from Functionalism to quickly lay out some thoughts on the recent AOL search data madness.

As a disclaimer, though AOL has been a client of ours, we’ve never worked for the Search Group and weren’t involved in this in any way – so my thoughts are an outsider’s, based solely on the various articles I’ve read.

The firestorm of publicity surrounding the release of AOL search data underscores the risks that companies take in keeping and analyzing data about user behavior. An incident like this will have repercussions that most consumers and journalists never hear about, think about or, probably, care about. It will make it that much harder for managers in any organization to collect, analyze and deploy solutions based on tracking and understanding consumer behavior – no matter how reasonable the use or beneficial the application.

For most people, that may seem like a good thing. But is it?

There’s a clear difference between the AOL case and other recent “Data Loss” cases. I imagine everyone agrees that it is a bad thing when a company or governmental agency loses private data, has it stolen, or exposes it on the internet. But that isn’t what happened here. AOL released data scrubbed of personal identity so that researchers could better understand the way searchers actually work. This is an important usability issue on the Internet – and it’s a complicated problem on which many independent researchers might well be able to make a significant contribution. So, in many ways, what AOL did was laudable – not a case of abuse but legitimate good use.

Now I think users probably realize and expect that companies will do this kind of thing internally. There is no significant web enterprise running today that doesn’t analyze both clickstream and search data to better understand user behavior on their site. The U.S. government spends millions of dollars a year collecting and publishing data on a much wider range of activities – and while that data is also scrubbed and aggregated it’s often quite possible to re-construct personal information. So the collective outrage about AOL’s decision to release this data reflects a deep misunderstanding of what the data is and how it can be used to benefit everyone from consumers to businesses.

Is there anything questionable in the AOL release? For thoughtful commentators, the biggest concern has been that even scrubbed data often has pointers to particular people. That’s true, and the fact that the search data is non-aggregated makes it potentially more vulnerable. But it’s also fair to suggest that actually identifying anyone would be nearly impossible (does anyone search “My Name is Gary Angel”?) and that the use of this data was not going to harm anyone, nor could it in any reasonably imaginable world. People don’t search on their Social Security Numbers or their bank account numbers or even their mother’s maiden names! In a world of very real security and personal identity concerns, this one seems banal.

And the real problem is that for Search Engines and Web Sites to make themselves better, this is precisely the kind of thing they have to study and even use. Web sites have improved their usability and functionality dramatically in the past five years. And features like Amazon’s “Suggestions” are accepted as de facto best practice. That doesn’t happen by magic. It happens because companies study how users behave on their web site and try to figure out how to make it better.

Would it surprise people to know that department stores and groceries do the same thing? That retailers analyze how visitors flow through their store and group the products they purchase into “baskets” that tell them what items are purchased together and should be adjacent on shelves?

Should the data AOL released have been kept under corporate control? I’d say no – I think it would be better if more companies shared their de-personalized and/or aggregated data. Worse, the reaction has been so disproportionate to the decision that it raises the stakes in the minds of web marketers everywhere. That reaction won’t protect individuals’ personal data. It may even divert resources from meaningful efforts to protect that data. However, it will probably ensure that your web experiences are less productive than they really should be. In the end, it’s hard to believe that these occasional paroxysms of public hysteria about data privacy are really good for anybody. The protections they encourage are usually knee-jerk reactions – poorly considered and probably unproductive.

The problem with cases like the AOL data release is that web marketers are just as likely to react irrationally as anyone else. Here’s the choice they face: do some form of data analysis to make their site better and get a small profit increase, or skip it rather than risk their job by ending up in the public spotlight because somehow what they were doing became a “story.” Nobody wants to lose their job!

There’s a big difference between a company that exposes or releases data of direct personal consequence out of carelessness or cupidity and one that releases scrubbed non-personal data in a legitimate attempt to improve its service. Pretending that the difference doesn’t exist makes it harder for everyone – the people who are charged with protecting your data and the people who need to use it.

About Gary Angel
Gary Angel is the author of the SEMAngel blog - Web Analytics and Search Engine Marketing practices and perspectives from a 10-year experienced guru.