Microsoft: How to study search data without risking privacy
Data on Internet search queries is a potential gold mine for researchers, as a glimpse into the minds of the online population. But despite efforts to keep that data anonymous, its release is a minefield for personal privacy, as evidenced by AOL's legendary 2006 "screw up."
Now some Microsoft researchers say they've come up with a way to release and study search data without risking privacy. The company is quick to add that it doesn't have any plans to release search data in this way. But if anyone else is brave enough to give it a try (Yahoo? Google?), the approach is detailed in a Microsoft paper accepted for the International World Wide Web Conference in Madrid (PDF, 10 pages).
The trick is an algorithm that produces what the researchers call a "private query click graph," which links queries to URLs and weights each URL by the number of users who clicked on it after making a particular query.
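The paper's exact algorithm and its privacy accounting are in the PDF. The Python below is only a minimal sketch of a common differential-privacy recipe for this kind of graph, assuming three steps: cap how many events each user contributes, add Laplace noise to the query and click counts, and drop queries whose noisy count falls below a threshold. The function names, the per-user cap `d`, the even budget split and the `threshold` value are hypothetical choices for illustration, and the sketch carries no privacy proof.

```python
import random
from collections import Counter, defaultdict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_click_graph(logs, d, epsilon, threshold, seed=0):
    """Build a noisy {query: (count, {url: clicks})} graph from per-user logs.

    logs: dict of user_id -> list of (query, clicked_url or None), oldest first.
    d, epsilon, threshold: hypothetical knobs chosen for illustration only.
    """
    rng = random.Random(seed)
    query_counts = Counter()
    click_counts = defaultdict(Counter)
    for events in logs.values():
        # Keep only each user's first d events, so no single person can
        # change the query counts (or the click counts) by more than d in total.
        for query, url in events[:d]:
            query_counts[query] += 1
            if url is not None:
                click_counts[query][url] += 1

    # Assumption: split the budget evenly between query counts and click counts.
    scale = 2 * d / epsilon
    graph = {}
    for query, count in query_counts.items():
        noisy_count = count + laplace_noise(scale, rng)
        # Drop rare queries: a query typed by only one person is exactly
        # the kind of string that could identify them.
        if noisy_count < threshold:
            continue
        noisy_clicks = {
            # Clamping negative values is post-processing on noisy output.
            url: max(0.0, c + laplace_noise(scale, rng))
            for url, c in click_counts[query].items()
        }
        graph[query] = (noisy_count, noisy_clicks)
    return graph


if __name__ == "__main__":
    # Made-up data: a popular query, one identifying query, one heavy user.
    logs = {f"user{i}": [("seattle weather", "weather.com")] for i in range(200)}
    logs["alice"] = [("alice smith 12 pine st", "example.org")]
    logs["spammer"] = [("buy pills", None)] * 1000
    graph = private_click_graph(logs, d=1, epsilon=1.0, threshold=50, seed=42)
    for query, (count, clicks) in graph.items():
        print(query, round(count, 1), {u: round(c, 1) for u, c in clicks.items()})
```

The popular query survives with counts close to the truth, while the one-off query and the heavy user's repeated query (capped at a single event) are very unlikely to clear the threshold.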
"While this graph is not as powerful as the actual search log, many computations can still be performed on the click graph with results similar to the actual search log, e.g., finding similar queries, keyword generation, and performing spell corrections," the researchers write.
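As a rough illustration of "finding similar queries," one standard technique (not necessarily the one the researchers evaluated) treats each query as a vector of click weights over URLs and compares queries by cosine similarity: queries whose users click the same pages score close to 1. The small graph below is made-up data and `similar_queries` is a hypothetical helper. Note how the misspelled "seatle weather" lands next to the correct spelling, which hints at how a click graph can support spell correction.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse URL -> click-weight vectors."""
    dot = sum(w * b.get(url, 0.0) for url, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_queries(graph: dict[str, dict[str, float]], query: str, k: int = 3):
    """Rank other queries by how closely their clicks match `query`'s clicks."""
    target = graph[query]
    scores = [(other, cosine(target, urls))
              for other, urls in graph.items() if other != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    # Keep only the top k that share at least one clicked URL.
    return [(q, s) for q, s in scores[:k] if s > 0]


if __name__ == "__main__":
    graph = {
        "seattle weather": {"weather.com": 120.0, "wunderground.com": 40.0},
        "weather seattle wa": {"weather.com": 60.0, "wunderground.com": 25.0},
        "seatle weather": {"weather.com": 30.0},
        "mariners tickets": {"mlb.com": 80.0},
    }
    print([q for q, _ in similar_queries(graph, "seattle weather")])
    # prints ['weather seattle wa', 'seatle weather']
```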
The research paper, nominated as one of the best at the conference, is one of 16 Microsoft Research papers accepted there, about 15 percent of the total and more than from any other participating organization.
The work was done by Aleksandra Korolova, a Stanford University student working as a Microsoft Research intern, along with researchers Krishnaram Kenthapadi, Nina Mishra and Alexandros Ntoulas of Microsoft Search Labs in Mountain View, Calif.
They created the algorithm based on what's known as the differential privacy definition. "In a nutshell," they write, "the definition states that upon seeing a published data set an attacker should gain little knowledge about any specific individual."
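For readers who want the formal version, below is the textbook statement of ε-differential privacy, plus the Laplace mechanism commonly used to satisfy it. The paper's own variant may relax this somewhat (for example, by allowing a small failure probability δ), so treat it as background rather than as the paper's exact definition.

```latex
% A randomized algorithm \mathcal{A} is \varepsilon-differentially private if,
% for all data sets D_1, D_2 that differ in one individual's data
% and for every set S of possible outputs:
\Pr[\mathcal{A}(D_1) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{A}(D_2) \in S]

% Laplace mechanism: adding noise scaled to the sensitivity \Delta f
% (the most one individual can change f) gives \varepsilon-differential privacy:
\mathcal{A}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad \Delta f = \max_{D_1, D_2} \lVert f(D_1) - f(D_2) \rVert_1
```

A smaller ε forces the two probabilities closer together, which is the formal sense in which an attacker "should gain little knowledge about any specific individual."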
But for now, at least, the company isn't preparing to implement the approach itself. A Microsoft spokesman says in an email that the company "currently does not have any plans to use the capabilities found through this research in its products and services."
Todd Bishop is co-founder and managing editor of TechFlash. He has covered Microsoft and the technology industry for more than five years, most recently as a daily newspaper reporter and blogger based in Seattle.