Tuesday, February 23, 2010

Data-Driven, rather than Hypothesis-Driven

Article at Wired.com about Google's reliance on data

Instead of using a semantic framework to build up a theory of language, Google mines its massive trove of data to find contextual word associations.

As Google crawled and archived billions of documents and Web pages, it analyzed what words were close to each other... "Today, if you type 'Gandhi bio,' we know that bio means biography," Singhal says. "And if you type 'bio warfare,' it means biological."
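To make the idea of contextual word associations concrete, here is a minimal sketch. It is not Google's actual method. It counts which words appear within a small window of each other in a toy corpus. Then it resolves the ambiguous "bio" by picking whichever candidate expansion co-occurs more often with the neighbouring word. The corpus, the window size of 2, and the helper names `cooccurrence_counts` and `expand_abbreviation` are all hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(documents, window=2):
    """Count how often each word appears within `window` positions of another word."""
    counts = defaultdict(Counter)
    for doc in documents:
        words = doc.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word itself
                    counts[word][words[j]] += 1
    return counts


def expand_abbreviation(context_word, candidates, counts):
    """Pick the candidate expansion that co-occurs most often with the context word."""
    return max(candidates, key=lambda c: counts[context_word.lower()][c])


# Hypothetical toy corpus; a real system would mine billions of pages.
corpus = [
    "gandhi bio and biography of his early life",
    "a short biography of gandhi",
    "bio warfare and biological weapons treaty",
    "biological warfare research",
]

counts = cooccurrence_counts(corpus)
candidates = ["biography", "biological"]
print(expand_abbreviation("Gandhi", candidates, counts))   # biography
print(expand_abbreviation("warfare", candidates, counts))  # biological
```

No grammar or dictionary of meanings is involved. The "right" expansion simply falls out of which words tend to appear together, which is the point of the data-driven approach.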

Want to introduce a new feature? Forget focus groups or leaving the decision to management: run experiments on actual users!

But Google also has a larger army of testers — its billions of users, virtually all of whom are unwittingly participating in its constant quality experiments. Every time engineers want to test a tweak, they run the new algorithm on a tiny percentage of random users, letting the rest of the site’s searchers serve as a massive control group.
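The step of sending a tiny percentage of random users to the new algorithm can be sketched as follows. This sketch assumes users are split by hashing a user id together with an experiment name. The function `assign_arm`, the 1% treatment fraction, and the experiment name are illustrative assumptions, not details from the article. Hashing, rather than drawing a fresh random number on each visit, keeps a given user in the same group every time while still spreading users effectively at random.

```python
import hashlib


def assign_arm(user_id, experiment, treatment_fraction=0.01):
    """Deterministically assign a user to 'treatment' or 'control'.

    Hashing the experiment name together with the user id gives each user a
    stable, effectively random bucket in [0, 10000), so repeat visits land in
    the same arm and different experiments split users independently.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10_000
    return "treatment" if bucket < treatment_fraction * 10_000 else "control"


# Hypothetical experiment over 100,000 simulated user ids.
arms = [assign_arm(f"user-{i}", "new-ranking-tweak") for i in range(100_000)]
share = arms.count("treatment") / len(arms)
print(f"treatment share: {share:.3%}")  # close to 1%
```

Everyone not bucketed into treatment automatically forms the control group, mirroring the setup described in the quote.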

Blog post about data-driven versus hypothesis-driven science

The new data-driven approach suggests that we collect data first, then see what it tells us.

More information can be found in a previous blog post.
