Google and Facebook: Algorithmic and Social Search
Jeff Atwood recently posted about the declining quality of Google’s search results. After citing issues with spam sites and content farms out-ranking legitimate sites (like his own Stack Overflow), he concludes by asking, “Is the next generation of search destined to be less algorithmic and more social?”
I think the stark contrast he draws between algorithmic search and social search is misleading. Google’s algorithm is already social: it treats a link as a vote of approval. That’s not so different from Hacker News or Reddit, where more votes generally mean a higher ranking. The problem is that many of those votes are not trustworthy or valuable. So Google’s real problem, just like Hacker News’s problem, is maintaining a community of good voters.
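To make the links-as-votes analogy concrete, here is a minimal sketch in Python of vote-based ranking over invented data: every inbound link counts as a vote, and votes from pages that themselves attract votes count for more. It is a toy PageRank-style iteration, not a claim about how Google actually ranks anything, and every site name in it is made up.

```python
# A toy "links as votes" ranker: every outbound link is a vote for the target,
# and votes from pages that themselves receive many votes count for more.
# This is a simplified PageRank-style iteration over invented data, not
# anything resembling a real ranking pipeline.

links = {  # page -> pages it links to (all hypothetical)
    "blog.example": ["stackoverflow.example", "spamfarm.example"],
    "forum.example": ["stackoverflow.example"],
    "spamfarm.example": ["spamfarm.example"],  # votes for itself
    "stackoverflow.example": ["blog.example"],
}

pages = list(links)
score = {p: 1.0 / len(pages) for p in pages}
damping = 0.85

for _ in range(50):  # iterate until the scores settle
    new_score = {p: (1 - damping) / len(pages) for p in pages}
    for voter, targets in links.items():
        if not targets:
            continue
        share = damping * score[voter] / len(targets)
        for target in targets:
            new_score[target] += share
    score = new_score

for page, s in sorted(score.items(), key=lambda kv: -kv[1]):
    print(f"{page:25s} {s:.3f}")
```

Notice that the spam site’s self-link lets it hold on to score in this toy model, which is exactly the untrustworthy-voter problem: the votes are counted, but nothing checks whether the voter deserves a vote.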
But defining a “good community member” is a hard problem, a hard philosophical problem. What is needed is a set of necessary and sufficient conditions for being a good community member, and, harder still, a set of conditions that can be measured. We can start by asserting that a good community member links to good websites and/or produces good content. What is a good website, and what is good content? Those are also hard problems; let’s just say good is some combination of accurate, informative, useful, interesting, etc., and move on.

Is a lot of unique visitors sufficient for being a good community member? It doesn’t seem so: the content farms and spam sites get plenty of uniques. (Largely from Google itself, but traffic from Google shouldn’t factor into any ranking decision, because it would be circular: more traffic from Google leads to higher rankings, which leads to more traffic from Google. You wouldn’t get a meaningful ranking that way.) If high traffic is not sufficient, is it at least necessary? Surely not; someone could start a site with nothing but excellent links and content and get hardly any traffic. Many factors determine how much traffic a site gets, and good content and links are only one of them. So if traffic is neither necessary nor sufficient, is it a good surrogate for quality? That is, do higher-traffic sites tend to be higher quality? If they do, then traffic could be used for ranking; a sketch of that kind of check follows below. This process has to be repeated for every measurable property of a website: maybe sites that use certain words or phrases tend to be higher quality, or maybe sites with more Twitter mentions are better, and so on.
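One way to frame that last question is to treat each candidate signal (unique visitors, Twitter mentions, certain phrases) as a feature and measure how well it tracks some hand-labeled notion of quality. The sketch below, with entirely invented numbers and site names, computes a plain correlation between one signal and human quality ratings; it is meant to illustrate the evaluation loop, not any real dataset or ranking system.

```python
# Sketch: check whether a candidate signal (here, monthly unique visitors)
# is a usable surrogate for quality, using hand-labeled quality ratings.
# All sites and numbers below are invented for illustration.

sites = [
    # (site, monthly uniques excluding search-engine referrals, editor rating 1-5)
    ("stackoverflow.example", 900_000, 5),
    ("niche-blog.example", 4_000, 5),      # excellent content, little traffic
    ("contentfarm.example", 600_000, 1),   # lots of traffic, junk content
    ("forum.example", 80_000, 4),
    ("spamfarm.example", 300_000, 1),
]

def pearson(xs, ys):
    """Plain Pearson correlation; closer to 1.0 means the signal tracks quality."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

traffic = [t for _, t, _ in sites]
quality = [q for _, _, q in sites]
print(f"traffic vs. quality correlation: {pearson(traffic, quality):.2f}")
```

On these invented numbers the correlation comes out near zero, which is the point of the exercise: a signal only earns a place in the ranking if a check like this comes out strongly positive.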
Facebook skirts this issue by defining a good community member as your friend. The problem with this is two-fold: friends do not always produce good links, and nobody has enough friends to provide links to all the long-tail content they’d want. Solving the first problem is going to require some filtering, either by algorithm or by hand; maybe they will ask you to rate your friends’ interestingness and promote the interesting friends’ links (a sketch of that idea follows below). Solving the second problem is going to make the first problem far, far worse: if you had enough friends to provide you with all the links you’d want, you’d have too many friends and never find any of the links you’d want.
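If Facebook ever did ask for interestingness ratings, the filtering step might look something like the toy below: weight each shared link by the sharer’s rating and only surface links above a cutoff. All of the names, URLs, ratings, and the cutoff itself are invented for illustration.

```python
# Sketch of the "rate your friends, promote the interesting friends' links" idea:
# each shared link inherits the sharer's interestingness rating, and only links
# above a cutoff surface in the feed. Names and numbers are invented.

interestingness = {"alice": 5, "bob": 2, "carol": 4}  # your ratings, 1-5

shared_links = [
    ("alice", "https://example.com/great-essay"),
    ("bob", "https://example.com/yet-another-meme"),
    ("carol", "https://example.com/useful-howto"),
    ("bob", "https://example.com/chain-letter"),
]

CUTOFF = 3  # only show links from friends rated above this

feed = [(friend, url) for friend, url in shared_links
        if interestingness.get(friend, 0) > CUTOFF]

for friend, url in feed:
    print(f"{friend}: {url}")
```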
The winner in search is going to be whoever figures out what a good community member is and how to measure it.