Why meme tracker/diggers suck
There has been much conjecture lately about memetrackers, or the new nomenclature of memediggers (just call them Digg clones and be done with it, Pete), how they all look the same, how they encourage snark over substance, and how they are unsatisfyingly static to people who prefer river-of-news formats.
The problem is not with the theory of clustered news. It is with the application. To be more specific, it's not that everyone is doing it wrong; it's that scale prevents them from doing it best. When you see the front page of Memeorandum or Tailrank or any other clustered news service, you are seeing a flat text file, or a pre-ordered result from a small database table, which in itself takes little or no processing power on the server side. That's not to say it is simple to create that cluster, far from it. Clustering algorithms require a great deal of processing on the server. To give you a blindingly fast load time on that page, those algorithms are not run in real time. Their results are precomputed and written to that text file or small table every X minutes so that you have a seamless reading experience.
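To make the precompute-then-serve pattern concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `cluster_stories` is a toy stand-in for a real (and far more expensive) clustering pass, and `FrontPageCache` is a hypothetical name, not how Memeorandum or Tailrank actually work.

```python
import time

def cluster_stories(links):
    """Toy stand-in for an expensive clustering pass (hypothetical).
    Groups blog posts by the story they link to, biggest group first."""
    clusters = {}
    for post, target in links:
        clusters.setdefault(target, []).append(post)
    # Sort by cluster size (descending), then by name for a stable order.
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))

class FrontPageCache:
    """Holds a precomputed snapshot; page loads never trigger clustering."""

    def __init__(self, rebuild_every_s, clock=time.time):
        self.rebuild_every_s = rebuild_every_s  # the "every X minutes" interval
        self.clock = clock
        self.snapshot = []
        self.built_at = None
        self.rebuilds = 0

    def maybe_rebuild(self, links):
        """Meant to be called by a background job, not by a page request."""
        now = self.clock()
        if self.built_at is None or now - self.built_at >= self.rebuild_every_s:
            self.snapshot = cluster_stories(links)
            self.built_at = now
            self.rebuilds += 1

    def render(self):
        """Cheap read of the stored snapshot: the 'flat text file'."""
        return [f"{target}: {len(posts)} posts" for target, posts in self.snapshot]
```

The trade-off is visible in `render`: it is fast precisely because it cannot reflect anything that happened since the last rebuild.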
This is the heart of the problem that users are experiencing. In a perfect world with unlimited processing power and zero latency, readers would have a bunch of sliders on the client with AJAX-powered dynamic result updating for things like immediacy, meme size, link numbers, article length, orthogonality to your OPML file, and anything else that can be thought of as a filtering variable. Unfortunately, we are stuck in the real world, where the best that can be hoped for is a standard 24-hour window and fixed algorithmic relationships that vary only slightly from aggregator to aggregator - and that form much of the competitive differentiation of the various players.
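As a rough sketch of what those sliders would mean on the server, the function below treats three of them (immediacy, meme size, link numbers) as parameters applied per request. The `Cluster` fields and the `filter_clusters` name are hypothetical, and the sketch is deliberately the cheap case: it only filters clusters that already exist. A slider that changed how stories are grouped would force the clustering itself to be rerun for every reader, which is the part that does not scale.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    headline: str
    age_hours: float     # "immediacy": hours since the lead story appeared
    posts: int           # "meme size": number of posts in the cluster
    inbound_links: int   # "link numbers": links pointing at the cluster

def filter_clusters(clusters, max_age_hours=24.0, min_posts=1, min_links=0):
    """Apply slider values to precomputed clusters, most-linked first.
    The defaults mimic the fixed 24-hour window described above."""
    kept = [
        c for c in clusters
        if c.age_hours <= max_age_hours
        and c.posts >= min_posts
        and c.inbound_links >= min_links
    ]
    return sorted(kept, key=lambda c: c.inbound_links, reverse=True)
```

With the defaults, every reader gets the same 24-hour view; calling `filter_clusters(clusters, max_age_hours=48, min_posts=5)` is the sort of per-reader variation the sliders would allow.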
The obvious next step is not necessarily to shift from one static view to another, as Kevin Burton has done with Tailrank at Dave Winer's suggestion. That's not to diss Kevin though, as more than anyone he has led from the front in the battle to escape the legacy of being a Memeo clone, something that none of the Digg clones have managed to do (or seem to want to do). No, the only way to be truly free of the baggage of the hegememeony is to get bought out by one of the major players who have humungous server farms to devote to the scale problem: i.e. the big G, since they're the ones with the best iron. If one of the tracker/diggers had access to enough parallel cycles to do justice to their algos, in the same way PageRank is crunched by football-field-sized farms of corkboard-insulated droneboxen... then you would have something. More than in most other Web 2.0 startup sectors, the memetrackers are crying out to be flipped because they would benefit so crucially from the toys that the GEMAYA sugar daddies can let them play with.
Ah, you say, but what about Google News? No. GN is not a long tail play. It covers only a small slice of the media, namely the major outlets, and the major outlets have a very narrow focus. GN does not use anywhere near the full power of the company's vast server farms. Judging by the size of its database, I would estimate (© PNOOMA Research) its scraping could conceivably be performed on a handful of boxes. The independent memetrackers are different beasts that could eat GN for lunch if just given enough room to grow into something that doesn't suck.
My picture appears to the right in an effort to get it to appear via Memeorandum's new picture clipping feature. Link me early and often!
1 Comment:
Great article.