Monday, November 28, 2005

Clone the Memeorandum API

Yes, I'm being flippant, mainly because Memeorandum doesn't even have an API yet. As founder Gabe Rivera says in the comments of a Scobleizer thread in cheerful response to a call for an open API:

Still, it’s not like I can just flip a switch to activate it. It will cost man-years of development to make it work well, and for various technical reasons, I think it will benefit a smaller number of people than most believe. I expect to make gradual progress toward that capability, but I don’t see the business justification for jumping in headfirst…

Man-years? How complex is the Memeorandum algorithm, anyway? I was looking over an old Larry Page PowerPoint linked by Ben Barren the other day and came across the original PageRank equation. Now that's some math, right there. Hoo boy. Does Gabe's algorithm similarly involve matrices, eigenvectors and differential equations? My feeling is no (although I expect Gabe will pop up in the comments to put me straight).

At this stage, I don't think Tinfinger's search result ranking algorithms will be anything near as complex. You might say that would mean Google's algorithms will be inherently better than ours, and I wouldn't argue. However, I think that there is a market for a search engine with not just an open API, but an open algorithm. For that algorithm to be accessible to a majority of people, it can't be anything more complex than addition, subtraction, multiplication and division. Or, if calculus is unavoidable for some reason, build it so that users of your service can adjust a bunch of easily understandable sliders to influence your equation.
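To make the slider idea concrete, here is a minimal sketch in Python. It is not Tinfinger's or Memeorandum's actual algorithm: the signals (inbound_links, age_hours, comments), the slider names and the sample data are all made up for illustration. The only point is that a ranking built from addition, multiplication and division can be exposed to users as a handful of weights.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A candidate search result with a few hypothetical raw signals."""
    title: str
    inbound_links: int   # how many other pages link to it
    age_hours: float     # hours since it was published
    comments: int        # how many comments it attracted


def score(item, sliders):
    """Combine the signals using only + * / and the user's slider weights."""
    freshness = 1.0 / (1.0 + item.age_hours)   # newer items score closer to 1
    return (sliders["links"] * item.inbound_links
            + sliders["freshness"] * freshness
            + sliders["comments"] * item.comments)


def rank(items, sliders):
    """Return the items sorted from highest to lowest score."""
    return sorted(items, key=lambda it: score(it, sliders), reverse=True)


# Invented sample data, purely for illustration.
items = [
    Item("Old but heavily linked", inbound_links=40, age_hours=99.0, comments=2),
    Item("Brand new, few links", inbound_links=3, age_hours=0.0, comments=1),
]

# Two slider settings: one favours links, the other favours freshness.
by_links = rank(items, {"links": 1.0, "freshness": 0.0, "comments": 0.0})
by_fresh = rank(items, {"links": 0.0, "freshness": 1.0, "comments": 0.0})
print([it.title for it in by_links])
print([it.title for it in by_fresh])
```

Moving a slider changes the order of the results without the user needing to know anything beyond what each weight means, which is the kind of accessibility I have in mind.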

This is what I don't understand about the clamour for Gabe to open up his API. It shouldn't be that hard to rebuild what he built, especially for those open source hackers. Hell, I'm not even a trained programmer and I can see how simple it should be (although maybe that tells me I'm too stupid to foresee the complexity in practice). We're doing something similar for Tinfinger, focused on a different set of assumptions, but the principles are the same. This is in no way to denigrate what Gabe has achieved, for to devise the system in the first place is a far greater feat than to copy it.

On the subject of algorithmic complexity, I've been reading Paul Graham's essays a lot lately (highly recommended for budding startups), and his piece on Undergraduation contains this interesting snippet:

When I was in college, a lot of the professors believed (or at least wished) that computer science was a branch of math. This idea was strongest at Harvard, where there wasn't even a CS major till the 1980s; till then one had to major in applied math. But it was nearly as bad at Cornell. When I told the fearsome Professor Conway that I was interested in AI (a hot topic then), he told me I should major in math. I'm still not sure whether he thought AI required math, or whether he thought AI was nonsense and that majoring in something rigorous would cure me of such stupid ambitions.

In fact, the amount of math you need as a hacker is a lot less than most university departments like to admit. I don't think you need much more than high school math plus a few concepts from the theory of computation. (You have to know what an n^2 algorithm is if you want to avoid writing them.) Unless you're planning to write math applications, of course.

That is interesting in the context of the Google guys' Stanford background, and what I've been reading recently over at Xooglers about Google's obsession with hiring the smartest. Hit the back button on that PageRank slide and you come to Larry Page's word for the echo chamber: the Cyclotron. I don't pretend to know much of anything going on in that PageRank equation - I'm way too stupid to qualify to get hired by Google - but I do know that that equation's complexity is an attempt to solve the problem of the Cyclotron, among other things. One of the unsolved mysteries of Tinfinger, for me, will be seeing just how true Paul Graham's words are for my business. Are search engines just "math applications"? They certainly can be pure math, but does the math have to be so impenetrably difficult that they can't be opened up in an accessible interface to the general public to play with as they see fit?


Blogger DevilsRejection said...

Hehe, yes I was the one who mentioned the fact that the API should be made. Damn fine article, maybe with more pressure this will happen faster!

2:04 pm, November 28, 2005  
Anonymous Anonymous said...

The current memeorandum algorithm is beside the point. The proposed API is a large-scale distributed service, like bloglines and rojo, but with far more computational requirements. Given the man-years sunk into those two unprofitable services, both of which continue to have performance problems, the API idea doesn't look too attractive to me.

2:14 pm, November 28, 2005  
Blogger DevilsRejection said...

It wouldn't hurt applying for a position at either GYM ;-)

2:30 pm, November 28, 2005  
Blogger Nick Lothian said...

Does Gabe's algorithm similarly involve matrices, eigenvectors and differential equations?

I'd bet it does use at least matrices and eigenvectors. (I've written a thing kind of like Memeorandum. Pretty much any time you are doing similarity algorithms you need to use eigenvectors)

WRT an open algorithm - have you seen the "Result Ranking" thing on MSN search? MSN uses a neural network for ranking so the algorithm is probably even more complex than PageRank (or its successors). That doesn't mean that the parameters can't be human-tweakable, though. See, too.

2:33 pm, November 28, 2005  
Blogger Paul Montgomery said...

From what I read of Stefan's request, it sounded more like reusable code fragments, kind of like an open source meme engine. I could certainly understand why THAT would not sound attractive to someone who's put all the work in.

I was really talking more about the algorithm in any case, which is why I was upfront about the title being flippant.

I'm no API expert, Gabe, so I can't comment on its attractiveness or otherwise in relation to your site, beyond that there seems to be a big hump between developing the algorithm and delivering it on a scale large enough to support a popular open API. That's where extra capital comes in, evidently, as Stefan suggests (evilly!).

2:36 pm, November 28, 2005  
Blogger Danny said...

Paul Graham appears to have rather a narrow view of mathematics (which may reflect that of CS departments). What about logic, the operational semantics of code, the declarative semantics of markup, the relational model... The whole of computer science *including programming* is built on math. Sure, you may only need little bits of the theory in practice (the tools wrap the theory) but it's there underneath all the same.

That it took Gabe a year to build Memeorandum suggests that building working software is hard work, not that there might necessarily be hard sums involved.

6:21 pm, November 28, 2005  
