Australian entrepreneur with FanFooty (alive) and Tinfinger (dead) on his CV. Working on new projects, podcasting weekly at the Coaches Box, and trying not to let microblogging take over this blog.

Monday, September 24, 2007

From Apache to lighttpd and back again

Reading the fascinating insights from cdbaby.com's Derek Sivers on switching from PHP to Ruby and back to PHP has encouraged me to blog about something I've been thinking of blogging for a while now: our departure from Apache to lighttpd as web server of choice for FanFooty during the last three months and our very recent flight back to Apache.

Apache was pretty much a default for our sites to begin with, as the A part of LAMP. However, we have had increasing problems with FanFooty scaling up to meet the demands of huge numbers of concurrent users. By far FanFooty's most popular pages are the live match fantasy scoring pages - for example the one for the Geelong v Collingwood game this weekend - which after many battle-hardening iterations have been reduced to a flat HTML page that uses AJAX to fetch a single comma-delimited text file every 30 seconds. No PHP, no MySQL, just flat text files.
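For anyone wanting to try the flat-file approach, here is a minimal sketch of the server-side half: writing the comma-delimited file so that a browser polling it never picks up a half-written copy. This is an illustration under assumptions, not FanFooty's actual code; the function name, file path and column layout are all hypothetical.

```python
import csv
import os
import tempfile


def write_scores_atomically(path, rows):
    """Write rows as comma-delimited text, then swap the file into place.

    Hypothetical helper: the new contents go to a temporary file in the
    same directory, and os.replace() renames it over the live file so
    readers see either the old version or the new one, never a partial one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as tmp:
            csv.writer(tmp).writerows(rows)
        # mkstemp creates the file readable only by its owner; loosen the
        # permissions so a web server running as another user can serve it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    # Made-up rows: player label, fantasy points.
    write_scores_atomically("scores.txt", [["Player A", 120], ["Player B", 98]])
```

The web server then only ever has to hand out one small static file, which is the whole point of the design.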

Even this pared-down architecture was showing distinct signs of overloading on our one solitary server halfway through the 2007 AFL season, with frequent crashing during peak load times. I don't know exactly how many concurrent users we had, but each match day the site was accessed by 12,000 unique browsers, so I would estimate that load on each live match fantasy scoring page would have been many thousands of requests for the exact same file every 30 seconds... for a couple of hours straight.

Seen at right, our traffic graphs were the same for every game: a smooth ramp up during each game culminating in a big spike at the end of each game as fantasy coaches checked the final scores. Some days there would be at least one game on continuously for eight straight hours, with three or four spikes.
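To put rough numbers on that kind of load, here is a back-of-envelope sketch. The 30-second polling interval and the two-hour match length come from the description above; the concurrent viewer counts are purely hypothetical, since the real figure was never measured.

```python
POLL_INTERVAL_SECONDS = 30  # each open page fetches the text file this often


def requests_per_second(concurrent_viewers, poll_interval=POLL_INTERVAL_SECONDS):
    """Average request rate if every viewer polls once per interval."""
    return concurrent_viewers / poll_interval


def requests_per_match(concurrent_viewers, match_hours=2.0,
                       poll_interval=POLL_INTERVAL_SECONDS):
    """Total requests for the file over one match at a steady audience."""
    polls_per_viewer = int(match_hours * 3600 / poll_interval)
    return concurrent_viewers * polls_per_viewer


if __name__ == "__main__":
    # Hypothetical audience sizes, not measured figures.
    for viewers in (1000, 3000, 6000):
        print(viewers, "viewers:",
              requests_per_second(viewers), "req/s,",
              requests_per_match(viewers), "requests per match")
```

Even a modest audience adds up quickly under these assumptions: 3,000 viewers polling every 30 seconds is a steady 100 requests per second before the end-of-game spike is counted.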

Something had to give. We examined the state of our server during those peak load times, and found that the main problem was that Apache was starting too many processes and running out of RAM. We tried doubling the RAM on our server, but that was an expensive process and not a long-term solution.
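The arithmetic behind that failure mode can be sketched as follows, assuming a process-per-connection setup in which every Apache process holds a roughly fixed amount of memory. All of the figures are hypothetical; they are not measurements from the FanFooty server.

```python
def max_safe_processes(total_ram_mb, reserved_mb, per_process_mb):
    """How many server processes fit in RAM before the box starts swapping.

    total_ram_mb   -- physical memory on the server
    reserved_mb    -- memory set aside for the OS and everything else
    per_process_mb -- approximate resident size of one web server process
    """
    usable = total_ram_mb - reserved_mb
    if usable <= 0 or per_process_mb <= 0:
        return 0
    return int(usable // per_process_mb)


if __name__ == "__main__":
    # Hypothetical figures: 20 MB per process, 512 MB reserved.
    print(max_safe_processes(2048, 512, 20))
    print(max_safe_processes(4096, 512, 20))
```

Under these assumptions the ceiling on simultaneous connections grows only in step with the RAM, which is why buying more memory raises the limit without removing it. A cap such as Apache's MaxClients directive is the usual way to keep the process count below this number.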

To cut a long story short: we heard of lighttpd, read some reviews, tested it, liked the concurrent user benchmarks we were getting on our box, tried it, and it worked a treat. The only teething problems came from having to rewrite Apache's .htaccess rules into lighttpd's mod_rewrite rules: regular expressions are tricky for a self-taught rube like me. Once it was up and running, all of our scaling problems went away, and in-game crashes almost disappeared.

I say almost, because lighttpd, for all of its speed and efficiency, has one glaring flaw: memory leaks. The hallmark of lighttpd is its low memory footprint compared to Apache, but to achieve that it has sacrificed a lot of Apache's management subroutines, one of the most important being how Apache adapts to the changing rate of requests over time. Through painful personal experience, I can say that lighttpd has serious problems with managing its memory footprint, in that it seems incapable of reducing its memory usage after a high load period. The lighttpd application seems to creep ever upwards in memory usage, ever so slightly over many hours, until it fills all available RAM and brings the server to a standstill. While the high load period is ongoing there is nothing better than lighttpd for handling thousands of concurrent users, especially for static content like text and picture files, but when the party's over lighttpd can't seem to shake off the hangover.

This is why we have switched back to Apache, now that the AFL football season is all but over. Having to manually restart the lighttpd application multiple times per day is a pain in the arse I don't want to have to deal with. I'm putting co-founder Tai on the task of building a remote monitoring application to run on our local boxes. We're even investigating the possibility of a "dual-boot" server, if you will, which would allow us to switch between Apache and lighttpd as traffic conditions dictate (EDIT: this link looks promising). We're trying to build Tinfinger with a similar reliance on flat text files (through a religious zeal for caching) so that when the TechCrunch traffic spikes hit we'll be able to handle them with lighttpd.
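One way to take the manual restarts out of the picture is a small watchdog that checks the server's memory use and flags when it has crept past a limit. The sketch below is an assumption-laden illustration, not the monitoring tool mentioned above: it reads the VmRSS line that Linux exposes in /proc/<pid>/status, the 500 MB limit is an arbitrary example, and the restart command itself is left out because it depends on how the server was installed.

```python
import re


def parse_rss_kb(status_text):
    """Extract resident memory in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def should_restart(rss_kb, limit_kb):
    """True when memory use is known and has grown past the limit."""
    return rss_kb is not None and rss_kb > limit_kb


def read_status(pid):
    """Read the status file for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return handle.read()


if __name__ == "__main__":
    import os

    LIMIT_KB = 500 * 1024  # hypothetical threshold: 500 MB
    # Checks this script's own process as a stand-in for the web server's pid.
    rss = parse_rss_kb(read_status(os.getpid()))
    print("resident kB:", rss, "restart needed:", should_restart(rss, LIMIT_KB))
```

Run from cron every few minutes against the web server's pid, a check like this could trigger a restart during a quiet period instead of waiting for the box to grind to a halt.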

I don't know that many Web applications would have similar traffic patterns and system architectures to FanFooty, but hopefully the above is helpful to someone who is in the position I was in three months ago and looks for guidance online about what to do in the face of massively spiky traffic problems.

Wednesday, September 19, 2007

Exclusive: Rich Skrenta's new startup Blekko

Rich Skrenta, recently departed CEO of Topix, has been in the news in the last couple of days due to the 25th anniversary of the release of the Elk Cloner virus he wrote. An interesting little snippet at the end of the AP story is about his new project, which is described as "Blekko Inc., a month-old startup still working in stealth mode".

After I contacted Rich via IM and grilled him using all my powers of investigative journalism and torture techniques, he finally admitted to me what Blekko was all about, and I can exclusively reveal it to the two or three of you who still subscribe to this feed.

Blekko is, in fact, a return to Skrenta's roots. The codebase is in its infancy... well, more like it's a couple of zygotes really. However, the intention is clear, the target has been set, and the infection vectors have been mapped.

Blekko is going to innovate strongly in the hot field of viral adoption. This will be helped greatly by the fact that the main Blekko product will, in fact, be a virus. But not just any sort of virus. It will be the first virus specifically designed to infect iPhones.

The name of the company gives the final clue as to what Blekko will do once it has infected iPhones, as a cursory googling will attest. Obviously Skrenta has had this planned for a decade or more, with the first seeds for the concept planted back in the days when telnet was still a new technology.

For those few of you who haven't guessed by now: yes, the Blekko virus will endlessly loop the Star Wars ASCII movie on iPhones it infects. All of the venture capital Skrenta has raised will go towards finishing this massive project, which after more than 10 years is still a long way from completion.

I wish Rich all the very best of luck in his endeavours, and look forward to seeing the looks of horror on the faces of those self-absorbed, over-cashed iPhone buyers as their beloved toys get bricked by Obi-Wan.

Exclusive screenshot of Blekko in action!

Update: explicit confirmation of this story has been provided by Rich Skrenta himself, as a quick look at blekko.com will demonstrate.

Sunday, September 16, 2007

The search engine is dead, part II

I have attacked Charles Knight before on this blog, but I am about to do so again. His Read/Write Web affiliate blog called Alt Search Engines recently published three articles trying to define the market from which he evidently wants to suck blood like a remora. "What is a Search Engine?" was the first effort, "What is Not a Search Engine?" was the second, and the third, "What is an “Alternative” Search Engine?", was the only one written by Knight himself.

In short, the whole exercise is a joke. The first two articles provide a somewhat serious look at how to define a search engine, but Knight pretty much admits with a straight face that he thinks a search engine is whatever is going to make him money. The clincher is that even though Quintura is specifically mentioned in the second article as not being worthy of being called a search engine because it does no webcrawling of its own, Knight still claims it is, and recently named it Search Engine Of The Month... and who is that advertising on the right of the page? Why, that looks like a Quintura ad! How could that have gotten there? Fancy that.

(screenshot modified according to the uncov school of aggression.)

The fact that such a blatantly self-promoting, ignorant shill can even survive means that the search engine industry is dead. His obvious pandering to potential advertisers and consultancy targets might be okay if he had any idea what he was talking about, but Knight admits openly that he treats the industry like he's a movie critic with zero credentials. Knight's ethos of interface over backend, style over substance, is the sort of thing that is giving the concept of the search engine a bad name. If you can code up a PHP front-end to someone else's crawl with pastel colours, rounded corners and tag clouds, Knight will call you a search engine.

Charles Knight is the Grim Reaper of the search engine dinner party, standing in the middle of the table and pointing his bony finger at eaters of the salmon mousse. Richard MacManus, this joker makes your operation look amateurish. He is only getting worse and worse.

Sunday, September 02, 2007

The search engine is dead, part I

I have decided that Tinfinger is not a search engine. It never was one, really, even though before now I had displayed the bold claim in the site's tagline: "Human search engine". I have come to the conclusion that I was sadly misguided.

There is no search engine but Google. There is no use fighting it. That is, unless you are skilled in the very techniques which have made Google what it is, and not the sort of thing that gets you a stinging rebuke from the lads at uncov, as PeekYou did this week.

They don't even bother to parse the query! Spock can even get off their lazy ass and do that. Sure, with names it's never as simple as a call to split(), or, uh, in PHP shall we say explode(), but at least try. This embarrassing lack of parsing suggests that their search is backed by MySQL, and the queries look like this:

SELECT * FROM Table WHERE FirstName="$firstName" AND LastName="$lastName";

I can't make the point any more clear that this is not a search engine. Search engines add value to a data store with intelligent parsing and ranking. If I wanted to look up people like this, I'd use the fucking phone book.
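For a sense of what "at least try" might mean, here is a naive sketch of parsing a free-text name query before it goes anywhere near a database. It is a hypothetical illustration only: the function name and the rules it applies (collapse whitespace, accept "Last, First", treat leftover words as middle names) are assumptions, and real-world names break every one of them sooner or later.

```python
def parse_name_query(query):
    """Split a free-text name query into first, middle and last parts."""
    query = " ".join(query.split())  # collapse stray whitespace
    if not query:
        return {"first": "", "middle": "", "last": ""}
    if "," in query:
        # "Last, First Middle" form
        last, _, rest = query.partition(",")
        last = last.strip()
        parts = rest.split()
    else:
        # "First Middle Last" form; a single word is treated as a first name
        parts = query.split()
        last = parts.pop() if len(parts) > 1 else ""
    first = parts[0] if parts else ""
    middle = " ".join(parts[1:])
    return {"first": first, "middle": middle, "last": last}


if __name__ == "__main__":
    for text in ("  Rich   Skrenta ", "Skrenta, Rich", "John Q Public"):
        print(repr(text), "->", parse_name_query(text))
```

Even this much lets one search box accept what a phone book form would need separate fields for, though ranking the results is a separate and much harder problem.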

PeekYou, and its competitors Spock, Wink and the like... ARE the phone book. That's all they are. Specifically, they are phone books for social networks. Phone books are, by their nature, dumb. In a structural sense, I mean. Sure, they have utility, and every once in a while you'd like to have them around, but they're not very complex. The US White Pages includes separate form input boxes for each of its fields. So does the Australian White Pages.

Is that a search engine? No, I don't think so. Just because the site allows you to search a database doesn't make it a search engine, in my eyes.

Similarly, there was a stink a little while ago about Robert Scoble announcing that a gestalt entity comprised of Facebook, Techmeme and Mahalo was going to kill Google in four years' time. As punishment for the obvious linkbaiting, I choose to perform a gillmor (a verb meaning "to not link a vital underpinning part of one's argument, assuming that the reader knows what one is talking about, and additionally, fuck you readers, you little worms"). Anyway, so behind all the handwaving and whiteboard engineering, Scoble seemed to think that Mahalo's search results had the potential to become better than Google's.

Is Mahalo a search engine? No, I don't think so. The front page is rendered in a directory structure, and while there is a search box for the site, once you stray past the minority of authored pages, you're not in Kanmahalosas any more. Essentially, it's a bunch of link whitelists.

Obviously I don't think Wikipedia or Citizendium are search engines either; they are encyclopedias. The point I'm trying to make is that we need a new vocabulary for describing what we now call search engines, because as the uncov boys point out so cuttingly, many of the startups that currently bask in the status of being called a search engine are no more worthy of the name than the dead tree doorstop that used to get delivered to your home each year. Persai, the project being worked on by uncov denizens Ted, Kyle and Matt, is a real search engine, or at least it promises to be.

So, what do I think Tinfinger actually is? Well, apart from the fact that it's going to have a social networking element tacked on, and a Techmeme-style news aggregation feature (which made me wonder if Scoble had hax0red my bizplan), I think it's going to be an omnibus. By that, I mean it will be a collection of articles about a subject. It's not a very sexy word, but there it is.

You may be wondering: what does it matter what we call them? I think how these ventures think of themselves is critical to their success. I shall explain in my next blog post.