Thursday, January 24, 2008

All your browsers are belong to X-UA-Compatible

An article this week on the Web design bible A List Apart lets us know the latest plans by Microsoft to embrace and extend the way HTML is rendered in Web browsers. Apparently in consultation with ALA boffins, Microsoft has agreed to implement a new meta declaration in the head section of HTML documents in their forthcoming Internet Explorer 8 release.

Using a simple meta declaration, we can specify the rendering engine we would like IE8 to use. For example, inserting this:

<meta http-equiv="X-UA-Compatible" content="IE=8" />

into the head of a document would make IE8 render the page using the new standards mode. This syntax could be easily expanded to incorporate other browsers as well:

<meta http-equiv="X-UA-Compatible" content="IE=8;FF=3;OtherUA=4" />
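To make the mechanics concrete, here is a minimal sketch of how a browser might read such a declaration. It assumes only the semicolon-separated `name=version` syntax shown above. The function name `parse_ua_compatible` is hypothetical, and real browsers may well parse the value differently.

```python
def parse_ua_compatible(content):
    """Split an X-UA-Compatible content value into {engine: version} pairs.

    Assumes the "IE=8;FF=3" style shown in the post; anything else is skipped.
    """
    targets = {}
    for part in content.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing or doubled semicolon
        name, sep, version = part.partition("=")
        if not sep:
            continue  # skip tokens with no "=" rather than guess a meaning
        targets[name.strip()] = version.strip()
    return targets


declared = parse_ua_compatible("IE=8;FF=3;OtherUA=4")
print(declared)  # {'IE': '8', 'FF': '3', 'OtherUA': '4'}
```

Each participating browser would look up its own key and pick the matching rendering engine, which is exactly why every old engine would have to ship with every new release.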

This is a great idea for developers, because they get to write once and then never have to edit their code again, no matter what new browsers are released. Never again will a new version of IE or Safari break their lovely site. Unfortunately, it sucks for users: participating browsers will have to bloat out to humungous sizes, because they will need to include the rendering engines of all previous browser versions in order to be compatible with this new system. It also sucks for Mozilla, because part of their marketing message is that Firefox is the cleanest, smallest browser out on the market, and the inevitable bloat will blow that claim out if Firefox implements this system.

Microsoft caused this problem in the first place by not adhering to Web standards in previous versions of IE. Now they are trying to apply another band-aid over the suppurating wound, and they have enlisted a surprisingly self-serving ally in the ALA crowd. I would have thought ALA would be better than that. Developers should focus on developing standards-based code, minimising their use of browser hacks, and lobbying Microsoft to adhere to standards, not to cover their arses.

Monday, January 21, 2008

Flickr CC-BY attribution

I have fixed a problem on Tinfinger relating to photos uploaded from Flickr which are marked as Creative Commons licensed requiring attribution. As per the Flickr community guidelines, Tinfinger now links to the photo page instead of the source URL. Thanks Stuart Hamilton for putting me on the straight and narrow!

Sunday, January 20, 2008

Picks and shovels of the semantic Web want to be free

A lot of people have been asking me over the past week or so, during the beta launch of Tinfinger, what it is about and why we are doing it at all. At the same time, I have been watching a few strands of conversation across the blogosphere which have crystallised my answer to that question. So here it is.

The first hint came with the news the day before we launched that MetaWeb raised US$42 million in Series B funding, making a total investment of US$57 million and causing some industry incredulity. MetaWeb's Freebase is doing something that I hope to do with Tinfinger: creating a freely available semantic Web database of all the world's information (although with Tinfinger we're sticking to the people vertical).

MetaWeb's business model for their flagship product Freebase, stated as somewhat of a vague afterthought in their FAQ, is to charge large corporate users for access to that database, which is licensed as CC-BY meaning that it's free to use with attribution back to the source. Using CC-BY is sensible for some uses - indeed, Tinfinger will do the same for its data and 150-word profile articles - but to me it seems strange for MetaWeb because the economics are all wrong. It makes perfect sense for Wikipedia to use CC-BY, because although they don't allow money to change hands for the production of any of their content, the currency they operate in is PageRank, and CC-BY is arguably the finest PageRank-building mechanism known to man. If you are wondering why Wikipedia is in the top 10 results for just about every search term in Google, look no further than the CC-BY license, because they get links back from every page on the Web which reprints Wikipedia content, of which there are legion. But what does PageRank mean for MetaWeb and Freebase? Freebase is not a destination site. They have not shown the slightest inclination to build landing pages. They display no knowledge of SEO techniques. CC-BY is useless to them.

It is an old cliche that the people who make money out of gold rushes are those selling the picks and shovels. MetaWeb is endeavouring to be the goldpanning equipment vendor for the semantic Web, which is a respectable goal. But how can you turn a dime if there is a place next door which is giving away dynamite for free? Let us be honest about the origins of Freebase, Tinfinger, Google Base, Twine, Spock et al. All such attempts to build the semantic Web have used as the core of their proprietary/licensed database the freely available (or at least freely scrapeable) databases such as dbpedia, ISBNdb, IMDb, IBDb, ITDb, BASE, Cricinfo, all the way to Project Gutenberg. It is my opinion that the economics of the database industry are such that, eventually, most of the important databases will be made available for free online. After a somewhat moribund period in the 90s, storage hardware has been undergoing some very rapid Moore's-Law-style advancements this decade and it will not be long before we have highly affordable solid state drives which are Internet-ready. Cost will not be an issue. It's probably not really an issue right now anyway; it's just a matter of the politics of shoveling huge data silos like SEC filings out from behind corporate paywalls.

To my mind, on one side you have MetaWeb, LexisNexis, EDGAR Online and the rest of the cabal who are relying on siphoning micropaid profits from licensing of data when the semantic Web takes off. On the other side, you have... the entire Internet. How can the semantic Web take off when Big Companies are standing in its way? The Internet finds a way around. In this case, it finds how to create its own semantic database, which might not be perfectly crafted or 100% reliable, but in the words of Rich Skrenta "the cheap rickety thing wins in the end".

You may still ask, "But isn't that what Wikipedia is for too?" Wikipedia is a fabulous resource for prose text, but in the area of tagging, MediaWiki was not really built from the ground up to handle it to the extent that a fully-fledged semantic Web application would require. Wikipedia's tag system is ad hoc, bootstrapped and too prone to user error.

This is where I hope Tinfinger can help. I didn't talk up our tagging features very much at launch because the tag data in our system now is mostly adapted from Wikipedia and thus not of the greatest quality - as was rightly pointed out already - but I think that's where the eventual power of Tinfinger will lie, once we implement the full system.

I think it's only fitting that a site such as Tinfinger which builds on top of public domain data sources contributes back to the public domain to the extent that economics allow. We will try to publish as much of our structured data as possible in ways that can help you with your own projects, the same way as Wikipedia allows you to add instant content to your Web pages. With attribution, of course. ;)

Saturday, January 19, 2008

Geelong Advertiser story on Tinfinger

Apparently their Web site dude is a TechCrunch reader and he alerted one of their journos to our existence. Who knew!

The picture was very nice; the Addy have some excellent photographers. It's quite a good article, considering I was a bit rushed during the interview and didn't get all I wanted to say out of my mouth. I guess I underestimated the tech savvy of our local rag!

Thursday, January 17, 2008

An open letter to Andrey Golub, Spock's #1 user

Andrey Golub, a Belarusian business analyst who is apparently the #1 user contributor to Spock, saw fit to engage me in a comment on the TechCrunch story on the Tinfinger launch. I presume he wasn't put up to it by the Spock boys, so I won't go hard, but I feel I need to defend myself and Tinfinger.

to Paul Montgomery

1) a general question- why it wasn’t enough Wikipedia? the project with proven record as a Web Encyclopedia, at least for those known and famous… It wasn’t enough Wikipedia for the NORMAL people, so here has arrived Spock. but nobody had problems I believe with the celebrities’ stories on the Web :)

2) about Spock:
- why did you decide that Spock searches ONLY on Social Networks? (as from your response to “Jason and antje”)
I think it was clear for everyone that Spock checks Web 2.0 for the Web 2.0 people, and of course it looks at the normal Web for the non Web 2.0 people!
a) as far as I know, wikipedia, the world’s largest DB about all known and famous people, have been already processed by Spock- that means Spock already has everyone listed on Wikipedia.

do you wanna say us Tinfinger will beat wikipedia first of all?

- you say this new Who’s Who can search the Blogs and so on- well, Spock can search everything that’s on the Web. If Google “knows something”- so also Spock will :)

do you wanna say us Tinfinger will beat Google after it beats Wikipedia?

hm… I have a doubt sincerely. About the mission first of all- it was probably enough Wikipedia for us + it was really needed to add the Web 2.0 part that has been done by Spock…

Well, good luck however!
Kind Regards,
Andrey Golub- find me on Spock

Okay Andrey, I sense that English is not your first language so I hope I understand your arguments correctly. I'll take your points in turn.

1. Why start Tinfinger when Wikipedia exists? You might as well ask why anyone would start any sort of publishing business. You might as well ask why start Spock when Google exists?! Just because there is a page about someone or something on Wikipedia doesn't mean it's the best possible page that anyone can write about that subject. In fact, many pages about people on Wikipedia are incomplete, inaccurate, stale or of poor general quality. I think it is possible to create an alternative which focuses more on quality, but still retains that openness for anyone to contribute. Plus, Wikipedia restricts itself to neutral encyclopaedic articles, whereas Tinfinger will host many types of opinion articles, which makes the profile pages dynamic and worth visiting more than once over time.

2. I'm sure you know a lot more about Spock's systems than I do so I apologise for not acknowledging that Spock also has a general Web crawl. I read the other day that Spock has over 3 billion people data records. I didn't know that many people were on the Web... are there? ;)

2a. Okay, so Spock has incorporated Wikipedia data, as Tinfinger has. However, we have not and will not allow any Wikipedia articles to appear on our pages - we only used the names and some of the tags. I note that Spock includes the full Wikipedia articles about people on their relevant pages. That surprises me, because as I am sure the Spock boys know, that really kills a page's ranking in Google, because they mark PageRank down sharply for duplicated content. That leads me to believe that Spock doesn't care at all about ranking highly in Google - understandably so if the idea is to compete with Google - which leads to the question: how does Spock expect to grow traffic?

As for Tinfinger "beating" Wikipedia, yes, I would like for Tinfinger to eventually start beating Wikipedia in Google rankings for certain keywords, specifically names, and thus start beating them in traffic to those kinds of pages. It's a long-term goal of ours, but I think it's achievable if we concentrate on quality.

On Tinfinger beating Google: no. That's not our strategy at all, we say up front that we are not a search engine. Tinfinger is a human omnibus, a collection of articles from everywhere. People will find out about those articles through search engines like Google, and whatever comes after Google. We are a firm Google partner.

Finally, on the subject of the relative power of our search functions, I freely admit that our news and blog search is nowhere near as comprehensive and powerful as Spock's sounds, going by your words. However, neither is Techmeme's, and Gabe Rivera does alright with Techmeme. Our news and blog search powers 650 Techmeme-like headline pages, and that's all we need it for: a specialised product for a specialised use.

I wish you and the Spock boys luck in your efforts, Andrey. I appreciate your fervour in advancing the Spock bandwagon. I only hope that our users can become as passionate about Tinfinger.

Tuesday, January 15, 2008

Tinfinger beta starts today

Today marks the beta launch of Tinfinger, after two years of off-and-on development by Tai Tran and me. It's a very exciting day, if a bit sleepy because we timed it for 9am Californian time, so it's 4am local time here in Geelong! The launch press release:


Tinfinger changes the rules of Web creation

Human omnibus launches with user-debt strategy breaking new ground between Wikipedia and Squidoo/Mahalo/Knol

(Geelong, Australia, 15 January 2008) -- Fans of famous people will have a new place to share their fandom with today's beta launch of Tinfinger.com, a human omnibus. The site combines user-authored encyclopaedic profile pages of famous people with a news and blog search engine based around mentions of those people's names, which are aggregated into frequently-updated front pages for 650 categories. Tinfinger is intended by its two Australian co-founders, Paul Montgomery and Tai Tran, to become the primary resource for information about famous people on the Web.

"Tinfinger will be to the Who's Who what Wikipedia was to the Encyclopaedia Britannica," said Montgomery. "The Web is ready to move beyond the hyperlink as the only way to cluster and rank Web content, and we're going to try using names instead."

Tinfinger has not taken funding, which led Paul and Tai to devise a new business model: going into debt to its users. Contributions to Tinfinger will be paid for not with cash but with Google AdSense impressions (AIs), so that Google pays Tinfinger users for ads that Google puts on the Tinfinger site. It is expected that the rate of page production and AI payments will outstrip ad inventory at first as traffic to the site builds gradually, so that Tinfinger will start with a debt owed to its users, payable out of its future page views. More on the AI system below.

"This is an innovative business model built on necessity. We hope those who choose to participate develop a sense of ownership not only over their own contributions, but over Tinfinger as a partner in an ongoing contract," said Montgomery. "There are plenty of people out there who would like to meet other fans of their favourite people, and we hope to create a way for them to share their passions."


1. A database of famous people - famous meaning that they are mentioned in news or blogs related to a newsworthy issue - which classifies people both via a top-down category structure and a flat tag structure. Tags on Tinfinger will be expressible as RDF triples (subject-predicate-object, as opposed to subject-object). At beta launch time, existing people and their tags are not editable by users, but registered users can submit new people.

2. Profile pages about famous people, featuring articles and pictures submitted by users using WYSIWYG content authoring software. A collaboratively-authored profile article of around 150 words for each person, similar in function to a stub profile on Wikipedia, will be released by Tinfinger under the Creative Commons license. Each profile page can also include many other types of copyrighted single-author articles: biography, review, interview, encounter, comparison, praise, criticism, etc.

3. Clustered news aggregation with "front pages" for 650 categories, with a design familiar to readers of Google News, Topix or Techmeme. Tinfinger operates its own news and blog search engines, and snippets from these sources are collated into clusters based on mentions of each famous person's name. Tinfinger does not use links or semantic connections to cluster; just names, using a publicly available algorithm called tinscore. Higher-level category news pages include people from lower level categories, so that for instance the Africa category contains stories about people from all African countries, and the Internet category contains stories about people from Search Engines, Web 2.0, Web Advertising, Voice Over IP, Broadband and so on. Users can submit new sites, and the list of sites indexed for each category is available as an OPML reading list. The news and blog searches (as well as articles and pictures) each have their own RSS feeds and also can be published to other sites using a widget.

4. Social networking software from PeopleAggregator to enable user interaction and feedback. There are Tinfinger-controlled groups for each category, and users can create their own groups.
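The name-based clustering described in item 3 can be illustrated with a rough sketch. This is not the tinscore algorithm, which is documented separately. It only shows the basic idea of grouping snippets by which known names they mention. The function name `cluster_by_name`, the whole-word case-insensitive matching, and the sample names and snippets are all assumptions made up for the example.

```python
import re
from collections import defaultdict


def cluster_by_name(snippets, names):
    """Group snippets under every known name they mention.

    Matching is case-insensitive and on whole words only (an assumption,
    not something the press release specifies).
    """
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in names
    }
    clusters = defaultdict(list)
    for snippet in snippets:
        for name, pattern in patterns.items():
            if pattern.search(snippet):
                clusters[name].append(snippet)
    return dict(clusters)


# Placeholder people and headlines, invented purely for illustration.
names = ["Ada Example", "Bob Sample"]
snippets = [
    "Ada Example launches a new site",
    "Interview: ada example and Bob Sample",
    "Nothing relevant here",
]
for name, found in cluster_by_name(snippets, names).items():
    print(name, len(found))
```

A snippet that mentions two people lands in both clusters, and a snippet that mentions nobody in the database is simply dropped. Ranking the clusters for a front page would be a separate step.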


In the AI system, users will be rewarded for writing articles by their AdSense publisher IDs being put on Tinfinger pages. This is not a new thing by itself, but existing systems involve giving users a percentage of page impressions next to their articles. Tinfinger will debit the user's account with a fixed number of AIs for each article, which will then be gradually paid out of traffic on Tinfinger pages - not next to their articles, but from general site ad inventory. (The category headline and profile pages will be reserved for Tinfinger.) The starting rate for articles will be 10,000 AI. It is unknown what CPM rates ads on Tinfinger pages will attract, but Tinfinger is aiming for at least US$1 CPM, implying a base payment per article of US$10. The AI figure each article actually earns will be highly changeable based on various quality and editorial factors, which are determined by Tinfinger based on published rules, and can also be boosted if Tinfinger places temporary "bounties" on articles about particular types of people. This approach will likely lead to a significant AI debt which Tinfinger will owe for many months. That will be part of our partnership with users.
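The US$10 figure follows from the definition of CPM as the price per thousand impressions. As a small worked sketch (the function name `article_payment_usd` is hypothetical, and the US$1 CPM is only the stated target, not a measured rate):

```python
def article_payment_usd(ad_impressions, cpm_usd):
    """Convert AdSense impressions to dollars; CPM is the price per 1,000 impressions."""
    return ad_impressions / 1000 * cpm_usd


# Starting rate of 10,000 AI at the targeted US$1 CPM.
print(article_payment_usd(10_000, 1.00))  # 10.0
```

The payment scales linearly with whatever CPM the ads actually attract, so the same 10,000 AI would be worth half as much at half the rate.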


- The co-founders and only employees are Paul Montgomery, a former technology journalist, and Tai Tran, a former corporate programmer. Both live in the Australian city of Geelong, which is 100km southwest of Melbourne. Paul blogs at http://tinfinger.blogspot.com and has blogged quietly about many Tinfinger features already.
- Development of Tinfinger began more than two years ago.
- Tinfinger has taken no funding, and is not currently in the market for funding. It will effectively be funded through debt owed to its users.
- To discourage spam, the social network portion and external links from profile pages will be labelled with nofollow.
- Tinfinger will consume OpenID logins.
- Database figures at launch: 404,000 people, 395,000 snippets, 2,700 pictures, 612,000 tags, 820 sites... 10 articles. Most people records were adapted from dbpedia and IMDb; most tags were from dbpedia. The people database is lumpy, with some categories containing very few people as yet.
- Tinfinger's mascot is a robot called Ned, who bears a worrying resemblance to Sidney Nolan paintings of the Australian bushranger Ned Kelly.
- The Tinfinger news algorithm, called tinscore, is detailed at http://tinfinger.blogspot.com/2005/12/tinscore-and-other-ways-to-clone.html


What to do at Tinfinger


Thursday, January 03, 2008

Tinfinger beta launch date

AEDT: Tuesday 15 January, 4am.
US PST: Monday 14 January, 9am.