Thursday, April 12, 2007

The semantic Web: sentenced to life in a federated plenipotentiary

One of the last tasks to be completed before we open up Tinfinger to real users in our beta is deciding how we handle our tag structure. Tags are used in Tinfinger to record any kind of information about people. We've tried a few different approaches, and all of them have proven clumsy and unworkable. The addition of Wikipedia's structure via dbpedia, however, has been the catalyst for a solution, though not the solution itself.

In researching how to use dbpedia, I found that Clay Shirky's repudiation of the Semantic Web stood out as a cautionary note to be reckoned with. Clay even uses an example central to Tinfinger, people's names, in the course of dismantling the supposed simplicity of the Semantic Web concept.

My understanding of Clay's argument, as it applies to me, is this. dbpedia uses the W3C's specification for N-Triples. Take the W3C's first example of a triple:

<http://www.w3.org/2001/08/rdf-test/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett" .

The three elements are subject, predicate and object, the same building blocks as a grammatically correct sentence. You can write that in plain English as: "The creator of this document is Dave Beckett." Ah, but the W3C's example lists a second creator directly below it, so that sentence is wrong. It should be: "The creators of this document are Dave Beckett and Jan Grant." Without the second triple, the first one leads to error. RDF triples are reductive by nature: they invite syllogistic reasoning, which can produce absurd or wrong deductive conclusions when the data set is incomplete.
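To make the pitfall concrete, here is a minimal Python sketch, not anything dbpedia or Tinfinger actually runs. It parses N-Triples lines with a deliberately simplified regular expression (IRIs and plain string literals only; no blank nodes, language tags, datatypes or escapes), collects every dc:creator value for the document, and renders a sentence. The second triple is reconstructed from the Jan Grant credit above, and the function names are hypothetical.

```python
import re

# Simplified pattern: IRI subject, IRI predicate, and either an IRI or a plain
# string literal as the object. Blank nodes, language tags, datatypes and
# escaped quotes are deliberately not handled.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DOC = "http://www.w3.org/2001/08/rdf-test/"

def parse(lines):
    """Turn N-Triples lines into (subject, predicate, object) tuples."""
    triples = []
    for line in lines:
        m = TRIPLE.match(line.strip())
        if m:
            s, p, obj_iri, obj_lit = m.groups()
            triples.append((s, p, obj_iri if obj_iri is not None else obj_lit))
    return triples

def creators(triples, subject):
    """Every object asserted as dc:creator of the subject, in input order."""
    return [o for s, p, o in triples if s == subject and p == DC_CREATOR]

def describe(names):
    """Render the creators as an English sentence with number agreement."""
    if not names:
        return "No creator of this document is recorded."
    if len(names) == 1:
        return f"The creator of this document is {names[0]}."
    return f"The creators of this document are {', '.join(names[:-1])} and {names[-1]}."

full = [
    '<http://www.w3.org/2001/08/rdf-test/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett" .',
    '<http://www.w3.org/2001/08/rdf-test/> <http://purl.org/dc/elements/1.1/creator> "Jan Grant" .',
]

# Only the first triple: the data looks complete, but the sentence is wrong.
print(describe(creators(parse(full[:1]), DOC)))
# → The creator of this document is Dave Beckett.

# Both triples: the correct sentence.
print(describe(creators(parse(full), DOC)))
# → The creators of this document are Dave Beckett and Jan Grant.
```

Fed only the first line, the sketch confidently produces the singular sentence; nothing in the data signals that a creator is missing, which is exactly the incompleteness problem.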

So is the solution just to throw more and more tags at the problem until every bit of metadata is covered? That's the approach taken by sites like Spock, recently gushed over by TechCrunch. I disagree. Apart from the fact that it looks damn ugly, I don't think tags are an end point for conveying understanding. The problem lies in the constraints the W3C's spec imposes on how information can be understood: having just one concept each in your subject, predicate and object is limiting. Imagine if every sentence you read in a profile of somebody went like this:

Bill Clinton was a president of the United States. Bill Clinton is a womaniser. Bill Clinton is a disbarred American lawyer. Bill Clinton is a great leader. Bill Clinton is a saxophonist.

You get the idea. What about compound sentences? Sure, you have nouns and verbs, but what about adjectives, adverbs, prepositions and conjunctions? Triple-based tagging leaves no room for complex ideas expressed with nuance and juxtaposition. That's what prose is for. For semantic metadata to be expressed usefully, I think it should be encapsulated in prose sentences.

That is what Tinfinger profiles will be: collections of sentences. Our robot mascot Ned will first generate them from simple metadata triples, to the best of his admittedly limited ability; human authors will later work them into comprehensible prose that nevertheless preserves the W3C-approved tag structure. Instead of the infoboxes and templates of tabulated data that Wikipedia uses, Tinfinger will focus purely on the sentence as its primary means of communication.
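As a rough illustration of what Ned's first pass might look like, the sketch below groups triples by subject and predicate and emits one draft sentence per group, so that several objects share a sentence instead of each getting its own "Bill Clinton is..." line. The predicate names (`heldOffice`, `occupation`), the templates and the function names are all hypothetical; the post does not describe Tinfinger's actual tag vocabulary. Subjects and objects are plain strings rather than IRIs, for readability.

```python
from collections import defaultdict

# Hypothetical predicates and sentence templates, for illustration only.
TEMPLATES = {
    "heldOffice": "{subject} was {objects}.",
    "occupation": "{subject} is {objects}.",
}

def join_english(items):
    """Join a list as English: 'a', 'a and b', 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]

def draft_sentences(triples):
    """Group objects by (subject, predicate) and emit one draft sentence per
    group, in the order each group first appears, instead of one per triple."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        grouped[(s, p)].append(o)
    sentences = []
    for (s, p), objs in grouped.items():
        # Unknown predicates fall back to a bare "subject predicate objects." form.
        template = TEMPLATES.get(p, "{subject} {predicate} {objects}.")
        sentences.append(template.format(subject=s, predicate=p,
                                         objects=join_english(objs)))
    return sentences

clinton = [
    ("Bill Clinton", "heldOffice", "a president of the United States"),
    ("Bill Clinton", "occupation", "an American lawyer"),
    ("Bill Clinton", "occupation", "a saxophonist"),
]

for sentence in draft_sentences(clinton):
    print(sentence)
# → Bill Clinton was a president of the United States.
# → Bill Clinton is an American lawyer and a saxophonist.
```

Grouping only gets as far as a list joined with "and"; the adjectives, qualifications and juxtapositions argued for above are still left to the human authors.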

With those sentences organised into barely submerged paragraph structures, we hope the resulting hypertext will bridge the gap between the illogical flaws of metadata and the chaotic echolalia of human-authored prose, so that both spiderbots and meatbags can grok it.

1 Comment:

Unknown said...

Paul:

Would drop you an email - but can't locate contact info.

Take a look at http://sws.clearforest.com and http://optevi.net/newstracker. May be a good way to jumpstart some of your efforts.

4:09 am, April 14, 2007  
