Archive for February, 2008

Semantic Web Search Engine Roundup

February 27th, 2008

Unlike traditional search engines, which crawl the Web gathering Web pages, Semantic Web search engines index RDF data stored on the Web and provide an interface to search through the crawled data. Below is a list of Semantic Web search engines that are currently under development.

Semantic Web Search Engine (SWSE)
SWSE is a search engine for RDF data on the Web, and provides services equivalent to those a search engine currently provides for the HTML Web. The system explores and indexes the Semantic Web and provides an easy-to-use interface through which users can find the information they are looking for. Because of the inherent semantics of RDF and other Semantic Web languages, the search and information retrieval capabilities of SWSE are potentially much more powerful than those of current search engines. SWSE indexes RDF data from many sources, including OWL, RDF and RSS files. RSS2 is converted to RDF, and the developers plan to add GRDDL sources soon. Developed by DERI Ireland.
Sindice
Sindice is a lookup index for Semantic Web documents built on data intensive cluster computing techniques. Sindice indexes the Semantic Web and can tell you which sources mention a resource URI, IFP, or keyword, but it does not answer triple queries. Sindice currently indexes over 20 million RDF documents. Developed by DERI Ireland.
Watson
Allows you to search through ontologies and semantic documents using keywords. At the moment, you can enter a set of keywords (e.g. "cat dog old_lady"), and obtain a list of URIs of semantic documents in which the keywords appear as identifiers or in literals of classes, properties, and individuals. You can also use wildcards in the keywords (e.g., "ca? dog*"). Developed by KMi, UK.
Yahoo! Microsearch
Microsearch is Yahoo!'s stab at Semantic Web search and provides a richer search experience by combining traditional search results with metadata extracted from Web pages. Indexes RDF, RDFa and Microformats crawled from the Web. Microsearch will soon be adding support for GRDDL.
Falcons
Falcons is a keyword-based search engine for the Semantic Web, equipped with browsing capability. Falcons provides keyword-based search for URIs identifying objects and concepts (classes and properties) on the Semantic Web. Falcons also provides a summarization for each entity (object, class, property) for rapid understanding. Falcons currently indexes 7 million RDF documents and allows you to search through 34,566,728 objects. Developed by IWS China.
Swoogle
Searches through over 10,000 ontologies. 2.3 million RDF documents indexed, currently including those written in RDF/XML, N-Triples, N3(RDF) and some documents that embed RDF/XML fragments. Currently, it allows you to search through ontologies, instance data, and terms (i.e., URIs that have been defined as classes and properties). Not only that, it provides metadata for Semantic Web documents and supports browsing the Semantic Web. Swoogle also archives different versions of Semantic Web documents. Developed by the Ebiquity Group of UMBC.
Semantic Web Search
Powered by RDF Gateway, Intellidimension's proprietary platform for Semantic Web applications and agents. Developed by Intellidimension Inc.
Zitgist Search
The Zitgist Query Service simplifies the Semantic Data Web Query construction process with an end-user friendly interface. The user need not conceive of all relevant characteristics - appropriate options are presented based on the current shape of the query. Search results are displayed through an interface that enables further discovery of additional related data, information, and knowledge. Users describe characteristics of their search target, instead of relying entirely on content keywords.

English

The Calais Initiative Looks Back on Its First Month

February 26th, 2008
The Calais Initiative is almost one month old, and they've already received a large and welcoming response from the development community (1,113 early adopters)! When they weren't busy doing interviews or answering hundreds of emails and forum posts, they were coming up with ways to help spread the technology. They will soon be releasing a WordPress plugin, followed by plugins for Drupal, Plone and other content management systems. They also point out that Calais is not only good for named entity extraction, but can extract other facts from documents. An example they give is "what technologies are associated with what company in a document?" Good luck, Calais team!

English

True Knowledge: The Natural Language Question Answering Wikipedia for Facts

February 26th, 2008
True Knowledge is a natural language search engine and question answering site, but to leave it at that would not do the site justice. What makes it stand out from similar sounding services like Powerset and Freebase? True Knowledge tackles natural language search and question answering (much like Powerset and Hakia), and it also maintains a knowledge base of facts about the world (similar to DBpedia and Freebase). However, what makes True Knowledge stand out is that they've combined these features and encourage their userbase to contribute facts and add new knowledge.

A brief overview of True Knowledge

True Knowledge has combined their technologies to create something that doesn't easily fall into any one category. In fact, you can categorize it as all of the following:
Question-Answering site
You can ask questions about any subject and get a direct response. Unlike human-powered Q&A sites, you don't need to wait for someone to respond. The computer answers your question using knowledge stored in a form it can comprehend, and isn't just regurgitating text that it doesn't understand. For this reason it can answer questions it hasn't seen before and can combine knowledge through a process of inference and cross-referencing stored information to produce a reasoned answer.
Natural language search engine
True Knowledge also returns search results like a standard search engine, but only after passing your query through their natural language technology. Your query may be a standard question; even if it isn't, they may be able to work out what you are looking for and give you the answer directly. Because of the way facts are assessed, you can enjoy a high degree of confidence that any information they retrieve will be accurate (unlike information on any single Web page). You aren't limited to properly constructed questions; you can also use the typical two and three word "keywordese" queries that many search engine users are accustomed to. Where what is typed is just the name of an entity, their technology can produce a small information screen giving core information about the entity (as well as search engine results).
Wikipedia for facts
The knowledge in their system comes from two main sources: information they import themselves from various sources (such as the CIA Factbook) and facts added by their userbase. A big part of their technology is enabling users to add knowledge without having to have any technical understanding of the underlying computer processes. Unlike Wikipedia, where the knowledge in each entry is buried in natural language, True Knowledge stores each piece of knowledge as a discrete fact that can be reasoned on. Once a fact has been established with enough evidence it can't be easily changed. Furthermore, facts that contradict this knowledge are also automatically prevented, which helps the system deal with vandalism.
"Universal database"
With a typical database-driven application the developers sit down and create a schema. They then write code which manipulates and processes the data in that schema and when the application is finished this code is run by users. The knowledge that such a system can process is extremely narrow and remains so because nothing that happens after launch expands the scope of the application. Users may add data to the tables but the schema remains fixed. True Knowledge is like a database application except that everything in it is amenable to expansion by users. The scope of the knowledge that it can store expands every time a user adds a new class, relation or attribute; and knowledge about every conceivable entity can be put into the system and be used to answer questions.
In short, they've created a platform for representing the world's knowledge in a form that is clear and accessible to humans, as well as being comprehensible to computers.

Information about their architecture

At the heart of the True Knowledge system is the Knowledge Base - a huge database of facts on any topic represented in a form that can be processed by computer. Facts are also inferred by the Knowledge Generator, using Knowledge Base facts, other generated facts or external feeds of knowledge.

Users can ask questions through a browser interface and those questions are translated via Natural Language Translation into queries expressed in the True Knowledge query language. Their technology has the ability to disambiguate ambiguous questions, including removing interpretations of questions that are unlikely. Questions can also be abbreviated to two or three ("keywordese") words and still be understood - similar to typical keyword search terms. Their question answering system uses the Knowledge Base and generated facts to answer queries. The API provides an alternative interface to the question answering system from remote computers.

System Assessment further processes existing facts in order to maintain semantic consistency of knowledge. For example, facts can be marked as untrue if they are contradicted by other facts. The browser interface provides a means for users to assess the validity of facts (User Assessment), enabling them to endorse or contradict particular facts. A user's reputation and track record are used to automatically weight this information. In combination with System Assessment this prevents the back-and-forth battles that are common on wikis.

The Knowledge Base grows through Knowledge Addition, either from users via the browser interface, or imported in volume from external sources. A key design decision is that all components are extendable by users. In addition to adding facts, users can extend the questions that can be translated into whole new areas and even provide new inference rules (and even executable code for steps that involve calculation) for the Knowledge Generator.
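To make the User Assessment idea a little more concrete, here is a minimal Python sketch of reputation-weighted endorsement. It is only an illustration of the general technique: the Fact class, the reputation values and the scoring formula are my own assumptions, not True Knowledge's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """A hypothetical stored fact with the user assessments it has received."""
    subject: str
    relation: str
    obj: str
    # Each vote is (user_reputation, endorses); both are assumed inputs.
    votes: list = field(default_factory=list)

    def assess(self, reputation: float, endorses: bool) -> None:
        """Record one user's endorsement or contradiction of this fact."""
        self.votes.append((reputation, endorses))

    def score(self) -> float:
        """Reputation-weighted balance in [-1, 1]; 0.0 when nobody has voted."""
        total = sum(rep for rep, _ in self.votes)
        if total == 0:
            return 0.0
        signed = sum(rep if ok else -rep for rep, ok in self.votes)
        return signed / total


fact = Fact("Paris", "is the capital of", "France")
fact.assess(reputation=0.9, endorses=True)   # a trusted user endorses the fact
fact.assess(reputation=0.1, endorses=False)  # a low-reputation user contradicts it
print(round(fact.score(), 2))
```

With a scheme like this, a single low-reputation contradiction barely moves the score of a well-endorsed fact, which is one plausible way to dampen edit wars.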

True Knowledge API

No service such as this would be complete without an API! They say their API can execute any query you supply it with; however, they are also in the process of releasing a series of API services. These simple services encapsulate areas of knowledge which are well served by their current Knowledge Base. All these services can be accessed via the same query interface using a single account. Click on the names of the services below to test each one!
IP Geolocation
Converts an IP address to a probable geographical location of an internet user (e.g. the user of a website). This geographic knowledge can then be used in subsequent queries to retrieve further relevant facts about the location from the Knowledge Base: including the user's likely language, preferred currency, local time etc.
Local Time
Identifies a place either from an IP address obtained automatically or from a supplied string denoting the place and obtains a local time either now or at some past or future time. Possible applications include an online or phone conferencing system wanting to inform the participants about the date/time of the meeting in their local time zone.
Name-to-Gender
Takes a personal name (first name or full name) and returns the gender inferred by the system for that name. The system applies certain heuristics to a string representing a person's name in an attempt to judge the gender of the person. If the gender can be determined with reasonable probability, then it will be returned. This service would be useful to, for example, a social networking site wishing to use gender-specific language about a user whose name, but not gender, was known.
Email-to-Name
Takes an email address and returns the forename inferred from its local-part (if a name can safely be inferred). Businesses with access to users' email addresses but not names could use this to address emails more personally. This service can be combined with the Name-to-Gender service to infer a person's gender from his/her email address.
Trading Day
Takes a point in time and a geographical location and returns 'no' if it is a weekend day or a public holiday in the location and 'yes' otherwise.
Location-to-Language
Returns a language which can be read by a significant number of people at a location. True Knowledge has complete coverage at the national level and partial coverage for smaller areas. This can be used in combination with the IP Geolocation service to decide which language(s) are appropriate when displaying websites to international users, for example.
Telephone Number-to-Location
Returns the geographical location of the specified landline telephone number.
Don't worry, the road doesn't end there. True Knowledge says they are currently working on even more services to add to this list.
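To give a feel for what one of these services computes, here is a rough Python sketch of the Trading Day logic described above. It is not the True Knowledge API: the function name, the holiday table and the simplification of "a point in time" to a calendar date are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical holiday table keyed by location name. A real service would
# draw this from its knowledge base; these two dates are only sample data.
PUBLIC_HOLIDAYS = {
    "United Kingdom": {date(2008, 3, 21), date(2008, 3, 24)},
}


def trading_day(day: date, location: str) -> str:
    """Return 'no' for weekends and public holidays at the location, else 'yes'."""
    if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return "no"
    if day in PUBLIC_HOLIDAYS.get(location, set()):
        return "no"
    return "yes"


print(trading_day(date(2008, 2, 27), "United Kingdom"))
print(trading_day(date(2008, 3, 21), "United Kingdom"))
```

The real service presumably also has to resolve the location and its time zone first, which this sketch leaves out.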

Adding knowledge to True Knowledge

Time for some hands-on stuff! What do True Knowledge and Jurassic Park have in common? Nothing as far as I'm aware. However, I am going to show you step-by-step how I taught True Knowledge something it didn't know. To be more specific, I'm going to show you how to add new knowledge from start to finish and then how to expand on it. Because True Knowledge seems to update itself in real-time, I was able to see the fruits of my labor right away. Not having to wait for an index to be rebuilt made the task of adding knowledge feel more worthwhile. After playing with a few test queries I tried to find something it didn't know anything about. I asked "who is the author of jurassic park?", which returned the response "I don't know" and a more detailed explanation:
It sounds like "jurassic park" may be a thing that is published that I don't currently know about. If you want, you can add the thing that is published called "jurassic park" to the Knowledge Base.
Incidentally, the search results that appear alongside the answer are pretty relevant. The first result contains the answer to my question. By chance, the title is exactly my answer. Clicking the link took me to a screen that asked me to enter the most common name for "a thing that is published." I entered "Jurassic Park." They do ask that you don't enter information about fictional things (e.g., unicorns). I had to think for a moment about whether Jurassic Park is considered a fictional thing in this context. I came to the conclusion that Jurassic Park is not fictional in the sense that it is both a literary work and the title of several movies, so I clicked Submit. After a quick look at the confirmation page I was ready to proceed. I should note that there are several confirmation pages along the way. If you're comfortable enough with the process you can disable each confirmation page individually by checking the box that says "Don't show me this confirmation page again." Next I was presented with a possible Wikipedia match and a helpful extract from the page. I was satisfied that the Wikipedia entry presented to me was indeed talking about the very same Jurassic Park, so I clicked continue. The next screen asked me if I knew anything that Jurassic Park is that is more specific than a "thing that is published." It was trying to figure out the name of the class of things Jurassic Park belonged to. I clicked yes, entered "movie" and clicked submit. True Knowledge was already aware of what a movie is and asked me specifically if what I meant was "movie (connected cinematic narrative)." Satisfied that I had my match, I clicked submit and continued on. This is where I thought things got interesting. The next screen asked me to be more specific about what kind of movie Jurassic Park is and gave me the following options to choose from:
  • Made for TV movie
  • Made for video movie
  • Big screen movie
Since we all know Jurassic Park was a major motion picture I chose "big screen movie" and clicked select. Alternatively if I didn't want to choose any of those refinements (e.g., if they didn't apply) I could simply click Yes and proceed with Jurassic Park labeled as a "movie." The next screen asked me to enter a phrase that could be used instead of Jurassic Park in all circumstances. Basically they were asking for a short but descriptive phrase that makes it absolutely clear what Jurassic Park is. They give a few examples such as "France, the Republic of France" and "Star Wars, the 1977 adventure action sci-fi movie Star Wars." Going off the Star Wars example I entered "Jurassic Park, the 1993 movie about dinosaurs" and clicked submit. I was then asked to confirm that the phrase I entered was an unambiguous way of saying Jurassic Park, which would be recognized by anyone wanting to say something about that big screen movie. After confirming a few points about the ambiguity of my phrase I clicked Yes. I was then asked to enter a few alternate names. I entered "JP" (the US promotional title) and "Jurassic Park 1" (a common way of referring to the original movie after the sequels were released). Next I had to enter a unique, human readable ID. The page informed me that [jurassic park] was available and auto-populated that value for me. I certainly couldn't think of a better ID so clicked submit. After submitting the ID I was presented with a list of facts that the system had gathered from the information I entered. Reading through the list of facts you can see how each step along the way input the information into True Knowledge. I am listed as the source for each fact because I have not specified any other sources. Luckily I am able to do that at the bottom of the page. As I want this information to be trustworthy, I included a trustworthy source: The IMDB entry for Jurassic Park. I entered the URL for the entry on IMDB and clicked add new source. 
This took me to a mini-process of adding a document stored in a remote system (i.e., a Web page). I clicked OK to start the process. The next screen asked me to verify that the contents below were what I was expecting. Everything checked out so I clicked confirm. Now that I had a new source available to me (the IMDB page) I changed the source where appropriate. Once I had the sources set I clicked add these facts to finish up the process of adding new knowledge. All done! Clicking on OK will take you to a page with your new entry. The page has a few links for adding more information that would be relevant to the entry. I wasn't done yet since I still couldn't answer the question "who is the author of jurassic park?" Of course, now I had a whole new problem: I had told the system that Jurassic Park was a movie, not a literary work. We'll see how the system handles this. On the add knowledge page I selected "add a new fact." On the add a fact page I was given three textboxes to enter a (subject, predicate, object) triple about anything. Since I wanted to enter the author information for Jurassic Park I entered "Michael Crichton" -> "is the author of" -> "Jurassic Park" and clicked submit. The next screen informed me that the system was already aware of Michael Crichton, the American author born in 1942. Since we were both talking about the same person I clicked submit. On the fact confirmation page that followed I was given the option to go ahead and add the fact as-is or to change the left or right part of the fact (the subject or object). Although the proper course of action would probably have been to create a new entry in True Knowledge for the literary work Jurassic Park, I wanted to see if the property "author" could be applied to an instance of class "movie." I also wanted to determine whether or not something can belong to multiple classes ("book" and "movie"). I chose to add Michael Crichton as the author of Jurassic Park (the movie), and clicked Yes.
When it came time to list sources I told it that I was not the source, and I listed the Wikipedia entry for Jurassic Park and went through the two-step process of adding a Web page. Now True Knowledge knows about Jurassic Park (the 1993 movie about dinosaurs) and Michael Crichton, the author of Jurassic Park (the literary work). It should be noted that True Knowledge is under the impression that Michael Crichton is actually the author of the movie Jurassic Park. I tried my original question and this time I got a direct answer, including how it came to that conclusion. So you can apply an author to a movie. It feels weird to me that you can do that, because I don't feel you can be the "author" of a movie (rather, the movie's script and screenplay). Back on the add a fact page I tell True Knowledge that "Jurassic Park" -> "is a" -> "book." This time around I'm given three options of what a book might be. I chose the last option, "book (a written work intended to be published as a set of pages bound together on one side)" because I felt it was the best definition of what a book is. After confirming the fact and adding my source (Wikipedia again) I am informed that "Jurassic Park is a book" contradicts previously inserted facts. In this case, it is apparent that a movie cannot also be a book. In the end the fact did not get added because it contradicts an existing fact in the system. Today was just my first day, so I'm sure I'll get better at this.
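For readers who want to picture what happened in that last step, here is a small Python sketch of how a fact store might reject a class membership that contradicts an existing one. I have no idea how True Knowledge implements this internally; the KnowledgeBase class and the DISJOINT table are hypothetical names I made up for the example.

```python
# Hypothetical declaration that no entity may belong to both classes.
DISJOINT = {frozenset({"movie", "book"})}


class KnowledgeBase:
    """A toy store of (subject, relation, object) facts."""

    def __init__(self):
        self.facts = set()

    def classes_of(self, entity):
        """Return every class the entity has been declared a member of."""
        return {o for s, r, o in self.facts if s == entity and r == "is a"}

    def add(self, subject, relation, obj):
        """Add a fact unless it contradicts an existing class membership."""
        if relation == "is a":
            for existing in self.classes_of(subject):
                if frozenset({existing, obj}) in DISJOINT:
                    return False  # rejected: contradicts a stored fact
        self.facts.add((subject, relation, obj))
        return True


kb = KnowledgeBase()
kb.add("Jurassic Park", "is a", "movie")
kb.add("Michael Crichton", "is the author of", "Jurassic Park")
accepted = kb.add("Jurassic Park", "is a", "book")
print(accepted)  # False
```

Note that, just like in my experiment, nothing here stops an "is the author of" fact from pointing at a movie; only the class memberships are checked against each other.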

My first impression of True Knowledge

I found my first experience with True Knowledge very satisfying! The user interface is simple and it's hard to get lost trying to do something new. They are still in beta, and as such they still have some polish to apply before the general public is let in, but the product is solid and I can't wait until more users are let in the gates. I'm interested to see how it will fare against similar services. Components of True Knowledge compete with many semantic services (Freebase, Hakia, Powerset, DBpedia, etc.) and even non-services like Cyc. I am of the opinion that True Knowledge has the winning combination of each approach.

English

Nationalities in DERI

February 25th, 2008

People say it’s impossible to feel like a foreigner in DERI, and they are right. There are now 27 nationalities here, no more, no less.

People have also created a map of all DERIans.

English, Spanish

My Commentary: Radar Networks Raises $13M for Twine

February 25th, 2008
I am pleased to announce that my company, Radar Networks, has raised a $13M Series B investment round to grow our product, Twine. The investment comes from Velocity Interactive Group, DFJ, and Vulcan....

English

HBase User Group & Update

February 22nd, 2008

HBase has been gaining a lot of traction since our last post. Hadoop was promoted to a top-level Apache project and Yahoo announced that it has the world’s largest Hadoop production application. HBase is now a full sub-project of Hadoop. Also, other companies like Rapleaf are using HBase in production. After the success of the HBase tech presentation at Rapleaf HQ, Powerset has decided to organize a second HBase User Group meeting. If you’re using HBase now or evaluating HBase as a data store, join us to network with other database geeks and give suggestions and feedback to the HBase core development team. The User Group will meet on Tuesday, March 4, from 5:00-7:00 p.m. at the Powerset HQ. Just register on the event page at Upcoming so we can plan how much food and booze to stock. If you’re interested in HBase or Hadoop and can’t make it to the User Group meeting, there’s a Hadoop Summit at Yahoo on March 25. Powerset will be doing a presentation on HBase and Rapleaf will be showing off their application.

English

302 Semantic Web Videos and Podcasts!

February 21st, 2008
A lot of you emailed me asking where to find more videos, so I'm delivering the goods. I've expanded the previous list from a paltry 17 to a remarkable 302, and I've included podcasts this time! There were so many videos I had to break them up into different categories for easier skimming. There are no duplicates, although I did place some videos into more than one category when I felt it was appropriate. This list is monstrous; enjoy.

Introductions (videos)

RDF (videos)

Ontologies / Ontology Development (videos)

Web Services (videos)

Annotation (videos)

Semantic Desktop (videos)

Interviews (videos)

Information Extraction / NLP (videos)

Search (videos)

Social Semantic Web (videos)

Uncategorized (videos)

Podcasts

English

Using Sindice to get the best URI for a person

February 19th, 2008

One of the open issues we still had in SWAML was how to find the best document containing more information about the subscribers of a mailing list. In most cases we only have a subscriber's name and email address, so we needed to use an InverseFunctionalProperty, such as foaf:mbox_sha1sum, to find more information. That was one of the main reasons Iván and I started Futil, but unfortunately we never finished that project… at least we discovered a lot of people and it was a very fun hacking experience.

Recently I had the opportunity to meet part of the Sindice project team. Sindice is a lookup index for Semantic Web documents that indexes the Semantic Web and can tell you which sources mention a resource URI, IFP or keyword. For more information, read their paper.

I decided to use it (I see that I’m not alone), and I’ve written a Python client for its API to test it. The service works well (see a query over the SHA-1 of my email), but the order of the results is not the best in many cases (for example, my URI is the last one in that query, although all the results are good). I talked with the developers (thanks Richard!), but currently the project only uses retrieval techniques to assign a score to the results.

So we have some good documents containing more information about our subscribers, but we need to choose the best one. I think we can apply a simple SPARQL query to each document and take the first result where our person is the foaf:primaryTopic:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person
WHERE {
  <http://.../file.rdf> foaf:primaryTopic ?person .
  ?person rdf:type foaf:Person . 
  ?person foaf:mbox_sha1sum "..."                                    
}

It may not be very efficient, I know, but it works. And now it’s implemented in SWAML :-)
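As a rough illustration of the two pieces involved, the following Python sketch computes a foaf:mbox_sha1sum (the SHA-1 of the mailto: URI, as FOAF defines it) and fills the query above for one candidate document. This is not the SWAML or Sindice client code: the function names, the example address and the example document URI are assumptions, and the inputs are assumed to be trusted, since they are pasted into the query string without escaping.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """SHA-1 hex digest of the mailto: URI, as foaf:mbox_sha1sum is defined."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def primary_topic_query(document_uri: str, sha1sum: str) -> str:
    """Build the SPARQL query for one candidate document (hypothetical helper)."""
    return f"""PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person
WHERE {{
  <{document_uri}> foaf:primaryTopic ?person .
  ?person rdf:type foaf:Person .
  ?person foaf:mbox_sha1sum "{sha1sum}" .
}}"""


# Placeholder address and document URI, used only for this example.
digest = mbox_sha1sum("someone@example.org")
query = primary_topic_query("http://example.org/foaf.rdf", digest)
print(query)
```

The resulting string could then be run against each document returned by Sindice, keeping the first document that yields a result.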

English, Spanish

Semantic Focus Community Update (18-Feb-2008)

February 18th, 2008

I've been hard at work on updating Semantic Focus, both from an articles standpoint as well as adding new features. Although larger, more hush-hush projects (Semantic Web/NLP related) loom on the horizon, I'd like to share with you a few changes you may have noticed around the blog, and how they are especially of benefit to our guest writers and other members of the Blogosphere.

Authors by activity (post count)

Authors of Semantic Focus articles are now displayed on every blog page in the right sidebar under the heading "Authors by activity (post count)." Authors are ordered by the number of articles they have published. This is both to benefit the good people who have taken time to deliver fresh content to the Semantic Focus audience and to encourage new guest authors to come forward and submit their material.

The true benefit for authors is the site-wide non-nofollow link to the author's homepage. This gives them exposure, traffic, and Googley weight. Thanks guys!

Latest blog mentions

Another "giving back" feature is "Latest blog mentions," also in the right sidebar of all blog pages. Using the Technorati Cosmos API, links to Semantic Focus are automatically indexed every few hours (by a trusty Python script), and those that are approved (manually, to check for spam) appear in the blog mentions section. Like the author links, blog mentions do not carry the rel="nofollow" attribute. This has the benefit of giving traffic and weight back to those who give exposure to this community.

Planet Semantic Focus

Quietly I've been tinkering with a blog indexing system of my own design, written in Python (lots of PyLove today). The blog indexer is currently working behind the scenes to gather more, and better, Semantic Web content from around the Web. I'm still experimenting with combining results from Technorati, Flickr and other sources. When the system is running smoothly and reliably I will change the Planet Semantic Focus index over to the new and improved one. Stay tuned!

English

A Note to Fans of This Blog — Thanks for your Comments and Emails

February 12th, 2008
To all my readers -- and especially to those of you who have commented or sent me emails -- I just wanted to say thanks! I don't always have time to reply, but I always read everything, and I do try...

English

Video of My Semantic Web Talk

February 12th, 2008
This is a video of me giving commentary on my "Understanding the Semantic Web" talk and how it relates to Twine, to a group of French business school students who made a visit to our office last...

English

Birthdays – XML is 10 and RDF/XML is 9

February 10th, 2008

Happy 10th Birthday XML.

It’s clear you are going to be around for some time. People know your good points and bad and have got the kinks worked out using you in production, in diversity and at scale.

Take care not to be distracted in the next 10 years by sexy new text formats that overlap in some features, but don’t replace you for many uses. I’m talking about you, JSON.

In the RDF world, RDF/XML is the syntax people love to hate, or just love/hate. It is 1 year younger than you, so maybe in February 2009 we’ll have something to celebrate about that. Yeah, it might happen :)

I recently made a new textual RDF syntax sibling, Turtle, with TimBL, whose official birthday was last month, although its actual birth was January 2004 in Bristol, or earlier if you look into its ancestry. In 6 (10?) more years it’ll be something we can properly rely on, like XML is today.

Dave

P.S. For more memories, check out Tim, Eve and Norm who were involved in XML from very early on when I was just an observer.

English, comment

Conference season 2008

February 6th, 2008

JFK-SAN-AUS-SFO

The March 2008 US conference season is nearly upon us. I’m just on my way back from representing Dopplr at Social Graph Foo Camp (find out more by listening to the Citizen Garden Podcast I participated in after the camp), but I’ll be back here again in three weeks.


I’m spending a few days in New York, where I’ll be hosted by the lovely Chris Shiflett, and then it’s on down to San Diego for ETech. That’ll be swiftly followed by SXSW Interactive where I’ll be on a panel entitled “Creative Collaboration: Building Web Apps Together”, about working in multidisciplinary teams. Finally, a week in San Francisco decompressing and having a few meetings.

I’m particularly excited by the trip to ETech. The last two years have brought smart people together to talk mostly Web 2.0 topics, but this year looks significantly more awesome. Full of genuinely emerging technology, the lineup looks like one Matt Jones and Tony Stark would appreciate.

Some highlights for me include a talk from Google’s economics groups on Prediction Markets, Computing for Socio-economic Development, and the excitingly-titled Antigenic Cartography: Visualizing Viral Evolution for Influenza Vaccine Design. Hope I see you there.

English, Uncategorized

A Universal Classification of Intelligence

February 5th, 2008
I've been thinking lately about whether or not it is possible to formulate a scale of universal cognitive capabilities, such that any intelligent system -- whether naturally occurring or synthetic --...

English

The Best Political Ad. Ever.

February 2nd, 2008
This video in support of Obama is the best piece of political advertising I've ever seen. And here is a funny parody of a response from the McCain camp (via Bram)

English

CHANGE! The reality of e-learning, a talk by Stephen Downes

February 2nd, 2008
Here, almost live, is a translation and summary of the presentation Stephen Downes gave in Ontario a couple of days ago (30/01/2008). It is among the most interesting things I have read lately.

delicious

Bee Node Deconstructed

February 1st, 2008

As with my first "FOAF tale", "Joe Triple", yesterday's story "Bee Node" was intended as more than an exercise in punning. The original story was intended to help illustrate a few aspects of Semantic Web technology which I think are worth drawing attention to. But this time around the focus is mainly on SPARQL rather than on RDF modelling and ontologies.

The SPARQL queries in the story illustrate a general pattern of interaction that I expect will become common in clients accessing data via SPARQL endpoints.

This pattern is: ASK, DESCRIBE, CONSTRUCT which I'll call "ADC" from now on. What the ADC pattern provides is a way to probe a remote data set to see if it has information that is of interest and then extract information from that data set with increasing levels of precision and control.

The ADC Pattern: ASK

The initial step is the ASK query. When I was first learning SPARQL I didn't really see the usefulness of ASK. It seemed that the same effect, i.e. detecting whether a given graph pattern can be matched against the data, could be achieved with a SELECT query:


SELECT *
WHERE {
  ...pattern of interest...
}
LIMIT 1

If there's at least one row, then we know there's matching data. This kind of query is useful when checking for existence of data in a relational database for example.

But, as I understand it, a SPARQL query engine can optimize for this common usage: because it need not return any data (as it must with a SELECT), it can simply terminate the query once it has found the first query solution. That is better all round really, as the query form reflects the intent of the query better than the "LIMIT 1" hack does.
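The difference also shows up in how a client reads the response. The following is a minimal Python sketch, assuming the endpoint returns results in the SPARQL JSON results format, where an ASK response carries a top-level "boolean" and a SELECT response carries its rows under "results" and "bindings". The function names and the sample payloads are illustrative and do not come from the story.

```python
import json

def ask_matched(response_text):
    """Read an ASK response: the whole answer is a single boolean."""
    return bool(json.loads(response_text)["boolean"])

def select_limit_1_matched(response_text):
    """Read a 'SELECT * ... LIMIT 1' response: matched if any row came back."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return len(bindings) > 0

# Hypothetical payloads, shaped like the SPARQL JSON results format.
ask_response = '{"head": {}, "boolean": true}'
select_response = json.dumps({
    "head": {"vars": ["lat", "long"]},
    "results": {"bindings": [
        {"lat": {"type": "literal", "value": "51.38"},
         "long": {"type": "literal", "value": "-2.36"}}
    ]},
})

print(ask_matched(ask_response))
print(select_limit_1_matched(select_response))
```

With ASK the client only has to read a flag. With the "LIMIT 1" approach it receives a row of data it never intended to use, and has to infer the answer from the row's presence.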

Detective Sparql practically applies this query form in his investigation. His first query attempts to find sources that have the location of Bee Node and just asks whether the endpoint has the specific data items:


PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK WHERE {
  {
  </person/bnode> 
     geo:lat ?lat;
     geo:long ?long. 
  }
  UNION
  {
  ?person
     foaf:mbox <mailto:bnode@example.com>;
     geo:lat ?lat;
     geo:long ?long.      
  }
}

The query uses a UNION to ask the same question in slightly different ways. The first pattern uses a URI for Bee Node, the second references her via an identifying property. This is a realistic and likely scenario as different endpoints may have different URIs for the same resource.

The second ASK query that Piotr uses does essentially the same thing, but instead of looking for specific triples, e.g. geo:lat, it ASKs a more general question: does the endpoint have any triples for a specified subject, in this case Bee Node? It does this by using a variable in place of both the predicate and object:

</person/bnode> ?p ?o.

Queries that use variables as wildcards for properties are a brilliantly useful feature of SPARQL, as they allow one to describe very general, reusable graph patterns.

Actually Detective Sparql missed a trick here as what he should have asked is:


ASK WHERE {
  {
  </person/bnode> ?p ?o.
  }
  UNION
  {
   ?s ?p </person/bnode>.
  }
}

...as that query would have checked for both facts about Bee Node and facts relating to Bee Node.
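Because this probe is the same for any resource, a client can generate it. Here is a minimal Python sketch, assuming an absolute URI for the resource and an endpoint that accepts the query as a "query" parameter on a GET request, as the SPARQL protocol describes. The helper names and the example.com addresses are hypothetical, and the URI is not validated or escaped.

```python
from urllib.parse import urlencode

def ask_about(resource_uri):
    """Build an ASK query matching the resource as subject or as object."""
    return (
        "ASK WHERE {\n"
        f"  {{ <{resource_uri}> ?p ?o . }}\n"
        "  UNION\n"
        f"  {{ ?s ?p <{resource_uri}> . }}\n"
        "}"
    )

def query_url(endpoint, query):
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})

# Hypothetical resource and endpoint, for illustration only.
query = ask_about("http://example.com/person/bnode")
print(query)
print(query_url("http://example.com/sparql", query))
```

The same generated query can then be sent to each candidate endpoint in turn, which is what makes ASK a cheap first probe.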

The ADC Pattern: DESCRIBE

Following up on the ASK queries, Detective Sparql uses a DESCRIBE query to request that specific sources "spill the beans" and demonstrate what they know and provide whatever information they find useful.

This provides a good way to extract some useful view of the data context within which a specific resource sits: its literal properties and relationships to other resources in the dataset. Depending on the algorithm the endpoint uses to generate these views (and the shape of the underlying data set) the amount of data returned by a DESCRIBE query can vary wildly.

This is very useful in some contexts, particularly web crawling, where the client just wants to execute some general queries and use the results as a starting point for further accesses. However, in many others this unpredictability may not be suitable, particularly where the client wants or needs to control the shape of the result graph and the amount of information returned.

The ADC Pattern: CONSTRUCT

It's at this point where the CONSTRUCT query becomes useful.

The advantage of a CONSTRUCT query is that it provides the client with complete control over how the result graph is constructed. The client can specify exactly what resources it wants returned and which properties it's interested in.

As with ASK, I originally wrote off CONSTRUCT and DESCRIBE as being specialized queries that would only be of limited interest. I expected that SELECT queries, which line up very nicely with their SQL equivalents, would be the primary SPARQL query form. But I was mistaken. Now that I've actually begun writing applications that make heavy use of SPARQL I've found that CONSTRUCT is the query form that has most flexibility. There's more to say about that, but the presentation I gave at a recent SWIG meeting is useful background reading.

One important utility of CONSTRUCT is the ability to transform the underlying data set. Currently a CONSTRUCT query is the closest thing that RDF has to XSLT. Using CONSTRUCT, a data set can be transformed into a particular shape that may fit the processing expectations of the client application. It should be said, though, that CONSTRUCT is a poor cousin to XSLT (or SQL for that matter), in that it's limited in what it can achieve, at least until SPARQL gets more basic functions for things like string manipulation.

Detective Sparql uses this feature to transform SIOC data into his preferred ontology. This is going to be inevitable where vocabularies don't neatly line up with one another as is the case with SIOC and FOAF.
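To make that reshaping concrete, here is a toy Python sketch of what such a CONSTRUCT does, working over triples held as plain tuples. It is not a SPARQL engine. The particular mapping (sioc:email to foaf:mbox, sioc:name to foaf:nick) is an assumption chosen for illustration and is not the mapping used in the story; the example.com URIs are hypothetical.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Assumed vocabulary mapping, for illustration only.
PREDICATE_MAP = {
    SIOC + "email": FOAF + "mbox",
    SIOC + "name": FOAF + "nick",
}

def construct_foaf(triples):
    """Emit a new graph holding only the mapped triples, much as a
    CONSTRUCT template emits triples for each matching solution."""
    return [
        (s, PREDICATE_MAP[p], o)
        for (s, p, o) in triples
        if p in PREDICATE_MAP
    ]

# Hypothetical source data; the last predicate has no mapping.
source = [
    ("http://example.com/user/bnode", SIOC + "email", "mailto:bnode@example.com"),
    ("http://example.com/user/bnode", SIOC + "name", "bnode"),
    ("http://example.com/user/bnode", "http://example.com/vocab#unmapped", "x"),
]

for triple in construct_foaf(source):
    print(triple)
```

As in a real CONSTRUCT, anything the template does not mention is simply left out of the result graph, so the client sees only the shape it asked for.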

CONSTRUCT also provides a limited form of inferencing capability without requiring a full reasoner.

Where CONSTRUCT is limited is in its ability to traverse an RDF graph: limited in the sense that the traversal must be explicitly specified. DESCRIBE doesn't suffer from this, except that you have to rely on the SPARQL endpoint deciding where and how far to traverse. It'd be interesting to see DESCRIBE extended to allow the client to specify the algorithm for generating the view.

Hopefully this posting demonstrates some aspects of SPARQL which go beyond the simple query language, and illustrates how the different query forms have their own strengths and weaknesses and how they can be combined to work with data out in the wild.
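As a rough sketch of how a client might chain the three forms, the Python below runs ASK, then DESCRIBE, then CONSTRUCT against one endpoint. Everything here is hypothetical: run_query stands in for whatever function actually sends a query to an endpoint, and fake_endpoint returns canned results so the flow can be followed without a network.

```python
def adc(run_query, resource_uri, construct_query):
    """Probe with ASK, explore with DESCRIBE, then extract with CONSTRUCT.

    run_query is any callable that takes a query string and returns the
    endpoint's result; it is a stand-in, not a real library call.
    """
    if not run_query(f"ASK WHERE {{ <{resource_uri}> ?p ?o . }}"):
        return None  # nothing of interest here; try another endpoint
    overview = run_query(f"DESCRIBE <{resource_uri}>")  # endpoint picks the view
    shaped = run_query(construct_query)                 # client picks the view
    return overview, shaped

def fake_endpoint(query):
    """Hypothetical endpoint returning canned results keyed by query form."""
    canned = {
        "ASK": True,
        "DESCRIBE": ["triples chosen by the endpoint"],
        "CONSTRUCT": ["triples shaped by the client"],
    }
    return canned[query.split()[0]]

result = adc(
    fake_endpoint,
    "http://example.com/person/bnode",
    "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }",
)
print(result)
```

The early return is the point of the pattern: the DESCRIBE and CONSTRUCT queries are only sent to endpoints that have already answered the ASK with true.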

Uncategorized

Visions, Visionaries: hacking the blogosphere the right way

February 1st, 2008
In the future organized web, Web 3.0, standards will grow in importance. And these, unlike languages, cannot be subordinated to power, whether political or economic: we should hack the noosphere, fight for standards, ...

delicious