Archive

Posts Tagged ‘HTTP’

Freebase Officially Linked Data with Release of RDF Service

October 29th, 2008

At ISWC2008 Freebase released its new RDF service for generating RDF representations of Freebase topics, allowing Freebase to be used as Linked Data! To obtain the RDF data for a topic send a GET request to http://rdf.freebase.com/rdf/some.topic.id where "some.topic.id" is replaced by the desired topic identifier (slashes in the identifier must be replaced by dots). Topic data can be represented as N3, RDF/XML or Turtle depending on the preferences expressed in your client's HTTP Accept header. Try it out with the Freebase topic Semantic Web.
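A minimal Python sketch of the request described above. It assumes the leading slash of a topic id is dropped before the remaining slashes become dots, uses "/en/semantic_web" as an illustrative id, and picks text/turtle as the Accept value; the post does not say which media type strings the service matched. Freebase has since been shut down, so the sketch only builds the request with urllib and does not send it.

```python
from urllib.request import Request

RDF_BASE = "http://rdf.freebase.com/rdf/"

def freebase_rdf_url(topic_id: str) -> str:
    # Assumption: drop the leading slash, then replace the remaining
    # slashes with dots, e.g. "/en/semantic_web" -> "en.semantic_web".
    return RDF_BASE + topic_id.lstrip("/").replace("/", ".")

def rdf_request(topic_id: str, accept: str = "text/turtle") -> Request:
    # Build (but do not send) a GET request; the Accept header tells the
    # service which RDF serialisation the client prefers.
    return Request(freebase_rdf_url(topic_id), headers={"Accept": accept})

req = rdf_request("/en/semantic_web")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```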

You can also cater to clients that prefer HTML output by using the /ns end-point (http://rdf.freebase.com/ns/some.topic.id). The service performs the content negotiation automatically, delivering human-friendly HTML representations to Web browsers and redirecting clients expecting RDF to the /rdf URL (via a 302 redirect).
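The /ns behaviour can be pictured as a small decision on the server side. This is a hypothetical sketch, not Freebase's code: the set of RDF media types (including text/rdf+n3 for N3) and the rule that a redirect happens only when an RDF type is strictly preferred over HTML are assumptions.

```python
RDF_TYPES = {"application/rdf+xml", "text/rdf+n3", "text/turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}
RDF_BASE = "http://rdf.freebase.com/rdf/"

def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs; q defaults to 1.0."""
    pairs = []
    for item in header.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        if not media_type:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        pairs.append((media_type.lower(), q))
    return pairs

def negotiate_ns(topic_id, accept):
    """Return (302, /rdf URL) for RDF-preferring clients, else (200, "text/html")."""
    prefs = parse_accept(accept or "*/*")
    rdf_q = max((q for t, q in prefs if t in RDF_TYPES), default=0.0)
    html_q = max((q for t, q in prefs if t in HTML_TYPES), default=0.0)
    if rdf_q > html_q:
        return 302, RDF_BASE + topic_id
    return 200, "text/html"

print(negotiate_ns("en.semantic_web", "application/rdf+xml"))
print(negotiate_ns("en.semantic_web", "text/html,application/xhtml+xml,*/*;q=0.8"))
```

A typical browser never mentions an RDF type, so it stays on the HTML page, while an RDF client gets bounced to /rdf.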

One downside is that the data doesn't appear to link to external resources, in a sense walling itself in. It should be trivial to link the topics that came from Wikipedia back to Wikipedia as well as DBpedia (which would be killer, by the way).



Explaining REST and Hypertext: Spam-E the Spam Cleaning Robot

October 23rd, 2008

I'm going to add to Sam Ruby's amusement and throw in my attempt to explicate some of Roy Fielding's recent discussion of what makes an API RESTful. If you've not read the post and all the comments then I encourage you to do so: there are some great tidbits in there that have certainly given me pause for thought.

The following attempts to illustrate my understanding of REST. Perhaps bizarrely, I've chosen to focus more on the client than on the design of the server (e.g. what resources it exposes). This is because I don't think enough focus has been placed on the client, particularly when it comes to the hypermedia constraint. And I think that often, when we focus on how to design an "API", we're glossing over some important aspects of the REST architecture, which, after all, includes other types of actors, including both clients and intermediaries.

I've also deliberately chosen not to draw much on existing specifications; again, it's too easy to muddy the waters with irrelevant details.

Anyway, I'm well prepared to stand corrected on any or all of the below. Will be interested to hear if anyone has any comments.

Let's imagine there are two mime types.

The first is called application/x-wiki-description. It defines a JSON format that describes the basic structure of a Wiki website. The format includes a mixture of simple data items, URIs and URI templates that collectively describe:

  • the name of the wiki
  • the email address of the administrator
  • a link to the Recent Changes resource
  • a link to the Main page
  • a link to the license statement
  • a link to the search page (as a URI template, that may include a search term)
  • a link to a parameterized RSS feed (as a URI template that may include a date)
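To make the format concrete, here is a hypothetical application/x-wiki-description document and a Python sketch that reads it. The field names, the example.org URLs and the {term} and {date} template variables are all illustrative assumptions; the expand helper implements only simple RFC 6570 level-1 expansion.

```python
import json
import re
from urllib.parse import quote

# Hypothetical application/x-wiki-description document (field names assumed).
SAMPLE = """{
  "name": "Example Wiki",
  "admin_email": "admin@wiki.example.org",
  "recent_changes": "http://wiki.example.org/RecentChanges",
  "main_page": "http://wiki.example.org/Main_Page",
  "license": "http://wiki.example.org/License",
  "search": "http://wiki.example.org/search?q={term}",
  "rss": "http://wiki.example.org/feed?since={date}"
}"""

def expand(template, **values):
    # Level-1 URI template expansion: replace each {var} with the value,
    # percent-encoding everything except unreserved characters.
    return re.sub(r"\{(\w+)\}",
                  lambda m: quote(str(values[m.group(1)]), safe=""),
                  template)

desc = json.loads(SAMPLE)
print(desc["name"], "administered by", desc["admin_email"])
print(expand(desc["search"], term="cheap pills"))
print(expand(desc["rss"], date="2008-10-23"))
```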

Another mime type is application/x-wiki-page-versions. This is another JSON based format that describes the version history of a wiki page. The format is an ordered collection of links. Each resource in that list is a prior version of the wiki page; the most recent page is first in the list.
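A matching sketch for application/x-wiki-page-versions, assuming the document is a bare JSON array of version URIs; the post only says it is an ordered collection of links with the most recent first, so the exact shape and the URLs are hypothetical.

```python
import json

# Hypothetical application/x-wiki-page-versions document, most recent first.
SAMPLE = """[
  "http://wiki.example.org/Main_Page?version=3",
  "http://wiki.example.org/Main_Page?version=2",
  "http://wiki.example.org/Main_Page?version=1"
]"""

def parse_versions(text):
    """Return the ordered list of version URIs, rejecting other shapes."""
    versions = json.loads(text)
    if not isinstance(versions, list) or not all(isinstance(v, str) for v in versions):
        raise ValueError("expected an ordered list of version URIs")
    return versions

versions = parse_versions(SAMPLE)
latest, oldest = versions[0], versions[-1]
print("latest:", latest)
print("oldest:", oldest)
```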

Spam-E is a little web robot that has been programmed with the smarts to understand several mime types:

  • application/x-wiki-description
  • application/x-wiki-page-versions
  • RSS and Atom
  • XHTML

Spam-E also understands a profile of XHTML that defines two elements: one that points to a resource capable of serving wiki descriptions, another that points to a resource that can return wiki page version descriptions.
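One way Spam-E might recognise the profile is sketched below. The profile URI, the rel values wiki-description and page-versions, and the use of link elements in the XHTML head are assumptions standing in for the two unnamed elements; parsing uses Python's xml.etree.ElementTree with the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}
PROFILE_URI = "http://wiki.example.org/profile"         # hypothetical
PROFILE_RELS = {"wiki-description", "page-versions"}      # hypothetical

def profile_links(xhtml):
    """Return {rel: href} for the profile's link elements, or {} if the
    page does not declare the profile Spam-E understands."""
    root = ET.fromstring(xhtml)
    head = root.find("h:head", XHTML_NS)
    if head is None or PROFILE_URI not in head.get("profile", "").split():
        return {}
    links = {}
    for link in head.findall("h:link", XHTML_NS):
        rel, href = link.get("rel", ""), link.get("href")
        if rel in PROFILE_RELS and href:
            links[rel] = href
    return links

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://wiki.example.org/profile">
    <title>Main Page</title>
    <link rel="wiki-description" href="http://wiki.example.org/description"/>
    <link rel="page-versions" href="http://wiki.example.org/Main_Page/versions"/>
  </head>
  <body><p>Page text</p></body>
</html>"""

print(profile_links(SAMPLE))
```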

Spam-E has internal logic that has been designed to detect SPAM in XHTML pages. It also has a fully functioning HTTP client. And it also has been programmed with logic appropriate to processing those specific media types.

Initially, when started, Spam-E does nothing. It waits to receive a link, e.g. via a simple user interface. It's in a steady state, waiting for input.

Spam-E then receives a link. The robot immediately dereferences the link. It does so by submitting a GET request to the URL, including an Accept header:


Accept: application/x-wiki-description;q=1.0, application/x-wiki-page-versions;q=0.9, application/xhtml+xml;q=0.8, application/atom+xml;q=0.5, application/rss+xml;q=0.4

This clearly states Spam-E's preference to receive specific mime-types.
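The Accept header can be generated from an ordered preference list rather than written by hand. This Python sketch uses the media type names as defined earlier in the post (application/x-wiki-description and so on); the helper name accept_header is hypothetical.

```python
def accept_header(preferences):
    """Format (media_type, q) pairs, most preferred first, as an Accept value."""
    for media_type, q in preferences:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"q out of range for {media_type}: {q}")
    return ", ".join(f"{media_type};q={q:.1f}" for media_type, q in preferences)

SPAM_E_PREFERENCES = [
    ("application/x-wiki-description", 1.0),
    ("application/x-wiki-page-versions", 0.9),
    ("application/xhtml+xml", 0.8),
    ("application/atom+xml", 0.5),
    ("application/rss+xml", 0.4),
]

print("Accept:", accept_header(SPAM_E_PREFERENCES))
```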

In this instance it receives an XHTML document in return. Not ideal, but Spam-E knows how to handle it. After parsing it, it turns out that this is not a profile of XHTML that Spam-E understands, so it simply extracts all the anchor elements from the file and uses them to widen its search for wiki spam. Another way to say this is that Spam-E has changed its state to one of searching. This state transition has been triggered by following a link and by receiving and processing a specific media type. This is "hypermedia as the engine of application state" in action.
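Extracting anchors from an XHTML page that does not use the known profile needs nothing more than the standard library. A minimal sketch with html.parser; resolving relative links against the page URL and dropping fragments are design choices, not requirements from the post.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

class AnchorCollector(HTMLParser):
    """Collect absolute href values from <a> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop fragments so that
                # page#a and page#b count as the same URI.
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                self.links.append(absolute)

def extract_anchors(base_url, xhtml):
    collector = AnchorCollector(base_url)
    collector.feed(xhtml)
    collector.close()
    return collector.links

page = ('<html><body><a href="/wiki/Foo">Foo</a> '
        '<a href="http://other.example.org/x#top">X</a> '
        '<a name="n">no href</a></body></html>')
print(extract_anchors("http://wiki.example.org/start", page))
```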

Spam-E performs this dereference-parse-traverse operation several times before finding an XHTML document that conforms to the profile it understands. The document contains a link to a resource that should be capable of serving a wiki description representation.
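The repeated dereference-parse-traverse step is essentially a breadth-first crawl that stops when a profiled page turns up. In this sketch fetch is a hypothetical callable standing in for the HTTP client plus parsing, returning the page's links and, if present, the wiki description link; max_pages is an assumed safety limit.

```python
from collections import deque

def find_wiki_description(start, fetch, max_pages=100):
    """Breadth-first dereference-parse-traverse until a profiled page turns up.

    fetch(url) is a hypothetical callable returning (links, description_uri),
    where description_uri is None unless the page uses the known XHTML profile.
    """
    frontier = deque([start])
    seen = {start}
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        fetched += 1
        links, description_uri = fetch(url)
        if description_uri is not None:
            return description_uri      # switch to discovery mode
        for link in links:
            if link not in seen:        # never dereference the same URI twice
                seen.add(link)
                frontier.append(link)
    return None                         # out of URIs (or over budget)

# Usage with an in-memory stand-in for the Web.
WEB = {
    "http://a.example/": (["http://b.example/", "http://c.example/"], None),
    "http://b.example/": (["http://a.example/"], None),
    "http://c.example/": ([], "http://c.example/wiki-description"),
}
print(find_wiki_description("http://a.example/", lambda u: WEB.get(u, ([], None))))
```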

Spam-E is now in discovery mode. Spam-E uses an Accept header of application/x-wiki-description when following the link and is returned a matching representation. Spam-E parses the JSON and now has additional information at its disposal: it knows how to search the wiki, how to find the RSS feed, how to contact the wiki administrator, etc.

Spam-E now enters Spam Detection mode. It requests, with a suitable Accept header, the recent changes resource, stating a preference for Atom documents. It instead gets an RSS feed, but that's fine because Spam-E still knows how to process that. For each entry in the feed, Spam-E requests the wiki page, using an Accept header of application/xhtml+xml.
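Handling whichever feed format comes back means branching on the response's media type. A Python sketch covering Atom (RFC 4287) and RSS 2.0 only; RSS 1.0, which is RDF-based and namespaced, is not handled, and the helper name entry_links is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
FEED_TYPES = ("application/atom+xml", "application/rss+xml")

def entry_links(content_type, body):
    """Return the page URIs listed in an Atom or RSS 2.0 feed."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in FEED_TYPES:
        raise ValueError(f"unsupported feed type: {media_type}")
    root = ET.fromstring(body)
    links = []
    if media_type == "application/atom+xml":
        for entry in root.iter(ATOM + "entry"):
            for link in entry.findall(ATOM + "link"):
                # In Atom a link without rel is an "alternate" link.
                if link.get("rel", "alternate") == "alternate" and link.get("href"):
                    links.append(link.get("href"))
    else:
        for item in root.iter("item"):
            link = (item.findtext("link") or "").strip()
            if link:
                links.append(link)
    return links

rss = ("<rss version='2.0'><channel><title>Recent changes</title>"
       "<item><title>A</title><link>http://wiki.example.org/A</link></item>"
       "<item><title>B</title><link> http://wiki.example.org/B </link></item>"
       "</channel></rss>")
print(entry_links("application/rss+xml; charset=utf-8", rss))
```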

Spam-E now tries to find if there is spam on the page by applying its local spam detection logic. In this instance Spam-E discovers some spam on the page. It checks the XHTML document it was returned and discovers that it conforms to a known profile and that embedded in a link element is a reference to the "versions" resource. Spam-E dereferences this link using an Accept header of application/x-wiki-page-versions.

Spam-E, who is now in Spam Cleaning mode, fetches each version in turn and performs spam detection on it. If spam is found, then Spam-E performs a DELETE request on the URI. This will remove that version of the wiki page from the wiki. Someone browsing the original URI of the page will now see an earlier, spam free version.
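The cleaning step boils down to one DELETE per spammy version. The callables fetch_page, is_spam and send are hypothetical placeholders for Spam-E's HTTP client and its local detection logic, and treating any 2xx status as a successful deletion is an assumption.

```python
from urllib.request import Request

def clean_versions(version_uris, fetch_page, is_spam, send):
    """Issue a DELETE for every stored version that looks like spam.

    fetch_page(uri) returns the XHTML for one version, is_spam(xhtml) is the
    local detection logic, and send(request) performs the HTTP exchange and
    returns the status code. All three are hypothetical callables.
    """
    deleted = []
    for uri in version_uris:            # most recent version first
        if is_spam(fetch_page(uri)):
            status = send(Request(uri, method="DELETE"))
            if 200 <= status < 300:     # e.g. 200 OK or 204 No Content
                deleted.append(uri)
    return deleted

# Usage with in-memory stand-ins for the wiki and the HTTP client.
pages = {
    "http://wiki.example.org/Page?version=3": "buy cheap pills",
    "http://wiki.example.org/Page?version=2": "buy cheap pills",
    "http://wiki.example.org/Page?version=1": "useful text",
}
sent = []

def fake_send(request):
    sent.append((request.get_method(), request.full_url))
    return 204

deleted = clean_versions(list(pages), pages.get, lambda html: "pills" in html, fake_send)
print(deleted)
```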

Once it has finished its cycle of spam detection and cleaning, Spam-E reverts to search mode until it runs out of new URIs.

There are several important points to underline here:

Firstly, at no point did the authors of Spam-E have to have any prior knowledge about the URL structure of any site that the robot might visit. All that Spam-E was programmed with was logic relating to some defined media types (or extension points of a media type in the case of the XHTML profiles) and the basic semantics of HTTP.

Secondly, no one had to publish any service description documents, or define any API end points. No one had to define what operations could be carried out on specific resources, or what response codes would be returned. All information was found by traversing links and by following the semantics of HTTP.

Thirdly, the Spam-E application basically went through a series of state transitions triggered by what media types it received when requesting certain URIs. The application is basically a simple state machine.
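Written down as a transition table, Spam-E's behaviour fits in a few lines. This is a simplified reading of the walkthrough: besides media types, the table uses a few made-up trigger names (link, xhtml-with-profile, done, no-more-uris), and returning to WAITING when the URIs run out is an assumption.

```python
from enum import Enum

class State(Enum):
    WAITING = "waiting for a link"
    SEARCHING = "searching"
    DISCOVERY = "discovery"
    DETECTION = "spam detection"
    CLEANING = "spam cleaning"

# (current state, trigger) -> next state. Most triggers are the media type
# just received; the remaining ones are hypothetical event names.
TRANSITIONS = {
    (State.WAITING, "link"): State.SEARCHING,
    (State.SEARCHING, "application/xhtml+xml"): State.SEARCHING,
    (State.SEARCHING, "xhtml-with-profile"): State.DISCOVERY,
    (State.DISCOVERY, "application/x-wiki-description"): State.DETECTION,
    (State.DETECTION, "application/atom+xml"): State.DETECTION,
    (State.DETECTION, "application/rss+xml"): State.DETECTION,
    (State.DETECTION, "application/xhtml+xml"): State.DETECTION,
    (State.DETECTION, "application/x-wiki-page-versions"): State.CLEANING,
    (State.CLEANING, "done"): State.SEARCHING,
    (State.SEARCHING, "no-more-uris"): State.WAITING,
}

def step(state, trigger):
    # Triggers with no entry leave the state unchanged.
    return TRANSITIONS.get((state, trigger), state)

# Replay the walkthrough.
state = State.WAITING
for trigger in ["link", "application/xhtml+xml", "xhtml-with-profile",
                "application/x-wiki-description", "application/rss+xml",
                "application/xhtml+xml", "application/x-wiki-page-versions", "done"]:
    state = step(state, trigger)
    print(f"{trigger:35} -> {state.value}")
```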

Anyway, hopefully that is a useful example. Again, I'm very happy to take feedback. Comments are disabled on this blog, but feel free to drop me a mail (see the Feedback link).


Web of Data Practitioners Days, 1st Session: Tweaking Turtles

October 22nd, 2008

Good morning from Vienna:) The Web of Data Practitioners Days really kicked off with a bang today - with Michael Hausenblas doing a strip! Only to expose the Semantic Web t-shirt he wore underneath his smart suit and tie, of course, but he really got the attention of attendees at 9:15 in the morning:)

First session - Web of Data 101 by Yves Raimond and Keith Alexander - explained the implications of the move from a Web of Documents to a Web of Data: With the Semantic Web architecture, data can be made explicit on the web. Data here means not only data contained in documents, but data describing persons, cities, bands, events, finally arriving at the “Web of Things” (see also this presentation by Dave Raggett, W3C, PDF, 2.7 MB). The Web of Data wouldn’t be a Web if the data weren’t interlinked - here is an overview of the principles of Linked Data:

  • always use URIs as names for things
  • more specifically, use HTTP URIs so that people can look up those names on the web
  • when someone looks up a URI, provide useful RDF information (RDF is the data model used for data on the web of data)
  • include RDF statements that link to other URIs (otherwise it wouldn’t be a web).

Please also watch out for what is already happening and is going to happen in the future on www.bbc.co.uk/music/beta. This beta site is powered by MusicBrainz, the open content music database that is also part of the Linked Data cloud. Yves is collaborating with the BBC in the Programmes ontology project, the aim of which is to provide a simple vocabulary for describing programmes.

Yves’ intro was followed by a Turtle hacking session led by Keith Alexander. Turtle is a serialisation format for RDF, i.e. a format in which you can write RDF statements. The Turtle session is documented here on Keith’s Talis website. Even though I copied and pasted most of the code, I didn’t manage to produce a piece of valid code in N3 right away (i.e. not valid according to this validator). It only worked after I had removed the statements about who I know or what I am interested in - without these connections, what remains is a bit boring, I guess. But this looks like I managed to post at least something to the test store!

EDIT: The problem was that I had terminated the statements too soon, with a dot where a semicolon should have been; the demo didn’t allow me to overwrite the first post to the store, but here is my FOAF self-description in Turtle:

@prefix foaf:<http://xmlns.com/foaf/0.1/> .
@prefix owl:<http://www.w3.org/2002/07/owl#> .
@prefix people:<http://api.talis.com/stores/wod-pd-sandbox/items/People/> .

people:JanaHerwig a foaf:Person ;
foaf:name "Jana Herwig" ;
foaf:nick "digiom" ;
foaf:homepage <http://digiom.wordpress.com> ;
owl:sameAs <http://dbtune.org/last-fm/jezobeljones> ;
foaf:knows people:MichaelHausenblas, people:YvesRaimond, people:WolfgangHalb ;
foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web>, <http://dbpedia.org/resource/Web>, <http://dbpedia.org/resource/Popular_Culture>, <http://dbpedia.org/resource/Lolcat>.

Achieved with zero Semantic coding skills - the Web of Data cannot be so hard to achieve:)

EDIT: Did do the update, too - just posted my first SPARQL query to this endpoint. Are the results going to be preserved in this link? Here is the query “by foot”:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX people: <http://api.talis.com/stores/wod-pd-sandbox/items/People/>
DESCRIBE people:JanaHerwig
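Sending the same DESCRIBE query from code follows the SPARQL Protocol: the query travels URL-encoded in a query parameter of a GET request. The endpoint path below is an assumption about the sandbox store's layout, the unused rdf and rdfs prefixes are dropped, and the sketch only builds the request rather than sending it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumed SPARQL endpoint path for the sandbox store (hypothetical).
ENDPOINT = "http://api.talis.com/stores/wod-pd-sandbox/services/sparql"

QUERY = """PREFIX people: <http://api.talis.com/stores/wod-pd-sandbox/items/People/>
DESCRIBE people:JanaHerwig"""

def sparql_get(endpoint, query, accept="application/rdf+xml"):
    # SPARQL Protocol GET binding: the query goes URL-encoded in the
    # "query" parameter; a DESCRIBE result is an RDF graph.
    return Request(endpoint + "?" + urlencode({"query": query}),
                   headers={"Accept": accept})

req = sparql_get(ENDPOINT, QUERY)
print(req.full_url)
# Decoding the URL again gives back the original query text.
print(parse_qs(urlsplit(req.full_url).query)["query"][0])
```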


All the better to hear us with….

October 18th, 2008
[SpringWidgets RSS widget: “Nodalities » Podcast”, Talking with Talis - Nodalities From Semantic Web to Web of Data]

Talis watchers will be well aware of the significant number of podcasts that my colleagues and I produce here at Talis.  Apart from the semantic web focused podcasts here on Nodalities, there are the more library focused ones on our sister blog Panlibus, education focused ones on the Project Xiphos Blog, and of course the Library 2.0 and Semantic Web gangs.

Keeping up with these streams of podcast output can be a bit of a challenge, so we have taken a few steps to make things easier.

Firstly, on Panlibus, Nodalities, and Xiphos we have created a ‘podcast’ category. By selecting podcast in the category selector, you can view only podcast postings for that blog.

Next we have implemented a feed aggregator which brings all the Talis podcasting output into a single feed under the Talking with Talis brand. The displayed version of this feed is not as elegant as each dedicated blog feed, but all the information is there and it is a great place to select the aggregated feed for your favourite RSS reader.

iTunes is a tool that many use to track, download and listen to podcasts. The Talking with Talis iTunes feed has now been updated to include all of our podcasts. If you don’t already have this free feed set up in your iTunes, click here to do it now.

Last but not least, you will have noticed the RSS feed widget at the top of this blog post. This widget is freely available from SpringWidgets for you to add to your favourite environment such as Pageflakes, Facebook, Wordpress, iGoogle and many others including [after a small software download] your PC desktop.

I have set up these widgets for the following podcast feeds; to get one in your environment, follow the link and then click the ‘Click Here to Get the Code!’ link.

SpringWidgets have loads more in their Widget Gallery that have been created by their community, and I must give credit to one of their number, Minerva, who created the first Panlibus podcasts feed.


All the better to hear us with….

October 18th, 2008
[SpringWidgets RSS widget: “Panlibus » Podcast”, Talking with Talis podcasts from the Panlibus Blog]

Talis watchers will be well aware of the significant number of podcasts that my colleagues and I produce here at Talis.  Apart from the mainly library focused podcasts here on Panlibus, there are the more semantic web based ones on our sister blog Nodalities, education focused ones on the Project Xiphos Blog, and of course the Library 2.0 and Semantic Web gangs.

Keeping up with these streams of podcast output can be a bit of a challenge, so we have taken a few steps to make things easier.

Firstly, on Panlibus, Nodalities, and Xiphos we have created a ‘podcast’ category. By selecting podcast in the category selector, you can view only podcast postings for that blog.

Next we have implemented a feed aggregator which brings all the Talis podcasting output into a single feed under the Talking with Talis brand. The displayed version of this feed is not as elegant as each dedicated blog feed, but all the information is there and it is a great place to select the aggregated feed for your favourite RSS reader.

iTunes is a tool that many use to track, download and listen to podcasts. The Talking with Talis iTunes feed has now been updated to include all of our podcasts. If you don’t already have this free feed set up in your iTunes, click here to do it now.

Last but not least, you will have noticed the RSS feed widget at the top of this blog post. This widget is freely available from SpringWidgets for you to add to your favourite environment such as Pageflakes, Facebook, Wordpress, iGoogle and many others including [after a small software download] your PC desktop.

I have set up these widgets for the following podcast feeds; to get one in your environment, follow the link and then click the ‘Click Here to Get the Code!’ link.

SpringWidgets have loads more in their Widget Gallery that have been created by their community, and I must give credit to one of their number, Minerva, who created the first Panlibus podcasts feed.


OpenID, OAuth UI and tool links

October 15th, 2008

A quick link roundup:

From ‘Google OAuth & Federated Login Research‘:

“The following provides some guidelines for the user interface define of becoming an OAuth service provider”

Detailed notes on UI issues, with screenshots and links to related work (opensocial etc.).

MySpace’s OAuth Testing tool:

The MySpace OAuth tool creates examples to show external developers the correct format for constructing HTTP requests signed according to OAuth specifications
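What such a tool checks is the OAuth 1.0 HMAC-SHA1 signature. A minimal Python sketch of the signature base string and signature, following RFC 5849; it omits default-port normalisation and form-encoded body parameters, and the keys, secrets, nonce and timestamp are illustrative values.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qsl, quote, urlsplit

def pct(value):
    # RFC 5849 section 3.6: percent-encode all but unreserved characters.
    return quote(value, safe="-._~")

def signature_base_string(method, url, oauth_params):
    parts = urlsplit(url)
    # Base string URI: lower-cased scheme and host, no query string
    # (default-port removal is skipped in this sketch).
    base_uri = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}"
    params = parse_qsl(parts.query, keep_blank_values=True) + list(oauth_params.items())
    encoded = sorted((pct(k), pct(v)) for k, v in params)
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(base_uri), pct(normalized)])

def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Illustrative request and credentials.
oauth = {
    "oauth_consumer_key": "dpf43f3p2l4k3l03",
    "oauth_token": "nnch734d00sl2jdk",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1191242096",
    "oauth_nonce": "kllo9940pd9333jh",
    "oauth_version": "1.0",
}
base = signature_base_string(
    "GET", "http://photos.example.net/photos?file=vacation.jpg&size=original", oauth)
signature = hmac_sha1_signature(base, "kd94hf93k423kf44", "pfkkdhi9sl3r4s00")
print(base)
print(signature)
```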

Google’s OAuth playground tool (link):

… to help developers cure their OAuth woes. You can use the Playground to help debug problems, check your own implementation, or experiment with the Google Data APIs.

If anyone figures out how to post files to Blogger via their AtomPub/OAuth API, please post a writeup! We should be able to use it to post RDFa/FOAF etc hopefully…

Yahoo’s OpenID usability research. Really good to see this made public, I hope others do likewise. There’s a summary page and a full report in PDF, “Yahoo! OpenID: One Key, Many Doors“.

Finally, what looks like an excellent set of introductory posts on OAuth: a Beginner’s Guide to OAuth from Eran Hammer-Lahav.


Release 3.1 - Now in Technology Preview

October 1st, 2008


Well, it’s been over two weeks since we released something cool (www.semanticproxy.com) – time to get cracking on some new stuff.

We’ve placed Release 3.1 of Calais into technology preview status. Just as a reminder, technology preview is a separate instance of Calais that allows developers to evaluate new features and test their software prior to our moving the release to production. You can access the Preview by simply pointing your tool to http://beta.opencalais.com rather than http://api.opencalais.com. Just like Calais, the preview version requires that you have a developer API key – your existing key will work just fine.

This will be a relatively extended Preview – most likely lasting throughout October 2008. We want to give everyone the opportunity to test some significant new features and make sure we have adequate time to respond to any issues you discover. That being said – please don’t wait until the last minute to give things a spin.

As you may have noticed, our releases are getting significantly larger and incorporating substantial new functionality on a monthly basis. Release 3.1 is no different – it contains everything from major new capabilities such as company and geography disambiguation to performance improvements to new output formats to some significant expansions of the types of information it can extract. So, in our tradition of lengthy blog postings, here’s an overview of what’s new in 3.1. I’ve broken this up into a few high-level focus areas. You can also visit the release notes right here.

New and Significant (at least to some of us)

Release 3.1’s big new functionality is disambiguation of company names and geographies. One of the big challenges of automated entity recognition is how to deal with ambiguity – for example “IBM”, “IBM Corp”, “International Business Machines”. For the vast majority of use cases you want each of these variations resolved to a single entity called “IBM”. There are similar challenges around geography such as Calais, Maine vs. Calais, France.

For companies we’ve implemented a sophisticated disambiguation capability that is driven by a reference database of tens of millions of company names and their variations. This database is primarily focused on public companies – but we’ll be expanding it to contain a broader range of companies in the future. In addition to variations on a company name, we also use hints that may exist in the text, such as location or industry, as additional evidence.

For geography we’re utilizing elements of DBPedia and other public data assets to dive in and figure out which Calais or Paris or wherever the text is really talking about. We base this disambiguation not just on the name itself – but hints in the surrounding text (for example longhorns are seldom discussed in the same article as Paris, France – but Paris, Texas is another story). To jumpstart mapping applications we also return the geo coordinates of the geography we’ve detected.
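Context-based disambiguation can be illustrated with a toy scorer: each candidate place carries a handful of hint words, and the candidate sharing the most words with the text wins. This is not Calais's method; the gazetteer entries, hint words and rounded coordinates are illustrative.

```python
import re

# Toy gazetteer: candidate places sharing a name, with hint words and
# approximate (rounded) coordinates. Entries are illustrative only.
GAZETTEER = {
    "paris": [
        {"place": "Paris, France", "hints": {"france", "seine", "louvre", "french"},
         "coords": (48.86, 2.35)},
        {"place": "Paris, Texas", "hints": {"texas", "longhorns", "lamar"},
         "coords": (33.66, -95.56)},
    ],
}

def disambiguate(name, text):
    """Return (place, coords) for the best-matching candidate, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    candidates = GAZETTEER.get(name.lower(), [])
    # Score by shared hint words; max() keeps the first candidate on ties,
    # so the first entry acts as the default sense.
    best = max(candidates, key=lambda c: len(c["hints"] & words), default=None)
    return (best["place"], best["coords"]) if best else None

print(disambiguate("Paris", "The Longhorns drew a big crowd in Paris last night."))
print(disambiguate("Paris", "A stroll along the Seine in Paris."))
```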

Efficiency and Scalability

We’ve implemented a couple of changes to make life easier for our higher-volume users. First, you now have the option to tell Calais you do not want a copy of the original text returned to you. If your application doesn’t care about offsets of detected items in the text you might consider turning this option on to reduce your bandwidth utilization.

Second, Calais now supports HTTP traffic compression. Given that we’re dealing with text on the input and output sides of the transaction, this can dramatically reduce the size of your transaction, again reducing your bandwidth utilization.
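On the client side, opting into compression means sending Accept-Encoding: gzip and decoding the response according to its Content-Encoding header. A Python sketch; only the preview host comes from the post, while the service path, API key and request parameters are omitted and POSTing the text is an assumption.

```python
import gzip
from urllib.request import Request

PREVIEW_HOST = "http://beta.opencalais.com/"  # from the post; service path omitted

def compressed_request(url, text):
    # Ask for a gzip-compressed response; the submitted text itself is
    # sent uncompressed in this sketch.
    return Request(url, data=text.encode("utf-8"),
                   headers={"Accept-Encoding": "gzip"}, method="POST")

def decode_body(raw, content_encoding):
    # urllib.request does not decompress responses, so check Content-Encoding.
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw

req = compressed_request(PREVIEW_HOST, "IBM Corp opened an office in Paris.")
print(req.get_method(), req.full_url, req.header_items())
```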

New Output Formats and Integrations

Please take a look at the Release Notes for details on a number of small changes to the RDF, MicroFormats and Simple format outputs. We’ve also added a JSON output format that’s covered in more detail here.

Calais now also talks PopFly! Microsoft’s PopFly is an interesting mashup building platform with a visual development interface. You can now directly integrate Calais within your PopFly mashups. Our documentation for this capability is available here.

Getting Smarter

In keeping with prior releases Calais is also getting smarter. We’ve added a number of new elements to the Calais vocabulary. These include PatentFiling, PatentIssuance, FDAPhase, PersonEmailAddress, PersonEmployment, new elements for PersonAttributes, and SecondaryIssuance. In addition to these elements, we have one particularly interesting one: PersonRelation. The PersonRelation entity extracts references to symmetric relationships between people in the areas of business, friends, academic, military service or politics. This is one you’ll have to play with to get an idea of – but here’s a simple example:

The text:

The two served together in combat, and McDonald said Odierno was an “absolute joy to work with”.

Would result in:

Person1:  Mark McDonald
Person2:  Ray Odierno
PersonRelationType: Military Service

That’s it for R3.1. Any questions, please feel free to post to the forums or drop us a note at questions@opencalais.com. I’ll be posting an update on what’s in the pipeline for R4 in the next few days – lots of interesting stuff is on the way.


Interesting links for September 15th

September 17th, 2008

Interesting links for September 15th

September 16th, 2008

Representing Content in RDF and HTTP Vocabulary in RDF Drafts Published

September 15th, 2008

The W3C Evaluation and Repair Tools Working Group today published Representing Content in RDF as a First Public Working Draft. This document provides a vocabulary to represent content in RDF, and is flexible for any type of content available on the Web or in local storage media. The Working Group also published an updated Working Draft of HTTP Vocabulary in RDF, which defines terms to allow HTTP headers that have been exchanged between a client and a server to be recorded in RDF. These documents can be used to extend the Evaluation and Report Language (EARL) 1.0 Schema, an RDF vocabulary to record test results such as those generated by Web accessibility evaluation tools. They are part of the EARL Specification.


Google Chrome prefers XHTML

September 2nd, 2008

The blogosphere is restless about Chrome, the new open source browser developed by Google. But I’m not going to discuss its software design, its performance or its usability; there are many people talking about that. I’ll talk about a technical detail: it prefers XHTML to classic HTML.

How do we know? As you probably know, HTTP has a header called Accept that specifies which formats are acceptable. By requesting this service developed by Richard Cyganiak with Chrome, we can see the value of that header:

text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
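The q-values make the preference visible: application/xhtml+xml carries the implicit q=1.0, while text/html is explicitly lowered to 0.9. A small Python parser for the header above; it ignores the specificity rules for wildcards and simply sorts by q, keeping header order for ties.

```python
def ranked_media_types(accept):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    ranked = []
    for item in accept.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranked.append((media_type.lower(), q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(ranked, key=lambda pair: -pair[1])

CHROME_ACCEPT = ("text/xml,application/xml,application/xhtml+xml,"
                 "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5")
prefs = dict(ranked_media_types(CHROME_ACCEPT))
print(prefs["application/xhtml+xml"], prefs["text/html"])  # 1.0 0.9
```

Note that XHTML shares the top q-value with the generic XML types and image/png; what the header does say unambiguously is that HTML is ranked below them.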


links for 2008-07-25

July 25th, 2008


Be your own twitter: laconi.ca microblog platform and identi.ca

July 10th, 2008

The laconi.ca microblogging platform is as open as you could hope for. That elusive trinity: open source; open standards; and open content.

The project is led by Evan Prodromou (evan) of Wikitravel fame, whose company just launched identi.ca, “an open microblogging service” built with Laconica. These are fast gaining feature-parity with twitter; yesterday we got a “replies” tab; this morning I woke to find “search” working. Plenty of interesting people have signed up and grabbed usernames. Twitter-compatible tools are emerging.

At first glance this might look like the typical “clone” efforts that spring up whenever a much-loved site gets overloaded. Identi.ca’s success is certainly related to the scaling problems at Twitter, but it’s much more important than that. Looking at FriendFeed comments about identi.ca has sometimes been a little depressing: there is too often a jaded, selfish “why is this worth my attention?” tone. But they’re missing something. Dave Winer wrote a “how to think about identi.ca” post recently; worth a read, as is the ever-wise Edd Dumbill on “Why identica is important”. This project deserves your attention if you value Twitter, or if you care about a standards-based decentralised Social Web.

I have a testbed copy at foaf2foaf.org (I’ve been collecting notes for Laconica installations at Dreamhost). It is also federated. While there is support for XMPP (an IM interface), the main federation mechanism is based on HTTP and OAuth, using the openmicroblogging.org spec. Laconica supports OpenID so you can play without needing another password. But the OpenID usage can also help with federation and account matching across the network.

Laconica (and the identi.ca install) support FOAF by providing FOAF files, data that is already being indexed by Google’s Social Graph API. For example, see my identi.ca FOAF, and a search of Google SGAPI for my identi.ca account. It is written in PHP (and MySQL), so hacking on FOAF consumer code using ARC is a natural step. If anyone is interested in helping with that, talk to me and to Evan (and to Bengee of course).

Laconica encourages everyone to apply a clear license to their microblogged posts; the initial install suggests Creative Commons Attribution 3. Other options will be added. This is important, both to ensure the integrity of a system where posts can be reliably federated, and as part of a general drift towards the opening up of the Web.

Imagine you are, for example, a major media content owner, with tens of thousands of audio, video, or document files. You want to know what the public are saying about your stuff, in all these scattered distributed Social Web systems. That is just about do-able. But then you want to know what you can do with these aggregated comments. Can you include them on your site? Horrible problem! Who really wrote them? What rights have they granted? The OpenID/CC combination suggests a path by which comments can find their way back to the original publishers of the content being discussed.

I’ve been posting a fair bit lately about OAuth, which I suspect may be even more important than OpenID over the next couple of years. OAuth is an under-appreciated technology piece, so I’m glad to see it being used nicely for Laconica. Laconica installations allow you to subscribe to an account from another account elsewhere in the Web. For example, if I am logged into my testbed site at http://foaf2foaf.org/bandri and I visit http://identi.ca/libby, I’ll get an option to (remote-)subscribe. There are bugs and usability problems as of right now, but the approach makes sense: by providing the url of the remote account, identi.ca can bounce me over to foaf2foaf which will ask “really want to subscribe to Libby? [y/n]“, setting up API permissioning for cross-site data flow behind the scenes.

I doubt that the openmicroblogging spec will be the last word on this kind of syndication / federation. But it is progress, practical and moving fast. A close cousin of this design is the work from the SMOB (Semantic Microblogging) project, who use SIOC, FOAF and HTTP. I’m happy to see a conversation already underway about bridging those systems.

Do please consider supporting the project. And a special note for Semantic Web (over)enthusiasts: don’t just show up and demand new RDF-related features. Either build them yourself or dive into the project as a whole. Have a nose around the buglist. There is of course plenty of scope for semwebbery, but I suggest a first priority ought to be to help the project reach a point of general usability and adoption. I’ve nothing against Twitter just as I had nothing at all against Six Apart and Movable Type, back before they opensourced. On the contrary, Movable Type was a great product from great people. But the freedoms and flexibility that opensource buys us are hard to ignore. And so I use Wordpress now, having migrated like countless others. My suspicion is we’re at a “Wordpress/MovableType” moment here with Identica/Laconica and Twitter, and that of all the platforms jostling to be the “new twitter”, this one is most deserving of success. With opensource, Laconica can be the new Laconica…

You can follow me here identi.ca/danbri


The Flickcurl Story

June 28th, 2008

In January 2007 I started playing with the Flickr API - the HTTP-based web service that lets you manipulate Flickr. At that point I was using it to play with machine tags and to see how the most popular Web Service API worked, especially in the area of authentication. This was in the days before OAuth if you can remember that far back.

I started with a test program in C that called libcurl and did some of the signing and parameter marshaling for the flickr.photos.getInfo call, which is where all the juicy metadata about photos is. I started thinking about ways to map photo metadata into RDF for manipulating and querying; there is an existing Perl Flickr RDF mapping but it didn’t contain everything. Even this early state of the source was useful: it contained a small library with the one API method plus a command-line utility to call it. Since I was using libcurl to call Flickr, I named it Flickcurl. Also, CFlickr was taken! (Flickcurl also uses libxml, but flickcurlibxml is just nuts.)
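For comparison with OAuth, Flickr's older authentication signed each call with an api_sig: the MD5 hex digest of the shared secret followed by the parameter names and values, sorted by name, as that legacy scheme is commonly described. The post's code was C with libcurl; this Python sketch only shows the signing step, with a hypothetical key, secret and photo id.

```python
import hashlib

def flickr_string_to_sign(shared_secret, params):
    # Legacy (pre-OAuth) Flickr signing: the shared secret followed by every
    # parameter name and value, with parameters sorted by name.
    return shared_secret + "".join(name + str(params[name]) for name in sorted(params))

def flickr_api_sig(shared_secret, params):
    return hashlib.md5(flickr_string_to_sign(shared_secret, params).encode("utf-8")).hexdigest()

# Hypothetical key, secret and photo id for illustration only.
params = {"method": "flickr.photos.getInfo", "api_key": "abc123", "photo_id": "42"}
params["api_sig"] = flickr_api_sig("SECRET", params)  # signed before api_sig is added
print(params)
```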

Apart from playing with photo metadata, I also had some personal reasons to make something new. I wanted a lighter-weight, less formal project than the way I had been building the Redland RDF Libraries: more of an “it compiles, ship it” model, and less of the unit tests, test cases, continual make check and worrying about portability. Maybe “more fun” would be a way to put it. I’m happiest as a free software / open source software tool-builder, and at this point in 2007 I was spending a lot of time at work doing non-coding things such as designing specifications and providing technical leadership, so the chance to work on some different code now and then was an appealing counterpoint to the work stuff.

Redland is a set of libraries that have been growing since mid-2000, gaining more and more features as the semantic web technology stack grows, so at any point in time there is no clear end state. For this project I wanted a clear goal, so that I could be done at some point. This is possible with the Flickr API, since at any time there is a finite number of API calls (something like 100), so progress can be measured… although Flickr did add API calls while I was working on it. So I made a Flickcurl API coverage page with an embedded API changelog (automatically generated from source code comments).
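
Because the set of API calls is finite, coverage reduces to simple counting. A rough Python sketch of generating a coverage figure and a changelog from source comments, assuming a hypothetical comment convention of the form "Implements flickr.photos.getInfo (0.1)"; the full list of method names would come from Flickr itself, for example via flickr.reflection.getMethods.

```python
import re

# Assumed (hypothetical) docu-comment convention: each wrapper's comment has a
# line such as "Implements flickr.photos.getInfo (0.1)", naming the Flickr
# method and the Flickcurl version that first supported it.
IMPLEMENTS_RE = re.compile(r"Implements (flickr(?:\.\w+)+) \(([\d.]+)\)")


def coverage(source_text: str, api_methods: list[str]) -> tuple[float, dict[str, str]]:
    """Return (percent of api_methods implemented, {method: version added})."""
    implemented = dict(IMPLEMENTS_RE.findall(source_text))
    covered = {m: implemented[m] for m in api_methods if m in implemented}
    percent = 100.0 * len(covered) / len(api_methods) if api_methods else 0.0
    return percent, covered


def changelog(covered: dict[str, str]) -> dict[str, list[str]]:
    """Group implemented methods by the release that first added them."""
    log: dict[str, list[str]] = {}
    for method, version in sorted(covered.items()):
        log.setdefault(version, []).append(method)
    return log
```

Keeping the version next to the method name in the comment is what lets one pass over the source produce both the percentage and the per-release changelog.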

Flickcurl 0.1 was “released” 2007-01-21 although I didn’t announce it to anyone at that point. It was more of a tarball than an actual release.

One more thing I wanted to do was to experiment with different ways to tell people about software, compared to the ways I was using with Redland, which were mostly email-based but also via SourceForge and Freshmeat. So for Flickcurl I tried a bunch of different ways:

That was kind of fun, and I also followed a similar lightweight process with Triplr, but that’s another story. I think caring less worked out fine; people did use it and submit patches. Right now I still use the Flickr mailing list, the API group and the Freshmeat project.

As the library headed towards 100% of the API and beyond, it did get a bit more formal, and I imported what I think are the best practices from the Redland libraries:

  • objects in C design
  • always refactoring the source code: refactoring is not just for dynamic languages
  • source code docu-comments generating an HTML API reference via gtk-doc
  • folding in portability fixes
  • make it work with optional libraries for extra functionality (Raptor in this case, to allow serialising to other RDF syntaxes)
  • built in portable ANSI C
  • taking care about memory leaks with valgrind
  • comes with a utility program able to exercise the entire API (called flickcurl)
  • Debian packages (created by somebody else, yay!)
  • manual pages for the command line utilities

The general aim was to get 100% of the Flickr API done by the end of 2007, and I actually reached it with Flickcurl 1.0 on 2008-01-12, which was pretty close.

So right now the library has gone beyond 1.0. The latest release is Flickcurl 1.4, released last Tuesday, 24th June (see release notes). It primarily added video support, but I also updated the photo metadata mapping to RDF by adding a serializer class that abstracts the photo-to-triples process.
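
The point of a serializer class is to separate walking the photo metadata from writing any particular RDF syntax. A minimal Python sketch of that design, not Flickcurl's actual C API: the Photo, TripleSerializer and NTriplesSerializer names are hypothetical, and only three Dublin Core properties are mapped.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

DC = "http://purl.org/dc/elements/1.1/"


@dataclass
class Photo:
    """Hypothetical minimal photo record; real Flickr metadata has many more fields."""
    uri: str
    title: str
    license_uri: str
    tags: list[str]


class TripleSerializer(ABC):
    """Receives triples one at a time; subclasses decide the output syntax."""

    @abstractmethod
    def emit(self, subject: str, predicate: str, obj: str, obj_is_uri: bool) -> None: ...


class NTriplesSerializer(TripleSerializer):
    """Collects N-Triples lines in memory."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def emit(self, subject: str, predicate: str, obj: str, obj_is_uri: bool) -> None:
        if obj_is_uri:
            term = f"<{obj}>"
        else:
            # Escape backslashes first, then quotes and newlines, for a valid literal.
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            term = f'"{escaped}"'
        self.lines.append(f"<{subject}> <{predicate}> {term} .")


def serialize_photo(photo: Photo, out: TripleSerializer) -> None:
    """Walk the photo metadata once; the serializer handles the syntax."""
    out.emit(photo.uri, DC + "title", photo.title, False)
    out.emit(photo.uri, DC + "rights", photo.license_uri, True)
    for tag in photo.tags:
        out.emit(photo.uri, DC + "subject", tag, False)
```

Swapping in a different TripleSerializer subclass (for example one that hands triples to an RDF library) would change the output syntax without touching serialize_photo.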

The RDF triples mapping is something that has always been there, but it has not been part of the core library. It could optionally be used inside Raptor to automatically read Flickr photo URIs as RDF data sources. I doubt it’ll ever be offered through a public web service like Triplr, since that would require passing in Flickr API authentication tokens and user credentials.
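
To read a Flickr photo URI as an RDF data source, the photo id first has to be pulled out of the URI so it can be passed to a call such as flickr.photos.getInfo. A small Python sketch, assuming photo page URIs of the form http://www.flickr.com/photos/USER/ID/; parse_photo_uri is a hypothetical helper name.

```python
import re

# Assumed shape of a photo page URI: /photos/<user>/<numeric photo id>/
PHOTO_URI_RE = re.compile(r"^https?://(?:www\.)?flickr\.com/photos/([^/]+)/(\d+)/?$")


def parse_photo_uri(uri: str):
    """Return (user, photo_id) for a Flickr photo page URI, or None otherwise."""
    m = PHOTO_URI_RE.match(uri)
    return (m.group(1), m.group(2)) if m else None
```

Anything that does not match is left for the usual RDF parsers to handle.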

The RDF triples mapping I’ve made for the Flickr photo metadata uses a mixture of vocabularies, which fall into 3 buckets:

  1. Existing Vocabularies: well known RDF schemas (class and properties) that have been developed over many years by multiple people and organisations, sometimes with a lot of formality.
  2. Flickr-specific Vocabularies: vocabularies I made up mostly for Flickr video and places API terms.
  3. Machine Tag Vocabularies: I made them up using machinetags.org/ns URIs as a root for the namespaces associated with the vocabularies. The terms in the vocabularies come from how people used machine tags on Flickr and are not always defined.
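
The split between the first and third buckets can be sketched for machine tags, which use Flickr's namespace:predicate=value syntax. In this hypothetical Python helper, a known namespace such as geo maps onto an existing vocabulary and anything else falls back to a namespace under machinetags.org/ns; the exact URI pattern http://machinetags.org/ns/NAMESPACE# is an assumption, not necessarily the one Flickcurl uses.

```python
import re

# Flickr machine tag syntax: namespace:predicate=value
MACHINE_TAG_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):([A-Za-z_][A-Za-z0-9_]*)=(.+)$")

# Bucket 1: namespaces with an existing, well-known vocabulary.
KNOWN_VOCABS = {"geo": "http://www.w3.org/2003/01/geo/wgs84_pos#"}

# Bucket 3: everything else goes under an assumed machinetags.org root.
MACHINETAGS_ROOT = "http://machinetags.org/ns/"


def machine_tag_to_triple(photo_uri: str, tag: str):
    """Return (subject, predicate URI, literal value), or None for a plain tag."""
    m = MACHINE_TAG_RE.match(tag)
    if m is None:
        return None
    namespace, predicate, value = m.groups()
    ns_uri = KNOWN_VOCABS.get(namespace, MACHINETAGS_ROOT + namespace + "#")
    # Flickr allows quoted values such as ns:pred="some value".
    return photo_uri, ns_uri + predicate, value.strip('"')
```

Plain tags such as "jellyfish" return None here and would be mapped separately, for example to dc:subject.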

This ranges from what might be called heavy to light semantic web, and there is absolutely nothing wrong with mixing things up if you are not worried about inference. This is OK! I should probably put some HTML/schema documents at the vocabulary URIs and get the redirects and all that # and / business sorted so that the linked data works out properly, but what I have now is just a start and I’d be interested to see what people think. There are more details of the vocabularies and terms I’m using in the Flickcurl 1.4 release notes, although I should probably add vocabulary information to the documentation too.
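
For slash-style term URIs, a common linked-data recipe is to answer a request for a term with 303 See Other, pointing at an RDF or HTML description chosen from the client's Accept header. A simplified Python sketch of that decision; the paths, the .rdf/.ttl/.html naming and the fallback to HTML for wildcards such as */* are all assumptions, and a production server would need fuller Accept parsing.

```python
# Hypothetical mapping from offered media types to description documents.
EXTENSIONS = {
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
    "text/html": ".html",
}


def preferred_type(accept: str) -> str:
    """Pick the offered type with the highest q-value; ties go to the type
    listed first, and anything unmatched (including */*) falls back to HTML."""
    best, best_q = "text/html", 0.0
    for part in accept.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media in EXTENSIONS and q > best_q:
            best, best_q = media, q
    return best


def respond_to_term(path: str, accept: str) -> tuple[int, str]:
    """A slash-style term URI names a thing, not a document, so answer with
    303 See Other pointing at a description in the preferred format."""
    return 303, path.rstrip("/") + EXTENSIONS[preferred_type(accept)]
```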

That’s all for now, but I’ll expand some more in another post about the Flickr API itself, my experience with it and my impressions of it as both a software developer and an HTTP Web Service designer.

Uncategorized

SFSW 2008

April 7th, 2008

People who know me know really well that I like scripting languages, especially Python. So SFSW2008 (co-located with ESWC2008) is one of my favourite workshops. Because of travel this weekend I didn’t have time to mention that this year the workshop has accepted three papers from CTIC:

  • Diego Berrueta, Jose Emilio Labra and Ivan Herman. XSLT+SPARQL: Scripting the Semantic Web with SPARQL embedded into XSLT stylesheets.
  • Diego Berrueta, Sergio Fernández and Iván Frade. Cooking HTTP content negotiation with Vapour.
  • Cosmin Basca, Stéphane Corlosquet, Richard Cyganiak, Sergio Fernández and Thomas Schandl. Neologism: Easy Vocabulary Publishing.

I’m very proud to say that I was involved in two of them (about Vapour and Neologism), really good news :-)

English, Spanish

Semantic Web Yahoo - Part Deux

August 13th, 2007

It’s been nearly 2 years since I joined Yahoo!, and the semantic web-based technology I helped develop has been deployed in production for some time. It is encouraging to see the ideas gaining acceptance: today I noticed that a HotJobs search for rdf yahoo near Sunnyvale shows 5 open jobs - not in my group, but in Yahoo! Local.

Our group in Sunnyvale is continuing to look for HTTP and web caching experts, designers and coders for building REST-based web services. Right here and now we have interesting, large scale, rich data problems and are applying semweb techniques to them. Contact me if any of this sounds exciting to you.

Semantic Web Yahoo - Part one

Uncategorized

Flickcurl - C API to Flickr

August 3rd, 2007

In January 2007, just for fun, I started writing a C API to Flickr using the Flickr web services, called Flickcurl. The name came from it originally being built to call Flickr via libcurl, which does the HTTP work … although right now it makes more use of libxml than of libcurl.

I started this for a bunch of reasons, including to learn more about “web 2.0” web APIs, to see how RESTy the Flickr API really is (answer: not much, it’s very much an RPC model) and to explore the issues with developing a Web API. It’s clearly an evolved and evolving API, since now and then I discover undocumented attributes in the returned XML, and cases where it is not clear why attributes were used instead of elements. It’s very well suited to dynamic scripting languages, where it is easy to pass around dictionaries / hashes / associative arrays of parameters that can grow. So in some sense, making it feel like a natural API in a static language like C goes rather against the grain and is rather slow work.

There are, however, things available to help. Flickr provides method reflection APIs, so I wrote a code-generating program that automates writing many of the simpler calls that return no value or just a single one. I also used a lot of similar patterns, so parsing the tags XML is quite similar to parsing the comments XML. The XML is read primarily via XPath, with a little DOM.
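
The shared-pattern idea can be sketched with one generic helper that selects nodes by an XPath-style path and copies out chosen attributes, used for both tags and comments. This is Python with the standard library's ElementTree, which supports only a subset of XPath, rather than the libxml-based C code; the response snippets are abbreviated and illustrative, and parse_items is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_items(xml_text: str, path: str, attrs: list[str]) -> list[dict[str, str]]:
    """Select nodes by an XPath-style path (relative to the root element) and
    return one dict per node with the chosen attributes plus the element text."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.findall(path):
        item = {a: node.get(a, "") for a in attrs}
        item["text"] = (node.text or "").strip()
        items.append(item)
    return items


# Abbreviated, illustrative responses in an assumed Flickr-like shape.
TAGS_XML = """<rsp stat="ok"><photo id="196308964"><tags>
  <tag id="t1" author="a1" raw="Jellyfish">jellyfish</tag>
</tags></photo></rsp>"""

COMMENTS_XML = """<rsp stat="ok"><comments photo_id="196308964">
  <comment id="c1" author="a2" authorname="someone">Lovely!</comment>
</comments></rsp>"""

# Same helper, different path and attribute list.
tags = parse_items(TAGS_XML, "./photo/tags/tag", ["id", "author", "raw"])
comments = parse_items(COMMENTS_XML, "./comments/comment", ["id", "author", "authorname"])
```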

One other nice thing about this is that it is a piece of work with a fixed size, albeit growing slowly. The Flickr API currently has 104 calls - depending on how you count them - so it’s easy to check progress, and that’s how I’ve been doing it. I built tools to read the docu-comments (javadoc, gnome-doc, kernel-doc style) and mark the Flickcurl coverage release by release.

The news today is that I have reached the halfway point: 50% of the API with the release of Flickcurl 0.11, at least until they add something more! I have also done most of what I think are the trickier parts - uploading, searching and getting info about photos. The remaining API parts are more regular, so I feel like I’m coding downhill now.

Now there’s something else it does - and this won’t be a surprise to most given my interests. Flickcurl generates RDF descriptions from Flickr photos with a flickrdf utility, including reading Machine Tags. The namespaces are either well known ones, or invented by me, pointing at the machinetags.org wiki - you can create your own definition.

flickrdf uses Raptor to do nicer serializing when it is available. So this means I can turn jellyfish into Turtles. W00t! (*)

$ ./flickrdf -o turtle http://www.flickr.com/photos/dajobe/196308964/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://www.flickr.com/photos/dajobe/196308964/>
    dc:creator [
        a foaf:Person;
        foaf:maker <http://www.flickr.com/photos/dajobe/196308964/>;
        foaf:name "Dave Beckett";
        foaf:nick "dajobe"
    ];
    dc:dateSubmitted "2006-07-23T18:16:13Z"^^xsd:dateTime;
    dc:rights <http://creativecommons.org/licenses/by-nc-sa/2.0/>;
    dc:modified "2007-02-25T07:45:46Z"^^xsd:dateTime;
    dc:issued "2006-07-23T18:16:13Z"^^xsd:dateTime;
    dc:created "2006-07-23T05:28:50Z"^^xsd:dateTime;
    geo:lat "36.620487";
    geo:long "-121.904468";
    dc:title "Jellyfish at Monterey Aquarium";
    dc:subject "jellyfish" .

After that bad joke (and it could have been worse if I had a picture of a Turtle), here’s what you need to know. Get it at flickcurl-0.11.tar.gz (md5sum eea351e4d35e8d1c63b124cd8ee257ba, sha1sum d220f6371c0c5334c824a51ba848d9358d73e533) or the latest in the Flickcurl Subversion repository. It’s licensed under the GPL2 / LGPL2 / Apache 2.0 or any newer versions of any of them.
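
The published md5sum and sha1sum let you check a download before building it. A minimal Python sketch using the standard hashlib module; file_digests and verify are hypothetical helper names.

```python
import hashlib


def file_digests(path: str, chunk_size: int = 65536) -> tuple[str, str]:
    """Return (md5 hex digest, sha1 hex digest) of a file, read in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def verify(path: str, expected_md5: str, expected_sha1: str) -> bool:
    """True only if both digests match the published values (case-insensitive)."""
    md5_hex, sha1_hex = file_digests(path)
    return md5_hex == expected_md5.lower() and sha1_hex == expected_sha1.lower()
```

For example, verify("flickcurl-0.11.tar.gz", "eea351e4d35e8d1c63b124cd8ee257ba", "d220f6371c0c5334c824a51ba848d9358d73e533") returns True only if the downloaded tarball matches both published checksums.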

Note: I work for Yahoo! and although Flickr is part of Yahoo! this project is my own personal work.

(*) Actually I’m slightly cheating with this example, there’s a couple of bug fixes in SVN after the release which are needed to get this output.

Uncategorized

20:20 talk on hardware hacking for software people

May 19th, 2007

I just got back from XTech 2007 in Paris. It was an excellent conference this year and I'm really proud of having contributed in a small way by being on the programme committee. Every year the speaker lineup gets better and better.

The theme this year was 'The Ubiquitous Web'. HTTP isn't just for computers any more, and I'm particularly interested in how developers like me can learn to make their own network-connected objects in the real world. To spread the word, I gave a lightning talk on my experiences with the Arduino hardware hacking boards and other toys from tinker.it.

I put the slides on SlideShare.


Uncategorized

Neutrality of the Net

May 2nd, 2006

Net Neutrality is an international issue. In some countries it is addressed better than others. (In France, for example, I understand that the layers are separated, and my colleague in Paris attributes getting 24Mb/s net, a phone with free international dialing and digital TV for 30 euros/month to the resulting competition.) In the US, there have been threats to the concept, and a wide discussion about what to do. That is why, though I have written and spoken on this many times, I blog about it now.

Twenty-seven years ago, the inventors of the Internet[1] designed an architecture[2] which was simple and general. Any computer could send a packet to any other computer. The network did not look inside packets. It is the cleanness of that design, and the strict independence of the layers, which allowed the Internet to grow and be useful. It allowed the hardware and transmission technology supporting the Internet to evolve through a thousandfold increase in speed, yet still run the same applications. It allowed new Internet applications to be introduced and to evolve independently.

When, seventeen years ago, I designed the Web, I did not have to ask anyone's permission.[3] The new application rolled out over the existing Internet without modifying it. I tried then, and many people still work very hard, to make the Web technology, in turn, a universal, neutral platform. It must not discriminate against particular hardware, software, underlying network, language, culture, disability, or against particular types of data.

Anyone can build a new application on the Web, without asking me, or Vint Cerf, or their ISP, or their cable company, or their operating system provider, or their government, or their hardware vendor.

It is of the utmost importance that, if I connect to the Internet, and you connect to the Internet, that we can then run any Internet application we want, without discrimination as to who we are or what we are doing. We pay for connection to the Net as though it were a cloud which magically delivers our packets. We may pay for a higher or a lower quality of service. We may pay for a service which has the characteristics of being good for video, or quality audio. We each pay to connect to the Net, but no one can pay for exclusive access to me.

When I was a child, I was impressed by the fact that the installation fee for a telephone was everywhere the same in the UK, whether you lived in a city or on a mountain, just as the same stamp would get a letter to either place.

To actually design legislation which allows creative interconnections between different service providers, but ensures neutrality of the Net as a whole, may be a difficult task. It is a very important one. The US should do it now, and, if it turns out to be the only way, be as draconian as to require financial isolation between IP providers and businesses in other layers.

The Internet is increasingly becoming the dominant medium binding us. The neutral communications medium is essential to our society. It is the basis of a fair competitive market economy. It is the basis of democracy, by which a community should decide what to do. It is the basis of science, by which humankind should decide what is true.

Let us protect the neutrality of the net.


  1. Vint Cerf, Bob Kahn and colleagues
  2. TCP and IP
  3. I did have to ask for port 80 for HTTP

Uncategorized

Give yourself a URI

January 25th, 2006

Do you have a URI for yourself? If you are reading this blog and you have the ability to publish stuff on the web, then you can make a FOAF page, and you can give yourself a URI.

A lot of people have published data about themselves without using a URI for themselves. This means I can't refer to them in other data. So please take a minute to give yourself a URI. If you have a FOAF page, you may just have to add rdf:about="#ABC" to the description of yourself, and voila, you have a URI http://example.com/Alan/foaf.rdf#ABC. (I suggest you use your initials for the last bit.) Check it works in the Tabulator.

The URI will start with "http" (so I can look it up using HTTP) and it will have # in it, so the URI of your foaf file is different from the URI for you.
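
The # is what keeps the two URIs apart: an HTTP client drops the fragment before making the request, so looking up the URI for you fetches the FOAF document, and the fragment then identifies you within it. A small illustration using Python's standard urllib.parse.urldefrag with the example URI above; split_hash_uri is a hypothetical helper name.

```python
from urllib.parse import urldefrag


def split_hash_uri(uri: str) -> tuple[str, str]:
    """Split a hash URI into (document URI an HTTP client fetches, fragment
    naming the thing described inside that document)."""
    result = urldefrag(uri)
    return result.url, result.fragment
```

So split_hash_uri("http://example.com/Alan/foaf.rdf#ABC") gives the document http://example.com/Alan/foaf.rdf and the local name ABC: two different URIs, one for the file and one for you.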

Me, I write my FOAF file in N3 and convert it to RDF/XML. That's my choice.

The Architecture of the World Wide Web (AWWW) says that everything of importance deserves a URI. Go ahead and give yourself a URI. You deserve it!

Uncategorized