Archive for the ‘comment’ Category

Leaving Yahoo – Joining Digg

August 26th, 2010

I’m heading to a new adventure at Digg in San Francisco to be a lead software engineer working on APIs and syndication.

I’ve been at Yahoo! nearly 5 years so it is both a happy and sad time for me, and I wish all the excellent people I worked with the best of luck in future.

Here is a summary of the main changes:

  • Silicon Valley -> San Francisco
  • 15,000 staff -> 100 staff
  • Architect -> Software engineer
  • strategizing, meeting -> coding
  • Powerpoint, OmniGraffle, twiki -> emacs, eclipse, …?
  • (No coding!) -> Python, Java, Hadoop, Cassandra, …?
  • Sunny days -> Foggy days
  • 15 min commute -> 2.5hr commute (until I move to SF)
  • Public company -> private company

Exciting!

English, comment

Rasqal RDF Query Library 0.9.20

August 22nd, 2010

I just released a new version of my Rasqal RDF Query Library for two main new features:

  1. Support more of the new W3C SPARQL working drafts of 1 June 2010 for SPARQL 1.1 Query and SPARQL 1.1 Update.
  2. Support building with Raptor V2 API as well as Raptor V1 API.

The main change is the start of additions to Rasqal’s APIs and query engine for the new SPARQL 1.1 working drafts. This release adds support for the syntax of all the changes for Query and Update. The new draft syntax is available via the ‘laqrs’ query language name until the SPARQL 1.1 syntax is finalized. The ‘sparql’ query language provides SPARQL 1.0 support.

On Query 1.1, the addition is primarily syntax and API support for the new syntax. There is expression execution for the new functions IF(), URI(), STRLANG(), STRDT(), BNODE(), IN() and NOT IN(), which are now usable as part of the normal expression grammar. The existing aggregate function support was extended to add the new SAMPLE() and GROUP_CONCAT() but remains syntax-only. Finally, the new GROUP BY with HAVING conditions were added to the syntax, with consequent API updates, but there is no query engine execution of them.

For Update 1.1 the syntax for the full set of update operations was added, and the operations create API structures. Note, however, that there seem to be some ambiguities in the draft syntax, especially around multiple optional tokens in a row near WITH, which are particularly hard to implement in flex and bison (aka “lex and yacc”).

The main non-SPARQL 1.1 related change is to allow building Rasqal with Raptor V2 APIs rather than V1. Raptor V2 is in beta, so this is not a final API and is thus not the default build; it has to be enabled by passing --with-raptor2 to configure. When Raptor V2 is stable (2.0.0), Rasqal will require it.

The changes to Rasqal in this release, in summary, are:

  • Updated to handle more of the new syntax defined by the SPARQL 1.1 Query and SPARQL 1.1 Update W3C working drafts of 1 June 2010
  • Added execution support for new SPARQL 1.1 query built-in expressions IF(), URI(), STRLANG(), STRDT(), BNODE(), IN() and NOT IN().
  • Added an ‘html’ query result table format from patch by Nicholas J Humfrey
  • Added API support for group by HAVING expressions.
  • Added XSD Date comparison support.
  • Support building with Raptor V2 API if configured with --with-raptor2.
  • Many other bug fixes and improvements were made.
  • Fixed Issues: #0000352, #0000353, #0000354, #0000360, #0000374, #0000377 and #0000378

See the Rasqal 0.9.20 Release Notes for the full details of the changes.

Get it at http://download.librdf.org/source/rasqal-0.9.20.tar.gz.

PS The source code control has also moved to GIT and is hosted at GitHub.

English, comment

Raptor RDF Syntax Library V2 beta 1

August 16th, 2010

Today I released the first beta version of Raptor 2. This is the culmination of about 9 months work refactoring the Raptor 1 codebase. In hindsight, I should have done this years ago, but I knew it would be a lot of work, and it was.

The reasoning behind doing this is multi-fold, but basically the code had a lot of cruft and bad design choices that couldn’t be removed without breaking the APIs in lots of ways, and at some point it’s easier to just do it all at once, and that’s where we are now.

Cruft meant removing stuff deprecated for a long time but also renaming all the functions to follow the same “objects in C” style used throughout Redland’s libraries which has standard naming forms:

  • raptor_class_method()
  • Constructors: raptor_new_class() (core constructor or 1 arg constructor) and raptor_new_class_from_extras()
  • Copy constructor: raptor_class_copy()
  • Destructors: raptor_free_class()
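As a small illustration of the naming forms above, the sketch below derives the conventional names for a given class. The helpers `redland_style_names` and `method_name` are hypothetical and exist only for this example; they are written in Python for brevity and are not part of Raptor.

```python
def redland_style_names(cls, prefix="raptor", extras=None):
    """Return the conventional function names for a class (illustrative only)."""
    names = {
        "constructor": f"{prefix}_new_{cls}",
        "copy_constructor": f"{prefix}_{cls}_copy",
        "destructor": f"{prefix}_free_{cls}",
    }
    if extras:
        # Secondary constructors name the extra inputs they are built from.
        names["constructor_from_extras"] = f"{prefix}_new_{cls}_from_{extras}"
    return names


def method_name(cls, method, prefix="raptor"):
    """Ordinary methods follow the prefix_class_method() form."""
    return f"{prefix}_{cls}_{method}"
```

The same helper with a different `prefix` shows how one pattern can be shared across several libraries.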

The major addition was a raptor_world object that is used as a single object to hold on to all shared resources and configuration. This was a design pattern I put in librdf and Rasqal but, for some reason, never considered for Raptor. This turned out to be a mistake, since I then had to pass around a lot of parameters and configuration to individual object instances, more than was really needed. Examples of this include the error handling, which added two parameters to several constructors. The error handling, now expanded into a general log mechanism modelled on librdf’s, handles multiple structured log record types, and the logging policy is set once per world.

The addition of the world object meant that each constructor for an object in raptor now takes that object, so it can get access to the shared configuration and resources. That itself meant the change was extensive, broad in scope. The single place to manage resources means it’s easier to ensure proper cleanup and deal with library-wide issues.
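A minimal sketch of the world-object pattern described above, in Python rather than C. The `World` and `Parser` classes and all of their members are hypothetical names chosen for illustration; they are not the Raptor API.

```python
class World:
    """Hypothetical stand-in for a world object: shared state lives here."""

    def __init__(self):
        self.log_handler = None   # one logging policy for everything in this world
        self.live_objects = 0     # lets the world notice leaks at shutdown

    def log(self, level, message):
        if self.log_handler is not None:
            self.log_handler(level, message)


class Parser:
    """Hypothetical object whose constructor takes the world first."""

    def __init__(self, world, syntax_name):
        self.world = world
        self.syntax_name = syntax_name
        world.live_objects += 1

    def parse(self, text):
        if not text.strip():
            # No per-object error handler parameters: report through the world.
            self.world.log("error", f"{self.syntax_name}: nothing to parse")
            return False
        return True

    def free(self):
        self.world.live_objects -= 1
```

Because every object reaches logging through the world, changing the handler once changes it for all of them, and the object count gives the world a single place to check for cleanup problems.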

One other pain point was Raptor’s simplistic (but functional!) URI class. It manipulated URIs as plain old C strings (char*). I knew from building librdf, that this could be more efficient by interning the strings so a URI for a particular string is held only once, and reference counted. I used the already built raptor AVL-Tree to implement it, and as a bonus, moved that AVL Tree to the public API, so it can be reused (Rasqal has a copy of the implementation). The resulting reference-counted URIs mean that after URI construction, comparison and copying are very cheap – and given that this is RDF, those are done a lot. The old URI code also had a swappable implementation which added a lot of complexity to the code and that has gone now, since the new implementation is more sophisticated. There is probably more work that can be done here to make this URI work better, such as caching the URI structure so that it’s quicker to generate relative URIs. Also one day I should really validate that all the URIs built are legal to the syntax.
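The interning idea can be sketched as follows, under loud assumptions: this is Python, it uses a plain dictionary where Raptor uses its AVL tree, and the `Uri` and `UriTable` names are invented for the example.

```python
class Uri:
    """Hypothetical interned URI: one shared object per distinct string."""

    def __init__(self, string):
        self.string = string
        self.usage = 1   # reference count


class UriTable:
    """Hypothetical intern table (a dict stands in for the AVL tree)."""

    def __init__(self):
        self._by_string = {}

    def new_uri(self, string):
        uri = self._by_string.get(string)
        if uri is not None:
            uri.usage += 1            # already interned: just add a reference
            return uri
        uri = Uri(string)
        self._by_string[string] = uri
        return uri

    def copy(self, uri):
        uri.usage += 1                # copying is an integer increment
        return uri

    def free(self, uri):
        uri.usage -= 1
        if uri.usage == 0:            # last reference gone: drop from the table
            del self._by_string[uri.string]

    def __len__(self):
        return len(self._by_string)
```

With this arrangement two URIs built from the same string are the same object, so comparing them is an identity check rather than a string comparison.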

Another long term problem was the triple itself, which I had called ‘statement’ way back when I was creating it. Unfortunately a raptor_statement had hard-coded the RDF specifics – the subject can only be URI or blank node, predicate can only be a URI etc. That meant the code was twisty. That has been replaced by an array of 3 or 4 raptor terms (URI or blank node or literal) so it can handle triples, quads and any possible extension beyond RDF (2004), although today none of the current parsers or serializers expect non-RDF statements. That change also made a lot of the internal code simpler to understand and quicker. The RDF terms were also introduced in a reference-counted manner; along with adding reference counting to the statements, this means that passing triples around, which used to involve a lot of copying, is now a simple integer increment of the reference count. More speed!
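A rough sketch of the "statement as 3 or 4 terms" shape, in Python. The `Term` and `Statement` classes and the `has_rdf_positions` check are hypothetical illustrations of the idea, not Raptor's structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    kind: str    # "uri", "blank" or "literal"
    value: str


@dataclass(frozen=True)
class Statement:
    subject: Term
    predicate: Term
    object: Term
    graph: Optional[Term] = None   # present for quads, absent for triples

    def terms(self):
        """Return the 3 or 4 terms in order."""
        parts = [self.subject, self.predicate, self.object]
        if self.graph is not None:
            parts.append(self.graph)
        return parts

    def has_rdf_positions(self):
        """RDF (2004) rule: subject is URI or blank node, predicate is a URI."""
        return (self.subject.kind in ("uri", "blank")
                and self.predicate.kind == "uri")
```

The container itself accepts any term in any position; the RDF restrictions become an optional check rather than something hard-coded into the structure.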

That sorted out the fundamentals of statements, terms and URIs and changed pretty much every piece of code that touched them in all the parsers and serializers and core code.

There were a few pieces of new work added – two new serializers and one new parser. Two of those were written by Nicholas J Humfrey who is now a core committer.

I’d also like to call out thanks to Lauri Aalto for keeping raptor, rasqal and librdf relatively buildable while I was refactoring and breaking things. He wrote the code to make Rasqal and librdf build and work with raptor V1 and V2 at the same time.

Other work included updating all the reference documentation, tutorials, examples and sundry documentation for the new APIs including admin code to automate some of the documentation so it always included accurate details about formats.

There is lots more that changed in detail, listed in the Raptor 1.9.0 Release Notes. There is help on upgrading, and there’s even a Perl script docs/upgrade-script.pl thrown in (generated by another Perl script!) that may help with applying the changes. The reference manual contains a full reference on changes between raptor 1.4.21 and 1.9.0 in the form of old / new mappings with explanations.

I know that Raptor 2 is not going to replace Raptor 1 for applications for some time, so this is a separately installed library with a new location for the header file and a new shared library base. However, once this hits 2.0.0 it’ll be a dependency of Rasqal and librdf.

Summary of release:

  • Removed all deprecated functions and typedefs.
  • Renamed all functions to the standard raptor_class_method() form.
  • All constructors take a raptor_world argument.
  • URIs are interned and there is no longer a swappable implementation.
  • Statement is now an array of 3-4 RDF Terms to support triples and quads.
  • World object owns logging, blank node ID generation and describing syntaxes.
  • Features are now called options and have typed values.
  • GRDDL parser now saves and restores shared libxslt state.
  • Added serializers for HTML ‘html’ and N-Quads ‘nquads’.
  • Added parser ‘json’ for JSON-Resource centric and JSON-Triples.
  • Switched to GIT version control hosted by GitHub.
  • Added memory-based AVL-Tree to the public API.
  • Fixed reported issues:

    0000357, 0000361, 0000369, 0000370, 0000373 and 0000379

It turns out that after all that, the resulting libraries for raptor 2 are actually 4% smaller than raptor 1 when installed (Debian, i386):

 -rw-r--r-- 1 root root 379780 Mar 10 06:59 /usr/lib/libraptor.so
 -rw-r--r-- 1 root root 364448 Aug 16 17:30 /usr/lib/libraptor2.so
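The percentage can be checked directly from the two file sizes in the listing above; the variable names in this small Python check are just for illustration.

```python
raptor1_size = 379780   # bytes, libraptor.so from the listing
raptor2_size = 364448   # bytes, libraptor2.so from the listing

saving = raptor1_size - raptor2_size
percent_smaller = 100.0 * saving / raptor1_size

print(f"{saving} bytes saved, {percent_smaller:.1f}% smaller")
# prints "15332 bytes saved, 4.0% smaller"
```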

The gzipped tarball itself is as small as raptor 1.4.17 from 2008!

Get it at http://download.librdf.org/source/raptor2-1.9.0.tar.gz

PS The source code control has also moved to GIT and is hosted at GitHub.

English, comment, raptor

Happy 10th Birthday Redland

June 28th, 2010

Redland’s 10th year source code commit birthday is today 28th Jun at 9:05am PST – the first commit was Wed Jun 28 17:04:57 2000 UTC.

Happy 10th Birthday! Please celebrate with tea and cake.

(New releases soon, but it’s nice and sunny here in California, it’s very distracting.)

English, comment

Command Line Semantic Web with Redland

March 20th, 2010

I gave a ‘lightning’ talk (actually in about 15 mins) Command Line Semantic Web with Redland on 15th March 2010 at the Semantic Web Austin Meetup during SXSW at Texas Coworking, Austin, TX, USA. Today I recorded it as a screencast and put it online.

The embedded Vimeo version is below (best to view full screen to see all the text), but you can also get alternate hosted and downloadable versions (iPhone, 3GP, Full size) from my site.

Command Line Semantic Web With Redland from Dave Beckett on Vimeo.

English, comment

Flickcurl C API to Flickr 1.17 Released

March 8th, 2010

In the last few days I released Version 1.17 of my Flickcurl C library interface to the Flickr API. It adds complete support for three recent sets of new APIs.

Added 15 new functions for the new Stats API calls announced 2010-03-03:
flickr.stats.getCollectionDomains, flickr.stats.getCollectionReferrers, flickr.stats.getCollectionStats, flickr.stats.getPhotoDomains, flickr.stats.getPhotoReferrers, flickr.stats.getPhotosetDomains, flickr.stats.getPhotosetReferrers, flickr.stats.getPhotosetStats, flickr.stats.getPhotoStats, flickr.stats.getPhotostreamDomains, flickr.stats.getPhotostreamReferrers, flickr.stats.getPhotostreamStats, flickr.stats.getPopularPhotos and flickr.stats.getTotalViews.

Added 8 new functions for the new People and “photos of” people API calls announced 2010-01-21:
flickr.photos.people.add, flickr.photos.people.delete, flickr.photos.people.deleteCoords, flickr.photos.people.editCoords, flickr.photos.people.getList and flickr.people.getPhotosOf.

Added 3 new functions for the new, unannounced (and seemingly incomplete) Gallery API calls:
flickr.galleries.addPhoto, flickr.galleries.getList and flickr.galleries.getListForPhoto.

Updated the flickcurl(1) utility to support the new gallery, people photos and stats API calls.

See the Release Notes for full details.

Get it at: http://download.dajobe.org/flickcurl/flickcurl-1.17.tar.gz (GPL2 / LGPL2 / Apache2.)

This is what I do for fun between releasing Redland RDF libraries, more of which soon…

English, comment

Rasqal 0.9.18 RDF Query Library Released

February 14th, 2010

Update: you want 0.9.19 not 0.9.18 after package configuration issue found. Links fixed.

This release of Rasqal adds draft syntax support for the SPARQL 1.1 Update language being developed by the W3C SPARQL Working Group. The SPARQL 1.1 Update W3C Working Draft of 2010-01-26 introduces the first syntax design with some uncertainties and gray areas still present (no grammar spec section yet). I added what I thought would work, avoiding the ambiguous WITH forms where everything is optional. Since this is draft work, this extra parsing is only done when the ‘laqrs’ query language syntax is chosen. LAQRS stands for LAQRS adds to Querying RDF in SPARQL.

This is just syntax and API support in Rasqal, so it means you can prepare the update queries, but there is no code to execute them. The API allows getting access to the decoded SPARQL update (INSERT, DELETE with or without DATA) and graph operations (CLEAR, DROP etc.). There is still more to do when the syntax gets changed in later drafts, and there is no API to stream triple inserts/deletes during parsing, to handle uploading and downloading large triple blocks. That would require a rewrite of the SPARQL parser to use a different technology than flex+bison (maybe lemon, maybe Ragel) as well as new APIs.

Rasqal has several things to finish for SPARQL 1.0 support (UNION and nested OPTIONALs don’t work) but the recent rewrite of the query engine internals should make other SPARQL 1.1 parts such as aggregate functions and nested queries, a lot easier to do than with the old query engine. I will probably remove the old query engine from the codebase soon.

The second substantial change is a set of APIs moved from private to public in rasqal.h to enable the construction of query result sets and query result set rows (rasqal_row) via the public API. This allows query results to be read from a syntax or constructed by API as well as serialized to result formats, without any query being executed. Rasqal can be used with this addition to provide the sparql results syntax support for other applications that may have created query results via a different method. It can read query results formats from the SPARQL XML format (the standard format), and write or serialize them to SPARQL XML, SPARQL JSON, CSV, TSV and an ASCII Table format. This functionality is all available via Triplr where you can make HTTP GET URLs for saved queries.
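To make the idea of "results without a query" concrete, here is a simplified Python sketch that takes rows of variable bindings built by hand and writes them as CSV or TSV. The `write_results` function is hypothetical, it is not Rasqal's API, and it ignores the term-encoding details of the real result formats.

```python
import csv
import io


def write_results(variables, rows, fmt="csv"):
    """Serialize hand-built result rows; unbound variables become empty cells."""
    delimiter = {"csv": ",", "tsv": "\t"}[fmt]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(variables)                       # header: variable names
    for row in rows:
        writer.writerow([row.get(name, "") for name in variables])
    return out.getvalue()


rows = [
    {"s": "http://example.org/a", "name": "Alice"},
    {"s": "http://example.org/b"},                   # ?name is unbound here
]
print(write_results(["s", "name"], rows, "csv"))
```

The point of the design is that the rows could come from anywhere (another query engine, a file, a test fixture) and still be written out in a standard format.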

The final change is in the area of resilience. The functions in the public API have been updated so that when invalid or NULL pointers are given, the functions return failure or NULL / false rather than try to use the pointer and probably crash. Hopefully I caught all of them. The release testing (as usual) included valgrind memory leak checking of all of the 100s of tests and there were no leaks or buffer overruns found.

This is also the first Rasqal release since switching to GIT as the source control for the Redland libraries so the source pointers have moved to git.librdf.org where details of how to check it out can be found.

So in summary, the main changes in this release are:

  • 0.9.19: Fix rasqal.pc to Requires raptor again.
  • Add initial draft parsing and API (NOT execution) support for SPARQL 1.1 Update W3C Working Draft of 2010-01-26.
  • Add public APIs (row, results, result formatter, variables table) so that query results can be built, read and written without a query.
  • Add API resilience checks for invalid NULL pointer arguments.
  • Many other bug fixes and improvements were made.

Fixed Issues:

  • 0000320: Add a void* user_data field to rasqal_variable
  • 0000323: Official MIME Type for JSON isn’t text/json
  • 0000343: Mime type for ‘table’ results format is text/plan
  • 0000345: MIME Type and URI for TSV and CSV
  • 0000347: rasqal linking fix

See the Rasqal 0.9.19 Release Notes for the full details of the changes.

Download it at http://download.librdf.org/source/rasqal-0.9.19.tar.gz

English, comment

Raptor 1.4.21 released – Raptor 2 GIT work

January 30th, 2010

I just released version 1.4.21 of my Raptor RDF parsing / serialising library to the world. This release is just bug fixes:

  • RDFa parser buffer management problems were fixed.
  • The Turtle parser and serializers now use QNames correctly as required by the specification.
  • The RDF/XML parser now resets correctly to detect duplicate rdf:IDs when a parser object is reused.
  • A few other minor bug and build fixes were made.
  • Fixed reported issues: 0000318, 0000319, 0000326, 0000331, 0000332 and 0000337

This is the first release since switching to GIT as the source control for the Redland libraries. The above release is on branch ‘raptor1’ in the new Redland GIT.

In parallel to this is the ongoing Raptor 2 ABI/API updating, which is cleaning up 10 years of API and internal cruft. GIT is really helping speed up this work: the branching, staging/index and stash concepts it supports allow false paths to be managed. The results can be seen on branch ‘master’ of raptor.

The updating is going well in the sense that the make distcheck test suite passes, but there are still things to decide, including:

  • Rename all raptor_CLASS_copy copy constructors to something else: either raptor_new_CLASS_from_CLASS (also used in raptor – Doh!) or to raptor_CLASS_addref which signifies better that it just adds a reference to the object, it’s a shallow copy, not a deep one.
  • Unify raptor_world, rasqal_world and librdf_world – which might help share classes between the libraries. Not sure if this is a good idea yet.
  • Add a graph term to the (subject, predicate, object) triple returned from parsing. I am probably going to do this.
  • Turn the raptor_locator object into more of a log (like librdf_log) or exception object, with inner log/exceptions.
  • Improve the callback interface that passes error, warning etc. messages to user code.

I need to decide at what point to roll out an alpha release of Raptor 2, which will probably be numbered 1.9.0. Some of the above possibilities might be worth putting in a later alpha release.

This can all be seen in the GIT repository which includes instructions for checkout at git.librdf.org.

English, comment

RDF Syntaxes 2.0

January 24th, 2010

I’ve been diligently ignoring the RDF 2.0 threads on the semantic-web interest list, especially on Syntax since I’ve been there before (Modernising Semantic Web Markup). Firstly I’d endorse what Jeremy Carroll says about the features.

I think I’m qualified as an expert on RDF graph serializations / syntax since:

and I implemented all of the above plus GRDDL, RDFa (via librdfa), Atom and RSS*es, RDF/JSON, … in Raptor

People moan about RDF/XML and have for years. I even wrote down in great detail the flaws in Modernising Semantic Web Markup. Over all that time nobody has come up with a credible and complete XML syntax alternative that stuck, even myself. Let me summarize the ones I know:

  • TriX: had little takeup
  • RXR: ditto
  • GRIT: new, but flawed since it can only represent trees (no named bnodes)

The fundamental problem I think with using XML to write down graphs is:

People looking at XML expect they are looking at a hierarchical Tree.

So writing a Graph in an XML Tree is just going to always fail the simplicity test. This might come from using the XML DOM or looking at HTML, XHTML, but it’s pretty embedded in the mind.

Right now I’d dismiss any XML format for any “simple” or “obvious” way to write down RDF graphs that will be accepted by new users.

(Aside: There’s also a technical argument that no XML format can ever represent all RDF graphs since RDF allows Unicode codepoints that are not allowed in XML).
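The aside can be demonstrated with a short Python check using the standard library XML module. The function name `survives_xml_round_trip` is made up for this sketch; it writes a string as element content and tests whether an XML parser will read it back.

```python
import xml.etree.ElementTree as ET


def survives_xml_round_trip(text):
    """Write text as XML element content, then try to parse it back."""
    element = ET.Element("literal")
    element.text = text
    serialized = ET.tostring(element, encoding="unicode")
    try:
        return ET.fromstring(serialized).text == text
    except ET.ParseError:
        # XML 1.0 forbids most control characters, even as content.
        return False


print(survives_xml_round_trip("plain text"))
print(survives_xml_round_trip("control\u0001character"))
```

The first string round-trips, while the one containing U+0001 is rejected by the parser, which is the kind of codepoint an XML syntax cannot carry directly.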

Now this isn’t a problem just with XML, it’s also true of other non-XML formats that are serial hierarchical documents. That means formats like JSON, which cannot even out-of-the-box represent anything that is not a tree, since it has no ID/REF mechanism.
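A quick Python illustration of the tree limitation, using only the standard `json` module. The `try_json` helper and the example data are invented for this sketch.

```python
import json


def try_json(value):
    """Return the JSON text, or a message if the value is not a tree."""
    try:
        return json.dumps(value)
    except ValueError as error:
        return f"cannot serialize: {error}"


# A two-node graph with a cycle: Alice knows Bob, Bob knows Alice.
alice = {"name": "Alice"}
bob = {"name": "Bob", "knows": alice}
alice["knows"] = bob

# A node shared by two parents is written out twice, so the sharing is lost.
carol = {"name": "Carol"}
restored = json.loads(json.dumps({"first": carol, "second": carol}))

print(try_json(alice))
print(restored["first"] is restored["second"])
```

Any JSON-based graph format therefore has to add its own identifier convention on top, which is exactly the part that stops it looking like a simple tree.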

Of course, apart from having dealt with RDF/XML, I also invented Turtle (based on the N3 syntax, simplified) and, although it’s a non-XML syntax, it does seem to be in the sweet spot for users understanding it, without having the hierarchical document expectation. Yes, Turtle is close to JSON/Python in syntax design space but this doesn’t seem to have been a problem.

So I’m happy with how Turtle turned out and that should be the focus of RDF syntax formats for users. It does need an update and I’ll probably work on that whether or not a new syntax is part of some future working group – I have a pile of fixes to go in. Adding named graphs (TRIG) might be the next step for this if it was a standard.

It may be there is a need for a better machine format, but please don’t mix them. Also, machines can read Turtle RDF :)

Consider these stream-of-consciousness RDF syntax thoughts the basis of my position paper for the W3C RDF Next Steps workshop.

English, RDF, comment, grddl, raptor, syntax, turtle

Merry Winter Festival (Northern Hemisphere) with Rasqal, Redland, Triplr releases and updates.

December 22nd, 2009

Merry winter solstice to those in the northern hemisphere. Hope the summer is doing fine for the southern folk.

Co-incidentally, I released some software:

  • Rasqal 0.9.17 “the new query engine one” – rewritten internals with a new query engine that handles more of SPARQL 1.0 (95%+) and will make it much easier to add features for SPARQL 1.1 changes. ABI/API change too. 15 months of changes summarised in the release notes.
  • Redland 1.0.10 and bindings 1.0.10.1 to go with it – new Virtuoso triplestore backend developed by OpenLink (in 2008; sorry it took so long) plus support for the new Rasqal. Release notes.
  • Triplr upgraded with the above packages which is especially useful for the Triplr RDF Query (a SPARQL endpoint). Triplr news

That is all. Early next year: switch from SVN to GIT and start on raptor2 ABI/API break.

English, comment

Raptor 1.4.20 RDF syntax library released

November 28th, 2009

Released Raptor 1.4.20 as a bug fix release – no ABI or API changes but fixes for wrong-to-spec bugs, crashes and performance. Raptor 2 will contain ABI/API changes and have new features when it is released – no ETA. My main development focus returns to Rasqal, its new query engine and full SPARQL 1.0 support, which is coming along well.

The main changes in the new Raptor version are:

  • Turtle serializing performance improvement by Chris Cannam
  • librdfa RDFa parser updates to fix empty datatype, xml:lang and 1-char prefixes by Manu Sporny
  • Fix a crash when the GRDDL parser reported errors
  • Enable large file support for 32-bit systems
  • Several resilience improvements by Lauri Aalto
  • Other minor portability and bug fixes
  • Fixed reported issues: 0000306 0000307 0000310 and 0000312.

See the Raptor 1.4.20 Release Notes for the full details of the changes.

Download it from: raptor-1.4.20.tar.gz

Raptor RDF Parser Library Version 1.4.19 Released

July 19th, 2009

It has been 13 months, so a new release of Raptor, my RDF parsing and serializing library, is long overdue – if it involves going between syntax and triples, Raptor can do it.

(My excuse for the long delay is that I was busy working on Rasqal, which is also due a release. Plus it’s lovely living in California :) )

WARNING: FUTURE ABI and API CHANGES.
The next release of Raptor 1.4.x will include bug fixes only and no new features. New development will move to Raptor 2 where a planned ABI and API break will happen. There may be preview releases of Raptor 2 with 1.9.x numbering.

The changes were as follows:

  • Many improvements to RSS tag soup (RSSes and Atom) parser and the RSS 1.0 and Atom serializers
  • Several fixes and improvements to the N-Triples, RDFa and RDF/XML parsers and Turtle serializer
  • Improved the use and configuration of static libxml functions for better compatibility
  • Several Win32 portability fixes – Lou Sakey
  • Many internal changes for upcoming Raptor V2 – primarily by Lauri Aalto
  • Many other fixes and resilience improvements.
  • Fixed reported issues: 0000259, 0000262, 0000263, 0000266, 0000269, 0000270, 0000276, 0000277, 0000287, 0000288, 0000289, 0000290, 0000293, 0000296, 0000299 and 0000303.

See the Raptor 1.4.19 Release Notes for the full details of the changes.

English, comment

Flickcurl 1.12 – C API to Flickr

July 4th, 2009

Flickcurl (C library to Flickr API) Version 1.12 is out with:

See the Release Notes for full details.

Get it at the Flickcurl home page (GPL2 / LGPL2 / Apache2.)

English, comment

FSF Introduces RDF descriptions of GNU licenses

June 17th, 2009

Birthdays – XML is 10 and RDF/XML is 9

February 10th, 2008

Happy 10th Birthday XML.

It’s clear you are going to be around for some time. People know your good points and bad and have got the kinks worked out using you in production, in diversity and at scale.

Take care not to be distracted in the next 10 years by sexy new text formats that overlap in some features, but don’t replace you for many uses. I’m talking about you, JSON.

In the RDF world, RDF/XML is the syntax people love to hate, or just love/hate. It is 1 year younger than you, so maybe in February 2009 we’ll have something to celebrate about that. Yeah, it might happen :)

I recently made a new textual RDF syntax sibling Turtle with TimBL whose official birthday was last month, although its actual birth was January 2004 in Bristol, or earlier if you look into its ancestry. In 6 (10?) more years it’ll be something we can properly rely on, like XML is today.

Dave

P.S. For more memories, check out Tim, Eve and Norm who were involved in XML from very early on when I was just an observer.

English, comment

Semantic Web Yahoo – Part Deux

August 13th, 2007

It’s been nearly 2 years since I joined Yahoo! and the semantic web-based technology I helped develop has been deployed in production for some time. It has been encouraging to see the ideas get more accepted: today I noticed that in a hotjobs search for rdf yahoo near Sunnyvale there are 5 jobs open – not in my group, but in Yahoo! Local.

Our group in Sunnyvale is continuing to look for HTTP and web caching experts, designers and coders for building REST-based web services. Right here and now we have interesting, large scale, rich data problems and are applying semweb techniques to them. Contact me if any of this sounds exciting to you.

Semantic Web Yahoo – Part one

English, RDF, Yahoo, comment

Flickcurl – C API to Flickr

August 3rd, 2007

In January 2007 just for fun I started writing a C API to Flickr using the Flickr web services called Flickcurl. The name was because it was originally built using Flickr via libcurl to do the HTTP work … although right now it contains more use of libxml than of libcurl.

I started this for a bunch of reasons, including to learn more about “web 2.0” web APIs, see how RESTy the Flickr API really is (Answer: not much, it’s very much an RPC model) and the issues with developing a Web API. It’s clear this is an evolved and evolving one since now and then I discover undocumented returned attributes in the XML and cases where it is not clear why attributes were used instead of elements. It’s very suited towards dynamic scripting languages where it is easy to pass around dictionaries / hashes / associative arrays of parameters that can grow. So in some sense, making something feel like a natural API in a static language like C is rather going against the grain and rather slow work.

There are, however, things available to help. There are method reflection APIs so I wrote a code generating program that can nicely automate writing many of the simpler calls that return no value or just a single one. I also used a lot of similar patterns so that parsing tags xml is quite similar to parsing comments xml. The XML is primarily read via XPath and a little DOM.
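As a rough idea of what generating code from method descriptions can look like, the Python sketch below turns a dotted API method name and an argument list into a C prototype string. Everything here is an assumption made for illustration: the `c_function_name` and `c_stub` helpers, the name mapping and the prototype shape are not taken from the actual generator or from Flickcurl's headers.

```python
def c_function_name(api_method, prefix="flickcurl"):
    """Map 'flickr.a.b.c' to 'prefix_a_b_c' (assumed naming scheme)."""
    parts = api_method.split(".")
    if parts[0] != "flickr" or len(parts) < 2:
        raise ValueError(f"not a Flickr API method name: {api_method}")
    return "_".join([prefix] + parts[1:])


def c_stub(api_method, arguments):
    """Emit a prototype for a simple call that takes string arguments."""
    params = ["flickcurl* fc"] + [f"const char* {name}" for name in arguments]
    return f"int {c_function_name(api_method)}({', '.join(params)});"


print(c_stub("flickr.galleries.addPhoto", ["gallery_id", "photo_id"]))
```

For calls that return nothing or a single value, the body can be generated from the same description, which is why the simpler calls are the easiest to automate.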

One other nice thing about this is that this is a piece of work with a fixed size, albeit growing slowly. The Flickr API currently has 104 calls – depending on how you measure them – so it’s easy to check progress, and that’s how I’ve been doing it. I built tools to read the docu-comments (javadoc, gnome-doc, kernel-doc style) and mark the Flickcurl coverage release by release.

The news today is that I have reached the halfway point: 50% of the API with the release of Flickcurl 0.11, at least until they add something more! I have also done most of what I think are the trickier parts – the uploading, searching and getting info about photos. The remaining API parts are more regular, so I feel like I’m coding downhill now.

Now there’s something else it does – and this won’t be a surprise to most given my interests. Flickcurl generates RDF descriptions from Flickr photos with a flickrdf utility, including reading Machine Tags. The namespaces are either well known ones, or invented by me, pointing at the machinetags.org wiki – you can create your own definition.

flickrdf uses Raptor to do nicer serializing when it is available. So this means I can turn jellyfish into Turtles. W00t! (*)

$ ./flickrdf -o turtle http://www.flickr.com/photos/dajobe/196308964/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://www.flickr.com/photos/dajobe/196308964/>
    dc:creator [
        a foaf:Person;
        foaf:maker <http://www.flickr.com/photos/dajobe/196308964/>;
        foaf:name "Dave Beckett";
        foaf:nick "dajobe"
    ];
    dc:dateSubmitted "2006-07-23T18:16:13Z"^^xsd:dateTime;
    dc:rights <http://creativecommons.org/licenses/by-nc-sa/2.0/>;
    dc:modified "2007-02-25T07:45:46Z"^^xsd:dateTime;
    dc:issued "2006-07-23T18:16:13Z"^^xsd:dateTime;
    dc:created "2006-07-23T05:28:50Z"^^xsd:dateTime;
    geo:lat "36.620487";
    geo:long "-121.904468";
    dc:title "Jellyfish at Monterey Aquarium";
    dc:subject "jellyfish" .

After that bad joke (and it could have been worse if I had a picture of a Turtle) here’s what you need to know. Get it at flickcurl-0.11.tar.gz (md5sum eea351e4d35e8d1c63b124cd8ee257ba, sha1sum d220f6371c0c5334c824a51ba848d9358d73e533) or the latest in the Flickcurl Subversion. It’s licensed under the GPL2 / LGPL2 / Apache 2.0 or any newer versions of any of them.

Note: I work for Yahoo! and although Flickr is part of Yahoo! this project is my own personal work.

(*) Actually I’m slightly cheating with this example, there’s a couple of bug fixes in SVN after the release which are needed to get this output.

English, comment