Archive

Posts Tagged ‘API’

Join us at Mashup Camp to Compete in the first Calais Mashup Contest

October 31st, 2008

 INTRODUCING THE CALAIS MASHUP CONTEST @ MASHUP CAMP

Please join us for Mashup Camp to compete in our first Calais Mashup Contest at the Computer History Museum in Mountain View, CA, Nov. 17-19, 2008.

THREE WAYS TO WIN
We are offering a 4TB Drobo as top prize for Calais mashups in three categories:

     

  1. Best Calais Mashup addressing the delivery and/or display of news
  2. Best Calais Mashup addressing the needs of business users
  3. Most creative Calais Mashup overall (in any category serving any user)

Mashup Camp is a unique opportunity to work directly with other Calais developers - including members of our expert internal team who are flying in from the ClearForest development center in Israel - to create new and compelling mashups.

See the complete rules for the Calais mashup contest on the Mashup Camp wiki.  You must be a Mashup Camp participant in order to compete.  An unlimited Calais API key and a variety of helpful APIs and feeds will also be made available to entrants.

To participate, register for Mashup Camp and note your intention to enter the Calais Mashup Contest here.  If you want to get started early, and need the special Calais key, shoot us a note at MashupContest (at) OpenCalais (dot) com.

We look forward to seeing you there.

  

Interview for Journalism.co.uk… Journalists get to know the Semantic Web!

October 29th, 2008

I was interviewed last week by Colin Meek from Journalism.co.uk on the topic of “Web 3.0” and what it means for journalists… You can read the full article in two parts (1, 2). My original answers are part of an interview on their Insite blog. I also had the chance to talk about various DERI offerings in the Semantic Web area including SIOC, SWSE, Sindice, Semantic Radar, etc.

Colin also asked me about other readable data that is being crawled by Semantic Web search engines like Sindice, SWSE or Swoogle. These search engines can usually match keywords in any data that has been crawled or integrated into a semantic store, not just data about people. It could be structured information about people, places, dates, library documents, blog items or topics, whatever. In fact, there is no limit to the types of things that can be indexed and searched, since RDF (an open data model that can be adapted to describe pretty much anything) is used as the data format. Anyone can reuse existing RDF vocabularies like SIOC to publish data, or they can publish data using their own custom vocabularies (e.g. to describe stamp collecting or Bollywood movie genres or whatever), or they can combine public and custom vocabularies (e.g. take FOAF and your own vocabulary about soccer to describe players and managers on a soccer team). Geotemporal information is particularly useful across a range of domains, and provides nice semantic linkages between things. For example, having geographic information and time information is useful for describing where people have been and when, for detailing historical events or TV shows, for timetabling and scheduling of events, etc., and for connecting all of these things together (“I’m travelling to Edinburgh next week: show me all the TV shows of relevance and any upcoming events I should be aware of according to my interests…”).

The keyword searches in the Sindice search engine allow you to find more information on where resources of interest are (searching for “john breslin” will point to all public pages that contain semantic information about yours truly). Sindice also has an API that can provide results in a reusable (semantic) format that can be leveraged by other applications. Alternatively, SWSE (Semantic Web Search Engine) shows you semantic information about the object of interest (e.g. my phone number, my friends, etc.) which may be derived from multiple sources (e.g. this information on me comes from tens of sources consolidated together via unique identifiers for me or through what’s called “object consolidation”).

For me, this article highlighted the fact that the Semantic Web community needs to be very aware that one of the key features of the Social Web, for journalists and for many others, is the ability to find a lot of personal and sensitive information on people. With the advent of “Web 3.0”, we need to realise (“with great power comes great responsibility”) that the availability of contextual and semantically-related information is going to become even more apparent, and people will talk about it in both positive and negative terms. Educating site owners about what semantic data they may be publishing (knowingly or unknowingly, even if it’s just RSS feeds) is needed, and developers should determine exactly what opt-in or opt-out mechanisms are required before implementing semantic solutions. Users also should be aware of the benefits and other potential uses of their semantic data.

I think now is the time to avert any scares, because in reality, the data that is on the Web or the Social Web can be used in new ways anyway, whether metadata is present or not (some facts can be derived). Google have recently implemented some discussion forum parsing algorithms to determine how many posts are on a thread, how many users posted on that thread and when the last post was made. You can see this in a search result I did for “irish pubs boards.ie” below. It’s not complete, and probably relies on identifying certain HTML structures for non-Google discussion sites, e.g. you can see two threads in the middle that don’t show details of the total posts or commenters. But it’s moving towards the SIOC vision of providing more metadata about discussions on the Web to help you in finding more relevant information - whether the site owners want to provide Semantic Web data or not!

Making data available semantically enables computers to help us do things we cannot easily do (or cannot do at all) right now, and this is what makes it so powerful. We also need to think more towards educating people about the benefits as well as how we can minimise any hazards. Is this a job for W3C SWEO? As my colleague James Cooley said: “I think scientists thought the benefits of GM food were so obvious that there was no case to make. Then you got Frankenstein Food and the game was up.”

For journalists interested in the Semantic Web, I’d recommend reading this paper entitled “SemWebbing the London Gazette” by Jeni Tennison and John Sheridan which describes how they have exposed information from their newspaper website using RDFa so that it becomes easy to re-use (slides here). You can also view some interesting slides by Colin Meek from a seminar he gave to journalists about the Social Web in Oslo a few days ago. It’s in three parts (1, 2, 3). I’ve embedded the third part (on the Semantic Web) below…

Explaining REST and Hypertext: Spam-E the Spam Cleaning Robot

October 23rd, 2008

I'm going to add to Sam Ruby's amusement and throw in my attempt to explicate some of Roy Fielding's recent discussion of what makes an API RESTful. If you've not read the post and all the comments then I encourage you to do so: there are some great tidbits in there that have certainly given me pause for thought.

The following attempts to illustrate my understanding of REST. Perhaps bizarrely, I've chosen to focus more on the client than on the design of the server, e.g. what resources it exposes, etc. This is because I don't think enough focus has been placed on the client, particularly when it comes to the hypermedia constraint. And I think that often, when we focus on how to design an "API", we're glossing over some important aspects of the REST architecture, which includes, after all, other types of actors, including both clients and intermediaries.

I've also deliberately chosen not to draw much on existing specifications; again, it's too easy to muddy the waters with irrelevant details.

Anyway, I'm well prepared to stand corrected on any or all of the below. Will be interested to hear if anyone has any comments.

Let's imagine there are two mime types.

The first is called application/x-wiki-description. It defines a JSON format that describes the basic structure of a Wiki website. The format includes a mixture of simple data items, URIs and URI templates that collectively describe:

  • the name of the wiki
  • the email address of the administrator
  • a link to the Recent Changes resource
  • a link to the Main page
  • a link to the license statement
  • a link to the search page (as a URI template, that may include a search term)
  • a link to a parameterized RSS feed (as a URI template that may include a date)

Another mime type is application/x-wiki-page-versions. This is another JSON-based format that describes the version history of a wiki page. The format is an ordered collection of links. Each resource in that list is a prior version of the wiki page; the most recent version is first in the list.
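To make the two formats a little more concrete, here is a minimal Python sketch of what such documents might look like and how a client could use them. Every key name, URL and the `{term}`/`{date}` placeholder syntax below is an assumption made up for illustration; the formats themselves are imaginary and nothing here is a real specification.

```python
import json
from urllib.parse import quote

# Hypothetical application/x-wiki-description document. Every key and URL
# here is invented for illustration; the post does not define them.
WIKI_DESCRIPTION = """
{
  "name": "Example Wiki",
  "admin_email": "admin@wiki.example.org",
  "recent_changes": "http://wiki.example.org/changes",
  "main_page": "http://wiki.example.org/Main",
  "license": "http://wiki.example.org/license",
  "search_template": "http://wiki.example.org/search?q={term}",
  "feed_template": "http://wiki.example.org/feed?since={date}"
}
"""

# Hypothetical application/x-wiki-page-versions document: an ordered list
# of links, most recent version first.
PAGE_VERSIONS = """
["http://wiki.example.org/Main/v3",
 "http://wiki.example.org/Main/v2",
 "http://wiki.example.org/Main/v1"]
"""


def expand(template, **values):
    """Fill {name} placeholders in a URI template with percent-encoded values."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", quote(value, safe=""))
    return template


description = json.loads(WIKI_DESCRIPTION)
versions = json.loads(PAGE_VERSIONS)

search_url = expand(description["search_template"], term="cheap pills")
feed_url = expand(description["feed_template"], date="2008-10-23")
latest_version = versions[0]
```

The point of the sketch is that the client only needs to know the media type, not the site's URL layout: the search and feed URLs are built from whatever templates the server hands back.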

Spam-E is a little web robot that has been programmed with the smarts to understand several mime types:

  • application/x-wiki-description
  • application/x-wiki-page-versions
  • RSS and Atom
  • XHTML

Spam-E also understands a profile of XHTML that defines two elements: one that points to a resource capable of serving wiki descriptions, another that points to a resource that can return wiki page version descriptions.

Spam-E has internal logic that has been designed to detect SPAM in XHTML pages. It also has a fully functioning HTTP client. And it also has been programmed with logic appropriate to processing those specific media types.

Initially, when starting, Spam-E does nothing. It waits to receive a link, e.g. via a simple user interface. It's in a steady state waiting for input.

Spam-E then receives a link. The robot immediately dereferences the link. It does so by submitting a GET request to the URL, and includes an Accept header:


Accept: application/x-wiki-description;q=1.0, application/x-wiki-page-versions;q=0.9, application/xhtml+xml;q=0.8, application/atom+xml;q=0.5, application/rss+xml;q=0.4

This clearly states Spam-E's preference to receive specific mime-types.
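As a rough illustration, the request Spam-E prepares could be sketched in Python with the standard library. The media type names follow the two imaginary types defined earlier, the q-values are the ones from the walkthrough, and the URL and helper names are invented for the example; the request is built but never sent.

```python
from urllib.request import Request

# Media types and q-values taken from the walkthrough; a higher q means
# "preferred more strongly".
PREFERENCES = {
    "application/x-wiki-description": 1.0,
    "application/x-wiki-page-versions": 0.9,
    "application/xhtml+xml": 0.8,
    "application/atom+xml": 0.5,
    "application/rss+xml": 0.4,
}


def build_accept(preferences):
    """Serialise media-type preferences as an Accept header value."""
    ordered = sorted(preferences.items(), key=lambda item: item[1], reverse=True)
    return ", ".join(f"{media_type};q={q:.1f}" for media_type, q in ordered)


def make_request(url, preferences):
    """Prepare (but do not send) a GET request carrying the Accept header."""
    return Request(url, headers={"Accept": build_accept(preferences)}, method="GET")


# Hypothetical starting link handed to the robot.
request = make_request("http://wiki.example.org/", PREFERENCES)
accept_value = request.get_header("Accept")
```

The server remains free to answer with any of the listed types, which is exactly what happens in the next step.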

In this instance it receives an XHTML document in return. Not ideal, but Spam-E knows how to handle it. After parsing it, it turns out that this is not a specific profile of XHTML that Spam-E understands, so it simply extracts all the anchor elements from the file and uses them to widen its search for wiki spam. Another way to say this is that Spam-E has changed its state to one of searching. This state transition has been triggered by following a link, then receiving and processing a specific mimetype. This is "hypermedia as the engine of application state" in action.
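The fallback behaviour described here, pulling every anchor out of an XHTML page that matches no known profile, might look something like the following sketch. It uses Python's standard `html.parser`; the class name, sample markup and URLs are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


# Invented sample page; the third anchor has no href and is skipped.
SAMPLE = (
    '<html><body>'
    '<a href="/wiki/Main">Main</a> '
    '<a href="http://other.example.org/">Other</a> '
    '<a name="top">Top</a>'
    '</body></html>'
)

collector = AnchorCollector("http://site.example.org/index")
collector.feed(SAMPLE)
frontier = collector.links
```

Each URL in `frontier` becomes a candidate for the next dereference, which is how the robot widens its search without knowing anything about the site in advance.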

Spam-E performs this dereference-parse-traverse operation several times before finding an XHTML document that conforms to the profile it understands. The document contains a link to a resource that should be capable of serving a wiki description representation.

Spam-E is now in discovery mode. Spam-E uses an Accept header of application/x-wiki-description when following the link and is returned a matching representation. Spam-E parses the JSON and now has additional information at its disposal: it knows how to search the wiki, how to find the RSS feed, how to contact the wiki administrator, etc.

Spam-E now enters Spam Detection mode. It requests, with a suitable Accept header, the recent changes resource, stating a preference for Atom documents. It instead gets an RSS feed, but that's fine because Spam-E still knows how to process that. For each entry in the feed, Spam-E requests the wiki page, using an Accept header of application/xhtml+xml.

Spam-E now tries to find if there is spam on the page by applying its local spam detection logic. In this instance Spam-E discovers some spam on the page. It checks the XHTML document it was returned and discovers that it conforms to a known profile and that embedded in a link element is a reference to the "versions" resource. Spam-E dereferences this link using an Accept header of application/x-wiki-page-versions.

Spam-E, who is now in Spam Cleaning mode, fetches each version in turn and performs spam detection on it. If spam is found, then Spam-E performs a DELETE request on the URI. This will remove that version of the wiki page from the wiki. Someone browsing the original URI of the page will now see an earlier, spam free version.
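The cleaning pass can be sketched as a simple loop. In this Python sketch the HTTP calls are replaced by injected functions so that it runs against an in-memory stand-in; the function names, the toy spam test and the URLs are all assumptions for illustration, not part of any real wiki API.

```python
def clean_versions(version_links, fetch, is_spam, delete):
    """Check each version in order and delete the ones flagged as spam.

    fetch, is_spam and delete are injected so the loop can run against a
    real HTTP client or, as here, an in-memory stand-in.
    """
    removed = []
    for link in version_links:
        if is_spam(fetch(link)):
            delete(link)  # would be an HTTP DELETE on the version URI
            removed.append(link)
    return removed


# In-memory stand-in for the wiki: version URI -> page text (all invented).
store = {
    "http://wiki.example.org/Main/v3": "BUY CHEAP PILLS NOW",
    "http://wiki.example.org/Main/v2": "Welcome to the wiki. BUY CHEAP PILLS",
    "http://wiki.example.org/Main/v1": "Welcome to the wiki.",
}
links = list(store)  # most recent first, as in the versions format

removed = clean_versions(
    links,
    fetch=store.__getitem__,
    is_spam=lambda text: "CHEAP PILLS" in text,
    delete=store.pop,
)
```

After the loop only the spam-free version is left, which corresponds to a visitor seeing the earlier, clean page at the original URI.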

Once it has finished its cycle of spam detection and cleaning, Spam-E reverts to search mode until it runs out of new URIs.

There are several important points to underline here:

Firstly, at no point did the authors of Spam-E have to have any prior knowledge about the URL structure of any site that the robot might visit. All that Spam-E was programmed with was logic relating to some defined media types (or extension points of a media type in the case of the XHTML profiles) and the basic semantics of HTTP.

Secondly, no one had to publish any service description documents, or define any API end points. No one had to define what operations could be carried out on specific resources, or what response codes would be returned. All information was found by traversing links and by following the semantics of HTTP.

Thirdly, the Spam-E application basically went through a series of state transitions triggered by what media types it received when requesting certain URIs. The application is basically a simple state machine.
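To illustrate the state machine idea, here is a small Python sketch of the modes Spam-E passed through in the story. The state names mirror the narrative; the event labels (such as "xhtml+profile") are invented shorthand for "what kind of representation came back", and the table is one possible reading of the walkthrough rather than a definitive design.

```python
from enum import Enum, auto


class State(Enum):
    WAITING = auto()
    SEARCHING = auto()
    DISCOVERY = auto()
    SPAM_DETECTION = auto()
    SPAM_CLEANING = auto()


# (current state, what was received) -> next state.
# Event labels are invented shorthand for the representation received.
TRANSITIONS = {
    (State.WAITING, "xhtml"): State.SEARCHING,
    (State.SEARCHING, "xhtml"): State.SEARCHING,
    (State.SEARCHING, "xhtml+profile"): State.DISCOVERY,
    (State.DISCOVERY, "wiki-description"): State.SPAM_DETECTION,
    (State.SPAM_DETECTION, "feed"): State.SPAM_DETECTION,
    (State.SPAM_DETECTION, "page-versions"): State.SPAM_CLEANING,
    (State.SPAM_CLEANING, "done"): State.SEARCHING,
}


def step(state, event):
    """Return the next state; unknown combinations leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.WAITING):
    """Replay a sequence of events and record every state visited."""
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history


# The sequence of representations received in the walkthrough.
history = run([
    "xhtml", "xhtml", "xhtml+profile", "wiki-description",
    "feed", "page-versions", "done",
])
```

Note that nothing in the table mentions a URL: the transitions depend only on the current state and the media type that arrived.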

Anyway, hopefully that is a useful example. Again, I'm very happy to take feedback. Comments are disabled on this blog, but feel free to drop me a mail (see the Feedback link).

Session 4: Using the Web of Data [WOD-PD]

October 23rd, 2008

This morning’s first session was dedicated to Using the Web of Data, or, as Alan Dix put it: “In the end, it’s not about data - it’s about use!” Alan and Richard Cyganiak were the keynoters for this session.

Alan Dix is a Professor at the Computing Department of Lancaster University, and author (with Janet Finlay, Gregory Abowd, and Russell Beale) of Human-Computer Interaction.

To start with, Alan pointed to the two sides of achieving the web of data: Firstly generating the web of data (a billion triples, as mighty as this may sound, is actually tiny, says Alan) and then, secondly, accessing the web of data.

With regard to generating the Web of Data, Alan distinguished between top-down and bottom-up approaches. Among the former he counted the creation of the web of data from legacy sources (i.e. where you take existing data and semantically lift them, e.g. from structured data) and web scraping, such as DBpedia’s extraction of data from Wikipedia.

N.B.: This notion of ‘top-down’ does not imply a hierarchical relationship, but rather means that there is already a plan for what is going to be put on the web of data (e.g. ‘all semi-structured information on Wikipedia’ or ‘dataset XY from project Z’). The bottom-up idea here implies that data is added as the result of an action, or interaction, as the user/s go, e.g. relationships are created as the user expands his or her social network. For instance on Amazon, user interaction is used to generate semantics: People do not tell Amazon what they like, they simply buy it.

Having relationships of course does not imply yet that these relationships are part of the Semantic Web. Or, as Alan put it, “why should I be RDFizing my online presence if none of my friends are?”

Alan is going to publish his slides on his keynote page later - what I cannot reproduce here is a chart he developed, which was very useful for describing current scenarios on the web and which posed a twofold question:

Does a website/platform have the web of data implemented? YES/NO
Is the web of data on the website/platform apparent to the user? YES/NO

The possible combinations (YES/YES, YES/NO, NO/YES, NO/NO) provide a good heuristic tool for describing what is currently available, with and without the Semantic Web. Take, for instance, the shiny interface of Talis’ Project Cenote: Cenote’s vision is to “make library data visible in many contexts, inside and outside of the library, making the data much more accessible and visible to a wider audience - benefiting current and potential users of library services wherever they are.” On Cenote, the user doesn’t see that it’s got the Web of Data in it - it is actually implemented, but not in a way that is apparent to the user.

On the other end of the spectrum, you have a platform like Facebook: Alan referred to Facebook as “the user’s own web of data”, i.e. web of relationships: The user is aware of these relationships (they actually shape his interaction and communication with the site), and the (numerous!) apps on Facebook continually add relationships, but, regrettably, insulated from one another and not using RDF (and don’t you try to take data out of Facebook!).

Two examples of public data that Alan cited and that grow as people/institutions add data to them are Freebase (the “open database of the world’s information” - see previous posts on this blog about Freebase) and Swivel. Swivel allows people, institutions, anyone to upload and explore data, also featuring official data sources such as (links go to their Swivel pages): New York Federal Reserve Bank, UNESCO Institute for Statistics, DukeResearch or EUROSTAT. According to Alan, there is already more data on Swivel now than in the whole Linked Data cloud.

Alan also mentioned the Social Graph API. Yesterday evening Luca Hammer (one of the web 2.0 people who had joined the Open Hacking Session) introduced me to the Wordpress plugin “Meet your commenters”, which uses Social Graph to find social relations on the web and adds these data to the commenter profiles it creates in Wordpress.

On a different note: I took some time today to explore Alan’s homepage and found the cute Christmas Crackers application, which was first developed in 1999 and which is now also available on Facebook. As trivial as it may sound at first, sending virtual Christmas Crackers (with more than 5000 possible combinations!) is a good showcase for developing Human Interaction Scenarios, and a number of papers have been written about the application. Here is the case study which Alan recommends to begin with: Designing experience - virtual Christmas Crackers.

The abstract and a list of links to all websites and demos Alan discussed can be found here. Full reference: A. Dix and R. Cyganiak (2008). Using the Web of Data. Keynote at WOD-PD 2008 | Web of Data Practitioners Days, Vienna, Austria - Oct 22-23, 2008. http://www.hcibook.com/alan/papers/WOD-PD-2008/

Even if you have not met Richard Cyganiak in person, you have certainly come across one of his creations: The Linked Data Cloud. Richard is a research assistant at DERI Galway. In his demo, he gave us the opportunity to gain hands-on experience, introducing a tool he dubbed Snorql, which is basically an easier-to-use version of a SPARQL endpoint, as it already has the required prefixes ‘pre-installed’.

Using the Snorql interface, we could explore the dataset we had created collaboratively during Keith Alexander and Yves Raimond’s session. Writing SPARQL queries manually can be a challenge, but is next to impossible if you (like me) don’t know the syntax. But today we could just copy and paste all the queries from a website Richard had put up prior to his session - thanks a lot for the excellent preparation and demonstration!

Richard also showed a couple of RDF browsers in action, e.g. the Tabulator Plugin (“a Firefox extension which allows Firefox to handle data as well as documents”), or the Marbles Linked Data browser which is running right on beckr.org/marbles; enter, for instance http://api.talis.com/stores/wod-pd-sandbox/items/People/JanaHerwig (learn more about Marbles here).

Thank you, Alan and Richard - the combination of talk and demo was indeed a perfect intro towards using the Web of Data.

Scotland’s Information

October 15th, 2008

I’ve been given a heads-up on a new site from SLIC and CILIP in Scotland, which has been developed by the Centre for Digital Library Research (CDLR) at the University of Strathclyde.

Scotland’s Information is a service to help identify and locate Scotland’s wealth of collections held in libraries, archives and museums.

Although the site is live now, its official launch will be on 24th October 2008, the centenary of CILIP in Scotland.

One could be flippant and note that this is just another Google Maps mashup, which it is, but this is a good example of producing something much greater than the sum of its parts.  Google Maps mashups have only been around for three years – Google announced their API in June 2005, as we covered here on Panlibus – yet they are a widely used tool on the web and a key part of many a web site that users are familiar with and comfortable using.  It is amazing to note how rapidly the click/drag/zoom metaphor for interacting with a map became the de facto way to do it.

Back to Scotland’s Information, this site draws together a wealth of information about libraries, archives and museums in Scotland and the topics, people and organisations they represent.

What is, in my opinion, different and very useful in the way the site works is how you can filter your way through this data (often with the use of tag clouds) to arrive at a map containing pins for each location (museum, library, or archive) that can help you.  For instance this is the result of clicking on Robert Burns in the People tag cloud - ‘Information collections about Robert Burns (1759-1796)’.  That gives 35 locations associated with that famous Scot, which can then be limited further to those with wheelchair and/or internet access.

A final link in the chain is that from information about individual collections, there is a link through to the relevant OPAC or search interface.  I would suggest that this could be made even more intuitive, especially if a user has arrived at a link by filtering on a person or subject, by making use of the Silkworm Directory service and its API to deep-link in to those collections, directly delivering the results of a relevant search.

A great start, that from day one will deliver a valuable service to those visiting, interested in, and residing in Scotland.  It will be interesting to see how it develops.

This Week’s Semantic Web

October 1st, 2008

Special Edition : SIOC Update

I had a man cold when I should have been doing my duty, but with no apologies (fairly safely assuming John has a CC-with-attribution kind of policy) here’s a good proxy :

It’s time for another installment from the world of SIOC!

Previous SIOC-o-sphere articles:

#7 http://sioc-project.org/node/328
#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271
#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79

If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” when you add items.

Release 3.1 - Now in Technology Preview

October 1st, 2008

Well, it’s been over two weeks since we released something cool (www.semanticproxy.com) – time to get cracking on some new stuff.

We’ve placed Release 3.1 of Calais into technology preview status. Just as a reminder, technology preview is a separate instance of Calais that allows developers to evaluate new features and test their software prior to our moving the release to production. You can access the Preview by simply pointing your tool to http://beta.opencalais.com rather than http://api.opencalais.com. Just like Calais, the preview version requires that you have a developer API key – your existing key will work just fine.

This will be a relatively extended Preview – most likely lasting throughout October 2008. We want to give everyone the opportunity to test some significant new features and make sure we have adequate time to respond to any issues you discover. That being said – please don’t wait until the last minute to give things a spin.

As you may have noticed, our releases are getting significantly larger and incorporating substantial new functionality on a monthly basis. Release 3.1 is no different – it contains everything from major new capabilities such as company and geography disambiguation to performance improvements to new output formats to some significant expansions of the types of information it can extract.

So, in our tradition of lengthy blog postings – here’s an overview of what’s new in 3.1. I’ve broken this up into a few high-level focus areas. You can also visit the release notes right here.

New and Significant (at least to some of us)

Release 3.1’s big new functionality is disambiguation of company names and geographies. One of the big challenges of automated entity recognition is how to deal with ambiguity – for example “IBM”, “IBM Corp” and “International Business Machines”. For the vast majority of use cases you want each of these variations resolved to a single entity called “IBM”. There are similar challenges around geography such as Calais, Maine vs. Calais, France.

For companies we’ve implemented a sophisticated disambiguation capability that is driven by a reference database of tens of millions of company names and their variations. This database is primarily focused on public companies – but we’ll be expanding it to contain a broader range of companies in the future. In addition to variations on a company name, we also use hints that may exist in the text, such as location or industry, as additional evidence.

For geography we’re utilizing elements of DBPedia and other public data assets to dive in and figure out which Calais or Paris or wherever the text is really talking about. We base this disambiguation not just on the name itself – but hints in the surrounding text (for example longhorns are seldom discussed in the same article as Paris, France – but Paris, Texas is another story). To jumpstart mapping applications we also return the geo coordinates of the geography we’ve detected.
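To give a feel for the general idea (and only the idea), here is a deliberately tiny Python sketch of alias resolution plus hint-based geography selection. It is not how Calais works internally: the alias table, the gazetteer, the hint words and the function names are all invented for illustration, and a real system would rely on far larger reference data and richer evidence.

```python
import re

# Toy alias table; a real reference database would hold millions of entries.
ALIASES = {
    "ibm": "IBM",
    "ibm corp": "IBM",
    "international business machines": "IBM",
}

# Toy gazetteer: place name -> candidate locations with invented hint words.
GAZETTEER = {
    "paris": [
        ("Paris, France", {"france", "seine", "louvre"}),
        ("Paris, Texas", {"texas", "longhorns"}),
    ],
}


def normalise(name):
    """Lower-case, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def resolve_company(name):
    """Return the canonical entity for a known variant, else None."""
    return ALIASES.get(normalise(name))


def resolve_place(name, text):
    """Pick the candidate whose hint words overlap most with the text."""
    candidates = GAZETTEER.get(normalise(name), [])
    if not candidates:
        return None
    words = set(re.findall(r"\w+", text.lower()))
    # On a tie, max() keeps the first candidate listed in the gazetteer.
    return max(candidates, key=lambda candidate: len(candidate[1] & words))[0]
```

For example, `resolve_company("IBM Corp.")` and `resolve_company("International Business Machines")` both come back as the same entity, while `resolve_place` uses the surrounding words to choose between the two cities named Paris.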

Efficiency and Scalability

We’ve implemented a couple of changes to make life easier for our higher-volume users. First, you now have the option to tell Calais you do not want a copy of the original text returned to you. If your application doesn’t care about offsets of detected items in the text you might consider turning this option on to reduce your bandwidth utilization.

Second, Calais now supports HTTP traffic compression. Given that we’re dealing with text on the input and output sides of the transaction, this can dramatically reduce the size of your transaction, again reducing your bandwidth utilization.
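As a quick illustration of why compression pays off for text, the following Python sketch gzips a repetitive sample and compares sizes. It does not call Calais at all; the sample text is made up, and the header names shown are the standard HTTP ones, with the assumption (not stated in this post) that these are what a client would use to negotiate compression.

```python
import gzip

# Invented sample payload: plain, repetitive text compresses very well.
text = ("The quick brown fox jumps over the lazy dog. " * 40).encode("utf-8")

compressed = gzip.compress(text)
restored = gzip.decompress(compressed)

# Fraction of the original size that actually travels over the wire.
ratio = len(compressed) / len(text)

# Standard HTTP headers for negotiating compression (assumed, see above):
# Accept-Encoding asks for a compressed response, Content-Encoding marks
# a compressed request body.
headers = {
    "Accept-Encoding": "gzip",
    "Content-Encoding": "gzip",
}
```

Real articles are less repetitive than this sample, so the saving will be smaller, but natural-language text generally compresses well enough to make the option worth trying.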

New Output Formats and Integrations

Please take a look at the Release Notes for details on a number of small changes to the RDF, MicroFormats and Simple format outputs. We’ve also added a JSON output format that’s covered in more detail here.

Calais now also talks PopFly! Microsoft’s PopFly is an interesting mashup building platform with a visual development interface. You can now directly integrate Calais within your PopFly mashups. Our documentation for this capability is available here.

Getting Smarter

In keeping with prior releases, Calais is also getting smarter. We’ve added a number of new elements to the Calais vocabulary. These include PatentFiling, PatentIssuance, FDAPhase, PersonEmailAddress, PersonEmployment, new elements for PersonAttributes, and SecondaryIssuance. In addition to these elements, we have one particularly interesting one: PersonRelation. The PersonRelation entity extracts references to symmetric relationships between people in the areas of business, friends, academic, military service or politics. This is one you’ll have to play with to get an idea of – but here’s a simple example:

The text:

The two served together in combat, and McDonald said Odierno was an “absolute joy to work with”.

Would result in:

Person1:  Mark McDonald
Person2:  Ray Odierno
PersonRelationType: Military Service

That’s it for R3.1. Any questions, please feel free to post to the forums or drop us a note at questions@opencalais.com. I’ll be posting an update on what’s in the pipeline for R4 in the next few days – lots of interesting stuff is on the way.

Tales from the SIOC-o-sphere #8

October 1st, 2008

It’s time for another installment from the world of SIOC!

Previous SIOC-o-sphere articles:

#7 http://sioc-project.org/node/328
#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271
#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79

If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” to add items.

Thomson Reuters Sends Zotero a $10 Million EndNote

September 28th, 2008

George Mason University is being sued by Thomson Reuters to prevent the distribution of the excellent Firefox plugin, Zotero.  As reported via the Courthouse News Service:

Thomson Reuters demands $10 million and an injunction to stop George Mason University from distributing its new Web browser application, Zotero software, an open-source format that allows users to convert Reuters’ EndNote Software. Reuters claims George Mason is violating its license agreement and destroying the EndNote customer base.

Subject of a Talking with Talis podcast last year with Trevor Owens, Zotero is an impressive free open source tool for capturing, organising and citing research resources, that has been building a successful community of users around it.

Thomson Reuters is complaining about the 1.5 preview release of Zotero, announced on July 8th, which introduces several new features including:

Support for thousands of existing Endnote® export styles.

Following that link to Endnote export styles you end up on a page containing the following words:

EndNote output styles are provided solely for use by licensed owners of EndNote and with the EndNote product.

That seems to be the bit that is behind the legal action taken.  The question is can they, or should they, enforce such a restriction – not being a legal expert I’ll stop ruminating further in that direction.

The folks in the Center for History and New Media at George Mason must be wondering what has hit them, but you can’t go rattling the current business model of someone the size, history and market position of Thomson Reuters without expecting some form of backlash.

I can imagine the cries of outrage that will emanate from the Open Source and Open Data communities because of this.  They will no doubt be matched by indignation and litigious thoughts from the commercial sector as other publishers check to see how Zotero is helping to distribute their output but not necessarily in a way they would like.

It’s ironic then that somewhere else in the Thomson Reuters organisation there is a site/service with the following ambition:

We want to make all the world’s content more accessible, interoperable and valuable. Some call it Web 2.0, Web 3.0, the Semantic Web or the Giant Global Graph - we call our piece of it Calais.

Calais (Powered by Thomson Reuters) is a semantic web technology based project which in simple terms provides an API to information about people, organisations, geographies, books, authors, events, facts about them, and links between them.  It is a free API service that can be used openly, for commercial and non-commercial use, to enrich applications.  (For an insight in to Calais and how it fits with Reuters’ business, I can recommend the podcast Paul Miller recorded with Barak Pridor of ClearForest, the technology with which Calais has been built).

The action being taken against Zotero is symptomatic of the classic growing pains as technology and distribution mechanisms move on.  It is just like the scribes complaining about movable type in the 1400s, or the music industry complaining about the mp3 download culture that emerged some 600 years later.

I predict that this will be only one skirmish in a series of battles that will ensue as the information and knowledge publishing and distribution industry morphs into something new.  Will actions like this prevent it happening? - of course not.  Will it slow it down? – possibly.   If I was part of the Zotero project would I be worried? – yes, I might be;  some of the early vanguards of the music download revolution were forced out of the race by such legal challenges.  Nevertheless, be it the opening of access to newly created knowledge or providing useful open access to traditionally controlled data, things are a-changing.  We will look back on actions like the one against Zotero, viewing them as inevitable battles to try to preserve rapidly outdating business models – anybody read the Innovator’s Dilemma recently?

I hope  that the Zotero folks survive to reap the rewards of their pioneering efforts.


Released today: SemanticProxy.com

September 23rd, 2008

We released Calais a little less than nine months ago. It’s been a fascinating process and an edifying period.

On the one hand we’ve seen a level of interest and adoption well beyond anything we’d anticipated: 6,000 registered developers. Well over 1,000,000 transactions per day. Dozens of creative and inspirational applications. It’s been great.

On the other, we have been reminded that semantically enabling the web is primarily a challenge of critical mass. Publishers are waiting for semantic consumers (search engines, news aggregators and applications) before they work on adding semantic metadata to their content.  Meanwhile application developers are waiting for the publishers to act.

We know we’ll get there in the end – but it’s slower than we’d like to see.

SemanticProxy is our attempt to jumpstart the semantic consumer end of the equation. We have all the standards we need.  What we’re missing is a critical mass of semantically enhanced content.

SemanticProxy doesn’t solve that problem, but it can act as a catalyst.

SemanticProxy makes any web site – particularly news sites – behave like a semantically enabled web site. Instead of making you write the programs to fetch a page, clean the HTML, process it with Calais and then get the resulting RDF, SemanticProxy does the heavy lifting.

You hand it a URL, and SemanticProxy hands back rich semantic metadata as RDF or MicroFormats.

SemanticProxy follows the standards for publishing linked data on the web – a good overview of which can be found here.

The best way to experiment with it is to get a Calais API key and use the URL Builder. Copy the resulting URL, paste it into your browser and you’ll see the results.

While doing that will show you what’s going on, SemanticProxy is really meant for machines to talk to. You could construct a simple web crawler that fetches the semantic content of each page. You could build a browser plugin that exposes the underlying semantic content of a page while you’re browsing. We’re looking forward to seeing your ideas.
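As a rough illustration of the "simple web crawler" idea, here is a minimal Python sketch. The endpoint layout (base address, then API key, then output format, then the encoded page URL) and the `rdf` format name are assumptions made for this example, not details given in this post; the URL Builder remains the authoritative way to obtain a working request URL. The names `build_proxy_url`, `http_get` and `crawl` are hypothetical helpers.

```python
from typing import Callable, Dict, Iterable
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTION: the endpoint layout below is illustrative only.
# Use the URL Builder to get the real request format for your key.
PROXY_BASE = "http://service.semanticproxy.com/processurl"


def build_proxy_url(api_key: str, page_url: str,
                    output: str = "rdf", base: str = PROXY_BASE) -> str:
    """Compose a request URL: base/key/format/percent-encoded page URL."""
    # safe="" also encodes ':' and '/', so the target URL is one path segment.
    return f"{base}/{api_key}/{output}/{quote(page_url, safe='')}"


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(api_key: str, page_urls: Iterable[str],
          fetch: Callable[[str], str] = http_get) -> Dict[str, str]:
    """Return a mapping of page URL to the metadata returned for it."""
    results: Dict[str, str] = {}
    for page_url in page_urls:
        results[page_url] = fetch(build_proxy_url(api_key, page_url))
    return results
```

Passing `fetch` as a parameter keeps the loop easy to try out without network access; a real crawler would also want rate limiting and error handling, which are left out here.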

It’s still in beta. We’ve optimized it for the top 30 English language news sites – but it also works quite well on Wikipedia and other sites. Go forth and experiment.

We’ve designed SemanticProxy to scale well with demand. It runs almost entirely in the cloud.  It can handle tens of millions of transactions a day right now, and it can scale to hundreds of millions whenever we need to.

Visit SemanticProxy.com and let us know what you think. We’d appreciate feedback, ideas and critiques.

 


LibraryThing’s Million Cover Giveaway