Archive

Posts Tagged ‘HTML’

The Day after Freebase went RDF

October 30th, 2008

So what’s been happening in the blogosphere after John Giannandrea’s keynote at ISWC and the revelation that Freebase now produces Linked Data from an RDF service?

Tetherless World sums up the Freebase facts (e.g. 156,000,000 assertions made; 1370 published types; 75 domains; graph model, identity, web based) and further points out that ontology creation “is a social process, and both freebase and semantic wiki are tools that enable users to create ontological vocabulary without worrying too much on building a comprehensive ontology.”

Inkdroid notes that the RDF service release “is important news because Freebase is an active community of content creators, creating rich data-centric descriptions with a wiki style interface, fancy data loaders, and useful machine APIs.” This is followed by a quick and handy tutorial on how to get machine-readable data back from Freebase by requesting a Freebase URI. Conclusion:

So why is this important? Because following your nose in HTML is what enabled companies like Lycos, AltaVista, Yahoo and Google to be born. It allowed for agents to be able to crawl the web of documents and build indexes of the data to allow people to find what they want (hopefully). Being able to link data in this way allows us to harvest data assets across organizational boundaries and merge them together. It’s early days still, but seeing an organization like Freebase get it is pretty exciting.

Yves Raimond was the first to wonder on the public W3C LOD mailing list: “now, to see whether it links to other datasets :-)” - the idea of having linked data without the linkage would indeed seem like love’s labour lost. Semantic Focus / James Simmons seconds: “One downside is the data doesn’t appear to link to external resources, in a sense walling itself in. It should be trivial to link the topics that came from Wikipedia back to Wikipedia as well as DBpedia (which would be killer, by the way).” This is followed up in a later post, where James expresses concerns regarding the relationship between DBpedia and Freebase: “Freebase may see a drop in userbase growth and participation if it becomes a mirror of DBpedia (or vice-versa) and the popularity once garnered by one project may shift towards the other, or away entirely.”

More News / Andrew Newman puts the Freebase RDF service release in context with the Billion Triples Challenge entries: Cathrin Weiss’ “250 million triples on your iPhone” submission, iMoCo, as well as DBpedia and SemaPlorer (developed at the University of Koblenz):

DBPedia stood out because it was the only one that allowed you to write data to the Semantic Web rather than just read the carefully prepared triples. For a similar reason I thought SemaPlorer was good because they tried to do more than just the standard triples but went that extra bit further by making it more generic like integrating flickr. But they were all excellent, all of them showing what you get with a billion or more triples and inferencing.

That combined with the guys at Freebase making all of their data available as RDF and it was a big day for the Semantic Web.

ARQtick / AndyS plays a bit with the Blade Runner example cited by Freebase: he takes a look at the graph, looks for interesting properties and extracts author names.

N.B. If you want to follow ARQtick’s example: use the Linked Data browser plugin Tabulator or go to the Marbles site to view the RDF - without a data browser you’ll be redirected to the HTML page. You will also need it to make sense of rdf.freebase.com.


Freebase Officially Linked Data with Release of RDF Service

October 29th, 2008

At ISWC2008 Freebase released its new RDF service for generating RDF representations of Freebase topics, allowing Freebase to be used as Linked Data! To obtain the RDF data for a topic send a GET request to http://rdf.freebase.com/rdf/some.topic.id where "some.topic.id" is replaced by the desired topic identifier (slashes in the identifier must be replaced by dots). Topic data can be represented as N3, RDF/XML or Turtle depending on the preferences expressed in your client's HTTP Accept header. Try it out with the Freebase topic Semantic Web.
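The URL rule above (slashes become dots) and the Accept header preference can be sketched in a few lines of Python. This is only an illustration: the topic id `/en/blade_runner` is an assumed example, and the media type strings are assumptions too, since the announcement does not list the exact values the service recognises. The request is prepared but never sent.

```python
import urllib.request

RDF_BASE = "http://rdf.freebase.com/rdf/"


def freebase_rdf_url(topic_id):
    """Turn a Freebase id such as '/en/blade_runner' into its RDF service URL."""
    # Drop the leading slash, then replace the remaining slashes with dots.
    dotted = topic_id.strip("/").replace("/", ".")
    return RDF_BASE + dotted


def build_rdf_request(topic_id, accept="application/rdf+xml"):
    """Prepare (but do not send) a GET request asking for one RDF syntax."""
    return urllib.request.Request(
        freebase_rdf_url(topic_id),
        headers={"Accept": accept},  # assumed media type, adjust as needed
    )


request = build_rdf_request("/en/blade_runner")
print(request.full_url)  # http://rdf.freebase.com/rdf/en.blade_runner
print(request.get_method(), request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual fetch, provided the service is reachable.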

You can also cater to clients that prefer HTML output by using the /ns end-point (http://rdf.freebase.com/ns/some.topic.id). The service performs the content negotiation automatically; delivering human-friendly HTML representations to Web browsers, and redirecting clients expecting RDF to the /rdf URL (via 302 redirect).
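To make the described behaviour of the /ns end-point concrete, here is a rough Python sketch of such a content negotiation decision. It is not Freebase's implementation: the function names, the set of RDF media types and the rule "redirect only when RDF is preferred over HTML" are all assumptions made for illustration.

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}  # assumed list
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def parse_accept(header):
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[parts[0].lower()] = q
    return prefs


def respond(path, accept_header):
    """Return (status, headers) for a request to /ns/<topic id>."""
    prefs = parse_accept(accept_header)
    best_rdf = max(prefs.get(t, 0.0) for t in RDF_TYPES)
    best_html = max(prefs.get(t, 0.0) for t in HTML_TYPES)
    if best_rdf > best_html:
        topic = path[len("/ns/"):] if path.startswith("/ns/") else path
        return 302, {"Location": "http://rdf.freebase.com/rdf/" + topic}
    return 200, {"Content-Type": "text/html"}


print(respond("/ns/en.blade_runner", "application/rdf+xml"))
print(respond("/ns/en.blade_runner", "text/html,application/xhtml+xml"))
```

A typical browser Accept header falls through to the HTML branch, while an RDF client gets the 302 pointing at the /rdf URL.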

One downside is the data doesn't appear to link to external resources, in a sense walling itself in. It should be trivial to link the topics that came from Wikipedia back to Wikipedia as well as DBpedia (which would be killer, by the way).


Interview for Journalism.co.uk… Journalists get to know the Semantic Web!

October 29th, 2008

I was interviewed last week by Colin Meek from Journalism.co.uk on the topic of “Web 3.0” and what it means for journalists… You can read the full article in two parts (1, 2). My original answers are part of an interview on their Insite blog. I also had the chance to talk about various DERI offerings in the Semantic Web area including SIOC, SWSE, Sindice, Semantic Radar, etc.

Colin also asked me about other readable data that is being crawled by Semantic Web search engines like Sindice, SWSE or Swoogle. These search engines can usually match keywords in any data that has been crawled or integrated into a semantic store, not just people. It could be from structured information about people, places, dates, library documents, blog items or topics, whatever. In fact, there is no limit to the types of things that can be indexed and searched - since RDF (an open data model that can be adapted to describe pretty much anything) is used as the data format. Anyone can reuse existing RDF vocabularies like SIOC to publish data, or they can publish data using their own custom vocabularies (e.g. to describe stamp collecting or Bollywood movie genres or whatever), or they can combine public and custom vocabularies (e.g. take FOAF and your own vocabulary about soccer to describe players and managers on a soccer team). Geotemporal information is particularly useful across a range of domains, and provides nice semantic linkages between things. For example, having geographic information and time information is useful for describing where people have been and when, for detailing historical events or TV shows, for timetabling and scheduling of events, etc., and for connecting all of these things together (“I’m travelling to Edinburgh next week: show me all the TV shows of relevance and any upcoming events I should be aware of according to my interests…”).

The keyword searches in the Sindice search engine allow you to find more information on where resources of interest are (searching for “john breslin” will point to all public pages that contain semantic information about yours truly). Sindice also has an API that can provide results in a reusable (semantic) format that can be leveraged by other applications. Alternatively, SWSE (Semantic Web Search Engine) shows you semantic information about the object of interest (e.g. my phone number, my friends, etc.) which may be derived from multiple sources (e.g. this information on me comes from tens of sources consolidated together via unique identifiers for me or through what’s called “object consolidation”).

For me, this article highlighted the fact that the Semantic Web community needs to be very aware that one of the key features of the Social Web, for journalists and for many others, is the ability to find a lot of personal and sensitive information on people. With the advent of “Web 3.0”, we need to realise (“with great power comes great responsibility”) that the availability of contextual and semantically-related information is going to become even more apparent, and people will talk about it in both positive and negative terms. Educating site owners about what semantic data they may be publishing (knowingly or unknowingly, even if it’s just RSS feeds) is needed, and developers should determine exactly what opt-in or opt-out mechanisms are required before implementing semantic solutions. Users also should be aware of the benefits and other potential uses of their semantic data.

I think now is the time to avert any scares, because in reality, the data that is on the Web or the Social Web can be used in new ways anyway, whether metadata is present or not (some facts can be derived). Google have recently implemented some discussion forum parsing algorithms to determine how many posts are on a thread, how many users posted on that thread and when the last post was made. You can see this in a search result I did for “irish pubs boards.ie” below. It’s not complete, and probably relies on identifying certain HTML structures for non-Google discussion sites, e.g. you can see two threads in the middle that don’t show details of the total posts or commenters. But it’s moving towards the SIOC vision of providing more metadata about discussions on the Web to help you in finding more relevant information - whether the site owners want to provide Semantic Web data or not!

Making data available semantically enables computers to help us do things we cannot easily do (or cannot do at all) right now, and this is what makes it so powerful. We also need to think more towards educating people about the benefits as well as how we can minimise any hazards. Is this a job for W3C SWEO? As my colleague James Cooley said: “I think scientists thought the benefits of GM food were so obvious that there was no case to make. Then you got Frankenstein Food and the game was up.”

For journalists interested in the Semantic Web, I’d recommend reading this paper entitled “SemWebbing the London Gazette” by Jeni Tennison and John Sheridan which describes how they have exposed information from their newspaper website using RDFa so that it becomes easy to re-use (slides here). You can also view some interesting slides by Colin Meek from a seminar he gave to journalists about the Social Web in Oslo a few days ago. It’s in three parts (1, 2, 3). I’ve embedded the third part (on the Semantic Web) below…


This Week’s Semantic Web

October 1st, 2008

Special Edition : SIOC Update

I had a man cold when I should have been doing my duty, but with no apologies (fairly safely assuming John has a CC-with-attribution kind of policy) here’s a good proxy :

It’s time for another installment from the world of SIOC!

Previous SIOC-o-sphere articles:

#7 http://sioc-project.org/node/328
#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271
#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79

If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” when you add items.


Tales from the SIOC-o-sphere #8

October 1st, 2008

It’s time for another installment from the world of SIOC!

Previous SIOC-o-sphere articles:

#7 http://sioc-project.org/node/328
#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271
#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79

If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” to add items.


Publishing Linked Data With PHP

September 30th, 2008

For a while now I’ve been experimenting with writing my own little PHP applications that run against the Talis Platform. Most of these have never been seen in public because they’re mainly just for scratching an itch I have at the time. I’ve also used a lot of them to validate my own thinking around the types of services that the platform needs to provide to build interesting applications. The core of most of those applications became Moriarty, my PHP library for accessing the platform. I use Moriarty extensively now to kick-start any development I do. I’m even using it to write PHP scripts for running at the command line. I’m not sure that PHP is going to usurp Perl from my toolbox, but it’s certainly becoming my language of choice for working with RDF.

I’ve been looking carefully at the core patterns that my PHP applications have been following to see if there’s anything else I could pull out. This is generally how I prefer to build new libraries: extracting them from several different projects. Assuming you know how your library is going to work before you’ve written any applications is almost always wrong. I like using libraries that have distilled the essence of repeated attempts at solving the same problem. That’s why I never think about modularization of a codebase until I need to.

I’ve been gravitating towards Konstrukt because it appears to be the least intrusive of the PHP web application frameworks out there and it keeps fairly true to REST principles. I used it to build Kniblet as part of a platform tutorial. However, there are some quirks that it has that I don’t like. For example, to return anything other than HTML requires you to throw an exception. That mechanism works quite well for most applications but doesn’t really suit data-rich applications that have multiple output formats.

It’s with this in mind that I’ve started a new PHP web application framework called Paget. Calling it a framework is somewhat of an overstatement. It’s a few classes that make it easy to publish RDF as linked data. It’s very primitive at the moment, but it’s quite versatile.

It uses a simple configuration array that is passed to a dispatcher that handles the request. The application’s default behaviour is specified using this configuration. One part sets up a series of regular expressions that match URI paths handled by the application and map them to the resources it provides. The data about each resource is obtained by using one or more “generators”. These are simply classes that generate RDF for the given resource. Paget runs each generator to gather the RDF data describing the resource and then handles the serving up of that data according to linked data principles. Right now that’s just enough behaviour to function as a generic linked data publishing framework.
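The shape of this design (a configuration that maps URI path patterns to generators, plus a dispatcher that runs them) can be sketched roughly as follows. Paget itself is written in PHP and its real class and option names are not shown in this post, so everything here is hypothetical: the `CONFIG` layout, the generator functions, the `http://example.org` base and the use of plain tuples for triples are illustrative assumptions, written in Python for brevity.

```python
import re

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"


def label_generator(uri):
    """Hypothetical generator: label the resource with its last path segment."""
    return [(uri, RDFS_LABEL, uri.rsplit("/", 1)[-1])]


def type_generator(uri):
    """Hypothetical generator: state that the URI names an RDFS resource."""
    return [(uri, RDF_TYPE, RDFS_RESOURCE)]


# Assumed configuration layout: path regex -> generators that describe it.
CONFIG = {
    "base": "http://example.org",
    "routes": [
        (r"^/id/[a-z0-9_-]+$", [label_generator, type_generator]),
    ],
}


def dispatch(path, config=CONFIG):
    """Return the merged triples for the first matching route, else None."""
    for pattern, generators in config["routes"]:
        if re.match(pattern, path):
            uri = config["base"] + path
            triples = []
            for generate in generators:
                triples.extend(generate(uri))  # gather RDF from each generator
            return triples
    return None  # a real dispatcher would answer with a 404 here


for triple in dispatch("/id/me"):
    print(triple)
```

Serialising the gathered triples as HTML or as one of the RDF syntaxes, depending on the request, would be the next step after `dispatch`.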

I have three different deployments of Paget that are publishing three RDF data sets using different generators. Each of these was quite trivial to set up, being a few lines of configuration. For my own site’s data space I wrote a generator that fetched RDF directly from one of my platform stores (this one) and served it up as HTML and various flavours of RDF. See, for example, http://iandavis.com/id/me which is the URI that identifies me.

My second deployment was for PlaceTime, a URI space that I have operated since 2003. It provides RDF data for timelike entities like instants and intervals and spacelike points. However, it hasn’t been fully linked data compliant (mainly because it pre-dated the decision on httpRange-14). I wrote a generator for each type of entity that creates trivial RDF for each valid URI in the space. Some examples:

Finally, I created a generator that reads a local RDF file. I then used it to serve up the whisky vocabulary that Tom, several others and I created at the recent VoCamp Oxford.

Admittedly, all these datasets and spaces look pretty similar but this is still early days for Paget. I have some ideas for future development that will flesh out Paget into a fully-fledged RDF driven application framework. For example: as well as generators I plan to add filters, augmenters and transformers that alter the generated data in arbitrary fashions. These could be used to trim the data down, or to convert it to a more usable structure. I can imagine that it would be very useful to be able to pull in more RDF from arbitrary locations on the Web to supplement data in the initial set, e.g. with schema information or additional details. In my opinion that’s one of the significant differences between the web of data and the web of documents: the web of data is going to enable more information to be brought automatically together for the user rather than forcing them to seek it out.

Paget’s HTML rendering of RDF is very primitive at the moment, making only basic attempts to make it human readable. It’s still extremely tabular which is hardly a great use of structured information. One area that I’ve been interested in exploring is that of dynamic user interfaces that adapt to the underlying data automatically. RDF is particularly amenable to building these kinds of interfaces because of its uniform data model. A lot of work on this was done by the Fresnel project and it would be interesting to apply some of the learnings from that project to building dynamic web applications. My goal here is to code as little specific behaviour into the application as possible, instead making the application detect patterns in the data and provide suitable user interface behaviours at runtime. This is really the only way we’re going to be able to build true open world applications, i.e. those that are tolerant of missing data and can adapt to new and unanticipated data.

What I’m still experimenting with is whether these user interface additions should be server-side or passed on to the client. Some of the augmentations could make more sense when actioned by the client based on user activity.

There’s lots to research here and hopefully some of these ideas will make it into Paget very soon.


Why Faviki is able to suggest tags in 13 languages

September 26th, 2008

Just got in touch with Vuk Miličić from Faviki recently - Faviki has been selected as a featured project on Google code, and in that context, Vuk describes the process of how Faviki retrieves its suggestions in a little more detail. It’s really interesting! It also sheds more light on the way that DBpedia is used in Faviki: Not immediately for the retrieval of tags, but for the translation of tags - long live the smartness of linked data!

  1. Faviki fetches a web page and extracts the core text (without HTML and non-relevant content).
  2. Then it tries to figure out whether the content is in English. If it isn’t, it is sent to the Google language API, which detects the original language automatically, translates it into English and returns the translation.
  3. The content is then sent to and analyzed by the Zemanta API, which finds relevant links. Faviki uses links from English Wikipedia - titles are used as semantic tags.
  4. If the user’s language is not English, the tags must be translated. Using the DBpedia datasets “Links to Wikipedia Article”, we can find the names of Wikipedia’s titles in one of 13 languages. These datasets actually contain the connections between English Wikipedia articles and articles from Wikipedia in other languages.
  5. Finally, suggested tags are offered to the user.
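Step 4 of the list above boils down to a lookup from English Wikipedia titles to their counterparts in other languages. The Python sketch below illustrates that idea only: the `INTERLANGUAGE_LINKS` dictionary is a tiny hand-written stand-in for the DBpedia dataset, and falling back to the English title when no translation is known is an assumption, not something Vuk's description states.

```python
# Illustrative stand-in for the DBpedia inter-language link data:
# English Wikipedia title -> {language code: title in that language}.
INTERLANGUAGE_LINKS = {
    "Semantic Web": {"es": "Web semántica", "fr": "Web sémantique"},
}


def translate_tags(english_tags, language, links=INTERLANGUAGE_LINKS):
    """Translate English tag titles, keeping the English title as a fallback."""
    if language == "en":
        return list(english_tags)
    return [links.get(tag, {}).get(language, tag) for tag in english_tags]


print(translate_tags(["Semantic Web", "Freebase"], "es"))
# ['Web semántica', 'Freebase']
```

Because the tags are Wikipedia titles rather than free text, the translation is a plain dictionary lookup and needs no machine translation at this stage.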

Read the whole blog post on Vuk’s Faviki blog


Released today: SemanticProxy.com

September 23rd, 2008

We released Calais a little less than nine months ago. It’s been a fascinating process and an edifying period.

On the one hand we’ve seen a level of interest and adoption well beyond anything we’d anticipated: 6,000 registered developers. Well over 1,000,000 transactions per day. Dozens of creative and inspirational applications. It’s been great.

On the other, we have been reminded that semantically enabling the web is primarily a challenge of critical mass. Publishers are waiting for semantic consumers (search engines, news aggregators and applications) before they work on adding semantic metadata to their content.  Meanwhile application developers are waiting for the publishers to act.

We know we’ll get there in the end – but it’s slower than we’d like to see.

SemanticProxy is our attempt to jumpstart the semantic consumer end of the equation. We have all the standards we need.  What we’re missing is a critical mass of semantically enhanced content.

SemanticProxy doesn’t solve that problem, but it can act as a catalyst.

SemanticProxy makes any web site – particularly news sites – behave like a semantically enabled web site. Instead of making you write the programs to fetch a page, clean the HTML, process it with Calais and then get the resulting RDF, SemanticProxy does the heavy lifting.

You hand it a URL, and SemanticProxy hands back rich semantic metadata as RDF or MicroFormats.

SemanticProxy follows the standards for publishing linked data on the web – a good overview of which can be found here.

The best way to experiment with it is to get a Calais API key and use the URL Builder. Copy the resulting URL, paste it into your browser and you’ll see the results.

While doing that will show you what’s going on, SemanticProxy is really meant for machines to talk to. You could construct a simple web crawler that fetches the semantic content of each page. You could build a browser plugin that exposes the underlying semantic content of a page while you’re browsing. We’re looking forward to seeing your ideas.

It’s still in beta. We’ve optimized it for the top 30 English language news sites – but it works quite well on Wikipedia and other sites as well. Go forth and experiment.

We’ve designed SemanticProxy to scale well with demand. It runs almost entirely in the cloud.  It can handle tens of millions of transactions a day right now, and it can scale to hundreds of millions whenever we need to.

Visit SemanticProxy.com and let us know what you think. We’d appreciate feedback, ideas and critiques.

 


Twine massive CCK08 invitations

September 13th, 2008

I am a CCK08 student and an active user of Twine. A few days ago I announced the creation of a Twine on Connectivism in the course forum, offering everyone invitations to test it.

Other participants have proposed sending mass invitations to CCK08 participants. Although I thought that Nova Spivack, Radar Networks’ CEO, would show some reluctance (there are more than a thousand of us), he wrote to me today agreeing to provide us not only with as many invitations as we need but also with a contact who will help us organize the mass mailing.

I believe that it would be positive to create a semantic knowledge base on Connectivism.

So, if you want to test Twine, you can leave a comment on this post, contact me or send an email, and you’ll be invited to join as soon as possible. I’ve created a Twine about Connectivism and questions related to this course. Once you are on Twine, you’re invited to join it.

Thanks in advance to George Siemens and Stephen Downes for their collaboration.
 

English Resources: 
-Twine Tutorials.
-Screencasts.


Google Chrome prefers XHTML

September 2nd, 2008

The blogosphere is restless about Chrome, the new open source browser developed by Google. But I’m not going to discuss its software design, its performance or its usability; plenty of people are already talking about those. I’ll talk about a technical detail: it prefers XHTML over classic HTML.

How can we tell? As you probably know, HTTP has a header called Accept that specifies which format types are acceptable. Requesting this service developed by Richard Cyganiak with Chrome, we get the value of that header:

text/xml,application/xml,application/xhtml+xml,
text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
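To see why this header expresses a preference for XHTML, the q-values can be read out with a short Python sketch. It is a simplification: it only compares q-values (a missing q counts as 1.0) and ignores the more detailed specificity rules of HTTP content negotiation; the function name is made up for this example.

```python
CHROME_ACCEPT = (
    "text/xml,application/xml,application/xhtml+xml,"
    "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
)


def ranked_media_types(accept_header):
    """Return (media type, q) pairs sorted by descending q-value."""
    entries = []
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        q = 1.0  # a media type without an explicit q-value has q=1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q" and value:
                q = float(value)
        entries.append((media_type.strip(), q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


for media_type, q in ranked_media_types(CHROME_ACCEPT):
    print(f"{q:.1f}  {media_type}")
```

In the result, application/xhtml+xml carries the default q of 1.0 while text/html is listed with q=0.9, which is the preference the post describes.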


This Week’s Semantic Web

September 1st, 2008

Selected links related to Semantic Web technologies for the week ending 2008-09-01, all weeks. Also available in RDF as linked data or via GRDDL.

Summer Special!

(or Winter Special! down under)

FOAFlets of the Caribbean

No blurb, just links.

In the Media

Docs

Software News

Vocabs/Ontologies