Archive
The Day after Freebase went RDF
So what’s been happening in the blogosphere after John Giannandrea’s keynote at ISWC and the revelation that Freebase now produces Linked Data from an RDF service…
Tetherless World sums up the Freebase facts (e.g. 156,000,000 assertions made; 1370 published types; 75 domains; graph model, identity, web based) and further points out that ontology creation “is a social process, and both freebase and semantic wiki are tools that enable users to create ontological vocabulary without worrying too much on building a comprehensive ontology.”
Inkdroid notes that the RDF service release “is important news because Freebase is an active community of content creators, creating rich data-centric descriptions with a wiki style interface, fancy data loaders, and useful machine APIs.” This is followed by a quick and handy tutorial on how to get machine-readable data back from Freebase using a Freebase URI. Conclusion:
So why is this important? Because following your nose in HTML is what enabled companies like Lycos, AltaVista, Yahoo and Google to be born. It allowed for agents to be able to crawl the web of documents and build indexes of the data to allow people to find what they want (hopefully). Being able to link data in this way allows us to harvest data assets across organizational boundaries and merge them together. It’s early days still, but seeing an organization like Freebase get it is pretty exciting.
Yves Raimond was the first to wonder on the public W3C LOD mailing list: “now, to see whether it links to other datasets :-)” - the idea of having linked data without the linkage would indeed seem like love’s labour’s lost. Semantic Focus / James Simmons seconds: “One downside is the data doesn’t appear to link to external resources, in a sense walling itself in. It should be trivial to link the topics that came from Wikipedia back to Wikipedia as well as DBpedia (which would be killer, by the way).” This is followed up in a later post, where James expresses concerns about the relationship between DBpedia and Freebase: “Freebase may see a drop in userbase growth and participation if it becomes a mirror of DBpedia (or vice-versa) and the popularity once garnered by one project may shift towards the other, or away entirely.”
More News / Andrew Newman puts the Freebase RDF service release in context with Cathrin Weiss’ “250 million triples on your iphone” submission, iMoCo, to the Billion Triples Challenge, as well as DBpedia and SemaPlorer (the latter developed at the University of Koblenz):
DBPedia stood out because it was the only one that allowed you to write data to the Semantic Web rather than just read the carefully prepared triples. For a similar reason I though SemaPlorer was good because they tried to do more than just the standard triples but went that extra bit further by making it more generic like integrating flickr. But they were all excellent, all of them showing what you get with a billion or more triples and inferencing.
That combined with the guys at Freebase making all of their data available as RDF and it was a big day for the Semantic Web.
ARQtick / AndyS plays a bit with the Blade Runner example cited by Freebase, e.g. he takes a look at the graph, looks for interesting properties and extracts author names.
N.B. If you want to follow ARQtick’s example: use the Linked Data browser plugin Tabulator or go to the Marbles site to view the RDF - without a data browser you’ll be redirected to the HTML page. You will also need it to make sense of rdf.freebase.com.
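The redirect to the HTML page is presumably the result of content negotiation: the server picks a representation based on what the client says it accepts. The following is a minimal sketch of asking for RDF/XML explicitly from a script. The topic URI is an assumption made for illustration (only the rdf.freebase.com host appears above), and the request is built but not sent, since the service may not answer.

```python
import urllib.request

# Hypothetical topic URI: the path below is an assumption for illustration,
# only the rdf.freebase.com host is taken from the post.
TOPIC_URI = "http://rdf.freebase.com/ns/en.blade_runner"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request whose Accept header asks for RDF/XML, not HTML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


request = build_rdf_request(TOPIC_URI)

# Sending the request is left commented out on purpose:
# with urllib.request.urlopen(request) as response:
#     rdf_xml = response.read().decode("utf-8")
```

A data browser such as Tabulator does the equivalent of this on your behalf.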
Utility computing in the Cloud
It is usually more interesting and educational to see a good heart-felt debate than complete agreement so you are in for a treat if you take the time to read the following from Nick Carr, Tim O’Reilly and the Smoothspan blog.
You can see from the debate that economics is at the heart of the discussion yet not understood in the same way by the three. I find myself pretty much in agreement with Tim, but it might be worth pulling out some of the strands to clarify. I think there is real confusion between economies of scale, direct and indirect network effects.
In this post I will focus on the utility computing layer in the cloud. I think the economics of platform as a service (PAAS), especially the crucial distinction between direct versus indirect network effects for defensibility, needs its own post.
It’s pretty clear that utility cloud computing is highly capital intensive so it should come as no surprise that there are powerful economies of scale to be had. But the bottom line is that you are talking about plant and power. These are rival goods, scarce resources that are created and consumed. This is not different from many utility industries with one exception: the distribution network has global reach, already exists and is very cheap compared to existing utility distribution networks. It is a lot cheaper to access a computing resource on the other side of the planet than it is to send electricity or gas across the globe. So maybe Hugh McLeod is right. What is to stop economies of scale turning this into a global natural monopoly?
Actually, unless there are some large network effects, quite a lot stops single companies ruling entire industries. For a start, without network effects, economies of scale tend to run out: the curve is usually U-shaped (take a look at http://en.wikipedia.org/wiki/Economies_of_scale). Telecoms, gas and rail companies have strong network effects from their infrastructure: it makes little sense to have duplicate rail networks or gas networks in a country. Utility computing does not have this advantage because the providers do not own the distribution network.
Smoothspan argues there are two potential network effects that could cause a single winner.
1) Lower costs of data exchange between apps in the same cloud
2) Elasticity
There is a network effect based on increased costs for cross-cloud interoperability, exactly as we have with mobile phone networks today. I don’t think this is a significant, long-term issue because we are talking about a relatively small number of cloud providers thanks to capital costs. Ironically, that means the cost of providing massive high speed bandwidth BETWEEN different cloud providers is actually very small; especially when compared with the cost of providing large bandwidth to every single home and mobile phone in the world. And, of course, the backbone telecoms providers are already geared up to provide exactly this kind of point to point, high capacity infrastructure.
If a cloud provider artificially inflated their cross cloud costs, they would directly cut the available data-sharing applications for a customer and would suffer a big negative network effect compared to providers that ensured their cloud was as open to cross cloud use as possible. Would you choose the walled garden?
Regarding the second point: I think Smoothspan is confusing economies of scale with network effects. A larger provider can more easily deal with variation of demand, but this is an economy of scale (the cost of serving variable demand of size X for a customer is lower for a bigger player) and in fact is a negative network effect; just like your Internet connection at home. If every other customer stopped using the service there would be more capacity available for you. If everyone is using the service there is less capacity available: a negative network effect. Just as with the power grid, variation of demand is more easily managed with multiple providers that can be called on when required. In the single supplier model, the supplier has no one to share demand peaks with and must over-provide capacity far in excess of a shared model.
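To make the pooling argument concrete, here is a small sketch comparing the capacity needed when every customer is provisioned for their own peak with the capacity needed when demand is pooled. The demand figures are simulated and purely illustrative; independent, uniformly distributed hourly demand is an assumption, not a claim about real workloads.

```python
import random


def simulate_demand(customers: int, hours: int, seed: int = 0) -> list[list[float]]:
    """Return one hourly demand series per customer (illustrative random data)."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, 10.0) for _ in range(hours)] for _ in range(customers)]


demand = simulate_demand(customers=50, hours=24 * 7)

# Capacity if each customer is provisioned separately for their own peak.
sum_of_peaks = sum(max(series) for series in demand)

# Capacity if all demand is pooled: total load per hour, then the peak of that.
pooled_load = [sum(hour) for hour in zip(*demand)]
pooled_peak = max(pooled_load)

print(f"sum of individual peaks: {sum_of_peaks:.1f}")
print(f"peak of pooled demand:   {pooled_peak:.1f}")
```

The pooled peak can never exceed the sum of the individual peaks, and it is usually well below it because customers rarely peak in the same hour. That saving comes from scale and sharing, not from any one customer benefiting because another customer joined.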
For me the bottom line on utility computing is that it is very much like the provision of telecoms and power but without the network effect of owning the network. I would not be surprised to see backwards integration along the supply chain in this industry (i.e. a power generator and a bulk telecoms provider might have the infrastructure and capital structure to build data centres more cost effectively than Google, Amazon or MS as the market matures).
This market is nowhere near mature. I expect that Google, Amazon and MS are still their own biggest cloud customers.
With the rise of utility computing in the cloud, it will soon become very easy to create a PAAS offering because the utility computing provider absorbs the large fixed costs and rents the infrastructure to the PAAS provider on an incremental marginal cost basis. This is very similar to the virtual mobile network operators (like Virgin) which ride on the back of the network providers. The difference here is that the PAAS has the chance to create powerful network effects.
So to summarise, utility cloud computing is firmly built on economies of scale, whereas I think cloud based platforms (PAAS) need to be firmly built on the economics of network effects to be defensible. An interesting battle ground for PAAS seems to be centred around the difference between software centric and data centric network effects, but more on that in a later post.
Interview for Journalism.co.uk… Journalists get to know the Semantic Web!
I was interviewed last week by Colin Meek from Journalism.co.uk on the topic of “Web 3.0” and what it means for journalists… You can read the full article in two parts (1, 2). My original answers are part of an interview on their Insite blog. I also had the chance to talk about various DERI offerings in the Semantic Web area including SIOC, SWSE, Sindice, Semantic Radar, etc.
Colin also asked me about other readable data that is being crawled by Semantic Web search engines like Sindice, SWSE or Swoogle. These search engines can usually match keywords in any data that has been crawled or integrated into a semantic store, not just people. It could be from structured information about people, places, dates, library documents, blog items or topics, whatever. In fact, there is no limit to the types of things that can be indexed and searched - since RDF (an open data model that can be adapted to describe pretty much anything) is used as the data format. Anyone can reuse existing RDF vocabularies like SIOC to publish data, or they can publish data using their own custom vocabularies (e.g. to describe stamp collecting or Bollywood movie genres or whatever), or they can combine public and custom vocabularies (e.g. take FOAF and your own vocabulary about soccer to describe players and managers on a soccer team). Geotemporal information is particularly useful across a range of domains, and provides nice semantic linkages between things. For example, having geographic information and time information is useful for describing where people have been and when, for detailing historical events or TV shows, for timetabling and scheduling of events, etc., and for connecting all of these things together (“I’m travelling to Edinburgh next week: show me all the TV shows of relevance and any upcoming events I should be aware of according to my interests…”).
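As a rough illustration of combining a public vocabulary with a custom one, the sketch below writes a few statements about a soccer team as N-Triples. The FOAF namespace is the real one; the soccer vocabulary, its property names and all the example.org resources are made up for this example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical custom vocabulary: these terms are invented for illustration.
SOCCER = "http://example.org/soccer#"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


player = iri("http://example.org/people/player1")
manager = iri("http://example.org/people/manager1")
team = iri("http://example.org/teams/team1")

triples = [
    (player, iri(RDF_TYPE), iri(FOAF + "Person")),          # public vocabulary
    (player, iri(FOAF + "name"), literal("Example Player")),
    (manager, iri(FOAF + "name"), literal("Example Manager")),
    (player, iri(SOCCER + "playsFor"), team),                # custom vocabulary
    (team, iri(SOCCER + "managedBy"), manager),
]


def to_ntriples(statements) -> str:
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


output = to_ntriples(triples)
print(output)
```

A crawler that only understands FOAF can still pick up the names, while one that also knows the soccer vocabulary gets the team relationships as well.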
The keyword searches in the Sindice search engine allow you to find more information on where resources of interest are (searching for “john breslin” will point to all public pages that contain semantic information about yours truly). Sindice also has an API that can provide results in a reusable (semantic) format that can be leveraged by other applications. Alternatively, SWSE (Semantic Web Search Engine) shows you semantic information about the object of interest (e.g. my phone number, my friends, etc.) which may be derived from multiple sources (e.g. this information on me comes from tens of sources consolidated together via unique identifiers for me or through what’s called “object consolidation”).
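The idea of consolidating information from several sources via a unique identifier can be sketched as follows. This is not how SWSE is implemented; it is a toy version that assumes every record carries the same identifying property (here the FOAF `mbox_sha1sum` value) and simply merges whatever else each source says. The records and URLs are invented.

```python
from collections import defaultdict


def consolidate(records: list[dict], key: str = "mbox_sha1sum") -> dict:
    """Merge records that share the same identifying value into one description."""
    merged: dict = defaultdict(lambda: defaultdict(set))
    for record in records:
        identifier = record[key]
        for prop, value in record.items():
            if prop != key:
                merged[identifier][prop].add(value)
    # Convert the sets to sorted lists so the result is stable and printable.
    return {
        identifier: {prop: sorted(values) for prop, values in props.items()}
        for identifier, props in merged.items()
    }


# Invented example data: three sources, two of them about the same person.
records = [
    {"mbox_sha1sum": "abc123", "source": "http://example.org/a", "name": "Jane Example"},
    {"mbox_sha1sum": "abc123", "source": "http://example.org/b", "phone": "tel:+000"},
    {"mbox_sha1sum": "def456", "source": "http://example.org/c", "name": "Someone Else"},
]

people = consolidate(records)
print(people["abc123"])
```

After merging, the name from one source and the phone number from another sit in a single description, together with the list of sources they came from.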
For me, this article highlighted the fact that the Semantic Web community needs to be very aware that one of the key features of the Social Web for journalists and for many others is the ability to find a lot of personal and sensitive information on people. With the advent of “Web 3.0”, we need to realise (“with great power comes great responsibility”) that the availability of contextual and semantically-related information is going to become even more apparent, and people will talk about it in both positive and negative terms. Educating site owners about what semantic data they may be publishing (knowingly or unknowingly, even if it’s just RSS feeds) is needed, and developers should determine exactly what opt-in or opt-out mechanisms are required before implementing semantic solutions. Users also should be aware of the benefits and other potential uses of their semantic data.
I think now is the time to avert any scares, because in reality, the data that is on the Web or the Social Web can be used in new ways anyway, whether metadata is present or not (some facts can be derived). Google have recently implemented some discussion forum parsing algorithms to determine how many posts are on a thread, how many users posted on that thread and when the last post was made. You can see this in a search result I did for “irish pubs boards.ie” below. It’s not complete, and probably relies on identifying certain HTML structures for non-Google discussion sites, e.g. you can see two threads in the middle that don’t show details of the total posts or commenters. But it’s moving towards the SIOC vision of providing more metadata about discussions on the Web to help you in finding more relevant information - whether the site owners want to provide Semantic Web data or not!
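To show how such facts can be derived from plain HTML without any metadata, here is a small sketch that counts posts, distinct posters and the latest post date on a thread page. It is not Google’s algorithm, which is not public; the markup (one `span class="author"` and one `span class="date"` per post) is a made-up structure standing in for whatever patterns a real forum uses.

```python
from html.parser import HTMLParser


class ThreadStats(HTMLParser):
    """Collect author names and post dates from a hypothetical forum markup."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self.dates = []
        self._field = None  # which kind of span we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("author", "date"):
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field == "author":
            self.authors.append(data.strip())
        elif self._field == "date":
            self.dates.append(data.strip())


# Invented sample page with three posts by two users.
SAMPLE_THREAD = """
<div class="post"><span class="author">alice</span><span class="date">2008-10-20</span><p>Any tips?</p></div>
<div class="post"><span class="author">bob</span><span class="date">2008-10-21</span><p>Try the one on the quay.</p></div>
<div class="post"><span class="author">alice</span><span class="date">2008-10-23</span><p>Thanks!</p></div>
"""

parser = ThreadStats()
parser.feed(SAMPLE_THREAD)

post_count = len(parser.authors)       # one author span per post
user_count = len(set(parser.authors))  # distinct posters
last_post = max(parser.dates)          # ISO dates sort correctly as strings

print(post_count, user_count, last_post)
```

The fragility is obvious: change the class names and the parser finds nothing, which is exactly why results are incomplete for sites whose structure is not recognised.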
Making data available semantically enables computers to help us do things we cannot easily do (or cannot do at all) right now, and this is what makes it so powerful. We also need to think more towards educating people about the benefits as well as how we can minimise any hazards. Is this a job for W3C SWEO? As my colleague James Cooley said: “I think scientists thought the benefits of GM food were so obvious that there was no case to make. Then you got Frankenstein Food and the game was up.”
For journalists interested in the Semantic Web, I’d recommend reading this paper entitled “SemWebbing the London Gazette” by Jeni Tennison and John Sheridan which describes how they have exposed information from their newspaper website using RDFa so that it becomes easy to re-use (slides here). You can also view some interesting slides by Colin Meek from a seminar he gave to journalists about the Social Web in Oslo a few days ago. It’s in three parts (1, 2, 3). I’ve embedded the third part (on the Semantic Web) below…
Other posts referencing this article:
- Sébastien Wiertz: “The Semantic Web Today: An Interview with John Breslin”
- Society of Professional Journalists: “Press Notes: Friday, October 24, 2008”
- The News About the News: “New Tool for Journalists is Unveiled”
- Jornalismo e Comunicação: “Web 3.0”
- Aubergine Cafe: “Semantic Web”
- Kristine Lowe: “Using the Social Web, Oslo 25/10: Live Notes”
- Thoughts of Nigel: “Herald Web 3.0”
Google Book Search pays authors $125M and opens up access to books in the US
Since 2005 Google have been in negotiations over US lawsuits brought by a group of authors and publishers, along with the Authors Guild and Association of American Publishers (AAP) around copyright issues.
Today Google have announced an agreement with AAP that brings the lawsuits to a close and will result in the establishment of the Book Rights Registry. To quote their New chapter for Google Book Search blog post:
Google is also funding the establishment of a Book Rights Registry, managed by authors and publishers, that will work to locate and represent copyright holders. We think the Registry will help address the "orphan" works problem for books in the U.S., making it easier for people who want to use older books. Since the Book Rights Registry will also be responsible for distributing the money Google collects to authors and publishers, there will be a strong incentive for rightsholders to come forward and claim their works.
The money they refer to collecting comes from a new feature they will introduce, as explained:
…in addition to being able to find and preview books more easily, users will also be able to read them. And when people read them, authors and publishers of in-copyright works will be compensated. If a reader in the U.S. finds an in-copyright book through Google Book Search, he or she will be able to pay to see the entire book online. Also, academic, library, corporate and government organizations will be able to purchase institutional subscriptions to make these books available to their members. For out-of-print books that in most cases do not have a commercial market, this opens a new revenue opportunity that didn’t exist before.
In this announcement Google also recognise the value of books to libraries, and obviously the value the Book Search service has gained from partnering with some of them:
In addition to expanding the commercial market for these books, Google, the authors and the publishers have worked hard with our library partners at Stanford, the University of Michigan, the University of California and the University of Wisconsin-Madison to ensure this agreement advances libraries’ efforts to preserve, maintain and provide access to books for students, researchers and readers. The agreement gives public and university libraries across the U.S. free, full-text viewing of books at a designated computer in each of their facilities. That means local libraries across the U.S. will be able to offer their patrons access to the incredible collections of our library partners — a huge benefit to the public.
So what does this mean for the public? If you are not in the US, very little at the moment, although they hint at intentions to spread this to ‘other countries’.
In libraries inside US borders, there will be a computer [I wonder how many users will be allowed to log on to it at once] providing access to a massive collection which was not available before.
For the US public inside and outside the library walls, they will be able to find and preview books more easily, and then be able to read them.
If a reader in the U.S. finds an in-copyright book through Google Book Search, he or she will be able to pay to see the entire book online. Also, academic, library, corporate and government organizations will be able to purchase institutional subscriptions to make these books available to their members. For out-of-print books that in most cases do not have a commercial market, this opens a new revenue opportunity that didn’t exist before.
As this comes out of a legal settlement with authors and publishers, they can be forgiven for the emphasis on new revenue opportunities.
For me the big story behind this is that Google have started to complete the links in the search-for-it-discover-it-get-it chain for books in the same way that we are accustomed to for web pages. It is all about getting to the information, and the previous century’s legal frameworks have been getting in the way of what has been technically possible for ages. Google’s weight is starting to sweep away some of those restrictions.
The publishers are probably very pleased with their agreement [see AAP’s FAQs], but this could easily be one of the early significant steps in disintermediating their role in getting authors’ works to their readers. Eventually, as Google becomes the de facto route to find and read, who needs publishers? We are already starting to see music being distributed with very different models that often don’t include the traditional music publishers.
I must stop hypothesising too much as the agreement behind this has yet to be finalised, and then we need to see how the details are fleshed out. Nevertheless I think the word ‘significant’ can most definitely be associated with this announcement.
The Future, Quantum Encryption, Privacy on the Social Semantic Web
Just two memos: There is a talk tonight with Thomas Länger from the Viennese quantum encryption project (BBC article about the project), co-organized by quintessenz (an organisation devoted to civil rights in the information age) and Transforming Freedom (who are dedicated to documenting the discourse of the battle zones of digital culture; I volunteer for them). ORF wrote a German article about it, with information about the venue and start time. The key issue quintessenz want to raise with this talk is: Who is going to benefit? Will “unbreakable” quantum encryption become available to citizens, too? Quantum encryption cartridges for your PC, anyone?
Secondly: I published an “inaugural interview” Marion Fugléwicz-Bren did with two of my colleagues, Matthias Samwald and Thomas Schandl (not so inaugural for the former, as he already joined SWC in January). I’d like to extract this quote by W3C member Samwald regarding privacy on the (corporation owned) social web and the future (user-managed) social semantic web:
I also think that Semantic Web technologies will receive a lot of media attention when the first big, public breach in security / privacy happens in one of the websites that currently dominate the whole world wide web. At the moment, we all are uploading most of our private and business lives to web sites such as Google, Facebook, Flickr and others. It is just a matter of time until a big scandal happens, be it the companies themselves that misuse the vast amounts of data they have, or be it a government agency in an overzealous effort of crime prevention.
When this will happen, people will re-evaluate the trend towards massive centralisation on the web, and will search for opportunities to make the same feeling of being ‘in the network’ happen in a distributed environment, without selling ones soul to a multinational corporation. Then we will find that such an opportunity already exists — the Semantic Web.
Jim Hendler at the INSEMTIVE 2008 Workshop
Along with a number of my colleagues, I’m currently attending the ISWC 2008 conference in Karlsruhe, Germany. Yesterday I attended the INSEMTIVE workshop (“Incentives for the Semantic Web”) which aimed to explore incentives for the creation of semantic web content, i.e. encourage the creation of more structured metadata. The workshop papers are available to browse online or you can download the complete proceedings. There was a real mix of papers, covering specific issues such as extraction of semantics from tagging, and identifying information needs of a community by analysing search patterns, through to position papers that attempted to highlight shortcomings in current semantic web applications that deter people from creating metadata.
I found the position papers most interesting if only because they provided confirmation of something that I’ve been thinking for a while now: that people will (and do) create metadata when there are obvious and immediate benefits in them doing so. No-one really consciously sits down to share or create metadata: they sit down to do a specific task and metadata drops out as a side-effect. For me this makes much of the problem highlighted by the workshop one of interaction design: how do we build good task-oriented user interfaces that encourage the creation of semantic web metadata, and how can we illustrate the benefits of semantic web technologies in an incremental fashion? In my opinion solving this will require close collaboration between semantic web researchers and developers, and interaction designers.
The end of the workshop was a discussion session chaired by Jim Hendler. Hendler chose to do a retrospective of some older presentations to explore how thinking has evolved (or not!) with respect to drivers towards the development of the semantic web.
Starting in 1999, Hendler showed some slides from DAML strategy talks that emphasised the need for a number of different areas to align before a real marketplace can be created for semantic web content and applications. These areas were tools, users, and languages (e.g. OWL, etc). Hendler noted that the Semantic Web community had mistakenly focused too heavily on languages and not enough on the other areas. He also thought that “Web 2.0” had focused primarily on the users, to a lesser extent on the tools, and very little on the language aspects. Hendler thought that this alignment was now taking place.
Moving forward in time to show some slides from 2001-2002, Hendler introduced the idea that the development of the web itself will “force” the evolution of the semantic web, i.e. that internal pressures, such as the need to better manage and extract value from the massive amounts of online information, will require the semantic web to solve specific problems. Hendler observed that the web has demonstrated that people will do more work to share information with others than they will do to help themselves; i.e. people are lazy. When people want to, need to, or are rewarded for sharing information and content then they will work much harder than they would do to manage and organize information purely for their own uses. Hendler noted that there is a tendency to say “we’ll solve the data creation problem at the individual level, as solving it at a group level is harder to manage”, but a look at web history illustrates that the opposite is in fact the case.
Hendler also shared what he thought was the best piece of advice he’d been given by Tim Berners-Lee: start small but viral and you can change many things. Hendler’s slides characterized this as: “My friend sees it, wants one; My competitor sees it, needs one”.
Looking at slides from 2002, Hendler introduced the “Value proposition” supporting the creation of semantic web data & content, i.e. that there has to be some immediate return on the investment in creating metadata.
Hendler finished his retrospective with a slide from a 2008 talk that showed the range of commercial companies, government projects and vertical sectors that were now heavily engaged in the Semantic Web (I was happy to see Talis mentioned in the list!). In Hendler’s opinion there is a growing excitement that the “next big thing” is going to come from the Semantic Web; not a “Google Killer”, but the next big revolutionary idea or service. The incentive here is the obvious one: money.
Hendler noted that there is a huge amount of data out there and that finding anything in the mess can be a win. So even a little semantics can make a difference here and could provide some competitive advantages. We don’t need perfect answers or solutions, just incremental improvements on what we have now.
I was also happy to see Hendler encourage researchers to “compete in the real world”, noting that they have to work within the context of a real world that is moving very fast, that they can’t really compete with the resources of commercial firms in creating semantic web applications and demonstrators and should instead try and work within that context to demonstrate real value from the technology. Hendler encouraged them to focus on issues of scalability. Does the fundamental technology scale? Do the concepts and ideas scale to a real user base? As an illustration Hendler noted that he was working with a number of companies that were using some simple OWL constructs in order to add semantics to applications, but that none of them were using a formal reasoner, just “little pieces of procedural code that scale really well”.
Overall, an interesting workshop!
Paul Miller did a podcast with Jim Hendler back in March if you want to hear more about his thoughts on the Semantic Web.
Multimedia in the Web of Data - Annotating and Interlinking Photos, Music, Multimedia [WOD-PD]
The Web of Data Practitioners Days concluded with the session on Multimedia in the Web of Data, the first part of which was led by Ansgar Scherp (University of Koblenz-Landau, Germany).
Multimedia content, as Ansgar pointed out, is hardly annotated, badly organized, and hardly ever looked at again - just think of the 300 something pics you might take on an average week-end getaway, and which you never touch again. Annotating multimedia content requires a lot of work and dedication - but most of the time, these pictures eventually disappear in the “digital shoe box” that is your photo management software.
The most obvious remedy is to annotate content as early as possible, ideally at the moment of creation, on your portable camera itself (formerly known as: mobile phone :) Ansgar suggested providing incentives to encourage picture annotation - professionals could, for instance, receive a higher financial reward if they deliver already annotated pictures. And of course there are ‘Games with a purpose’ such as Google Image Labeler, where players tag images in pairs, with and against each other, and are rewarded with the entertainment factor of the game.
The slide below shows what has happened (or will happen) to the process of creating photo books in the digital age and the age of mashups:

After all, this is the age of the social semantic web, so why not try and (re-)use the content, structure and contexts that other users have already created on the web? Content augmentation, for the scope that Ansgar is concerned with, consists in the reuse of content and structures (e.g. from sources such as Flickr and Wikipedia, Geonames) made possible through the definition of rules, e.g.:
- If there are two or fewer pictures on a page*
- then automatically augment the page with additional photos using location information.
* Page here means a page in the album you are currently working on - you probably took a picture of yourself and your friend in Paris, and even though you went to the Centre Pompidou, you forgot to actually take a pic of the building itself - well, let the web be your library!
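A rule like this is easy to picture as code. The sketch below is only an illustration of the idea, not the implementation presented in the session: the `Page` structure, the lookup table of photos per location and the choice to top a page up to three photos are all assumptions, and the lookup table stands in for a real query against a source such as Flickr.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One page of the photo book being authored (hypothetical structure)."""
    location: str
    photos: list = field(default_factory=list)


def augment_page(page: Page, photos_by_location: dict,
                 threshold: int = 2, target: int = 3) -> Page:
    """If the page has `threshold` or fewer photos, add photos of the same place.

    `target` (how many photos the page should end up with) is an assumption;
    the rule as presented does not say how many photos to add.
    """
    if len(page.photos) > threshold:
        return page
    candidates = [photo for photo in photos_by_location.get(page.location, [])
                  if photo not in page.photos]
    needed = max(target - len(page.photos), 0)
    return Page(page.location, page.photos + candidates[:needed])


# Stand-in for photos found on the web, keyed by location (invented data).
web_photos = {"Paris": ["centre_pompidou.jpg", "eiffel_tower.jpg", "louvre.jpg"]}

sparse_page = Page("Paris", ["me_and_friend.jpg"])
augmented = augment_page(sparse_page, web_photos)
print(augmented.photos)
```

A page that already holds more than two pictures is returned unchanged, so the rule only fires where the album is thin.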
So the goal is clear: develop a procedure for applying automatic content augmentation in the creation of good photo books.
But what makes a ‘good’ photo book anyway? Here are some of the results of a structural analysis of real, human-created photobooks conducted at CeWe Color:
- % of photos with faces: 36%
- Number of album pages: 16.96
- Photos per page: 6.69
- Text fields per page: 1.45
- % of pages with text: 87%
Many rules can be derived from the structural analysis and applied in turn to the creation of photobooks, e.g. rules like this one:
- If the text is located in the upper third of a page
- if the font size is equal to or larger than 16 points
- if the number of words is less than 10
- if there is no caption on the page that has a bigger font size
- then this page is the title
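Written out as code, the title rule might look like the sketch below. The `TextField` structure and the way position is measured (distance of the field from the top edge as a fraction of page height) are assumptions made for illustration; only the four conditions themselves come from the rule above.

```python
from dataclasses import dataclass


@dataclass
class TextField:
    """A text field on an album page (hypothetical structure)."""
    text: str
    top: float        # distance from the top edge, as a fraction of page height
    font_size: float  # in points


def is_title_field(candidate: TextField, page_fields: list) -> bool:
    """Apply the four conditions of the title rule to one text field."""
    in_upper_third = candidate.top < 1 / 3
    large_font = candidate.font_size >= 16
    few_words = len(candidate.text.split()) < 10
    no_bigger_caption = all(other.font_size <= candidate.font_size
                            for other in page_fields)
    return in_upper_third and large_font and few_words and no_bigger_caption


def is_title_page(page_fields: list) -> bool:
    """A page is the title page if any of its text fields satisfies the rule."""
    return any(is_title_field(candidate, page_fields) for candidate in page_fields)


# Invented example pages.
cover = [TextField("Our weekend in Paris", top=0.1, font_size=24),
         TextField("October 2008", top=0.8, font_size=12)]
inner = [TextField("At the Centre Pompidou", top=0.9, font_size=12)]

print(is_title_page(cover), is_title_page(inner))
```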
Ansgar recommended xSmart, which he described as a “context-driven authoring tool for page-based multimedia presentations.”
Ansgar’s presentation was followed by two more: one by Yves Raimond on Interlinking Music on the Web of Data, and one on Interlinking Multimedia - in spite of better intentions, I did not manage to cover these two in detail, but at least I gathered the links to relevant resources from all three sessions…
Links for Ansgar Scherp’s session
- Continuous Media Markup Language (CMML) (see also: CMML on Wikipedia)
- COMM - A Core Ontology for Multimedia
- Caliph and Emir - Java & MPEG-7 based tools for annotation and retrieval of digital photos and images
- X-COSIM - a framework for Cross(X)-COntext Semantic Information Management
Links for Yves Raimond’s session
- Music Ontology Specification
- The Timeline Ontology
- The Event Ontology
- Functional Requirements for Bibliographic Records
- www.SonicVisualiser.org - a program for viewing and analysing the contents of music audio files
- www.dbtune.org - music-related RDF
- Yves Raimond, Christopher Sutton and Mark Sandler 2008: Automatic Interlinking of Music Datasets on the Semantic Web. (PDF, 467 KB)
- Interview with Yves Raimond: Finding vegetarian music: What B.B. King and the Beastie Boys have in common
- DB-Tune Facet Demo
- Henry 1 and 2 - a SWI-Prolog N3 parser/reasoner, and DSP-driving SPARQL end point
Links for Michael Hausenblas’ session
- InterlinkingMultimedia.info - a wiki dedicated to Interlinking multimedia (iM), “a light-weight bottom-up approach to interlink multimedia content on the Web of Data”.
- Rammx - RDFa-deployed Multimedia Metadata
- CaMiCatzee - multimedia interlinking concept demonstrator.
Last but not least: Ansgar Scherp allowed us a sneak peek of SemaPlorer, a Large-scale Semantic Faceted Browsing Application for Multimedia Data that is going to be revealed on Dec 2, 2008, at the BOEMIE (Bootstrapping Ontology Evolution with Multimedia Information Extraction) workshop in Koblenz. Here is an abstract:
Navigating large media repositories is a tedious task, because it requires frequent search for the ‘right’ keywords, as searching and browsing do not consider the semantics of multimedia data. To resolve this issue, we have developed the SemaPlorer application. SemaPlorer facilitates easy usage of Flickr data by allowing for faceted browsing taking into account semantic background knowledge harvested from sources such as DBpedia, GeoNames, WordNet and personal FOAF files. The inclusion of such background knowledge, however, puts a heavy load on the repository infrastructure that cannot be handled by off-the-shelf software. Therefore, we have developed SemaPlorer’s storage infrastructure based on Amazon’s Elastic Computing Cloud (EC2) and Simple Storage Service. We apply NetworkedGraphs as additional layer on top of EC2, performing as a large, federated data infrastructure for semantically heterogeneous data sources from within and outside of the cloud. Therefore, SemaPlorer is scalable with respect to the amount of distributed components working together as well as the number of triples managed overall.
Steffen Staab, Information Systems and Semantic Web (ISWeb), University of Koblenz-Landau, Germany
Thank you, thank you, thank you, it was a lovely event with an unusually high amount of processable input!
Motorola developing social network friendly android mobile phone
My Treo 650 is long in the tooth and I’m anxious to replace it. I’d love an iPhone, but am not ready to switch service providers and am also somewhat wary about its closed nature. So an Android-based phone is intriguing. Now here is an interesting development: BusinessWeek reports that Motorola Readies Its Own Android Social Smartphone:
“As the wireless world awaits the Oct. 22 debut of the first phone based on the Google-backed Android software, engineers at Motorola (MOT) are hard at work on their own Android handset. Motorola’s version will boast an iPhone-like touch screen, a slide-out qwerty keyboard, and a host of social-network-friendly features, BusinessWeek.com has learned.”
This is a bit of a no-brainer, and the iPhone is sure to have support for social media, probably well before these Motorola phones hit the street, which is expected to be in the second quarter of 2010. The BusinessWeek article notes that:
“In the next year, social networking phones are expected to be a hit with the 16- to 34-year-old crowd, analysts say. According to consultancy Informa (INF), the number of mobile social-networking users will rise from 2.3% of global cell-phone users at the end of 2007 to as many as 23% of all mobile users by the end of 2012.”
The Revolution Starts (near) Here
Most mashups are very one-off, self contained little things. They may be useful for vertical purposes, which is fine - I need this, now. But in general they don’t lend themselves to er, generalisation. Dan Brickley just spotted an instance which is vertical, but only because the end-user (I can use that phrase, ya?) chose to do it that way: Data Scraping Wikipedia with Google Spreadsheets
I didn’t really understand it from the blog post; fortunately, Dan summarised:
In which they use Google Spreadsheets to convert a Wikipedia table to
an RSS feed and thence to live population maps via Yahoo pipes with no
coding required.
It’s clear some smarts are required for this, but wiring disparate services together in an arbitrary fashion is apparently possible. Who’d have thunk.
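The mashup Dan describes needs no coding, but for anyone who would rather see the first hop (table to feed) spelled out, here is a rough sketch using only the Python standard library. It is not what Google Spreadsheets or Yahoo Pipes do internally; the function name `table_to_rss`, the choice to use the first column as the item title, and the sample table are all assumptions for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_rss(html, title, link):
    """Turn every row after the header row into one RSS 2.0 <item>."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for row in parser.rows[1:]:  # assumption: the first row holds the column headers
        if not row:
            continue
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row[0]
        ET.SubElement(item, "description").text = ", ".join(row[1:])
    return ET.tostring(rss, encoding="unicode")


sample = (
    "<table><tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Paris</td><td>2,100,000</td></tr></table>"
)
print(table_to_rss(sample, "Cities", "http://example.org/cities"))
```

Real Wikipedia tables have nested markup, footnotes and merged cells, so treat this as the shape of the idea only.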
OpenID, OAuth UI and tool links
A quick link roundup:
From ‘Google OAuth & Federated Login Research’:
“The following provides some guidelines for the user interface design of becoming an OAuth service provider”
Detailed notes on UI issues, with screenshots and links to related work (opensocial etc.).
Myspace’s OAuth Testing tool:
The MySpace OAuth tool creates examples to show external developers the correct format for constructing HTTP requests signed according to OAuth specifications
Google’s OAuth playground tool (link):
… to help developers cure their OAuth woes. You can use the Playground to help debug problems, check your own implementation, or experiment with the Google Data APIs.
If anyone figures out how to post files to Blogger via their AtomPub/OAuth API, please post a writeup! We should be able to use it to post RDFa/FOAF etc hopefully…
Yahoo’s OpenID usability research. Really good to see this made public, I hope others do likewise. There’s a summary page and a full report in PDF, “Yahoo! OpenID: One Key, Many Doors”.
Finally, what looks like an excellent set of introductory posts on OAuth: a Beginner’s Guide to OAuth from Eran Hammer-Lahav.
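For a feel of what tools like the MySpace one above are checking, here is a minimal sketch of the HMAC-SHA1 signing step from OAuth Core 1.0, written against the Python standard library. It is not taken from any of the linked tools; the helper names are mine, the URL and credentials in the usage lines are made-up placeholders, and the sketch assumes the URL is already normalised and has no query string of its own.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode everything except RFC 3986 unreserved characters."""
    return quote(str(value), safe="")


def signature_base_string(method, url, params):
    """Join method, URL and the sorted, encoded parameters with '&'.

    Assumes `url` is already normalised and carries no query string.
    """
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(normalized)])


def sign_hmac_sha1(base_string, consumer_secret, token_secret=""):
    """Return the base64-encoded HMAC-SHA1 signature of the base string."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(
        key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values, for illustration only.
params = {
    "oauth_consumer_key": "example-key",
    "oauth_nonce": "abc123",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1223596800",
    "oauth_version": "1.0",
}
base = signature_base_string("GET", "http://api.example.com/resource", params)
print(base)
print(sign_hmac_sha1(base, "example-consumer-secret"))
```

The resulting value goes into the request as `oauth_signature`. Most debugging pain comes from the base string, which is why the testing tools print it out.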
Stephen Arnold – A conversation with the closing keynote speaker for Online Information 2008
Stephen E. Arnold’s career has led him to be a prolific writer, speaker, and expert on web technologies and their application both inside the commercial enterprise and across the Internet. He is best known for his work on search and his insights into the Google phenomenon.
He is presenting the keynote in the closing session of the Online Information Conference 2008, which is being held at Olympia in London from 2nd to 4th December and will feature a wide range of speakers of broad interest to information professionals from all sectors: libraries, academia, government, and commerce.
Stephen talks about his career so far and the themes for his presentation, explaining how the technologies that we have seen emerging over the last few years are ready for use inside the enterprise as well as maturing into delivering services across the web. He also explores how the componentised nature of these technologies and the applications they power enables them to be moulded to satisfy the needs of their users.
This Week’s Semantic Web, Burningbird style
Sorry I’m a little tardy with this. Anyhow, last time here I asked for volunteers to give their own take on TWSW. Shelley Powers stepped up to the plate, and the content of her post is below. In this context Shelley’s best known for writing the first book on RDF, over five years ago.
Brian Manley also took the bait, and he’s started publishing The Week In Linked Data - TWILD (more like the style of TWSW but with better descriptions and a pronounceable acronym).
This post will likely push a lot of things below the bottom of the page, so I’d better link to Paul’s recent podcasts to keep him sweet.
Over to Shelley:
…
I decided to add a slight twist to my own version of This Week’s Semantic Web, focusing not only on the stories, but how I found them. After all, the real purpose of the semantic web technologies is to make information easier to find. How are we, in the semantic web community, doing in this regard?
To start, I subscribe to various feeds including Planet RDF, as a way to keep up with most of the semantic web news. This week, the stories from Planet RDF that caught my eye were the following:
- Tom Heath wrote How Will We Interact with the Web of Data for IEEE. In his article, Tom proposes that the web homepage, as we know it today, is dead. In its place we’ll have connected pieces of data, pulled together via RDF records (triples), which are then used to generate the human readable content. So one could have weblog, browser, feeds, friend feeds, and other online “islands of data”, Flickr and other photos, videos on YouTube, etc.—all annotated with metadata and brought together, mechanistically, because of the metadata annotation. It’s interesting, and we already have some of this with various widget-enabled devices, but I’m not sure that most people are “geek enough” to make this a truly viable option. Not yet.
- Bob DuCharme wrote a follow-up piece to his Learning more about SPARQL, related to forming SPARQL queries against DBpedia, the site dedicated to making Wikipedia information queriable. No, that’s not a word…but it should be. Bob’s example is important for two reasons. The first, and the most obvious, reason is that it, of course, demonstrates SPARQL against a published source—hopefully spurring on other efforts. More importantly, though, in my opinion, is that Bob is publishing his explorations, his learning experiences, not necessarily a finished, “Ta da!” work. We need more journals of discovery in the semantic web world.
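For readers who have not yet formed a SPARQL query against DBpedia, here is a small sketch of what one can look like. This is not Bob's example; the query, the helper names and the choice of resource are illustrative assumptions. The code only builds the request URL for DBpedia's public endpoint, using the standard `query` parameter of the SPARQL protocol, and does not send anything over the network.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def label_query(resource, lang="en", limit=5):
    """Build a query asking for the labels of one DBpedia resource."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = label_query("SPARQL")
print(query)
print(query_url(query))
```

Pasting the printed URL into a browser, or fetching it with an `Accept` header for the result format you want, is one way to start the kind of exploration the post describes.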
I don’t only get my semantic web information from the Planet RDF feed. I find other entries on this topic, now and again, in other feeds. For instance, I wrote about two other items this week and I’ll repeat links to both because I feel they represent the semantic world “in the wild”.
- A List Apart featured an article titled Understanding Progressive Enhancement, which discussed the concept of building one’s website from the inside out—focusing on the properly semantically annotated content, first, before tossing in the pretties. I think this article complements some of the discussion about minimal design that was such a popular topic a few months back. The article not only focuses our attention back on the content, and hence the real purpose for the web site, it also drives home that we need to start doing a better job, semantically speaking, with our use of page markup. Speaking of markup…
- Tina Holmboe’s XHTML—myths and realities is both an important, and timely, look at XHTML, the importance of XHTML for the semantic world (RDFa), and the future of XHTML. It’s timely because it serves to remind us that we now have two divergent markup paths under the W3C leadership—paths that do not share a common model or focus, which seems to me to act counter to the ultimate goal of a truly semantic web.
In my quest for this week’s semantic web goodies, I also searched in Google on “Semantic Web” and then focused on News, not Web, in order to filter items down to recent events. With this approach, I found the following items to pass along:
- Paul Miller at ZDNet writes Does the Semantic web matter? He believes it does, a view offered up simply and elegantly. What the semantic web isn’t, though, according to Paul, is a goose to be punched and pummeled by the elitist and the avaricious until forced to deliver up the golden egg. To wit:
Continuing landgrabs by startups that seek to attract, trap and exploit eyeballs stand unashamedly on the shoulders of Semantic Web promise whilst running counter to its basic tenets of linking and openness. On the other hand, companies ‘just’ doing perfectly reasonable - and valuable - things with the meanings of words, phrases and documents latch on to the Semantic Web’s buzz, whilst being all about Semantics and not at all about the Web.
New entrants, hopefully building viable and useful businesses upon the Semantic Web’s ideas, are pilloried by stalwarts of the ‘community,’ because the reality of their business model does not permit a whole-hearted embracing of the entire Semantic Web stack from Day One. Intellectual purity clashes with pragmatism and reality on a daily basis. Well-meaning guidelines and best practices morph in the minds of too many to become laws, ‘truths’, and rods with which to beat outsiders. Visions of Orwellian pigs fill my brain, and I don’t like what I see as they rise up onto two feet and gaze disdainfully around.
- Speaking of punching geese, oh look, Ask.com is back. It’s got mad semantic skillz. So I put Ask.com to the test, and asked it “How can I learn more about SPARQL”, and it responded with, “Did you mean, ‘How can I learn more about sparkle’?”. I paused a moment, and said sure, show me that one. Ummm, Swarovski crystal jewelry. Pretty sparkles. To be fair, before following this sparkly tangent, Ask.com did return the first of Bob DuCharme’s posts, mentioned above. In fact, it returned exactly the same result list as Google and Yahoo, when I asked them the same question.
- Though not exactly “this week”, ReadWriteWeb writes a mean semantic web post, now and again, and had one last week subtitled, “Show me the Money!”—and wasn’t that a great movie moment? I digress, though. The RWW post focuses on a new report by a Semantic web entrepreneur on semantic web companies making money, but just at the moment when I clicked through to read the report, I got distracted by the flock of migrating geese overhead. I must pursue the report at a later time. What I found interesting, though, was the ReadWriteWeb Semantic Web Log search and…ah geez, there goes another flock, circling overhead.
There were other sources I searched for information about the semantic web this week, but the results were less than optimum. For instance, I searched on “semanticweb” in delicious, but the results show the items that were posted to delicious this week, not necessarily published this week. The problem is that while many services such as delicious have a way to tag items with terms like “semanticweb”, the metadata annotation is limited: it doesn’t include information such as when the posted item was first published, nor does it allow you to search on the same. Most of the “semantics” are flat, simple, and two-dimensional, i.e. keyword-value pairs.
I next went in the opposite direction, looking for just published items, and then sought to filter on the semantic web. For instance, no other source is better for up-to-date discovery of minutiae than Twitter. However, as far as I can see, there is no way to search on a specific topic in Twitter. You can look for people, but other subject material search is extremely limited. If you don’t know that Twitter user Kingsley Idehen exists, and posts frequently on semantic web related items, you may not discover a graph of linked data sources or an animation related to RDF as middleware.
I then turned to the Big Cheese, the Head Semantic Web honcho, Twine, and the twine related to the Semantic Web. Eureka! I finded the Semantic Web! Of course, on closer look, most of the items also could be found on Planet RDF. Still, meat that is both fresh, and relevant. I’ll just pick out a few for my version of This Week’s Semantic Web.
- Seven OWL 2 Drafts Published at the W3C. OWL 2 is an extension of OWL, the Web Ontology Language. No, don’t try to fit the acronym. OWL is not necessarily directly important to thee and me. OWL is important, though, for designing systems that would understand exactly what I mean when I ask, “How can I learn more about SPARQL”, and that will return the definitive sources meeting my question, without being dependent on either language processing or obscure page ranking algorithms.
- Speaking of SPARQL, another item in the twine was SPARQL Update, a submission to the W3C describing a way to use SPARQL to update graphs (semantically linked data stores). Interesting, considering that SPARQL means SPARQL Protocol and RDF Query Language. What works in one direction must work in all directions, eh? Reminds me a little of HTML5 and JSON—the Swiss Army knives of technology.
And so ends my tenure for This Week’s Semantic Web, Burningbird style. What I discovered in the process of building my list was that we’re not close to the semantic web we seek. Without knowing about the people, such as Bob, Kingsley, or Danny, or the topic-focused resources such as Planet RDF or Twine, I would have had a much more difficult time finding out what is happening, this week, in the semantic web. However, among the results I did find are new technologies, new specifications, new efforts that assure us that though the semantic web doesn’t exist today, it surely will someday.
Surely. Someday.
Scotland’s Information
I’ve been given a heads-up on a new site from SLIC and CILIP in Scotland, which has been developed by the Centre for Digital Library Research (CDLR) at the University of Strathclyde.
Scotland’s Information is a service to help identify and locate Scotland’s wealth of collections held in libraries, archives and museums.
Although the site is live now, its official launch will be on 24th October 2008, the centenary of CILIP in Scotland.
One could be flippant and note that this is just another Google Maps mashup, which it is, but this is a good example of producing something much greater than the sum of its parts. Google Maps mashups have only been around for three years – Google announced their API in June 2005, as we covered here on Panlibus – yet they are a widely used tool on the web and a key part of many a web site, one that users are familiar with and comfortable using. It is amazing to note how rapidly the click/drag/zoom metaphor for interacting with a map became the de facto way to do it.
Back to Scotland’s Information, this site draws together a wealth of information about libraries, archives and museums in Scotland and the topics, people and organisations they represent.
What is in my opinion different and very useful in the way the site works is how you can filter your way through this data (often with the use of tag clouds) to arrive at a map containing pins for each location (museum, library, or archive) that can help you. For instance this is the result of clicking on Robert Burns in the People tag cloud - ‘Information collections about Robert Burns (1759-1796)’. 35 locations associated with that famous Scot, which can then be limited further for those with wheelchair and/or internet access.
A final link in the chain is that from information about individual collections, there is a link through to the relevant OPAC or search interface. I would suggest that this could be made even more intuitive, especially if a user has arrived at a link by filtering on a person or subject, by making use of the Silkworm Directory service and its API to deep-link into those collections, directly delivering the results of a relevant search.
A great start, that from day one will deliver a valuable service to those visiting, interested in, and residing in Scotland. It will be interesting to see how it develops.
Earn $100 by designing an ontology for one of these domains
This offer just showed up in a Google alert triggered by its mention of Swoogle. Some poor Australian student (poor in ethics and ability, not money) is willing to pay $100 to have someone do his project for a Semantic Web course.
homeworkanytimehelp4 is behind on several assignments and in a bit of a fix. He needs his ontology assignment done by 12 October, just two days after he posted his offer.
Is this cheating? Well, the studentOfFortune.com site has thought deeply about this, and it turns out that it’s not.
Q: It still seems like cheating
A: We’ve thought long and hard about this. We believe that users who write solutions which not only help provide answers but also help teach how the answers were achieved will be the solutions that are purchased more often than not. And for that reason, we believe that Student of Fortune is a teaching and research tool, not a tool for cheating. But it’s up to you how you use it. We’re not going to judge you. We’re just here to help.
Times are hard right now. If you are tempted to help homeworkanytimehelp4, you owe it to yourself to find out if the dollars are USD or AUD.
Has the Semantic Web Industry become a reality yet?
Well, no. Or maybe not quite. But an innocent reader might have gathered this from the title of David Provost’s recent publication which promisingly read “On the Cusp. Global Review of the Semantic Web Industry.”
Provost’s review is a nice and readable attempt at evangelizing semantic technologies and their adoption by the industry. It seeks to spread the news outside of the echo chambers and avoids any community jargon and cryptic acronyms irrelevant to strategic decision makers. He really deserves great credit here.
But in the end Provost’s description of a “Semantic Web Industry” is reductionist. By analysing only the commercial availability of technology provided by vendors, he blurs the bigger picture of the industry. He misses the point when it comes to analysing the actual demand for semantic applications, which could easily be identified, e.g. enabling cost-efficient interoperability and reusability of data. So Provost gets stuck in a supply-driven view of the semantic web industry. And as we have learned from history, supply-driven markets, technology markets in particular, are extremely vulnerable. Hence concluding that the Semantic Web Industry is on the cusp might seem a little “misworded”.
What might be a nice addition for a follow up study is to look at the commercialization strategies of semantic web technologies and its capitalization logic as a network good. Further on, it might be worth it looking at the









