Archive

Archive for the ‘Uncategorized’ Category

Graphical “more like this” Query Building

August 29th, 2010

I promised in an earlier blog post to talk about how to create queries over OWL in RDF.  So here it is.

As Ivan alluded to in his comment, there are some syntax issues with talking about OWL restrictions in RDF.  What is he referring to?  Well, let's take the same example as in the last blog post, a datatype restriction describing things with age >= 21.  We could write this in Manchester Syntax as:

hasAge only xsd:integer [>=21]

But the OWL/RDF rendition of this is where the 'arcane' syntax comes in.  We can see it just by looking at the source code in Turtle, where it looks like this:

[] a owl:Restriction ;
   owl:allValuesFrom
       [ a rdfs:Datatype ;
         owl:onDatatype xsd:integer ;
         owl:withRestrictions
             ( [ xsd:minInclusive 21 ] )
       ] ;
   owl:onProperty :hasAge .
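To make the 'arcane' part concrete, it can help to expand the Turtle above into the individual triples it abbreviates. The Python below is a minimal hand-written sketch: the blank-node labels `_:r`, `_:dt`, `_:list` and `_:facet` are illustrative names for the anonymous `[]` nodes (they are not in the original), and terms are kept as prefixed-name strings rather than full IRIs.

```python
# Triples abbreviated by the Turtle restriction above, written out by hand.
# The blank-node labels are illustrative; in the Turtle they are anonymous [] nodes.
restriction_triples = [
    ("_:r", "rdf:type", "owl:Restriction"),          # Turtle "a" is rdf:type
    ("_:r", "owl:onProperty", ":hasAge"),
    ("_:r", "owl:allValuesFrom", "_:dt"),
    ("_:dt", "rdf:type", "rdfs:Datatype"),
    ("_:dt", "owl:onDatatype", "xsd:integer"),
    ("_:dt", "owl:withRestrictions", "_:list"),
    # The ( ... ) collection syntax expands to an rdf:first / rdf:rest list
    ("_:list", "rdf:first", "_:facet"),
    ("_:list", "rdf:rest", "rdf:nil"),
    ("_:facet", "xsd:minInclusive", 21),
]

# Collect every blank node that appears as a subject or object
blank_nodes = sorted({
    term
    for triple in restriction_triples
    for term in (triple[0], triple[2])
    if isinstance(term, str) and term.startswith("_:")
})

print(len(restriction_triples), "triples,", len(blank_nodes), "blank nodes")
# prints: 9 triples, 4 blank nodes
```

Nine triples and four blank nodes, including an rdf:first/rdf:rest list, stand behind one short line of Manchester Syntax; that is the bookkeeping a hand-written query has to get exactly right.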

In the last blog entry, we saw a rule that would match this sort of definition, so that we could classify persons of appropriate ages as Adults.  That rule looked like this:

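PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
    ?x a ?restriction .
}
WHERE {
    ?datatype owl:onDatatype xsd:integer .
    ?datatype owl:withRestrictions ?var .
    ?datatype a rdfs:Datatype .
    ?restriction owl:allValuesFrom ?datatype .
    ?restriction a owl:Restriction .
    ?restriction owl:onProperty ?datatypeproperty .
    ?var rdf:first ?var1 .
    ?var1 xsd:minInclusive ?mval .
    ?x ?datatypeproperty ?val .
    FILTER (?val >= ?mval) .
}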

How do you write a rule like that?  By looking up in the standard how to express datatype restrictions, how to link those to restricted value sets, and so on.  If that seems labor-intensive and error-prone to you, then you're right.  It is.

But we can use a power tool to help make this happen. The power tools aren't included in the free version of TopBraid Composer, so if you want to follow along here, you'll need the Maestro Edition; a 30-day trial is available for free.

Start by loading http://workingontologist.org/Examples/adult.rdf into Composer, as shown in the previous post, and open it. We're going to use the model itself as a prototype to create a query. Let's start by looking at an example of the restriction we want to match - the definition of Adult in the model:

[Figure: the definition of Adult in TopBraid Composer]

You can type it in just like that.  But that doesn't help us write a SPARQL query to match any restriction of this form.  How can we do that?  If you click on "Graph" at the bottom of the pane, you can explore this definition as RDF.  If you drill down to the Datatype Restriction itself, you get a view like the top of this figure:

[Figure: graph view of the Datatype Restriction (top) and the generated query (bottom)]

This is just a graphic representation of triples in the model - you can see all the structure of the RDF representation of the restriction. 

Now comes the fun part - let's turn this image into a query (which, to avoid suspense, is already shown at the bottom of the figure).  We want a query that will match "things like this" restriction.  What does "like this" mean?  That's what we have to specify.  Some aspects of this example should be included in the match (that it is an owl:Restriction, on an rdfs:Datatype over xsd:integer, and that it is an xsd:minInclusive restriction), and others should not (that the property is :hasAge; after all, we want this to match restrictions on any property).  So we select the things that we want to keep in the query, marked with a small "x" (you can set or reset the "x" by clicking on the small box in each node in the graph).

Once you have selected the aspects that specify what you mean by "like this" (a Datatype Restriction, on some property, with minInclusive over xsd:integers), you can generate the query automatically by clicking the star button.  You can see the generated query at the bottom of the figure.

All the generator did was take the triples shown in the figure and render them in the query.  Selected nodes (with "x") appear in the query as themselves; unselected nodes (no "x") become variables.  Properties always show up as themselves.  The generator makes best guesses at meaningful variable names, using type information to do so.
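The rendering rule described above can be sketched in a few lines. This is a hypothetical illustration, not TopBraid Composer's actual code: the function names, the naming scheme (lower-cased local name of the node's rdf:type, falling back to `var`, with a numeric suffix to keep distinct nodes distinct), and the `model_types` argument standing in for type information found elsewhere in the model (here, that `:hasAge` is an owl:DatatypeProperty) are all assumptions. The example graph leaves out the `rdf:rest rdf:nil` triple so that it mirrors the first eight patterns of the rule shown earlier.

```python
def local_name(term):
    """'rdfs:Datatype' -> 'Datatype'; anything without a prefix -> 'var'."""
    if isinstance(term, str) and ":" in term:
        return term.split(":", 1)[1]
    return "var"


def generate_where_clause(triples, selected, model_types):
    """Render example triples as a SPARQL WHERE clause (hypothetical sketch).

    Nodes in `selected` appear as themselves; every other subject/object
    becomes a variable named after its rdf:type. Predicates are kept as-is.
    """
    types = dict(model_types)
    types.update({s: o for s, p, o in triples if p == "rdf:type"})
    names, used = {}, set()

    def render(node):
        if node in selected:
            return str(node)
        if node not in names:
            base = local_name(types.get(node)).lower()
            name, n = base, 1
            while name in used:  # never reuse a variable for a different node
                name, n = f"{base}{n}", n + 1
            used.add(name)
            names[node] = "?" + name
        return names[node]

    lines = []
    for s, p, o in triples:
        pred = "a" if p == "rdf:type" else p
        lines.append(f"    {render(s)} {pred} {render(o)} .")
    return "WHERE {\n" + "\n".join(lines) + "\n}"


example = [
    ("_:dt", "owl:onDatatype", "xsd:integer"),
    ("_:dt", "owl:withRestrictions", "_:list"),
    ("_:dt", "rdf:type", "rdfs:Datatype"),
    ("_:r", "owl:allValuesFrom", "_:dt"),
    ("_:r", "rdf:type", "owl:Restriction"),
    ("_:r", "owl:onProperty", ":hasAge"),
    ("_:list", "rdf:first", "_:facet"),
    ("_:facet", "xsd:minInclusive", 21),
]
selected = {"xsd:integer", "rdfs:Datatype", "owl:Restriction", 21}
model_types = {":hasAge": "owl:DatatypeProperty"}

print(generate_where_clause(example, selected, model_types))
```

With the restriction, datatype and xsd:integer selected (plus the literal 21), the output reproduces the first eight patterns of the rule's WHERE clause, with 21 still a constant. Because the suffixing gives each unselected node its own variable, two different blank nodes can never end up sharing one.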

There are a few differences between the generated query and the WHERE clause of the rule, which is repeated here for comparison:

WHERE {
    ?datatype owl:onDatatype xsd:integer .
    ?datatype owl:withRestrictions ?var .
    ?datatype a rdfs:Datatype .
    ?restriction owl:allValuesFrom ?datatype .
    ?restriction a owl:Restriction .
    ?restriction owl:onProperty ?datatypeproperty .
    ?var rdf:first ?var1 .
    ?var1 xsd:minInclusive ?mval .
    ?x ?datatypeproperty ?val .
    FILTER (?val >= ?mval) .
}

The first difference is the ordering of the triples - the generator isn't very fussy about the order in which triples are generated, so it is different each time (if you are following along at home, your generated query will probably differ from the one shown in the figure, and also from the rule).

The second difference is that the rule includes a triple to match the data itself:

 ?x ?datatypeproperty ?val .

After all, in a rule, we want to say "when some data satisfies this restriction, ...". This clause uses the same variable for the property (?datatypeproperty) as the rest of the query.

The final difference has to do with the constant "21".  The generated query includes the constant, whereas the rule turns it into a variable (?mval) and adds a filter to compare it to the actual data (?val).  After all, the value "21" comes from the model, and shouldn't be built into the rule.
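In the post these edits are made by hand in the SPARQL editor, but they are mechanical enough to sketch. The helper below, `generalize`, is hypothetical (it is not part of TopBraid Composer): it swaps the model constant for a variable, adds the data-matching triple, and returns a FILTER. The `>=` comparison is hard-coded on the assumption that the facet is xsd:minInclusive, and only two of the generated patterns are shown, since the others pass through unchanged.

```python
# Two of the eight generated patterns; the other six pass through unchanged.
GENERATED = [
    ("?restriction", "owl:onProperty", "?datatypeproperty"),
    ("?var1", "xsd:minInclusive", 21),
]


def generalize(patterns, constant, new_var, prop_var="?datatypeproperty"):
    """Hypothetical helper: replace a model constant with a variable, then
    add a data-matching triple and a FILTER comparing the data to it."""
    body = [(s, p, new_var if o == constant else o) for s, p, o in patterns]
    body.append(("?x", prop_var, "?val"))        # match the instance data
    return body, f"FILTER (?val >= {new_var})"   # minInclusive, hence >=


body, filter_clause = generalize(GENERATED, 21, "?mval")
for s, p, o in body:
    print(f"    {s} {p} {o} .")
print("    " + filter_clause)
```

The printed lines match the last four clauses of the rule's WHERE clause; everything that depends on what the rule should *mean* (which constant to lift, which comparison to use) is still a choice the human makes.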

So yes, these modifications have to be made by hand (using the SPARQL editor, where the generator put the query).  The query generator should be seen as a power tool; you still need an operator who knows how to use it, but it simplifies a lot of the heavy lifting for query writing.  In this case, we have a rule with 10 clauses (9 triples and a filter).  The generator created seven of the triples, and most of the eighth one; the human only had to write the last two clauses.  That is, the power tool took care of the "arcane syntax" that Ivan referred to, leaving the human to figure out what they really want the rule to mean.

I use this feature of TopBraid Composer all the time, in this pattern.  I want to write a query that matches some 'arcane' bit of RDF (e.g., from dbpedia, the OWL in RDF standard, the XML DOM, SKOS, etc.). Instead of trying to write a query from scratch, I find (or even build) an example of the thing I want to match.  Then I generate the query - automatically guaranteeing that I didn't leave out any triples, that I got all the namespaces and property names correct, that I didn't accidentally collide bnodes by giving them the same variable name, etc.  Then I beat up the result to create the query that I really want - in which I define what I want to do with the match. 

So when you see an elaborate query with dozens of triples in it, and you wonder what sort of geek can write or maintain such a thing, keep in mind that it might not have been written at all; it might have been generated from an example.

Uncategorized

Extending OWL RL

August 25th, 2010
I've always been a fan of describing OWL in terms of rules. When introducing someone to a new technology, it is nice to be able to describe it simply (a lesson that Facebook taught us again recently). And while it is a bit of a white lie to say that OWL is defined just by a set of rules, it makes it very easy to explain what something in OWL (or RDFS) means, by stating a rule that it follows.

I've actually been using a rule-based definition of OWL for years now, starting back at Intellidimension years ago, and then using OWLIM, and nowadays SPIN. All of these technologies have been 'approximating' OWL for years using variations of Datalog technology - implementing OWL as a set of rules.

While OWL 2's creation of three profiles and a subset hardly counts as keeping the standard simple, I have to say I appreciate the legitimacy that the OWL 2 RL profile has given to a practice that many of us (more than just the ones I have listed) have been doing for years now - of using rule-based systems to process OWL. And the RIF folks have even done us the favor of writing out just what rules OWL 2 RL is made of.

One of the things I have always liked about this approach is the flexibility it gives the system builder in trading off performance vs. expressiveness in the modeling language. You don't need someValuesFrom restrictions? Fine - take those rules out, and speed up the system. I've taken systems from intractable 20-minute response times down to almost instant by fine-tuning the rule system, while still maintaining the same semantics - because my model didn't use the discarded rules.
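The trade-off can be illustrated with a toy forward-chaining engine in which the rule set is just a list that can be trimmed. This is a minimal sketch under stated assumptions, not OWLIM's, SPIN's or any other product's engine: only two OWL 2 RL rules are encoded (cax-sco for rdfs:subClassOf and prp-spo1 for rdfs:subPropertyOf, rather than the someValuesFrom rules mentioned above), inference is a naive fixpoint loop, and `:Agent` and `:hasMeasurement` are hypothetical terms.

```python
def rule_subclass(triples):
    """cax-sco: ?x a ?c1 . ?c1 rdfs:subClassOf ?c2  =>  ?x a ?c2"""
    sub = [(s, o) for s, p, o in triples if p == "rdfs:subClassOf"]
    return {(x, "rdf:type", c2)
            for x, p, c in triples if p == "rdf:type"
            for c1, c2 in sub if c == c1}


def rule_subproperty(triples):
    """prp-spo1: ?p1 rdfs:subPropertyOf ?p2 . ?x ?p1 ?y  =>  ?x ?p2 ?y"""
    sub = [(s, o) for s, p, o in triples if p == "rdfs:subPropertyOf"]
    return {(x, p2, y) for x, p, y in triples for p1, p2 in sub if p == p1}


def infer(triples, rules):
    """Apply every rule repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(closure) - closure
        if not new:
            return closure
        closure |= new


data = {
    (":Adult", "rdfs:subClassOf", ":Person"),
    (":Person", "rdfs:subClassOf", ":Agent"),              # hypothetical class
    (":hasAge", "rdfs:subPropertyOf", ":hasMeasurement"),  # hypothetical property
    (":Person_1", "rdf:type", ":Adult"),
    (":Person_1", ":hasAge", 23),
}

full = infer(data, [rule_subclass, rule_subproperty])
trimmed = infer(data, [rule_subclass])  # subPropertyOf rule taken out

print(len(full), len(trimmed))
# prints: 8 7
```

Removing `rule_subproperty` shrinks the work done on every pass. Here the model does use rdfs:subPropertyOf, so one inference is lost; the point above is that when the model never triggers a rule, dropping it leaves the answers unchanged.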

But today I want to talk about another advantage of this approach - that you can extend your model semantics as well. Suppose there is something in OWL-Full that you want to use, but it doesn't appear in the OWL 2 RL list of rules? What can you do about it? You could switch approaches, and use another style of reasoner, but then you lose the advantage of being able to tune your rule base. Another approach is to encode just the extensions that you want in rules.

Let's take a simple example of this, using SPIN as our rule language. You can follow along yourself if you like - all you need is the Free Edition of TopBraid Composer.

OWL-Full allows something called Data Range Expressions, in which you can define a range to be a set of values. A simple example of this is the notion of Adult, that is, a person who has an age greater than or equal to 21. An example of a model with this definition can be found at http://www.workingontologist.org/Examples/adult.rdf.

You can import this file into TopBraid Composer by right-clicking on the TopBraid project, selecting "Import RDF or OWL File from the Web" and pasting in the URL of the model, http://www.workingontologist.org/Examples/adult.rdf (see first figure). 

[Figure: importing adult.rdf from the Web]

Open the file adult.rdf by double-clicking it, then expand owl:Thing to see the ontology. Click on "Adult" to see its definition - a Person who hasAge only values greater than or equal to 21 (see second figure).

[Figure: the definition of Adult]

Notice that there are also three instances of the class Person - with ages 23, 18 and 45. Evidently, two of these are adults, and one is not.

[Figure: the three instances of Person]

Now we run SPIN inferences (by pressing the Inference button), and we see that indeed just the people of appropriate age are classified as Adults.

[Figure: the inferred Adults]

How did this work?

SPIN works by expressing the rules for OWL in SPARQL. Thanks to the RIF effort, mentioned above, we at TopQuadrant were able to write out all the OWL 2 RL rules in SPIN (since SPARQL has the same expressive power as RIF). This example simply imported these rules from http://topbraid.org/spin/owlrl-all. The SPIN inferencer finds these rules, and executes them when you press the Inference button. We can see one of these rules in the following figure - it is a familiar rule, telling us how rdfs:subPropertyOf works.

[Figure: the SPIN rule for rdfs:subPropertyOf]

But that doesn't explain the whole thing - if you know OWL 2 RL well, you know that Data Range Expressions are not part of the OWL 2 RL profile. There are good technical reasons why they were left out, but that doesn't keep us from wanting to do these inferences. So we express them in SPARQL and add them to our rule set for the SPIN inferencer to work on. One such rule is shown in the next figure:

[Figure: the SPIN rule for xsd:minInclusive datatype restrictions]

Most of the rule matches the RDF rendition of the OWL data restriction. It matches restrictions of xsd:integer, where all the values come from the set defined by minInclusive for some value (in our case, 21). When all these things match, then we assert that the instance is a member of the restriction.

So in the case of :Person_1, who is 23 years old, the property :hasAge matches the variable ?datatypeproperty, 21 matches the variable ?mval, and the actual age 23 matches the variable ?val. Since 23 >= 21, the filter ?val >= ?mval holds, and the rule matches. Hence :Person_1 is a member of the restriction, and by the rest of the OWL 2 RL rules, is an :Adult.
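The walk-through above can be checked with a small Python sketch that hand-translates the rule's WHERE clause for this one restriction shape; it is not a SPARQL engine. The ages 23, 18 and 45 come from the model, but the names `:Person_2` and `:Person_3` and the blank-node labels are assumptions.

```python
TRIPLES = [
    ("_:r", "rdf:type", "owl:Restriction"),
    ("_:r", "owl:onProperty", ":hasAge"),
    ("_:r", "owl:allValuesFrom", "_:dt"),
    ("_:dt", "rdf:type", "rdfs:Datatype"),
    ("_:dt", "owl:onDatatype", "xsd:integer"),
    ("_:dt", "owl:withRestrictions", "_:list"),
    ("_:list", "rdf:first", "_:facet"),
    ("_:facet", "xsd:minInclusive", 21),
    (":Person_1", ":hasAge", 23),
    (":Person_2", ":hasAge", 18),  # hypothetical name for the 18-year-old
    (":Person_3", ":hasAge", 45),  # hypothetical name for the 45-year-old
]


def objects(triples, subject, predicate):
    """All ?o such that (subject predicate ?o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def min_inclusive_rule(triples):
    """Hand-translation of the rule; returns the inferred (?x a ?restriction) triples."""
    inferred = []
    for restriction, p, cls in triples:
        if p != "rdf:type" or cls != "owl:Restriction":
            continue
        for datatype in objects(triples, restriction, "owl:allValuesFrom"):
            if "rdfs:Datatype" not in objects(triples, datatype, "rdf:type"):
                continue
            if "xsd:integer" not in objects(triples, datatype, "owl:onDatatype"):
                continue
            for prop in objects(triples, restriction, "owl:onProperty"):
                for lst in objects(triples, datatype, "owl:withRestrictions"):
                    for facet in objects(triples, lst, "rdf:first"):
                        for mval in objects(triples, facet, "xsd:minInclusive"):
                            # ?x ?datatypeproperty ?val . FILTER (?val >= ?mval)
                            for x, p2, val in triples:
                                if p2 == prop and val >= mval:
                                    inferred.append((x, "rdf:type", restriction))
    return inferred


for x, _, restriction in min_inclusive_rule(TRIPLES):
    print(x, "a", restriction)
# prints:
# :Person_1 a _:r
# :Person_3 a _:r
```

Only :Person_1 and :Person_3 are inferred to be members of the restriction; with the OWL 2 RL rules also loaded, that membership is what lets the inferencer classify them as :Adults.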

This approach to OWL gives a lot of control to the modeler; they can use standard models (like the OWL 2 RL model we used here), but they can also augment this reasoning with new rules that do just as much inferencing as is needed for the application. These new rules can be consistent with the standard OWL-Full rules, or they could even be domain-specific business rules. In any case, the power lies in the hands of the modeler. In the particular case of SPIN, we have the added advantage that the modeler can write these rules in the standard SPARQL language.

Uncategorized

Miranda McKearney talks with Talis at the Summer Reading Challenge launch

July 22nd, 2010

In this podcast, Sarah Bartlett talks with Miranda McKearney, the Founder Director of The Reading Agency, at the launch of the Summer Reading Challenge 2010 at The House of Commons. The Summer Reading Challenge is underpinned by a strong belief in the public library ethos and the ideal of equal access to reading opportunities. In the podcast we discuss the origins of the Summer Reading Challenge, The Reading Agency’s biggest and most successful model of reader development. Miranda explains how the agency arrives at a compelling reading theme every year that will engage children and facilitate a broad range of partnerships. This year’s theme, Space Hop, will enable libraries and schools to partner with the scientific domain, and is also designed to encourage boys to read. Miranda discusses other important hard-to-reach groups of children, emphasising that priorities will vary locally. Ultimately, the success of the Challenge depends on the school–librarian partnership, and Miranda stresses how important it is for schools to recognise the value of reading for pleasure. Miranda outlines the proven positive outcomes of involvement in the Challenge in terms of reading attainment and motivation levels. Finally we discuss the prospects of ongoing funding for the Summer Challenge.

English, Podcast, Public, Reading, Summer Reading Challenge, Uncategorized

Ed Vaizey pays homage to the Summer Reading Challenge at this year’s launch

July 20th, 2010

As co-sponsors of this year’s Summer Reading Challenge, a number of us here at Talis made our way to the House of Commons yesterday for the launch event. The Summer Reading Challenge is one of those initiatives that everyone loves, and it’s a privilege for Talis to be associated with something that has such broad and valuable outcomes.

In case you’re not aware of it, and as Miranda McKearney, Director of The Reading Agency, explained in the main address, The Summer Challenge is essentially very simple – children across the UK are challenged to read six books over the summer holidays. In what was a clarion call for the retention of reader development activities in public libraries in the current cost-cutting climate, Miranda emphasised the research that has repeatedly demonstrated tangible outcomes of the Challenge in terms of the reading levels, range and motivations of the increasing numbers of 4-11 year olds who take part every year.

Whilst Ed Vaizey, Minister of Culture, made ideological overtures about the Big Society flavour of the Summer Reading Challenge, he was clearly deeply impressed with its successes, as were all the speakers at the event. Around 750,000 children took part last year, and of these, 413,000 completed the challenge, involving 95% of libraries, and resulting in 47,000 new library members. And in case you’re wondering, there were 20 million loans of children’s materials, and 3 million books read as direct outcomes of the challenge.

To complement this quantitative view, his colleague Don Foster from the Liberal Democrat party testified that the reading habits of his then-8 year old grandson were transformed by the Challenge last year, and he is now the kind of boy who reads after bedtime with a torch under the covers, to the astonishment of his parents who had been deeply concerned about his disinclination to read.

On a more sobering note, Alan Davey from The Arts Council reminded us that beneath the statistic that 60% of the population read regularly for pleasure, lies a less comfortable reality that 40% of us don’t. As a long-term supporter of The Reading Agency, he concurs that encouraging the young to read is crucial.

Anne Sarrag from The Reading Agency took us back to the inception of The Summer Reading Challenge 11 years ago, round Miranda’s kitchen table, as the legend goes. Anne affirmed that children’s libraries are integral to The Summer Reading Challenge, which really operates as a big team, with The Reading Agency as a catalyst, and librarians customising it and prioritising partnerships to local needs. Its relationship to Big Society becomes clear at this point, and is driven home further by a wave of volunteer effort entering the Challenge this year as large numbers of young adults apply to volunteer in libraries over the summer holidays to encourage children’s reading. Many of them, according to Anne, took part themselves when they were younger, which is testament to the power of the Challenge.

Recently in CILIP Gazette, a French public librarian working on placement here praised the range of reader development initiatives in which British public libraries engage. The Summer Reading Challenge is a piece of good news that just gets better and better, as its participation rates improve every year, and its significance is validated by research such as the OECD Reading For Change report, cited by Anne, which demonstrates that reading for pleasure is essential to children’s life chances. And I’m sure we can all agree that it’s everyone’s responsibility to encourage reading – not just schools’ – in a fun and enjoyable way. The Challenge’s fundamental objective, as Anne pointed out, is to give libraries and teachers the book knowledge, confidence and understanding to implement initiatives and learning opportunities that develop young readers. Specifically, it helps to reduce what has become known as the “summer reading dip”, a common phenomenon in which emergent readers return to school in September, having lost the input of the school over the holidays, and struggle to regain their previous level of attainment.

Talis is proud to be supporting The Summer Reading Challenge, which builds both enjoyment of reading and a relationship with libraries. For many of us, it’s the embodiment of librarianship and the reason why we originally entered the profession. Long may it thrive.

Find out more…

These podcasts were recorded at the event and offer further insights:

English, Public, Reading, Summer Reading Challenge, Uncategorized

2010 top ten trends in academic libraries

June 30th, 2010

The US-based ACRL Research, Planning and Review Committee has produced a valuable 2010 top ten trends in academic libraries report, with plenty of relevance for UK librarians, as well as interesting insights into the US higher education sector. The trends are identified via a rigorous methodology which incorporates a literature review (the report incorporates an impressive array of recent industry sources) and a limited survey “to clarify the trends”.

Space oddity

One key trend is the challenge of the constant rebalancing of library physical and virtual space, noting, interestingly, that “in-person reference desk statistics are declining in many academic libraries, while online reference statistics are increasing”. The report points to the expansion of library virtual presence through course management and other institutional systems as well as social networking tools, and reminds us of the overarching need to “support the teaching and instruction mission of the university”.

A sea change in collection development

The report makes the point that academic library collection growth is now driven by user demand, in other words we’ve shifted from a “just in case” to a “just in time” approach.  Is “just in case” in fact integral to the library mission? Only time will tell how susceptible this makes the academic library to further disintermediation, facilitated by:

“… customized patron-driven acquisition programs from some major library book distributors, improved print-on-demand options for monographs, patron desire for new resource types, and resource sharing systems, such as RapidILL, offering 24-hour turnaround time for article requests.”

The report further acknowledges:

“Still to be determined are the long-term effects of this change on the ability of academic libraries to meet their clientele’s information needs, the stability of some of the new access methods, and implications for future scholarship.”

Digital data set management: a grower

A subset of this trend towards user-driven collection development is “the need to collect, preserve, and provide access to digital datasets”. A 2009 OCLC report is cited to make the point that libraries need to support discovery in this area, and the report notes that the 2010 Horizon Report identified visual data analysis tools as a technology trend on the 4-5 year horizon. Digitisation more generally is a trend in its own right, and the report warns that it will require a larger share of resources in future. On the upside, the Coalition for Networked Information makes the point that the academic library has a real opportunity with the digitisation of special collections – “a nexus where technology and content are meeting to advance scholarship in extraordinary new ways”.

More mobile

The explosive growth of mobile devices is a standalone trend, alongside a more general Technology section. Again, the report brings us back to the need to consider not only user needs and preferences but also “the relationship of services to the academic program of their institution”.

Accountability and assessment

A particularly significant trend, in my view, is the increasing need for accountability and assessment, i.e. the library demonstrating the value provided to users and the broader institution, specifically:

“… the library’s impact on student learning outcomes, student engagement, student recruitment and retention, successful grant applications, and faculty research productivity.”

Bad moon rising

The report makes the points that you would expect about budget challenges, but there are some interesting vignettes around the US higher education sector generally, notably this, sourced from Chronicle of Higher Education:

“… the average return for college and university endowments in the 2009 fiscal year was -18.7%, the worst since 1974.”

Importantly though, the report doesn’t dwell on this unfortunate reality, and really does accentuate the positive. One area of optimism highlighted was increased opportunities for collaboration, the epitome of the service orientation of librarianship, as the report correctly notes:

“Collaboration efforts will continue to diversify: collaborating with faculty to integrate library resources into the curriculum and to seek out information literacy instruction, and as an embedded librarian; working with scholars to provide access to their data sets, project notes, papers etc. in virtual research environments and digital repositories.”

And another real area of opportunity is scholarly communications:

“Recent developments illustrate a trend toward proactive efforts to educate faculty and students about authors’ rights and open access publishing options and to recruit content for institutional repositories (IRs).”

The report urges academic librarians to provide value-added intellectual property services, and there are some really interesting US exemplars highlighted:

“Some libraries have created scholarly communication librarian or copyright officer positions. Others have taken a more distributed approach. The University of Minnesota, for example, has included scholarly communication responsibilities in the position descriptions of all of its liaison librarians.”

The ACRL Research, Planning and Review Committee is to be commended for such a lucid report that is concise enough for everyone to read in full.

English, Uncategorized

Semantic Technologies monthly review. May 2010

June 1st, 2010
Activity related to semantic technologies continued during May. Among new uses of Semantic Technologies, we can highlight some relevant news: Jini is building a smart ‘Taste Engine’ for Google TV, the U.S. Defense Intelligence Agency selected Attensity software as a text extraction solution for mission-related work, Thomson Reuters has added a financial video service, or [...]

English, Spanish, Uncategorized

Talis Open Day: Linked Data and Libraries

June 1st, 2010

Register to reserve your place for the latest in the series of free Talis Platform Open Days, which is specifically for anyone interested in understanding and applying Linked Data in the world of national, international, cooperative and other large libraries.

Talis Open Day: Linked Data and Libraries
10:00 – 16:00 – Wednesday 21st July 2010
British Library Conference Centre
St Pancras
London

Register for the event from the Platform events page.
Location Information, from the British Library.

These Open Days are designed to introduce you to the principles, practice and potential of Linked Data.  Included is a short tutorial on RDF and the SPARQL query language, pitched at a level which will engage the technical and inform the non-technical attendees.

Linked Data is being adopted by many significant organisations across the web.  data.gov.uk and the BBC are just two that are working with Talis on applying Linked Data Semantic Web techniques and technologies.   As can be seen from the provisional agenda below, this day will (in addition to addressing general Linked Data issues) be covering leading library specific initiatives in this area.

AGENDA

  • Introduction to Linked Data
  • Overview of the Talis Platform
  • The BnF Pivot project – Emmanuelle Bermes, Bibliothèque nationale de France
  • W3C Library Linked Data Incubator Group
  • RDF/SPARQL tutorial
  • Bibo – The Bibliographic Ontology
  • Finding Semantic Relationships in MARC
  • Linked Data in action

This is an ideal free day for those wanting an insight into the potential and the practicalities of applying Linked Data to library data.  Follow this page as we announce more speakers for each of the sessions.

English, Uncategorized

Forecast about Semantic Technology market in Europe

April 22nd, 2010
Value-it preliminary results. In this and the next few posts we are going to show data obtained in the Value-it project. These results come from market research comprising 50 in-depth interviews and a poll, carried out at the end of 2009, of 625 directors, managers and people involved in the process of IT [...]

English, Europe, Spanish, Uncategorized, Value-it project, forecast, market, semantic technologies

Semantic technologies to manage ideas

April 13th, 2010
Semantic Technologies can be useful for managing ideas, improving the performance of trends such as Open Innovation. As discussed in previous posts (Taxonomies vs Semantic Technologies to improve open innovation: Some data, Taxonomies vs Semantic Technologies to manage internal information: Some data), we carried out some tests to measure the impact of semantic technologies in the [...]

English, Spanish, Uncategorized

Understanding the Semantic Web – 1

March 24th, 2010

The American library technology commentator Karen Coyle has produced an ambitious report Understanding the Semantic Web: Bibliographic Data and Metadata under the auspices of the American Library Association. In our increasingly open world, it was galling to have to pay $43 for the privilege of reading it, and especially ironic given that the Semantic Web delivers its value exponentially according to the amount of linked data made available to it, a point that is not lost on Karen Coyle in relation to library data.

In this first chapter, Karen sets the scene, providing rich historical context in terms of a history of cataloguing, and thus builds up an irresistible argument as to why libraries need to embrace the semantic web.

What is metadata?

It does no harm to define metadata, and these three points provide a useful starting point:

  1. It’s constructed – it is fundamentally artificial.
  2. It’s constructive – it is purposeful.
  3. It’s actionable – it should be possible to act on the metadata in some way.

Even more useful, though, is the example Coyle uses to show good metadata in action, the subway map, of which she says: “If you were to superimpose this map over the city it represents, you’d find that the subway isn’t ‘true’, in the sense that it is neither to scale nor are the stations located where they would be on a map based on longitude and latitude.”

She continues:

And yet they perform their job incredibly well, to the point that one can arrive in a city for the first time, perhaps even with only a limited understanding of the local language, and find one’s way. These maps are a good example of functionality in metadata.

Karen then develops the idea by comparing the old-style inert paper map with one that has “machine-actionable metadata” behind it, which has the effect of enabling users to reuse it in unforeseeable ways.

Historical evolutions

Karen explains that the basic functions of bibliographic metadata have extended over time, in response to related changes in the catalogue’s context.

The sharing of cataloguing between libraries has a surprisingly long lineage. In the nineteenth century, libraries apparently used to exchange their printed book catalogues, sometimes for a charge. The industrial revolution was accompanied by a dramatic increase in printed publications, and the card catalogue came about at this time. This proved to be a mixed blessing, because although it was easier to update, it lost the ability to be accessed remotely, until the dawning of the database era a century later.

Over the course of the twentieth century, the library underwent transformative growth, new technologies were introduced to meet the needs that arose from that growth, and library management became more complex, until we get to today’s situation where, as Karen illustrates, we have a “need to filter one’s retrieved set by language in order to reduce the number of items retrieved from thousands to ‘only’ three or four hundred.” Functional augmentation such as faceting and ranking results has, as Karen puts it, “put pressure on the catalog record, pushing it to perform functions it was not consciously designed to do”. This strain has been compounded by the merging of diverse back-office catalogues, such as the serials check-in records, into what we know today as the library management system.

And Karen is perfectly correct to remind us that information overload predates the Internet by almost half a century. The post-war boom led to an explosion of research activity, and new retrieval mechanisms, such as the citation database, were invented to help people navigate through the morass of papers written.

Whose metadata is it anyway?

Despite all these innovations though, one incontrovertible truth remained in place – the separation of library data from data in other domains. It now needs to be an integral “part of the dominant information environment that is the web.” As Coyle emphasises, that is where library users are, so it’s where the library needs to be.

The important question now is: how can the library catalog move from being ‘on the Web’ to being ‘of the Web’? The linked data technology that has developed out of the Semantic Web provides an interesting path to follow. It is specifically designed to facilitate the sharing of information on the Web, much in the same way that the Web itself was developed to allow the sharing of documents. The library must become intertwined with that rich, shared, linked information space that is the Web. Rather than creating data that can be entered only into the library catalog, we need to develop a way to create data that can also be shared on the Web. This requires that we expand the context for the metadata that we create.

Coyle notes the overlap in content between the library and the Web, which is as yet extremely under-exploited, citing the simple example that “‘Herman Melville’ and the fact that he wrote Moby Dick are facts that are not limited to the data in library catalogs…”
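To make Coyle’s point concrete, here is a minimal sketch of how the Melville fact might be written as RDF triples in the N-Triples format, so that it is no longer confined to a catalog record. The example.org URIs are hypothetical placeholders, and the choice of dcterms:creator and rdfs:label is an illustrative assumption, not a description of any particular library’s data.

```python
# Hypothetical identifiers: example.org URIs stand in for whatever URIs a
# library or shared authority file would actually mint.
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

MELVILLE = "http://example.org/person/herman-melville"
MOBY_DICK = "http://example.org/work/moby-dick"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def ntriple(subject, predicate, obj, literal=False):
    """Serialize one triple as a single N-Triples line."""
    if literal:
        obj_term = '"%s"' % escape_literal(obj)
    else:
        obj_term = "<%s>" % obj
    return "<%s> <%s> %s ." % (subject, predicate, obj_term)


# The catalog fact, expressed as shareable statements about web identifiers.
triples = [
    ntriple(MOBY_DICK, DCTERMS_CREATOR, MELVILLE),
    ntriple(MELVILLE, RDFS_LABEL, "Herman Melville", literal=True),
    ntriple(MOBY_DICK, RDFS_LABEL, "Moby Dick", literal=True),
]

if __name__ == "__main__":
    print("\n".join(triples))
```

Because both the author and the work get their own URIs, anyone else on the Web can reuse or link to the same identifiers rather than re-keying the fact.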

She has set up a context that is both broad and deep for chapter 2, in which she will consider the Semantic Web in much greater detail.

English, Libraries, Metadata, Semantic Web, Uncategorized, Understanding the Semantic Web

Semantic Technologies Monthly Review. February 2010

March 4th, 2010
A new month, shorter than others in length but not in news intensity, at least as far as semantics is concerned. Related to Knowledge Management, we find several pieces of news about companies that work with semantic technologies. For instance, Empolis, an Attensity Group company and leading provider of business user applications that generate value from unstructured data, has [...]

English, Spanish, Uncategorized, monthly review, semantic technologies

HTML5 and Semantics

February 25th, 2010
Author: José Manuel Cantera Fonseca, Telefónica I+D
HTML4, the language of the Web, is intended to define the content of a web page from a structural and presentational point of view, but not from a semantic one. For instance, in HTML4 a <table> element can be used to present information about different entities such [...]

English, Spanish, Uncategorized

Semantic Technologies can be profitable, of course

January 26th, 2010
In recent posts we have reviewed the present situation of Semantic Technologies for enterprises from several points of view: providers, technologies and demand. Undoubtedly we can claim that these technologies are mature enough to go to market: there is a large number of providers with interesting solutions, and there is demand for these technologies in lots [...]

English, Spanish, Uncategorized

Value-it. First deliverables main results. Demand Driven Report (Key findings)

January 22nd, 2010
The Value-it “Demand Driven Report” is very rich in data, conclusions and results, and is undoubtedly a reference document for people who want to know about the possibilities of Semantic Technologies. But a hundred-page document is perhaps too much for people who only want a high-level picture. For this reason it is worth highlighting the [...]

English, Spanish, Uncategorized, demand, semantic technologies

Value-it. First deliverables main results. Technovision Report

January 15th, 2010
This document, which you can download here, tries to bring us closer to the STE supply vision. To achieve this, the TechnoVision Report provides a vision of the STE products, services, technologies, and expertise that play, and will play, a key role in configuring and further consolidating the supply of STE products and services from the [...]

English, Spanish, Uncategorized, Value-it project, semantic technologies, supply side, technovision

Pew Research investigates the Internet in 2020

January 8th, 2010

I found this survey via an O’Reilly blog post. Some of the questions are quite trivial, but Pew also asks about the impact of the Semantic Web in 2020.

Take your chance: if you’d like to take the survey, you can currently visit http://survey.confirmit.com/wix2/p1075078513.aspx and enter PIN 2000.

English, Uncategorized

Thoughts on Enterprise Linked Data

December 27th, 2009

There have been a number of discussions about “Enterprise Linked Data” recently, and I took part in a panel on precisely that topic at ESTC 2009. Unfortunately the panel was cut short due to time pressures, so I didn’t get the chance to say everything I’d hoped. In lieu of that debate, here’s a blog post containing a few thoughts on the subject.

When we refer to enterprise use of Linked Data, there are a number of different facets to the discussion which are worth highlighting. In my opinion the issues and justifications relating to each of them are quite different; so different, in fact, that we’re in danger of having a confused debate unless we tease out these different aspects.

Aspects of the Debate

In my view there are three facets to the discussion:

  • Publishing Linked Data, the key question here being: what does an enterprise stand to gain by publishing Linked Data?
  • Consuming Linked Data: what does an enterprise stand to gain from consuming Linked Data?
  • Adopting Linked Data: what benefits can an enterprise gain by deploying Linked Data technologies internally?

I think these facets, whilst obviously closely related, are largely orthogonal. For example, I could see a scenario in which an organization consumed Linked Data but didn’t store or use it as RDF, instead just feeding it into existing applications. Similarly, businesses could clearly adopt Linked Data as a technology without publishing or consuming any data on the web at all.

These issues are also largely orthogonal to the Open Data discussion: an enterprise might use, consume and publish Linked Data but this might not be completely open for others to reuse. The data may only be available behind the firewall, amongst authorised business partners, or only available to licensed third-parties. So, while the question of whether to publish open data is a very important aspect of the discussion, it’s not a defining one.

Here are a few thoughts on each of these different facets.

Publishing Linked Data

So why might an enterprise publish Linked Data? And if that is a worthwhile goal, is it clear how to achieve it? Let’s tackle the second question first, as it’s the simpler.

There is an increasingly large amount of good advice available online, as well as tools and applications, to support the publishing of Linked Data. We’re making good strides in the important transition of Linked Data out of the research arena and into the hands of actual practitioners. The How to Publish Linked Data on the Web tutorial is a great resource, but to my mind Jeni Tennison’s recent series on publishing Linked Data is an excellent end-to-end guide full of practical advice.

We can declare victory when someone writes the O’Reilly book on the subject and does for Linked Data what RESTful Web Services did for REST. (And the two would make great companion pieces.)

But technology issues aside, what are the benefits to an organization in publishing Linked Data? There are several ways to approach answering that question but I think in most discussions Linked Data tends to get compared with Web APIs. The value of creating an API is now reasonably well understood, and many of the benefits that come from opening data through an API also apply to Linked Data.

However, the argument that Linked Data married with a SPARQL endpoint is as easy for developers to use as a Web API is still a little weak at this stage. SPARQL can be off-putting for developers used to simpler, more tightly defined APIs. As a community we ought to consider it a power tool and look for ways to make it easier to get started with. It’s also worth recognising that a search API is a useful addition to a SPARQL endpoint as part of a Linked Data deployment.
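To give a feel for what “power tool” means in practice, here is a minimal sketch of the client side of a SPARQL query: building a SPARQL Protocol GET request with the query in the `query` parameter and asking for the JSON results format, then flattening the JSON bindings into plain rows. The endpoint URL and the sample results are hypothetical, and the request is only built, not sent, so the sketch runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in data["results"]["bindings"]]


# A made-up response in the SPARQL JSON results format.
SAMPLE_RESULTS = """{"head": {"vars": ["s"]},
 "results": {"bindings": [{"s": {"type": "uri", "value": "http://example.org/a"}}]}}"""

if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
    print(req.full_url)
    print(bindings_to_rows(SAMPLE_RESULTS))
```

A search API would typically hide both steps behind a single keyword parameter, which is part of why it makes a useful companion to the endpoint.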

But publishing Linked Data can’t be directly compared to just creating an API, because it’s also largely a pattern for web publishing in general. It’s increasingly easy to instrument existing content management systems to expose RDF(a) and Linked Data. So rather than create a custom API, which will involve expensive development costs, particularly if it’s going to scale, it’s possible to simply expose Linked Data as part of an existing website.
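As an illustration of exposing RDFa from an existing CMS, here is a minimal sketch of a page template that emits ordinary HTML while also carrying RDFa 1.1 attributes (`prefix`, `about`, `property`). The record, its URI and the choice of Dublin Core terms are hypothetical; a real CMS would plug equivalent markup into its own templating layer.

```python
from html import escape

# Hypothetical CMS record; the URI scheme would be the publisher's choice.
record = {
    "uri": "http://example.com/id/report/2009-annual",
    "title": "Annual Report 2009",
    "date": "2009-12-27",
}

# The same markup serves human readers and RDFa-aware crawlers.
TEMPLATE = (
    '<div prefix="dc: http://purl.org/dc/terms/" about="{uri}">\n'
    '  <h1 property="dc:title">{title}</h1>\n'
    '  <p>Published <span property="dc:date">{date}</span></p>\n'
    '</div>'
)


def render_with_rdfa(rec):
    """Render a CMS record as HTML that also carries RDFa 1.1 statements."""
    return TEMPLATE.format(
        uri=escape(rec["uri"]),
        title=escape(rec["title"]),
        date=escape(rec["date"]),
    )


if __name__ == "__main__":
    print(render_with_rdfa(record))
```

The page still renders as before in a browser; the extra attributes simply let a parser extract triples such as (report, dc:title, "Annual Report 2009").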

By following the Linked Data pattern for web publishing, in particular the use of strong identifiers, an enterprise can end up with a single point of presence on the web for publishing all of its human and machine-readable data, resulting in a website that is strongly Search Engine Optimised. Search engines can better crawl and index well structured websites and are increasingly ingesting embedded RDFa to improve search results and rankings. That’s a strong incentive to publish Linked Data by itself.
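One common way to give a single identifier both human- and machine-readable representations is HTTP content negotiation: the server inspects the request’s Accept header and serves HTML to browsers and RDF to data clients (in the “Cool URIs” pattern, via a 303 redirect to the chosen document). The sketch below covers only the selection step, with a simplified Accept parser and a hypothetical list of supported media types.

```python
# Media types this hypothetical site can serve for a resource, default first.
SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle"]


def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media, q = pieces[0].lower(), 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media:
            prefs.append((media, q))
    # sorted() is stable, so equal q values keep the client's order
    return sorted(prefs, key=lambda pair: -pair[1])


def choose_representation(accept_header, supported=SUPPORTED):
    """Pick the best representation of one identifier for this request."""
    if not accept_header:
        return supported[0]
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media == "*/*":
            return supported[0]
        if media.endswith("/*"):
            for candidate in supported:
                if candidate.startswith(media[:-1]):
                    return candidate
        elif media in supported:
            return media
    # Design choice: fall back to HTML rather than answering 406.
    return supported[0]
```

A browser sending `text/html,...` gets the page, while an RDF client asking for `application/rdf+xml` gets data about the same identifier, which is what gives the site its single point of presence.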

Adopting Linked Data, particularly as part of a reorganization of an existing web presence, could deliver improved search engine rankings and exposure of content whilst saving on the costs of developing and running a custom API. The longer term benefits of being part of the growing web of data can be the icing on the cake.

Consuming Linked Data

Next we can consider why an enterprise might want to consume Linked Data.

To my knowledge, organizations are currently only publishing Linked Open Data (albeit with some wide variations in licensing terms), so for the present we’ll set aside the question of whether enterprises have the option of consuming non-open Linked Data, e.g. as part of a privately licensed dataset.

The LOD Cloud is still growing and provides a great resource of highly interlinked data. The main issues that face an organization consuming this data are ones of quantity (there’s still a lot more data that could be available); quality (how good is the data, and how well is it modelled); and trust (picking and choosing reliable sources).

To some extent these issues face any organization that begins relying on a third-party API or dataset. However at present a lot of the data in the LOD cloud is still from secondary sources. The same can’t be said for the majority of web APIs, which tend to be published by the original curators of the data.

These issues should resolve themselves over time as more primary sources join the LOD cloud. Because Linked Data is all based on the same data model, bulk loading and merging data from external sources is very simple. This gives enterprises the option of creating their own mirrors of LOD data sources, which provide some additional reassurance around stability and longevity.
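The point about a shared data model can be shown in a few lines: if triples are treated as (subject, predicate, object) tuples, merging a local graph with a mirrored external one is essentially a set union. This is a minimal sketch with hypothetical URIs; it assumes no blank nodes, which in real RDF must be renamed apart before graphs are combined.

```python
# Triples are (subject, predicate, object) tuples; all URIs are hypothetical.
# Caveat: real RDF merging renames blank nodes apart first; this sketch
# assumes the graphs contain only URIs and literals.
EX = "http://example.org/"


def merge_graphs(*graphs):
    """Merge any number of graphs; duplicate statements collapse naturally."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


local = {(EX + "a", EX + "sameRegion", EX + "b")}
mirror = [
    (EX + "a", EX + "sameRegion", EX + "b"),  # already known locally
    (EX + "b", EX + "label", "B"),            # new from the external source
]
merged = merge_graphs(local, mirror)
```

No schema mapping is needed before the union, which is what makes mirroring external LOD sources cheap.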

Linked Data, with its reliance on strong identifiers, is much easier to navigate and process than other sources, even if you’re not storing the results of that processing as RDF. There’s also a much greater chance of serendipity, resulting in the discovery of new data sources and new data items. By contrast, there is virtually no serendipity with a Web API, as each API needs to be explicitly integrated.
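The serendipity described above comes from “following your nose”: dereferencing a URI, then following the URIs that appear in the data to reach new resources, possibly in other datasets. A minimal breadth-first sketch follows; the in-memory `WEB` dictionary is a hypothetical stand-in for HTTP dereferencing, and treating any http(s) string as a URI is a simplification.

```python
from collections import deque

# Stand-in for the web: maps a URI to the triples returned by dereferencing
# it. All URIs are hypothetical; a real crawler would fetch them over HTTP.
WEB = {
    "http://example.org/id/a": [
        ("http://example.org/id/a", "http://xmlns.com/foaf/0.1/knows",
         "http://example.org/id/b"),
    ],
    "http://example.org/id/b": [
        ("http://example.org/id/b", "http://www.w3.org/2000/01/rdf-schema#label", "B"),
        ("http://example.org/id/b", "http://www.w3.org/2002/07/owl#sameAs",
         "http://other.example.net/thing/b"),
    ],
    "http://other.example.net/thing/b": [],
}


def is_uri(term):
    # Simplification: http(s) strings are URIs, anything else is a literal.
    return isinstance(term, str) and term.startswith(("http://", "https://"))


def follow_your_nose(start, fetch, max_resources=100):
    """Breadth-first crawl: dereference URIs and follow URI-valued objects."""
    seen = {start}
    queue = deque([start])
    triples = set()
    while queue:
        uri = queue.popleft()
        for s, p, o in fetch(uri):
            triples.add((s, p, o))
            if is_uri(o) and o not in seen and len(seen) < max_resources:
                seen.add(o)
                queue.append(o)
    return seen, triples


visited, found = follow_your_nose("http://example.org/id/a",
                                  lambda uri: WEB.get(uri, []))
```

Starting from one identifier, the crawl reaches a resource in a second, previously unknown dataset without any integration code specific to that source.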

But this benefit is only going to become evident if we continue to put effort into helping (enterprise) developers understand how to consume Linked Data. Showing how to do so, e.g. as part of existing frameworks or using new data integration patterns, is another area that needs more attention. The Consuming Linked Data tutorial at ISWC 2009 was a good step in that direction, although the message needs to be circulated more widely, outside the core semantic web community.

In my opinion it will be easier for enterprises to consume Linked Data if they first begin to publish it. By publishing data they put their identifiers out into the wild. These identifiers become points for annotation and reuse by the community, creating liminal zones from which the enterprise can harvest and filter useful data. I think this benefit is unique to Linked Data: with a Web API the end results are typically mashups or widgets displayed in a third-party application, which are just new silos one step removed from the data publisher.
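Harvesting from those liminal zones might amount to filtering third-party data down to the statements that mention the enterprise’s own identifiers. A minimal sketch, assuming a hypothetical namespace `http://data.example.com/id/` for the enterprise’s URIs, hypothetical third-party data, and an optional whitelist of predicates as a crude quality filter:

```python
# Hypothetical namespace under which the enterprise publishes identifiers.
OUR_NS = "http://data.example.com/id/"
REVIEWS = "http://example.net/vocab/reviews"  # hypothetical third-party predicate


def harvest(third_party_triples, namespace=OUR_NS, allowed_predicates=None):
    """Keep third-party triples about, or pointing at, our identifiers."""
    kept = []
    for s, p, o in third_party_triples:
        about_us = s.startswith(namespace)
        points_at_us = isinstance(o, str) and o.startswith(namespace)
        if not (about_us or points_at_us):
            continue
        if allowed_predicates is not None and p not in allowed_predicates:
            continue
        kept.append((s, p, o))
    return kept


community = [
    ("http://reviews.example.net/r/1", REVIEWS, OUR_NS + "product/42"),
    (OUR_NS + "product/42", "http://www.w3.org/2000/01/rdf-schema#comment", "Works well"),
    ("http://reviews.example.net/r/2", REVIEWS, "http://elsewhere.example.org/p/9"),
]
```

The third statement is dropped because it never mentions the enterprise’s identifiers, which is exactly why published URIs make the filtering step straightforward.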

Adopting Linked Data

Finally, what value could be gained if an organization adopts Linked Data internally as a means to manage and integrate data behind the firewall?

The issues and potential benefits here are largely a mixture of the above, except that there are few or no issues with trust, as all of the data comes from known sources. In a typical enterprise environment, Linked Data as an integration technology will be compared to a wider range of systems, from integrated developer tools through to middleware. There’s a reason why SOAP-based systems are still widely used in enterprise IT: most organizations aren’t (yet?) internally organized as if they were true microcosms of the web.

It’s interesting to see that Linked Data can potentially provide a means of solving many of the issues that Master Data Management is trying to address. Linked Data encourages strong identifiers, clean modelling, and linking to, rather than replicating, data. These are core issues for data consolidation within the enterprise. Coupled with the ability to link out to data that is part of the LOD Cloud, or published by business partners, Linked Data has the potential to provide a unifying infrastructure for managing both internal and external data sources.
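In an MDM setting, “linking rather than replicating” often means recording that identifiers in different systems denote the same entity, for example with owl:sameAs, and then computing the groups of co-referring identifiers. Because owl:sameAs is symmetric and transitive, those groups are equivalence classes, which a union-find structure computes directly. The sketch below uses hypothetical CRM, ERP and partner URIs; picking the lexicographically smallest URI as canonical is an arbitrary choice for illustration.

```python
# Hypothetical systems, each minting its own identifiers for customers.
CRM = "http://crm.example.com/customer/"
ERP = "http://erp.example.com/client/"
PARTNER = "http://partner.example.org/org/"


def consolidate(same_as_pairs):
    """Group identifiers linked by owl:sameAs into equivalence classes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in same_as_pairs:
        union(a, b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def canonical_map(groups):
    """Map every identifier to one representative of its group."""
    return {x: min(group) for group in groups for x in group}


pairs = [
    (CRM + "42", ERP + "0042"),
    (ERP + "0042", PARTNER + "acme"),
    (CRM + "7", ERP + "0007"),
]
groups = consolidate(pairs)
```

Each system keeps its own records; only the links are stored, and the consolidated view is derived from them.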

It’s worth noting, however, that semantic technologies in general (e.g. document analysis, entity extraction, reasoning and ontologies) seem to be much more widely deployed in enterprise systems than Linked Data. This is no doubt in large part because the advantages of those technologies may currently be much more easily articulated, as they’re more easily packaged into a product.

Summary

In this post I wanted to tease out some of the questions that underpin the discussions about enterprise adoption of Linked Data. I’ve presented a few thoughts on those questions and I’d love to hear your opinions.

Along the way I’ve attempted to highlight some areas where we need to focus to help the transition from a researcher-led to a practitioner-led community. More data, more documentation, and more tools are the key themes.

#linkeddata, English, Semantic Web, Uncategorized

Data.gov ConOps

December 8th, 2009

Lots of mentions of the Semantic Web in the Data.gov ConOps. I’ll read it in detail on the plane . . .

Uncategorized

Middlemash

December 1st, 2009

I was a newbie to the library mashup scene, and took in a lot of information yesterday at Middlemash, hosted by Damyanti Patel and her colleagues at Birmingham City University. It was every bit the friendly and stimulating event that I’d expected it to be, but by the time I, along with an impressive number of co-malingerers, got to the Barton Arms at the end of the day, I was able to pinpoint what had made me mildly uncomfortable at intermittent points during the day.

The discomfort had nothing to do with either the organisers or the participants, or indeed with the concept of mashing itself. The problem is that the same forward-thinking librarians who celebrate the advent of electronic resources and innovative technologies for discovering them are, in a mashing context, forced back into the world of print. And this has to be about ownership of data. Bibliographic data is much more “ours” than electronic resource metadata, which has traditionally been proprietary, locked away in abstract and index databases, available only in academic institutions and certainly not mashable by a bunch of librarians with a strange predilection for creating more exciting experiences of scholarly information.

Mashing the reading list

Like many people at the event, Edith Speller from Trinity College of Music was concerned about her institution’s reading lists. She felt that they were getting too static, and out of date, and, like many Talis Aspire customers, wanted to raise awareness of all those expensive subscriptions to e-resources among academics who would then be more likely to include them on resource lists. However, the solutions arrived at seem to be very book-specific, involving the following:

• Using the ISBN of a book on a resource list to look up recommendations (along the lines of “people who bought that also bought this”) using Amazon Web Services.
• Using the Mosaic API to:
    • perform an ISBN look-up to find the courses associated with the people who have borrowed that book;
    • use course codes to look up what other books were borrowed by people on those courses.
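The two Mosaic look-ups can be sketched as co-borrowing over course codes. This is a minimal in-memory sketch, assuming hypothetical anonymised loan records of the form (item, course code); it does not reproduce the Mosaic API’s actual interface or data format.

```python
from collections import Counter

# Hypothetical, anonymised loan records as (item_id, course_code) pairs.
LOANS = [
    ("isbn:A", "MUS101"),
    ("isbn:B", "MUS101"),
    ("isbn:B", "MUS101"),
    ("isbn:D", "MUS101"),
    ("isbn:B", "MUS102"),
    ("isbn:C", "MUS102"),
    ("isbn:E", "HIS200"),
]


def courses_for_item(item, loans):
    """Step 1: the courses of people who borrowed this item."""
    return {course for borrowed, course in loans if borrowed == item}


def also_borrowed(item, loans, limit=5):
    """Step 2: other items borrowed on those courses, most borrowed first."""
    courses = courses_for_item(item, loans)
    counts = Counter(borrowed for borrowed, course in loans
                     if course in courses and borrowed != item)
    return [borrowed for borrowed, _ in counts.most_common(limit)]
```

Ranking by loan count is one simple choice; weighting by recency would arguably suit the “out of date reading list” problem better.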

Paul Stainthorp at the University of Lincoln is using RefWorks to create embeddable lists of new titles and communicate them to users, by sharing folders within RefWorks publicly and creating RSS feeds on those folders. He’s also used Yahoo! Pipes (the mashup panacea du jour) to pull in the book cover image and description from Amazon. Because their academics prefer notifications by email, rather than following an RSS feed themselves, an email now arrives whenever a new book comes in within their subject area.
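The email step boils down to noticing which feed items are new since the last check. A minimal sketch using Python’s standard XML parser, assuming an RSS 2.0 feed whose items carry `guid` or `link` elements; the sample feed is made up, and actually sending the message (e.g. with smtplib) is left out.

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>New titles</title>
<item><title>Book One</title><link>http://example.org/b1</link><guid>b1</guid></item>
<item><title>Book Two</title><link>http://example.org/b2</link><guid>b2</guid></item>
</channel></rss>"""


def new_items(rss_xml, seen_ids):
    """Return (id, title, link) for feed items not notified before."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer the guid as a stable identifier; fall back to the link.
        item_id = item.findtext("guid") or item.findtext("link")
        if item_id and item_id not in seen_ids:
            fresh.append((item_id,
                          item.findtext("title", default=""),
                          item.findtext("link", default="")))
    return fresh


def notification_body(items):
    """Plain-text email body listing the new titles."""
    lines = ["New titles in your subject area:"]
    for _, title, link in items:
        lines.append("- %s <%s>" % (title, link))
    return "\n".join(lines)
```

Persisting the set of seen identifiers between runs is what turns a feed into a once-only notification.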

No doubt academics are availing themselves of current awareness services provided by publishers to find out about new e-journal articles, but it comes back to the disintermediation of the library from e-resource metadata. Owen Stephens from the Open University reflected in the pub afterwards on the decisive break that occurred with the electronic journal, when the library no longer owned the item, but merely licensed it. Tony Hirst concurred that the library world had never challenged the proprietary nature of abstracts and indexes.

Mashing the library floor plan

Owen ran a workshop in the afternoon to develop his idea for mashing library floor plans with Google Maps. We used the University of Sheffield library floorplan as a working example, and it was fascinating to hear about how OpenLayers (an open source mapping tool) works. Apparently maps are divided into tiles of 256 by 256 pixels, and some JavaScript requests each tile as it is needed while the user navigates around the map. As the user zooms in, the map simply switches to a more detailed set of tiles. The exercise of converting a floorplan into a zoomable map forces the library to consider how granular and practicable its floorplans are: is there enough detail to establish on which shelf a book is located? Maintenance is also an issue, and Owen suggested augmenting the shelving workflow so that, at the end of shelving, the librarian records the start and end classmark of the shelf. We also considered the separate scenarios of a user who wants a particular book on the one hand, or books on a subject area on the other.
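The tiling idea can be sketched with a little arithmetic. This minimal sketch assumes a simple power-of-two pyramid in which zoom level 0 fits the whole floor plan into one 256-pixel tile and each further level doubles the resolution; that is a common scheme but an assumption here, not a description of how OpenLayers or the Sheffield prototype was configured. Given the viewport in pixel coordinates, it lists the tiles the browser would need to request.

```python
TILE_SIZE = 256  # pixels per tile edge


def grid_size(zoom):
    """Tiles per side at a zoom level, assuming level 0 is a single tile."""
    return 2 ** zoom


def scale_point(x, y, from_zoom, to_zoom):
    """Map a pixel position between zoom levels (each level doubles resolution)."""
    shift = to_zoom - from_zoom
    if shift >= 0:
        return x << shift, y << shift
    return x >> -shift, y >> -shift


def tiles_in_view(left, top, width, height, zoom):
    """List (zoom, col, row) for every tile the viewport overlaps."""
    last = grid_size(zoom) - 1
    first_col = max(0, left // TILE_SIZE)
    last_col = min(last, (left + width - 1) // TILE_SIZE)
    first_row = max(0, top // TILE_SIZE)
    last_row = min(last, (top + height - 1) // TILE_SIZE)
    return [(zoom, col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

Zooming in is then just `scale_point` on the viewport corner followed by a fresh `tiles_in_view` call at the new level, which is why only the tiles actually on screen ever need to be fetched.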

University of Sheffield plans to use heat maps to analyse how users are navigating the library. With the Ranganathan maxim in mind (positioning the stock to minimise the need for users to move around the library) they would then be able to optimise the library layout.

Sure it’s funky, but I just want to renew my books

Earlier in the day, Mark Van Harmelen from Hedtek Ltd. based at the University of Manchester, urged us all to listen more to the student voice, through focus groups and other mechanisms. I know that Owen Stephens and many other Middlemash attendees are making every effort to engage with students in the idea and design stage right now. It will be interesting to see whether we’re expending too much energy on over-sophisticated solutions for the dying format of print. As Chris Keene from University of Sussex stated, the response of students to tag clouds and other features at the discovery layer is, “Sure it’s funky, but I just want to renew my books.”

Personally, I’d love to see more focus on work-level data. The published works of an author or indeed a subject area plotted against an appropriate timeline could be tremendously useful – the works of Dickens plotted against key social legislation of the 19th century springs to mind. But the approach would come into its own with non-fiction, where there is a more direct relationship between published literature and real world events. That would really add scholarly value to bibliographic data, and would enable us to break out of transactions such as reservations, which are rooted in the past, not the future, of scholarly life.

English, Libraries, Mashups, Metadata, Uncategorized

Karen Calhoun completes a conversation with Talis

November 27th, 2009

When recording my previous Talking with Talis podcast with OCLC’s Karen Calhoun, in a hotel lobby over the road from the British Library in London, we suffered a technology failure, losing the last third of our conversation.

Karen kindly agreed to spend some time in a follow-up conversation so that listeners could hear her thoughts on a couple of further questions I asked, including one about the future of library metadata formats.

In addition, I had the opportunity to ask her to reflect upon the presentation she gave on that day, the slides for which are available to view from the OCLC site. The other benefit was that we were not competing with the music, staff, and hotel guests during the recording.

English, Uncategorized

Application of semantic technologies on the Internet in 2020 (II): education

November 10th, 2009

Yesterday I was reading El caparazón, one of the most relevant Spanish-language blogs about semantic applications, when I found this post about “Education and Web 2.0”. Undoubtedly this is an area where semantic technologies will have an important say in the future.

Some experts state that most of the knowledge an elementary school student will need for their job when they grow up doesn’t exist yet. How can we focus education in such a rapidly changing environment? Knowledge is advancing so quickly that trying to keep pace is an almost impossible task. This calls for a fundamental change in the approach to education, as its key task will be to transmit information to students, to help them manage all this information, to help them distinguish the useful from the useless, and to help them extract knowledge from it. That is, learning how to learn.
No doubt the Internet will change the approach to education at all levels. Students will no longer go to university to pick up some notes, or to listen to a one-way explanation in which the teacher talks and students listen, because to access information we have Google, and to hear lectures we can easily access those of the outstanding experts in each subject.
Any country that wants to maintain a high level in the knowledge society must be capable of integrating technologies into education systems at all levels. We’re going to be bombarded throughout our lives with millions and millions of bytes of information; this is a fact we must live with. In this situation it will be paramount to extract useful knowledge from this information, and indeed this will mark the difference between efficient and inefficient people. In this environment semantic technologies will play an important role, because they will help us navigate through information and adapt it to our needs, that is, to contextualize it. Nowadays semantic technologies have reached an important level of maturity, and standards such as RDF and OWL will help us make the jump from a “textual” management of information to a “conceptual” treatment of it. This is a first step and a very important achievement. However, some years will be required to settle these concepts and to develop technologies that allow us to extract knowledge from all the information around us: that is, tools that show us how to learn.

Education, English, Spanish, Uncategorized, Web 3.0, semantic technologies, virtual education

Interesting developments at the Bibliotheque Nationale de France

November 9th, 2009

Having recently read some documentation about the plans of the Bibliotheque Nationale de France (BNF) for what they call a “pivot” (a mechanism based on semantic technologies for optimising the value of the BNF’s entire web presence, including Gallica, its digital library), I was delighted to have the opportunity to hear Dominique Stutzmann from the BNF speak at the recent Eurolis Seminar in London.

The future of the library (Doom or Bloom?) was what the day’s event was all about, and according to Stutzmann, we’ve already invented it. We’ve got the nice buildings, and so ostensibly the library of the future will be the same as that of today. If the library space vanishes, he argued, it will only be the result of a self-fulfilling prophecy, because librarians aren’t confident about what they’re doing. I think he’s really onto something – there is indeed an element of subjective crisis in the problem of the future of libraries. He admitted, though, that Web 2.0 re-presents the user-librarian relationship in quite a fundamental way: the user becomes both publisher and librarian. But users don’t want librarians to disappear. He seems to be saying that our library spaces continue to be successful, so we should leave them alone but engage with some interesting technological stuff as well, because libraries are well-positioned to do so. He added that users trust libraries with everything, including long-term preservation of data, and the BNF is clearly poised to exploit that trust, not for its own ends but for everyone, in the great universal tradition of libraries.

Stutzmann perceives the potential of semantic technologies very clearly in terms of the user experience (giving everyone improved and accurate access to the information available), and had an impressive array of exemplars to reel off, citing Google Book Search’s use of data mining tools to take city names from search results and pinpoint them on a map, and Bibliosurf’s map of novels. Along similar lines, he demonstrated an interactive map built with mashed-up data from Last.fm to produce a map of composers, where proximity indicates artistic commonality rather than geographical proximity; for example, Beethoven is situated alongside Vaughan Williams.

As a Modern Languages graduate, I loved hearing about semantic search developments at the European Library and specifically in their TELplus project, where multilingual search (i.e. a search query with terms from more than one language) has been achieved. Stutzmann was clear that authority data is indivisible from semantic web developments, and that is where the librarian tradition really comes into its own; he demonstrated search results with LCSH headings as a facet on the side-panel. He pleaded with librarians to use metadata to give more accurate access to data.
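A subject facet of the kind Stutzmann showed is, at its core, a count of headings across the retrieved set plus a filter when the user picks one. A minimal sketch over hypothetical in-memory records (the headings are made up, not real LCSH records, and this is not the TELplus or BNF implementation):

```python
from collections import Counter

# Illustrative records only; headings are invented for the example.
results = [
    {"title": "Record 1", "subjects": ["Composers", "Music -- History"]},
    {"title": "Record 2", "subjects": ["Composers"]},
    {"title": "Record 3", "subjects": ["Opera", "Music -- History"]},
]


def subject_facet(records, field="subjects"):
    """Count each heading once per record, most frequent first."""
    counts = Counter()
    for rec in records:
        counts.update(set(rec.get(field, [])))
    return counts.most_common()


def narrow(records, heading, field="subjects"):
    """Keep only the records carrying the chosen heading."""
    return [rec for rec in records if heading in rec.get(field, [])]
```

The quality of such a facet depends entirely on consistent authority data, which is exactly the point Stutzmann made about the librarian tradition.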

The only downbeat element of his presentation was a survey carried out at the BNF in 2008 to get a clearer picture of their users. A key finding was that the average user of the digital library was 48, although there is an overall age range of 14 to 94. Europeana suffers from the same problem. Funnily enough, when I was out on Saturday night, a friend was saying how almost all the people who queued up in Birmingham to see the Anglo-Saxon treasures recently discovered in the West Midlands were white people aged 50+. Stutzmann pondered whether anything could be done about it: does it come down to lifestyle fundamentals?

In the same survey, there was a fascinating finding about Library 2.0. Many of the users questioned felt that library sites should not be spoilt by the comments of other users. They are happier to share their information and collaborate with the librarian than with other users. Obviously this goes against received Library 2.0 thinking, and left me wondering: is that a specifically “French thing”, or do UK users have more in common with their European counterparts than we think?

English, Libraries, Semantic Web, Uncategorized, eurolis

Europeana: Think culture

November 9th, 2009

Aiming high is rarely the wrong thing to do, in my opinion, and Jonathan Purday’s presentation at the Eurolis Seminar Doom or Boom, on Europeana, a digital library offering a single, direct and multilingual interface to cross-domain European cultural artefacts, certainly wasn’t short of lofty aims. Europeana isn’t just about making library resources available; it’s about breaking down the cultural institution-based silos right across the European cultural sector, and in the process it has created an exciting online resource for the public, researchers, and teachers and learners in education.

It’s easy for British people to forget the risk that the Google Book Project will overshadow non-English artefacts in Europe, and this has been an important concern since at least 2005, when the European Commission launched its Digital Libraries initiative. Initiatives such as Europeana are, in Purday’s words “making available the intellectual record of other languages”. And it will also “harmonise digitisation practices across Europe”. All good stuff.

It was also great that Purday acknowledged that every search now begins with Google, and that if you don’t find material, you think it hasn’t been digitised or it doesn’t exist. I and a number of delegates were left wondering at the end of the session, though, whether the full text of content in Europeana will be exposed to Google, and if Purday could come back on that point, that would be useful.

It’s worth mentioning that every single speaker at the Eurolis seminar raised the need to consider copyright harmonisation, and Purday was no exception, but he probably deployed the most powerful arguments to support this. We can’t digitise at the scale that is now technologically possible, he argued, unless we reconsider and harmonise copyright; the risk is of creating a “20th century black hole”, whereby we will be unable to represent the published output of “the most documented century” and will end up with a distorted picture of the past as a result.

I would urge people to take a look at Europeana. The search interface is available in 26 languages, and in the next 2 years they plan to be able to translate search terms on the fly (currently only the interface is translated). Purday demonstrated a search on Don Quixote, which not only came up with an impressive range of book editions, but also images inspired by the work, plus videos, including a 1956 news broadcast in which Salvador Dali recreates a vision of Don Quixote at Moulin de la Galette. Europeana holds metadata in the central index and takes the user back to the original site to look at the full artefact, so the model is decentralised and collaborative in a sustainable way.

Europeana is currently attracting 15,000 users a day. Purday is concerned, though, that most people interested in the site are over the age of 45. He plans to address this by creating an API so users can put Europeana into their own web space, although in discussions afterwards, people wondered whether such a measure would succeed in engaging younger people.

English, Google Book Settlement, Uncategorized, digitisation, eurolis

Semantic Technologies Monthly Review. October

November 6th, 2009

A lot of news related to semantic technologies has appeared in the media this month.

  • As usual, some of it relates to search engines, for example Perfect Search or Bing.
  • There are mentions of applications of semantic technologies to specific areas, for example patent research: LexisNexis announced the introduction of transparent semantic technologies into patent search. Others relate to advertising or to smarter aggregators.
  • In the press, the NYTimes announced its contribution to the linked data cloud: “First 5,000 Tags Released to the Linked Data Cloud”.
  • In the field of social networks, Adaptive Blue’s Glue is a Firefox add-on that uses semantic technology to understand the subject of the page you are on and then shows you, via a bar at the bottom of your browser, whether your friends have commented on or liked the item anywhere on the Web. This month Glue’s destination site, GetGlue.com, was launched: a recommendation network for people with the same interests in books, music, movies and other products.
  • The good moment for these technologies is demonstrated by the fact that new projects are getting funded: for example, the Royal Melbourne Institute of Technology (RMIT), in collaboration with an industry consortium facilitated by Fuji Xerox Australia, has received a grant of $1.4 million from the Australian Research Council (ARC). It is also shown by awards for the most innovative companies: for example, the 2009 Promise and Reality award aims to promote innovative technology solutions for implementing and integrating knowledge management practices into business processes, and its list of finalists includes some companies from the semantic area.
  • Semantic technologies are still in the first phase of implementation, but there is already cause for celebration: “Thomson Reuters Celebrates Ten Innovative Sites And Services Using OpenCalais”.
  • Applications of semantic technologies are very diverse, and among them we find some curious tools. One of them is Book of Odds, a tool from a Boston company that sets out to answer questions such as these: What are the odds of being struck by lightning? Bitten by a rabid dog? Run down by a bus? Audited by the IRS?

English, Spanish, Technologies, Uncategorized, monthly review, semantic technologies

Describing SPARQL Extension Functions

November 5th, 2009

At the end of my recent post on Surveying and Classifying SPARQL Extensions I noted that I wanted to help encourage implementors to publish useful documentation about their SPARQL Extensions. If you’re interested in the current state of that survey then you can check out my current spreadsheet listing known extension functions. There are more to add there, but it’s a good summary of the current state of play.

At VoCamp DC last week I did some work on designing a small vocabulary for describing SPARQL Extensions. The first draft of this is online here: SPARQL Extension Descriptions. There’s a little bit of background on the Vocamp wiki too, if you want to see my working :) .

Here’s an example of the vocabulary in use, describing some extensions to the ARQ SPARQL Engine:


<http://jena.hpl.hp.com/ARQ/function> a sed:FunctionLibrary;
  dc:title "ARQ Function Library";
  dc:description "A collection of SPARQL extension functions implemented by the ARQ engine";
  foaf:homepage <http://jena.sourceforge.net/ARQ/library-function.html>;
  sed:includes <http://jena.hpl.hp.com/ARQ/function#sha1sum>.

<http://jena.hpl.hp.com/ARQ/function#sha1sum>
  a ssd:ScalarFunction;
  rdfs:label "sha1sum";
  dc:description "Calculate the SHA1 checksum of a literal or URI.";
  sed:includedIn <http://jena.hpl.hp.com/ARQ/function>.

<http://jena.hpl.hp.com/ARQ#self> a sed:SparqlProcessor;
  foaf:homepage <http://jena.hpl.hp.com/ARQ>;
  rdfs:label "ARQ";
  sed:implementsLibrary <http://jena.hpl.hp.com/ARQ/function>.

Ideally, every URI associated with a filter function or property function should be dereferenceable, and terms from this vocabulary should be used to describe those functions. There’s a lot more detail that could be included, but I suspect this is sufficient to cover the primary use cases, i.e. documentation and validation.
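As a rough illustration of the validation use case, here is a minimal Python sketch, assuming the descriptions above have already been dereferenced and parsed into (subject, predicate, object) tuples. Prefixed names are kept as plain strings because the namespace URI for sed: isn't given here, and the helper names (objects, supported_functions, check_query_functions) are hypothetical, not part of any existing library.

```python
# Hypothetical sketch: assumes the example descriptions were already
# dereferenced and parsed into (subject, predicate, object) tuples.
# Prefixed names stay as plain strings; no RDF library is used.
ARQ_LIBRARY = "http://jena.hpl.hp.com/ARQ/function"
SHA1SUM = "http://jena.hpl.hp.com/ARQ/function#sha1sum"
ARQ = "http://jena.hpl.hp.com/ARQ#self"

TRIPLES = {
    (ARQ_LIBRARY, "rdf:type", "sed:FunctionLibrary"),
    (ARQ_LIBRARY, "sed:includes", SHA1SUM),
    (SHA1SUM, "rdf:type", "ssd:ScalarFunction"),
    (SHA1SUM, "rdfs:label", "sha1sum"),
    (ARQ, "rdf:type", "sed:SparqlProcessor"),
    (ARQ, "sed:implementsLibrary", ARQ_LIBRARY),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o for which (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def supported_functions(processor, triples=TRIPLES):
    """Collect the functions of every library the processor claims to implement."""
    found = set()
    for library in objects(processor, "sed:implementsLibrary", triples):
        found |= objects(library, "sed:includes", triples)
    return found


def check_query_functions(processor, used, triples=TRIPLES):
    """Split the set of extension URIs a query uses into (supported, unknown)."""
    supported = supported_functions(processor, triples)
    return used & supported, used - supported


supported, unknown = check_query_functions(ARQ, {SHA1SUM, "http://example.org/fn#other"})
print(sorted(unknown))  # ['http://example.org/fn#other']
```

The point of the design is that a validator needs nothing beyond the published descriptions: anything in the unknown set can be reported up front instead of surfacing as a confusing runtime failure.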

The draft SPARQL 1.1 Service Description specification does cover some of this ground, but falls short in a few places, and I think some of what I’ve described here could usefully be folded into that specification without greatly extending its scope. But that’s a matter for the Working Group to decide.

One specific issue is that the specification doesn’t currently recognise “functional predicates” (to use Lee Feigenbaum’s preferred term; others include “property functions” and “magic properties”) as a distinct class of extensions. They clearly exist, so I think we should have a means to describe them. In fact arguably they are the most important class of SPARQL extensions that need describing.

Filter functions are relatively well understood and can clearly be identified based on where they appear in a query. Language extensions will generate a parser error if an endpoint doesn’t support them, so they are easily caught. Functional predicates, however, use the existing Turtle triple pattern syntax but typically trigger custom logic in the SPARQL processor, rather than actually appearing as triples within the dataset. Without the ability to dereference their URIs and identify them as functional predicates, a SPARQL engine will simply treat them as triple patterns and fail silently, rather than complaining that the extension is not supported.

The following example query illustrates this:


PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>
PREFIX func: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.org/vocab/>

SELECT ?doc ?contributor WHERE {
   ?doc dc:modified ?modified.
   ?doc ex:authors ?authorList.
   ?authorList list:member ?author.
   LET ( ?contributor := ?author )
   FILTER ( ?modified < func:now() )
}

The above query contains three extensions: a language extension (LET); a filter function (func:now()); and a functional predicate (list:member). Without prior knowledge of that predicate, or the ability to dereference its URI, there’s no way to tell that list:member is an extension rather than a real triple pattern that the query author is trying to match.
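To make the silent-failure problem concrete, here is a deliberately naive Python sketch of what a client might do once it knows, for instance by dereferencing, which URIs are functional predicates. The registry contents and the function name are assumptions for illustration, and the regular expressions only handle simple "?s prefix:local ?o ." patterns; they are not a SPARQL parser.

```python
import re

# Hypothetical registry: URIs the client has learned (for example by
# dereferencing them) are functional predicates, not dataset properties.
KNOWN_FUNCTIONAL_PREDICATES = {"http://jena.hpl.hp.com/ARQ/list#member"}

PREFIX_RE = re.compile(r"PREFIX\s+(\w*):\s*<([^>]*)>", re.IGNORECASE)
# Naive: only matches simple "?s prefix:local ?o ." triple patterns.
TRIPLE_RE = re.compile(r"\?\w+\s+(\w*):(\w+)\s+\?\w+\s*\.")


def functional_predicates(query, known=KNOWN_FUNCTIONAL_PREDICATES):
    """Return the expanded predicate URIs in `query` that appear in `known`."""
    prefixes = dict(PREFIX_RE.findall(query))
    found = []
    for prefix, local in TRIPLE_RE.findall(query):
        if prefix not in prefixes:
            continue  # undeclared prefix: leave it for a real parser to reject
        uri = prefixes[prefix] + local
        if uri in known:
            found.append(uri)
    return found


# A trimmed copy of the example query, keeping only its triple patterns.
EXAMPLE_QUERY = """
PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.org/vocab/>
SELECT ?doc WHERE {
   ?doc dc:modified ?modified.
   ?doc ex:authors ?authorList.
   ?authorList list:member ?author.
}
"""

print(functional_predicates(EXAMPLE_QUERY))
# ['http://jena.hpl.hp.com/ARQ/list#member']
```

With that knowledge, an engine that doesn't implement list:member could report an unsupported extension instead of quietly matching no triples.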

I’d like to urge all implementors to consider making their extension URIs dereferenceable. The schema I’ve drafted is very lightweight, so it shouldn’t be difficult to support. I’m also very happy to take comments on its design. I’m intending it as a starting point for others to help build upon.

English, Uncategorized

Semantic Social Networking

October 15th, 2009

FOAF was one of the first Semantic Web projects, and is still trotted out as an example on a regular basis.  The FOAF model itself has been criticized a number of times (I don't feel like googling all the examples), but there are some things about FOAF that are very interesting in today's world.

One could criticize FOAF for having invented social networking in the late nineties, then having missed the whole Web 2.0 boat and had the limelight taken by MySpace, LinkedIn, LiveJournal, and nowadays Facebook.  Indeed, in terms of bringing social networking awareness to the masses, this criticism would be true.  But if you have a look at some of the founding assumptions behind FOAF, you'll find that the project was eerily prescient, foreseeing problems with social networking that took years to come to light once social networks became commonplace.

A simple example is a bit of drama that happened on the social networking site LiveJournal a couple of years ago.  LiveJournal was sold to a Russian firm, with the risk that all the servers, with all those back journals, would migrate outside the United States.  Many American users (who for the most part had been ignoring the vast number of Russian-speaking users) suddenly became aware that their precious journal data might fall outside the copyright laws they understood.  A panic ensued, and LiveJournal dump programs became quite the "meme".

A more recent example was the change of the terms of use for Facebook.  Suddenly, Facebook reserved the right to use your photos in its advertising.  Okay, they probably don't want that photo of the time you passed out in Vegas and your 'friends' stripped you to your underwear and drew faces on your chest with shaving cream, but you never know.  The outcry amongst FB users caused Facebook to rescind this policy.  But the same issue came up again: who owns the data that you put on social networking servers?

FOAF understood this issue over a decade ago, when they envisioned a distributed social network, where servers owned/operated by different agents could participate in the same social network.  A sort of decentralized, distributed version of Facebook, where you kept ownership, access control, backups, etc. in your own hands.  Or you could hire someone to do it for you, if you preferred.  But you had the option.

This is a key idea behind the Social Web - not just social networking on the web, but making the network part of the web itself.  How can this work?  The Semantic Web plays a big role in the solution - or so many of us believe.  Come to the Social Web Camp in Santa Clara on November 2 and  find out what the W3C and others are doing to make this come true. 

Uncategorized

What makes a good library service? New guidelines issued by CILIP

October 14th, 2009

At the PLA 2009 conference last week, Bob McKee, Chief Executive of CILIP, proudly presented a new set of guidelines on what makes a good library service. Compared with the bulky, text-heavy and complex language of traditional library guidelines, this A5 pamphlet could easily be mistaken for an advert or flyer. However, this is not a bad thing. Its concise presentation leaves no room for hot air and lets it do exactly what it says on the tin: guide.

The guidelines urge the library service to be:

“Continually refreshed and improved to respond to the adapting needs of local communities”

And

“Library buildings, equipment and ICT facilities should be well-designed and kept up-to-date.”

The ten questions to ‘test’ whether your library service is up to standard highlight many benchmarks that can only help ensure a good service is being achieved. The one that caught my eye in particular was point four.

“Does your library service provide what local people expect in terms of location, accessibility, materials, resources, staffing and activities?”

There is no ‘one size fits all’ solution to turning around the current perception of the library service; no service should be a clone of another. Whilst sharing best practice has a valuable role to play, we must engage with those around us to ensure the local library service is engaging and, as odd as it may seem, local.

Download the guidelines here.

CILIP, English, Libraries, Library, Talis, Uncategorized

All-Party Parliamentary Group on Libraries, Literacy and Information Management Report: a review

October 13th, 2009

Last week, the All-Party Parliamentary Group launched their new report: an inquiry into the governance and leadership of the public library service in England. On the basis of the progress we have seen with the DCMS modernisation review, I had little expectation of this report providing any real insight or vision. As I worked my way through the report, I found myself scribbling and highlighting away, only to find the very thought I had just noted clarified in the next paragraph. So I was pleasantly surprised, to say the least, to find that the report considered more perspectives than I had anticipated.

It would have been too easy for the scope of the report to be wide and vague, which no doubt would have provided a foggy vision if any. So it was good to see that the focus of this report is specifically on the effectiveness of arrangements for the governance and leadership of public library services. The six lines of enquiry were very appropriate in light of the current situation. They were:

1) What are the strengths and weaknesses of the present system for the governance and leadership of the public library service in England?

2) Should local communities have a greater say in decisions about the public library service?

3) Should central government do more to superintend the public library service?

4) Are local authorities the best agency to provide library services?

5) What are the governance and leadership roles of the Advisory Council on Libraries (ACL), the Museums, Libraries and Archives (MLA) and the Department of Culture, Media and Sport (DCMS)?

6) What changes (if any) are required to improve and strengthen governance and leadership?

A closer look at the role of technology and innovation might have been another area for inquiry, though this may be something that stems from point six. Taking a closer look at the strengths and weaknesses of the public library service, the report acknowledged that:

“The submissions presented a bleak national picture with more weaknesses than strengths being identified.”

Amongst some of the more legitimate and agreeable points raised, there were a few that made me frown as I read. For example, the group believes the library service is diverse and innovative, listing this as one of its strengths. But is this really the case? Would this report really be necessary if it were? A couple of contradictions arose too: for example, staff were described as helpful experts at one point and as ill-equipped and unhelpful at another.

In summary, the key recommendations were to develop one lead voice for libraries through the establishment of a single Library Development Agency for England (LDAE). This is a reassuring recognition, as a vision to lead the library service could not be more crucial than it is today. The current roles and purposes of the many national agencies have brought confusion to the service, which lacks a prominent player leading the way. The report rightly recognises that the library sector has lost its way, and is sadly regarded as being of low value by decision makers.

Whilst the LDAE is in the making (I assume answers around who, when and how are yet to come), we can expect the MLA to deliver a mid-term communications strategy, along with training and development programmes to improve the management and leadership skills of public library personnel. This is interesting, as the report recognised the MLA’s poor record with libraries in the past, and some contributors regretted the recent changes to its regional structures. The formation of the LDAE would result in a revision of the role, function and allocated funding of the MLA, making it a surprising and uncertain candidate to lead the way on the mid-term plans.

Overall, I was pleased to see the group recognise that dramatic action is required, and quickly. Yet it could be argued that recognising the problem is the easy part; finding and implementing the solution is the real challenge.

Image copyright of APPG. Publisher, CILIP.

Full report available to download from CILIP.

APPG, DCMS, DCMS Review, English, Libraries, MLA, Public Libraries, Talis, Uncategorized

PLA – Day 3 and final thoughts

October 9th, 2009

Day 3, and it’s the final day of the Public Library Association conference 2009. I had low expectations for the day, as I had misread the conference programme and believed the day would be dwindling to an end. Yet as the first session began, I was quickly proven wrong.

I assumed the ‘Libraries opening doors to health’ session would be bland and irrelevant, so I was attending a little half-heartedly. But as Bob Gann, Head of Strategy and Engagement for the NHS Choices programme, began the session, he had me engaged straight away. The NHS Choices web site allows patients to review their own health services, and has been (informally) described as the “NHS Trip Advisor”. Aside from the direct work the programme does with libraries, such as bibliotherapy and community information centres, it was clear that the programme and the strategies used to execute it could be mirrored in libraries. For example, he rightly recognised the importance of syndication. Though the site gets lots of hits (over 7 million visits a month), he acknowledged early on that people are less likely to visit a government website than any of the other websites they could choose from, so by syndicating NHS information to over 100 different channels, such as YouTube to showcase videos and Boots to support their existing health information, they were able to reach a wider range of audiences. It was an enjoyable presentation which I dare to describe as insightful, and hopefully one that librarians recognised as a model they could emulate to achieve similar success.

The second presentation was from senior library managers at the Nelson Mandela Bay Library Service and Nelson Mandela Bay Metropolitan University, and it began with a 15-minute thank you to the conference organisers. This is all very well, but I would much rather that time had been spent talking us through the projects. Just as I began losing my patience, some interesting aims began appearing on the screen. The NMBM aims to meet the information needs of less privileged social groups, recognising that university and public libraries are building blocks of local information and knowledge infrastructure. Key projects were showcased during the session, including a reading project working with the youth of South Africa and New Zealand. The project encouraged participants to become avid readers – a unique fact in itself, as resources are not easily accessible in South Africa. Another project, developing partnerships to improve service delivery and increase the flow of information, was adopted because it was believed to be the way forward. By the end of the session I was left thinking: if a library in South Africa can achieve so much with so little and really make a difference to its community, why can’t we?

Following a well-deserved break, John Fisher, CEO of Citizens Online, began his session. He believes the focus should not be on getting everyone a computer, but on ensuring everyone benefits from using one. Conscious of his semi-graveyard slot, John began with some quick interactive surveys to demonstrate the scale of the population who don’t use technology. Apparently, 15-16 million people (one quarter of the UK’s population) don’t use technology, and a third of those are totally disconnected and see no benefit in using it at all. He went on to explain the Everybody Online project, for which a digital champion, Martha Lane Fox, co-founder of Lastminute.com, has been recruited to launch a strategy to improve these statistics. The project aims to make the most of social media tools to engage with communities, allowing them to choose their own information and encouraging them to share and build online communities. It was a nice change to see a speaker actually speak rather than read from a card or slides; in fact, John’s entire presentation had no slides, resulting in a highly engaged audience.

Following the last few sessions, I began concluding my thoughts on the three days and on my first PLA conference. Though officially the themes were centred on community engagement, in hindsight I felt it was something quite different. Reading between the lines, I felt the main focus of the delegates wasn’t engaging with their communities at all, but justifying their existence. Cases like Wirral and, more recently, the proposed library closures in Aberdeenshire have left librarians constantly thinking about how they can build their portfolio of ammunition, should their service come into the firing line some time soon. And if recent goings-on are anything to go by, it’s almost certain that they will have to in the coming years. Each speaker seemed aware of this too. Though not explicitly, each was providing ideas and models for doing so, with the term ‘outcome based accountability’ sneaking in quite frequently.

Throughout the conference I was keen to speak to as many people as possible and gauge their opinion on the sessions as they happened. It was interesting to see two distinct interpretations of the presentations emerge. Many librarians felt that the speakers weren’t as insightful as they’d hoped, lacking an understanding of the real issues, whereas some councillors and senior executives were nodding enthusiastically over lunch as they informally discussed how declining library usage would rightly justify library closures. There appears to be a distinct difference in vision for the future of libraries between librarians and those elsewhere, raising the question: do we need to engage internally before externally? Should my assumption be correct, librarians have no option but to fail if half of the team has already given up…

English, Libraries, PLA 2009, Talis, Uncategorized, books

PLA 2009 – Day 2

October 8th, 2009


My day didn’t begin in the most ideal way. As I’m staying in a hotel a few minutes away from the conference, a complimentary shuttle bus has kindly been provided to ferry delegates back and forth. This morning, a combination of a late dash for breakfast and the shuttle bus being reliably late left me a little more flustered than usual, and I only just made the start of the conference. However, I didn’t let this dampen my outlook for the day as, of course, today was the day the DCMS would publish their long-awaited Modernisation Review; at least, it was supposed to be. But more on that later.

Andrew Cozens, Strategic Advisor at the Improvement and Development Agency (IDeA), kicked off the day with his interactive workshop, introducing the approach of outcomes based accountability. He explained that currently there are too many terms defining performance measures, and not enough discipline in using them. By using three particular key definitions, ‘outcomes’, ‘indicators’ and ‘performance measures’, a real outcomes based accountability approach can be achieved. The term outcome would be used only to describe the high-level goal, for example, ‘improve the well being of children and adults’. The term indicator goes a step further, identifying the measure that helps to quantify the achievement of an outcome, and finally a performance measure shows how well the programme is performing. Overall, this was an interesting session which challenged delegates to rethink their current thought processes, as all too often it’s easy to focus on measuring performance and lose sight of whether the outcome is improving.

Then the session many had been waiting for began, as the Rt. Hon Margaret Hodge, Minister for Culture and Tourism, took to the stage. She began by acknowledging that public libraries are very precious, but that from time to time we must question whether things could be done differently to ensure a comprehensive and efficient service, fit for purpose in the 21st century, is being delivered. She then went on to provide some ‘interesting’ statistics which appeared to paint a sad, downward-spiralling trend in library usage. However, these statistics were later questioned, to which Margaret was only able to respond “I don’t know where they [the statistics] came from, they are just given to me”.

She believes engaging with young people requires radical innovation, as they want something new and stimulating. Her acknowledgment of the technological revolution being at the heart of the future of libraries hinted at what the (once again delayed) Modernisation Review would focus on, looking to models such as LoveFilm and Amazon. Some ‘innovative’ suggestions for libraries included a loyalty card that rewards every ten book loans with a free DVD hire, and a library card for every newborn baby, which frustrated many delegates at my table, who squealed “We’ve done that for years”. They felt such suggestions demonstrated Margaret’s lack of understanding of the library profession, and felt patronised. However, other ideas, such as an internet lending service delivering books to your home, and selling books as well as lending them in conjunction with companies like Amazon, led to more positive reactions.

The Modernisation Review itself is to be published in a much faster-paced climate than previous reports, she explained, and therefore the DCMS do not intend it to be the last word in the conversation. Margaret would like time to input her thoughts on the paper before release, and to publish it as a consultation document. The cynic may read this as a lack of ideas or direction on the DCMS’ part, yet others may believe wider consultation is a genuine attempt to engage with those experienced in the field. In her closing statements, she encouraged librarians to get in touch, as she would like to produce a comprehensive and controversial report. She promised that the Government remains committed to strong and modern public library services and will continue to value and champion them.

The third session was led by Liz Forgan, the Chair of the Arts Council, highlighting the importance of reading. From the conference programme, I got the impression that this would be a bad case of preaching to the converted; however, I was proved wrong. She explained that for a library, supporting reading is instinctive, but today everything must be evidence based, so the difference that reading makes must be highlighted. “Libraries are central to reading, and reading is your jewel” she explained. Miranda McKearney, Director of the Reading Agency, explained how they can work more closely with libraries to do this. Firstly, national reading programmes can be worked harder. Secondly, stronger partnerships can be established with publishers, broadcasters and media to publicise reading further, and setting up a digital taskforce to take reading developments online can help showcase achievements as well as build stronger networks. Thirdly, a 21st century library workforce created via strategic training could also contribute significantly to wider reading. And finally, new thinking would be essential to develop clear messages and creative new projects. The session finished on thoughts of cross-authority reading strategies, where a show of hands indicated a mere two local authorities were actively adopting them. A second show of hands highlighted how many would like to adopt such strategies in their libraries, and this time there were significantly more than two.

For the afternoon session, we were given the opportunity to visit local libraries providing unique and innovative services. I chose to visit the Hartcliffe Library and the Knowle West Media Centre in the south of Bristol. The Hartcliffe Library was built in 1974 in what was once a vibrant part of the area. Following the closure of nearby factories and banks, the library began to suffer. It wasn’t until the adjacent Morrisons supermarket was built that the area was revitalised and the close-knit community re-formed. In 2003 the refurbishment of the library began, during which the local community remained faithful to the service, bringing flasks of hot drinks through times of power cuts. With strong support from young people in what is described as a ‘challenging area’, the library acts as a social environment engaging with all, simply by opening up.

The Knowle West Media Centre is a stunning building: its walls are made of straw bales, and its rubber roof harvests rainwater. As we were shown around the building, we were told about the activities that take place within the centre, including photography, music and film-making projects. But what was really interesting was how the local youth had been engaged in the development of the building. And we’re not just talking minor consultation. Real decisions such as choosing designers and architects and creating the design brief were all made in close conjunction with the local youth. This way, not only is the passion ignited within the youth straight away, but they are presented with a building that they are a part of and that is made to their requirements. The Media Centre staff believe they learn just as much from those who use the centre as the users learn from them. They believe the jobs of the future require a solid understanding of digital skills, and therefore the centre has a massive role to play.

Today I enjoyed speaking to delegates from all sorts of backgrounds, as well as the coach trip around Bristol, though my highlight has to be Margaret Hodge’s presentation, simply because of the debate she stimulated. Tomorrow promises more interesting sessions as the conference draws to an end. Watch out for PLA Day 3 tomorrow…

Images published by _satunine and ourcreativetalent on Flickr

DCMS, DCMS Review, English, Libraries, Library, Margaret Hodge, PLA 2009, Talis, Uncategorized, books