Archive

Archive for the ‘English’ Category

“The distributed social web”

November 20th, 2008

I read an interesting Gartner talk summary by Ross Dawson about the distributed social web, via another blog post by Chris Saad. Building blocks like OpenID, OAuth and microformats are mentioned in both posts, and I wanted to pipe up on behalf of the Semantic Web (if I may)…

A distributed social web is one of the ultimate goals of projects like FOAF and SIOC. Both FOAF and SIOC have recently been listed by Yahoo! SearchMonkey as recommended vocabularies (FOAF for personal profiles and social networks and SIOC for blogs, discussion forums and Q&A sites). Ross, if you like this topic, then you’ll probably love ideas like SMOB (Semantic Microblogging), where people can keep their microblog entries in their own space and then push them to as many Twitter-like aggregation services as they want. See my post on this here.
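To make the vocabularies a little more concrete, here is a minimal sketch of what a SMOB-style microblog entry kept "in your own space" might look like as SIOC and FOAF data. It uses plain string formatting, so no RDF library is needed. The URIs, the author name and the particular properties chosen (sioc:Post, sioc:content, dcterms:created, foaf:maker, foaf:name) are illustrative assumptions, not SMOB's actual output format.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle short literal, escaping characters that need it."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def microblog_post_turtle(post_uri: str, author_uri: str, author_name: str,
                          content: str, created_iso: str) -> str:
    """Describe one microblog entry (sioc:Post) and its author (foaf:Person) in Turtle."""
    lines = [
        "@prefix sioc: <http://rdfs.org/sioc/ns#> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{post_uri}> a sioc:Post ;",
        f"    sioc:content {turtle_literal(content)} ;",
        f"    dcterms:created {turtle_literal(created_iso)}^^xsd:dateTime ;",
        f"    foaf:maker <{author_uri}> .",
        "",
        f"<{author_uri}> a foaf:Person ;",
        f"    foaf:name {turtle_literal(author_name)} .",
    ]
    return "\n".join(lines)
```

For example, `microblog_post_turtle("http://example.org/smob/post/1", "http://example.org/me#person", "Jane Doe", "Reading about SIOC", "2008-11-20T10:00:00Z")` produces a small Turtle document. The author can publish it on a site they control, and any number of aggregators can then pull it from there.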

Also, here’s a slidedeck about SIOC for the uninitiated:

English

McNamee: Textual Representations for Corpus-Based Bilingual Retrieval, 9am Mon 11/24

November 20th, 2008

Paul McNamee will defend his dissertation on Textual Representations for Corpus-Based Bilingual Retrieval at 9:00am Monday 24 November 2008 in ITE 325B. His mentor is Charles Nicholas and the dissertation committee includes Tim Finin, James Mayfield (JHU), Sergei Nirenburg and Doug Oard (UMCP). Here is the abstract.

The traditional approach to information retrieval is based on using words as the indexing and search terms for documents. One part of this research investigates alternative methods for representing text, including a method based on overlapping sequences of characters called n-gram tokenization. N-grams are studied in depth and one notable finding is that they achieve a 20% improvement in retrieval effectiveness over words in certain situations.
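The following is a minimal Python sketch of overlapping character n-gram tokenization. The choice of n = 4, the lowercasing and the space padding are assumptions made for illustration, not necessarily the exact settings used in the dissertation. The second function shows one reason n-grams can help with morphological variation: a word and its inflected form share some n-grams even when they do not match as whole words.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Overlapping character n-grams, computed across word boundaries.

    Text is lowercased and whitespace is collapsed to single spaces; a space is
    added at each end so that n-grams can mark the start and end of words.
    """
    normalized = " " + " ".join(text.lower().split()) + " "
    if not normalized.strip():
        return []  # nothing but whitespace
    if len(normalized) < n:
        return [normalized]  # shorter than one n-gram: keep it whole
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def shared_ngrams(a: str, b: str, n: int = 4) -> set[str]:
    """N-grams two strings have in common, e.g. a word and a morphological variant."""
    return set(char_ngrams(a, n)) & set(char_ngrams(b, n))
```

With these settings, `char_ngrams("the cat")` yields `[' the', 'the ', 'he c', 'e ca', ' cat', 'cat ']`, and "juggling" and "jugglers" share the 4-grams `' jug'`, `'jugg'` and `'uggl'`, so a query for one still partially matches documents containing the other.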

The other focus of this research is improving retrieval performance when foreign language documents must be searched and translation is required. In this scenario bilingual dictionaries are often used to translate user queries; however even among the most commonly spoken languages, for which large bilingual lexicons exist, dictionary-based translation suffers from several significant problems. These include: difficulty handling proper names, which are often missing; issues related to morphological variation since entries, or query terms, may not be lemmatized; and, an inability to robustly handle multiword phrases, especially non-compositional expressions. These problems can be addressed when translation is accomplished using parallel collections, sets of documents available in more than one language. Using parallel texts enables statistical translation of character n-grams rather than words or stemmed words, and with this technique highly effective bilingual retrieval performance is obtained. Translation of multiword expressions is also explored.
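As a rough illustration of how parallel texts can drive n-gram translation, the sketch below aligns each source n-gram with the target n-gram whose pattern of occurrence across aligned document pairs is most similar. The Dice coefficient used here is a simple stand-in chosen for clarity, and the dissertation's actual association statistic may differ. The toy English/Spanish "corpus" in the usage note is invented.

```python
from collections import defaultdict


def ngram_set(text: str, n: int = 4) -> set[str]:
    """Distinct overlapping character n-grams of a text (lowercased, space-padded)."""
    padded = " " + " ".join(text.lower().split()) + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(docs: list[str], n: int = 4) -> dict[str, set[int]]:
    """Map each n-gram to the ids of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for gram in ngram_set(text, n):
            index[gram].add(doc_id)
    return index


def translate_ngram(gram: str, src_index, tgt_index):
    """Pick the target n-gram whose document distribution best matches `gram`.

    Document i in the source collection is assumed to be a translation of
    document i in the target collection. Association is scored with the Dice
    coefficient 2|A & B| / (|A| + |B|) over aligned document ids; ties go to the
    lexicographically smallest candidate so results are deterministic.
    Returns (best_target_ngram, score), or None if `gram` never occurs.
    """
    src_docs = src_index.get(gram)
    if not src_docs:
        return None
    best, best_score = None, 0.0
    for cand, tgt_docs in tgt_index.items():  # naive full scan, fine for a sketch
        shared = len(src_docs & tgt_docs)
        if shared == 0:
            continue
        score = 2 * shared / (len(src_docs) + len(tgt_docs))
        if score > best_score or (score == best_score and cand < best):
            best, best_score = cand, score
    return best, best_score
```

With `english = ["the house", "a house", "the dog"]` and `spanish = ["la casa", "una casa", "el perro"]`, `translate_ngram("hous", build_index(english), build_index(spanish))` returns `(' cas', 1.0)`: the n-gram "hous" occurs in exactly the same aligned pairs as " cas". A real system would need a far larger corpus and a smarter search than a full scan over the target vocabulary.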

In this dissertation I present an overview of the field of cross-language information retrieval and then introduce the foundational concepts in n-gram tokenization and corpus-based translation. Then monolingual and bilingual experiments on test sets in 13 languages are described. Analysis of these experiments gives insight into: the relative efficacy of various tokenization methods; reasons why n-grams are effective; the utility of automated relevance feedback, in both monolingual and bilingual contexts; the interplay between tokenization and translation; and, how translation resource selection and size influence bilingual retrieval.

POWDER documents published

November 19th, 2008
The Protocol for Web Description Resources (POWDER) Working Group published four Working Drafts today. The purpose of POWDER is to provide a means for individuals or organizations to describe a group of resources through the publication of machine-readable metadata.
  • Description Resources (Last Call), which details the creation and lifecycle of Description Resources (DRs), which encapsulate metadata
  • Grouping of Resources (Last Call), which describes how sets of IRIs can be defined such that descriptions or other data can be applied to the resources obtained by dereferencing IRIs that are elements of the set
  • Formal Semantics (Last Call), which describes how the relatively simple operational format of a POWDER document can be transformed for processing by Semantic Web tools
  • Primer (First Public Draft)
Last Call comments are welcome through 5 December.
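To give a feel for what "defining a set of IRIs" means, here is a rough Python sketch of set membership based on host and path-prefix constraints. It only mirrors the spirit of the Grouping of Resources draft: the function and parameter names are hypothetical, and the draft defines its own vocabulary and the precise matching rules.

```python
from urllib.parse import urlsplit


def in_iri_set(iri: str, hosts=(), path_prefixes=()) -> bool:
    """Test whether an IRI belongs to a set defined by host and path-prefix constraints.

    hosts: the IRI's host must equal one of these or be a subdomain of one.
    path_prefixes: the IRI's path must start with one of these.
    An empty constraint places no restriction; all given constraints must hold.
    """
    parts = urlsplit(iri)
    host = (parts.hostname or "").lower()
    if hosts and not any(
        host == h.lower() or host.endswith("." + h.lower()) for h in hosts
    ):
        return False
    if path_prefixes and not any(parts.path.startswith(p) for p in path_prefixes):
        return False
    return True
```

A Description Resource would then apply its descriptors to every IRI for which such a test succeeds. For example, `http://www.example.org/films/1` is in the set `hosts=["example.org"], path_prefixes=["/films"]`, while `http://notexample.org/films` is not.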

The Map of Data: Over 10 Billion Pieces of Reusable Information

November 19th, 2008

I just stumbled upon a useful resource from Sindice (the Semantic Web search engine) called the Map of Data. The Map of Data lists sites that export their information via microformats and embedded RDF, as well as which format(s) each site is using. Each site has been categorized and conveniently placed into lists. The categories include books, people, places, products and listings, social news, events, politics, and more. According to Sindice, over 10 billion pieces of reusable information can already be found across 100 million pages.
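How might a crawler notice that a page carries microformats or embedded RDF? The Python sketch that follows is a crude heuristic and not how Sindice actually works: it scans tags for the root class names of a few common microformats and for attributes typical of RDFa. The class list is partial, and hListing and hProduct were still drafts at the time.

```python
from html.parser import HTMLParser

# Root class names of some common microformats.
MICROFORMAT_ROOTS = {
    "vcard": "hCard",
    "vevent": "hCalendar",
    "hreview": "hReview",
    "hentry": "hAtom",
    "hlisting": "hListing",
    "hproduct": "hProduct",
}
# Attributes whose presence suggests embedded RDFa.
RDFA_ATTRIBUTES = {"about", "typeof", "property"}


class EmbeddedDataSniffer(HTMLParser):
    """Collect the kinds of embedded structured data that appear in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.found: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in RDFA_ATTRIBUTES:
                self.found.add("RDFa")
            elif name == "class" and value:
                for cls in value.split():
                    if cls in MICROFORMAT_ROOTS:
                        self.found.add(MICROFORMAT_ROOTS[cls])


def detect_embedded_data(html: str) -> set[str]:
    """Return names of the structured-data formats detected in an HTML string."""
    sniffer = EmbeddedDataSniffer()
    sniffer.feed(html)
    sniffer.close()
    return sniffer.found
```

For instance, `detect_embedded_data('<div class="vcard"><span class="fn">Jane</span></div>')` returns `{"hCard"}`. A real index would go on to parse the properties inside each root element rather than stopping at detection.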

Semantic Applications at age one

November 19th, 2008

After a year, Read/Write Web has revisited their review of 10 promising Semantic Web apps, producing 10 Semantic Apps to Watch - One Year Later.

“A lot can happen in one year on the Internet, so we thought we’d check back in with each of the 10 products and see how they’re progressing. What’s changed over the past year and what are these companies working on now? The products are, in no particular order: Freebase, Powerset, Twine, AdaptiveBlue, Hakia, Talis, TrueKnowledge, TripIt, Calais (was ClearForest), Spock.”

They plan to publish a completely new list of Semantic applications to watch as the next post in the series and ask people to leave suggestions in the post comments.

Maybe Read/Write Web will do something like Michael Apted’s Up series and report back each year on how the systems are doing, which I guess is the equivalent of about seven Web-years.

3scale provides infrastructure for the programmable web

November 19th, 2008

3scale Networks is a Barcelona-based startup that is trying to fill a critical gap by helping organizations manage web services as a business, or at least in a business-like manner.

“3scale provides a new generation of infrastructure for the web - point and click contract management, monitoring and billing for Web Services. The 3scale platform makes it easy for providers to launch their APIs, manage user access and, if desired, collect usage fees. Service users can discover services they need and sign up for plans on offer.” (source)

They have been operating a private beta system for a few months and just announced that their public beta is open. Currently signing up with 3scale and registering services is free and the only costs are commissions on transaction fees your service charges. Once you’ve registered a service, you can install one of several 3scale plugins for your programming environment to get your service talking to 3scale and configure one or more usage plans. 3scale uses Amazon’s EC2, S3 and Cloud Computing services.
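The following is a minimal Python sketch of the kind of bookkeeping a plan-based API management service performs on each call: check that the caller's key is subscribed, enforce the plan's usage limit, and meter usage for billing. It is not 3scale's plugin API. All class, method and plan names here are hypothetical, and the per-call fee model is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class UsagePlan:
    name: str
    monthly_call_limit: int
    fee_per_call_cents: int  # integer cents avoid floating-point rounding


@dataclass
class ApiAccessManager:
    plans: dict[str, UsagePlan] = field(default_factory=dict)
    subscriptions: dict[str, str] = field(default_factory=dict)  # api_key -> plan name
    calls: dict[str, int] = field(default_factory=dict)          # api_key -> calls this month

    def add_plan(self, plan: UsagePlan) -> None:
        self.plans[plan.name] = plan

    def subscribe(self, api_key: str, plan_name: str) -> None:
        if plan_name not in self.plans:
            raise KeyError(f"unknown plan: {plan_name}")
        self.subscriptions[api_key] = plan_name
        self.calls.setdefault(api_key, 0)

    def authorize(self, api_key: str) -> bool:
        """Allow a call if the key is subscribed and under its monthly limit; record it."""
        plan_name = self.subscriptions.get(api_key)
        if plan_name is None:
            return False
        if self.calls[api_key] >= self.plans[plan_name].monthly_call_limit:
            return False
        self.calls[api_key] += 1
        return True

    def monthly_bill_cents(self, api_key: str) -> int:
        """Usage-based charge for the month so far."""
        plan = self.plans[self.subscriptions[api_key]]
        return self.calls[api_key] * plan.fee_per_call_cents
```

With a plan allowing two calls at 5 cents each, the third `authorize` call for the same key is refused and the month's bill is 10 cents. In a hosted service the provider's plugin would make an equivalent check against the remote platform before serving each request.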

3scale’s co-founder and technical lead is Steve Willmott, with whom we worked for many years when he was an academic doing research on multiagent systems. Several months ago he invited us to add Swoogle’s web service to 3scale’s private beta. We were pleased with how easy it was and look forward to exploring other ways to use 3scale.

A story in yesterday’s Washington Post, Manage Your API Infrastructure With 3scale Networks, has some more information.

First Make.tv cast about the Social Semantic Web

November 19th, 2008

Time for a bit of over-the-top web 2.0 adulation… At yesterday’s Digitalks event (once again wonderfully organized by Meral Akin-Hecke), Luca Hammer filmed the presentations and discussions using two cameras at a time, live-editing and live-streaming it all on Make.tv. What is Make.tv? The most incredible web 2.0 application I’ve seen so far: it’s a TV studio in your browser! And it’s free! (Although I doubt it will stay free forever.)

You can live-edit the input from several cameras; this can also be done by logging in on several computers at once and using the input from their built-in webcams. You can drag and drop the video input channels into your scene, shrink the embedded videos to get a picture-in-picture effect, and create your own TV design and virtual studio from graphics… wow, wow, wow.

I played with it today, though I was not quite as adventurous as Luca: I used only one camera (see what he achieved yesterday with multiple screens), and I did not interrupt and restart the recording (which I could have). Even so, I find the visual result, i.e. the ‘studio’ I built from the book cover, impressive enough.

So here it is: my introduction to the Social Semantic Web publication (which is in German, so the audio is in German too, but you don’t need to understand what I am saying to be impressed by Make.tv). Jump to the section from 3:30 to 4:30 to see how you can switch between different screens during the webcast.

P.S. What you see below is an image: you can embed the video, but you cannot (yet) stop it from starting automatically when embedded, so I decided to use an image on the blog instead. Click here, or on the image, to launch the webcast on the Make.tv website.

Social Semantic Web - Webcast

Btw, I am not sure whether I said XML or XHTML in the webcast, but of course I meant XHTML when talking about the benefits of RDFa.

Yahoo vs Google - Technology vs Advertising

November 19th, 2008

Just stumbled upon this observation in a blog post by Daniel Tunkelang, where he compares Yahoo’s and Google’s latest keyword tools, and chuckled. The occasion was Yahoo’s release of a new BOSS feature called Key Terms, and Google’s announcement of a new tool that tells you which key terms you’re missing (i.e. should potentially buy):

I imagine that the technology behind both tools isn’t all that different–or at least doesn’t have to be. But, while Yahoo makes friends in the technology community (especially among researchers), Google makes friends in the advertising community–and makes itself oodles of money.

Nice analogy, Daniel!

New Approaches for Libraries – Jenny Levine in Conversation

November 18th, 2008

An Internet Development Specialist and Strategy Guide for the American Library Association, and a prolific blogger as The Shifted Librarian, Jenny Levine challenges librarians to look to the future and engage with new technology, the web, and gaming.

In this thoughtful conversation, Online Information Conference Key Speaker Jenny Levine explores how libraries should be more open to experimentation, despite concerns about spending other people’s money to deliver a better service to those people. Much can be learnt from the wider web about simplicity and planning for a changing environment.

Jenny also invites those attending the conference to get in touch with specific questions or topics they would like her to cover in her presentation.

boards.ie has the second highest number of unique visitors to an Irish website

November 18th, 2008

According to audited island-of-Ireland figures for Irish websites from ABCe, boards.ie has the second-highest number of unique visitors in Ireland, currently 1.7 million compared with RTÉ’s 2.1 million. This is more than the combined audited figures for the Irish Times plus MyHome.ie (1.5 million) and for INM’s Irish Independent plus PropertyNews.com (1.2 million).


Unique users

Daft currently tops the page impressions league, with 86 million pages in September 2008. boards.ie had 22 million, ahead of the Irish Times and the Irish Independent (18 million each).


Page impressions

You can view the September 2008 ABCe certificate for boards.ie. See also the press release from Daft.

Web Awareness Barometer - please participate!

November 18th, 2008

This year, which is already drawing to a close, has seen many exciting developments on the Semantic Web, in particular in the area of Linked Data. But, as past technological evolutions have shown, many innovations that researchers and experts get excited about today won’t even have entered the market the day after tomorrow.

This is why we would like to know how you feel about the state of the Semantic Web in 2008, whichever position on the web you are hailing from: Semantic Web practitioner, researcher, or ‘regular’ user.

Please participate in our survey:

Web Awareness Barometer 2008

This is not only an excellent opportunity to give feedback to research, development and industry, but also your chance to win two tickets, worth about € 700, for the i-Know / i-Semantics conference, which will take place in Graz in September 2009. We are also giving away three of our Emergency Exit - RDF t-shirts.

The survey is conducted by the Semantic Web Company (i.e. us) in cooperation with Know-Center Graz and the work group Corporate Semantic Web at the Dept. of Computer Science at Freie Universität Berlin.

Please take the survey - your participation contributes to gaining a better understanding of the potentials and barriers for the application of new web technologies, in particular of Semantic Web technologies.

The survey will close on Dec 22 - results are going to be published in February 2009. Thank you for your help and cooperation!

Judges for the boards.ie SIOC Data Competition

November 18th, 2008

I am happy to announce that the judges for the boards.ie SIOC Data Competition are:

We had about sixty registrants and eight final submissions of very high quality. We will announce the winners in a few weeks’ time…

Introducing Glue for iPhone

November 17th, 2008

We are very pleased to announce Glue for iPhone, the companion application that brings the Glue network to the popular mobile device.

Using the iPhone application, Glue users can tap into the Glue network on the go.

Want to have easy access to your favorite books, music and movies on your phone? Do you ever head into a bookstore and wish you knew what your friends recently read? When looking for a place to eat, wouldn’t you like to know the restaurants that your food-loving friends like? Looking for a movie in Blockbuster on Friday night and want to know what is popular? Trying to pick up the perfect bottle of wine in the store? Glue makes it easy to learn these things and more.

The application surfaces the following information:

1. Me. Access books, music, movies, restaurants, wine etc. that you liked and commented on via the browser. All your favorites are always synced and right there when you need them.

2. Friends. Searching for a new book at Barnes & Noble? Picking up a wine? Looking for a flick in Blockbuster on Friday night? Tap into an intelligent, aggregate list of popular things your friends liked around the web.

3. Popular. Stay connected to what is happening on Glue around the web. Flip through the 100 books, music, movies, restaurants, wines and more that are popular among Glue users.

To learn more about Glue for iPhone please watch the following screencast:



Looks fun? Download Glue for iPhone from the iTunes store now!

P.S. A special thanks to Dominick D’Aniello, our engineer who learned Objective-C and then coded Glue for iPhone all by himself!

Gladwell: 10,000 hours to success

November 16th, 2008

The Guardian has an extract, A gift or hard graft?, from Malcolm Gladwell’s new book, Outliers: The Story Of Success, due out later this month. The piece introduces the idea that a key to becoming extraordinarily successful in a field is achieving early expertise and that to become an expert in a discipline requires on the order of 10,000 hours of practice. The 10K figure comes from the research of Anders Ericsson who in the early 1990s studied violinists at the Berlin Academy of Music.

“The curious thing about Ericsson’s study is that he and his colleagues couldn’t find any “naturals” - musicians who could float effortlessly to the top while practising a fraction of the time that their peers did. Nor could they find “grinds”, people who worked harder than everyone else and yet just didn’t have what it takes to break into the top ranks. Their research suggested that once you have enough ability to get into a top music school, the thing that distinguishes one performer from another is how hard he or she works. That’s it. What’s more, the people at the very top don’t just work much harder than everyone else. They work much, much harder.”

The extract focuses on some of the most successful people in the computer industry (Bill Joy, Bill Gates, Steve Jobs, and others) and argues that another part of their success was being born at the right time, in 1954 or 1955. This made them about 20 years old when the first personal computers became available.

I’m going to seize on this as yet another personal excuse — I was born a half decade too early.

Google map of London with Flickr shape data overlaid

November 16th, 2008

Flickr place info now includes shape data for many places. See the Flickr code blog for more.

We’ve correlated most of Dopplr’s places with Yahoo WOE IDs using Flickr’s reverse geocoder, so we can use this data too. As an experiment, I wrote some client-side code to overlay this shape data onto the maps we use on Dopplr. Help yourself to the code if you want it: gist.github.com/25502
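The gist linked above is the real (client-side) code; what follows is an independent Python sketch of two things one typically does with a place outline before drawing it: compute its bounding box to fit the map view, and test whether a point falls inside it. It assumes the shape arrives as a polyline string of space-separated "lat,lng" pairs, which should be checked against Flickr's API documentation. The ray-casting test treats latitude and longitude as flat coordinates, which is reasonable for a city outline such as London but ignores shapes that cross the antimeridian.

```python
def parse_polyline(polyline: str) -> list[tuple[float, float]]:
    """Parse 'lat,lng lat,lng ...' into a list of (lat, lng) pairs (assumed format)."""
    points = []
    for pair in polyline.split():
        lat, lng = pair.split(",")
        points.append((float(lat), float(lng)))
    return points


def bounding_box(points):
    """(south, west, north, east) box enclosing all points, e.g. to fit a map view."""
    lats = [lat for lat, _ in points]
    lngs = [lng for _, lng in points]
    return min(lats), min(lngs), max(lats), max(lngs)


def contains(polygon, lat, lng) -> bool:
    """Ray-casting point-in-polygon test, treating lat/lng as planar coordinates."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lng1 = polygon[i]
        lat2, lng2 = polygon[(i + 1) % n]
        if (lat1 > lat) != (lat2 > lat):  # this edge crosses the point's latitude
            crossing_lng = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < crossing_lng:
                inside = not inside
    return inside
```

For a square outline `"51.0,-1.0 51.0,1.0 52.0,1.0 52.0,-1.0"`, the bounding box is `(51.0, -1.0, 52.0, 1.0)`, and the point (51.5, -0.1) tests as inside.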

New SW Case Study Published by Yahoo!

November 15th, 2008
Yahoo! has just published a SW Case Study. The Case Study describes the SearchMonkey application, which reuses structured data embedded in Web pages (e.g., in RDFa, eRDF or microformats) to produce more compelling search results.

links for 2008-11-14

November 14th, 2008

OCLC Talk with Talis about the new Record Use Policy

November 14th, 2008

I am joined in this Talking with Talis conversation by two well known OCLC names – Vice President WorldCat and Metadata Services, Karen Calhoun and Senior Programme Officer, Roy Tennant.

There has been much coverage on Panlibus and several other blogs about the way the recently updated Record Use Policy was announced, the elements of the policy, and its ramifications for the wider library community.

Apart from the need to update and replace the current 21-year-old Guidelines, the professed objective of the new policy is to clarify and increase the possibilities for sharing and transferring OCLC records. From the noise in the blogosphere, it is clear that many do not share that understanding.

Karen published an extensive post on Metalogue providing some background to the policy and its announcement.  I was delighted to share this extensive conversation with Karen and Roy to explore in more depth the intention, details and ramifications of this new policy, due to be implemented in February 2009.

In the conversation, my own questions were supplemented by some submitted by Talking with Talis listeners. Thank you to those who took the time to email me with suggestions.

Extinction Timeline

November 14th, 2008

Following on from my previous post quite by chance I came across this extinction timeline that predicts the death of mending things for 2009. Fascinating predictions.

Reuters Calais to support Semantic Web Linked Data in next release

November 14th, 2008

Thomson Reuters announced on their blog (Life in the Linked Data Cloud: Calais Release 4) that the next release of the Calais web-based information extraction service will support Linked Data.

“In that release we’ll go beyond the ability to extract semantic data from your content. We will link that extracted semantic data to datasets from dozens of other information sources, from Wikipedia to Freebase to the CIA World Fact Book. In short – instead of being limited to the contents of the document you’re processing, you’ll be able to develop solutions that leverage a large and rapidly growing information asset: the Linked Data Cloud.”

The new capabilities will be available in Release 4, which is expected out on 9 January 2009.

The change is based on Calais returning de-referenceable URIs for the entities it finds. Accessing those URIs will produce RDF with links to corresponding entities in DBpedia, Freebase and other sources of “Semantic Web” data. It will be very interesting to see how well their system does at mapping document entities (e.g., “secretary Rice”) to entities in the LOD cloud such as http://dbpedia.org/resource/Condoleezza_Rice. Accessing that URI with a request for content type application/rdf+xml returns the RDF at http://dbpedia.org/data/Condoleezza_Rice that has RDF assertions extracted by DBpedia from Wikipedia.
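Dereferencing a Linked Data URI "with a request for content type application/rdf+xml" simply means sending an HTTP Accept header and following the server's redirect. Here is a minimal Python sketch using only the standard library. Note that DBpedia's current deployment (HTTPS, redirect targets) may differ from its 2008 behaviour, and 2008-era Calais URIs may no longer resolve.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def rdf_request(uri: str) -> urllib.request.Request:
    """A GET request that asks a Linked Data server for RDF/XML rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def fetch_rdf(uri: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Dereference `uri` and return (final URL after redirects, response body).

    urllib follows the server's redirect (e.g. a 303 from a /resource/ URI to a
    /data/ document) and keeps the Accept header on the redirected request.
    """
    with urllib.request.urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.geturl(), response.read()
```

Calling `fetch_rdf("http://dbpedia.org/resource/Condoleezza_Rice")` requires network access. It should return the URL of the RDF document the server redirected to, together with the RDF itself, which can then be handed to any RDF parser.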

The End of Fixable Objects?

November 14th, 2008

This is a depressing trend in the engineering world: the sealed machine with no discernible parts. I sometimes feel that, at the age of 35, I am a member of the last generation that grew up in a world of fixable objects.

Via Popular Mechanics comes this tale of an iPhone failing because of a build-up of dust. It’s very true that, despite increased awareness of the imperative to build a sustainable society, we continue the trend towards throw-away and unfixable items. Apple is one of the worst offenders here. There’s even a macroeconomic effect, since disposable items cannot create a secondary market of repairs and spares. Companies like Apple are ring-fencing all the value in their products for themselves.

links for 2008-11-13

November 13th, 2008

Life in the Linked Data Cloud - Calais Release 4 Coming Jan 09

November 13th, 2008

Life in the Linked Data Cloud: Calais Release 4

The Gist: Release 4 of Calais will be a big deal. In that release we’ll go beyond the ability to extract semantic data from your content. We will link that extracted semantic data to datasets from dozens of other information sources, from Wikipedia to Freebase to the CIA World Fact Book. In short – instead of being limited to the contents of the document you’re processing, you’ll be able to develop solutions that leverage a large and rapidly growing information asset: the Linked Data Cloud.

The goal of this post is just to give our community a heads-up to start thinking and planning.

During the course of 2008 we’ve had three significant releases of Calais, with additional point releases nearly each month along the way. We’ve added new knowledge domains, improved performance, delivered integration with a range of tools and developed new user-facing applications. It’s been a year of amazing growth in our developer community and the capabilities of the Calais service.

While every previous release has accomplished something significant, Release 4 is going to introduce something that we think is game changing – and that’s life in the Linked Data cloud. It’s important enough that we want to give all the members of our community time to think about it, prepare for it and get your brains in gear on how you might use it.

Every release of Calais up to this point has focused on meeting the need to extract semantic information from text. Release 4 builds on this by creating the ability to harvest the Linked Data cloud using that semantic data.

For this all to make sense we need to introduce a few things. If you already know about de-referenceable URIs and the Linked Data cloud – skim ahead. If not – please take a moment to ingest the background you need.

When you send text to Calais it returns several things: entities, facts, events and categories. For purposes of today’s discussion we’re going to focus in on entities. Entities are just what they sound like – they are things. Some specific examples are people, companies, organizations, geographies, sports teams and music albums.

When Calais extracts an entity from your text it returns (at least) a few things. It tells you the name of the entity and it tells you what type of entity it is. Unlike other extraction services we don’t just return a list of things – Calais tells you it found a thing of type=Company and a value=IBM or type=Person and value=Jane Doe. But – there’s something else Calais returns that hasn’t meant very much up until now: it returns a Uniform Resource Identifier (URI) for that entity. There’s nothing magic about URIs - they are simply a unique identifier for every entity that Calais discovers. Here’s an example (it’s not pretty) of what the URI for the Company IBM looks like:

d.opencalais.com/comphash-1/7c375e93-de13-3f56-a42d-add43142d9d1

Well, that doesn’t look very useful does it? If you were to pull up that URI (when Release 4 is out) all you’d see is RDF with links to places called DBpedia and Freebase and Reuters. But keep those links in mind: they’re the key to a whole new world.

Linked Data is the name of a movement underway (not too surprisingly, initiated by Sir Tim Berners-Lee) that sets a standard and expected behavior for publishing and connecting data on the web. This isn’t about publishing web pages – this is about turning those web pages into data that’s accessible to programs to work with. We’ll give you a quick example to make it real: Wikipedia is one of the largest sets of information across a broad range of topics in the world. It’s really great if I'm a person who's casually looking for information on a particular topic – but it’s not so great if I’m a computer program that wants to use that data. Why? Because it’s formatted and organized for people – not computers – to read.

But Wikipedia has a twin - in fact a Linked Data twin – called DBpedia. DBpedia has the same structured information as Wikipedia – but translated into a machine-readable format called RDF and accessible via the Linked Data standards. And, Wikipedia is not alone. A growing cloud of information sets from DBpedia to the CIA World Fact Book to U.S. Census data to Musicbrainz – and many others – is becoming available. What’s important is that this cloud is 1) growing, and 2) interoperable. There are “pointers” from entries in DBpedia to entries in Musicbrainz and back to entries in Geonames – it’s another big Web – but this time it’s a Web of Data.

So – lots of words and arcane concepts. Let’s try to bring it all together into something that makes sense. We’ll put one sentence out there – and then we’ll give a few examples.

Beginning with Calais Release 4 you and the programs you develop will be able to go from many of the entities Calais extracts directly to the Linked Data Cloud.

A simple example:

I want to process today’s business news. For each article I want to extract all of the companies mentioned – but only if the article also mentions a merger or acquisition. I am only interested in consulting companies whose headquarters (or those of their subsidiaries) are located in New York State. Do all of that and give me a widget for my news site titled “Merger Activity for NY Consulting Companies”. And oh, by the way, this isn’t a research project – I want you to do it real time for the 10,000 pieces of news I process every day.

How would you do that? Option 1 is to hire a bunch of researchers, give them a fast internet connection and teach them to type very very fast.  Option 2 is to write some code that looks like this:

For each Article
    Submit to Calais, get response
    If MergerAcquisition exists then
        For each Company
            Retrieve Calais Company URI, extract DBpedia link
            Send Linked Data query to DBpedia, get response
            If CompanyIndustry contains “Consulting”
                If CompanyHeadquarters = “New York”
                    Put company on the list
                For each subsidiary
                    Send Linked Data query to DBpedia, get response
                    If SubsidiaryHeadquarters = “New York”
                        Put company on the list

(lots of endifs)

Print the list
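The filtering logic of that pseudocode can be written as runnable Python if the two network steps are passed in as functions. In this sketch, `extract` stands in for the Calais call plus the DBpedia-link lookup, and `lookup` stands in for a Linked Data query. Both, like the `CompanyFacts` fields, are hypothetical placeholders: real DBpedia properties vary, and the headquarters check compares strings exactly, as the pseudocode does.

```python
from dataclasses import dataclass, field


@dataclass
class CompanyFacts:
    industry: str
    headquarters: str
    subsidiaries: list[str] = field(default_factory=list)  # Linked Data URIs


def headquartered_in(uri, lookup, place) -> bool:
    """True if the entity at `uri` can be looked up and is headquartered in `place`."""
    facts = lookup(uri)
    return facts is not None and facts.headquarters == place


def merger_companies(articles, extract, lookup, industry="consulting", place="New York"):
    """Companies in merger/acquisition news that match an industry and are
    located in `place`, either directly or through a subsidiary.

    extract(article) -> (mentions_merger, company_uris)   # hypothetical Calais wrapper
    lookup(uri) -> CompanyFacts or None                   # hypothetical Linked Data query
    """
    hits: list[str] = []
    for article in articles:
        mentions_merger, company_uris = extract(article)
        if not mentions_merger:
            continue
        for uri in company_uris:
            facts = lookup(uri)
            if facts is None or industry.lower() not in facts.industry.lower():
                continue
            located = facts.headquarters == place or any(
                headquartered_in(sub, lookup, place) for sub in facts.subsidiaries
            )
            if located and uri not in hits:  # report each company once
                hits.append(uri)
    return hits
```

Injecting the lookups keeps the business rule testable offline with a plain dictionary (`lookup=facts.get`). In production those functions would wrap the Calais and DBpedia requests, ideally with caching, given the 10,000 articles a day in the example.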

That really is a pretty straightforward example. How about companies in the news with at least one subsidiary doing business in an area that the CIA Factbook considers dangerous? Or books released by authors who attended Harvard and live in Ohio? Or ... . We think you get the idea.

So. The summary. The combination of semantic data extraction (generic extraction, tags, keywords won’t do the trick) + de-referenceable URIs (entity identifiers you and your programs can retrieve) + the Linked Data Cloud = amazing stuff.

We’d like you to start thinking about it.

IET and CompSoc present a talk on the “Social Semantic Web” on 27th November

November 13th, 2008

The Social Semantic Web

Speaker: Dr. John Breslin, Engineering and Informatics, NUI Galway
Date and Time: 27th November 2008, 18:15
Venue: DERI, IDA Business Park, Dangan, Galway - useamap.com/deri

Open to the public, no attendance fee

The Social Web - social networking services, blogs and wikis - has captured the attention of millions of users as well as billions of dollars in investment and acquisition. As more social websites form around the connections between people and their objects of interest, more intuitive methods are needed for representing and navigating the content in these sites. Also, to better enable user access to multiple sites, interoperability among social websites is required. This talk will describe the semantic technologies that can be used to interconnect both people and objects on the Social Web.

John Breslin, BE (Electronics), PhD, MIET - www.johnbreslin.org

John Breslin is a lecturer at the Department of Electronic Engineering in the College of Engineering and Informatics at the National University of Ireland, Galway. He is also an associate researcher and leader of the Social Software Unit at the Digital Enterprise Research Institute (DERI) in NUI Galway, the world’s largest Semantic Web research institute. He is the founder of the SIOC project, which aims to interlink online community sites using semantic technologies, and which has been deployed in over 50 applications including Yahoo! SearchMonkey. The Irish Internet Association presented him with Net Visionary awards in 2005 and 2006 for the Irish community website boards.ie, which he co-founded in 2000.

For further information contact: Mark on 087 1251858 / mneedham@theiet.org
or the Institution of Engineering and Technology Ireland Network.

Read this: Linking Social Networks on the Web with FOAF

November 13th, 2008

Jennifer Golbeck, Matthew Rothstein. Linking Social Networks on the Web with FOAF: A Semantic Web Case Study. Proceedings of the Twenty-Third Conference on Artificial Intelligence (AAAI’08).
Download (PDF, 320 KB).

ABSTRACT
One of the core goals of the Semantic Web is to store data in distributed locations, and use ontologies and reasoning to aggregate it. Social networking is a large movement on the web, and social networking data using the Friend of a Friend (FOAF) vocabulary makes up a significant portion of all data on the Semantic Web. Many traditional web-based social networks share their members’ information in FOAF format. While this is by far the largest source of FOAF online, there is no information about whether the social network models from each network overlap to create a larger unified social network model, or whether they are simply isolated components. In this paper, we present a study of the intersection of FOAF data found in many online social networks. Using the semantics of the FOAF ontology and applying Semantic Web reasoning techniques, we show that a significant percentage of profiles can be merged from multiple networks. We present results on how this affects network structure and what it says about relationships and individual behavior. Finally, we discuss the implications this has for using web-based social networking data to create intelligent user interfaces and social software.
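The abstract does not spell out how profiles are merged. One common approach with FOAF is "smushing": treat two profiles as the same person when they share a value for an inverse functional property (the FOAF vocabulary declares, for example, foaf:mbox_sha1sum and foaf:homepage as inverse functional), and apply that rule transitively. The Python sketch below does this with a union-find structure. It is an illustration under that assumption, not necessarily the exact procedure Golbeck and Rothstein used; the profile IDs, hash values and URLs are made up.

```python
from collections import defaultdict

# Assumption: merge on these two FOAF inverse functional properties only.
# Profiles sharing a value for one of them are taken to denote one person.
IFPS = ("foaf:mbox_sha1sum", "foaf:homepage")


def merge_profiles(profiles, ifps=IFPS):
    """Group profile IDs that share any IFP value, transitively (union-find)."""
    parent = {pid: pid for pid in profiles}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (property, value) -> first profile ID carrying it
    for pid, props in profiles.items():
        for prop in ifps:
            for value in props.get(prop, ()):
                key = (prop, value)
                if key in first_seen:
                    union(first_seen[key], pid)
                else:
                    first_seen[key] = pid

    # Collect profiles by their root and return sorted groups.
    groups = defaultdict(set)
    for pid in profiles:
        groups[find(pid)].add(pid)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical profiles from three networks; hash values are placeholders.
profiles = {
    "networkA#alice": {"foaf:mbox_sha1sum": {"abc123"}, "foaf:name": {"Alice"}},
    "networkB#al": {"foaf:mbox_sha1sum": {"abc123"},
                    "foaf:homepage": {"http://example.org/alice"}},
    "networkC#a": {"foaf:homepage": {"http://example.org/alice"}},
    "networkA#bob": {"foaf:mbox_sha1sum": {"def456"}},
}
print(merge_profiles(profiles))
# [['networkA#alice', 'networkB#al', 'networkC#a'], ['networkA#bob']]
```

Here networkC#a shares nothing with networkA#alice directly; it joins her group only through networkB#al, which is why the merge has to be transitive.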



Semantic Tools from hakia for On-line Advertising

November 12th, 2008

At hakia, we have been developing our own Semantic Advertising System, which is in its final stages of testing. One branch of this development is contextual advertising. We have developed a middleware system for contextual advertising which can be used by third-party SEMs and online advertising warehouses.

To gauge the current state of things, we set up a demo system and compared it with the dominant player, Google AdSense’s test link. After several case studies, we were surprised to find that Google’s ad targeting against the submitted content suffers from poor relevancy a significant percentage of the time. The online demo is available to interested parties; just send us an email and explain your interest.

The content below is about the “BEAT GENERATION”, but Google AdSense suggests ads like “beat maker”, “Deadbeat father” and “beat DUI.” On the same screen, hakia identifies the correct triggers (bottom-left), and Yahoo test ads in response to these triggers bring relevant ads (top-left).

Semantic Contextual Advertising

While hakia consistently identified the relevant content in all test cases, Google’s poor performance puzzled us. If the performance of the AdSense test links is a good representation of the current state of contextual advertising, it would explain why most advertisers steer clear and why click-through rates (CTRs) are low in contextual advertising.

The cases we analyzed left no room for debate about unknown factors such as “user profiling”. For example, we tested a page about email spam on the Department of Energy’s Web site. Google AdSense test links suggested ads related to energy, whereas hakia correctly suggested email spamming. Google evidently latches onto terms on the page using some sort of term detection, which is never comparable to real content detection based on semantic analysis.
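To make the “term detection” failure concrete, here is a deliberately toy Python comparison with a made-up ad inventory. One matcher fires an ad when any single word of its keyword appears on the page; the other fires only when the whole keyword phrase appears as consecutive words. This is not how Google AdSense or hakia actually work, and phrase matching falls far short of semantic analysis (it would not fix the Department of Energy case). It only shows why matching isolated terms produces ads like “beat maker” for a page about the Beat Generation.

```python
import re

# Hypothetical ad inventory: keyword phrase -> ad text.
ADS = {
    "beat maker": "Make beats online",
    "beat dui": "DUI defense lawyer",
    "beat generation": "Kerouac first editions",
}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_matches(page, ads=ADS):
    """Naive term detection: fire an ad if ANY word of its keyword is on the page."""
    words = set(tokens(page))
    return sorted(k for k in ads if words & set(tokens(k)))


def phrase_matches(page, ads=ADS):
    """Fire an ad only if its whole keyword phrase occurs as consecutive words."""
    words = tokens(page)
    hits = []
    for k in ads:
        kw = tokens(k)
        n = len(kw)
        if any(words[i:i + n] == kw for i in range(len(words) - n + 1)):
            hits.append(k)
    return sorted(hits)


page = "The Beat Generation writers included Jack Kerouac and Allen Ginsberg."
print(term_matches(page))    # ['beat dui', 'beat generation', 'beat maker']
print(phrase_matches(page))  # ['beat generation']
```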

Let me remind the reader that the real problem in contextual advertising is fast-moving, dynamic content, not static pages where the content owner can optimize the ads manually, one by one. The challenge is to do it accurately in an automated manner.

The middle layer we have developed can be used in various ways. SEOs and SEMs handling millions of pages for keyword extraction, meta tag insertion, etc., can use hakia’s tool. Contextual advertising providers can deploy hakia’s middle-layer software to handle vast numbers of dynamic pages literally “hands-free” with semantic precision.

To make real progress, we have to look beyond the obvious “backward-looking” statistics, which undermine the actual potential of contextual advertising because of tainted practices in the past. If 90% of your Website’s traffic comes from search engines, then the people viewing your content have “search” on their minds. Relevant advertising next to your content serves that mindset and maximizes your monetization capacity. Poor relevancy reflects badly not only on the ads at the moment of exposure but also on the entire advertising industry.

Let’s imagine a future in which TV and PCs merge. Semantic contextual advertising systems will be able to push relevant ads to the bottom of the screen as the conversation shifts from one subject to another in any TV program. Is the TV anchor saying something of interest? Right there, you can click a link to catch it and jump to a relevant Web page. That’s the future.


OCLC – any questions?

November 12th, 2008

Following on from the recent announcements from OCLC, I will be recording a podcast in the Talking with Talis series with Karen Calhoun to discuss the hows, whys and wherefores of the recent WorldCat record use policy changes.

I would be delighted to receive suggestions for questions to include in the conversation (no guarantees as to which I will use). Drop me an email at richard.wallis@talis.com.


Malcolm Gladwell (Geek Pop Star) on Outliers

November 12th, 2008

New York magazine has an article (Geek Pop Star) on Malcolm Gladwell whose new book, Outliers: The Story of Success, is due out later this fall.

“Malcolm Gladwell’s elegant and wildly popular theories about modern life have turned his name into an adjective—Gladwellian! But in his new book, he seeks to undercut the cult of success, including his own, by explaining how little control we have over it.”

His book explains why I never became a hockey star: I was born too late in the year. A disproportionate number of top Canadian hockey players are born in the first half of the year. Gladwell’s explanation is the eligibility cut-off: to join a junior hockey league, you must be 10 years old by January 1. So if you were born on January 2nd, you start playing with the advantage of being older, larger and stronger than your peers. I’m not sure that my August birthday explains my own poor skating skills, though.

This quote from the article addresses why Bill Gates did so well.

“Or take the case of Bill Gates. Gladwell cites a body of research finding that the “magic number for true expertise” is 10,000 hours of practice. “Practice isn’t the thing you do once you’re good,” Gladwell writes. “It’s the thing you do that makes you good.” Gladwell shows how Gates accumulated his 10,000 hours while in middle and high school in Seattle thanks to a series of nine incredibly fortunate opportunities—ranging from the fact that his private school had a computer club with access to (and money for) a sophisticated computer, to his childhood home’s proximity to the University of Washington, where he had access to an even more sophisticated computer. “By the time Gates dropped out of Harvard after his sophomore year to try his hand at his own computer software company,” Gladwell writes, “he’d been programming practically nonstop for seven consecutive years. He was way past 10,000 hours.” Yes, Gates is obviously brilliant, Gladwell concludes, but without the lucky breaks he had as a kid, he never could have had the opportunity to fulfill the true potential of that brilliance. How many similarly brilliant people never get that opportunity?”

I guess I spent my own 10,000 hours hacking Lisp too late in life.


Pellet 2.0 RC3

November 12th, 2008

We’re happy to announce that a new release candidate for Pellet 2.0 is now available for download. The RC3 release fixes several issues that were identified with RC1 and RC2.

The most noteworthy changes in RC3 include fixes to the SWRL optimizations and to the Jena interface. In addition, several improvements have been made to the command-line interface for better error handling. As usual, the Pellet Trac site contains the complete list of issues fixed in this version and in RC2, which we released on Monday but did not announce.

Thanks to all the users who reported issues they encountered using RC1. Please continue sending your bug reports to the Pellet users mailing list and we’ll keep fixing bugs on our drive to a final 2.0 release.


Ideas worth spreading: More Entertainment, less Technology

November 12th, 2008

The tradition of Barcamps is not very old. The idea, of course, is. Some of you might remember the launch of TED in 1984. TED stands for Technology, Entertainment, Design. It started out as a conference bringing together people from those three worlds, and since then its scope has grown ever broader. “The power of the spreading of ideas” drove this initiative. The talks come from the world’s greatest thinkers and doers, and to my mind they are the nicest way to relax during a Semantic Web business day.

“Today, TED is therefore best thought of as a global community. It’s a community welcoming people from every discipline and culture who have just two things in common: they seek a deeper understanding of the world, and they hope to turn that understanding into a better future for us all.” [Source]

In a creative field like the future of the web, regularly drawing inspiration from charismatic people and events is not only a sine qua non, it is also very good for networking. So don’t forget to let some more entertainment into your everyday lives instead of concentrating exclusively on the (very crucial) technological issues. And, of course: keep spreading your ideas.

Read (and write) more on TED Blog: mfb

Author: Marion Fugléwicz-Bren

