Archive

Posts Tagged ‘Paris’

Multimedia in the Web of Data - Annotating and Interlinking Photos, Music, Multimedia [WOD-PD]

October 23rd, 2008

The Web of Data Practitioners Days concluded with the session on Multimedia in the Web of Data, the first part of which was led by Ansgar Scherp (University of Koblenz-Landau, Germany).

Multimedia content, as Ansgar pointed out, is hardly annotated, badly organized, and hardly ever looked at again - just think of the 300-something pics you might take on an average weekend getaway and never touch again. Annotating multimedia content requires a lot of work and dedication, and most of the time these pictures eventually disappear into the “digital shoe box” that is your photo management software.

The most obvious remedy is to annotate content as early as possible, ideally when creating it, ideally right on your portable camera (formerly known as a mobile phone :). Ansgar suggested providing incentives to encourage picture annotation: professionals could, for instance, receive a higher financial reward if they deliver pictures that are already annotated. And of course there are ‘Games with a purpose’ such as Google Image Labeler, where players tag images in pairs, with and against each other, and are rewarded with the entertainment factor of the game.

The slide below shows what has happened (or will happen) to the process of creating photo books in the digital age and the age of mashups:

Ansgar Scherp's slides

After all, this is the age of the social semantic web, so why not try and (re-)use the content, structure and contexts that other users have already created on the web? Content augmentation, for the scope that Ansgar is concerned with, consists in the reuse of content and structures (e.g. from sources such as Flickr and Wikipedia, Geonames) made possible through the definition of rules, e.g.:

  • If there are two or fewer pictures on a page*
  • then automatically augment the page with additional photos using location information.

* Page here means a page in the album you are currently working on - you probably took a picture of yourself and your friend in Paris, and even though you went to the Centre Pompidou, you forgot to actually take a pic of the building itself - well, let the web be your library!
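To make the augmentation rule concrete, here is a minimal sketch in Python. It is not Ansgar's implementation: the `Photo` and `Page` types, the `target` parameter (how many photos to fill the page up to) and the idea of matching on an exact location string are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    title: str
    location: str  # hypothetical: a plain place name such as "Paris"


@dataclass
class Page:
    location: str
    photos: list = field(default_factory=list)


def augment_page(page, web_photos, threshold=2, target=3):
    """If the page has `threshold` or fewer photos, add web photos
    taken at the same location until `target` photos are reached."""
    if len(page.photos) > threshold:
        return page  # rule does not fire
    titles = {p.title for p in page.photos}
    for candidate in web_photos:
        if len(page.photos) >= target:
            break
        if candidate.location == page.location and candidate.title not in titles:
            page.photos.append(candidate)
            titles.add(candidate.title)
    return page
```

In a real system the pool of `web_photos` would come from sources like Flickr, and location matching would rely on geo data rather than string equality.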

So the goal is clear: develop a procedure for applying automatic content augmentation in the creation of good photo books.

But what makes a ‘good’ photo book anyway? Here are some of the results of a structural analysis of real, human-created photobooks conducted at CeWe Color:

  • % of photos with faces: 36%
  • Number of album pages: 16.96
  • Photos per page: 6.69
  • Text fields per page: 1.45
  • % of pages with text: 87%

Many rules can be derived from the structural analysis and applied in turn to the creation of photo books, e.g. rules like this one:

  • If the text is located in the upper third of a page
  • if the font size is equal to or larger than 16 points
  • if the number of words is less than 10
  • if there is no caption on the page that has a bigger font size
  • then this page is the title page
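The title rule above translates almost directly into code. The sketch below is only an illustration: the `TextField` type and the convention that `y_center` is a fraction of the page height (0.0 at the top) are assumptions, not part of the analysis presented in the talk.

```python
from dataclasses import dataclass


@dataclass
class TextField:
    text: str
    y_center: float   # assumed: vertical position, 0.0 = top, 1.0 = bottom
    font_size: float  # in points


def is_title_page(candidate, other_fields):
    """Return True if `candidate` satisfies all four conditions of the rule."""
    in_upper_third = candidate.y_center < 1 / 3
    large_font = candidate.font_size >= 16
    few_words = len(candidate.text.split()) < 10
    # No other caption on the page may use a bigger font.
    nothing_bigger = all(o.font_size <= candidate.font_size for o in other_fields)
    return in_upper_third and large_font and few_words and nothing_bigger
```

For example, a 24-point "Paris 2008" near the top of a page whose only other text is a 12-point caption would be classified as a title.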

Ansgar recommended xSmart, which he described as a “context-driven authoring tool for page-based multimedia presentations.”

Ansgar’s presentation was followed by two more: one by Yves Raimond on Interlinking Music on the Web of Data, and one on Interlinking Multimedia - in spite of better intentions, I did not manage to cover these two in detail, but at least I gathered the links to relevant resources from all three sessions…

Links for Ansgar Scherp’s session

Links for Yves Raimond’s session

Links for Michael Hausenblas’ session

  • InterlinkingMultimedia.info - a wiki dedicated to Interlinking multimedia (iM), “a light-weight bottom-up approach to interlink multimedia content on the Web of Data”.
  • Rammx - RDFa-deployed Multimedia Metadata
  • CaMiCatzee - multimedia interlinking concept demonstrator.

Last but not least: Ansgar Scherp allowed us a sneak peek of SemaPlorer, a Large-scale Semantic Faceted Browsing Application for Multimedia Data that is going to be revealed on Dec 2, 2008, at the BOEMIE (Bootstrapping Ontology Evolution with Multimedia Information Extraction) workshop in Koblenz. Here is an abstract:

Navigating large media repositories is a tedious task, because it requires frequent search for the ‘right’ keywords, as searching and browsing do not consider the semantics of multimedia data. To resolve this issue, we have developed the SemaPlorer application. SemaPlorer facilitates easy usage of Flickr data by allowing for faceted browsing taking into account semantic background knowledge harvested from sources such as DBpedia, GeoNames, WordNet and personal FOAF files. The inclusion of such background knowledge, however, puts a heavy load on the repository infrastructure that cannot be handled by off-the-shelf software. Therefore, we have developed SemaPlorer’s storage infrastructure based on Amazon’s Elastic Computing Cloud (EC2) and Simple Storage Service. We apply NetworkedGraphs as additional layer on top of EC2, performing as a large, federated data infrastructure for semantically heterogeneous data sources from within and outside of the cloud. Therefore, SemaPlorer is scalable with respect to the amount of distributed components working together as well as the number of triples managed overall.
Steffen Staab, Information Systems and Semantic Web (ISWeb), University of Koblenz-Landau, Germany

Thank you, thank you, thank you, it was a lovely event with an unusually high amount of processable input!


Release 3.1 - Now in Technology Preview

October 1st, 2008


Well, it’s been over two weeks since we released something cool (www.semanticproxy.com) – time to get cracking on some new stuff.

We’ve placed Release 3.1 of Calais into technology preview status. Just as a reminder, technology preview is a separate instance of Calais that allows developers to evaluate new features and test their software prior to our moving the release to production. You can access the Preview by simply pointing your tool to http://beta.opencalais.com rather than http://api.opencalais.com. Just like Calais, the preview version requires that you have a developer API key – your existing key will work just fine.
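If your client builds request URLs in one place, switching between production and the Preview can be a one-line change. The following is a small sketch, not official client code; the helper name `to_preview` is made up, and the path in the usage example is a placeholder rather than a real Calais endpoint.

```python
from urllib.parse import urlsplit, urlunsplit

PRODUCTION_HOST = "api.opencalais.com"
PREVIEW_HOST = "beta.opencalais.com"


def to_preview(url: str) -> str:
    """Rewrite a production URL so it targets the Technology Preview host.
    URLs pointing anywhere else are returned unchanged."""
    parts = urlsplit(url)
    if parts.netloc != PRODUCTION_HOST:
        return url
    return urlunsplit(parts._replace(netloc=PREVIEW_HOST))


# Placeholder path, for illustration only:
# to_preview("http://api.opencalais.com/some/path")
```

Only the host changes; your API key and the rest of the request stay the same.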

This will be a relatively extended Preview – most likely lasting throughout October 2008. We want to give everyone the opportunity to test some significant new features and make sure we have adequate time to respond to any issues you discover. That being said – please don’t wait until the last minute to give things a spin.

As you may have noticed, our releases are getting significantly larger and incorporating substantial new functionality on a monthly basis. Release 3.1 is no different – it contains everything from major new capabilities such as company and geography disambiguation to performance improvements to new output formats to some significant expansions of the types of information it can extract. So, in our tradition of lengthy blog postings – here’s an overview of what’s new in 3.1. I’ve broken this up into a few high-level focus areas. You can also visit the release notes right here.

New and Significant (at least to some of us)

Release 3.1’s big new functionality is disambiguation of company names and geographies. One of the big challenges of automated entity recognition is how to deal with ambiguity – for example “IBM”, “IBM Corp”, “International Business Machines”. For the vast majority of use cases you want each of these variations resolved to a single entity called “IBM”. There are similar challenges around geography such as Calais, Maine vs. Calais, France.

For companies we’ve implemented a sophisticated disambiguation capability that is driven by a reference database of tens of millions of company names and their variations. This database is primarily focused on public companies – but we’ll be expanding it to contain a broader range of companies in the future. In addition to variations on a company name, we also use hints that may exist in the text, such as location or industry, as additional evidence.
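To give a feel for what resolving name variations means, here is a deliberately tiny sketch. It is not how Calais works internally: the real service uses a large reference database plus contextual evidence, while this toy uses a hand-written alias table (`ALIASES`) and simple string normalization, both invented for illustration.

```python
import re

# Hypothetical alias table: canonical name -> known variations.
ALIASES = {
    "IBM": ["IBM", "IBM Corp", "International Business Machines"],
}


def _normalize(name: str) -> str:
    """Lowercase and drop punctuation so 'IBM Corp.' matches 'IBM Corp'."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


LOOKUP = {
    _normalize(variant): canonical
    for canonical, variants in ALIASES.items()
    for variant in variants
}


def resolve_company(mention: str):
    """Return the canonical company name, or None if the mention is unknown."""
    return LOOKUP.get(_normalize(mention))
```

A lookup like this cannot use location or industry hints from the surrounding text, which is exactly where the real capability goes further.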

For geography we’re utilizing elements of DBpedia and other public data assets to dive in and figure out which Calais or Paris or wherever the text is really talking about. We base this disambiguation not just on the name itself – but also on hints in the surrounding text (for example longhorns are seldom discussed in the same article as Paris, France – but Paris, Texas is another story). To jumpstart mapping applications we also return the geo coordinates of the geography we’ve detected.
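The idea of using hints in the surrounding text can be sketched as a simple overlap score. Again, this is a toy and not the Calais algorithm: the candidate list and the hint words are made up, and ties fall back to the first candidate listed.

```python
import re

# Hypothetical candidates with hand-picked context hints.
CANDIDATES = {
    "Paris": [
        {"name": "Paris, France", "hints": {"france", "seine", "louvre"}},
        {"name": "Paris, Texas", "hints": {"texas", "longhorns"}},
    ],
}


def disambiguate_place(name: str, text: str):
    """Pick the candidate whose hint words overlap most with the text."""
    candidates = CANDIDATES.get(name)
    if not candidates:
        return None
    words = set(re.findall(r"[a-z]+", text.lower()))
    best = max(candidates, key=lambda c: len(c["hints"] & words))
    return best["name"]
```

Geo coordinates are left out of the sketch; in the service they are returned alongside the resolved geography.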

Efficiency and Scalability

We’ve implemented a couple of changes to make life easier for our higher-volume users. First, you now have the option to tell Calais you do not want a copy of the original text returned to you. If your application doesn’t care about offsets of detected items in the text you might consider turning this option on to reduce your bandwidth utilization.

Second, Calais now supports HTTP traffic compression. Given that we’re dealing with text on the input and output sides of the transaction, this can dramatically reduce the size of your transaction, again reducing your bandwidth utilization.
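To see why compression pays off for text, you can measure it locally. The snippet below does not talk to Calais at all; it only uses Python's standard `gzip` module on a sample string, and the helper name `compression_ratio` is ours.

```python
import gzip


def compression_ratio(text: str) -> float:
    """Size of the gzip-compressed text divided by its original size."""
    raw = text.encode("utf-8")
    packed = gzip.compress(raw)
    return len(packed) / len(raw)


sample = "Calais extracts entities, facts and events from text. " * 200
ratio = compression_ratio(sample)
```

A repeated sentence like this compresses far better than real articles do, so treat the number as an upper bound on the savings and measure with your own content.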

New Output Formats and Integrations

Please take a look at the Release Notes for details on a number of small changes to the RDF, MicroFormats and Simple format outputs. We’ve also added a JSON output format that’s covered in more detail here.

Calais now also talks PopFly! Microsoft’s PopFly is an interesting mashup building platform with a visual development interface. You can now directly integrate Calais within your PopFly mashups. Our documentation for this capability is available here.

Getting Smarter

In keeping with prior releases, Calais is also getting smarter. We’ve added a number of new elements to the Calais vocabulary. These include PatentFiling, PatentIssuance, FDAPhase, PersonEmailAddress, PersonEmployment, new elements for PersonAttributes, and SecondaryIssuance. In addition to these elements, we have one particularly interesting one: PersonRelation. The PersonRelation entity extracts references to symmetric relationships between people in the areas of business, friends, academic, military service or politics. This is one you’ll have to play with to get an idea of – but here’s a simple example:

The text:

The two served together in combat, and McDonald said Odierno was an “absolute joy to work with”.

Would result in:

Person1:  Mark McDonald
Person2:  Ray Odierno
PersonRelationType: Military Service
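Because the relationship is symmetric, client code may want to treat (McDonald, Odierno) and (Odierno, McDonald) as the same fact. One way to model that on your side is sketched below; the class and field names are our own and do not reflect the actual Calais output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRelation:
    people: frozenset   # unordered pair, so the relation is symmetric
    relation_type: str


def make_relation(person1: str, person2: str, relation_type: str) -> PersonRelation:
    """Build a relation that compares equal regardless of argument order."""
    return PersonRelation(frozenset({person1, person2}), relation_type)


relation = make_relation("Mark McDonald", "Ray Odierno", "Military Service")
```

Since the dataclass is frozen and hashable, such relations can be stored in a set to remove duplicates found in different documents.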

That’s it for R3.1. Any questions, please feel free to post to the forums or drop us a note at questions@opencalais.com. I’ll be posting an update on what’s in the pipeline for R4 in the next few days – lots of interesting stuff is on the way.


Release 3 Technology Preview Now Available!

August 18th, 2008

Calais R3 Technology Preview Now Available

Release 3 of Calais is now available for testing. Given the enormous increase in the number of production users of Calais we have modified our release process to incorporate a Technology Preview of new releases to allow for testing and experimentation. Of course, the production Calais service remains up and fully functional during the R3 Technology Preview period.

The details on accessing the technology preview are located here.

This is a long post, so I’ll highlight the significant changes right here:

  • Many new entities and events
  • A REST interface to the Calais web service
  • Document level categorization into standard news categories
  • Exhaustive extraction
  • A variety of miscellaneous bug fixes
  • Higher performance

Some details on R3….

What’s in R3?
First, as with every release we are expanding and enhancing the universe of entities and relationships extracted by Calais. While the details are located in the R3 Forum – a few highlights:

  • New entities include Sports League, Programming Language, Operating System, Medical Treatment and Company Ticker
  • New events include Movies Releases, Album Releases and a variety of business related items such as Bonus Shares Issuances, Types of Business Relationship and others.

Second, after many requests we have implemented a REST interface to Calais. This should simplify access to the service from a variety of environments.

Third, a preview of our new document categorization capability. Categorization examines your text and attempts to place the document as a whole into one of a number of news related categories. This capability will be significantly expanded in the future – but will provide immediate benefit to anyone aggregating news content today. The initial categories supported are Business, Sports, Entertainment, Health, Politics and Technology.

Fourth, depending on what you’re using Calais for this could be a big deal. In R3 we’re releasing a Generic Relations capability. Generic Relations will expose all relationships in your document as long as one of the members of the relationship is a known entity type. Generic Relations is sometimes called Exhaustive Extraction – extracting all the relationships that involve at least one entity, even if the relationship type hasn’t been predefined. This capability is designed for semantic processing experts who know what they are doing. The volume of output can be quite large – but the ability to do in-depth information discovery is enormous.
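The selection criterion behind Generic Relations (keep a relationship if at least one member is a known entity) can be illustrated in a few lines. This is only a sketch of the criterion, not of the extraction itself: the triple representation, the `entity_types` mapping and the set of known types are all hypothetical.

```python
# Hypothetical subset of entity types the extractor recognizes.
KNOWN_ENTITY_TYPES = {"Person", "Company", "City"}


def generic_relations(triples, entity_types):
    """Keep (subject, verb, object) triples in which the subject or the
    object has been recognized as a known entity type."""
    kept = []
    for subject, verb, obj in triples:
        subject_known = entity_types.get(subject) in KNOWN_ENTITY_TYPES
        object_known = entity_types.get(obj) in KNOWN_ENTITY_TYPES
        if subject_known or object_known:
            kept.append((subject, verb, obj))
    return kept
```

Note that the verb is never checked against a predefined list, which is why the output volume can grow so quickly.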

And finally, we’ve done our best to solve any extraction related issues that have been reported to us. We can’t promise 100% - but you should see significant improvement.

What’s Coming?
Let’s limit ourselves to the very short term – things you can expect to see in the next month or less.

  • Company Disambiguation. This is a big deal and the first step toward richer entity disambiguation throughout Calais. With company disambiguation we will use everything from the name of the company to the names of people to the geographies mentioned to return a single authoritative name for the company. A simple example: “IBM”, “International Business Machines”, “IBM Professional Services” will all be detected as companies – and will all be linked back to a single definitive reference for “IBM”.
  • Geo Disambiguation. The same effort as applied to geographies. No longer will we be confused whether we are talking about Paris, TX or Paris, France.
  • A super secret skunkworks project. Just think of it as putting a semantic layer on top of the web. The whole web. Right now.

 

 


Press release: “NUI Galway and Tourist Republic to Make Travelling Tailor-made”

June 24th, 2008

http://www.nuigalway.ie/news/main_press.php?p_id=760



NUI Galway’s Digital Enterprise Research Institute (DERI) is to develop a new intelligent trip planner in collaboration with Irish start-up Tourist Republic Ltd. The internet tool, TripPlanr, will allow travellers to plan more complex trips than existing technology allows, such as combining multiple destinations on a fixed budget and timeline. The initiative costs €200,000 and has received support funding under Enterprise Ireland’s Innovation Partnership programme.

TripPlanr will be aimed at the more adventurous traveller who wants more than a weekend for two in one of Paris’s main hotels. The technology will combine Touristr.com’s traveller recommendations with information from airlines and accommodation providers, suggesting the most perfectly-attuned trip possible.

Jan Blanchard is CEO of Tourist Republic and sees huge benefits in the partnership: “We knew that to build the intelligent trip planner which we have in mind, we needed a team to rival the in-house expertise at Google or Yahoo! Through Enterprise Ireland we have this opportunity to bring our vision to reality with DERI, which is the largest Semantic Web research institute in the world”.

DERI’s specialised expertise in Information Mining, the Semantic Web and Web 2.0 applications will allow TripPlanr to filter data and make recommendations based on the preferences of the traveller and their social network. Building on Touristr.com’s existing destination review site, the new solution is expected to increase the probability of the traveller booking the targeted option suggested.

According to Dr. John Breslin, Project Leader with DERI at NUI Galway, and founder of the popular online forum boards.ie, “The pre-internet problem of information deficit has been replaced with the problem of information overload. We are faced with an overwhelming surfeit of similarly sounding destination descriptions and offers. We hope to make online trip planning much more personalised by enabling networked knowledge using the latest technologies developed here at DERI.”

The TripPlanr project has a skilled team in place to research and develop the application, and the project is currently recruiting for web developers to join this exciting work. TripPlanr is expected to be in beta testing by the end of the year.

-ends-


20:20 talk on hardware hacking for software people

May 19th, 2007

I just got back from XTech 2007 in Paris. It was an excellent conference this year and I'm really proud of having contributed in a small way by being on the programme committee. Every year the speaker lineup gets better and better.

The theme this year was 'The Ubiquitous Web'. HTTP isn't just for computers any more, and I'm particularly interested in how developers like me can learn to make their own network-connected objects in the real world. To spread the word, I gave a lightning talk on my experiences with the Arduino hardware hacking boards and other toys from tinker.it.

I put the slides on SlideShare.


Serendipity 2.0: going fulltime on Dopplr

April 27th, 2007

For the last couple of months I've been working on a new project in my spare time. Dopplr is a social network for frequent travellers, designed to increase the amount of serendipity in the world. It lets you share your travel plans with your trusted fellow travellers, and uses them to find the coincidences, near-misses and surprises. Maps, mobile, timelines, feeds, calendars: you can have the information pretty much any way you want it.

Dopplr's still invite only, but there's a good chance you know someone with an account by now. We'll be issuing new invite tokens from time to time, so keep an eye out. There are some screenshots on Flickr, and alpha travellers Stowe Boyd and Roo Reynolds have written some illuminating reviews. I'll be at XTech in Paris in May (don't forget, online registration closes soon) so track me down and I'll give you a demo.

I'm having a great time making something of my own and collaborating with people whose skills and opinions I trust and respect. I showed the alpha release around ETech and SXSW and got some great reactions. We started inviting people in to test the app, a few at a time, and their feedback has been very encouraging.

Because I'm having so much fun and I want Dopplr to be as good as it can possibly be, I've taken the decision to suspend my freelancing and work on it full time. It seems they'll let anyone be a CTO these days.

If you want to follow our day-to-day progress, I'm collecting dopplr-related links and coverage on del.icio.us.


semantic web is webby data

March 17th, 2007

I have often been puzzled why people write “The Semantic Web is AI” and “The Semantic Web is a top-down design” and “The Semantic Web is Ontologies”. As far as I’m concerned, all of those are bogus. I think I’ve worked out why they write this - they aren’t talking to anyone actually working directly on the technologies.

The semantic web is: a webby way to link data. That is all.

Everything beyond that is entirely optional fluff: data vs metadata, syntaxes, ontologies, query languages, rules, logic, …

This is my “lowercase” semantic web and the basis of what I have in running production code right now.

I’ll probably use that as my theme when I speak about A Little Semantics Can Go a Long Way on the panel at the Semantic Technology Conference in San Jose in May. (I’ll also be at WWW2007 in Banff, Canada and XTech 2007 in Paris.)


XTech 2007 in Paris: get your proposal in this weekend

December 14th, 2006

The call for proposals for XTech 2007 is closing this weekend. Last year's conference was superb, and if you've got anything to say about making the web then you'll definitely want to be part of next year's lineup.

The theme for this year's conference is "The Ubiquitous Web". As the web reaches further into our lives, we will consider the increasing ubiquity of connectivity, what it means for real world objects to connect to the web, and the increasing blurring of the lines between virtual worlds and our own.

The technologies underpinning these developments include mobile devices, RFID, ultra-wideband, Second Life, location-aware services, Google Earth and more. The issues surrounding them include privacy, intellectual property, activism, politics, regulation and standards.

A special mention for some of the talks I particularly liked last year:


Neutrality of the Net

May 2nd, 2006

Net Neutrality is an international issue. In some countries it is addressed better than others. (In France, for example, I understand that the layers are separated, and my colleague in Paris attributes getting 24Mb/s net, a phone with free international dialing and digital TV for 30 euros/month to the resulting competition.) In the US, there have been threats to the concept, and a wide discussion about what to do. That is why, though I have written and spoken on this many times, I blog about it now.

Twenty-seven years ago, the inventors of the Internet[1] designed an architecture[2] which was simple and general. Any computer could send a packet to any other computer. The network did not look inside packets. It is the cleanness of that design, and the strict independence of the layers, which allowed the Internet to grow and be useful. It allowed the hardware and transmission technology supporting the Internet to evolve through a thousandfold increase in speed, yet still run the same applications. It allowed new Internet applications to be introduced and to evolve independently.

When, seventeen years ago, I designed the Web, I did not have to ask anyone's permission [3]. The new application rolled out over the existing Internet without modifying it. I tried then, and many people still work very hard, to make the Web technology, in turn, a universal, neutral platform. It must not discriminate against particular hardware, software, underlying network, language, culture, disability, or against particular types of data.

Anyone can build a new application on the Web, without asking me, or Vint Cerf, or their ISP, or their cable company, or their operating system provider, or their government, or their hardware vendor.

It is of the utmost importance that, if I connect to the Internet, and you connect to the Internet, that we can then run any Internet application we want, without discrimination as to who we are or what we are doing. We pay for connection to the Net as though it were a cloud which magically delivers our packets. We may pay for a higher or a lower quality of service. We may pay for a service which has the characteristics of being good for video, or quality audio. But we each pay to connect to the Net, but no one can pay for exclusive access to me.

When I was a child, I was impressed by the fact that the installation fee for a telephone was everywhere the same in the UK, whether you lived in a city or on a mountain, just as the same stamp would get a letter to either place.

To actually design legislation which allows creative interconnections between different service providers, but ensures neutrality of the Net as a whole may be a difficult task. It is a very important one. The US should do it now, and, if it turns out to be the only way, be as draconian as to require financial isolation between IP providers and businesses in other layers.

The Internet is increasingly becoming the dominant medium binding us. The neutral communications medium is essential to our society. It is the basis of a fair competitive market economy. It is the basis of democracy, by which a community should decide what to do. It is the basis of science, by which humankind should decide what is true.

Let us protect the neutrality of the net.


  1. Vint Cerf, Bob Kahn and colleagues
  2. TCP and IP
  3. I did have to ask for port 80 for HTTP
