Archive

Archive for April, 2008

Nova Spivack: Understanding the Semantic Web

April 30th, 2008
Translation of Spivack's presentation

delicious

If Popularity is the Only Tool in Web Search, Every Result Looks Like a Nail!

April 30th, 2008
What is really appealing about the Semantic Web is that it may be able to put an end to mass culture.

delicious

Why I Migrated Over to Twine (And Other Social Services Bit the Dust)

April 29th, 2008
An introduction to Twine, the new toy I am trying out

delicious

10 semantic tools

April 29th, 2008

Slides from the SIOC tutorial at WWW2008

April 28th, 2008

Here are the PowerPoint slides from our tutorial on “Interlinking Online Communities and Enriching Social Software with the Semantic Web” at the World Wide Web Conference in Beijing - you can also download them from here:

The tutorial went well, it was hot in the room and we were a bit jetlagged, but we had some good feedback afterwards and about 30 people attended in all.

I had a nice few days in Beijing, participating in the W3C advisory committee meeting on Sunday, Monday and Tuesday, giving our SIOC tutorial with Alex and Uldis on Monday afternoon, popping along to our paper at the Linked Data on the Web workshop on Tuesday, and attending some sessions on Wednesday (Kai-Fu Lee’s plenary keynote on Cloud Computing, the discussion panel with Lada Adamic et al. on the Future of Online Social Interactions, the W3C Open Your Data! track, and a packed session on Social Networks: Discovery and Evolution of Communities). On Thursday, I gave a talk about DERI at Tsinghua University to Cemon Yang and his team at the Digital Government / Web and Software Research Centre. Thursday evening we had the banquet in the Great Hall of the People, and I headed back to Ireland on Friday.

Unfortunately I saw little of Beijing outside of travelling between venues in taxis and buses, so I have a good reason to return and see / do more next time…

English

Introduction To The Semantic Web - SlideShare

April 27th, 2008
An extensive presentation on the Semantic Web, RDF

delicious

Hardcover Wikipedia

April 27th, 2008

Bertelsmann has announced that in September it will sell, for the first time anywhere in the world, a print edition of the German-language version of the Wikipedia encyclopedia.

The print edition will contain 50,000 entries from the German Wikipedia (which has 700,000 in its online version). It will go on sale for 19.95 euros, of which one euro will go to the non-profit association that runs the German Wikipedia.

The articles included will be the ones most consulted online between 2007 and 2008, although the book will focus more on current affairs than a traditional print encyclopedia does.

What they have not said is how they will handle the fact that they will earn revenue (if they sell any copies at all) from the altruistic, unpaid work of thousands of editors around the world, myself included.

Nor would I pay a single euro for a print edition without automatic updates, which would end up as obsolete as the other encyclopedias they still insist on selling us. Is it Wikipedia or Bertelsmann that has failed to adapt to the times? Maybe we have all misunderstood, and it turns out that Web 3.0, the Semantic Web, now means thumbing our noses at the CC and GNU licenses and selling what others give away... on paper.

Source | El País


Spanish

Explaining the Semantic Web - Google Docs

April 27th, 2008
An excellent presentation on the topic; I will blog about it in Spanish

delicious

If Popularity is the Only Tool in Web Search, Every Result Looks Like a Nail!

April 24th, 2008

Popular votes determine the leaders of our societies as a fair practice of equality and human rights. Popular votes among expert physicians can produce better diagnostics of a medical condition. There are many other cases where popularity, as a method, works and functions well.

But, how well does popularity work for the Web search?

As the saying goes, “if the only tool is a hammer, everything looks like a nail.” In Web search, popularity ranking is the only perspective available out there, so every result looks like a nail! There is nothing else to compare it with.

A closer look, however, shows that popularity can sometimes fail miserably without the need for any comparison. Worse than that, it fails in a hidden way at a much higher rate than what we want to believe. I would like to talk about these issues in this post, and explain why we are doing what we are doing at hakia.

Let’s first look at the validity of the popular view. Below are some fun examples of how the popular view fails. Obviously, these have no impact on our lives other than making us look stupid. There are so many of them that whole books have been published on the subject, but let’s just list three.

    Contrary to popular belief:
    - Nowhere in the Bible is the fruit eaten by Adam and Eve referred to as an apple
    - Thomas Edison did not invent the light bulb
    - Seasons are not caused by Earth being closer to the sun in summer than in winter.

Fun, right? The next category is not fun at all. History is full of cases in medicine, science, technology, politics, etc., where popular belief produced deadly consequences.

    Contrary to popular belief (in history):
    - The more than 40,000 women executed as witches in medieval Europe were not witches.
    - The Titanic could sink despite its unsinkable reputation.
    - To imitate the Marlboro man was not cool at all.
    - Lead piping and asbestos were harmful building materials.
    - HIV could spread among heterosexuals.
    - There were no WMDs in Iraq to pose an immediate danger.

The lesson to be learned from all these examples is that the popular view can be wrong. While some of the misconceptions may be innocent, due to a lack of scientific data, most of them are byproducts of information manipulation for commercial or political benefit. A careful eye will catch millions of such manipulations still at work today.

Now, let’s switch back to Web search. If search results are ranked and organized by a popularity algorithm, which reflects people’s choices as to what is right and relevant, how can you trust this view for important queries in health, finance, law, business, etc.? Can some of these results be commercially biased, politically biased, or innocently incorrect? Are there better results that you don’t see? We call this the hidden failure.

This is where we separate ourselves from the current wave of popularity algorithms. The semantic algorithms at hakia are not based on collecting statistics on link referrals, click behavior, or any other similar measure. Our criterion is quality, which is defined as the combination of credibility, freshness, and relevance by meaning match. It is our vision that this new perspective will benefit the Web searcher by minimizing (if not eliminating) the extra burden of quality assurance. There will be no hidden failures by design.
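
The contrast can be made concrete with a small, purely illustrative Python sketch that ranks the same two results twice: once by a popularity signal (such as link or click counts) and once by a quality score that combines credibility, freshness, and relevance. The field names, the 0-to-1 scales, and the equal weights are assumptions made for illustration; hakia does not describe its formula here, so this is not its algorithm.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    popularity: int     # e.g. inbound links or clicks (hypothetical signal)
    credibility: float  # 0..1, hypothetical source-credibility score
    freshness: float    # 0..1, hypothetical recency score
    relevance: float    # 0..1, hypothetical meaning-match score


def quality(r: Result, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted combination of credibility, freshness and relevance (weights are assumed)."""
    w_c, w_f, w_r = weights
    return w_c * r.credibility + w_f * r.freshness + w_r * r.relevance


def rank_by_popularity(results):
    return sorted(results, key=lambda r: r.popularity, reverse=True)


def rank_by_quality(results):
    return sorted(results, key=quality, reverse=True)


results = [
    Result("popular-but-biased.example", popularity=9000,
           credibility=0.2, freshness=0.5, relevance=0.6),
    Result("credible-niche.example", popularity=40,
           credibility=0.9, freshness=0.8, relevance=0.9),
]

print([r.url for r in rank_by_popularity(results)])
print([r.url for r in rank_by_quality(results)])
```

With these made-up numbers the two orderings are reversed: the heavily linked page wins on popularity, while the credible, fresher, better-matching page wins on the quality score.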

Note that we haven’t even touched the subject of the long tail, which is an inherent technical limitation of popularity algorithms. Long-tail limitations directly contribute to the hidden failure.

For the Web searcher, the only tool is no longer just popularity, and we are working hard to bring quality search as an alternative perspective. Until then, keep challenging hakia BETA to benefit from the ongoing progress and to give us feedback.

English

WWW2008 Beijing: Dr. Kai-Fu Lee (Google) - “Cloud Computing”

April 23rd, 2008

Kai-Fu Lee is Vice President of Engineering at Google, and President of Google Greater China. He joined Google in 2005. Earlier in his career, he developed the first speaker-independent continuous speech recognition system, for which he won a Business Week award in 1988.

He started by talking about the “people theme”, saying that this is what the (Chinese) Internet is all about. (For April Fool’s Day, Google China announced that they were going to shut down their servers to save electricity, and that they would have to hire 25 million people to do their searches for them. They got 1,800 resumes for the positions.)

There are 235 million people on the Internet in China. What do these people want? Kai-Fu listed these things: accessibility, shareability, freedom (data wherever they are), simplicity, and security. Google believes that cloud computing solves a lot of these problems. It’s not new, so Google are just a part of it like we all are. But day by day, cloud computing is changing the way we use the Internet.

He then explained a little bit about what the Cloud is. Data is stored in the Cloud, on some server somewhere that is not necessarily known by the user, but it’s just there and accessible. Software and services are also moving to the Cloud, usually accessible via a full-featured web browser on the client device. He also advocated the use of open standards and protocols, which he says are “liked” by Google (e.g. Linux, AJAX, LAMP, etc.) so as to avoid control by one company. Finally, the Cloud should be accessible from any device, especially from phones. He said that when the Apple iPhone hit the market, they found that web usage from that device was 50 times greater than that from other web-capable phones, and that Google’s servers really felt it.

Next up was a history lesson on cloud computing. The PC era was hardware centric. Then, the client-server era was more software centric, which was great for enterprise computing. Cloud computing now abstracts that server and makes it very scalable, by hiding complexities, and with the server being anywhere. This is service centric.

Banks too have become “Clouds”, allowing people to go to any ATM and withdraw money from their bank wherever they are. Electricity can be thought of similarly, as it can come from various places, and you don’t have to know where it comes from: it just works.

Driving forces behind cloud-based computing include: (i) the falling cost of storage, (ii) ubiquitous broadband, and (iii) the democratisation of the tools of production. This is beginning to make cloud-based computing more like a utility. A lot of this goes back to work by IBM and DEC in the 1990s, when they realised that computing should be a utility. It is only now that these three key things are in place that this is becoming a reality.

There are six further properties that make this area exciting, being: (1) user centric, (2) task centric, (3) powerful, (4) accessible, (5) intelligent, (6) programmable.

(1) User centric. The data moves with you, and the application moves with you. People don’t want to reload their address book or applications on new machines, as it is painful to do. For example, how bad do you feel if you drop or break your laptop? How easy is it to switch your cellphone? It’s hard, because synchronising your data is usually hard to do. The infrared (IR) functionality on a mobile phone is neither easy to use nor user centric: how often do people use it to back up stuff to their laptops?

If data is all stored in the Cloud - images, messages, whatever - once you’re connected to the Cloud, any new PC or mobile device that can access your data becomes yours. Not only is the data yours, but you can share it with others (e.g. on Picasa Web, your photos are stored in the Cloud). You don’t have to worry about where it is. We’re not there just yet, but the time is approaching where the way we deal with photographs will change. Another example is GMail, as you can use it on any device (since large storage is not required on the device). Kai-Fu bets that everyone in the room has some kind of cloud computing-based e-mail.

PCs are normally our window to the world, but mobile devices can do more. Since services know who you are and where you are (eek!), they can give you more targeted content. There are 600 million cellphone users in China, three billion worldwide, dwarfing the number of PCs that are Internet-accessible. Intelligent mobile search is useful for cellphones, giving you local listings and results relevant to your context. The most powerful and popular application is maps, especially when people get lost, or if they spontaneously want to go somewhere. Maps are more than the traditional flat piece of paper, allowing you to search nearby, see real-time traffic flows, etc. Such mashups provide even more power; calling these integrations a map is a misnomer, as the capabilities are enormous. As there’s a move from e-mail usage towards maps and photos, these new applications have to go into the Cloud as well. And with the shift in this direction, another question is how do you make this economically viable?

Instant information sharing is also important, e.g. via Google Docs, Page Creator, etc. Recently, Google Sites was released - Google hosts it all for you, so there’s no need for you to buy servers or hosting - 50,000 sites were set up in the first few hours after it began. Not only can you access the data, but you can create it anywhere. The browser is the platform.

(2) Task centric. The applications of the past - spreadsheets, e-mail, calendar - are becoming modules, and can be composed and laid out in a task-specific manner. For example, a task may be teachers creating a departmental curriculum, where you can see the people viewing the curriculum spreadsheet and they can have debates in parallel in real time. Spreadsheet editing allows collaboration and publishing to a selected group of people, with version control.

Google considers communication to be a task, such that in GMail you see pop-up chats and chat histories which provide zero-latency discussions combined in communications tasks. If you want, you can have real-time discussions instead of waiting for e-mail responses if people are online in the contacts list. You can also organise all of your common tasks, e.g. using iGoogle’s widgets portal.

(3) Powerful. Having lots of computers in the Cloud means that it can do things that your PC cannot do. For example, Google Search is faster than searching in Windows or Outlook or Word. Of course, Google Search has to be much faster, even though there are many more documents. In terms of how much storage is required, if there are 100 billion pages at 10 kB per page, that’s about 1000 TB of disk space. Cloud computing should have an infinite amount of disks / computation at its disposal. When you issue a query to the Google web search engine, it queries at least 1000 machines (potentially accessing 1000s of terabytes).
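
As a quick check on the storage figure, the arithmetic can be written out in a few lines of Python. It assumes decimal units (1 kB = 1,000 bytes, 1 TB = 10^12 bytes) and counts raw page bytes only, before any replication.

```python
# Back-of-envelope storage estimate using the figures quoted in the talk.
PAGES = 100 * 10**9          # 100 billion pages
BYTES_PER_PAGE = 10 * 10**3  # 10 kB per page (decimal kilobytes assumed)

total_bytes = PAGES * BYTES_PER_PAGE
total_tb = total_bytes / 10**12  # decimal terabytes

print(f"{total_bytes:.2e} bytes = {total_tb:,.0f} TB = {total_tb / 1000:.0f} PB")
# prints: 1.00e+15 bytes = 1,000 TB = 1 PB
```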

(4) Accessible. Universal search (“searchology”) was announced by Google last year. Traditional web page search does IR / TF-IDF / PageRank stuff pretty well on the Web at large, but if you want to do a specific type of search, for restaurants, images, etc., web search isn’t necessarily the best option. It’s difficult for most people to get to the right vertical search page in the first place, since they usually can’t remember where to go. Universal search is basically a single search that will access all of these vertical searches.

This search requires simultaneously querying and searching over all the specific databases: news, images, videos, tens of such sources today, with potentially hundreds and thousands of them in the future. There are lots of these simultaneous searches which then get ranked, so it is even more computing intensive than current web search.
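
The fan-out-and-merge pattern described above can be sketched minimally in Python: every vertical is queried concurrently, and the hits are merged into one ranked list. The vertical functions and their scores are hypothetical stand-ins. The sketch also assumes that scores from different verticals are directly comparable, which a real system would have to work hard to achieve.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


# Hypothetical vertical back ends; each returns (score, title) pairs.
def search_news(query):
    return [(0.72, f"news: {query} today")]


def search_images(query):
    return [(0.91, f"image: {query}"), (0.40, f"image: {query} (old)")]


def search_video(query):
    return [(0.65, f"video: {query} explained")]


VERTICALS = [search_news, search_images, search_video]


def universal_search(query, k=3):
    """Query every vertical concurrently, then merge into one ranked list."""
    with ThreadPoolExecutor(max_workers=len(VERTICALS)) as pool:
        per_vertical = list(pool.map(lambda search: search(query), VERTICALS))
    # Flatten all hits and keep the k highest-scoring ones across verticals.
    hits = (hit for vertical_hits in per_vertical for hit in vertical_hits)
    return heapq.nlargest(k, hits)


for score, title in universal_search("beijing"):
    print(f"{score:.2f}  {title}")
```

Because every extra vertical adds another query per search, the cost grows with the number of sources, which is the talk's point about universal search being more computing intensive than plain web search.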

(5) Intelligent. Data mining and massive data analysis are required to give some intelligence to the masses of data available (massive data storage + massive data analysis = Google Intelligence).

In their machine translation work for translate.google.com, a trillion words were collected from bilingual and monolingual text, and they wanted to find not only the various orders of words but also the mappings of words. Statistical models of translation were trained, and they saw how an English-Chinese pair could be aligned. Then, they needed to extract phrases and collect statistics (e.g. how often variations of a certain translation were being used, such as variations for latest / last / newest / most recent). As more training data is added, the quality improves. Context is also an important matter for consideration, and it provides an advantage for the phrase analysis part of Google’s translators. There are estimates that their translator is equivalent to a high-school student’s level of translation quality.
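
The “collect statistics” step can be illustrated with the textbook relative-frequency estimate from phrase-based statistical translation, p(e | f) = count(f, e) / count(f). This is a hedged sketch with made-up counts. The romanised source phrase `zuixin` (“newest”) and all the numbers are hypothetical, and the talk does not say how Google's models are actually estimated.

```python
from collections import Counter, defaultdict

# Hypothetical phrase-pair counts harvested from aligned text:
# (source phrase, English phrase) -> number of times seen together.
pair_counts = Counter({
    ("zuixin", "latest"): 50,
    ("zuixin", "newest"): 30,
    ("zuixin", "most recent"): 15,
    ("zuixin", "last"): 5,
})


def translation_table(pairs):
    """Relative-frequency estimate: p(e | f) = count(f, e) / count(f)."""
    source_totals = Counter()
    for (f, _), n in pairs.items():
        source_totals[f] += n
    table = defaultdict(dict)
    for (f, e), n in pairs.items():
        table[f][e] = n / source_totals[f]
    return table


table = translation_table(pair_counts)
for english, p in sorted(table["zuixin"].items(), key=lambda kv: -kv[1]):
    print(f"p({english!r} | zuixin) = {p:.2f}")
```

Adding more aligned text simply adds to the counts, which is one way to see why quality improves as training data grows.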

Lots of data can be processed by machine analysis to generate intelligence. But this needs to be combined with humans - via their collaboration and contributions - to change a mess / mass of photos or data or whatever into a very powerful combination. People and tools together can create intelligent knowledge. Applications like Google Earth are much more useful when people can contribute to them, e.g. by National Geographic sticking loads of high-res photos into it. Reviews, 3-D buildings, etc. can turn a tool from a bunch of pictures into something special. Creativity adds connections to data-centric applications, enabling intelligent combinations of content.

With all this data comes the issue of server costs. If you are trying to choose between buying $42,000 high-end servers or cheap PC-class servers for $2,500 each, you can get 33 times cost efficiency by going for the PC-class servers. You can get a 1000 CPU PC-class cluster for the same price as a high-end 64 CPU server, with possibly 30 times the performance (figures may be out of date).

Even though there is a lower cost, there still needs to be high reliability. Google search is mainly based on low-cost commodity PCs running Linux. Failures are expected in every system every day. If we assume that there are 20,000 machines, there’s typically a failure rate of 110 per day. Google has built a custom software layer that can tolerate failure. (They have also deployed a new data centre in just three days.)
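
Assuming failures are spread evenly across the fleet, the quoted figures translate into per-machine numbers as follows. This is a back-of-envelope Python calculation, not data from Google.

```python
MACHINES = 20_000
FAILURES_PER_DAY = 110

# Fraction of the fleet failing on a given day.
daily_rate = FAILURES_PER_DAY / MACHINES
# Rough mean time between failures for one machine, in days.
days_between_failures = MACHINES / FAILURES_PER_DAY
# Expected failures per machine per year.
yearly_per_machine = FAILURES_PER_DAY * 365 / MACHINES

print(f"{daily_rate:.2%} of machines per day")
print(f"~{days_between_failures:.0f} days between failures per machine")
print(f"~{yearly_per_machine:.1f} failures per machine per year")
```

Any one machine fails only every six months or so, but with 20,000 of them something breaks several times an hour. That is why the failure handling has to live in software.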

(6) Programmable. This follows on from the previous description of data requirements. How does one program for 10,000 “flaky servers” in a Google farm? There needs to be: (i) fault tolerance, (ii) distributed shared memory (if storing every web page in yahoo.com, no one machine can store that, so multiples are required), and (iii) new programming paradigms required for storing stuff.

For (i) fault tolerance, Google uses GFS (the Google File System), a distributed disk storage system. Every piece of data is replicated three times. If one machine dies, a master redistributes the data to a new server. There are around 200 clusters (some with over 5 PB of disk space on 500 machines).
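
The replicate-and-recover idea can be sketched as a toy Python “master” that records which servers hold each chunk and restores three copies when a server dies. This is a simplified illustration under stated assumptions (random placement, no actual data transfer, one failure handled at a time). It is not the real GFS design or API, and the class and method names are hypothetical.

```python
import random

REPLICATION = 3  # copies kept of every chunk, as quoted in the talk


class ToyMaster:
    """Hypothetical master: tracks chunk locations and re-replicates on failure."""

    def __init__(self, servers, seed=0):
        self.live = set(servers)
        self.locations = {}             # chunk id -> set of servers holding a copy
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def store(self, chunk):
        # Place REPLICATION copies on distinct live servers.
        self.locations[chunk] = set(self.rng.sample(sorted(self.live), REPLICATION))

    def server_died(self, server):
        self.live.discard(server)
        for chunk, holders in self.locations.items():
            if server not in holders:
                continue
            holders.discard(server)
            if not holders:
                raise RuntimeError(f"all replicas of {chunk} lost")
            # Copy from a surviving replica onto a live server that lacks one.
            candidates = sorted(self.live - holders)
            if not candidates:
                raise RuntimeError("not enough live servers to restore replication")
            holders.add(self.rng.choice(candidates))


master = ToyMaster([f"server{i}" for i in range(6)])
for chunk in ("chunk-a", "chunk-b", "chunk-c"):
    master.store(chunk)
master.server_died("server0")
print({c: sorted(h) for c, h in master.locations.items()})
```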

Bigtable is used for (ii) distributed memory. The largest cells in Bigtable are 700 TB, spread over 2000 machines.
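
The talk gives only sizes, but Bigtable's published data model is a sparse, sorted map indexed by row key, column key, and timestamp. Here is a toy, single-machine Python sketch of that model. The names are hypothetical, and distribution across machines is left out entirely.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical single-machine toy of the (row, column, timestamp) -> value model."""

    def __init__(self):
        self.rows = {}      # row key -> {column key -> {timestamp: value}}
        self.row_keys = []  # row keys kept sorted, so range scans are cheap

    def put(self, row, column, timestamp, value):
        if row not in self.rows:
            self.rows[row] = {}
            insort(self.row_keys, row)
        self.rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, end):
        """Return row keys in [start, end) in sorted order."""
        lo = bisect_left(self.row_keys, start)
        hi = bisect_left(self.row_keys, end)
        return self.row_keys[lo:hi]


t = ToyTable()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
t.put("org.w3.www", "contents:", 2, "<html>w3c")
print(t.get("com.cnn.www", "contents:"))  # <html>v5
print(t.scan("com.", "com.~"))            # ['com.cnn.www']
```

Keeping rows sorted by key is what lets related rows (here, pages from the same reversed domain) be scanned together. The real system splits that sorted key space into ranges across many machines.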

MapReduce is the solution for (iii) new programming paradigms. It cuts a trillion records into a thousand parts on a thousand machines. Each machine will then load a billion records and will run the same program over these records, and then the results are recombined. While in 2005, there were some 72,000 jobs being run on MapReduce, in 2007, there were two million jobs (use seems to be increasing exponentially). Not everything is suitable for MapReduce, e.g. parallelising SVMs. Matrix operations can’t be split and re-glued together easily. For this, they use Incomplete Cholesky Factorisation.
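
The split / run-same-program / recombine flow described above is the MapReduce pattern. Below is a single-process Python simulation of it using the classic word-count example. It is a teaching sketch, not Google's MapReduce library, and the “machines” are just list slices processed sequentially.

```python
from collections import defaultdict
from itertools import chain


def map_fn(record):
    """Map: emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records, n_parts, map_fn, reduce_fn):
    # 1. Split the input into n_parts chunks (one per simulated machine).
    chunks = [records[i::n_parts] for i in range(n_parts)]
    # 2. Run the same map program over every record of every chunk.
    mapped = chain.from_iterable(map_fn(r) for chunk in chunks for r in chunk)
    # 3. Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # 4. Reduce each group and recombine the results.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


docs = ["the cloud", "embrace the cloud", "the web"]
print(map_reduce(docs, n_parts=3, map_fn=map_fn, reduce_fn=reduce_fn))
# prints: {'cloud': 2, 'embrace': 1, 'the': 3, 'web': 1}
```

Word counting fits because each record can be mapped independently and the per-key sums can be recombined in any order. Matrix-heavy work such as SVM training does not decompose this cleanly, which is the limitation the talk mentions.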

Cloud computing needs new skills, especially when working with tens of thousands of machines as opposed to just one. The Academic Cloud Computing Initiative in the US and China (at Tsinghua) was launched by Google and IBM. Cloud computing is not just for web-based problems, but it can help provide solutions for scientific problems that were previously very hard to solve.

In terms of benefits, everything should just work, changing the way we work and play. IT should become “simple and safe”, by outsourcing IT to a “trusted shop” via a browser. Entrepreneurs should have new opportunities with this paradigm shift, being freed from monopoly-dominated markets as more cloud-based companies evolve that are powered by open technologies. Governments should leverage such “innovation-enabling platforms”, where people can effectively program tens of thousands of machines themselves. With $540 million of venture capital infused into China last year, Kai-Fu sees cloud-based computing as being a catalyst of economic growth. He finished up saying that cloud computing has arrived. “Embrace the Cloud!”

There was one question from the audience. The questioner said that Kai-Fu made cloud computing sound simple (i.e., it was well explained, not that the technologies or efforts were trivial). He asked what the societal change is, rather than the technological change: assuming we have cloud-based computing, how can we start to encourage “cloud thinking” within society? The questioner works with universities looking at open access, trying to encourage people to share their intellectual outputs, but believes it is difficult to persuade knowledge workers to move their work into the Cloud. His question was, what can we do to encourage cloud thinking and “cloud knowledge”?

Kai-Fu’s answer was firstly that cloud computing is not simple, rather it is incredibly complex, but we can learn from what has happened so far. There have been efforts to categorise world knowledge, e.g. Cycorp, which Kai-Fu said has not resulted in a success yet (however, I’ll note here that they are becoming part of the Linked Data initiative: as Kingsley Idehen said yesterday, “Yoda is awake”!). There has been some success in various question and answering systems with pieces of knowledge that can be mined and found. He stated that these were the two extremes, but believes that the answer lies somewhere in the middle: some organisation, but not too much. Wikipedia is a step in this direction, so he suggested bringing the question and answering approach and the Wikipedia approach closer together.

He said that two things would be required. Firstly, he saw the need for some kind of translation capability. There is so much knowledge in English, which spoils native English speakers. In China, people are also spoiled. However, for many other countries, there is very little local language content. If auto translation doesn’t work well, some kind of assisted translation is required. Secondly, there should be mobile endeavours to make knowledge available. There may also need to be some economic incentive for people to create and share content via their mobiles.

(More reviews at 1, 2 and 3.)

English

From Web 2.0 to Web 3.0

April 22nd, 2008

A year ago it was not clear whether Web 2.0 would really be 2.0 or just another bubble of the Internet era. What’s more, there was still little talk of Web 3.0. But no: for many months now we have been watching very interesting Web 2.0 applications emerge, some of them truly indispensable, which little by little are achieving the long-pursued dream of not needing desktop applications.

The next step, the Semantic Web, should let us find relevant information much faster than with any of the earlier systems, ranking and tagging the Web so that it becomes ever easier to find relevant information online, instead of trawling through browsers to discover something that resembles what we need.

In addition, we should see Web 3.0 as a permanently connected, collaborative environment, with applications that let us locate and stay in touch with our friends and collaborators, making the Web a close and accessible environment.

Social networks are increasingly moving towards application integration thanks to new standards such as RDF. Nor should we lose track of prototypes that will soon serve as the basis for new standards, applications such as Twine. And we should keep following the development of the noovo project.

The New Web is coming!


Spanish

WWW2008, a pity

April 21st, 2008

This week Beijing (China) hosts the WWW2008 Conference, probably the most important conference about the Web. It’s a pity not to be there, because there are many interesting things:

Well, next year is closer, so we’ll try to be there.

English, Spanish

hakia.com is a Webware 100 winner!

April 21st, 2008
We are happy to announce that hakia.com received a 2008 Webware 100 award for “Search and Reference” by Webware, a CNET site. We thank you for your support and votes!

The 2008 Webware 100 awards recognize the best Web 2.0 sites, services and applications. The Web 2.0 user community cast nearly two million votes in an online poll which ultimately selected the winners. Finalists for the 2008 Webware 100 Awards were selected by the editors of Webware.

We are proud to be the only New York-based search company and the sole flag bearer of semantic search in this short list. Bear in mind that we are continuously working on our technology to make the hakia products better.

With this award, an important question may have been answered: Is “semantic search” a reality now?

Webware 100 voters think so. We say “it is only the beginning.”

English

Proposed W3C Activity for Video on the Web

April 20th, 2008

W3C organized a workshop on Video on the Web in December 2007 in order to share current experiences and examine the technologies (see report). Online video content and demand are increasing rapidly, and video is becoming omnipresent on the Web; the trend will continue for at least a few years. These rapid changes are posing challenges to the underlying technologies and standards that support the platform-independent creation, authoring, encoding/decoding, and description of video. To ensure the success of video as a “first class citizen” of the Web, the community needs to build a solid architectural foundation that enables people to create, navigate, search, and distribute video, and to manage digital rights.

The general scope of the proposed Video on the Web activity is to provide cohesion among the video-related activities of W3C, as well as helping other W3C Groups in their efforts to provide video functionality. In addition, this activity will focus on implementing the next steps from the W3C workshop on Video on the Web. The proposal is to create 3 new Working Groups around Video on the Web. Please have a look at the following documents:

  1. Activity proposal
  2. Media Fragments Working Group Charter
  3. Media Best Practices and Guidelines Working Group Charter
  4. Media Annotations Working Group Charter

We welcome general feedback, general expressions of interest (or lack of!) and comments on the discussion list public-video-comments@w3.org.

Philippe Le Hégaret will be presenting the activity proposal during the Web Conference this week, on Thursday afternoon.

English

Really cool SIOC widget from Sindice (for WordPress)

April 20th, 2008

I’ve installed the new Sindice SIOC widget, produced by Adam, Fabio and Giovanni from the Sindice team.

As you can see, if you look at the post author or click into any comments list, each user now has a speech bubble beside the username. Clicking on this bubble will show you posts, comments and topics created by that user across the “SIOC-o-sphere”.

You can also click on any arrow icon beside a link in a blog post to see where else it has been referenced, like this one.

There is a Sindice SIOC API available which serves as a gateway to SIOC data via the Sindice discovery and search services, enabling the verification of the presence of a user or a link on the SIOC-o-sphere as indexed within Sindice.

English

hakia.com Hosts the NY Semantic Web Meetup

April 18th, 2008

Last night we hosted the April meeting of the NY Semantic Web Meetup. The presenters were Richard Cyganiak from DERI and Dr. Christian Hempelmann, our Chief Scientific Officer.


Richard gave an introduction to the future of open linked data on the World Wide Web. Christian presented the “Search for Meaning” the hakia way and gave a quick intro to our OntoSem technology and what its license package includes. The Semantic Web vs. Semantic Search discussions were heated and lively. I expected nothing less.


As the pictures show, our office was packed. Some people wanted to see the tomatoes we grow on our 33rd-floor terrace, but unfortunately it is not the right season.

Thank you all for coming! We would also like to thank Marco Neuman, the organizer of this dynamic Meetup for building the “semantic community” in New York.

English

Emergent semantics: basic principles for handling knowledge in Web communities

April 17th, 2008
Emergent semantics: basic principles for handling knowledge in Web communities. These past few days I have been researching several collective intelligence problems related to our project, and I remembered some principles of emergent semantics set out in the article Emergent Semantics Principles and Issues: Princ...

Spanish

Semantic Technology Happy Hour, Round 1

April 16th, 2008

Powerset, Metaweb, and Radar Networks share many similarities: we’re semantic technology companies, we’re building bleeding edge products, we employ a small army of PhDs, and our offices are within a couple blocks of each other in the SOMA district of San Francisco. With such geographic and conceptual proximity, Powerset’s co-founder and CTO Barney Pell suggested that we start referring to the neighborhood as SEMA, or the Semantic Technology Area.

For an inaugural SEMA event, Powerset decided to host the first Semantic Technology Happy Hour last night. In addition to sizable contingents from the core Trinity, this Happy Hour included guest representatives from BooRah and TrueKnowledge, who will both be on a panel with Powerset at next week’s AltSearchEngines Get Together. A few members of the press also were present. In particular, Dan Farber wrote a great article on his blog about the event.

Semantic Web Happy Hour

Overall, the format was casual and focused on building personal relationships among the represented companies. As you can imagine in a group with a high median IQ, conversations tended to be spectacularly geeky. TrueKnowledge gave us a peek under the hood of its beta and Powerset demonstrated our integration to Freebase. At one point, a group stood in front of the computer and threw out queries to see all of the different Freebase types that Powerset can handle. After being lubricated by a drink or two, inter-company gaming commenced, featuring pool, foosball, and ping pong. A few of the hardcore folks from Powerset and Metaweb ended up at the Hotel Utah for after hours merrymaking.

In the spirit of building the semantic technology community, there will likely be a follow-up event in the next few months. If you’re a semantic technology company, especially in SEMA, contact me (mark AT powerset.com) if you’d like to be included in the next iteration of the Semantic Technology Happy Hour.

English

The Good, the Bad, and the Ugly side of Web Search

April 16th, 2008

Our recent BETA update stirred up interesting conversation in the blogosphere, along with questions and scepticism. We welcome all comments and take the feedback from the users to improve our search engine. I want to take this conversation one step further.

When you search for any topic on the Web, no matter what it is, you will find yourself caught between competing views. Opposing camps will write good and bad things about it, and some of them will have a commercial interest.

When the good and bad are clear, there is no problem. For example, burning coal for energy is good because it is cheap, but it is also bad because it pollutes the environment. The distinction is as clear as it gets.

When the good and bad are not clear, there is uncertainty, and that is the ugly. What makes uncertainty ugly is the commercial motivation to interpret it one way or another. In medicine, law, finance, business, politics, and many other fields, uncertainty, “the ugly”, has been the fertilizer for growing cash and personal gain by misleading people. Some successes come from exploiting the ugly in a very unsettling way.

If a corporation, say X, manufactures vaccines and financially sponsors some independent organizations, say Y, and you read about the benefits of a certain vaccine on Y’s website, where does this leave you as the consumer? Does it put you in the ugly? There are tons of Xs and Ys in the world involved in medications, treatments, therapies, and whatnot.

Unlike in the movie, the ugly in Web search is not just “bad looks”. It may cause pain, both mental and physical. Forming opinions and making decisions based on misleading information can have serious consequences. To avoid this, you have to spend a lot of time assessing the legitimacy of the information you find.

As a search engine user, you shouldn’t have to worry about the ugly. You shouldn’t have to trace relationships to assess the legitimacy of information. This is one of the areas where hakia differentiates itself from the others. Since a week ago, we have been displaying results from credible sources recommended by expert librarians in the top positions. Those are the ugly-free results.

Users can hover the mouse over the corner of a search result to see the following details:


We display the full name of the organization, the librarians who vetted the source, and the date this search result was captured. We call it the “quality stamp”. If we display a source like Wikipedia, which is not as credible as the others, we show its nature: “User Generated Content”.
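To make the “quality stamp” concrete, here is a minimal Python sketch of the fields it carries as described above: the organization’s name, the vetting librarians, the capture date, and a user-generated-content flag. The class and field names are assumptions made for illustration; this is not hakia’s actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QualityStamp:
    """Hypothetical record of the details shown when hovering over a result."""
    organization: str             # full name of the source organization
    librarians: tuple[str, ...]   # librarians who vetted the source
    captured_on: date             # date the search result was captured
    user_generated: bool = False  # True for sources such as Wikipedia

    def label(self) -> str:
        """Render the stamp as one line of tooltip text."""
        if self.user_generated:
            # Less credible sources show their nature instead of a vetting line.
            return f"{self.organization}: User Generated Content"
        vetted_by = ", ".join(self.librarians)
        return (f"{self.organization} | vetted by {vetted_by} | "
                f"captured {self.captured_on.isoformat()}")


# Illustrative placeholder values, not real hakia data.
stamp = QualityStamp("Example Medical Society", ("Librarian A",), date(2008, 4, 16))
print(stamp.label())
```

A Wikipedia-style entry would be created with `user_generated=True`, so its tooltip states its nature rather than listing vetting librarians.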


Our objective is to be the first general-purpose search engine whose results are ranked by quality rather than popularity. As a one-stop destination site, we want to offer users credible and fresh results in every vertical. This is possible thanks to our scientists, who build semantic algorithms and resources in the back room.

Semantic search technology enables accurate retrieval of information by matching concepts and meaning. It is very effective for credible and dynamic content, and perhaps the only method that works for it. Popularity algorithms cannot work effectively beyond common queries, because most credible and dynamic content is statistically flat (infertile): it lacks the popularity signals those algorithms depend on. That is how semantics relates to quality: it is the enabling force.
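As a toy illustration of ranking by concept match and quality rather than popularity, the sketch below maps words to concepts with a small hypothetical lexicon, keeps only results that share a concept with the query, and puts librarian-vetted sources first. The lexicon, the `Result` fields and the scoring rule are all assumptions; the post does not describe hakia’s actual algorithm, and this is not it.

```python
import re
from dataclasses import dataclass

# Hypothetical concept lexicon: different surface words map to one meaning.
CONCEPTS = {
    "doctor": "PHYSICIAN", "physician": "PHYSICIAN",
    "vaccine": "VACCINE", "vaccination": "VACCINE", "shot": "VACCINE",
    "flu": "INFLUENZA", "influenza": "INFLUENZA",
}


@dataclass
class Result:
    url: str
    text: str
    vetted: bool     # recommended by librarians (assumed flag)
    popularity: int  # e.g. inbound links; deliberately unused in ranking


def concepts(text: str) -> set[str]:
    """Reduce text to the set of concepts it mentions."""
    words = re.findall(r"[a-z]+", text.lower())
    return {CONCEPTS[w] for w in words if w in CONCEPTS}


def rank(query: str, results: list[Result]) -> list[Result]:
    """Keep concept matches; order vetted first, then by concept overlap."""
    q = concepts(query)
    matching = [r for r in results if q & concepts(r.text)]
    # False sorts before True, so vetted results come first.
    return sorted(matching, key=lambda r: (not r.vetted, -len(q & concepts(r.text))))
```

With this sketch, a query for “flu vaccine” matches a page that only says “influenza vaccination”, and a vetted page outranks a heavily linked one because `popularity` never enters the sort key.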

There are more innovations coming your way from hakia. That’s the good thing. They are taking quite a lot of effort and time. That’s the bad thing. And the ugly? We’ve decided to stand up to it.

And Eli Wallach says: Those who are making comments on hakia’s QDEXing ought to read this page before writing a blog post. He means it!

English

Delivery Context Ontology

April 16th, 2008

W3C’s Ubiquitous Web Applications Working Group has published a Working Draft of the Delivery Context Ontology. The Delivery Context Ontology provides a formal model of the characteristics of the environment in which devices interact with the Web or other services. The delivery context is an important source of information that can be used to adapt materials so that they are usable on a wide range of devices with different capabilities. The delivery context includes, among other things, the characteristics of the device, the software used to access the service, and the network providing the connection. This document describes the ontology (using OWL) and gives details of each property it contains.
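The ontology itself is written in OWL, but the idea of adapting materials to the delivery context can be sketched in plain Python. In this hypothetical example, three properties stand in for the device, software and network characteristics, and the richest content variant the context can handle is chosen. The property names are illustrative assumptions, not terms from the Working Draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryContext:
    """Hypothetical, simplified stand-in for delivery context properties."""
    screen_width_px: int   # characteristic of the device
    supports_images: bool  # characteristic of the browser software
    bandwidth_kbps: int    # characteristic of the network connection


@dataclass(frozen=True)
class Variant:
    """One version of the material and what it requires."""
    name: str
    min_width_px: int
    needs_images: bool
    min_bandwidth_kbps: int


def choose_variant(ctx: DeliveryContext, variants: list[Variant]) -> Variant:
    """Return the first (richest) variant the context satisfies.

    Variants are assumed to be ordered from richest to most basic; the
    last one is returned as a fallback if none fits.
    """
    for v in variants:
        if (ctx.screen_width_px >= v.min_width_px
                and (ctx.supports_images or not v.needs_images)
                and ctx.bandwidth_kbps >= v.min_bandwidth_kbps):
            return v
    return variants[-1]


VARIANTS = [
    Variant("desktop", 1024, True, 512),
    Variant("mobile", 240, True, 64),
    Variant("text-only", 0, False, 0),
]
```

In a real system the context values would come from the device description rather than being hard-coded, but the selection step would look much the same.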

English

Google AppEngine for Personal Web Presence?

April 14th, 2008

Some thinking aloud...

I've browsed through the Google App Engine gallery, and the applications you can find there at the moment are pretty much what you'd expect: lots of Web 2.0 "share this, share that" sites. That's expected for two reasons. First, they're the kind of simple application you'd build whilst exploring any new environment. Second, they're exactly the kind of sites being released every which way you turn.

But for me App Engine is intriguing, as it might provide an interesting new perspective on distributing shrink-wrapped packaged software. When Google lifts the cap on sign-ups, it's going to be a simple matter for anyone to have their own App Engine environment. Forget cheap web hosting and the expense and configuration overhead it entails: just sign up for an App Engine account.

App Engine has the potential to provide an enormous number of people with a well-documented stable environment into which an application can be deployed.

It will be interesting to see if anyone seizes on App Engine as an opportunity to create a simple personal application that combines elements of all the Web 2.0 favourites: bookmarks, blogging, calendar, photos, travel, and perhaps an OpenId provider. One that makes me the administrator of all my own data, but doesn't scrimp on the options for other people to harvest, syndicate and browse what I'm uploading.

At the moment our online identities start out fragmented, because we have to push data into a number of different services. And then we strive for ways to bring that data together and knit it into other sites that we, or our social network, use.

But why not turn this on its head, and seize on App Engine as a way to avoid this early fragmentation? We could instead start out with a centralized, personal web presence, but one which seamlessly integrates with data in other spaces. The potential is in open data, and in the services that are built around it. So why aren't we managing our own open data repositories and letting others offer us services against particular aspects of them?

The App Engine environment doesn't require any configuration on the part of the end user, and I suspect you could probably create an App Engine Deployer using App Engine itself. So sign-up, deployment and upgrades could also be pretty straightforward. Python seems well suited to creating a simple modular web application that could be extended to cover new areas as users need them.

Instead of using lots of different web applications, we could each have our own modular web application that is intimately linked into the web and becomes the primary repository for the data we want on the web. Data portability follows from the fact that we'd each be the administrator of our own data.
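A rough sketch of the modular personal web application idea, written in plain Python rather than against any App Engine API: each module (bookmarks, blog, and so on) holds the owner's items, new modules can be installed later, and everything, or just one aspect of it, can be exported as JSON for other services to harvest. All class and method names are hypothetical, and a real deployment would also need persistent storage and a web front end.

```python
import json
from datetime import datetime, timezone
from typing import Optional


class Module:
    """Hypothetical base class for one area of personal data."""
    name = "module"

    def __init__(self) -> None:
        self.items: list[dict] = []

    def add(self, **fields) -> dict:
        """Store an item, stamped with its creation time in UTC."""
        item = {"created": datetime.now(timezone.utc).isoformat(), **fields}
        self.items.append(item)
        return item


class Bookmarks(Module):
    name = "bookmarks"


class Blog(Module):
    name = "blog"


class PersonalSite:
    """The owner's single repository; modules can be plugged in as needed."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.modules: dict[str, Module] = {}

    def install(self, module: Module) -> Module:
        self.modules[module.name] = module
        return module

    def export(self, only: Optional[str] = None) -> str:
        """Publish open data as JSON: every module, or just the one named."""
        selected = {name: m.items for name, m in self.modules.items()
                    if only is None or name == only}
        return json.dumps({"owner": self.owner, "data": selected})
```

Exporting one module at a time is what would let an outside service, say a bookmark aggregator, work against a particular aspect of someone's data without needing the rest.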

This would also change the nature of the kinds of applications that we'd need elsewhere on the web. Instead of lots of specialist databases, we need more generic services and more community/local/temporary aggregations.

Uncategorized

Filtrbox, a truly exceptional service for gathering even more information

April 13th, 2008

DataPortability lunch meetup in London / OpenSocial hackathon

April 11th, 2008

[Photo: DataPortability lunch meetup in London]

I attended the DataPortability lunch meetup in London on Sunday (see link to some photos above), where I met up with DP enthusiasts including Tom Morris, Tony Haile, Chris Saad (founder), Cassandra Shanks, Imp, Julian Bond, Christian Scholz, and Sokratis Papafloratos. We had some great food and interesting discussions, including DP scenarios, the scope of DataPortability (is it more than just the Social Web?), SIOC, forthcoming announcements, and more…

Tom, Christian and I went to the OpenSocial hackathon at the BT Centre afterwards. I spoke briefly with organiser Michael Mahemoff, and Dan Peterson invited us to attend the forthcoming Google I/O event in May. I also listened in as Dan Brickley and Cassie discussed connections between FOAF and the OpenSocial APIs. (Unfortunately, I missed the presentations, which took place in the morning before I arrived in London.)

English

Talk in DERI

April 10th, 2008

Yesterday I gave a talk at DERI, NUI Galway, about SWAML. Basically, I presented our latest work on the project (and our paper at a WWW2008 workshop in China). It was really nice to discuss this work with first-class researchers. Thanks to John for inviting me. By the way, the slides are available.

Before my talk at DERI

In the afternoon we also organized a brainstorming session about SIOC, which was quite interesting too.

It’s a pity that my time here is coming to an end…

English, Spanish