Archive

Archive for March, 2008

Danja rocks with his “DataPortability and me” video / some slides I’ve made for DP+SIOC

March 31st, 2008

Wow! Danny Ayers has made the best video I’ve seen for the “DataPortability and me” competition, which ends today:

Travelling on the train to Dublin and back this morning, I gathered and made some slides for future presentations on DataPortability and SIOC:

English

A possible solution for the scalability of automated reasoning with ontologies

March 30th, 2008
In mathematical logic, a theory T is a conservative extension of T’ if every logical consequence of T in the language of T’ is also provable in T’. In the field of Ontology Engineering, the application of this ...

Spanish

Cartographic fiction: the Google Earth deception, Riemannian manifolds and the satellite imagery business

March 28th, 2008

The fundamental deficiency (and danger) of satellite photos is that they are devoid of semantics. The Geospatial Semantic Web project aims to remedy that absence of interpretation. This idea is at the heart of the conference that I ...

Spanish

zemanta, a Firefox add-on for writing semantic posts

March 28th, 2008

Semantic Web in the news

March 27th, 2008

Well, the Semantic Web has been in the news a bit recently.

There was the buzz about Twine, a "Semantic Web company", getting another round of funding. Then, Yahoo announced that it will pick up Semantic Web information from the Web, and use it to enhance search. And now the Times online mis-states that I think "Google could be superseded". Sigh. In an otherwise useful discussion, largely about what the Semantic Web is and how it will affect people, there was a misunderstanding which ended up being the title of the blog. In fact, the conversation as I recall started with a question: if search engines were the killer app for the familiar Web of documents, what will be the killer app for the Semantic Web?

Text search engines are of course good for searching the text in documents, but the Semantic Web isn't text documents, it is data. It isn't obvious what the killer apps will be - there are many contenders. We know that the sort of query you do on data is different: the SPARQL standard defines a query protocol which allows application builders to query remote data stores. So that is one sort of query on data which is different from text search.

One thing to always remember is that the Web of the future will have BOTH documents and data. The Semantic Web will not supersede the current Web. They will coexist. The techniques for searching and surfing the different aspects will be different but will connect. Text search engines don't have to go out of fashion.

The "Google will be superseded" headline is an unfortunate misunderstanding. I didn't say it. (We have, by the way, asked for it to be fixed. One can, after all, update a blog to fix errors, and this should be appropriate. Ian Jacobs wrote an email, left voice mail, and tried to post a reply to the blog, but the reply did not appear on the blog - moderated out? So we tried.)

Now of course, as the name of The Times was once associated with a creditable and independent newspaper :-), the headline was picked up and elaborated on by various well-meaning bloggers. So the blogosphere, which one might hope to be the great safety net under the conventional press, in this case just amplified the error.

I note that here the blogosphere was misled by an online version of a conventional organ. There are many who worry about the inverse, that decent material from established sources will be drowned beneath a tide of low-quality information from less creditable sources.

The Media Standards Trust is a group which has been working with the Web Science Research Initiative (I'm a director of WSRI) to develop ways of encoding the standards of reporting a piece of information purports to meet: "This is an eye-witness report"; or "This photo has not been massaged apart from: cropping"; or "The author of the report has no commercial connection with any products described"; and so on. Like creative commons, which lets you mark your work with a licence, the project involves representing social dimensions of information. And it is another Semantic Web application.

In all this Semantic Web news, though, the proof of the pudding is in the eating. The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.

So if you are a VC funder or a journalist and some project is being sold to you as a Semantic Web project, ask how it gets extra re-use of data, by people who would not normally have access to it, or in ways for which it was not originally designed. Does it use standards? Is it available in RDF? Is there a SPARQL server?

A great example of Semantic Web data which works this way is Linked Data. There is a growing mass of interlinked public data, much of it promoted by the Linked Open Data project. There is an upcoming Linked Data workshop on this at the WWW 2008 Conference in April in Beijing, and on June 17-18 in New York there is the Linked Data Planet Conference. Linked data comes alive when you explore it with a generic data browser like the Tabulator. It also comes alive when you make mashups out of it. (See Playing with Linked Data, Jamendo, Geonames, Slashfacet and Songbird; Using Wikipedia as a database). It should be easier to make those mashups by just pulling RDF (maybe using RDFa or GRDDL) or using SPARQL, rather than having to learn a new set of APIs for each site and each application area.

I think there is an important "double bus" architecture here, in which there are separate markets for the raw data and for the mashed up data. Data publishers (e.g., government departments) just produce raw data now, and consumer-facing sites (e.g., soccer sites) mash up data from many sources. I might talk about this a bit at WWW 2008.

So in scanning new Semantic Web news, I'll be looking out for re-use of data. The momentum around Linked Open Data is great and exciting -- let us also make sure we make good use of the data.

Uncategorized

Twine. First impressions of the “metaweb wave”

March 26th, 2008
Two metaweb projects, Twine and Freebase, could revolutionize the conception of social networks on the WWW in the coming months. And not only because of the investment they are attracting, but because of the revolutionary approach of their goals. The philosophy of the...

Spanish

Those who tell, those who shout, those who stay silent in the conversation society

March 26th, 2008
Forrester index and proposals for cultural change.

delicious

The “shadow web” in the limelight

March 25th, 2008

Open business models versus new business ideas

March 24th, 2008
Open business models have attracted the attention of many investors, economists and entrepreneurs over the last decade. In reality, the attention is focused on something much more specific: business models for Web 2.0 projects (and other projects s...

Spanish

Understanding the semantic web through the new tools

March 24th, 2008
Semantic web is one of the synonyms applied to the future of the web, web 3.0. Explanatory summary and list of resources.

delicious

Semantic web or 3.0, explanation and tools | El caparazón

March 23rd, 2008

Semantic web: what is it?

March 19th, 2008

First things first: what is this all about? If you made it to this blog, it is because you are interested in knowing what is appealing about the semantic web, what it is for, how it can be implemented and how it will benefit us.

So as not to plagiarize myself again, I invite you to read a presentation I prepared a few months ago as an “Introduction to the semantic web”.

Doubts, questions…? If you want to ask something or would like us to expand on a particular topic, leave us a comment! Otherwise, coming up in the next post: the semantic web stack (languages and protocols).

Spanish

Yahoo! Search reading the semantic web

March 13th, 2008

Yahoo! Search announced today in their blog post that the Search Monkey project will soon support semantic web technologies such as RDFa, with standard vocabularies such as FOAF and Dublin Core. It will also do a lot of other cool stuff that you can read about in that post.

It was nice to see that several other people noticed this. (Techmeme frozen page - #1 story)

I didn’t work on this project and don’t work in the search division, but continue to build with Semantic Web technologies in another more internal-facing part of Yahoo! But it is exciting to see that there are more public applications getting out, such as this and research projects like microsearch by Peter Mika.

Uncategorized

Knowledge versus beliefs. Desires versus intentionality

March 13th, 2008
One of the goals of many social networks, synthetic worlds and other communities that cohabit the net is the distortion of physical reality as a basis for strengthening the almost magical circle that surrounds those projects. It is a way of strengthening the architecture...

Spanish

My First Experiences with Twine

March 12th, 2008

Today I finally logged in to Twine for the first time. Yesterday I was reading about some shortcomings of the system, so I was keen to try it out myself and form my own impression.

It's true that the system isn't as easy to understand as del.icio.us or other bookmarking tools. It takes a while to get used to all the additional ways you can navigate through the system. Remember: "Twine looks at content and parses it automatically for the names of people, places, organizations and other subject tags. Users are then able to navigate between related content, view recommended content and connect with recommended people with related interests."

I can't agree with the "shortcoming" mentioned by Marshall Kirkpatrick that "... it's hard to keep track of all the levels and types of information available". This is a general problem that arises whenever semantic technologies are meant to enhance the user experience. Either you stay with "simple" user interfaces like Google or del.icio.us, or you spend 5 minutes or so learning a new piece of software which will save you time in the future and help you find related information automatically.

On the other hand, I was very surprised that the automatic recommendations Twine makes on how to annotate or describe a new resource are really unsatisfying. Users will only spend time tagging their bookmarks if the machine comes up with some intelligent suggestions. And it's true, as Marshall says, "most of the web is made up of ugly, non-standard pages."

So hopefully Twine will add that feature before it opens up to the public (isn't there a plan to integrate OpenCalais or something similar?); otherwise there will be no "first mainstream semantic web application" but only another prototype of yet another semweb app.

Got something to say? Leave a comment!

English

Twinkle on code.google.com

March 11th, 2008

I've created a Google Code project for Twinkle. It's called twinkle-sparql-tools.

If you're a Java developer and/or a user of the tool and are interested in contributing code then drop me a mail and I'll set you up with source access.

Uncategorized

Set Algebra For Updating a Triple Store

March 11th, 2008

Let's assume we have a stored graph Gstore, and that we have been given another graph of incoming data, Gin, that contains some modifications to a specific sub-graph.

Let's also assume that we have a function view() that can extract the "equivalent" sub-graph (i.e. equivalent view) of the original data.

In pseudo code to apply these updates we do the following:


Gview = view(Gstore)
Gdelete = Gview - Gin
Ginsert = Gin - Gview
Gstore' = Gstore.remove(Gdelete).add(Ginsert)

Job done. The Jena API provides methods for handling the basic operations; see, for example, the difference method. You can also wrap the modifications to Gstore in a transaction.
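To make the set algebra concrete, here is a minimal sketch in Python that models graphs as plain sets of (subject, predicate, object) tuples rather than using Jena. The view_for_subject helper, the apply_update function and the ex: identifiers are hypothetical names invented for this illustration; a real view() would more likely be backed by a query against the store.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def view_for_subject(subject: str) -> Callable[[Graph], Graph]:
    """Build a hypothetical view() that selects every triple about one subject."""
    def view(graph: Graph) -> Graph:
        return {t for t in graph if t[0] == subject}
    return view


def apply_update(g_store: Graph, g_in: Graph, view: Callable[[Graph], Graph]):
    """Return (new_store, deleted, inserted) using the set algebra from the pseudo code."""
    g_view = view(g_store)
    g_delete = g_view - g_in   # in the stored view but absent from the input
    g_insert = g_in - g_view   # in the input but not yet in the stored view
    new_store = (g_store - g_delete) | g_insert
    return new_store, g_delete, g_insert


# Illustrative data only: the identifiers and values are made up.
store = {
    ("ex:table1", "dc:title", "GDP 2006"),
    ("ex:table1", "ex:variable", "GDP"),
    ("ex:table2", "dc:title", "Employment 2006"),
}
incoming = {
    ("ex:table1", "dc:title", "GDP 2007"),
    ("ex:table1", "ex:variable", "GDP"),
}

new_store, deleted, inserted = apply_update(
    store, incoming, view_for_subject("ex:table1")
)
print(sorted(deleted))
print(sorted(inserted))
```

In this run only the old title triple is deleted and only the new title triple is inserted; the triple about ex:table2 is untouched because it falls outside the view.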

The nice thing is that this is agnostic to the actual data being updated: we don't care which triples are being removed or inserted. This differentiates it from the SPARQL Update Language, specifically the MODIFY operation, which requires the patterns being inserted or deleted to be added to the query. Changesets are much the same.

In the above approach the detail of what is being changed (or is being allowed to change) is shifted out of the triple store update code and into the view() function. The extent of the graph that is returned by this function must match that being passed as input. So we've defined a specific "document type". As it turns out this is quite reasonable as you can generally match, e.g. a RESTful service call, to a view based on the identifier of the item to which the content is being posted, its media-type, other service parameters, etc.

In terms of implementing the view() function, it turns out you can go a long way with a SPARQL CONSTRUCT operation. DESCRIBE isn't suitable as you don't have control over how the sub-graph is built.

I think there are strengths and weaknesses to all of the different approaches to updating RDF stores and suspect that there isn't going to be a one size fits all approach. For example SPARQL Update looks like a handy syntax to use when the modifications all follow predictable patterns, e.g. I'm doing parameterized updates to some stored data, much like parameterized updates in a SQL database. Changesets offer some extra functionality around store versioning which doesn't drop out of the set logic approach (although it could be added).

Oh, and the keen-eyed amongst you will notice that this approach does involve some "thrashing" of updates for bnodes, because they don't compare as equal. But what ya gonna do?! :)

Uncategorized

Graph Shape Sorting

March 10th, 2008

On Sunday I posted about how constrained views of RDF can be useful in order to document the inputs into an application, validate those inputs, and also manage updates via application of set algebra. I explored the idea that a system may support many such views or "document types" without blessing any as the primary view of the data. And, importantly, that this approach doesn't ultimately constrain the range of data that you can put into a triple store.

It just occurred to me that there's another way to explain the concept: a shape sorter.

Photo by ellas dad

A shape sorter can contain many different sizes, shapes, and colours of block. Each can only be put into the box through a specific hole, but once in they're all mixed together. And one can reach in and pick out any or all of them. Depending on which face of the shape sorter you're looking at, the options may look quite limited. But the sorter as a whole has a lot of different faces and options.

The inside of the box is the triple store. It can contain many different things. Each block is a specific data format or the shape of a specific sub-graph. Passing a block through a shape is the validation process, and the shape sorter offers many different forms of validation.

Useful alternate explanation or excuse to post a pointer to a pretty picture?


Uncategorized

Modelling Statistical Publications: Some Notes

March 10th, 2008

Lee Feigenbaum has put together a really nice posting discussing different ways of modelling statistical data using RDF. I wanted to contribute to that discussion and add in a few comments about how I've been modelling some of the OECD's statistical publications using RDF.

Note the emphasis: what I've been doing is capturing metadata about individual statistical tables and graphs, their association with specific publications, their metadata, etc. I've not attempted to capture the detail of the statistics themselves, but do have a few relevant comments there.

The background to this is that I'm currently technically leading a project to build the latest version of OECD's electronic library. All of the metadata is stored in RDF, with content available as HTML, PDFs, Excel spreadsheets or as views into the OECD.stat application that the OECD have developed as a power tool for housing and delivering their statistical data.

As Lee discovered in the EuroStat data, regions and countries are core concepts. All of the OECD's statistical output can be classified by country and region, and these are types defined within our schema. We assign URIs to the countries using either the ISO 3166-1 alpha-2 country code or, in the case of classifying data that refers to countries that no longer exist as a specific entity (e.g. Yugoslavia), the ISO 3166-3 four-letter country code.

A country may be associated with zero or more Regions, using an Is Part Of relationship. A region may be the European Union, OECD member states or other arbitrary grouping. I suspect the same basic requirements will apply to other statistical datasets.

There are some other types of classification that we associate with the tables:

  • An indicator of whether the table is a "comparative table": e.g. does it include data from multiple countries?
  • An association between the table and a "Table Series", which constitutes a collection of tables published over time
  • The statistical Variables that the table contains, e.g. GDP
  • A summary of the time range that the table covers, e.g. "2007", "2005-2007", "2000, 2002-2005", etc. These are captured as simple literals for now as we have to do little/no processing on them at this level.
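The time range summaries are kept as plain literals in the metadata described here. If a consumer of that metadata did want to process them, a minimal sketch in Python might expand a literal into the individual years it covers. The function name expand_time_range is hypothetical, and the sketch assumes only the three shapes shown in the examples above (a single year, a hyphenated range, and a comma-separated mix).

```python
from typing import List


def expand_time_range(literal: str) -> List[int]:
    """Expand a literal such as "2000, 2002-2005" into a sorted list of years.

    Hypothetical helper: it only understands single years, "start-end"
    ranges, and comma-separated combinations of the two.
    """
    years = set()
    for part in literal.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if end < start:
                raise ValueError(f"range runs backwards: {part!r}")
            years.update(range(start, end + 1))
        else:
            years.add(int(part))
    return sorted(years)


for example in ("2007", "2005-2007", "2000, 2002-2005"):
    print(example, "->", expand_time_range(example))
```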

And then there's the usual collection of title, description, etc. all as multi-lingual literals. All tables are also assigned a DOI to provide a stable link that can be cited in publications. If the table was originally published in a specific Book or journal Article then that relationship is also captured.

Obviously this metadata is, largely, at a level above that which Lee has been exploring, but I thought this might provide some useful context. For anyone looking at capturing statistical data in RDF, there are some other useful places to look at for defining terms and drawing on prior experience.

Firstly the Journal of Economic Literature Classification provides some terms that can be associated with statistical publications to help categorize them. The OECD's statistical glossary fills a similar role.

Secondly, the Statistical Data and Metadata EXchange (SDMX) initiative is also worthy of a look. It's not RDF but, as well as defining XML Schemas and web services for exchanging statistical data, the guidelines include lists of cross-domain concepts and their mappings to those in use by EuroStat, OECD, IMF, etc. So there is plenty of scope for grounding RDF vocabularies for statistical data in a lot of prior art.

Finally, the OECD have some public documentation about the design and implementation of their "MetaStore" database that supports OECD.stat (it's a different beast to the Ingenta MetaStore, I should point out). For example, the document "Management of Statistical Metadata at the OECD" (PDF) has some interesting detail about the different types of metadata (structural, technical, publishing) that is stored in these multi-dimensional data cubes.

Uncategorized

Testing Twine

March 9th, 2008

Thank you Nova and John for the invitation to Twine. For people who don’t know it, Twine is a new social application, somewhat similar to Facebook. But the interesting thing is that it’s built on a semantic platform. I’m not sure if I’ll use it, probably not, but at least I want to test it.

I won’t talk about its features from a user’s perspective, but from that of a semantic web developer. All the items in Twine can be exported as RDF, for example my user. But it’s a pity that they use their own terms instead of more widely used ones, such as sioc:User. Well, I hope that in time they will understand the benefits of using common representations.

By the way, if anybody wants to test it, I still have some invitations.

English, Spanish

Document Types in RDF

March 9th, 2008

The notions of a "document" and a "document type" are core concepts in XML. The specification includes a precise description of a document, what it means for a document to be well-formed, valid, and so on. Even if you're not using a DTD or XML schema, and are just using XML as a syntax for exchanging structured or semi-structured data, the concept of a document is still a useful one. For example, a document has a clear boundary and content, and so there is a limited scope for the data that an application has to deal with.

The ability to define classes of documents ("document types") brings other benefits: the structure and content of documents can be standardized. The document type becomes both a contract that can be enforced by an application prior to its processing of any given document, and a description of the acceptable inputs of that application.

The concepts of "document" and "document type" are quite general and aren't limited to XML applications. See, for example, the JSON schema discussion. The same concepts and their attendant benefits also crop up in messaging systems.

But you don't see much discussion about the concept of a document or document types in RDF applications. Granted, RDF/XML does define a document type for serializing RDF graphs, but we all know that the large variation in how any single RDF graph could be encoded in valid RDF/XML means that the benefits we get from non-RDF XML vocabularies are lost. The document scope can be highly variable, as can content and syntax. Of course it is possible to create "RDF profiles" that constrain the RDF/XML syntax so that an XML schema can be used to validate documents. Jeni Tennison has recently discussed some approaches to this, and I've explored the topic myself in the past. In fact I regularly apply it when designing RDF based systems: it is extremely useful (essential) to be able to validate incoming data.

But generally the notion of document types doesn't sit well with RDF. RDF is a data model for semi-structured data. It assumes an "open world model" in which missing information is not invalid, or as Dan Brickley has put it "missing isn't broken". This wild and woolly nature of RDF is, I think, one of the reasons many people struggle with it. As Dan says:

If nothing is mandatory, then how can they write code that knows what to expect?

Dan concludes that posting by suggesting that there are certain bedrocks which application authors can still rely on, e.g. XML+Namespaces, conformance to the RDF model, etc. But lately I've come around to the view that we need to go beyond that and offer tighter ways to document, declare and validate data that is being exchanged in RDF applications. I don't know of any applications that adopt an open world model; quite the opposite in fact. I think there are benefits in looking at the notion of "document" and "document type" in an RDF context. Although "document" may not be the right term here, a better one may be "view".

So how might we achieve this, and what are the benefits in doing so?

We can use the aforementioned "profiling" option to create a constrained RDF/XML vocabulary that can be validated using XML schema (of whatever kind). Where two parties need to have an agreed format for data exchange this works well. So, for example, the OECD are supplying us with XML documents according to an XML schema. The documents are valid RDF/XML so we can simply pour them into a triple store for our application to use. Each XML document is basically a packet of RDF that describes one section (or sub-graph) of the entire data set. Those same packets are used as the basic message format for passing between internal components (e.g. the search indexer). So this is one useful application of the document concept in an application which is otherwise entirely RDF-driven and which goes to some length to be agnostic to the details of the data it contains.

In a scenario where there isn't any prior co-ordination between the parties exchanging data, there are other options. A typical scenario here might be submitting my FOAF document (either directly, or referenced via an OpenId) to register/configure some online service. There are many ways I might structure my FOAF document, so how does the service validate or check that the required data is present? The answer here is SPARQL. SPARQL can be used to validate a graph by testing whether specific graph patterns are present using ASK or CONSTRUCT. It can also be used to CONSTRUCT a constrained "view" of the submitted data that throws away anything that the application isn't directly interested in. The other side benefit to using SPARQL is that it doesn't really matter which RDF syntax is being used: the validation and data extraction is happening at the level of the data model, not the syntax.
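As a rough illustration of the idea, the following Python sketch emulates what an ASK and a CONSTRUCT would do for one simple pattern, using plain sets of tuples instead of a real SPARQL engine. The REQUIRED tuple, the function names and the sample data are assumptions made up for this example; the approximate SPARQL equivalents appear in the docstrings.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical requirement of the online service: every person needs these.
REQUIRED = ("foaf:name", "foaf:mbox")


def complete_people(graph: Set[Triple], required=REQUIRED) -> Set[str]:
    """Subjects typed foaf:Person that carry every required property."""
    people = {s for (s, p, o) in graph if p == "rdf:type" and o == "foaf:Person"}
    found = set()
    for person in people:
        props = {p for (s, p, o) in graph if s == person}
        if all(r in props for r in required):
            found.add(person)
    return found


def ask_valid(graph: Set[Triple]) -> bool:
    """Roughly: ASK { ?p a foaf:Person ; foaf:name ?n ; foaf:mbox ?m }"""
    return bool(complete_people(graph))


def construct_view(graph: Set[Triple], required=REQUIRED) -> Set[Triple]:
    """Roughly a CONSTRUCT over the same pattern: keep only the required triples."""
    keep = complete_people(graph, required)
    return {(s, p, o) for (s, p, o) in graph if s in keep and p in required}


# Made-up submission with one property the service does not care about.
submitted = {
    ("ex:me", "rdf:type", "foaf:Person"),
    ("ex:me", "foaf:name", "Alice"),
    ("ex:me", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:me", "foaf:interest", "ex:knitting"),
}

is_valid = ask_valid(submitted)
view = construct_view(submitted)
print(is_valid, sorted(view))
```

The view drops the foaf:interest triple, so the rest of the application only ever sees the properties it declared an interest in.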

We use the technique of defining RDF views using CONSTRUCT elsewhere in our applications, primarily to fetch the data required to present some aspect of the RDF graph to an end user. I've described this, and the underlying system and its assumptions, in a recent presentation. Here the "view" or "document type" is used to drive a simple data binding layer, and is essentially the contract between the application logic and the presentation layer. The application doesn't need to deal with the entire graph, just useful, use-case-specific subsets. And these are different "document types" to those used when loading the original data. The application doesn't have a single document type: it has many and they're used in different contexts. This avoids overly constraining the model (we want to be able to store arbitrary additional properties) but imposes local scoping to gain the benefits of validation, known contents, etc.

It turns out that there's another use case where RDF document types or views are useful: managing updates to a triple store. If you know that some incoming data is constrained to a particular view (e.g. by prior agreement, or through extracting only those graph patterns that are of interest) then applying the incoming message as an update to the store is simply a matter of doing some set algebra. Extract the equivalent view from the store (i.e. the relevant sub-graph) and then look for the difference between the stored and incoming sub-graphs. The end result is a list of triples to delete from the store and a list of triples to add to it.

I'll follow up more on the topics in this posting, as I think there are huge benefits to be had here from looking at how the notion of documents and document types can add value to RDF systems. It's very easy to get caught up in the completely general case of a highly-distributed, wild and woolly world of RDF and the Semantic Web. But the majority of applications will have a much more limited world view, and my experience so far is that applying some additional constraints here and there can have huge benefits. Embracing the notion of multiple document types is one of these.

Uncategorized

Inclusiva-Net 2008. Fourth (and final) day of papers

March 8th, 2008

This final day consisted of three papers and a closing lecture. The first paper, SPIP GIS, by Horacio González Diéguez, presented the escoitar.org project, which consists of storing with geolocation...

Spanish

Inclusiva-net 2008. Third day

March 7th, 2008

On the third day of papers at Inclusiva Net, three papers were presented and the day closed with a lecture, all dealing with locative media and how media augment the reality of urban space, landscape, etc. In Eversión and locati...

Spanish

Oxford SWIG Talks: Twinkle & SPARQL Query Forms

March 7th, 2008

I finally found time to attend one of the Oxford SWIG sessions last night and had a thoroughly enjoyable time.

I gave a couple of presentations which I've posted to slideshare, and which I'll embed below.

The first was a general introduction and mini-demonstration of Twinkle. I gave a basic overview of the key features and showed how the configuration drives the user interface:

The second talk was about the different SPARQL query forms. I started by asking the question "why are there four different query forms?" and then proceeded to examine each one and talk about its benefits and applied use.

The talk was streamed online via Yahoo Live which was a nice touch as one SWIGger was at home with a broken ankle (get well soon Katie!). It'd be nice to see more use of free video streaming at other events.

Uncategorized

Cork: WebCamp and BlogTalk

March 3rd, 2008

I arrived in Cork on Saturday evening, with just enough time to visit some pubs and taste the local beer.

Yesterday I attended the WebCamp workshop on Social Network Portability. An interesting event, but probably more for its social component than from a purely technical point of view. For example, I don’t agree with some opinions discrediting serious and rigorous solutions; I mean, hacking is great fun, but you can’t always improvise. I think it’s more constructive if people get involved in creating good technology, not only cool technology. Later we had a really nice bloggers’ dinner in a restaurant in the city center, followed by some pints before returning to our B&B to sleep.

Today BlogTalk 2008, the 5th International Conference on Social Software, started. The first three talks were really interesting, and the programme promises more… congratulations John for your superb work organizing this event. More in the blogosphere.

And if you want to see some pictures, I created a set on Flickr for this trip. Enjoy!

English, Spanish

WHERE DO SOFTWARE AGENTS COME FROM?

March 3rd, 2008
Software agents will be part of the Semantic Web, but they are not restricted to it. They are increasingly used in applications of all kinds: e-commerce, telecommunications systems, industrial process control, information search, air traffic control, process re-engineering, calendar management, e-mail organization, etc. It may be that in the future the Semantic Web will not exist, or that there will only be semantic "islets". Be that as it may, agents are here to stay. In this article we will see where agent technology comes from.

Increasingly, there is a need for flexible programs or applications that can anticipate the needs of the users of computer systems and adapt to them. Agents are one answer to that need. A software agent is an autonomous software entity that can interact with its environment. James Hendler considers that software agents do not differ much from human agents: "… agents could find possible ways to meet the users' needs and offer the user choices for their achievement. Much as a travel agent might give you a list of several flights you could take, or a choice between flying and taking a train, a Web agent could offer a list of possible ways to get what you need on the Web".

Agents come from the fields of artificial intelligence (AI) and software engineering (in particular, from object orientation). Conceptually, agents derive from the concurrent actor model proposed by Carl Hewitt, Peter Bishop and Richard Steiger in 1973. Actors, the direct predecessors of agents, were defined by Hewitt in 1977 as "self-contained, interactive, concurrently executing objects that have internal state and the ability to communicate" and as "computational agents that have a mail address and a behaviour". Actors communicate by exchanging messages and carry out their actions concurrently (that is, their actions can run in parallel, with no sequence fixed in advance). The main difference between actors and agents is that the latter usually have constraints related to goals or purposes.
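The actor description above (internal state, a mail address, a behaviour, message exchange, concurrent execution) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `CounterActor` class, the `("add", n)` / `("get", reply_queue)` / `("stop", None)` message format and the `demo` function are hypothetical, and a thread plus a queue is only one simple way to approximate an actor, not Hewitt's formal model.

```python
import queue
import threading


class CounterActor:
    """Toy actor (hypothetical): private state, a mailbox, a behaviour in its own thread."""

    def __init__(self):
        self.mailbox = queue.Queue()   # plays the role of the actor's "mail address"
        self._total = 0                # internal state, touched only by the actor's thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        """Deliver a message asynchronously; the sender does not wait for processing."""
        self.mailbox.put(message)

    def _run(self):
        # The behaviour: react to one message at a time, in arrival order.
        while True:
            kind, payload = self.mailbox.get()
            if kind == "add":
                self._total += payload
            elif kind == "get":
                payload.put(self._total)   # reply by sending a message back
            elif kind == "stop":
                break

    def join(self):
        self._thread.join()


def demo():
    actor = CounterActor()
    # Five senders run concurrently; their messages may arrive in any order.
    senders = [
        threading.Thread(target=actor.send, args=(("add", n),))
        for n in range(1, 6)
    ]
    for s in senders:
        s.start()
    for s in senders:
        s.join()
    reply = queue.Queue()
    actor.send(("get", reply))
    total = reply.get()
    actor.send(("stop", None))
    actor.join()
    return total
```

Because the state is only ever modified by the actor's own thread, the senders need no locks: the order in which the additions arrive does not matter, and the reply reflects all of them.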

The relationships and differences between objects and agents are detailed at http://www.wshoy.sidar.org/index.php?2007/05/08/38-los-trabajadores-de-la-web-semantica-agentes-agentes-inteligentes-y-agentes-semanticos. Software engineering tends to adopt somewhat totalitarian approaches: everything is an actor, everything is an object…

There is a strong relationship between agents and AI: agents come from the field of distributed artificial intelligence (DAI), which studies methods and techniques for solving problems through the cooperation of several distributed, autonomous and intelligent entities. DAI blends two disciplines: AI and distributed systems. A distributed system is, according to George Coulouris, "a system in which hardware and/or software components located at networked computers communicate and coordinate their actions by exchanging messages". Even if you did not know what a distributed system is, you have certainly used one. Otherwise you would not be reading this, since the Internet and the World Wide Web are distributed systems.

In DAI, the collaboration of entities with one another produces collective behaviours that solve problems which would be unsolvable if tackled individually, or that provide efficient solutions in terms of time, speed or quality. An example of distributed "natural" intelligence is a termite colony: collaboration among nymphs, workers, soldiers and the queen allows the colony to survive. Termites could not survive on their own (soldiers cannot feed themselves, the queen can barely move, and workers cannot defend themselves), but their cooperation has allowed them to exist on this planet for millions of years. Who knows, perhaps they will outlive Homo sapiens sapiens: many pages remain to be written in the great book of evolution.

DAI comprises three major branches of research: multi-agent systems (which study systems in which a set of agents cooperate, coordinate and communicate to achieve a common goal), distributed problem solving (which studies solving problems through decentralized processing) and parallel artificial intelligence (which develops parallel AI methods and algorithms). Within DAI, agents come from multi-agent systems, which are groups of autonomous agents, generally heterogeneous and independent, that collaborate to achieve certain goals; this collaboration means that they cooperate, coordinate and negotiate with one another. In a multi-agent system there is no global control of the system, nor is there any single place that holds all the information.
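To make the last point concrete (no global control and no single place holding all the information), here is a small Python sketch in which agents arranged in a ring each start with one local value and learn the global maximum purely by passing messages to one neighbour. The function name `ring_max` and the ring topology are assumptions chosen for illustration; the outer loop merely simulates synchronous rounds and never gives any agent direct access to another agent's data.

```python
def ring_max(local_values):
    """Simulate agents on a ring agreeing on the maximum of their local values.

    Agent i knows only local_values[i] at the start. In each round every agent
    sends its current belief to its right-hand neighbour, which keeps the larger
    of its own belief and the received one. After n - 1 rounds every value has
    travelled all the way around the ring.
    """
    n = len(local_values)
    known = list(local_values)          # known[i] is what agent i currently believes
    for _ in range(n - 1):
        outgoing = list(known)          # messages sent this round (all at once)
        for i in range(n):
            neighbour = (i + 1) % n
            known[neighbour] = max(known[neighbour], outgoing[i])
    return known


beliefs = ring_max([3, 9, 2, 7])
print(beliefs)
```

Each agent ends up with the same answer even though no agent, and no coordinator, ever saw the full list.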

As mentioned in http://www.wshoy.sidar.org/index.php?2007/05/08/38-los-trabajadores-de-la-web-semantica-agentes-agentes-inteligentes-y-agentes-semanticos, the agents of the distributed system need not be intelligent (that is, have some kind of artificial intelligence); the "intelligence" itself can arise from cooperation among "dumb" agents. This kind of intelligence is called social intelligence, and it is the kind used in robot football matches. In these matches, each robot pursues two very simple goals: scoring and dodging the opposing team's players. The combination of individual behaviours aimed at those goals gives rise to a social behaviour similar to that of any human football team, except when it comes to celebrating goals. The termite colonies mentioned earlier are biological examples of social intelligence, as are ant or bee colonies. In colonies, each individual has genetically programmed individual goals that are more complex than those of the footballing robots.

Multi-agent systems face several questions: what languages should agents use to communicate? How should agents coordinate so that they achieve the system's goals? How can agents resolve the conflicts (of interest, for example) that may arise while they collaborate? What social relationships emerge in a community of agents?

Many of the properties and advantages of multi-agent systems derive from distributed AI systems. Let us look at some of them:

  • Modularity. According to the Dictionary of Object Technology: The Definitive Desk Reference, modularity is "the logical decomposition of things (for example, responsibilities and software) into simple, small groupings (e.g., requirements and classes, respectively) that increase the chances of achieving software engineering goals". Modular programming simplifies the development of software systems and reduces their cost. (If you want to know why modularity won World War II, see http://www.javahispano.org/tutorials.item.action?id=25 or http://www.javahispano.org/contenidos/es/orientacion_a_objetos_11/.)
  • Loose coupling. The term coupling is commonly used to denote the dependency between modules or components of a system. In a loosely coupled software system, each component depends as little as possible on the others. In such systems, components can communicate despite having very different designs and implementations. By contrast, in a tightly coupled system, components are designed to work closely with others and depend heavily on one another. For example, a printer driver is very tightly coupled to the platform it runs on: moving from a PC to a Mac would require writing the driver again. The loose coupling of multi-agent systems translates into flexibility (if an agent has to be modified, the changes will barely affect the other agents) and interoperability (agents can work together even if they were designed and programmed independently).
  • Reliability. If one agent in the system stops working, the others do not necessarily stop with it.
  • Efficiency. The system's functions can be divided into tasks distributed among the agents, which yields parallelism (the agents work at the same time on different machines).
  • Flexibility. Agents can be added and removed dynamically, and they can have very different designs and implementations.
  • Platform independence. Agents can run on different platforms. This independence is related to the loose coupling of multi-agent systems.
  • Speed. Because cooperating agents run concurrently, the execution speed of the system as a whole increases.
  • Redundancy. Using redundant agents (that is, agents that perform the same task) improves the system's fault tolerance.
  • Scalability. The system remains effective when the number of its users grows significantly.
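
The reliability and redundancy properties in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Agent` class, the `AgentDown` exception and the `dispatch` function are hypothetical names, and "failure" is modelled simply as a flag on the agent rather than a real crash or network fault.

```python
class AgentDown(Exception):
    """Raised by a (hypothetical) agent that is not working."""


class Agent:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def handle(self, task):
        if not self.alive:
            raise AgentDown(self.name)
        return f"{self.name} did {task}"


def dispatch(agents, task):
    """Try redundant agents in turn; one failed agent does not stop the system."""
    for agent in agents:
        try:
            return agent.handle(task)
        except AgentDown:
            continue   # fall through to the next agent able to do the same task
    raise RuntimeError("no agent available for task")


# Two agents perform the same task; the first one is down.
pool = [Agent("a1", alive=False), Agent("a2")]
print(dispatch(pool, "index mail"))   # prints "a2 did index mail"
```

The caller only depends on the `handle` interface, not on which agent answers, which is also a small instance of the loose coupling described in the list.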

Although agents come from the field of AI and form a relevant and steadily expanding subfield of it, AI does not end with agents. Agents do not solve all the problems AI faces, namely: automatic understanding of natural-language texts, text translation, automatic speech recognition and synthesis, building systems capable of original or creative thinking, giving machines common sense, building face or shape recognition systems… The sceptic will say: "Shouldn't AI solve all those problems before announcing how clever agents are?" Poor sceptic: with that requirement, he would never have seen an agent in his life (or in several lives, if he believes in reincarnation). I very much doubt that any reader will live to see a machine that spontaneously speaks like this: "I know I've made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I've still got the greatest enthusiasm and confidence in the mission. And I want to help you." So far, intelligent agents work in very limited domains (the Web, databases, document collections, email) and perform very simple tasks. They therefore need very little intelligence and do not have to face many of AI's still unsolved problems.

Where are agents going? For now, everywhere: there are more and more commercial and academic applications based on them. If they are eventually replaced by another technology (based on "servants", say), you can be sure we will hear phrases like these: "Everything is a servant", "Servant technology will increase business productivity", "Servants open the door to a new era of technology", "With servants, users will not waste time on repetitive tasks"…

Spanish