Is Twitter's plan to log all clicks a privacy loss?



September 2nd, 2010

Twitter’s planned shortening of all links via its t.co service is about to happen. The initial motivation was security, according to Twitter:

“Twitter’s link service at http://t.co is used to better protect users from malicious sites that engage in spreading malware, phishing attacks, and other harmful activity. A link converted by Twitter’s link service is checked against a list of potentially dangerous sites. When there’s a match, users can be warned before they continue.”

Declan McCullagh reports that Twitter announced in an email message that when someone clicks “on these links from Twitter.com or a Twitter application, Twitter will log that click.” Such information is extremely valuable. Given Twitter’s tens of millions of active users, just knowing how often certain URLs are clicked indicates what entities and topics are of interest at the moment.

“Our link service will also be used to measure information like how many times a link has been clicked. Eventually, this information will become an important quality signal for our Resonance algorithm—the way we determine if a Tweet is relevant and interesting.”

Associating the clicks with a user, IP address, location or device can yield even more information — like what you are interested in right now. Moreover, Twitter now has a way to associate arbitrary annotation metadata with each tweet. Analyzing all of this data can identify, for example, communities of users with common interests and the influential members within them.

Note that Twitter has not said it will do this or even that it will record and keep any user-identifiable information along with the clicks. They might just log the aggregate number of clicks in a window of time. But going the next step and capturing the additional information would be, in my mind, irresistible, even if there was no immediate plan to use it.
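To make the "aggregate only" option concrete, here is a minimal sketch of counting clicks per link within a time window without storing anything about who clicked. The log format, the function name and the t.co paths are all hypothetical; nothing here reflects how Twitter actually stores click data.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_clicks(click_log, window_start, window_length):
    """Count clicks per URL in [window_start, window_start + window_length)."""
    window_end = window_start + window_length
    counts = Counter()
    for timestamp, url in click_log:
        if window_start <= timestamp < window_end:
            counts[url] += 1
    return counts


# Hypothetical records: (time of click, wrapped link). No user, IP or device.
CLICK_LOG = [
    (datetime(2010, 9, 2, 10, 0), "http://t.co/abc"),
    (datetime(2010, 9, 2, 10, 5), "http://t.co/abc"),
    (datetime(2010, 9, 2, 10, 7), "http://t.co/xyz"),
    (datetime(2010, 9, 2, 11, 30), "http://t.co/abc"),
]

hourly = count_clicks(CLICK_LOG, datetime(2010, 9, 2, 10, 0), timedelta(hours=1))
```

Adding a user or IP field to each record would be a one-line change, which is exactly why the extra step is so tempting.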

Search engines like Google already link clicks to users and IP addresses and use the information to improve their ranking algorithms and probably in many other ways. But what is troubling is the seemingly inexorable erosion of our online privacy. There will be no way to opt out of having your link wrapped by the t.co service and no announced way to opt out of having your clicks logged.

English, Semantic Web, privacy, social media, twitter


Why SKOS thesauri matter – the next generation of semantic technologies



August 30th, 2010

In fact, many “semantic technologies” still do nothing more than pure statistical analysis of text. Sure, this is better than simple full-text search, but there are still plenty of opportunities to improve search, especially for more sophisticated applications such as “similarity search”: finding similar documents to enable cross-reading or recommendation systems.

Providers of first-generation semantic technologies calculate rather basic “semantic networks” by co-occurrence analysis, which sometimes yields disappointing results. Bearing in mind that Google just bought a company (“Google buys Metaweb”) that has been working on one of the largest knowledge bases in the world, we can assume that some of the last miles towards a semantic search engine can be covered by applying thesauri or other structured knowledge bases.

The PoolParty team recently developed a demo application that shows how thesauri improve search results on top of second-generation semantic technologies. With PoolParty, SKOS-based controlled vocabularies can be managed and enriched with linked data. The PoolParty Tag & Content Recommender analyzes virtually any text or website and recommends corresponding tags: concepts from (in this case) STW (Standard-Thesaurus Wirtschaft) and DBpedia, plus the respective articles from Wikipedia.

STW, which was developed by the German National Library of Economics (ZBW), provides vocabulary on any economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.

This background knowledge is used in this demo app to improve the search for similar documents dramatically:

Similarity between two documents can be calculated not only on a key-phrase basis but also on a rather conceptual basis. Even if two documents do not have one single word or phrase in common they can be identified as “similar documents”.

This can be achieved because thousands of important relations between economic subjects are represented in the domain specific thesaurus. Thus, in this special case best results are achieved with documents from economics (for instance from Econstor) but of course for other recommender systems thesauri from other domains can be used instead of STW.
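The idea that two documents can be "similar" without sharing a single word can be sketched in a few lines. This is only an illustration under stated assumptions: the four entry terms and the concept identifiers below are made up (they are not taken from STW), and plain Jaccard overlap stands in for whatever similarity measure the PoolParty demo really uses.

```python
# Hypothetical mini-thesaurus: entry term -> concept identifier.
ENTRY_TERMS = {
    "unemployment": "concept:labour-market",
    "joblessness": "concept:labour-market",
    "inflation": "concept:price-level",
    "prices": "concept:price-level",
}


def words(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def concepts(text):
    """Concepts whose entry terms occur in the text."""
    return {ENTRY_TERMS[w] for w in words(text) if w in ENTRY_TERMS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


doc_a = "unemployment and inflation rose"
doc_b = "joblessness grows while prices climb"

word_overlap = jaccard(words(doc_a), words(doc_b))
concept_overlap = jaccard(concepts(doc_a), concepts(doc_b))
```

Here the two sentences have no word in common, so the word-level overlap is zero, while the concept-level overlap is complete because both map to the same two concepts.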

Nevertheless, this approach too can be improved, and that development is underway: SKOS thesauri enriched with Linked Data do an even better job. Third-generation semantic technologies of this kind are currently being developed by the LASSO and LOD2 projects, two innovative projects in the area of linked data and the semantic web.

English, PoolParty, Tools & Software, lasso, lod2, recommender system, search engines, semantic web applications, similarity search, text mining


Welcome picnic for CSEE grad students, 1:30-3:30 Mon 8/30



August 29th, 2010

The UMBC ACM Student Chapter and CSEE Department are hosting a Welcome/Welcome Back picnic for all new and returning CSEE graduate students, faculty and staff this coming Monday, 30 August. It will be held from 1:30pm to 3:30pm in the atrium of the Engineering and Computer Science (ECS) building. Food and drinks will be provided.

To get to the ECS building atrium, walk from the second floor of ITE across to the ECS building and you will enter the atrium. Please come out on the day before classes and enjoy some food while catching up.

Everyone is also encouraged to attend Convocation 2010, the formal opening of the academic year at UMBC, from 3:30 to 4:30 pm in the Retriever Activities Center. President Hrabowski will address the gathering and Wendy Salkind, Presidential Teaching Professor for 2010-13, will make brief remarks.

After Convocation, all faculty, staff and students are invited to yet another free Community Picnic on the UMBC Quad from 4:30 to 7:00pm. The rain location will be the Residence Life Dining Hall.

English, UMBC, acm, csee


Graphical “more like this” Query Building



August 29th, 2010

I promised in an earlier blog post to talk about how to create queries over OWL in RDF.  So here it is.

As Ivan alluded in his comment, there are some syntax issues with talking about OWL restrictions in RDF.  What is he referring to?  Well, let's take the same example in the last blog post, a datatype restriction about things with age>=21.  We could write this in Manchester Syntax as 

hasAge only xsd:integer [>=21]

But the OWL/RDF rendition of this is where the 'arcane' syntax comes in.  We can see it just by looking at the source code in turtle, where it looks like this:

[] a owl:Restriction ;
   owl:allValuesFrom
       [ a rdfs:Datatype ;
         owl:onDatatype xsd:integer ;
         owl:withRestrictions
             ( [ xsd:minInclusive 21 ] )
       ] ;
   owl:onProperty :hasAge .

In the last blog entry, we saw a rule that would match this sort of definition, so that we could classify persons of appropriate ages as Adults.  That rule looked like this:

CONSTRUCT {
    ?x a ?restriction .
}
WHERE {
    ?datatype owl:onDatatype xsd:integer .
    ?datatype owl:withRestrictions ?var .
    ?datatype a rdfs:Datatype .
    ?restriction owl:allValuesFrom ?datatype .
    ?restriction a owl:Restriction .
    ?restriction owl:onProperty ?datatypeproperty .
    ?var rdf:first ?var1 .
    ?var1 xsd:minInclusive ?mval .
    ?x ?datatypeproperty ?val .
    FILTER (?val >= ?mval) .
}

How do you write a rule like that?  By looking up in the standard how to express datatype restrictions, and how to link those to restricted value sets, and . . . . if that seems labor intensive and error-prone to you, then you're right.  It is.

But we can use a power-tool to help make this happen. The power tools aren't included in the free version of TopBraid Composer, so if you want to follow along here, you'll need the Maestro Edition; a 30-day trial is available for free.

Start by loading http://workingontologist.org/Examples/adult.rdf into Composer, just as shown before, and open it. We're going to use the model itself as a prototype to create a query. Let's start by looking at an example of the restriction we want to match - look at the definition of Adult in the model:

[Figure: the definition of Adult in Manchester Syntax]

You can type it in just like that.  But that doesn't help us write a SPARQL query to match any restriction of this form.  How can we do that?   If you click on "Graph" at the bottom of the pane, you can explore this definition, in RDF.  If you drill down to the Datatype Restriction itself, you get a view like the top of this figure:

[Figure: graph view of the datatype restriction (top) and the generated query (bottom)]

This is just a graphic representation of triples in the model - you can see all the structure of the RDF representation of the restriction. 

Now comes the fun part - let's turn this image into a query (which, to avoid suspense, is already shown at the bottom of the figure).  We want a query that will match "things like this" restriction.  What does "like this" mean?  That's what we have to specify - there are some aspects of this example that should be included in the match (like the fact that it is an owl:Restriction, on an rdfs:Datatype over xsd:integer, and that it is an xsd:minInclusive restriction), and others that should not be included in the match (that the property is :hasAge; after all, we want this to match restrictions on any property).  So, we select the things that we want to keep in the query, marked with a small "x" (you can set/reset the "x" by clicking on the small box in each node in the graph).

Once you have selected the aspects that specify what you mean by "like this" (a Datatype Restriction, on some property, with minInclusive over xsd:integers), you can generate the query automatically by clicking the Star button.  You can see the generated query at the bottom of the figure.

All the generator did was to take the triples shown in the figure, and render them in the query.  Selected nodes (with "x") appear in the query as themselves; unselected nodes (no "x") become variables.  Properties always show up as themselves.   Best guesses are made for meaningful variable names; it uses type information for the guesses.  
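The selected/unselected rule is simple enough to sketch outside the tool. The following is not TopBraid Composer's implementation, just a minimal imitation under stated assumptions: triples are plain tuples, the blank-node labels are hypothetical, and variables are simply numbered rather than named from type information as the real generator does.

```python
def generate_where(triples, selected):
    """Render triples as a WHERE clause.

    Selected nodes appear as themselves, unselected nodes become
    variables, and properties are always kept as they are.
    """
    variables = {}

    def render(node):
        if node in selected:
            return node
        if node not in variables:
            variables[node] = "?var%d" % (len(variables) + 1)
        return variables[node]

    lines = ["    %s %s %s ." % (render(s), p, render(o)) for s, p, o in triples]
    return "WHERE {\n" + "\n".join(lines) + "\n}"


# A cut-down version of the example restriction (blank node labels are made up).
TRIPLES = [
    ("_:r", "a", "owl:Restriction"),
    ("_:r", "owl:onProperty", ":hasAge"),
    ("_:r", "owl:allValuesFrom", "_:d"),
    ("_:d", "owl:onDatatype", "xsd:integer"),
]

# The nodes marked with an "x"; :hasAge is deliberately left unselected.
query = generate_where(TRIPLES, selected={"owl:Restriction", "xsd:integer"})
```

Because :hasAge is not selected, it is replaced by a variable, and the same blank node always maps to the same variable, which is what keeps the pattern connected.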

There are a few differences between the generated query and the WHERE clause of the rule:

WHERE {
    ?datatype owl:onDatatype xsd:integer .
    ?datatype owl:withRestrictions ?var .
    ?datatype a rdfs:Datatype .
    ?restriction owl:allValuesFrom ?datatype .
    ?restriction a owl:Restriction .
    ?restriction owl:onProperty ?datatypeproperty .
    ?var rdf:first ?var1 .
    ?var1 xsd:minInclusive ?mval .
    ?x ?datatypeproperty ?val .
    FILTER (?val >= ?mval) .
}

The first difference is ordering of triples - the generator isn't very fussy about the order in which triples are generated, so it is different each time (if you are following along at home, your generated query will probably be different from the one shown here, and also from the rule).  

The second difference is the inclusion of a triple to match data, to wit:

 ?x ?datatypeproperty ?val .

After all, in a rule, we want to say "when some data satisfies this restriction, ..." This clause uses the same variable for the property (?datatypeproperty) as used in the rest of the query. 

The final difference has to do with the constant "21".  The generated query includes the constant, whereas the rule turns it into a variable (?mval) and adds a filter to compare it to the actual data (?val).  After all, the value "21" comes from the model, and shouldn't be built in to the rule. 

So yes, these modifications have to be made by hand (using the SPARQL editor, where the generator put the query).  The query generator should be seen as a power tool; you still need an operator who knows how to use it, but it simplifies a lot of the heavy lifting for query writing.  In this case, we have a rule with 10 clauses (9 triples and a filter).  The generator created seven of the triples, and most of the eighth one; the human only had to write the last two clauses.  That is, the power tool took care of the "arcane syntax" that Ivan referred to, leaving the human to figure out what they really want the rule to mean.

I use this feature of TopBraid Composer all the time, in this pattern.  I want to write a query that matches some 'arcane' bit of RDF (e.g., from dbpedia, the OWL in RDF standard, the XML DOM, SKOS, etc.). Instead of trying to write a query from scratch, I find (or even build) an example of the thing I want to match.  Then I generate the query - automatically guaranteeing that I didn't leave out any triples, that I got all the namespaces and property names correct, that I didn't accidentally collide bnodes by giving them the same variable name, etc.  Then I beat up the result to create the query that I really want - in which I define what I want to do with the match. 

So when you see an elaborate query with dozens of triples in it, and you wonder what sort of geek can write or maintain such a thing, keep in mind that it might not have been written at all; it might have been generated from an example.

Uncategorized


UMBC launches new cybersecurity graduate programs



August 27th, 2010

UMBC has established two new graduate programs in cybersecurity education, one leading to a Master’s in Professional Studies (MPS) degree in cybersecurity and another to a graduate certificate in cybersecurity strategy and policy. Both are designed for students and working professionals who aspire to make a difference in the security, stability, and functional agility of the national and global information infrastructure. The programs will begin in January 2011.

English, MS, Security, UMBC, certificate, cybersecurity, graduate


Leaving Yahoo – Joining Digg



August 26th, 2010

I’m heading to a new adventure at Digg in San Francisco to be a lead software engineer working on APIs and syndication.

I’ve been at Yahoo! nearly 5 years so it is both a happy and sad time for me, and I wish all the excellent people I worked with the best of luck in future.

Here is a summary of the main changes:

  • Silicon Valley -> San Francisco
  • 15,000 staff -> 100 staff
  • Architect -> Software engineer
  • strategizing, meeting -> coding
  • Powerpoint, OmniGraffle, twiki -> emacs, eclipse, …?
  • (No coding!) -> Python, Java, Hadoop, Cassandra, …?
  • Sunny days -> Foggy days
  • 15 min commute -> 2.5hr commute (until I move to SF)
  • Public company -> private company

Exciting!

English, comment


Extending OWL RL



August 25th, 2010
I've always been a fan of describing OWL in terms of rules. When introducing someone to a new technology, it is nice to be able to describe it simply (a lesson that facebook taught us again recently). And while it is a bit of a white lie to say that OWL is defined just by a set of rules, it makes it very easy to explain what something in OWL (or RDFS) means, by stating a rule that it follows.

I've actually been using a rule-based definition of OWL for years now, starting back at Intellidimension years ago, and then using OWLIM, and nowadays SPIN. All of these technologies have been 'approximating' OWL for years using variations of Datalog technology - implementing OWL as a set of rules.

While OWL 2's creation of three profiles and a subset hardly counts as keeping the standard simple, I have to say I appreciate the legitimacy that the OWL 2 RL profile has given to a practice that many of us (more than just the ones I have listed) have been doing for years now - of using rule-based systems to process OWL. And the RIF folks have even done us the favor of writing out just what rules OWL 2 RL is made of.

One of the things I have always liked about this approach is the flexibility it gives the system builder in trading off performance vs. expressiveness in the modeling language. You don't need someValuesFrom restrictions? Fine - take those rules out, and speed up the system. I've taken systems from intractable 20-minute response times down to almost instant by fine-tuning the rule system, while still maintaining the same semantics - because my model didn't use the discarded rules.

But today I want to talk about another advantage of this approach - that you can extend your model semantics as well. Suppose there is something in OWL-Full that you want to use, but it doesn't appear in the OWL 2 RL list of rules? What can you do about it? You could switch approaches, and use another style of reasoner, but then you lose the advantage of being able to tune your rule base. Another approach is to encode just the extensions that you want in rules.

Let's take a simple example of this, using SPIN as our rule language. You can follow along yourself if you like - all you need is the Free Edition of TopBraid Composer.

OWL-Full allows something called Data Range Expressions, in which you can define a range to be a set of values. A simple example of this is the notion of Adult, that is, a person who has an age greater than or equal to 21. An example of a model with this definition can be found at http://www.workingontologist.org/Examples/adult.rdf.

You can import this file into TopBraid Composer by right-clicking on the TopBraid project, selecting "Import RDF or OWL File from the Web" and pasting in the URL of the model, http://www.workingontologist.org/Examples/adult.rdf (see first figure). 

[Figure: importing adult.rdf from the Web]

Open the file adult.rdf by double-clicking, then expand owl:Thing to see the ontology. Click on "Adult" to see its definition - a Person who hasAge only from values greater than or equal to 21 (see second figure).

[Figure: the definition of Adult]

Notice that there are also three instances of the class Person - with ages 23, 18 and 45. Evidently, two of these are adults, and one is not.

[Figure: the three instances of Person]

Now we run SPIN inferences (by pressing the Inference button), and we see that indeed just the people of appropriate age are classified as Adults.

[Figure: the inferred Adults]

How did this work?

SPIN works by expressing the rules for OWL in SPARQL. Thanks to the RIF effort, mentioned above, we at TopQuadrant were able to write out all the OWL 2 RL rules in SPIN (since SPARQL has the same expressive power as RIF). This example simply imported these rules from http://topbraid.org/spin/owlrl-all. The SPIN inferencer finds these rules, and executes them when you press the Inference button. We can see one of these rules in the following figure - it is a familiar rule, telling us how rdfs:subPropertyOf works.

[Figure: the rdfs:subPropertyOf rule]

But that doesn't explain the whole thing - if you know OWL 2 RL well, you know that DataRange Expressions are not part of the OWL 2 RL profile. There are good technical reasons why they were left out, but that doesn't keep us from wanting to do these inferences. So we express them in SPARQL and add them to our rule set for the SPIN inferencer to work on. One such rule is shown in the next figure.

[Figure: the MinInclusive rule]

Most of the rule matches the RDF rendition of the OWL data restriction. It matches restrictions of xsd:integer, where all the values come from the set defined by minInclusive for some value (in our case, 21). When all these things match, then we assert that the instance is a member of the restriction.

So in the case of :Person_1 who is 23 years old, the property :hasAge matches the variable ?datatypeproperty, and 21 matches the variable ?mval, while the actual age 23 matches the variable ?val. Since 23 >= 21, ?val >= ?mval, and the rule matches. Hence, :Person_1 is a member of the restriction, and by the rest of the rules from OWL-RL, is an :Adult.
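The matching step just described can be mimicked in plain Python to see which of the three people end up in the restriction. This is only a sketch of the rule's logic, not of how SPIN executes SPARQL; the names :Person_2 and :Person_3, the restriction identifier and the dictionary layout are assumptions made for illustration (the post only names :Person_1 and gives the ages 23, 18 and 45).

```python
# Hypothetical stand-in for the restriction found in the model.
RESTRICTIONS = [
    {"id": "_:adultRestriction", "on_property": ":hasAge", "min_inclusive": 21},
]

# Data triples: (subject, property, value).
DATA = [
    (":Person_1", ":hasAge", 23),
    (":Person_2", ":hasAge", 18),
    (":Person_3", ":hasAge", 45),
]


def apply_rule(restrictions, data):
    """Infer (?x, a, ?restriction) whenever ?val >= ?mval on the same property."""
    inferred = set()
    for restriction in restrictions:
        for subject, prop, value in data:
            same_property = prop == restriction["on_property"]
            if same_property and value >= restriction["min_inclusive"]:
                inferred.add((subject, "a", restriction["id"]))
    return inferred


inferred = apply_rule(RESTRICTIONS, DATA)
```

The comparison is inclusive, so someone aged exactly 21 would also be classified as a member of the restriction.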

This approach to OWL gives a lot of control to the modeler; they can use standard models (like the OWL 2 RL model we used here), but they can also augment this reasoning with new rules that do just as much inferencing as is needed for the application. These new rules can be consistent with the standard OWL-Full rules, or they could even be domain-specific business rules. In any case, the power lies in the hands of the modeler. In the particular case of SPIN, we have the added advantage that the modeler can write these rules in the standard SPARQL language.

Uncategorized


Gridworks Reconciliation API Implementation



August 25th, 2010

Gridworks is a really fantastic tool and there’s scope to extend it in all kinds of interesting ways. Jeni Tennison has recently published a great blog post describing how to use Gridworks for generating Linked Data. I strongly encourage you to read her posting as it not only provides a good introduction to Gridworks itself, but also shows a nice real world example of generating RDF using its built-in data cleaning and templating tools.

I was lucky enough to meet David Huynh at a workshop recently and chatted to him briefly about another aspect of Gridworks: its ability to match field values in a dataset to entities in Freebase, e.g. identifying a place based on just its name. Within Gridworks this process is known as “reconciliation”.

Reconciliation is an important step for generating good Linked Data as you’ll often need to correlate values in a dataset with URIs in existing datasets in order to generate links. E.g. matching company names to their URIs. While it is possible to generate identifiers algorithmically during a conversion this typically just defers the reconciliation work until a later stage, when you carry out cross-linking to introduce equivalence links.

Recognising that the ability to introduce new reconciliation services would be a powerful extension to Gridworks, David Huynh has been creating a draft specification that will allow third-parties to create and deploy their own reconciliation services. He’s been documenting his progress on implementing the client side of this protocol and has published a testing service.

It occurred to me that the reconciliation API is essentially a structured search over a dataset and thus could be implemented against the search interface exposed by Talis Platform stores. The RSS 1.0 feeds that the Platform returns include enough information to rank and filter results as required by the API.

I’ve created a simple Ruby application, using the Sinatra web framework, that implements the reconciliation API for any Talis Platform store. You can find the code on github if you want to have a play with it. As I note in the README there are some areas where customisation is useful to get the most from the service. So while in principle it can be used against any existing Platform store you can create a simple JSON config to tweak it for particular datasets.

There’s a live version of the code running on my server here: http://ldodds.com/gridworks/.

That page has a simple API console for carrying out queries, but consult the draft specification for more details. I think I’ve covered all of the basic features (but bug reports welcome!). Consult the README for notes on configuration options and implementation decisions.

As a simple illustration, let’s say that I have the value “Bath” in a dataset and want to match that to some area in the UK administrative geography. This information is available from the Linked Data exposed by statistics.data.gov.uk and this happens to be hosted in this platform store. The reconciliation API we need can therefore be found at: http://ldodds.com/gridworks/govuk-statistics/reconcile. An HTTP GET on that location retrieves the service metadata.

If we use the API explorer we can use a simple HTML form to try out examples. Select govuk-statistics from the Store drop-down and then type Bath into the search box. You’ll get this result. This is not very readable by default, so if you’re using Firefox I recommend you install the JSONView extension which provides a nicely formatted display.

Our initial search returns a number of results. The highest ranked of these is the Westminster Constituency for Bath. That seems like a pretty good initial result to me. As it is the most relevant result in the search it’s marked as an exact match, so once integrated with Gridworks it will capture and store the reconciled identifier for you.

However, we may know that, in the imaginary dataset we’re working with, a particular field doesn’t contain names of constituencies. It may instead refer to a Local Education Authority. We can refine our search by adding the URI that defines that type of resource into the type field in the API explorer.

Try pasting http://statistics.data.gov.uk/def/geography/LocalEducationAuthority into the type field and running the search again. You’ll find that this time you get a single result, which is Bath and North East Somerset. Job done.
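For anyone who would rather script this than use the API explorer, here is a rough sketch of building the two request URLs used above. It assumes that the service accepts a single `query` parameter whose value is a JSON object with `query` and optional `type` keys; that is my reading of the draft specification and should be checked against it. The helper name is made up, and the code only constructs the URL, it does not call the service.

```python
import json
from urllib.parse import urlencode

SERVICE = "http://ldodds.com/gridworks/govuk-statistics/reconcile"
LEA_TYPE = "http://statistics.data.gov.uk/def/geography/LocalEducationAuthority"


def reconcile_url(service, name, type_uri=None):
    """Build a reconciliation request URL (parameter layout is an assumption)."""
    query = {"query": name}
    if type_uri is not None:
        # Restrict matches to resources of the given type.
        query["type"] = type_uri
    return service + "?" + urlencode({"query": json.dumps(query)})


unfiltered = reconcile_url(SERVICE, "Bath")
filtered = reconcile_url(SERVICE, "Bath", type_uri=LEA_TYPE)
```

Fetching either URL with any HTTP client should return the JSON that the explorer displays, provided the assumption about the parameter layout holds.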

Of course, to get the most from this you need to know what URIs you can use for filtering by types (and properties). But this is something that the Gridworks UI will help with. It can integrate with “suggestion services” that can be used to help map values to properties and types within a schema. I’ll be looking at how to expose those as my next piece of work.

Hopefully you can see how the overall system works. Feel free to have a play with the API to try it out for yourself. If you have comments on the implementation then I’d love to hear them, but I’d suggest that comments on the specification are best addressed to the gridworks mailing list.

I also suspect the Reconciliation API has uses outside of just Gridworks. For example, I wonder how easy it would be to introduce reconciliation into Google Spreadsheets using Google Apps Script? It’s also another nice demonstration of how easy it is to map simple RESTful APIs onto RDF datasets: this implementation works for any data in the Platform, no matter what schema it conforms to. Neat.

#linkeddata, English, RDF, Semantic Web, Talis, gridworks


Video: Change Happens 2010



August 25th, 2010

Although I will be a bit out of touch until next Thursday, “Edupunking” in good company around Andalucía as I announced (the link goes to the Tumblr for the meeting that Alejandro added, which relays some of the issues coming up in the course), I still owed you a video that I think you will find very interesting, both for its subject matter and for its novelty.

It complements the one we saw a while ago: it is a second edition, in my opinion better than the first, of El futuro de Internet by Simón Hergueta, and it could complement one of the most visited posts on El caparazón, the 5 videos for understanding Web 3.0.

Back soon… with news from planet Android :)

Asimov used to say that the only constant is change….


2010, Science, Educational planet, Spanish, cyberculture, cloud computing-web 4.0, futurism, innovation, collective intelligence, video documentaries, video creation, web3.0, zeitgeist evolution


Yahoo! using Bing search engine in US and Canada



August 24th, 2010

Microsoft’s Bing team announced on their blog that the Bing search engine is “powering Yahoo!’s search results” in the US and Canada for English queries. Yahoo also has a post on their Yahoo! Search Blog.

The San Jose Mercury News reports:

“Tuesday, nearly 13 months after Yahoo and Microsoft announced plans to collaborate on Internet search in hopes of challenging Google’s market dominance, the two companies announced that the results of all Yahoo English language searches made in the United States and Canada are coming from Microsoft’s Bing search engine. The two companies are still racing to complete the transition of paid search, the text advertising links that run beside and above the standard search results, before the make-or-break holiday period — a much more difficult task.”

Combining the traffic from Microsoft and Yahoo will give Bing a more significant share of the Web search market. That should help both companies by providing a larger stream of search-related data that can be exploited to improve search relevance, ad placement and trend spotting. It will also help to foster competition with Google focused on developing better search technology.

Hopefully, Bing will be able to benefit from the good work done at Yahoo! on adding more semantics to Web search.

Bing, English, Google, Microsoft, Semantic Web, Yahoo, search, social media


Middle-earth dictionary attack



August 24th, 2010

[Comic: Middle-earth dictionary attack]
From http://abstrusegoose.com/296

English, Humor, Security


Facebook OGP for SVSW



August 24th, 2010

I went to the Silicon Valley Semantic Web Meetup last night about Facebook's Open Graph Protocol. The presentation by Austin Haugen and Paul Tarjan was short and sweet and gave the best overview of OGP that I have seen. It included a live demo of using OGP to link a page into facebook - it only took a couple minutes, but made it very clear what was and wasn't going on.

On the one hand, I see why Jim Hendler has said that the more he sees of OGP, the better he likes it; these guys really 'get' the Semantic Web, they understand what it means to link a page in a web of data rather than just point to it, and they can demonstrate it very cleanly inside the facebook infrastructure. 

But on the other hand, when I asked the speakers how I could query the Open Graph, their answer was, for us linked data fans, a bit disappointing (though I applaud the speakers for their honesty).  Not only is there no way to query the graph, there won't be one any time soon.  One speaker went so far as to say, "we're sort of faking the semantic web here; there's no Virtuoso behind this." 

My hopes of using Facebook as a sort of clearinghouse of interesting RDF data for classes, demos, etc. were dashed right away.  

My question attracted a number of discussions afterward, many of which had a cynical edge; one fellow said to me that it is clear that facebook doesn't really understand or care about the Semantic Web; they just want a way to drive more traffic to their site, and that the techies just made it look like Semantic Web to jump on the buzzword bandwagon.  I guess it's nice that we're a buzzword bandwagon now, that someone like facebook wants to be part of.  The discussion also wandered to speculations about Google's intentions with our darling Metaweb, and what plans the not-evil giant has for her.

Be this as it may (and of course it is true to some extent; after all, facebook lives in a capitalist economy, so making money has to be a big part of what drives their decisions), that didn't stop me from putting a facebook "like" button on workingontologist.org .

On the other hand (how many hands do I have now?), one can see a motivation for facebook to include a query interface to the Open Graph Protocol - after all, they do want to encourage a cottage industry of app builders to add functionality to their site. And we know how successful RDF has been in doing that - just look at SearchMonkey...., oh wait.  Maybe not.

One of the things I found most informative about the talk came in the discussion in response to various questions about design decisions.  From the point of view of metatags, the Open Graph Protocol is really simple; just a handful of required tags with a simplified syntax (simpler even than standard RDFa).  Even so, facebook user studies showed that this was almost too complicated.  Even very small complications - additional namespaces, some slightly twisty syntax from RDFa - were found to have a severe damping effect on technology adoption.  It seems that even the levels of simplicity we argue for in our Semantic Universe blog entry on technology adoption are not enough; for some audiences, simple really has to be simple.  This is a tough pill for any technologist to swallow; looking at OGP makes it look as if the baby has been thrown out with the bathwater.  But there are now hundreds of millions of new 'like' buttons around the web; simplicity pays off.  As another commenter pointed out, regardless of the purity (or lack thereof) of the facebook approach, OGP has still made the biggest splash in terms of bringing semantic web to the attention of the public at large.  So who's the bandwagon, and who's riding?

Web/Tech


Researchers install PAC-MAN on Sequoia voting machine w/o breaking seals



August 23rd, 2010

Here’s a new one for the DIY movement.

Security researchers J. Alex Halderman and Ariel Feldman demonstrated PAC-MAN running on a Sequoia voting machine last week at the EVT/WOTE Workshop held at the USENIX Security conference in DC.

Amazingly, they were able to install the game on a Sequoia AVC Edge touch-screen DRE (direct-recording electronic) voting machine without breaking the original tamper-evident seals.

Here’s how they describe what they did on Halderman’s web site:

“What is the Sequoia AVC Edge?

It’s a touch-screen DRE (direct-recording electronic) voting machine. Like all DREs, it stores votes in a computer memory. In 2008, the AVC Edge was used in 161 jurisdictions with almost 9 million registered voters, including large parts of Louisiana, Missouri, Nevada, and Virginia, according to Verified Voting.

What’s inside the AVC Edge?

It has a 486 SLE processor and 32 MB of RAM—similar specs to a 20-year-old PC. The election software is stored on an internal CompactFlash memory card. Modifying it is as simple as removing the card and inserting it into a PC.

Wouldn’t seals expose any tampering?

We received the machine with the original tamper-evident seals intact. The software can be replaced without breaking any of these seals, simply by removing screws and opening the case.

How did you reprogram the machine?

The original election software used the psOS+ embedded operating system. We reformatted the memory card to boot DOS instead. (Update: Yes, it can also run Linux.) Challenges included remembering how to write a config.sys file and getting software to run without logical block addressing or a math coprocessor. The entire process took three afternoons.”

You can find out more from the presentation slides from the EVT workshop, Practical AVC-Edge CompactFlash Modifications can Amuse Nerds. They sum up their study with the following conclusion.

“In conclusion, we feel our work represents the future of DREs. Now that we know how bad their security is, thousands of DREs will be decommissioned and sold by states over the next several years. Filling our landfills with these machines would be a terrible waste. Fortunately, they can be recycled as arcade machines, providing countless hours of amusement in the basements of the nations’ nerds.”

English, Games, Security, Technology Impact, electronic voting, social media, voting


Rasqal RDF Query Library 0.9.20



August 22nd, 2010

I just released a new version of my Rasqal RDF Query Library with two main new features:

  1. Support more of the new W3C SPARQL working drafts of 1 June 2010 for SPARQL 1.1 Query and SPARQL 1.1 Update.
  2. Support building with Raptor V2 API as well as Raptor V1 API.

The main change is the start of additions to Rasqal’s APIs and query engine for the new SPARQL 1.1 working drafts. This release adds support for the syntax of all the changes for Query and Update. The new draft syntax is available via the ‘laqrs’ query language name, until the SPARQL 1.1 syntax is finalized. The ‘sparql’ query language provides SPARQL 1.0 support.

On Query 1.1, the addition is primarily syntax and API support for the new syntax. There is expression execution for the new functions IF(), URI(), STRLANG(), STRDT(), BNODE(), IN() and NOT IN(), which are now usable as part of the normal expression grammar. The existing aggregate function support was extended to add the new SAMPLE() and GROUP_CONCAT() but remains syntax-only. Finally, the new GROUP BY with HAVING conditions was added to the syntax, with consequent API updates, but there is no query engine execution of them.

For Update 1.1, the syntax for the full set of update operations was added, and the operations create API structures. Note, however, that there seem to be some ambiguities in the draft syntax, especially around multiple optional tokens in a row near WITH, which are particularly hard to implement in flex and bison (aka “lex and yacc”).

The main non-SPARQL 1.1 related change is to allow building Rasqal with Raptor V2 APIs rather than V1. Raptor V2 is in beta, so this is not a final API and is thus not the default build; it has to be enabled by passing --with-raptor2 to configure. When Raptor V2 is stable (2.0.0), Rasqal will require it.

The changes to Rasqal in this release, in summary, are:

  • Updated to handle more of the new syntax defined by the SPARQL 1.1 Query and SPARQL 1.1 Update W3C working drafts of 1 June 2010
  • Added execution support for new SPARQL 1.1 query built-in expressions IF(), URI(), STRLANG(), STRDT(), BNODE(), IN() and NOT IN().
  • Added an ‘html’ query result table format from patch by Nicholas J Humfrey
  • Added API support for group by HAVING expressions.
  • Added XSD Date comparison support.
  • Support building with Raptor V2 API if configured with --with-raptor2.
  • Many other bug fixes and improvements were made.
  • Fixed Issues: #0000352, #0000353, #0000354, #0000360, #0000374, #0000377 and #0000378

See the Rasqal 0.9.20 Release Notes for the full details of the changes.

Get it at http://download.librdf.org/source/rasqal-0.9.20.tar.gz.

PS The source code control has also moved to GIT and is hosted at GitHub.

English, comment


Google unemployment index estimates and predicts unemployment



August 20th, 2010

The Google Unemployment Index is an economic indicator based on queries sent to Google’s search engine related to unemployment, social security, welfare, and unemployment benefits. Since some of these search terms are probably leading indicators, it can also be used to predict upcoming changes in the actual unemployment rate.


The index is based on queries tracked via Google Insights for Search that are tuned to different countries. You can also focus on particular regions or metropolitan areas and compare the index across several locations. Here’s an example comparing Florida (blue) and Maryland (red).

English, Google, social media


Smart Grid: the collision of energy and information



August 19th, 2010

The Maryland Clean Energy Technology Incubator (CETI) at bwtech@UMBC will host a seminar series this Fall with a focus on the Smart Grid. The series will discuss the issues and opportunities and speculate on expected business opportunities in this major restructuring of the electric grid. Huge investments (tens of billions of dollars) are committed to the Smart Grid for the coming decade.

About six seminars are planned for Fall 2010 to be held (mostly) on Wednesdays from 4-6pm and UMBC faculty, staff and students are encouraged to participate. They will include a ~45 minute presentation followed by a lively discussion and opportunity to socialize and enjoy light refreshments.

The first speaker, Peter Kelly-Detwiler, leads a group at Constellation Energy that is developing new methods for data analysis and presentation. He is an “entrepreneur” within Constellation with 20 years of experience in the energy field and he has a perspective on the Smart Grid like few others.

A smart grid perspective: finding value in
the collision of energy and information

Peter Kelly-Detwiler, Constellation Energy

4-6pm Wednesday, 8 September 2010
2nd floor Courtyard Conference Room
UMBC Tech Center

Many people have heard of the term “smart grid” and there are many varying interpretations of what it means. But everybody can agree on three things:

  • It involves increased and timely access to information
  • There’s money in it
  • It will create new and unforeseen technologies and entrepreneurial opportunities

The discussion will center around why smart grid is needed, how an energy provider views the challenges and opportunities, the forces we see gathering on the horizon, and how Constellation Energy is responding. Issues related to power grid economics, volatility, risk management, and customer elasticities and perspectives will be addressed.

Peter Kelly-Detwiler is Senior Vice President of Energy Technology Services for Constellation NewEnergy, Inc., a subsidiary of Constellation Energy Group. He and his company-wide team oversee the integration of efficiency technologies and applications that help customers better manage their total energy bills and create optimal energy solutions. Peter has 20 years of experience in the energy industry. His accomplishments include managing the development of energy efficiency projects and reviewing economic impact of energy products.

Please RSVP to Bjorn Frogner (bjorn.frogner@umbc.edu), the CETI Entrepreneur in Residence, if you plan to attend.

English, Machine Learning, UMBC, electrical power, smart grid


Probability-based processor might speed AI applications



August 18th, 2010

Analog computers were a hot idea — in the 1950s! But I find this intriguing because I’ve come around to the position that a lot of our human “intelligence” is the result of acquiring and using probabilistic models. So supporting this in hardware might be a big win, especially for low-cost, low-power devices. It will also support lots of other common tasks in social computing, image processing and language technology.

Technology Review has a short article, A New Kind of Microchip, on a computer chip being developed by Lyric Semiconductor that processes signals representing probabilities rather than digital bits.

“A computer chip that performs calculations using probabilities, instead of binary logic, could accelerate everything from online banking systems to the flash memory in smart phones and other gadgets. … And because that kind of math is at the core of many products, there are many potential applications. “To take one example, Amazon’s recommendations to you are based on probability,” says Vigoda. “Any time you buy [from] them, the fraud check on your credit card is also probability [based], and when they e-mail your confirmation, it passes through a spam filter that also uses probability.”

All those examples involve comparing different data to find the most likely fit. Implementing the math needed to do this is simpler with a chip that works with probabilities, says Vigoda, allowing smaller chips to do the same job at a faster rate. A processor that dramatically speeds up such probability-based calculations could find all kinds of uses.”

Lyric’s chip is called LEC and was developed with support from DARPA. According to Wired, it is 30 times smaller than current digital error correction technology. Although small, it yields “a Pentium’s worth of computation,” according to Lyric CEO Vigoda. His 2003 dissertation at MIT was on a related topic, Analog Logic: Continuous-Time Analog Circuits for Statistical Signal Processing.

You can also read about the LEC chip in a story in yesterday’s NYT, A Chip That Digests Data and Calculates the Odds.

English, General, Semantic Web, social media


UMBC ranked #4 in IT degrees among US research universities



August 18th, 2010

For the past twenty years, UMBC has had a large number of students majoring in information technology. Our Computer Science and Information Systems programs are among the largest on campus and newer ones like Computer Engineering and Bioinformatics are growing.

Last week I had a chance to look at the latest information from the Department of Education’s National Center for Education Statistics, which is available from NSF’s WebCASPAR site. Data from the IPEDS Completions Survey shows that UMBC is fourth among U.S. research universities in the production of IT degrees and certificates.

In this analysis, I averaged the numbers from the two most recent years available — 2007 and 2008. Here are the top ten in terms of total production in the Carnegie classification categories RU/VH and RU/H.

Average yearly production in 2007 and 2008

TOTAL  INSTITUTION                        BS/A   MS  PHD  OTHER
  552  Penn State                          480   20   14     39
  520  University of Southern California    65  414   41      0
  513  CMU                                 124  331   58      0
  503  UMBC                                327  112   14     50
  493  Johns Hopkins University             44  426   14     10
  461  New Jersey Institute Technology     165  279   11      7
  377  Georgia Tech                        176  172   30      0
  331  Drexel                              253   72    1      5
  329  MIT                                 160  129   21     20
  324  University of California-Irvine     226   58   40      0

In this group, UMBC also ranks #2, #21 and #31 for undergraduate, MS and PhD degree production, respectively. Here’s a graph of the top 50 — click through for a larger version.


Top 50 producers of IT degrees among US research universities

Looking at all schools shows the University of Phoenix generates the most IT grads, with an average of 3318 students over 2007 and 2008! Here are the top 15 schools of any type.

Average yearly production in 2007 and 2008

TOTAL  INSTITUTION
 3318  University of Phoenix
 1162  Community College of the Air Force
 1087  University of Maryland University College
  931  Strayer College
  911  ECPI College of Technology
  711  De Paul University
  552  Penn State
  528  Rochester Institute of Technology
  520  University of Southern California
  514  DeVry Institute of Tech
  513  CMU
  503  UMBC
  493  Johns Hopkins University
  461  New Jersey Institute Technology
  430  Baker College of Flint

CS, English, UMBC


Public W3C Questionnaire on RDF Evolution



August 17th, 2010

As has been reported earlier, W3C held an "RDF Next Steps" workshop in June 2010 and published the Report of the Workshop in early July. That workshop discussed the possibility of an RDF Working Group. The overall goal would be to extend RDF to include some of the features that the community has identified as both desirable and important for interoperability based on experience with the 2004 version of the standard, but without having a negative effect on existing deployment efforts.

The Workshop listed a number of work items that might be of interest for such a Working Group, and also conducted an informal poll on the relative priority of those items (with links to the detailed description of the items themselves). As a next step, a public questionnaire has been created listing, essentially, those items (although some of them have been regrouped for better readability). The goal of the questionnaire is to poll the Web community at large so that the upcoming charter reflects the real needs for the years to come.

So… if you are interested in the evolution of RDF, here is the chance to make your opinion heard. All the results of the questionnaire will be public. The questionnaire will stay open until the 13th of September.

Activity news, English


Raptor RDF Syntax Library V2 beta 1



August 16th, 2010

Today I released the first beta version of Raptor 2. This is the culmination of about 9 months’ work refactoring the Raptor 1 codebase. In hindsight, I should have done this years ago, but I knew it would be a lot of work, and it was.

The reasoning behind doing this is multi-fold, but basically the code had a lot of cruft and bad design choices that couldn’t be removed without breaking the APIs in lots of ways, and at some point it’s easier to just do it all at once, and that’s where we are now.

Cruft meant removing stuff deprecated for a long time but also renaming all the functions to follow the same “objects in C” style used throughout Redland’s libraries which has standard naming forms:

  • raptor_class_method()
  • Constructors: raptor_new_class() (core constructor or 1 arg constructor) and raptor_new_class_from_extras()
  • Copy constructor: raptor_class_copy()
  • Destructors: raptor_free_class()

The major addition was a raptor_world object that is used as a single object to hold on to all shared resources and configuration. This was a design pattern I put in librdf and Rasqal but, for some reason, I never considered it for raptor. This turned out to be a mistake, since I then had to pass around a lot of parameters and configuration to individual object instances, more than was really needed. Examples of this include the error handling, which added two parameters to several constructors. The error handling, now expanded into a general log mechanism modeled on librdf’s, handles multiple structured log record types, and the logging policy is set once per world.

The addition of the world object meant that each constructor for an object in raptor now takes that object, so it can get access to the shared configuration and resources. That itself meant the change was extensive, broad in scope. The single place to manage resources means it’s easier to ensure proper cleanup and deal with library-wide issues.

One other pain point was Raptor’s simplistic (but functional!) URI class. It manipulated URIs as plain old C strings (char*). I knew from building librdf that this could be more efficient by interning the strings so a URI for a particular string is held only once, and reference counted. I used the already built raptor AVL-Tree to implement it, and as a bonus, moved that AVL Tree to the public API, so it can be reused (Rasqal has a copy of the implementation). The resulting reference-counted URIs mean that after URI construction, comparison and copying are very cheap – and given that this is RDF, those are done a lot. The old URI code also had a swappable implementation which added a lot of complexity to the code and that has gone now, since the new implementation is more sophisticated. There is probably more work that can be done here to make this URI work better, such as caching the URI structure so that it’s quicker to generate relative URIs. Also, one day I should really validate that all the URIs built are legal according to the URI syntax.
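To make the interning idea concrete, here is a minimal sketch in Python rather than C. It is not Raptor's actual API: the World and URI classes, their method names and the usage field are all hypothetical, and a plain dict stands in for the AVL tree mentioned above.

```python
class URI:
    """An interned URI: one shared instance per distinct string in a world."""

    def __init__(self, world, string):
        self.world = world
        self.string = string
        self.usage = 1  # reference count (hypothetical field name)


class World:
    """Hypothetical stand-in for a world object that owns shared resources."""

    def __init__(self):
        # A dict plays the role of the AVL tree of interned URIs.
        self._uris = {}

    def new_uri(self, string):
        """Return the single shared URI for this string, creating it if needed."""
        uri = self._uris.get(string)
        if uri is not None:
            uri.usage += 1  # "copying" is just a counter increment
            return uri
        uri = URI(self, string)
        self._uris[string] = uri
        return uri

    def free_uri(self, uri):
        """Drop one reference; forget the URI when nobody holds it any more."""
        uri.usage -= 1
        if uri.usage == 0:
            del self._uris[uri.string]

    def uri_count(self):
        return len(self._uris)


world = World()
a = world.new_uri("http://example.org/a")
b = world.new_uri("http://example.org/a")
# Both names refer to the same object, so comparison is an identity check.
print(a is b, a.usage)
```

Because every constructor goes through the world, there is exactly one place that knows about all live URIs, which is what makes cleanup and cheap comparison possible in this sketch.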

Another long term problem was the triple itself, which I had called ‘statement’ way back when I was creating it. Unfortunately a raptor_statement had hard-coded the RDF specifics – the subject can only be URI or blank node, predicate can only be a URI etc. That meant the code was twisty. That has been replaced by an array of 3 or 4 raptor terms (URI or blank node or literal) so it can handle triples, quads and any possible extension beyond RDF (2004), although today none of the current parsers or serializers expect non-RDF statements. That change also made a lot of the internal code simpler to understand and quicker. The RDF terms were also made reference counted and, along with adding reference counting to the statements, this meant that passing triples around, which used to involve a lot of copying, is now a simple integer increment of the reference count. More speed!

That sorted out the fundamentals of statements, terms and URIs and changed pretty much every piece of code that touched them in all the parsers and serializers and core code.

There were a few pieces of new work added – two new serializers and one new parser. Two of those were written by Nicholas J Humfrey who is now a core committer.

I’d also like to call out thanks to Lauri Aalto for keeping raptor, rasqal and librdf relatively buildable while I was refactoring and breaking things. He wrote the code to make Rasqal and librdf build and work with raptor V1 and V2 at the same time.

Other work included updating all the reference documentation, tutorials, examples and sundry documentation for the new APIs including admin code to automate some of the documentation so it always included accurate details about formats.

There is lots more that changed in detail, listed in the Raptor 1.9.0 Release Notes, help on upgrading and there’s even a perl script docs/upgrade-script.pl thrown in (generated by another perl script!) that may help with applying the changes. The reference manual contains a full reference on changes between raptor 1.4.21 and 1.9.0 in the form of old / new mappings with explanations.

I know that Raptor 2 is not going to replace Raptor 1 for applications for some time, so this is a separately installed library with a new location for the header file and a new shared library base. However, once this hits 2.0.0 it’ll be a dependency of Rasqal and librdf.

Summary of release:

  • Removed all deprecated functions and typedefs.
  • Renamed all functions to the standard raptor_class_method() form.
  • All constructors take a raptor_world argument.
  • URIs are interned and there is no longer a swappable implementation.
  • Statement is now an array of 3-4 RDF Terms to support triples and quads.
  • World object owns logging, blank node ID generation and describing syntaxes.
  • Features are now called options and have typed values.
  • GRDDL parser now saves and restores shared libxslt state.
  • Added serializers for HTML ‘html’ and N-Quads ‘nquads’.
  • Added parser ‘json’ for JSON-Resource centric and JSON-Triples.
  • Switched to GIT version control hosted by GitHub.
  • Added memory-based AVL-Tree to the public API.
  • Fixed reported issues:

    0000357, 0000361, 0000369, 0000370, 0000373 and 0000379

It turns out that after all that, the resulting libraries for raptor 2 are actually 4% smaller than raptor 1 when installed (Debian, i386):

 -rw-r--r-- 1 root root 379780 Mar 10 06:59 /usr/lib/libraptor.so
 -rw-r--r-- 1 root root 364448 Aug 16 17:30 /usr/lib/libraptor2.so
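The 4% figure can be checked directly from the two file sizes in the listing above. This is just the arithmetic, with variable names chosen for illustration:

```python
# Installed shared library sizes in bytes, taken from the listing above.
raptor1_bytes = 379780  # /usr/lib/libraptor.so
raptor2_bytes = 364448  # /usr/lib/libraptor2.so

# Relative size reduction of raptor 2 compared with raptor 1.
reduction = (raptor1_bytes - raptor2_bytes) / raptor1_bytes
print(f"{reduction:.1%}")  # prints 4.0%
```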

The gzipped tarball itself is as small as raptor 1.4.17 from 2008!

Get it at http://download.librdf.org/source/raptor2-1.9.0.tar.gz

PS The source code control has also moved to GIT and is hosted at GitHub.

English, comment, raptor


Pellet 2.2.1 Release



August 16th, 2010
We are happy to announce the release of Pellet 2.2.1 which is available for download at the usual location. This is a maintenance release that fixes several issues found in Pellet 2.2.0. The complete set of tickets closed for this release is listed at the Trac page. We’ve also updated the Pellet Reasoner Plug-in for [...]

English, Pellet, Pellet 2


What interests 250+ librarians at 8:30 on a Sunday morning



August 16th, 2010

Linked Data, that’s what!

I must admit I was a little skeptical of the timing when I accepted the invitation to provide the keynote for a Linked Data session – on the last day of IFLA 2010 – at 8:30 in the morning – in August – on a Sunday.  Who was going to want to get up at that time, on the day they were probably going to leave beautiful Gothenburg, to hear me witter on about the Semantic Web and the obvious benefits of Linked Data for libraries? A few minutes before the start, I was beginning to think my skepticism was well founded, viewing the acres of empty seats laid out in their menacing ranks in front of me. But then almost as if from nowhere, the room rapidly filled and by the time I took the stage we had something approaching a full house.  As you can see from my iPhone snap below, we ended up with a significant group (I lost count at about 250) of interested librarians.

250+ Librarians in Gothenburg

So was it worth them turning up at such an unsociable time?  I obviously can’t speak for my session, but I believe it was well worth turning up.  We had a series of talks that ranged from the in-depth technical/ontological end of the spectrum to the rousing plea to open up your data now – and don’t hamper it with too much licensing.

First on after my session was Gordon Dunsire from the University of Strathclyde who gave us some in-depth reasoning as to why we needed complex detailed ontologies based upon standards like RDA, FRBR, and FRAD to describe library resources in RDF for the Semantic Web.   To represent the full detail that catalogers have, and want to, provide for resource description I agree with him.  I also believe that we need to temper that detailed view by including more generic ontologies in addition. People from outside of the library world, dipping into library data [with more ways to describe a title than there are flavors of ice cream], will back off and not link to it unless they can find a nice friendly dc:title or foaf:name that they understand.

Some of the other speakers that I caught included Patrick Danowski, with an entertaining presentation entitled “Step 1: Blow up the silo!”. He took us through the possible licenses to use for sharing data, only to conclude that the best approach was totally open public domain.  He then went on to recommend CC0 and/or PDDL as the best way to indicate that your data is open for anyone to do anything with.

Jan Hanneman from the German National Library delivered an interesting description [pdf] of the way they have been publishing their authority data as Linked Data, and the challenges they met on the way.  These included legal and licensing issues, around what and under what terms they could publish.  Scalability of their service is another key issue once they move beyond authority data.

All in all it was an excellent Sunday morning in Gothenburg.  I presume the organizers of IFLA 2011 will take note of the interest and build a larger, more convenient, slot in the programme for Linked Data.

Note: My presentation slides can be viewed on Slideshare and downloaded in pdf form

#linkeddata, English, Libraries, Linked Data, Metadata, Open Data, RDF, Semantic Web, ifla2010


Usability determines password policy



August 16th, 2010

Some online sites let you use any old five-character string as your password for as long as you like. Others force you to pick a new password every six months and it has to match a complicated set of requirements — at least eight characters, mixed case, containing digits, letters, punctuation and at least one umlaut. Also, it better not contain any substrings that are legal Scrabble words or match any past password you’ve used since the Bush 41 administration.

A recent paper by two researchers from Microsoft concludes that an organization’s usability requirements are the main factor that determines the complexity of its password policy.

Dinei Florencio and Cormac Herley, Where Do Security Policies Come From?, Symposium on Usable Privacy and Security (SOUPS), 14–16 July 2010, Redmond.

We examine the password policies of 75 different websites. Our goal is understand the enormous diversity of requirements: some will accept simple six-character passwords, while others impose rules of great complexity on their users. We compare different features of the sites to find which characteristics are correlated with stronger policies. Our results are surprising: greater security demands do not appear to be a factor. The size of the site, the number of users, the value of the assets protected and the frequency of attacks show no correlation with strength. In fact we find the reverse: some of the largest, most attacked sites with greatest assets allow relatively weak passwords. Instead, we find that those sites that accept advertising, purchase sponsored links and where the user has a choice show strong inverse correlation with strength.

We conclude that the sites with the most restrictive password policies do not have greater security concerns, they are simply better insulated from the consequences of poor usability. Online retailers and sites that sell advertising must compete vigorously for users and traffic. In contrast to government and university sites, poor usability is a luxury they cannot afford. This in turn suggests that much of the extra strength demanded by the more restrictive policies is superfluous: it causes considerable inconvenience for negligible security improvement.

h/t Bruce Schneier

English, Policy, Security, privacy, social media


An ontology of social media data for better privacy policies



August 15th, 2010

Privacy continues to be an important topic surrounding social media systems. A big part of the problem is that virtually all of us have a difficult time thinking about what information about us is exposed and to whom and for how long. As UMBC colleague Zeynep Tufekci points out, our intuitions in such matters come from experiences in the physical world, a place whose physics differs considerably from the cyber world.

Bruce Schneier offered a taxonomy of social networking data in a short article in the July/August issue of IEEE Security & Privacy. A version of the article, A Taxonomy of Social Networking Data, is available on his site.

“Below is my taxonomy of social networking data, which I first presented at the Internet Governance Forum meeting last November, and again — revised — at an OECD workshop on the role of Internet intermediaries in June.

  • Service data is the data you give to a social networking site in order to use it. Such data might include your legal name, your age, and your credit-card number.
  • Disclosed data is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
  • Entrusted data is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data once you post it — another user does.
  • Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it’s basically the same stuff as disclosed data, but the difference is that you don’t have control over it, and you didn’t create it in the first place.
  • Behavioral data is data the site collects about your habits by recording what you do and who you do it with. It might include games you play, topics you write about, news articles you access (and what that says about your political leanings), and so on.
  • Derived data is data about you that is derived from all the other data. For example, if 80 percent of your friends self-identify as gay, you’re likely gay yourself.”

I think most of us understand the first two categories and can easily choose or specify a privacy policy to control access to information in them. The rest however, are more difficult to think about and can lead to a lot of confusion when people are setting up their privacy preferences.

As an example, I saw some nice work at the 2010 IEEE International Symposium on Policies for Distributed Systems and Networks on “Collaborative Privacy Policy Authoring in a Social Networking Context” by Ryan Wishart et al. from Imperial College that addressed the problem of incidental data in Facebook. For example, if I post a picture and tag others in it, each of the tagged people can contribute additional policy constraints that can narrow access to it.
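One simple way to picture constraints that can only narrow access is as set intersection. The sketch below is my own illustration, not the mechanism from the Wishart et al. paper; the function name, the user names and the idea of representing each policy as a set of allowed viewers are all assumptions.

```python
def effective_audience(owner_audience, tagged_constraints):
    """Intersect the poster's audience with each tagged person's allowed set.

    owner_audience: set of users the poster is willing to show the photo to.
    tagged_constraints: dict mapping each tagged person to the set of users
    that person allows. Constraints can remove viewers but never add any.
    """
    audience = set(owner_audience)
    for allowed in tagged_constraints.values():
        audience &= set(allowed)
    return audience


# Hypothetical example: the poster allows four people, two tagged friends
# each add their own restriction.
owner = {"alice", "bob", "carol", "dave"}
constraints = {
    "bob": {"alice", "bob", "carol"},
    "carol": {"bob", "carol", "dave"},
}
print(sorted(effective_audience(owner, constraints)))  # ['bob', 'carol']
```

Intersection guarantees that a tagged person can never widen the audience beyond what the poster chose, which matches the "narrow access" behaviour described above.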

Lorrie Cranor gave an invited talk at the workshop on Building a Better Privacy Policy and made the point that even P3P privacy policies are difficult for people to comprehend.

Having a simple ontology for social media data could help us move forward toward better privacy controls for online social media systems. I like Schneier’s broad categories and wonder what a more complete treatment defined using Semantic Web languages might be like.
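As a very rough starting point, short of a real ontology, the six categories can be written down as a small data structure. This is only a sketch: the names are mine, the "origin" column is my reading of Schneier's definitions (for derived data it is an outright assumption that the site does the deriving), and the control flags are filled in only where his text says or implies something.

```python
from enum import Enum


class Origin(Enum):
    YOU = "you"
    OTHER_USERS = "other users"
    SITE = "the site"


class DataCategory(Enum):
    SERVICE = "service"
    DISCLOSED = "disclosed"
    ENTRUSTED = "entrusted"
    INCIDENTAL = "incidental"
    BEHAVIORAL = "behavioral"
    DERIVED = "derived"


# Who produces each kind of data, read from the definitions quoted above.
ORIGIN = {
    DataCategory.SERVICE: Origin.YOU,
    DataCategory.DISCLOSED: Origin.YOU,
    DataCategory.ENTRUSTED: Origin.YOU,
    DataCategory.INCIDENTAL: Origin.OTHER_USERS,
    DataCategory.BEHAVIORAL: Origin.SITE,
    DataCategory.DERIVED: Origin.SITE,  # assumption: the text does not say who derives it
}

# Whether you keep control after posting; categories the text is silent on are omitted.
YOU_CONTROL = {
    DataCategory.DISCLOSED: True,    # implied by the contrast with entrusted data
    DataCategory.ENTRUSTED: False,
    DataCategory.INCIDENTAL: False,
}


def categories_from(origin):
    """List the categories whose data is produced by the given origin."""
    return [c for c in DataCategory if ORIGIN[c] is origin]


print([c.value for c in categories_from(Origin.YOU)])
```

A fuller treatment would presumably also model audience, retention and the relationships between categories, which is where Semantic Web languages would earn their keep.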

English, Policy, Security, Semantic Web, privacy, social media


Papers with more references are cited more often



August 14th, 2010

The number of citations a paper receives is generally thought to be a good and relatively objective measure of its significance and impact.

Researchers naturally are interested in knowing how to attract more citations to their papers. Publishing the results of good work helps of course, but everyone knows there are many other factors. Nature news reports on research by Gregory Webster that analyzed the 53,894 articles and review articles published in Science between 1901 and 2000.

The advice the study supports is “cite and you shall be cited”.

A long reference list at the end of a research paper may be the key to ensuring that it is well cited, according to an analysis of 100 years’ worth of papers published in the journal Science.
     The research suggests that scientists who reference the work of their peers are more likely to find their own work referenced in turn, and the effect is on the rise, with a single extra reference in an article now producing, on average, a whole additional citation for the referencing paper.
     “There is a ridiculously strong relationship between the number of citations a paper receives and its number of references,” Gregory Webster, the psychologist at the University of Florida in Gainesville who conducted the research, told Nature. “If you want to get more cited, the answer could be to cite more people.”

A plot of the number of references listed in each article against the number of citations it eventually received reveals that almost half of the variation in citation rates among the Science papers can be attributed to the number of references that they include. And, contrary to what people might predict, the relationship is not driven by review articles, which could be expected, on average, to be heavier on references and to garner more citations than standard papers.
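To make "almost half of the variation" concrete: if the comparison is read as a simple linear fit (my assumption; the report does not say how the figure was computed), the share of variation explained is the square of the Pearson correlation, so a share near 0.5 corresponds to a correlation of roughly 0.7. The sketch below computes both quantities in plain Python. The numbers in it are made-up toy values, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values, NOT taken from the Science analysis.
references = [5, 10, 15, 20, 25, 30]
citations = [4, 12, 9, 25, 18, 30]

r = pearson_r(references, citations)
r_squared = r ** 2  # share of variation explained by a straight-line fit
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")

# Correlation implied by "half of the variation explained".
implied_r = sqrt(0.5)
print(f"r implied by r^2 = 0.5: {implied_r:.2f}")
```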

English, Semantic Web, social media


Researchers prove Rubik’s Cube solvable in 20 moves or less



August 13th, 2010

Using a combination of mathematical tricks, good programming and 35 CPU-years on Google’s servers, a group of researchers has proved that every position of Rubik’s Cube can be solved in 20 moves or less. The group consists of Kent State mathematician Morley Davidson, Google engineer John Dethridge, math teacher Herbert Kociemba, and programmer Tomas Rokicki.

This is an amazing result and a testament to more than 30 years of work on the problem. The Cube was invented in 1974 and almost immediately became the subject of programs to solve it. In 1981, Morwen Thistlethwaite proved that any configuration could be solved in no more than 52 moves. Periodically, tighter upper bounds for the maximum solution length were found. This result ends the quest: there are some configurations (about 300M) that require 20 moves to solve and there are none that require more than 20 moves.

In their own words, here’s how the group solved all 43,252,003,274,489,856,000 Cube positions:

  • We partitioned the positions into 2,217,093,120 sets of 19,508,428,800 positions each.
  • We reduced the count of sets we needed to solve to 55,882,296 using symmetry and set covering.
  • We did not find optimal solutions to each position, but instead only solutions of length 20 or less.
  • We wrote a program that solved a single set in about 20 seconds.
  • We used about 35 CPU years to find solutions to all of the positions in each of the 55,882,296 sets.
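The figures in the list above hang together arithmetically, which is easy to check. The sketch below uses only the numbers quoted by the group; the variable names and the choice of a 365.25-day year are mine.

```python
# Figures quoted in the list above.
TOTAL_POSITIONS = 43_252_003_274_489_856_000
NUM_SETS = 2_217_093_120
POSITIONS_PER_SET = 19_508_428_800
SETS_TO_SOLVE = 55_882_296
SECONDS_PER_SET = 20  # "about 20 seconds", so the result is approximate

# The partition covers every position exactly once.
partition_is_exact = NUM_SETS * POSITIONS_PER_SET == TOTAL_POSITIONS
print("partition covers all positions:", partition_is_exact)

# Symmetry and set covering shrink the number of sets to solve.
reduction_factor = NUM_SETS / SETS_TO_SOLVE
print(f"reduction from symmetry and set covering: about {reduction_factor:.1f}x")

# Total compute time at about 20 seconds per set.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length
cpu_years = SETS_TO_SOLVE * SECONDS_PER_SET / SECONDS_PER_YEAR
print(f"estimated compute: about {cpu_years:.1f} CPU-years")
```

So the sets multiply out to exactly the full count of Cube positions, and 20 seconds per remaining set lands close to the 35 CPU-years the group reports.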

This reminds me of the first program I wrote for my own enjoyment, which used brute force to find all solutions to Piet Hein’s Soma Cube. In 1969 I had a summer job as the night operator for an IBM 360 and I would turn off the clock to run my program so that the management wouldn’t know how much computer time I was consuming.

See this BBC story for more information on this amazing result.

AI, English, Games, General, Google, social media


Swoogle has five faces



August 13th, 2010

Seen on the Web: “Swoogle is an alien from outer space send out to spy on the modnation circuit. He got five faces so he can watch them from all angles without turning his head. However only his front shows many emotions. His right face is always angry, his left face is always in awe for some reason.”

English, Semantic Web, Swoogle


Natives, digital visionaries



August 13th, 2010

Besides being natives they are, in light of an interesting and fresh study, what we might call, to keep coining neologisms, “digital visionaries”.

The study is a joint effort by RWW and Latitude, an exercise in open innovation based on asking 126 children aged 6 to 12 what they would wish for the web of the future. This is the actual exercise, in the format best suited to dreams at that age:

“What would you find really interesting or fun to be able to do with your computer or the internet in the future? Draw what you imagine.”

The video summarizes the findings, which we summarize, translate and comment on below:

 

Children and researchers as digital visionaries:

It is striking how the results point to a postdigital reality, a fusion of web and non-web, with the children’s wishes surprisingly similar to what is currently being developed along these lines in the research labs of the most advanced universities. MIT’s Fluid Interfaces Group, for example, is working on a “food printer” like the one imagined by children in the study (“Digital Gastronomy”). SixthSense technology and similar developments also showed up among the gifts the children wished for during the study.

3D, touch and gesture interfaces, and even a bet on the Semantic Web :), with an 8-year-old describing a future in which “we help computers know what we are thinking so that they do more things for us, and can be controlled by voice and touch”.

 

Digital “awareness”

They thus seem quite aware of reality: only 4% of the children’s wishes are impossible given the current state of computing (time travel, teleportation, etc.).

The younger generations expect more intuitive interactions with technology, and not only with the iPad. What they imagine is interaction with all kinds of real-world objects. They go well beyond smartphones and ubiquity, toward many of the so-called 3.0 trends, such as an Internet of Things that is available anywhere.

 

As you can see in the chart, games come out on top. But creation and design are not far behind: 31% of the proposed ideas concern tools for making things (websites, games, videos to share, physical objects, etc.). Many of the participants, in fact, express a preference for design and a clear fondness for creative expression, which they believe can find a good ally in technology.

This finding is consistent with what I always say: parents tend to overlook those preferences, underestimating the creative uses of the internet and the positive ways the network can be put to use. When asked about their children’s favorite activities, only 7% of parents think of creation or design, while 70% believe their children spend most of their online time playing games. The reality, by contrast, is that the percentage of children who play games and the percentage who show creative and artistic interests are similar.

A broader, more accessible social world

I was especially interested in this part of the study, which reaffirms the importance of teaching participation beyond play. Social networks are especially popular among 10- to 12-year-olds, and what they express strikes me as fascinating: what children want is a connected society, the freedom to roam a social sphere that extends beyond family and friends, in immersive chat environments that let them reach people in the most remote places on the planet.

Continuous connectivity to people and information through the web (that permanently open window onto knowledge we always talk about) already comes naturally to them and seems to make them feel more capable and independent, facing a life with more opportunities, the researchers say. They favor self-directed learning (I am finishing this post in the company of tweets from José de la Peña, @sandopen, and the #noche_Asimov hashtag, and I have just seen one along these lines: “Self-education is, I firmly believe, the only kind of education there is”). They can now look up any information on Wikipedia (or the rest of the web), learn through games and sophisticated interactions, and even take part in an international video chat for free. And, what is more, they want to do those kinds of things.

They are eager to experience the wide world: big, diverse, full of possibilities. And to express themselves in it with a digital optimism and a self-confidence that strike me as very powerful weapons.


“I want to talk to and see anyone in the world, using any language”


2010, Aprendizaje, Evolución, Planeta educativo, Psicologia, Sociedad de la conversacion, Spanish, TRABAJOS DESTACADOS, Vídeos, Web Semántica, Web3D, cibercultura, cognitivismo, comunidades, conectivismo, cultura 2.0, curiosidades, derechos humanos, diversidad, e-learning2.0, educacion, educación 2.0, innovación, inteligencia colectiva, sociología, video-documentales, videos-creacion, web3.0, zeitgeist evolución


Libraries and the Big Society



August 12th, 2010

This is a preview of an article written for the upcoming issue of Panlibus Magazine.

Libraries and the Big Society

Towards the end of July in Liverpool, the UK’s new Prime Minister, David Cameron, finally set out his plans for his vision of the Big Society. The initiative is based on the devolution of power from Westminster to local communities, and the empowerment of citizens to run local public services (including, of course, public libraries).

Cutting costs or empowering communities

In announcing the Big Society, David Cameron focused on the pivotal role of people and communities saying it is a “big advance for people power” and that his “great passion is building the Big Society”. Much of the discourse around the Big Society is underpinned by the need for the government to reduce the budget deficit. The Opposition has certainly made this point, and certain individuals, notably Tessa Jowell, are also enraged about the lack of originality.

What does this mean to our public library service?

Public libraries are going through tough times, and this announcement raises broad questions about the public library mission. The DCMS review earlier in the year attempted to answer these questions and set out a framework, but the new government has tried to distance itself from this review, only acknowledging the relevance of a small number of points (library membership from birth, free internet access, co-location). Ed Vaizey, Culture Minister, has scrapped his election promise of setting up a Library Development Agency, and has instead set up a “support programme”. This will go ahead in conjunction with the Museums, Libraries and Archives Council (MLA) and the Local Government Association (LGA), despite the DCMS announcement that the MLA is to be abolished within two years. So, this announcement only raises more questions about how important the government really feels libraries are.

Library advocate Tim Coates has set out some basic points as to what libraries should do to address these problems, including increasing opening hours, improving stock and making sure the library meets local needs. Meeting local needs is something that perhaps the Big Society will address.

So, will the Big Society benefit libraries?

It has been widely reported that the Big Society is essentially about having volunteers running public services. The main benefit of this will be reduced costs. According to the government, freeing up budget and cutting costs will allow libraries “to focus efforts on frontline, essential services and ensure greater value for money”.

The use of volunteers in libraries throws the professional status of librarianship into doubt. Many professional librarians argue that considerable formal education is needed and that this new initiative implies that someone with no experience can come in and effectively do the same job. Since these volunteers will be unpaid, how can we ensure reliability and quality? Who actually has time to volunteer for a day a week at their local library? We may end up with libraries opening less and not meeting local needs, both of which contradict the aims of the Big Society.

Volunteers in libraries are not uncommon though. The Summer Reading Challenge utilises many volunteers, this year more than ever before, and libraries find them useful. Young adults who have volunteered, after participating in the Challenge when they were younger, can encourage reader development in the current wave of children taking part.

Even before the Big Society was announced, many libraries across the country were closing, or were under review. The Big Society proposals set out plans for local citizens who oppose the closures to get involved, work in libraries and fill the gaps in local authority budgets. Perhaps a mixed economy of reliable, professional librarians and enthusiastic volunteers will reinvigorate public libraries. Let’s hope so.

English, Libraries, Public


Cinema, Reality, Augmented Emotion for the postdigital era.



August 11th, 2010

Today I wanted to contrast, connect and present two kinds of augmented reality, local and emotional, that I believe are truly changing the world.

The first, in a recent film by Keiichi Matsuda presented at London’s 3D Film Festival, imagines an interactive future with layers of digital information that change and adapt to our wishes. The architecture of our cities ceases to be exclusively physical and is completed by synthetic spaces that we go on building (socially, I would add).

 

 

I think we brought both concepts together in a recent interview:

It is time to move past the dichotomy between the digital and the real. People are beginning to talk about postdigitalism, the Internet of Things, and a “web wide world” rather than a world wide web, as a way of signaling the importance of the human in this apparent technological whirlwind. For example, recent findings identify changes at the biochemical level, such as the release of hormones that make us more generous and more empathetic, also as a collective, and that occur equally in real and virtual situations. A good example was the generosity we showed through the web after the earthquake in Haiti.

Any experience of connecting with global joy or global suffering, through Twitter or any other medium, develops solidarity and empathy in human beings, in my opinion and as experiments at several universities are exploring. I think we can begin to call this, emphasizing the link with technology, “augmented emotion”.

Put another way, technology would be working as an umbilical cord, a means of connection with a humanity that in some sense nourishes us.

Technology, to bring some explanatory elements of Connectivism into all this, provides layers of information like those we saw in the first video, but also, through increased social connectivity and transparency, experiences of augmented emotion.


The following video on Rifkin’s Empathic Civilization expresses this, with some existentialist and Enlightenment overtones that I do not entirely share but that are curiously close to what I presented recently in Madrid (Educación Nonstop en la web).

 

I expanded on some of these points in what I consider the best interview I have given to date. It took place after the Thinking Party in Madrid and was recently published on the portal. I hope you enjoy it.


2010, Evolución, Net-art, curiosidades en la red, Planeta educativo, Redes sociales, Sociedad de la conversacion, Spanish, TRABAJOS DESTACADOS, Videotutoriales, Vídeos, Web3D, cibercultura, colaboraciones, comunidades, conectivismo, cultura 2.0, curiosidades, dispositivos, educacion, filosofía, futurismo, innovación, inteligencia colectiva, medios, multimedia, prospectiva, social media, sociología, video-arte, video-documentales, videos-creacion, web3.0, zeitgeist evolución