Tim Berners-Lee on Linked Data

Workshop at MIT. Raw notes about raw data.

RDF Book Mashup:

If you are building a new website, you can pull in a lot of this information. Suppose no one has put the baseball stats out there: you could do that, and pull in geographic information to match up ballparks or players' hometowns.
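A minimal Python sketch of the kind of join such a mashup relies on. It assumes two independently published datasets that happen to identify cities by the same URI. The example.org URIs and player names are hypothetical, and the coordinates are approximate.

```python
# Hypothetical data: one site publishes player hometowns, another publishes
# city coordinates. Both identify cities by the same (made-up) URIs.
EX = "http://example.org/"

hometowns = {
    EX + "player/alice": EX + "city/Boston",
    EX + "player/bob": EX + "city/Chicago",
    EX + "player/carol": EX + "city/Nowhere",  # nobody published coordinates for this city
}

coordinates = {  # approximate latitude/longitude
    EX + "city/Boston": (42.36, -71.06),
    EX + "city/Chicago": (41.88, -87.63),
}


def mash_up(hometowns, coordinates):
    """Join the two sources on the shared city URI; skip cities with no coordinates."""
    result = {}
    for player, city in hometowns.items():
        if city in coordinates:
            result[player] = (city, coordinates[city])
    return result


for player, (city, (lat, lon)) in sorted(mash_up(hometowns, coordinates).items()):
    print(player, city, lat, lon)
```

The join only works because both publishers used the same identifier for each city; no coordination beyond that is needed.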

Raw Data Now! I won't make you chant this.

If you actually go to someone and

It is gratifying that people are putting their data on the web, but it is amazing the reasons you hear for holding it back. She's mine, you don't get to love her, you don't understand her, you might abuse her, you might not realize she's not perfect. In a bureaucracy people are worried they'll be blamed for it.

It's about collaboration-- putting data on the web is about working with people you don't know.

One of the joys of doing this web stuff is finding someone
Or bumping into someone who thanks you for saving their day by something you put on the web.

Once people understand that others are interested, that others care, they get it. And people do care: they blog about anything. People are interested in wandering in the same woods they wander in...

Raw doesn't mean raw in the statistical sense, but raw in that you haven't made a beautiful web site for it.

If you just put the data out there first, other people may make a better web site with it than you (a government agency or whatever).

Linked Data in the life sciences: PubChem, DrugBank.

Discover new drugs to treat Alzheimer's: what proteins are involved in signal transduction...
Google gets 223,000 hits, no answers.

Google can try but it's a question no one has asked before.

(Of course now you'll get links to this talk, or notes on it.)

A [SPARQL query?] can give 32 results, 32 answers.
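To make the contrast with keyword search concrete, here is a toy Python sketch of what a structured query does: it matches a basic graph pattern (triples containing ?variables) against data and returns variable bindings, which are answers rather than hits. The ex: identifiers and facts are invented for illustration and do not come from PubChem or DrugBank; a real system would hand the query to a SPARQL engine instead of this hand-rolled matcher.

```python
# Toy data: every identifier and fact below is invented for illustration.
TRIPLES = {
    ("ex:ProteinA", "ex:involvedIn", "ex:SignalTransduction"),
    ("ex:ProteinA", "ex:expressedIn", "ex:Neuron"),
    ("ex:ProteinB", "ex:involvedIn", "ex:SignalTransduction"),
    ("ex:ProteinB", "ex:expressedIn", "ex:LiverCell"),
    ("ex:ProteinC", "ex:involvedIn", "ex:Apoptosis"),
    ("ex:ProteinC", "ex:expressedIn", "ex:Neuron"),
}


def is_variable(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend a variable binding so the pattern equals the triple, or return None."""
    new_binding = dict(binding)
    for pattern_term, value in zip(pattern, triple):
        if is_variable(pattern_term):
            if new_binding.get(pattern_term, value) != value:
                return None  # variable already bound to something else
            new_binding[pattern_term] = value
        elif pattern_term != value:
            return None  # constant term does not match
    return new_binding


def query(patterns, triples):
    """Return every binding that satisfies all patterns at once (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


answers = query(
    [("?protein", "ex:involvedIn", "ex:SignalTransduction"),
     ("?protein", "ex:expressedIn", "ex:Neuron")],
    TRIPLES,
)
print(sorted(b["?protein"] for b in answers))  # ['ex:ProteinA']
```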

Social networks.

It turns out that it's not just people into linked data who see this: a cartoon by David Simonds in The Economist shows people trying to climb the walls between Facebook, MySpace, Bebo, LinkedIn, Flickr, Orkut...

The mark of a good site is realizing it's not the only site in the world. Make it pass the test of independent invention: if both of you had the exact same idea and developed it independently, could your sites work together?

If you have to be the center of the world, you will either succeed and own everything, or you will die.

But it's smart, and polite to your users, to acknowledge they have another life as well.

OpenStreetMap: like Google Maps, except that at the bottom, instead of saying that if you use this map anywhere on the web, even if you trace over it, it is ours and so is your first-born son, it just says Creative Commons.

Going to the TED talk, I noticed the theater didn't actually have a name on the map. I went to the edit tab, selected the theater, and added the name.

Within a few days, hours, or minutes, everyone has the name of the theater. In some places it is better than Google in information; everywhere it is better than Google in licensing.

And in principle you can SPARQL the map.

Cycle map has bike trails-- and also elevation, because cyclists really care about that!

Linked Data Standards.

The British Prime Minister asked me: how should the UK use the internet?

Put all the government's data on the web.


I'm not used to that response...

Building a company that is part of the ecosystem, one that pulls data in and gives it out: that is what is most exciting.

RSS is just a data feed for headlines. There are many, many other sorts of data. Whatever your data is

Even if you are producing tiles, you will need a data feed, because that is how people will be buying things based on it.

"Raw Data Now" meme: Rufus Pollock.

Audience mostly male but with good female representation; mostly white, with Asian, Indian, and others with skin from that part of the world, probably making it over to the Middle East. And one black man!

Most data is sitting in databases.

The URI for the concept of MIT student might be mit.

MIT has to worry about what a student means and what that URI is.

You can request a URI at W3C. So people know that if MIT goes away, W3C will still be there...

Ten universities have sat down, defined undergrads, graduates, and the weird special cases, then figured out how the Europeans fit, then six months of meetings... and now they have a definition of graduate students that everyone accepts internationally. That is hard work. [But it is important and doable.]

You do not create an ontology in the sky that covers everything. Takes too long.

The URI: when you pull it, you get data about it, RDF about it. Basically you get some human-readable data that explains that Student is a class which is the domain of these functions: attends relates a student and a class.
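A small Python sketch of what that kind of description lets a machine do, using the RDF Schema properties rdfs:domain and rdfs:range (written here as prefixed-name strings rather than full URIs). The ex: terms, the student ex:alice, and ex:someCourse are hypothetical, and the code assumes each property has a single domain and a single range.

```python
# Hypothetical ontology: Student and Course are classes; "attends" relates them.
ONTOLOGY = {
    ("ex:Student", "rdf:type", "rdfs:Class"),
    ("ex:Course", "rdf:type", "rdfs:Class"),
    ("ex:attends", "rdfs:domain", "ex:Student"),
    ("ex:attends", "rdfs:range", "ex:Course"),
}

# Hypothetical instance data that never says what alice is.
DATA = {
    ("ex:alice", "ex:attends", "ex:someCourse"),
}


def infer_types(ontology, data):
    """Apply the RDFS domain/range rules once to derive rdf:type triples."""
    domains = {s: o for (s, p, o) in ontology if p == "rdfs:domain"}
    ranges = {s: o for (s, p, o) in ontology if p == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))  # the subject belongs to the domain class
        if p in ranges:
            inferred.add((o, "rdf:type", ranges[p]))  # the object belongs to the range class
    return inferred


for triple in sorted(infer_types(ONTOLOGY, DATA)):
    print(triple)
```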

We stopped calling them schemas and are calling them ontologies: schema implies constraints on the syntax of a document, while ontologies are about real things. It tells you that a student is, say, a sub-class of human being-- longer lasting data than
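The sub-class idea can be sketched the same way: given hypothetical ex:Student, ex:HumanBeing, and ex:Agent classes, a machine can follow sub-class links transitively to work out every class a student belongs to. This only illustrates the idea behind rdfs:subClassOf; it is not a full RDFS reasoner.

```python
# Hypothetical (sub-class, super-class) pairs.
SUBCLASS_OF = {
    ("ex:Student", "ex:HumanBeing"),
    ("ex:HumanBeing", "ex:Agent"),
}


def superclasses(cls, subclass_of):
    """All classes reachable from cls by following sub-class links (transitive closure)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in seen:
                seen.add(sup)  # remember it so cycles cannot loop forever
                stack.append(sup)
    return seen


def all_types(declared, subclass_of):
    """The declared class plus everything it is a sub-class of."""
    return {declared} | superclasses(declared, subclass_of)


print(sorted(all_types("ex:Student", SUBCLASS_OF)))  # ['ex:Agent', 'ex:HumanBeing', 'ex:Student']
```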


subject predicate object
this person attends this course

subject predicate object provenance

It's always four store.
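A toy Python sketch of the subject, predicate, object, provenance idea, assuming provenance is recorded simply as the URI of the document that asserted each triple. The class name QuadStore, its methods, and the example.edu/example.org sources are all hypothetical, not any particular product's API.

```python
class QuadStore:
    """Toy in-memory store: every triple is kept together with the source that asserted it."""

    def __init__(self):
        self.quads = set()

    def add(self, subject, predicate, obj, source):
        self.quads.add((subject, predicate, obj, source))

    def match(self, subject=None, predicate=None, obj=None):
        """Yield (triple, source) pairs; None acts as a wildcard for that position."""
        for s, p, o, src in self.quads:
            if subject not in (None, s):
                continue
            if predicate not in (None, p):
                continue
            if obj not in (None, o):
                continue
            yield (s, p, o), src

    def sources(self, subject, predicate, obj):
        """Every source that asserts exactly this triple."""
        return {src for _, src in self.match(subject, predicate, obj)}


store = QuadStore()
# Hypothetical sources: the same triple is asserted by two different documents.
store.add("ex:alice", "ex:attends", "ex:someCourse", "http://registrar.example.edu/records")
store.add("ex:alice", "ex:attends", "ex:someCourse", "http://alice.example.org/profile")
print(sorted(store.sources("ex:alice", "ex:attends", "ex:someCourse")))
```

Keeping the source as a fourth element means the same statement can be held once per speaker, which is what makes rules about whom to believe possible.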

We need to be able to write rules not just about whether this is true based on a reliable source, but--

If someone says they are vegetarian, believe them; but if they say someone else is vegetarian, well, maybe not.
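A rule like this can be sketched over quads. The sketch assumes the fourth element identifies who made the statement and that a person and their own statements share the same URI (a simplification; in practice the source is usually a document, not a person). It takes the strict reading of "maybe not" and simply drops third-party claims. Everything prefixed ex: is hypothetical.

```python
QUADS = [
    # (subject, predicate, object, source): the fourth element records who said it.
    ("ex:alice", "ex:diet", "ex:Vegetarian", "ex:alice"),  # Alice about herself
    ("ex:bob", "ex:diet", "ex:Vegetarian", "ex:alice"),    # Alice about Bob
    ("ex:carol", "ex:diet", "ex:Vegetarian", "ex:carol"),  # Carol about herself
]


def self_asserted(quads, predicate):
    """Keep only statements whose source is the very subject they describe."""
    return {(s, p, o) for (s, p, o, source) in quads if p == predicate and source == s}


vegetarians = sorted(
    s for (s, _, o) in self_asserted(QUADS, "ex:diet") if o == "ex:Vegetarian"
)
print(vegetarians)  # ['ex:alice', 'ex:carol']
```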

Q: Rent-seeking behavior, taking something open and closing it so they can make money.

A: There are people in business school learning this stuff and you'll have to beware of them because they may mess up everyone.

They are governing a country or a county, and it's a place, and they need to just make it available. So their requirement to break even by somehow charging money for the data is difficult.

In general it is worth studying different models for keeping

In the U.S., there is a strong convention that if it is produced with taxpayer money, it should be freely available.

But there are still many ways to make money by taking the data and making a great product with it.

Q: When I talk to information architects about ontologies, they get excited-- but then, when you tell them it's better to use other existing ones, they want to do it themselves. Does it become siloed in a new way?

A: It's much easier to make your own ontologies. The wonderful human urge to be creative.

It's a reasonable thing to do first, if it really isn't out there. There's the "Not Invented Here" problem. "Oh, you haven't finished yours, so I can start from scratch." "Your use case is separate from mine."

It turns out the top ten terms will be the most used-- that's what's unique about RDF: you can agree on the top ten and then make the next 90 yourself, which might really be MIT-specific. Both are valuable: your specific data, to you, and the standardized shareability of the data you have in common.
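A sketch of how shared and local terms can sit side by side in one dataset. It assumes FOAF (http://xmlns.com/foaf/0.1/) stands in for the widely agreed top terms and a hypothetical example.edu namespace holds the institution-specific ones. A consumer that only understands FOAF can still use the shared part, while the local triples stay available to whoever knows what they mean.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # a widely shared vocabulary
LOCAL = "http://example.edu/terms#"  # hypothetical institution-specific terms

TRIPLES = [
    ("http://example.edu/people/alice", FOAF + "name", "Alice"),
    ("http://example.edu/people/alice", FOAF + "mbox", "mailto:alice@example.edu"),
    ("http://example.edu/people/alice", LOCAL + "livingGroup", "http://example.edu/dorms/someDorm"),
]


def split_by_vocabulary(triples, shared_namespaces):
    """Separate triples whose predicate comes from a shared vocabulary from the local ones."""
    shared, local = [], []
    for triple in triples:
        predicate = triple[1]
        if any(predicate.startswith(ns) for ns in shared_namespaces):
            shared.append(triple)
        else:
            local.append(triple)
    return shared, local


shared, local = split_by_vocabulary(TRIPLES, [FOAF])
print(len(shared), len(local))  # 2 1
```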

Q: Assume wildly successful, who stands outside the flower garden, and people get involved again and decide on quality?

A: One of the big questions facing humankind since the beginning: how do we decide what is right? Scientific method. And then, what do we do? For that, we have democracy. In both systems, we are still working on it.

Track problems in the process-- do the people have the right motivations, etc. [But you can see.]

Q: RDFa, microformats, embedding linked data inside HTML?

A: RDFa is great to start with-- with microformats, every time someone writes one I have to write some more Python. Once you've written an engine that reads RDF, it will read any RDF. My machine can slurp up the RDF or RDFa and pull metadata from the ontology even if it has never seen it before.
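To illustrate why one generic RDF reader is enough, here is a simplified Python parser for N-Triples, the line-based W3C syntax for RDF. It never needs to know the vocabulary it is reading. It is a sketch under stated assumptions: it handles IRIs, blank node labels, and literals with an optional language tag or datatype, but it does not unescape literals or cover every corner of the grammar.

```python
import re

# Simplified N-Triples terms (assumption: no unescaping, relaxed blank-node labels).
IRI = r'<[^>]*>'
BNODE = r'_:\w+'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'

# subject: IRI or blank node; predicate: IRI; object: IRI, blank node, or literal
STATEMENT = re.compile(
    rf'^({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.$'
)


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) strings, whatever vocabulary they use."""
    triples = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        match = STATEMENT.match(stripped)
        if match is None:
            raise ValueError(f"line {lineno}: unsupported or malformed statement: {line!r}")
        triples.append(match.groups())
    return triples


sample = """
# Terms the reader has never seen before are no problem.
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice"@en .
<http://example.org/alice> <http://example.org/terms#neverSeenBefore> _:b0 .
_:b0 <http://example.org/terms#label> "a \\"quoted\\" word" .
"""

for s, p, o in parse_ntriples(sample):
    print(p)
```

The example.org terms are hypothetical; the point is that nothing in the parser changes when a new vocabulary appears.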

I think it's a time when RDFa is valuable. In general, people out there can't be expected to go to the trouble of marking things up by hand, but a PHP script can do that.
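A sketch of the kind of script meant here, written in Python rather than PHP: it renders a record as an HTML fragment carrying RDFa 1.1 attributes (vocab, about, typeof, property), using the FOAF vocabulary. The record's field names (uri, name, homepage) are hypothetical.

```python
from html import escape


def person_to_rdfa(record):
    """Render a hypothetical person record as HTML annotated with RDFa 1.1 attributes."""
    about = escape(record["uri"], quote=True)
    name = escape(record["name"])
    homepage = escape(record["homepage"], quote=True)
    return (
        f'<div vocab="http://xmlns.com/foaf/0.1/" about="{about}" typeof="Person">\n'
        f'  <span property="name">{name}</span>\n'
        # property + href: in RDFa 1.1 the object is the linked resource, not the text
        f'  <a property="homepage" href="{homepage}">{homepage}</a>\n'
        f'</div>'
    )


print(person_to_rdfa({
    "uri": "http://example.org/people/alice",
    "name": "Alice",
    "homepage": "http://example.org/~alice/",
}))
```

The page looks the same to a human reader; the extra attributes are what an RDFa-aware crawler picks up, and nobody had to type them by hand.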


The jury's still out on whether people will generate the separate RDF with a script or embed it in the page. Both are good.



