Friday, May 11, 2012

Schema.org markup for external lists


The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to strike a balance by providing a core schema that covers many situations, alongside extension mechanisms for extra detail. In many situations, using existing controlled vocabularies, standards and datasets would improve schema.org markup; the schema.org "external enumerations" mechanism exists to support this.

We introduce "external enumerations" with a simple example, countries, and encourage implementors to join the schema.org community in W3C's 'Web Schemas' group, where the full details are being discussed.

Each schema.org type (such as Person or PostalAddress) is associated with a set of properties (such as "nationality" or "addressCountry"). In turn, each property has one or more expected types; in this case, both the "nationality" of a Person and the "addressCountry" of a PostalAddress expect a Country value. Rather than adding large lists of specific countries to schema.org, we encourage the use of external lists. We will publish a set of well-known authority lists, linked to the types and properties they are used with. To get started, we take simple Wikipedia links as an example of such an authority. Other, more specialist examples (such as IPTC codes) will follow.
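To make the idea of authority lists "linked to the types and properties they are used with" a little more concrete, here is a minimal Python sketch of how a consumer might record such links and check a value against them. The registry names (`EXPECTED_TYPES`, `AUTHORITIES`, `is_listed_identifier`) are hypothetical and not a published schema.org API; the only authority entered is the Wikipedia prefix used in this post.

```python
# Hypothetical registry: the schema.org type expected by each (type, property) pair.
EXPECTED_TYPES = {
    ("Person", "nationality"): "Country",
    ("PostalAddress", "addressCountry"): "Country",
}

# Hypothetical registry of authority lists per expected type, keyed by URL prefix.
# Only the Wikipedia prefix comes from this post; other authorities would be added later.
AUTHORITIES = {
    "Country": ["http://en.wikipedia.org/wiki/"],
}


def is_listed_identifier(item_type: str, prop: str, value: str) -> bool:
    """Return True if `value` is a URL from an authority listed for the
    type that `prop` expects on `item_type`."""
    expected = EXPECTED_TYPES.get((item_type, prop))
    if expected is None:
        return False  # property not known to take external enumeration values
    return any(value.startswith(prefix) for prefix in AUTHORITIES.get(expected, []))


print(is_listed_identifier("Person", "nationality",
                           "http://en.wikipedia.org/wiki/United_States"))  # True
print(is_listed_identifier("PostalAddress", "addressCountry",
                           "http://example.org/countries/usa"))  # False
```

The authority lists hang off the expected type (Country) rather than off each property, so "nationality" and "addressCountry" share one list. A real consumer might well accept identifiers from unlisted authorities too; the check is only an illustration.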

Taking our existing Movie example in Microdata, let's add nationality details for one of the actors. To do this, we simply add a link:


<div itemscope itemtype="http://schema.org/Movie">
 <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
 <span itemprop="description">Jack Sparrow and Barbossa embark on[...]</span>
 <div itemprop="actor" itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Johnny Depp</span>
  <link itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
 </div>
</div> 
 
Here we use 'http://en.wikipedia.org/wiki/United_States' to stand for the specific country. Other authorities also publish useful structured data about countries and have stable URLs that could be used instead. For example, we could use the UN FAO's GeoPolitical Ontology and its URL for the USA. From a schema.org perspective, we do not take into account any types and properties defined by these external sites, since it is important to support a variety of quite different authority lists, which often model things in different ways. Each external authority essentially supplies a set of URI/URL item identifiers that can be dropped into schema.org markup.
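From a consumer's side, this means the link value is simply a URL string. The Python sketch below uses only the standard library's `html.parser` to pull `itemprop`/`href` pairs out of a trimmed copy of the Movie example; the class and function names are hypothetical, and it deliberately ignores item nesting and text-valued properties that a full Microdata parser would handle. Nothing is fetched from Wikipedia: the URL is kept as an opaque identifier.

```python
from html.parser import HTMLParser

# Trimmed copy of the Movie example above.
MOVIE_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
 <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
 <div itemprop="actor" itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Johnny Depp</span>
  <link itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
 </div>
</div>
"""


class LinkPropertyCollector(HTMLParser):
    """Collect (itemprop, href) pairs from <link> and <a> elements.

    Simplified sketch: item nesting and text-valued properties are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("link", "a") and attr.get("itemprop") and attr.get("href"):
            # The href is stored verbatim as an identifier; it is never dereferenced.
            self.pairs.append((attr["itemprop"], attr["href"]))


def link_properties(html: str) -> list[tuple[str, str]]:
    collector = LinkPropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


print(link_properties(MOVIE_HTML))
# [('nationality', 'http://en.wikipedia.org/wiki/United_States')]
```

Self-closing tags such as `<link .../>` reach `handle_starttag` through `HTMLParser`'s default `handle_startendtag`, so no extra handler is needed.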

We've shown here the use of Wikipedia links for identifying members of the Country type. Take a look at the detailed document for a discussion of how to combine this with Microdata's 'itemid' attribute if you want to describe the Country (or other object) in more detail. The W3C wiki also gives further examples and shows how the same markup would look in RDFa Lite.
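As a rough illustration of the 'itemid' variant (the detailed document remains the reference for the recommended markup), the Country can become a nested item whose `itemid` carries the same Wikipedia URL, leaving room for extra properties such as its name. The Python sketch below is a simplification under stated assumptions, not a conforming Microdata parser: its hypothetical `identifiers_for` function treats a `<link>` href and a nested item's `itemid` as the same kind of identifier, so both forms yield the same URL.

```python
from html.parser import HTMLParser

# The nationality given as a plain link, as in the Movie example.
LINK_FORM = """
<div itemscope itemtype="http://schema.org/Person">
 <span itemprop="name">Johnny Depp</span>
 <link itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
</div>
"""

# Assumed alternative: the same nationality as a nested Country item named by
# its itemid, so that further properties (here just "name") can describe it.
ITEMID_FORM = """
<div itemscope itemtype="http://schema.org/Person">
 <span itemprop="name">Johnny Depp</span>
 <div itemprop="nationality" itemscope itemtype="http://schema.org/Country"
      itemid="http://en.wikipedia.org/wiki/United_States">
  <span itemprop="name">United States</span>
 </div>
</div>
"""


class IdentifierFinder(HTMLParser):
    """Collect identifier URLs for one property, from either href or itemid."""

    def __init__(self, prop: str) -> None:
        super().__init__()
        self.prop = prop
        self.identifiers: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("itemprop") != self.prop:
            return
        if tag in ("link", "a") and attr.get("href"):
            self.identifiers.append(attr["href"])  # link form: the URL is the value
        elif "itemscope" in attr and attr.get("itemid"):
            self.identifiers.append(attr["itemid"])  # nested item: itemid names it


def identifiers_for(html: str, prop: str) -> list[str]:
    finder = IdentifierFinder(prop)
    finder.feed(html)
    finder.close()
    return finder.identifiers


print(identifiers_for(LINK_FORM, "nationality") == identifiers_for(ITEMID_FORM, "nationality"))
# True
```

Equating the two forms is an assumption of this sketch; how consumers should treat them is among the details the Web Schemas group is working through.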

While there are more details to work out as we start to apply this idea across schema.org, we wanted to share this initial example. The basic idea is very simple: wherever in schema.org external lists would help, we need a specific schema.org type (like Country) for which the external authority supplies identifiers. In some cases, we will have to add new types to support this. Beyond the basics presented here, there remain technical details of syntax to settle, decisions about exactly which authorities and URI identifiers to use, and so on. We welcome suggestions (here or via the Web Schemas group) for existing enumerations that would be useful additions, as well as feedback on the general approach.