Thursday, June 7, 2012

SemTech, RDFa, Microdata and more...

Schema.org was launched a year ago. This week several members of the schema.org team returned to the SemTechBiz conference for a panel to discuss where we are, and where we're going.

Schema.org is all about shared vocabulary, rather than any specific markup encoding. As we reported last year, the RDFa Working Group has been working hard to address feedback from schema.org and others. Yesterday's panel gave us the chance to be the first to welcome W3C's announcement that RDFa 1.1 is now a full W3C Recommendation. This new standard, in particular the RDFa Lite specification, brings together the simplicity of Microdata with improved support for using multiple schemas together.
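To make the comparison concrete, here is a minimal sketch that renders one schema.org item in both syntaxes. The `Person` item, its values, and the helper names `microdata` and `rdfa_lite` are made up for illustration; only the attribute names (`itemscope`, `itemtype`, `itemprop` for Microdata; `vocab`, `typeof`, `property` for RDFa Lite) come from the respective specifications.

```python
import html

# Hypothetical example data: not taken from any real page.
props = {"name": "Jane Doe", "jobTitle": "Editor"}


def microdata(item_type, props):
    """Render a schema.org item using HTML Microdata attributes."""
    spans = "".join(
        f'<span itemprop="{html.escape(k)}">{html.escape(str(v))}</span>'
        for k, v in props.items()
    )
    return f'<div itemscope itemtype="http://schema.org/{item_type}">{spans}</div>'


def rdfa_lite(item_type, props):
    """Render the same item using RDFa Lite 1.1 attributes."""
    spans = "".join(
        f'<span property="{html.escape(k)}">{html.escape(str(v))}</span>'
        for k, v in props.items()
    )
    return f'<div vocab="http://schema.org/" typeof="{item_type}">{spans}</div>'


print(microdata("Person", props))
print(rdfa_lite("Person", props))
```

The two outputs differ only in attribute names, which is the sense in which RDFa Lite keeps Microdata's simplicity: the vocabulary terms (`Person`, `name`, `jobTitle`) are the same in both.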

What does this mean for schema.org? We want to say clearly that we continue to support Microdata, and in particular those who have championed the adoption of Microdata over the last year. Billions of pages now use schema.org markup thanks to these early adopters, and Microdata continues to be a fine way to publish and share structured data. Our approach is "Microdata and more". As implementations and services begin to consume RDFa 1.1, publishers with an interest in mixing schema.org with additional vocabularies, or who are using tools like Drupal 7, may find RDFa well worth exploring.

Beyond Microdata and RDFa in HTML, the SemTechBiz conference covered numerous other ways of sharing schema.org structured data. Examples included JSON-LD, the use of schema.org with DocBook XML (via RDFa), and W3C's relational database mapping technology.
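As a rough illustration of the JSON-LD option, the sketch below serializes a schema.org description as a JSON document. The `Person` item, its values, and the helper name `to_json_ld` are hypothetical; the `@context` and `@type` keywords are JSON-LD's own, and pointing `@context` at schema.org is an assumption about how a publisher might set this up rather than something prescribed here.

```python
import json


def to_json_ld(item_type, props):
    """Build a JSON-LD document describing one schema.org item."""
    # "@context" names the vocabulary; "@type" names the schema.org type.
    doc = {"@context": "http://schema.org/", "@type": item_type}
    doc.update(props)
    return json.dumps(doc, indent=2)


# Hypothetical example data: not taken from any real page.
person_doc = to_json_ld("Person", {"name": "Jane Doe", "jobTitle": "Editor"})
print(person_doc)
```

The property names are the same schema.org terms that would appear in Microdata or RDFa markup; only the carrier syntax changes.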

We are also pleased to announce today a discussion paper on the use of OData and Schema.org, posted in the Web Schemas wiki. OData defines a RESTful interface for working with data on the Web. The newest version of OData allows service developers and third parties to annotate data or metadata exposed by an OData Service. Defining common OData Vocabulary encodings of the schema.org schemas facilitates the understanding and even transformation of data across these different encodings.

But what of the schema itself? The largest change so far was the integration of the IPTC/rNews vocabulary. Building on this model, we have been encouraging public collaboration, discussion and debate on schemas via the W3C Web Schemas community. Alongside the addition of JobPosting and numerous small improvements and fixes, including a new Comment type and a more detailed schema for SoftwareApplication, we have been preparing for a '1.0' release later this month. We maintain a public list of proposals under community discussion, and will typically incorporate vocabulary when we see a combination of interest from major publishers and consumers alongside rough consensus on the schema design.

The schema.org 1.0 vocabulary is expected to bring substantial additions, including support for genealogy (via historical-data.org), e-commerce (through collaboration with GoodRelations), Learning / Education (with LRMI), a Medical/health vocabulary, additions for describing technical/code and API documentation, and improved modeling of TV/Radio content. Discussion is also underway around Sports, Forums, and numerous other topics. For each of these, the W3C Wiki is the best place to start, and to contribute. Sometimes proposers or community members will use other mailing lists, GitHub or elsewhere, but the Wiki and mailing list are the main focus of shared discussions.

You can read full details of each work-in-progress, or follow this blog for news of new vocabulary. While we will continue to extend schema.org throughout the year (e.g. we expect IPTC will complete rNews 1.1 around October), we are also well aware that we can't cover everything. SemTech gave us the chance to discuss collaboration with the Wikidata project; this should allow schema.org descriptions to draw upon the vast content of Wikipedia. This combination of the growing schema.org vocabulary with 'external enumerations' from sites like Wikipedia, alongside new syntaxes such as RDFa Lite and OData, will keep us busy over the next year, and will create exciting possibilities for search, structured data and the Web.