Thursday, June 7, 2012

SemTech, RDFa, Microdata and more...

Schema.org was launched a year ago. This week several members of the schema.org team returned to the SemTechBiz conference for a panel to discuss where we are, and where we're going.

Schema.org is all about shared vocabulary, rather than any specific markup encoding. As we reported last year, the RDFa Working Group have been working hard to address feedback from schema.org and others. Yesterday's panel gave us the chance to be the first to welcome W3C's announcement that RDFa 1.1 is now a full W3C recommendation. This new standard, in particular the RDFa Lite specification, brings together the simplicity of Microdata with improved support for using multiple schemas together.
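To make the Microdata and RDFa Lite comparison concrete, here is a minimal sketch in Python. The Person description, the sample values and the `PropertyCollector` helper are all made up for illustration. The helper is a toy that only reads flat name/value pairs from element text; it is not a conformant Microdata or RDFa processor. The point is only that `itemscope`/`itemtype`/`itemprop` in Microdata line up with `vocab`/`typeof`/`property` in RDFa Lite.

```python
from html.parser import HTMLParser

# The same hypothetical Person description in both syntaxes.
MICRODATA = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Alice Example</span>
  <span itemprop="jobTitle">Editor</span>
</div>
"""

RDFA_LITE = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Alice Example</span>
  <span property="jobTitle">Editor</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Toy extractor: maps each property name to the element's text.

    attr_name is "itemprop" for Microdata or "property" for RDFa Lite.
    Nesting, @content, @datetime and prefixes are deliberately ignored.
    """

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.current = None  # property name awaiting its text
        self.found = {}

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(self.attr_name)
        if value is not None:
            self.current = value

    def handle_data(self, data):
        if self.current is not None and data.strip():
            self.found[self.current] = data.strip()
            self.current = None

    def handle_endtag(self, tag):
        self.current = None


def extract(markup, attr_name):
    """Return a dict of property name -> text for the given markup."""
    parser = PropertyCollector(attr_name)
    parser.feed(markup)
    parser.close()
    return parser.found


if __name__ == "__main__":
    print(extract(MICRODATA, "itemprop"))
    print(extract(RDFA_LITE, "property"))
```

Both calls should yield the same two properties. Mixing in a second vocabulary (the case where RDFa is expected to help most) would additionally use RDFa's `prefix` attribute, which this sketch does not attempt to handle.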

What does this mean for schema.org? We want to say clearly that we continue to support Microdata, and in particular those who have championed the adoption of Microdata over the last year. Billions of pages now use schema.org markup thanks to these early adopters, and Microdata continues to be a fine way to publish and share structured data. Our approach is "Microdata and more". As implementations and services begin to consume RDFa 1.1, publishers with an interest in mixing schema.org with additional vocabularies, or who are using tools like Drupal 7, may find RDFa well worth exploring.

Beyond Microdata and RDFa in HTML, the SemTechBiz conference covered numerous other ways of sharing schema.org structured data. Examples included JSON-LD, the use of schema.org with DocBook XML (via RDFa), and W3C's relational database mapping technology.
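As a rough illustration of the JSON-LD option, the sketch below builds a small schema.org description as a Python dictionary and serializes it. The `person_as_jsonld` helper and the sample values are hypothetical, and the use of "http://schema.org" as the `@context` value is an assumption about how a consumer would resolve the vocabulary, not something established at the panel.

```python
import json


def person_as_jsonld(name, job_title):
    """Build a schema.org Person as a JSON-LD style dictionary.

    Assumption: the consumer knows how to map the "@context" value
    to the schema.org vocabulary.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
    }


doc = person_as_jsonld("Alice Example", "Editor")

if __name__ == "__main__":
    # The serialized form is what would be published alongside a page.
    print(json.dumps(doc, indent=2))
```

The type and property names are the same ones a Microdata or RDFa publisher would use; only the encoding changes.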

We are also pleased to announce today a discussion paper on the use of OData and Schema.org, posted in the Web Schemas wiki. OData defines a RESTful interface for working with data on the Web. The newest version of OData allows service developers and third parties to annotate data or metadata exposed by an OData Service. Defining common OData Vocabulary encodings of the schema.org schemas facilitates the understanding and even transformation of data across these different encodings.

But what of the schema itself? The largest change so far was the integration of the IPTC/rNews vocabulary. Building on this model, we have been encouraging public collaboration, discussion and debate on schemas via the W3C Web Schemas community. Alongside the addition of JobPosting and numerous small improvements and fixes, including a new Comment type and a more detailed schema for SoftwareApplication, we have been preparing for a '1.0' release later this month. We maintain a public list of proposals under community discussion, and will typically incorporate vocabulary when we see a combination of interest from major publishers and consumers alongside rough consensus on the schema design.

The schema.org 1.0 vocabulary is expected to include substantial additions including support for genealogy (via historical-data.org), e-commerce (through collaboration with GoodRelations), Learning / Education (with LRMI), a Medical/health vocabulary, additions for describing technical/code and API documentation, and for improved modeling of TV/Radio content. Discussion is also underway around Sports, Forums, and numerous other topics. For each of these, the W3C Wiki is the best place to start, and to contribute. Sometimes proposers or community members will use other mailing lists, GitHub or elsewhere, but the Wiki and mailing list are the main focus of shared discussions.

You can read full details of each work-in-progress, or follow this blog for news of new vocabulary. While we will continue to extend schema.org throughout the year (e.g. we expect IPTC will complete rNews 1.1 around October), we are also well aware that we can't cover everything. SemTech gave us the chance to discuss collaboration with the Wikidata project; this should allow schema.org descriptions to draw upon the vast content of Wikipedia. This combination of the growing schema.org vocabulary with 'external enumerations' from sites like Wikipedia, alongside new syntaxes such as RDFa Lite and OData, will keep us busy over the next year, and will create exciting possibilities for search, structured data and the Web.

15 comments:

  1. I should also have mentioned that http://schema.org/docs/datamodel.html has been updated (thanks to Stéphane Corlosquet) to give an idea of how schema.org looks when written in RDFa Lite.

    1. I have tried developing an example using RDFa Lite alongside another Microdata example I developed for the draft TVRadio schema. I am missing things like how to declare datatypes such as datetime, and how to give an associated machine-interpretable date and time in addition to the human-readable value. Any advice on how this could be done in RDFa Lite?

    2. There is a specific spec in progress (tracking the evolution of HTML5) for additional details on RDFa in HTML (4 and 5). In the section Additional RDFa Processing Rules, interpretation of the @datetime attribute (available in the 'time' element of HTML5) is specified. If used, its content will be examined and if it's a valid xsd:dateTime, xsd:date or xsd:time, that datatype will be used for the extracted literal. (Also, while RDFa Lite does not include the @datatype attribute, it is still available in RDFa, so you can use that for any kind of datatyping.)

    3. In a similar case, how do examples like
      <meta itemprop="interactionCount" content="UserTweets:1203"/>
      translate to RDFa Lite? I'm missing the @content there and wouldn't want to write "UserTweets:1203" in the readable text of my document.

    4. Ossi: the content attribute is part of HTML, that's why it's not explicitly in the RDFa Lite specification. Your example can be translated to RDFa Lite simply by replacing itemprop with property:
      <meta property="interactionCount" content="UserTweets:1203" />

    5. Oh... meta. It wouldn't work with span, though, would it?

  2. Wow, that's a lot of progress :)

    Just wanted to clarify one thing. You say:

    "As implementations and services begin to consume RDFa 1.1, publishers with an interest in mixing schema.org with additional vocabularies, or who are using tools like Drupal 7, may find RDFa well worth exploring."

    There's unfortunately been a lot of confusion around the Drupal 7 implementation of RDFa and what it supports, so I want to make sure it's clear; Drupal 7 does not output RDFa 1.1, it outputs RDFa 1.0. Six months ago a project was nominally created to add RDFa 1.1 support, but there have been no commits on that project to date.

    1. The themes shipped with Drupal 7 core will always remain RDFa 1.0, but some contributed themes such as omega already have their doctype aligned to match RDFa 1.1. The RDFa project you are referring to intends to clean up some of the core markup and enable the new RDFa 1.1 features, but I must admit that I've been so busy with the RDFa WG during the past few months that I haven't been able to give it the time it deserves. Now that the spec is published, expect some commits soon.

  3. Hey, you know I'm all about teh Docbook and RDFa! Any more deets on that?

  4. I've been watching this space for 12 years now and I'm still really at a loss as to why I would add this stuff as a developer. The return on time investment has never really been articulated. What browser supports RDFa? What does RDFa do for me or for my users? What do I get straight out of the box?

  5. Any chance you guys could update the Rich Snippets tool http://www.google.com/webmasters/tools/richsnippets so that it recognizes http://schema.org/ and not just http://data-vocabulary.org/?

  8. About the GoodRelations collaboration: does it mean that, in the meantime, I need to integrate GoodRelations markup alongside the "schema.org Product" markup I've already included?
