schema blog: 2012

Thursday, November 8, 2012

Good Relations and Schema.org

Today, we are pleased to announce that we are integrating e-commerce schemas from the GoodRelations project into Schema.org. The addition of these widely adopted schemas into schema.org will make it easier for Web publishers to express structured data about products, offers, companies, stores and related facts.

Schema.org was designed as a platform where the Web community can come together and share structured schemas that improve the ability of search engines to understand the content of Web pages. Our collaboration with GoodRelations exemplifies this: GoodRelations provides a rich, well-known and widely used terminology for e-commerce data sharing. By integrating GoodRelations into schema.org, we make it easier for publishers to adopt and combine such vocabularies. Just as with IPTC rNews previously, and many other collaborations, our approach has been to bring together existing work in a way that hides multi-schema complexity behind a common data model.

Effective immediately, the GoodRelations vocabulary (http://purl.org/goodrelations/) is directly available from within the schema.org site for use with both HTML5 Microdata and RDFa. Webmasters of e-commerce sites can use all GoodRelations types and properties directly from the schema.org namespace to expose more granular information for search engines and other clients, including delivery charges, quantity discounts, and product features. Enumerated lists of values remain managed at GoodRelations URLs, following our general approach for referencing 'external enumerations'.
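
As a rough illustration of the paragraph above, the following Microdata sketch describes one offer using terms from the schema.org namespace, while the enumerated values point at GoodRelations URLs. The product name and price are invented for this example, and the particular businessFunction and acceptedPaymentMethod values are only an assumption about what a typical shop might publish.

```html
<div itemscope itemtype="http://schema.org/Product">
  <!-- Hypothetical product, used only for illustration -->
  <span itemprop="name">Example Widget</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">19.99</span>
    <meta itemprop="priceCurrency" content="USD" />
    <!-- Enumerated values stay at GoodRelations URLs ('external enumerations') -->
    <link itemprop="businessFunction" href="http://purl.org/goodrelations/v1#Sell" />
    <link itemprop="acceptedPaymentMethod" href="http://purl.org/goodrelations/v1#VISA" />
  </div>
</div>
```

The types and properties come from schema.org, and only the enumeration members are referenced by their GoodRelations identifiers.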

Integrating these schemas has involved making a few decisions, and we welcome all feedback on the approach taken here. In order to have consistent naming conventions between schema.org and GoodRelations, some terms were given new names for use within schema.org. There are also a few cases where existing schema.org vocabulary differed in terminology or 'level of detail' with GoodRelations. We will continue to improve our documentation, examples and FAQ to make clearer the new expressivity that these additions bring. But we wanted to share this progress as early as possible, since it provides an important step forward for structured data and e-commerce on the Web.

GoodRelations has been developed and maintained by Martin Hepp since 2002 and continues as an ongoing project. We look forward to seeing it reach new audiences via schema.org.

R.V. Guha

Google

Wednesday, July 11, 2012

Describing Datasets with schema.org

Earlier this year, we received a proposal for a 'Datasets' addition to schema.org, via the Web Schemas group at W3C. Based on informal conversations with various potential publishers and consumers, this work has great potential and we would like to invite interested parties to take a detailed look at the proposal, to identify any implementation issues or potential improvements. It is a small but useful vocabulary and the wiki overview also includes a summary of its relationship to related initiatives from the Linked Data community. More details and demos are also available from RPI.

As with all such proposals, it is listed in our Wiki area and we encourage comments and discussion on the public-vocabs@w3.org mailing list. As schema.org grows into more specialist areas, we are aware that we can't expect everyone to join one big mailing list. This Datasets vocabulary is especially relevant to the community around open government and public-sector data, so we want to take care to solicit comments from potential publishers. As always, comments are welcomed in the public W3C-hosted mailing list and Wiki, via blog discussions, or if you prefer, by direct email to the schema.org team.

This topic is particularly exciting due to the huge number of datasets that have been made public in recent years. While each dataset may ultimately be expressed in detailed, domain-specific form (e.g. using specific scientific or statistical schemas), the Datasets proposal focuses on the high-level common characteristics that are shared across thousands of otherwise diverse datasets.

So what are the next steps with this vocabulary? We would like to hear from publishers of such datasets, to confirm what we've been hearing anecdotally, which is that such an extension to schema.org would be useful, used and a good fit to the available metadata.

As always with schema.org, the hard work is in building and demonstrating rough consensus around a design. This week's post on the data.gov site from Chris Musialek is an important step in that direction, and we welcome comments from others that will help us move things forward. From Chris's post:

We've been watching the schema.org datasets schema space for a while now, as Data.gov is very interested in adding schema.org support for our listing of over 450,000 datasets. We think this will help the major search engines create better relevance rankings of Federal government data, where many searches begin.

We wanted to come out publicly saying that we've reviewed the current datasets schema proposal in draft, and we are comfortable with the current state of things. There is definitely work still left to do, but there seems to be pretty solid agreement on everything but the details, which seem very resolvable. At this point, if the group would solidify on the dataset proposal, then Data.gov would support and use it.

Many thanks to Chris for opening the conversation about this work. If you have feedback on any aspect of the Datasets proposal, do please share your experience...

Tuesday, June 26, 2012

Health and Medical vocabulary for schema.org

We are pleased to announce a major set of additions to schema.org that improve our coverage of health and medical topics. Although there are many existing efforts around structured data for health and medicine, such structure is today typically available only 'behind the scenes' rather than shared in the Web using standard markup. Our design goals therefore differed from many previous initiatives, in that we focused on markup for use by Webmasters and publishers. Our main goal was to create markup that will help patients, physicians, and generally health-interested consumers find relevant health information via search.

This collaborative project drew upon search expertise from the schema.org partners but also gained immeasurably through feedback from expert reviewers, including the US NCBI, physicians at Harvard, Duke and other institutions, and several health Web sites. Contributions from the W3C Healthcare and Lifesciences group and Web Schemas community also helped bridge the complex worlds of Web standards, search and medicine/healthcare.

A note on scope: the new health and medical schema additions are intended to cover both consumer- and professionally-targeted health and medical web content, so any given piece of content may use only the relevant subset of the schema. Also, we've focused on creating lightweight markup that easily surfaces key health and medical entities in web pages and captures the relationships between them. As such, we envision these additions as complementary to the many very good and comprehensive medical ontologies, meta-thesauri, and controlled vocabularies that have been created in the medical domain. When such resources are available, our proposed schema can link to and take advantage of them, e.g. via the code property of MedicalEntity. Finally, while today the additions are not aimed at supporting use cases like automated reasoning, medical records coding, or genomic tagging, these could be interesting domains for future extension.
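
To make the linking idea above concrete, here is a minimal Microdata sketch of a condition that points to an existing controlled vocabulary through the code property. It assumes the MedicalCode type with its codeValue and codingSystem properties; the condition and the ICD-9 code shown are chosen purely for illustration.

```html
<div itemscope itemtype="http://schema.org/MedicalCondition">
  <h1 itemprop="name">Angina pectoris</h1>
  <!-- The code property links this entity to an external coding system -->
  <span itemprop="code" itemscope itemtype="http://schema.org/MedicalCode">
    <meta itemprop="codeValue" content="413" />
    <meta itemprop="codingSystem" content="ICD-9" />
  </span>
</div>
```

The page itself stays lightweight, while consumers that understand the coding system can follow the code to the fuller ontology.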

The Web contains a wealth of information on health and medicine and we hope this contribution will make it easier for users (whether patients, consumers, physicians or family members) to make the most of the information that is shared in the Web. For interested parties we have prepared a more detailed overview document. As with all schema.org vocabulary, we will continue to evolve the schema and welcome your feedback, suggestions and implementation experience here, via W3C, or by mail.

-- Aaron Brown, Google
-- C. Michael Gibson, MD, Wikidoc

Monday, June 11, 2012

New Vocabularies for Technical Publishing

Three new vocabularies have been proposed as the result of a collaborative effort by several technology companies. They are specifically for use with Technical Articles, API reference documentation, and Code.

These proposed vocabularies will improve search engines’ understanding of documentation with technical content, and thus greatly increase the discoverability of this documentation.

The following snippets highlight the potential of these new vocabularies.

TechArticle

Informs which product version the content is referring to

This content is for version 4; and the current version is 4.5.

Informs where to get more information on the overall concept

This content on “Hyper-V Server 8 Beta” is about the broader concept of virtualization:

<meta itemprop="url" content="http://technet.microsoft.com/en-US/virtualization"/>

</span>

Maps content to the audience’s intent

This is content that describes how to do something:

itemprop="genre" content="How-to"

This content describes steps for troubleshooting:

itemprop="genre" content="Troubleshooting"

APIReference

Disambiguates version and usage

This content refers to a managed assembly:
itemprop="programmingModel" content="Managed"
itemprop="assembly" content="mscorlib.dll" />

Defines platform category

This reference documentation applies to the phone platform:

itemprop="aboutProduct" content=".Net Framework 4.5"

itemprop="targetPlatform" content="phone"

This reference documentation applies to the desktop platform:

itemprop="aboutProduct" content=".Net Framework 4.5"

itemprop="targetPlatform" content="desktop"
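
The attribute fragments above are shown without their enclosing elements. One way they might fit together is sketched below; the wrapping div, the itemtype URL and the use of meta elements are assumptions made for this illustration, not part of the proposal text.

```html
<!-- Hypothetical wrapper: an API reference page for a managed assembly -->
<div itemscope itemtype="http://schema.org/APIReference">
  <meta itemprop="programmingModel" content="Managed" />
  <meta itemprop="assembly" content="mscorlib.dll" />
  <meta itemprop="targetPlatform" content="desktop" />
</div>
```

Using meta elements keeps the values machine-readable without changing what the reader sees on the page.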

Code

Defines section of content as sample code

This Code is a C++ sample inserted in an article:

C++

</div>

This is a full Visual Studio solution in an MSDN Code Gallery:

</div>

We would like this community’s feedback concerning the above proposals.

Thanks!

Charlie Jiang and Kenley Lamaute

Thursday, June 7, 2012

SemTech, RDFa, Microdata and more...

Schema.org was launched a year ago. This week several of the schema.org team returned to the SemTechBiz conference for a panel to discuss where we are, and where we're going.

Schema.org is all about shared vocabulary, rather than any specific markup encoding. As we reported last year, the RDFa Working Group have been working hard to address feedback from schema.org and others. Yesterday's panel gave us the chance to be the first to welcome W3C's announcement that RDFa 1.1 is now a full W3C recommendation. This new standard, in particular the RDFa Lite specification, brings together the simplicity of Microdata with improved support for using multiple schemas together.

What does this mean for schema.org? We want to say clearly that we continue to support Microdata, and in particular those who have championed the adoption of Microdata over the last year. Billions of pages now use schema.org markup thanks to these early adopters, and Microdata continues to be a fine way to publish and share structured data. Our approach is "Microdata and more". As implementations and services begin to consume RDFa 1.1, publishers with an interest in mixing schema.org with additional vocabularies, or who are using tools like Drupal 7, may find RDFa well worth exploring.

Beyond Microdata and RDFa in HTML, the SemTechBiz conference covered numerous other ways of sharing schema.org structured data. Examples included JSON-LD, the use of schema.org with DocBook XML (via RDFa), and W3C's relational database mapping technology.

We are also pleased to announce today a discussion paper on the use of OData and Schema.org, posted in the Web Schemas wiki. OData defines a RESTful interface for working with data on the Web. The newest version of OData allows service developers and third parties to annotate data or metadata exposed by an OData Service. Defining common OData Vocabulary encodings of the schema.org schemas facilitates the understanding and even transformation of data across these different encodings.

But what of the schema itself? The largest change so far was the integration of the IPTC/rNews vocabulary. Building on this model, we have been encouraging public collaboration, discussion and debate on schemas via the W3C Web Schemas community. Alongside the addition of JobPosting and numerous small improvements and fixes, including a new Comment type and a more detailed schema for SoftwareApplication, we have been preparing for a '1.0' release later this month. We maintain a public list of proposals under community discussion, and will typically incorporate vocabulary when we see a combination of interest from major publishers and consumers alongside rough consensus on the schema design.

The schema.org 1.0 vocabulary is expected to include substantial additions including support for genealogy (via historical-data.org), e-commerce (through collaboration with Good Relations), Learning / Education (with LRMI), a Medical/health vocabulary, additions for describing technical/code and API documentation, and for improved modeling of TV/Radio content. Discussion is also underway around Sports, Forums, and numerous other topics. For each of these, the W3C Wiki is the best place to start, and to contribute. Sometimes proposers or community members will use other mailing lists, Github or elsewhere, but the Wiki and mailing list are the main focus of shared discussions.

You can read full details of each work-in-progress, or follow this blog for news of new vocabulary. While we will continue to extend schema.org throughout the year (e.g. we expect IPTC will complete rNews 1.1 around October), we are also well aware that we can't cover everything. SemTech gave us the chance to discuss collaboration with the Wikidata project; this should allow schema.org descriptions to draw upon the vast content of Wikipedia. This combination of the growing schema.org vocabulary with 'external enumerations' from sites like Wikipedia, alongside new syntaxes such as RDFa Lite and OData, will keep us busy over the next year, and will create exciting possibilities for search, structured data and the Web.

Friday, May 11, 2012

Schema.org markup for external lists

The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to find a balance, by providing a core schema that covers lots of situations, alongside extension mechanisms for extra detail. There are many situations where the use of existing controlled vocabularies, standards and datasets would improve schema.org markup. This is the role of the schema.org "external enumerations" mechanism.

We introduce "external enumerations" with a simple example - countries - and encourage implementors to join the schema.org community in W3C's 'Web Schemas' group where the full details are being discussed.

Each schema.org type (such as Person, PostalAddress) is associated with a set of properties, such as "nationality", "addressCountry". In turn, each property has one or more expected types; in this case, both the "nationality" of a Person and the "addressCountry" of a PostalAddress expect to have a Country value. Rather than adding large lists of specific countries to schema.org, we encourage the use of external lists. We will publish a set of well-known authority lists, linked to the types and properties they are used with. To get started, we take simple Wikipedia links as an example of such an authority. Other more specialist examples (such as IPTC codes) will follow.

Taking our existing Movie example in Microdata, let's add nationality details for one of the actors. To do this, we simply add a link:

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span itemprop="description">Jack Sparrow and Barbossa embark on[...]</span>
  <div itemprop="actor" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Johnny Depp</span>
    <!-- External enumeration: the Wikipedia URL identifies the Country value -->
    <link itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
  </div>
</div>

Here we use 'http://en.wikipedia.org/wiki/United_States' to stand for the specific country. Other authorities also publish useful structured data about countries and have stable URLs that could be used. For example, we could use the UN FAO's GeoPolitical Ontology, and their URL for the USA. From a schema.org perspective, we do not take account of any types and properties defined by these external sites, since it is important to support a variety of quite different authority lists, which often have different ways of modeling things. Each external authority essentially supplies a set of URI/URL item identifiers that can be dropped into schema.org markup.

We've shown here the use of Wikipedia links for identifying members of the Country type. Take a look at the detailed document for discussion on how to use this with Microdata's 'itemid' attribute, if you want to describe the Country (or other object) in further detail. The W3C wiki also gives other examples, and shows how the markup would look in RDFa Lite.
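
For readers who want a feel for the RDFa Lite form mentioned above, the following is a minimal sketch of the same Movie markup using the vocab, typeof and property attributes. It is an approximation written for this post and may differ in detail from the version in the W3C wiki.

```html
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <div property="actor" typeof="Person">
    <span property="name">Johnny Depp</span>
    <!-- The Wikipedia URL again stands for the Country value -->
    <link property="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
  </div>
</div>
```

The external identifier is used in exactly the same way as in Microdata; only the attribute names change.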

While there are more details to work out as we start to apply this idea across schema.org, we wanted to share this initial example. The basic idea is very simple: everywhere in schema.org where external lists will help, we will need to have a specific schema.org type (like Country), for which the external authority supplies identifiers. In some cases, we will have to add new types to support this. Beyond the basics presented here, there are various technical details of syntax, discussion of exactly which authorities and URI identifiers to use, and so on. We welcome suggestions (here or via the Web Schemas group) for existing enumerations that would be useful additions, and feedback on the general approach.