Wednesday, June 2, 2021

Schema.org is ten!

Schema.org was announced ten years ago this week! Without getting all emotional, this feels like an appropriate moment to thank everyone who has contributed to this effort in any way - from publishers, technologists and standards experts to those who build applications that use the markup, or who work on publishing it from sites across the web. Some but not all of you are named in our documentation (about us; release notes), but the truth is that the success of Schema.org comes in large part from the millions of sites that have adopted it as a practical way to communicate the meaning of their content using structured data. Schema.org was founded on the idea of making it easier and simpler for the ordinary, everyday sites that make up the web to use machine-readable data, and for that data to enable an ecosystem of applications used by millions of people. While it's hard to predict exactly what the next decade will bring, if we can all keep these founding concerns in mind as we improve, refine and curate our growing collection of schemas, we'll be doing our part to continue improving the web.

Dan & Guha

Tuesday, May 11, 2021

Announcing Schema Markup Validator (beta)

Announcing preview availability of the Schema Markup Validator for review and feedback.

As agreed last year, Schema.org is the new home for the structured data validator previously known as the Structured Data Testing Tool (SDTT). It is now simpler to use, and available for testing. Schema.org will integrate feedback into its draft documentation and add it more explicitly to the Schema.org website for the next official release.

SDTT is a tool from Google which began life as the Rich Snippets Testing Tool back in 2010. Last year Google announced plans to migrate from SDTT to successor tooling, the Rich Results Test, alongside plans to "deprecate the Structured Data Testing Tool". The newer Google tooling is focused on helping publishers who are targeting specific search features offered by Google, and for these purposes is a huge improvement as it contextualizes many warnings and errors to a specific target application.

However, many publishers had also appreciated SDTT as a powerful and general purpose structured data validator. Headlines such as "Google Structured Data Testing Tool Going Away; SEOs Are Not Happy" captured something of the mood.

Schema.org started out written only in Microdata, before embracing RDFa 1.1 Lite and JSON-LD 1.0. There are now huge amounts of Schema.org data in all of these formats and more (see webdatacommons report). Schema.org endorsed these multiple encodings, because they can each meet different needs and constraints experienced by publishers. The new validator will check all of these formats.

Amongst all this complexity, it is important to remind ourselves of the importance of simplicity and usability of Schema.org markup for its founding purpose: machine-readable summaries of ordinary web page content. Markup that - when well-formed - helps real people find jobs, educational opportunities, images they can re-use, learn from fact checkers or find a recipe to cook for dinner.

This is the focus of the new Schema Markup Validator (SMV). It is simpler than its predecessor SDTT because it is dedicated to checking that you're using JSON-LD, RDFa and Microdata in widely understood ways, and to warning you if you are using types and properties in unusual combinations. It does not try to check your content against the information needs of specific services, tools or products (a topic deserving its own blog post). But it will help you understand whether or not your data expresses what you hope it expresses, and to reflect the essence of your structured data back in an intuitive way that reflects its underlying meaning.
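To give a rough feel for the kind of check described above, here is a minimal Python sketch that pulls JSON-LD blocks out of a page and flags two basic problems (unparseable JSON, items without an @type). It is an illustration only and says nothing about how the Schema Markup Validator is actually implemented; the names JsonLdExtractor and basic_checks and the sample page are hypothetical, and Microdata and RDFa are not covered.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def basic_checks(html):
    """Return a list of human-readable problems found in a page's JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    problems = []
    if not parser.blocks:
        problems.append("no JSON-LD block found")
    for i, raw in enumerate(parser.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append(f"block {i}: invalid JSON ({err.msg})")
            continue
        # A block may hold a single item or a list of items.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or "@type" not in item:
                problems.append(f"block {i}: item has no @type")
    return problems


# Hypothetical sample page used only for this illustration.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Lentil soup"}
</script>
</head><body></body></html>"""

print(basic_checks(PAGE))
```

A real validator goes much further, for example by checking that properties are used with types they are defined for.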

The service is powered by Google's general infrastructure for working with structured data, and is provided to the Schema.org project as a Google-hosted tool. We are also happy to note that many other Schema.org validators are available, both commercial (e.g. Yandex's) and open source (for example, the Structured Data Linter, JSON-LD Playground, SDO-Check and Schemarama tools). We hope that the new Schema Markup Validator will stimulate collaboration among tool makers to improve consistency and developer experience for all those working on systems that consume Schema.org data.

Please share any feedback with the Schema.org community via GitHub, Twitter (#schemasmv), or the W3C community group.

Tuesday, December 1, 2020

Modernizing US health provider sites to improve directory accuracy

The world's attention has turned to healthcare this year, with many initiatives exploring the use of open data and standards. Schema.org has made a number of efforts already to contribute to the global Coronavirus response, including the creation of SpecialAnnouncement markup, improvements around events, jobs, hospital reporting, and other schemas to reflect our changed reality.

This week we have invited longstanding collaborator Aneesh Chopra to provide an introduction to some important developments in the United States, where Schema.org is being used to improve the accuracy of information about healthcare provider directories.

Guest post by Aneesh Chopra, former U.S. CTO (2009-2012) and President/Co-Founder of CareJourney:

Just over 9 years ago, the Schema.org community launched a markup for JobPostings, an important resource to meet a call to action in helping veterans find jobs that valued their skills. During the early months of the pandemic, this community responded with an important upgrade to the nation’s health IT infrastructure to democratize access to more trusted health information online.

Today, at an API Summit hosted by the Office of the National Coordinator for Health IT, Kathy Hempstead of the Robert Wood Johnson Foundation announced an open collaboration that builds upon the same regulations and industry standards to improve another important aspect of the consumer health navigation experience - searching for timely, accurate provider directory information.

For many Americans, finding a health plan that includes their trusted providers is critically important, but often requires tedious work looking up each plan’s provider directory. Sadly, as CMS found in a recent review, nearly 50% of provider directories contained inaccuracies regarding whether the provider was accepting new patients, practicing at the address listed, or reachable via the listed phone number. 

Regulators have attempted to solve these problems by imposing penalties on government-sponsored plans for inaccurate information, but an additional solution may be at hand. A provision embedded in CMS’ interoperability regulations requires government-sponsored health plans to publish machine-readable access to timely directory information by July 2021. In an effort to reduce administrative burdens, a multi-stakeholder collaborative is looking to both improve the quality of physician websites to include this information and to enable health plans to source timely, accurate information from them to comply with the rules.

Similar to the work that was done to make it easier for consumers to find COVID announcements on physician websites, such as testing availability, revised office hours or telemedicine services, this collaborative will work to standardize how to publish structured provider directory information. To further simplify the search experience, providers can now publish their website URL when updating their “digital contact information” on CMS’ NPPES NPI Registry.
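As a hedged illustration of what structured provider directory information on a practice website might look like, the Python sketch below builds a JSON-LD record from existing Schema.org terms (Physician, PostalAddress, isAcceptingNewPatients). The markup the collaborative eventually recommends is not specified in this post and may differ; the helper name physician_markup and all values are placeholders.

```python
import json


def physician_markup(name, url, telephone, address, accepting_new_patients):
    """Build a JSON-LD description of one physician practice location."""
    return {
        "@context": "https://schema.org",
        "@type": "Physician",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {"@type": "PostalAddress", **address},
        "isAcceptingNewPatients": accepting_new_patients,
    }


# Placeholder values for illustration only.
record = physician_markup(
    name="Example Family Practice",
    url="https://practice.example.com/",
    telephone="+1-555-0100",
    address={
        "streetAddress": "1 Example Street",
        "addressLocality": "Springfield",
        "addressRegion": "VA",
        "postalCode": "00000",
    },
    accepting_new_patients=True,
)

# The output would be embedded in the page inside a
# <script type="application/ld+json"> element.
print(json.dumps(record, indent=2))
```

The three fields chosen here mirror the inaccuracies CMS reported: whether new patients are accepted, the practice address, and the phone number.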

Adding structured data from a provider’s website to the portfolio of tools health plans use today – including plan-agnostic reporting tools, “secret shopper” visits, mailings, and a number of emerging data-driven solutions – should result in a reduction in the administrative burden of updating physician directories. Directory maintenance is burdensome. The average practice has over 20 health plan contracts and directories to maintain with over 50% of these updates being conducted via phone or fax. According to a 2019 CAQH (Council for Affordable Quality Healthcare®) report, the average practice spends $1,000 each month for directory maintenance. Yet at the same time, physician practices find value in online marketing as a way to attract new patients, and invest an average of $650 per month to design websites and optimize search results, according to a study by Zocdoc. A web standardization effort will therefore have multiple benefits. It will allow physicians to more efficiently communicate useful information via search engines that can also be used to populate health plan directories and to meet regulatory compliance, thus reducing administrative burdens. 

CareJourney, with support from The Robert Wood Johnson Foundation, seeks to engage public and private sector stakeholders in an effort to accelerate the development and adoption of web standards for physician information, and to curate a portfolio of tools to structure this information on a practice’s website. Our goal is to improve consumer access to provider information while lowering physician burden. We anticipate the following benefits: 
  • Increased consumer access to high-quality, accurate provider information, such as whether a doctor, practicing at this location, is seeing new patients from my plan.
  • Consistent webpage documentation and maintenance practices that are sufficient to meet health plan regulatory requirements.
  • Improved search engine results by leveraging the structured website markup.
This effort will benefit from the active participation of the healthcare community and we welcome additional participants to play a part in our initiative. Assistance in testing and providing feedback on the proposed web standards will be critical and extremely helpful in further promotion and adoption. Once the resulting open information and markup instructions are freely available, we welcome assistance in the widespread dissemination. Finally, we are grateful the prominent search engines are engaged in a process for site maintenance that ensures physicians keep their websites properly structured at the lowest possible administrative burden.

Thank you, in advance, for your interest in advancing this important work! Please sign up here to participate!

Monday, April 6, 2020

COVID-19 schema for CDC hospital reporting

The COVID-19 pandemic requires various medical and government authorities to aggregate data about available resources from a wide range of medical facilities. Clearly, standard schemas for this structured data can be very useful.

The Centers for Disease Control (CDC) in the U.S. defined a set of data fields to facilitate exchange of this data. We are introducing a Schema.org representation of these data fields.

The purpose of this schema definition is to provide a standards-based representation that can be used to encode and exchange records that correspond to the CDC format, with usage within the U.S. primarily in mind. While the existence of this schema may provide additional implementation options for those working with US hospital reporting data about COVID-19, please refer to the CDC and other appropriate bodies for authoritative guidance on the latest reporting workflows and data formats.

Depending upon context, any of the formats and standards that work with Schema.org may be applicable for encoding this data, including the Microdata, RDFa and JSON-LD data formats, as well as related technologies such as W3C SPARQL for data query. JSON-LD is in most cases likely to be the most appropriate format. There is no assumption that data encoded using this schema should necessarily be published on the public Web, nor that it would be used by search engines.
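For readers who want a concrete picture of a JSON-LD encoding, here is a minimal Python sketch of a single record. This post does not name the terms, so the type name CDCPMDRecord and the cvd-prefixed property names are assumptions that should be checked against the release notes; the helper cdc_record and all values are invented placeholders, not real reporting data.

```python
import json
from datetime import datetime, timezone


def cdc_record(facility_id, county, collected_at, beds, icu_beds):
    """Encode one hospital report as a JSON-LD dictionary.

    Assumed term names (verify against the Schema.org release notes):
    CDCPMDRecord, cvdFacilityId, cvdFacilityCounty, cvdCollectionDate,
    cvdNumBeds, cvdNumICUBeds.
    """
    return {
        "@context": "https://schema.org",
        "@type": "CDCPMDRecord",
        "cvdFacilityId": facility_id,
        "cvdFacilityCounty": county,
        "cvdCollectionDate": collected_at.isoformat(),
        "cvdNumBeds": beds,
        "cvdNumICUBeds": icu_beds,
    }


# Placeholder values for illustration only.
record = cdc_record(
    facility_id="EXAMPLE-0001",
    county="Example County",
    collected_at=datetime(2020, 4, 6, 12, 0, tzinfo=timezone.utc),
    beds=120,
    icu_beds=16,
)

print(json.dumps(record, indent=2))
```

Because the result is plain JSON, such records can be exchanged between systems without ever being published on the public Web.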

We will continue to improve this vocabulary in the light of feedback, and welcome suggestions for improvements and additions particularly from US healthcare organizations who are using it. This CDC-based vocabulary follows other recent changes we have made to Schema.org. For details of recent changes see our release notes and our previous post announcing the SpecialAnnouncement markup, which is now supported at both Bing (blog, docs) and Google (blog, docs). As the global response to COVID-19 evolves we will do our best to improve Schema.org's vocabularies to represent the changes that Coronavirus is bringing to society, and to assist those using structured data to help with the response.

Monday, March 16, 2020

Schema for Coronavirus special announcements, Covid-19 Testing Facilities and more

The COVID-19 pandemic is causing a large number of “Special Announcements” pertaining to changes in schedules and other aspects of everyday life. This includes not just closure of facilities and rescheduling of events but also new availability of medical facilities such as testing centers.

We have today published Schema.org 7.0, which includes fast-tracked new vocabulary to assist the global response to the Coronavirus outbreak.

It includes a "SpecialAnnouncement" type that provides for simple date-stamped textual updates, as well as markup to associate the announcement with a situation (such as the Coronavirus pandemic), and to indicate URLs for various kinds of update such as school closures, public transport closures, quarantine guidelines, travel bans, and information about getting tested.
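A minimal Python sketch of such an announcement in JSON-LD follows. The property names used for the update URLs (schoolClosuresInfo, publicTransportClosuresInfo, quarantineGuidelines, travelBans, gettingTestedInfo) are taken from the published SpecialAnnouncement definition rather than from this post, so please check them against the current schema; the helper special_announcement and the example values are hypothetical.

```python
import json
from datetime import date

# URL-valued properties corresponding to the kinds of update listed above.
INFO_PROPERTIES = {
    "schoolClosuresInfo",
    "publicTransportClosuresInfo",
    "quarantineGuidelines",
    "travelBans",
    "gettingTestedInfo",
}


def special_announcement(name, text, posted, **info_urls):
    """Build a date-stamped SpecialAnnouncement with optional info URLs."""
    unknown = set(info_urls) - INFO_PROPERTIES
    if unknown:
        raise ValueError(f"unsupported properties: {sorted(unknown)}")
    doc = {
        "@context": "https://schema.org",
        "@type": "SpecialAnnouncement",
        "name": name,
        "text": text,
        "datePosted": posted.isoformat(),
    }
    doc.update(info_urls)
    return doc


# Placeholder values for illustration only.
announcement = special_announcement(
    name="Schools closed until further notice",
    text="All district schools are closed starting Monday.",
    posted=date(2020, 3, 16),
    schoolClosuresInfo="https://district.example.org/closures",
    gettingTestedInfo="https://health.example.org/testing",
)

print(json.dumps(announcement, indent=2))
```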

Many new testing facilities are being rapidly established worldwide, to test for COVID-19. Schema.org now has a CovidTestingFacility type to represent these, regardless of whether they are part of long-established medical facilities or temporary adaptations to the emergency.

We are also making improvements to other areas of Schema.org to help with the worldwide migration to working online and working from home, for example by helping event organizers indicate when an event has moved from having a physical location to being conducted online, and whether the event's "eventAttendanceMode" is online, offline or mixed.
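The Python sketch below shows one way an organizer's tooling might update existing event markup when an event moves online. It assumes the enumeration values OnlineEventAttendanceMode and EventMovedOnline and the VirtualLocation type from the published vocabulary (only eventAttendanceMode is named in this post); the helper move_event_online and the sample event are hypothetical.

```python
import copy
import json

SCHEMA = "https://schema.org/"


def move_event_online(event, stream_url):
    """Return a copy of the event marked as moved online."""
    updated = copy.deepcopy(event)
    updated["eventAttendanceMode"] = SCHEMA + "OnlineEventAttendanceMode"
    updated["eventStatus"] = SCHEMA + "EventMovedOnline"
    # Replace the physical venue with a virtual location.
    updated["location"] = {"@type": "VirtualLocation", "url": stream_url}
    return updated


# Placeholder event for illustration only.
original = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Community Meetup",
    "startDate": "2020-04-20T18:00:00+00:00",
    "eventAttendanceMode": SCHEMA + "OfflineEventAttendanceMode",
    "location": {"@type": "Place", "name": "Example Hall"},
}

online = move_event_online(original, "https://meetup.example.org/live")
print(json.dumps(online, indent=2))
```

An event offered both in person and online would instead use the mixed attendance mode and list both locations.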

We will continue to improve this vocabulary in the light of feedback (GitHub; doc), and welcome suggestions for improvements and additions particularly from organizations who are publishing such updates.

Dan Brickley, R.V.Guha, Google.
Tom Marsh, Microsoft.

Wednesday, January 22, 2020

Schema.org 6.0

Schema.org version 6.0 has been released. As always, the release notes have full details and links (including previous releases e.g. 5.0 and 4.0).

We are now aiming to release updated schemas on an approximately monthly basis (with longer gaps around vacation periods). Typically, new terms are first added to our "Pending" area to give time for the definitions to benefit from implementation experience before they are added to the "core" of Schema.org.

As always, many thanks to everyone who has contributed to this release of Schema.org.

Dan Brickley, for Schema.org

Tuesday, April 2, 2019

Schema.org 3.5: Simpler extension model, projects, grants and funding schemas, and new terms for describing educational and occupational credentials

Schema.org version 3.5 has been released. This release moves a number of terms from the experimental "Pending" area into the core. It also simplifies and clarifies the extension model, reducing our emphasis on using named subdomains for topical groups of schemas. New terms introduced in the Pending area include improvements for describing projects, grants and funding agencies; for describing open-ended date ranges (e.g. datasets); and a substantial vocabulary for Educational and Occupational Credentials. Many thanks to all who contributed!

Wednesday, May 2, 2018

Data Commons and Schema.org

Over the past few years we have seen a number of application areas benefit from Schema.org markup. Schema.org discussions have often centered around the importance of ease of use, simplicity and adoption for publishers and webmasters. While those principles will continue to guide our work, it is also important to work to make it easier to consume structured data, by building applications and making more use of the information it carries. We are therefore happy to welcome the new Data Commons initiative, which is devoted to sharing such datasets, beginning with a corpus of fact check data based on the ClaimReview markup as adopted by many fact checkers around the world. We expect that this work will benefit the wider ecosystem around structured data by encouraging use and re-use of related datasets.
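To illustrate the consuming side, here is a small Python sketch that summarizes a handful of ClaimReview records. It assumes the records are already available as JSON-LD dictionaries using the claimReviewed, reviewRating and url properties; it does not reflect the actual format or API of the Data Commons corpus, and the helper name and sample data are invented for the example.

```python
def summarize_claim_reviews(items):
    """Return (claim, verdict, url) tuples for each ClaimReview in items."""
    rows = []
    for item in items:
        if item.get("@type") != "ClaimReview":
            continue  # skip anything that is not a fact check
        rating = item.get("reviewRating", {})
        rows.append(
            (item.get("claimReviewed"), rating.get("alternateName"), item.get("url"))
        )
    return rows


# Invented sample records for illustration only.
corpus = [
    {
        "@type": "ClaimReview",
        "url": "https://factcheck.example.org/moon-cheese",
        "claimReviewed": "The moon is made of cheese",
        "reviewRating": {"@type": "Rating", "alternateName": "False"},
    },
    {"@type": "NewsArticle", "headline": "Unrelated story"},
]

for claim, verdict, url in summarize_claim_reviews(corpus):
    print(f"{verdict}: {claim} ({url})")
```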

Tuesday, August 29, 2017

Schema.org 3.3: News, fact checking, legislation, finance, schedules, howtos, tourism and toilets!

Schema.org 3.3 has been released. As always, the release was prepared, debated and finalized by the Schema.org community group, and features a range of additions, adjustments, bugfixes and clarifications to improve the expressiveness and usability of our schemas.

See the release notes for full details, but of particular note are some changes made around the NewsArticle type (in collaboration with the Trust Project on whose work this is largely based). For many years, our definition of NewsArticle was simply "a news article". With this release we add (via our "pending" mechanism) some more subtlety around News, making it possible to mark up categories of news including opinion pieces, background articles and reportage, as well as introducing types for satirical and advertiser content. We also add properties that encourage greater transparency around News creation and publication. These are flagged as "pending" to emphasize that early adopter feedback on the new vocabulary is particularly welcomed, via GitHub, the W3C group, or the site's feedback form. These developments complement our earlier work to support interoperability amongst fact-checking sites via the ClaimReview type. Following discussion at the GlobalFact4 conference, we have also amended the definition of the "expires" property to highlight its applicability to fact checking content.

Other highlights of 3.3 include new terminology (also pending implementor feedback) for describing legislation, based on the European Legislation Identifier (ELI) ontology and the work of the ELI taskforce. We have also added an overview page giving more details on our finance-related terminology, contributed by the FIBO community, alongside a proposed design for describing schedules, new subtypes distinguishing user from critic reviews, and a generalization of our recipes schema called "HowTo" for recipe-like tasks that don't result in food. We've also added types for TouristAttraction and for PublicToilet...