Thursday, December 9, 2021

Enriching Claim Reviews - Sharing Experience From Factchecking

ClaimReview is one of the hidden jewels that can be found amongst schema.org's 2700+ definitions. To kick off a series of guest posts around this topic, we bring you a post from two members of the factchecking and open data community who have been exploring the potential of this work.

Enriching ClaimReview for fact checkers

Andrew Dudfield, Leigh Dodds

To help fight bad information, search engines, social media platforms and other sources of online content are increasingly highlighting the work of fact checkers by labelling posts with incorrect, misleading or harmful information.

This is often enabled through the publication of structured data that uses schema.org's ClaimReview schema. ClaimReview defines a simple approach to tagging fact-checking articles that has been collaboratively developed and documented through schema.org. For more background for a journalistic and factchecking audience, see the Duke Reporters' Lab's dedicated site.
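To make this concrete, here is a minimal sketch of a ClaimReview record assembled as JSON-LD in Python. The URLs, names, claim text and verdict are all invented for illustration; consult schema.org and the search engines' structured data documentation for the exact profiles they expect.

```python
import json

# A minimal ClaimReview example, assembled as JSON-LD.
# All URLs, names and the claim itself are hypothetical.
claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example.org/fact-checks/moon-cheese",  # the fact-check article
    "claimReviewed": "The moon is made of cheese",         # the claim being checked
    "author": {"@type": "Organization", "name": "Example Fact Checkers"},
    "datePublished": "2021-12-01",
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 1,
        "bestRating": 5,
        "alternateName": "False",                          # textual verdict
    },
}

# Publishers typically embed this in a <script type="application/ld+json"> block.
print(json.dumps(claim_review, indent=2))
```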

At Full Fact we have recently been exploring ways to revise and extend the claim review metadata to provide more detail that might enable further reuse and labelling of content, and further insights into the fact checking process.

Our investigation has covered the following areas:

  • Current scope and application of ClaimReview. The currently defined model provides a lot of flexibility around how much detail might be included in a review, with Google, Bing and others recommending specific profiles of it in their structured markup documentation. Full Fact has adopted its own approach to applying the standard; we have reflected on that experience and on the current recommendations from Google, Bing and other adopters, in pursuit of a Full Fact profile that is close to those used elsewhere.

  • Identifiers and linking. How can additional links and identifiers be included in ClaimReview markup to support disambiguation and aggregation of data? We've concluded that schema.org already provides useful properties which, if applied consistently, can support this goal. In line with its growing role as a clearing house for identifiers, we believe Wikidata would be a useful common target for linking data across the community, e.g. by making use of schema.org's sameAs property. Full Fact articles now contain experimental markup in this direction.

  • Enriching Claims. The standard was revised to include the notion of a Claim: a statement made by an author that appears in one or more locations. We believe this is an important part of the data model and one that should be more widely adopted. Consistent use of Claim markup would help to clearly indicate situations in which a person or organisation is repeatedly making the same claim, or where others are repeating the same misinformation. In addition to enriching claims with author and appearance information, the addition of topic information would provide a useful dimension to the data, helping to surface related claims and fact checks. 

  • Corrections and actions. The second wave of fact checking is about more than just writing fact checks; it involves taking action to tackle disinformation. How might we surface data about requests for corrections to published content, and record when those corrections have been made? Schema.org currently includes some vocabulary to help describe corrections and comments, which we've explored. But further work is needed to define a useful way of recording and sharing the other activities undertaken by fact checkers.

  • Citing evidence. Finally, citing evidence is an essential part of performing a fact check, so how can we use existing vocabulary to help surface the key resources, papers and datasets that were used in producing a fact check?
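The directions above can be sketched in a single enriched record: a nested Claim with an author linked to Wikidata via sameAs, appearance information, topic markup, and cited evidence. This is our own illustrative sketch, not a normative profile; every URL and identifier below is a hypothetical example.

```python
import json

# Sketch of an enriched ClaimReview along the lines discussed above.
# All URLs and Wikidata identifiers are hypothetical placeholders.
enriched = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example.org/fact-checks/some-check",
    "itemReviewed": {
        "@type": "Claim",
        "author": {
            "@type": "Person",
            "name": "A. Politician",
            # Shared identifier to support disambiguation and aggregation
            "sameAs": "https://www.wikidata.org/wiki/Q42",
        },
        # Where the claim appeared; the same claim may appear in many places
        "appearance": [
            {"@type": "CreativeWork", "url": "https://example.org/speech-transcript"},
            {"@type": "CreativeWork", "url": "https://example.org/news-article"},
        ],
        # Topic information to help surface related claims and fact checks
        "about": {"@type": "Thing", "sameAs": "https://www.wikidata.org/wiki/Q12345"},
    },
    # Key evidence used in producing the fact check
    "citation": ["https://example.org/official-statistics"],
}
print(json.dumps(enriched, indent=2))
```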

Building on the open, collaborative approach that the community has taken so far, we have published our research notes that explore these questions in more detail and present some early recommendations. We welcome feedback from anyone working to tackle online disinformation.

We would also like to propose the creation of an informal mailing list or community group to support ongoing discussion and experimentation, e.g. hosted at W3C alongside the broader schema.org Community Group.

Our joint goal would be to create proposals for further enhancing the current model, as necessary, and documenting useful patterns of applying the current model to real-world scenarios including both tagging content and developing APIs.

Wednesday, June 2, 2021

Schema.org is ten!

Schema.org was announced ten years ago this week! Without getting all emotional, this feels like an appropriate moment to thank everyone who has contributed to this effort in any way - from publishers, technologists and standards experts to those who build applications that use the markup, or who work on publishing it from sites across the web. Some but not all of you are named in our documentation (about us; release notes); but the truth is that the success of schema.org comes in large part from the millions of sites that have adopted it as a practical way to communicate the meaning of their content using structured data. Schema.org was founded on the idea of making it easier and simpler for the ordinary, everyday sites that make up the web to use machine-readable data, and for that data to enable an ecosystem of applications used by millions of people. While it's hard to predict exactly what the next decade will bring, if we can all keep these founding concerns in mind as we improve, refine and curate our growing collection of schemas, we'll be doing our part to continue improving the web.

Dan & Guha

Tuesday, May 11, 2021

Announcing Schema Markup Validator: validator.schema.org (beta)

Announcing preview availability of validator.schema.org for review and feedback.

As agreed last year, validator.schema.org is the new home for the structured data validator previously known as the Structured Data Testing Tool (SDTT). It is now simpler to use, and available for testing. We will integrate feedback into its draft documentation and add it more explicitly to the schema.org website for the next official release.

SDTT is a tool from Google which began life as the Rich Snippets Testing Tool back in 2010. Last year Google announced plans to migrate from SDTT to successor tooling, the Rich Results Test, alongside plans to "deprecate the Structured Data Testing Tool". The newer Google tooling is focused on helping publishers who are targeting specific search features offered by Google, and for these purposes is a huge improvement as it contextualizes many warnings and errors to a specific target application.

However, many publishers had also appreciated SDTT as a powerful and general purpose structured data validator. Headlines such as "Google Structured Data Testing Tool Going Away; SEOs Are Not Happy" captured something of the mood. Schema.org started out with Microdata only, before embracing RDFa 1.1 Lite and JSON-LD 1.0. There are now huge amounts of schema.org data in all of these formats and more (see the webdatacommons report). Schema.org endorsed these multiple encodings because they can each meet different needs and constraints experienced by publishers. The new validator will check all of these formats.

Amongst all this complexity, it is important to remind ourselves of the importance of simplicity and usability of schema.org markup for its founding purpose: machine-readable summaries of ordinary web page content. Markup that - when well-formed - helps real people find jobs, educational opportunities and images they can re-use, learn from fact checkers, or find a recipe to cook for dinner.

This is the focus of the new Schema Markup Validator (SMV). It is simpler than its predecessor SDTT because it is dedicated to checking that you're using JSON-LD, RDFa and Microdata in widely understood ways, and to warning you if you are using types and properties in unusual combinations. It does not try to check your content against the information needs of specific services, tools or products (a topic deserving its own blog post). But it will help you understand whether or not your data expresses what you hope it expresses, and to reflect the essence of your structured data back in an intuitive way that reflects its underlying meaning.
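To give a feel for the kind of check the paragraph above describes, warning about "unusual combinations" amounts to comparing each property an item uses against the properties the vocabulary defines for that type. The following is our own toy sketch under that assumption, not the validator's actual implementation; a real checker would load the full schema.org vocabulary, including inherited properties.

```python
# Toy sketch of an "unusual combination" check; the tiny vocabulary table
# below is a stand-in for the full schema.org definitions.
KNOWN_PROPS = {
    "Recipe": {"name", "recipeIngredient", "cookTime", "author"},
    "JobPosting": {"title", "hiringOrganization", "datePosted"},
}

def check(item: dict) -> list[str]:
    """Return warnings for properties not defined for the item's type."""
    item_type = item.get("@type", "")
    known = KNOWN_PROPS.get(item_type, set())
    return [
        f"Property '{p}' is unusual on type '{item_type}'"
        for p in item
        if not p.startswith("@") and p not in known
    ]

warnings = check({"@type": "Recipe", "name": "Soup", "salary": "10"})
print(warnings)  # flags 'salary' as unusual on Recipe
```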

The service is powered by Google's general infrastructure for working with structured data, and is provided to the schema.org project as a Google-hosted tool. We are also happy to note that many other validators are available, both commercial (e.g. Yandex's) and open source - for example, the Structured Data Linter, JSON-LD Playground, SDO-Check and Schemarama tools. We hope that the new Schema Markup Validator will stimulate collaboration among tool makers to improve consistency and developer experience for all those working on systems that consume schema.org data.

Please share any feedback with the community via Github, Twitter (#schemasmv), or the W3C community group.

Tuesday, December 1, 2020

Modernizing US health provider sites to improve directory accuracy

The world's attention has turned to healthcare this year, with many initiatives exploring the use of open data and standards. Schema.org has made a number of efforts already to contribute to the global Coronavirus response, including the creation of SpecialAnnouncement markup, improvements around events, jobs, hospital reporting, and other schemas to reflect our changed reality.

This week we have invited longstanding collaborator Aneesh Chopra to provide an introduction to some important developments in the United States, where schema.org is being used to improve the accuracy of information about healthcare provider directories.

Guest post by Aneesh Chopra, former U.S. CTO (2009-2012) and President/Co-Founder of CareJourney:

Just over 9 years ago, the schema.org community launched markup for JobPostings, an important resource meeting a call to action to help veterans find jobs that valued their skills. During the early months of the pandemic, this community responded with an important upgrade to the nation’s health IT infrastructure to democratize access to more trusted health information online.

Today, at an API Summit hosted by the Office of the National Coordinator for Health IT, Kathy Hempstead of the Robert Wood Johnson Foundation announced an open collaboration that builds upon the same regulations and industry standards to improve another important aspect of the consumer health navigation experience - searching for timely, accurate provider directory information.

For many Americans, finding a health plan that includes their trusted providers is critically important, but often requires tedious work looking up each plan’s provider directory. Sadly, as CMS found in a recent review, nearly 50% of provider directories contained inaccuracies regarding whether the provider was accepting new patients, practicing at the address listed, or reachable via the listed phone number. 

Regulators have attempted to solve these problems by imposing penalties on government-sponsored plans for inaccurate information, but an additional solution may be at hand. A provision embedded in CMS’ interoperability regulations requires government-sponsored health plans to publish machine-readable access to timely directory information by July 2021. In an effort to reduce administrative burdens, a multi-stakeholder collaborative is looking to both improve the quality of physician websites to include this information and to enable health plans to source timely, accurate information from them to comply with the rules.

Similar to the work that was done to make it easier for consumers to find COVID announcements on physician websites, such as testing availability, revised office hours or telemedicine services, this collaborative will work to standardize how to publish structured provider directory information. To further simplify the search experience, providers can now publish their website URL when updating their “digital contact information” on CMS’ NPPES NPI Registry.
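As a sketch of what structured provider-directory markup on a practice's website might look like: the exact property set is precisely what this collaborative aims to standardize, so the properties below (including the "accepting new patients" flag, the specialty value, and all contact details) should be read as hypothetical placeholders rather than a settled profile.

```python
import json

# Hypothetical sketch of provider-directory markup on a practice's website.
# Names, address, phone number and the acceptance flag are all placeholders;
# the property set itself is what the collaborative seeks to standardize.
physician = {
    "@context": "https://schema.org",
    "@type": "Physician",
    "name": "Dr. Jane Example",
    "telephone": "+1-555-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
        "addressRegion": "VA",
        "postalCode": "22150",
    },
    "medicalSpecialty": "PrimaryCare",
    "isAcceptingNewPatients": True,  # emerging/pending property; placeholder here
}
print(json.dumps(physician, indent=2))
```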

Adding structured data from a provider’s website to the portfolio of tools health plans use today – including plan-agnostic reporting tools, “secret shopper” visits, mailings, and a number of emerging data-driven solutions – should result in a reduction in the administrative burden of updating physician directories. Directory maintenance is burdensome. The average practice has over 20 health plan contracts and directories to maintain with over 50% of these updates being conducted via phone or fax. According to a 2019 CAQH (Council for Affordable Quality Healthcare®) report, the average practice spends $1,000 each month for directory maintenance. Yet at the same time, physician practices find value in online marketing as a way to attract new patients, and invest an average of $650 per month to design websites and optimize search results, according to a study by Zocdoc. A web standardization effort will therefore have multiple benefits. It will allow physicians to more efficiently communicate useful information via search engines that can also be used to populate health plan directories and to meet regulatory compliance, thus reducing administrative burdens. 

CareJourney, with support from The Robert Wood Johnson Foundation, seeks to engage public and private sector stakeholders in an effort to accelerate the development and adoption of web standards for physician information, and to curate a portfolio of tools to structure this information on a practice’s website. Our goal is to improve consumer access to provider information while lowering physician burden. We anticipate the following benefits: 
  • Increased consumer access to high-quality, accurate provider information, such as whether a doctor, practicing at this location, is seeing new patients from my plan.
  • Consistent webpage documentation and maintenance practices that are sufficient to meet health plan regulatory requirements.
  • Improved search engine results by leveraging the structured website markup.
This effort will benefit from the active participation of the healthcare community, and we welcome additional participants to play a part in our initiative. Assistance in testing and providing feedback on the proposed web standards will be critical and extremely helpful in further promotion and adoption. Once the resulting open information and markup instructions are freely available, we welcome assistance in their widespread dissemination. Finally, we are grateful that prominent search engines are engaged in a process for site maintenance that ensures physicians can keep their websites properly structured at the lowest possible administrative burden.

Thank you, in advance, for your interest in advancing this important work! Please sign up here to participate!

Monday, April 6, 2020

COVID-19 schema for CDC hospital reporting

The COVID-19 pandemic requires various medical and government authorities to aggregate data about available resources from a wide range of medical facilities. Clearly standard schemas for this structured data can be very useful.

The Centers for Disease Control and Prevention (CDC) in the U.S. defined a set of data fields to facilitate exchange of this data. We are introducing a schema.org representation of these data fields.

The purpose of this schema definition is to provide a standards-based representation that can be used to encode and exchange records that correspond to the CDC format, with usage within the U.S. primarily in mind. While the existence of this schema may provide additional implementation options for those working with US hospital reporting data about COVID-19, please refer to the CDC and other appropriate bodies for authoritative guidance on the latest reporting workflows and data formats.

Depending upon context, any of the formats and standards that work with schema.org may be applicable for encoding this data, including the Microdata, RDFa and JSON-LD data formats, as well as related technologies such as W3C SPARQL for data query. JSON-LD is in most cases likely to be the most appropriate format. There is no assumption that data encoded using this schema should necessarily be published on the public Web, nor that it would be used by search engines.
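A record in this vocabulary might be encoded in JSON-LD roughly as follows. The property names here follow our understanding of the CDC-based type (CDCPMDRecord, in schema.org's Pending area); check the current schema.org definitions for the authoritative list, and note that every number below is invented.

```python
import json

# Sketch of a hospital reporting record using the CDC-based vocabulary.
# The facility identifier and all counts are invented illustration values.
record = {
    "@context": "https://schema.org",
    "@type": "CDCPMDRecord",
    "cvdFacilityId": "EXAMPLE-0001",
    "datePosted": "2020-04-06",
    "cvdNumBeds": 250,        # total inpatient beds
    "cvdNumBedsOcc": 180,     # occupied inpatient beds
    "cvdNumC19HospPats": 40,  # hospitalized patients with confirmed/suspected COVID-19
    "cvdNumVent": 30,         # mechanical ventilators
}
print(json.dumps(record, indent=2))
```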

We will continue to improve this vocabulary in the light of feedback, and welcome suggestions for improvements and additions, particularly from US healthcare organizations who are using it. This CDC-based vocabulary follows other recent changes we have made to schema.org. For details of recent changes see our release notes and our previous post announcing the SpecialAnnouncement markup, which is now supported at both Bing (blog, docs) and Google (blog, docs). As the global response to COVID-19 evolves we will do our best to improve schema.org's vocabularies to represent the changes that Coronavirus is bringing to society, and to assist those using structured data to help with the response.

Monday, March 16, 2020

Schema for Coronavirus special announcements, Covid-19 Testing Facilities and more

The COVID-19 pandemic is causing a large number of “Special Announcements” pertaining to changes in schedules and other aspects of everyday life. This includes not just closure of facilities and rescheduling of events but also new availability of medical facilities such as testing centers.

We have today published schema.org 7.0, which includes fast-tracked new vocabulary to assist the global response to the Coronavirus outbreak.

It includes a "SpecialAnnouncement" type that provides for simple date-stamped textual updates, as well as markup to associate the announcement with a situation (such as the Coronavirus pandemic), and to indicate URLs for various kinds of update such as school closures, public transport closures, quarantine guidelines, travel bans, and information about getting tested.

Many new testing facilities are being rapidly established worldwide to test for COVID-19. Schema.org now has a CovidTestingFacility type to represent these, regardless of whether they are part of long-established medical facilities or temporary adaptations to the emergency.

We are also making improvements to other areas of schema.org to help with the worldwide migration to working online and working from home, for example by helping event organizers indicate when an event has moved from having a physical location to being conducted online, and whether the event's "eventAttendanceMode" is online, offline or mixed.
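The two pieces of new vocabulary can be sketched side by side: a date-stamped SpecialAnnouncement tied to the COVID-19 situation, and an event whose attendance mode has moved online. The URLs are placeholders, and the exact property names should be checked against the current schema.org definitions.

```python
import json

# A date-stamped announcement associated with the COVID-19 situation,
# with a category-specific URL for school closure information.
announcement = {
    "@context": "https://schema.org",
    "@type": "SpecialAnnouncement",
    "name": "School closure update",
    "datePosted": "2020-03-16",
    "category": "https://www.wikidata.org/wiki/Q81068910",  # COVID-19 pandemic
    "text": "All schools in the district are closed until further notice.",
    "schoolClosuresInfo": "https://example.org/closures",
}

# An event that has moved from a physical venue to being held online.
online_event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Annual Developer Meetup",
    "eventAttendanceMode": "https://schema.org/OnlineEventAttendanceMode",
    "location": {"@type": "VirtualLocation", "url": "https://example.org/stream"},
}
print(json.dumps([announcement, online_event], indent=2))
```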

We will continue to improve this vocabulary in the light of feedback (github; doc), and welcome suggestions for improvements and additions particularly from organizations who are publishing such updates. 

Dan Brickley, R.V.Guha, Google.
Tom Marsh, Microsoft.

Wednesday, January 22, 2020

Schema.org 6.0

Schema.org version 6.0 has been released. See the release notes for full details and links (including previous releases, e.g. 5.0 and 4.0).

We are now aiming to release updated schemas on an approximately monthly basis (with longer gaps around vacation periods). Typically, new terms are first added to our "Pending" area to give time for the definitions to benefit from implementation experience before they are added to the "core" of schema.org. As always, many thanks to everyone who has contributed to this release of schema.org.

Dan Brickley, for schema.org

Tuesday, April 2, 2019

Schema.org 3.5: Simpler extension model; projects, grants and funding schemas; and new terms for describing educational and occupational credentials

Schema.org version 3.5 has been released. This release moves a number of terms from the experimental "Pending" area into the core. It also simplifies and clarifies the extension model, reducing our emphasis on using named subdomains for topical groups of schemas. New terms introduced in the Pending area include improvements for describing projects, grants and funding agencies; for describing open-ended date ranges (e.g. for datasets); and a substantial vocabulary for Educational and Occupational Credentials. Many thanks to all who contributed!

Wednesday, May 2, 2018

Schema.org and Data Commons

Over the past few years we have seen a number of application areas benefit from schema.org markup. Schema.org discussions have often centered around the importance of ease of use, simplicity and adoption for publishers and webmasters. While those principles will continue to guide our work, it is also important to make it easier to consume structured data, by building applications that make more use of the information it carries. We are therefore happy to welcome the new Data Commons initiative, which is devoted to sharing such datasets, beginning with a corpus of fact check data based on the ClaimReview markup as adopted by many fact checkers around the world. We expect that this work will benefit the wider ecosystem around structured data by encouraging use and re-use of related datasets.