ClaimReview is one of the hidden jewels that can be found amongst schema.org's 2700+ definitions. To kick off a series of guest posts on this topic, we bring you a contribution from two members of the factchecking and open data community who have been exploring the potential of this work.
Enriching ClaimReview for fact checkers
To help fight bad information, search engines, social media platforms and other sources of online content are increasingly highlighting the work of fact checkers by labelling posts with incorrect, misleading or harmful information.
This is often enabled through the publication of structured data that uses Schema.org's ClaimReview schema. This defines a simple approach for tagging fact checking articles that has been collaboratively developed and documented through Schema.org. For more background for a journalistic and factchecking audience, see Duke Reporters' Lab's dedicated site.
Our investigation has covered the following areas:
Current scope and application of ClaimReview. The currently defined model provides a lot of flexibility around how much detail might be included in a review, with Google, Bing and others recommending specific profiles of the schema in their structured markup documentation. Full Fact has adopted its own approach to applying the standard. We have reflected on that experience and on the current recommendations from Google, Bing and other adopters, in pursuit of a Full Fact profile that is close to those used elsewhere.
Identifiers and linking. How can additional links and identifiers be included in ClaimReview markup to support disambiguation and aggregation of data? We've concluded that Schema.org already provides useful properties which, if applied consistently, can support this goal. In line with its growing role as a clearing house for identifiers, we believe Wikidata would be a useful common target for linking data across the community, e.g. making use of schema.org's sameAs property. Full Fact articles now contain experimental markup in this direction.
Enriching Claims. The standard was revised to include the notion of a Claim: a statement made by an author that appears in one or more locations. We believe this is an important part of the data model and one that should be more widely adopted. Consistent use of Claim markup would help to clearly indicate situations in which a person or organisation is repeatedly making the same claim, or where others are repeating the same misinformation. In addition to enriching claims with author and appearance information, the addition of topic information would provide a useful dimension to the data, helping to surface related claims and fact checks.
Corrections and actions. The second wave of fact checking is about more than just writing fact checks: it involves taking action to tackle disinformation. How might we surface data about requests for corrections to published content, and record when those corrections have been made? Schema.org currently includes some vocabulary to help describe corrections and comments, which we've explored. But further work is needed to define a useful way of recording and sharing the other activities undertaken by fact checkers.
Citing evidence. Finally, citing evidence is an essential part of performing a fact check, so how can we use existing Schema.org vocabulary to surface the key resources, papers and datasets that were used in producing a fact check?
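To make these areas a little more concrete, here is a minimal sketch of what enriched ClaimReview markup might look like when generated as JSON-LD from Python. It is an illustration only, not the Full Fact profile or a recommendation from the research notes. The helper name build_claim_review, the example.org URLs and the Wikidata placeholders are all hypothetical; the property names (claimReviewed, itemReviewed, reviewRating, author, appearance, about, citation, sameAs) are existing Schema.org terms.

```python
import json


def build_claim_review(review_url, claim_text, rating_label,
                       claimant_name, claimant_wikidata_url,
                       appearance_url, topic_name, topic_wikidata_url,
                       evidence_urls):
    """Return a ClaimReview JSON-LD document as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": review_url,
        "claimReviewed": claim_text,
        "reviewRating": {"@type": "Rating", "alternateName": rating_label},
        # Citing evidence: resources consulted while producing the fact check.
        "citation": [{"@type": "CreativeWork", "url": u} for u in evidence_urls],
        "itemReviewed": {
            "@type": "Claim",
            # Identifiers and linking: sameAs points at a shared identifier.
            "author": {
                "@type": "Person",
                "name": claimant_name,
                "sameAs": claimant_wikidata_url,
            },
            # Enriching claims: where the claim appeared and what it is about.
            "appearance": [{"@type": "CreativeWork", "url": appearance_url}],
            "about": {
                "@type": "Thing",
                "name": topic_name,
                "sameAs": topic_wikidata_url,
            },
        },
    }


# All values below are placeholders, not real fact checks or real identifiers.
doc = build_claim_review(
    review_url="https://example.org/fact-checks/example-claim",
    claim_text="An example claim that was checked",
    rating_label="Incorrect",
    claimant_name="Example Claimant",
    claimant_wikidata_url="https://www.wikidata.org/entity/Q_CLAIMANT_PLACEHOLDER",
    appearance_url="https://example.org/news/article-repeating-the-claim",
    topic_name="Example topic",
    topic_wikidata_url="https://www.wikidata.org/entity/Q_TOPIC_PLACEHOLDER",
    evidence_urls=["https://example.org/data/official-statistics"],
)

print(json.dumps(doc, indent=2))
```

In this sketch the evidence is attached to the review itself through citation, while the claimant, appearance and topic are attached to the Claim. Other arrangements are possible, and describing corrections is deliberately left out because that vocabulary needs further discussion.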
Building on the open, collaborative approach that the community has taken so far, we have published our research notes that explore these questions in more detail and present some early recommendations. We welcome feedback from anyone working to tackle online disinformation.
We would also like to propose the creation of an informal mailing list or community group to support ongoing discussion and experimentation, e.g. hosted at W3C alongside the broader Schema.org CG.
Our joint goal would be to create proposals for further enhancing the current Schema.org model, as necessary, and to document useful patterns for applying the current model to real-world scenarios, including both tagging content and developing APIs.