Are you accessible?

03 Jul 2019 By David van Driessche

PDF standards PDF/UA Archiving

In different parts of the world, the rules for accessible documents, and for when you are required to supply them, go by different names. In North America, for example, people talk about Section 508 or WCAG AA; in Europe, they might refer to the EU accessibility directive. Whatever the name, these rules usually do the same thing: they establish specific rules a document must follow to be labeled accessible. For PDF documents, those rules boil down to compliance with the ISO standard for accessible PDF: PDF/UA.

PDF/UA

The “UA” in PDF/UA stands for “Universal Accessibility”. PDF/UA (ISO 14289) is the ISO standard that defines the rules a PDF document must follow to be labeled accessible. Those rules were chosen so that the document can easily be used by adaptive technology such as screen readers. The most important rules are:

  • All fonts used in the document must be constructed so that adaptive technology can read the text correctly. In technical terms, every font must be embedded and include correct Unicode mappings. All text also needs to be labeled with the correct language: if the document is in English but includes some Spanish words, both the document as a whole and those Spanish words must be labeled accordingly.
  • Documents usually contain two kinds of elements: the actual content you or I would be interested in, and visual elements that lighten up that content. Screen reading devices need to be able to tell the two apart, because they should skip visual elements that carry no real meaning. If a non-text element does carry meaning, the document needs to include a description of it so that the screen reading software can present it.
  • When you read a newspaper or magazine where articles are laid out creatively on the page, you may at some point have been confused about the correct reading order of an article you were interested in. Does it continue in the next column? Does it jump to the next page? An accessible document must record its reading order so that software can follow the logical flow of all text. On top of that, screen reading software needs to know what type of content it is dealing with. Is it a heading, and if so, of which level? A paragraph of text? A footnote? A table? All of this is embedded in the document as structure information, which also makes the document easier to navigate.
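The machine-checkable side of these rules can be sketched in a few lines. The Python below is a hypothetical illustration, not how any real preflight tool works: it assumes a heavily simplified document model (the `Font` and `Item` classes and the `check` function are invented for this example) in which each font records whether it is embedded and has a Unicode mapping, and each content item is either tagged with a structure type, marked as a decorative artifact, or neither. Real PDF/UA validation operates on actual PDF objects and covers many more conditions.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, heavily simplified document model (not real PDF objects).
@dataclass
class Font:
    name: str
    embedded: bool          # font program included in the file
    has_unicode_map: bool   # glyphs can be mapped back to Unicode text


@dataclass
class Item:
    kind: str                   # "text" or "image"
    tag: Optional[str] = None   # structure type such as "H1", "P", "Figure"; None if untagged
    artifact: bool = False      # True for purely decorative content
    alt: Optional[str] = None   # alternative description for meaningful non-text content


def check(fonts: List[Font], items: List[Item]) -> List[str]:
    """Return issues for a few machine-checkable rules in this simplified model."""
    issues = []
    for f in fonts:
        if not f.embedded:
            issues.append(f"font {f.name}: not embedded")
        if not f.has_unicode_map:
            issues.append(f"font {f.name}: no Unicode mapping")
    for i, item in enumerate(items):
        if item.artifact:
            continue  # decorative content is meant to be skipped by screen readers
        if item.tag is None:
            issues.append(f"item {i}: neither tagged nor marked as artifact")
        elif item.kind == "image" and not item.alt:
            issues.append(f"item {i}: meaningful image without alternative description")
    return issues


fonts = [Font("Helvetica", True, True), Font("Logo", False, True)]
items = [
    Item("text", tag="H1"),         # a tagged heading
    Item("image", artifact=True),   # a decorative rule, marked as artifact
    Item("image", tag="Figure"),    # a meaningful chart, but no description
    Item("text"),                   # untagged text
]
for issue in check(fonts, items):
    print(issue)
```

Note that the sketch can only report what is missing. Whether a description is accurate, or whether a decorative label was applied to something meaningful, still needs a person to judge.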
Is my document accessible?

Compliance with most ISO standards for PDF documents can easily be verified by software. Checking a document against such a standard is typically called preflight (a term borrowed from aviation, where pilots check the plane before it takes off), and software exists to preflight against these ISO standards.

PDF/UA is a bit of a problem child in this department. Software can verify many of the rules in the standard, but unfortunately not all of them, so a human is usually required to confirm that every rule is properly followed. Why? One small example should make this clear. Take the document containing English and Spanish text again. Software can preflight it and tell me whether all text has been labeled with a language; that is just a matter of checking the metadata for the text. But how can the software be certain that the right text is labeled with the right language? Humans are good at this. Software… not so much.
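A minimal sketch shows why this check only goes halfway. It assumes a hypothetical model in which each text span carries an optional language tag and otherwise inherits the document's language, loosely mirroring how PDF lets a document-level /Lang entry be overridden on individual structure elements. The function name `missing_language` is invented for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical model: (text, language tag or None to inherit the document language)
Span = Tuple[str, Optional[str]]


def missing_language(spans: List[Span], document_lang: Optional[str]) -> List[int]:
    """Return the indices of spans that end up with no language at all.

    This is the part software can check: whether a language is present.
    It cannot tell whether the language is the right one.
    """
    return [i for i, (_text, lang) in enumerate(spans) if not (lang or document_lang)]


spans = [
    ("Welcome to our store.", None),  # inherits the document language
    ("¡Bienvenidos!", "es"),          # correctly labeled as Spanish
    ("Opening hours", "es"),          # English text mislabeled as Spanish
]
print(missing_language(spans, "en"))  # nothing reported
print(missing_language(spans, None))  # only the unlabeled first span is reported
```

Neither call flags “Opening hours”, even though it is English text labeled as Spanish: the function can confirm that a label exists, but deciding whether it is correct still takes a human reader.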

The PDF/UA standard contains quite a few such cases, where software has to be assisted by a human to fully validate compliance. That doesn’t mean software can’t make the process easier, of course. Examples of applications in this field are the PDF Accessibility Checker (PAC) from the Swiss foundation “Zugang für alle”, a free tool that checks everything software can check for PDF/UA and is considered the first tool based entirely on the Matterhorn Protocol, and callas pdfaPilot, a commercial tool that likewise checks everything software can check for PDF/UA.

Two points about pdfaPilot are worth noting. First, although it is a commercial tool, its PDF/UA verification is always free. Second, pdfaPilot also helps with the human verification part, by converting the PDF document into an HTML representation. This HTML version makes it much easier to check the structure of the document, because the structure is shown in a clear, color-coded format. It also makes it easy to see whether each non-text element is either labeled as not important or has a proper alternative description.
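To get a feel for why an HTML view helps a human reviewer, here is a small sketch that renders a structure tree as nested, color-coded HTML and highlights figures that lack an alternative description. It is a hypothetical illustration only: the `(tag, text, children)` tree format, the `render` function, and the color scheme are assumptions for this example and do not reproduce pdfaPilot's actual output.

```python
import html

# Hypothetical color scheme per structure type; anything else falls back to grey.
COLORS = {"H1": "#c0392b", "H2": "#d35400", "P": "#2c3e50", "Figure": "#27ae60"}


def render(node) -> str:
    """Render a hypothetical (tag, text, children) structure node as nested HTML.

    For "Figure" nodes, `text` is treated as the alternative description.
    """
    tag, text, children = node
    color = COLORS.get(tag, "#7f8c8d")
    if tag == "Figure":
        # Make a missing description stand out for the human reviewer.
        body = f"[alt: {html.escape(text)}]" if text else "<strong>MISSING ALT TEXT</strong>"
    else:
        body = html.escape(text or "")
    inner = "".join(render(child) for child in children)
    return (f'<div style="border-left:4px solid {color};padding-left:6px">'
            f"<code>{html.escape(tag)}</code> {body}{inner}</div>")


tree = ("Document", "", [
    ("H1", "Annual report", []),
    ("Figure", "", []),           # meaningful figure without a description
    ("P", "Sales grew.", []),
])
print(render(tree))
```

Opening the printed HTML in a browser shows the heading, figure, and paragraph as colored blocks in reading order, with the undescribed figure flagged in bold, which is the kind of overview that lets a person judge structure and descriptions quickly.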
