Complex projects become compact files - Mapping document structures with PDF/A-3

04 July 2018 By Akash Choudhary

PDF/A-3 PDF standards

PDF/A-3 is a powerful tool for mapping projects and document structures, and for converting working documents into archive documents. Compared with PDF/A-1 and PDF/A-2, which remain valid, this part of the standard is distinguished by the fact that files of any format can be embedded, even those that are not PDF/A-compliant and therefore not suitable for long-term archiving. On the one hand, this involves risks, since the archivability of embedded files can only be partially checked. On the other hand, companies can adapt to this and use PDF/A-3 to convert digital project files into archive documents, for example through integration in specialist applications or prescribed processes. It is the task of the implementing companies to create and enforce an internal policy that controls which files, and above all which formats, are stored in the archive within the PDF/A-3 document structures. Usually the best way to enforce such a policy is to allow project files to be created only via specialized applications, which then control the composition of the repositories or hybrid documents.

There are always files in projects that cannot be converted to PDF/A completely or adequately. These include, for example, films, 3D models, and data structures such as XML files or Excel spreadsheets, whose formulas are lost during conversion. With PDF/A-1 and PDF/A-2, these formats cannot be embedded at all, or only with loss of information. This is different with PDF/A-3, where entire projects with all their components can be stored in a system-neutral, standardized format.

PDF/A-3 also standardizes the way in which such document structures are mapped in PDF and thus creates a reliable way to access the embedded files. This is important because files can be embedded in "normal" PDF in very different ways, so specialized PDF software is required to reliably find all possible "storage locations". PDF/A-3 gives applications that are not specialized in PDF, such as a business application or an ERP system, a dependable way to find and access the embedded files within the PDF structure.

Especially in engineering, design drawings or 3D models cannot be converted sufficiently well into a "flat" PDF page. This industry was therefore one of the first to discover the PDF/A-3 format and use it both for system-independent exchange of project documents and for archiving projects. However, the ability to store projects in a system-independent format with PDF/A-3 is not only useful for data exchange. An equally important aspect for the companies using it is the avoidance of migration problems when changing archiving systems. In conventional project archiving, documents and their metadata can usually be transferred to a new system, but the project context that links them cannot.

However, when using PDF/A-3, it is essential to be clear about the files and formats that may appear in the PDF/A-3-based project files. It certainly does not make sense to view PDF/A-3 simply as a container for arbitrary files. Because PDF/A-3, unlike PDF/A-1 and PDF/A-2, does not restrict what can be embedded, the company itself must ensure that only the intended "exception formats" appear in a project file. In order to enforce "clean" project files, it is usually essential that the PDF/A-3 project files are generated from the specialist application or a workflow system.
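A policy of this kind can be made concrete as a simple allow-list check that runs before a project file is written. The following is only a minimal sketch: the set of permitted extensions and the function name `check_embedded_files` are assumptions chosen for illustration, and a real policy would more likely be based on verified file types than on file names alone.

```python
from pathlib import PurePath

# Hypothetical internal policy: PDF attachments (converted to PDF/A elsewhere)
# plus a small set of "exception formats" that may be kept in original form.
ALLOWED_FORMATS = {".pdf", ".xml", ".xlsx", ".step"}


def check_embedded_files(filenames, allowed=ALLOWED_FORMATS):
    """Return the file names whose extension is not covered by the policy."""
    violations = []
    for name in filenames:
        suffix = PurePath(name).suffix.lower()  # compare case-insensitively
        if suffix not in allowed:
            violations.append(name)
    return violations


if __name__ == "__main__":
    candidates = ["index.pdf", "plan.step", "calc.XLSX", "video.mp4"]
    print(check_embedded_files(candidates))
```

A generating application could refuse to create the project file as long as this list is not empty, which keeps the decision in one controlled place.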

For example, as soon as a project is set to the final workflow status, its documents are exported from the DMS and automatically converted to PDF/A-3 together with their attachments. In many cases, a new PDF file is created as the main document: an index PDF in which the project documents are embedded and from which they are linked. All documents for which this makes sense are converted to PDF/A. Only defined exceptions, such as 3D drawings, XML structures or Excel documents, are additionally embedded in their original format. Integration into a document application also allows the conversion to be performed automatically during periods of low system load once a certain workflow status is reached. As with archiving in general, each project file should be provided with metadata (title, author, project number, etc.) so that it remains independent of the system and can be located outside it.
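The workflow described above can be sketched as a small planning step that decides what happens to each document. This is an illustration under stated assumptions, not a description of any particular DMS: the status value `"final"`, the required metadata keys, the list of exception formats and all names in the code are hypothetical, and the sketch simply schedules every document for PDF/A conversion.

```python
from dataclasses import dataclass, field
from pathlib import PurePath

# Hypothetical "exception formats" that are additionally kept in original form.
KEEP_ORIGINAL = {".xml", ".xlsx", ".step"}
# Hypothetical minimum metadata for locating the file outside the system.
REQUIRED_METADATA = ("title", "author", "project_number")


@dataclass
class ProjectFilePlan:
    metadata: dict
    convert_to_pdfa: list = field(default_factory=list)
    embed_original: list = field(default_factory=list)


def plan_project_file(status, metadata, documents):
    """Plan the PDF/A-3 project file once the final workflow status is reached."""
    if status != "final":
        return None  # nothing to do until the project is closed
    missing = [key for key in REQUIRED_METADATA if not metadata.get(key)]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    plan = ProjectFilePlan(metadata=dict(metadata))
    for name in documents:
        # Simplification: every document gets a PDF/A rendition.
        plan.convert_to_pdfa.append(name)
        # Defined exceptions are additionally embedded unchanged.
        if PurePath(name).suffix.lower() in KEEP_ORIGINAL:
            plan.embed_original.append(name)
    return plan
```

The resulting plan could then be handed to whatever conversion component actually produces the index PDF and its embedded files.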

Another advantage of PDF/A-3 in project archiving is that documents can be converted to PDF/A very early on, since the original, still editable versions can be carried along as embedded files. Conversion problems are therefore detected at an early stage rather than only at the time of archiving.

The system-independent archiving of e-mails has requirements similar to those of project archiving, because e-mails with attachments are heterogeneous document structures. They can therefore be archived with PDF/A-3 in much the same way as projects, either to store them outside a specialized archiving system or to migrate them during a system change.

The possibility of embedding any format has already opened up new options in the field of electronic invoice exchange. ZUGFeRD, the Central User Guide of the Forum for Electronic Invoicing Germany, defines PDF/A-3 as the transfer format, with the invoice data record additionally embedded in XML format. The standardized way in which PDF/A-3 embeds files is a great advantage here, since the recipient needs no specialized PDF software to access and process the XML data record, e.g. in an ERP application. The invoice can be processed either "conventionally" on the basis of the PDF view or automatically using the XML structure.
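Once the XML data record has been extracted from the PDF/A-3 file, automatic processing needs nothing more than an ordinary XML parser. The sketch below illustrates this with Python's standard library. Note that the XML shown is a deliberately simplified, hypothetical structure and not the actual ZUGFeRD schema, which is namespaced and considerably more detailed; the function name `read_invoice` is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Simplified, hypothetical invoice record (NOT the real ZUGFeRD schema).
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Invoice>
  <Number>2018-0042</Number>
  <Currency>EUR</Currency>
  <Total>119.00</Total>
</Invoice>"""


def read_invoice(xml_bytes):
    """Extract a few fields from the embedded invoice record."""
    root = ET.fromstring(xml_bytes)
    return {
        "number": root.findtext("Number"),
        "currency": root.findtext("Currency"),
        # Decimal avoids binary floating-point rounding on monetary amounts.
        "total": Decimal(root.findtext("Total")),
    }


if __name__ == "__main__":
    print(read_invoice(SAMPLE_XML))
```

An ERP import could map such fields directly onto its own invoice record, while the PDF view remains available for manual checking.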
