Digital personnel records for eternity

07 nov. 2018 By Dietrich von Seggern

How long the contents of personnel files must be stored varies, depending on factors such as company policy and national law. In some cases, it's up to 30 years after an employee leaves the company, for example, when a pension claim may still be made. In Germany, if the personnel files contain tax-relevant documents, they must be kept for six years. Can companies just keep paper records? Well, of course, but there are risks: receipts on thermal paper, for example, are no longer readable after such a long time. Purely paper-based storage also takes up additional space and carries the risk that documents will be destroyed by fire or water. In addition, the Federal Data Protection Act stipulates that sensitive files must be stored in such a way that only personnel department employees have access to them.

You have probably already got the idea that it makes sense to store personnel files digitally, with an appropriate authorization system to protect against unauthorized access, right? But how? What format should they be in so that they can be reproduced years later, if necessary? The risk that the software with which a document was created no longer exists in a version compatible with the document format must be kept as low as possible. It is possible to maintain all applications and versions for years, but only with disproportionate effort. The same applies to continuously migrating data to the latest versions. Now that we are talking about problems, let's talk about the last one, followed by what I think is the solution. The content of a digital personnel file can be quite heterogeneous: contracts in PDF, scanned certificates of incapacity for work in JPEG, company agreements in Word... This diversity of formats is difficult to control and brings with it numerous problems, some of which are immediately noticeable while others become noticeable only after a few years. Converting everything into a vendor-independent, uniform archive format is an approach that can provide immediate relief.

So what does the market offer here? TIFF, JPEG, PDF and PDF/A. A comparison of these four formats reveals considerable differences. The old TIFF format results in extremely large files, especially for color documents. JPEG files are smaller, but usually have poorer display quality. Metadata for describing and identifying documents is not uniformly supported, and you cannot search text in a JPEG. The traditional PDF format eliminates many of these problems and is appreciated for its layout fidelity, among other things (yes, we are getting closer to the solution...). However, due to its many different versions and its immense functionality, it is not a reliable basis for the trustworthy archiving of documents: it simply allows too many things that endanger long-term readability. The PDF/A format, a standardized version of PDF, stores documents in comparatively small files, reproduces them true to the original, supports metadata and enables full-text search (voilà!). The visual appearance of the documents is retained because the file always contains all the components required to display it. The display is therefore independent of a specific operating system, product or manufacturer. To achieve this, PDF/A contains restrictions that reduce the diversity of PDF to a level that makes sense for archiving and ensure that each PDF/A document is complete and self-contained. Metadata is embedded in a standardized form based on the Extensible Metadata Platform (XMP) developed by Adobe. PDF/A also provides for reliable color reproduction.
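To make the XMP point a little more concrete, here is a minimal Python sketch of how a file's declared PDF/A level could be read from its XMP packet. It assumes the packet has already been extracted from the PDF as a string; the function name and the sample packet are made up for illustration. The `pdfaid:part` and `pdfaid:conformance` properties are the PDF/A identification entries. Keep in mind that this only reads what the file claims about itself; it does not check whether the claim is true.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"  # PDF/A identification schema

# Illustrative packet only; a real one carries many more properties.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>3</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def declared_pdfa_level(xmp_xml):
    """Return (part, conformance) as declared in the XMP packet, or None.

    Hypothetical helper: it reads the declaration, it does not validate it.
    """
    root = ET.fromstring(xmp_xml)
    found = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for name in ("part", "conformance"):
            key = f"{{{PDFAID_NS}}}{name}"
            # XMP allows a simple property as an attribute or a child element.
            value = desc.get(key)
            if value is None:
                child = desc.find(key)
                if child is not None and child.text:
                    value = child.text.strip()
            if value is not None:
                found[name] = value
    if "part" not in found:
        return None
    return found["part"], found.get("conformance")
```

Calling `declared_pdfa_level(SAMPLE_XMP)` on the sample packet gives the part and conformance level the file declares; a plain PDF without these entries yields `None`.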

Since PDF/A was first published, ISO has continually expanded its capabilities. PDF/A-1 already focused on conversion and validation, but a file can only contain one document at a time. PDF/A-2 added the possibility of creating PDF packages from a main document and any number of embedded PDF/A files. With PDF/A-3, such packages can also contain files in other formats, so that a personnel file can be conveniently organized in a single PDF/A file together with all the associated documents. It should be noted that PDF/A-3 standardizes the way in which files are embedded in PDFs and thus ensures that they can be reliably found within the internal PDF structure. This matters because PDF offers many ways to embed files; with a standardized method, files can be extracted without having to read the complete PDF file. In addition, the ISO standard requires certain metadata that defines the file type of the embedded file, the type of relationship between it and the main PDF (e.g. source file), and the reference point (e.g. document, page, or page component).
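The three pieces of metadata mentioned above can be pictured as a small record per embedded file. The following Python sketch is only a mental model, not how a PDF library represents attachments: the class, field names and the `attached_to` notation are hypothetical. The relationship names are the values I understand PDF/A-3 to allow for its relationship entry; treat the list as an assumption and check it against the standard before relying on it.

```python
from dataclasses import dataclass

# Assumed set of relationship values (to be verified against ISO 19005-3).
AF_RELATIONSHIPS = {"Source", "Data", "Alternative", "Supplement", "Unspecified"}


@dataclass(frozen=True)
class Attachment:
    """Hypothetical record describing one file embedded in a PDF/A-3."""
    filename: str
    mime_type: str      # file type of the embedded file
    relationship: str   # how it relates to the main PDF, e.g. "Source"
    attached_to: str    # reference point, e.g. "document" or "page:3"


def missing_metadata(att):
    """Return the names of the fields that are absent or implausible."""
    problems = []
    if "/" not in att.mime_type:
        problems.append("mime_type")
    if att.relationship not in AF_RELATIONSHIPS:
        problems.append("relationship")
    if not att.attached_to:
        problems.append("attached_to")
    return problems
```

For a personnel file, an entry such as `Attachment("contract.docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "Source", "document")` would describe the Word original of a contract whose PDF rendering is the main document.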

Now that you know that PDF/A is the way to go, how do you get there? There are numerous programs that can be used to scan paper originals and save them as PDF/A files. It can be useful to add searchable text to the PDF/A file using text recognition (OCR). If a company needs to digitize a high volume of pages, for example when eliminating an existing paper archive, it can be useful to commission an external service provider. As a rule, all digitally created documents can be converted to PDF/A without any problems. Many application programs, such as Microsoft Office or OpenOffice / LibreOffice, already provide functions with which documents can be saved directly as PDF/A. To ensure that the documents really comply with the PDF/A standard, it is recommended to use a verification tool, a so-called validator. It checks each file against the ISO standard, so that the files remain readable for decades.

Many archives already contain considerable quantities of PDF files. Converting these PDF files to PDF/A is possible in several ways: from single-user solutions to systems with high throughput, various products are available on the market. Here, too, a validator should act as a kind of guardian to ensure that only flawless PDF/A files are archived. After all, it is crucial for the success of any archiving strategy that later access to the documents does not fail because of an incomplete incoming inspection. There are solutions in which each conversion is automatically completed with an independent validation before a PDF receives its PDF/A identification entry. Modern conversion solutions have far-reaching functions which, for example, repair incomplete fonts, embed missing fonts after the fact and correct inconsistent metadata. To achieve the highest possible level of automation, they work with hot folders. PDF files placed in these folders are processed according to the folder's settings without manual intervention and then stored in the respective target folders. Appropriate reporting informs the user about files that caused problems during conversion. So how are you archiving your documents?
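For readers who want to picture the hot-folder idea, here is a minimal Python sketch of the "guardian" step only. It leaves out the conversion itself and simply routes each file to an archive or a rejected folder, returning a small report. All names are hypothetical, and `looks_like_pdf` is a deliberately trivial stand-in that checks the file header; in practice you would pass in a call to a real PDF/A validator instead.

```python
import shutil
from pathlib import Path


def looks_like_pdf(path):
    """Stand-in check only: tests the file header, NOT PDF/A conformance."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"


def process_hot_folder(hot, archive, rejected, validate=looks_like_pdf):
    """Move every *.pdf in `hot` to `archive` or `rejected`.

    `validate` is any callable taking a path and returning True/False;
    swap in a wrapper around a real validator here.
    Returns a report as a list of (filename, outcome) tuples.
    """
    hot, archive, rejected = Path(hot), Path(archive), Path(rejected)
    archive.mkdir(parents=True, exist_ok=True)
    rejected.mkdir(parents=True, exist_ok=True)

    report = []
    for path in sorted(hot.glob("*.pdf")):
        passed = validate(path)
        target = archive if passed else rejected
        shutil.move(str(path), str(target / path.name))
        report.append((path.name, "archived" if passed else "rejected"))
    return report
```

Keeping the validator as a parameter mirrors the point made above: the check that decides what enters the archive should be independent of whatever produced the file.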