The future of documents

19 Sept. 2018, by Dietrich von Seggern

I recently met many academics and professionals who were discussing interesting requirements and solutions related to documents. DocEng (short for document engineering; pronounced ˈdɒkndʒ) is organized by computer scientists (professors and students) and was held in Halifax, Canada, this year. I decided to attend, thinking: where, if not there, could I learn what requirements PDF will face in the future? And indeed I came back with some interesting ideas.

First, I noticed that, as expected, there were almost no talks on PDF. The talks were instead about text mining, collaboration, interlinking of documents and similar topics. PDF was perceived as a reliable replacement for paper, but nothing more. It can’t be edited. It can be annotated, but annotations are difficult to search or summarize across documents. It is almost never accessible. It does not give users useful access to structured data (e.g. in tables). In short, PDF was considered too dumb to be a worthwhile subject of their research.

The PDF Association had organized a panel discussion, “Industry perspectives”, and beforehand I wondered whether the audience would give us a chance and how it would go. It went surprisingly well. Attendees came up with interesting questions almost immediately, and it was exciting to see how much interest there actually was in the “hidden” features of PDF, mainly the tag structure and associated (embedded) files. See the recent blog post “Save the data”.

The main problem is that many authoring tools (for this community, in many cases LaTeX) cannot create such feature-rich PDF files. (There have been discussions with some people from the LaTeX community about tagged PDF, but implementing it appears to be a major effort for them.) Of course, for PDF files created several years ago, which is often the case in research, there is not much a user can do. However, it was obvious that the attendees would be more than willing to make sure they create better PDFs for those who come after them.

For us in the PDF industry, there is a lot to learn about what users are missing: supposedly in PDF, but in reality in our products. Why are there no good tools that let you organize your research across documents? Why is it not easier to find out whether a PDF file is tagged and, if so, whether the tags reflect the “real” structure? Why is it not much easier to create links between PDF files, e.g. after embedding one into another? Our task as the PDF Association is to explain what PDF can do and to encourage developers to create PDF files and PDF environments that are as rich as possible.

These people work with documents professionally and look for new ways to support digital processes. I think this community is at least a valuable source of information for us, and perhaps we can also generate more interest in PDF among them. So I was happy to hear that in 2019 I will not have to travel as far to hear more: DocEng 2019, again in September, will take place right here in Berlin!

Enjoy a picture of Nova Scotia! (Source: ACM DocEng 2018 Twitter)