PDF/A conformance and metadata

The PDF/A generation mechanism is described in more detail in the pdfx manual.

Configuration

  • First, compile the document with version=final in the document class options. The draft version is expected to fail PDF/A validation.
  • The PDF/A conformance level can be configured with the pdfaconformance parameter.
  • The default conformance level is a-2b (PDF/A-2b, Level B (basic) conformance).

    • other supported values: a-1b, a-2u, a-3b, a-3u, a-1a, a-2a, a-3a, and none
    • the values a-1a, a-2a, and a-3a are still marked as experimental
    • none disables pdfx
    • the default conformance level should be sufficient for most thesis documents

Metadata file (thesis.xmpdata)

  • The pdfx package explicitly disables the propagation of document metadata from the LaTeX preamble to PDF metadata fields (such as pdfauthor, pdftitle, pdfsubject, pdfkeywords, etc.). This should not be overridden with hyperref, as all embedded metadata must be synchronized in conformant PDF/A documents.
  • Instead, the embedded PDF/A metadata is provided in XMP format and read from <maindoc>.xmpdata, where <maindoc> is the base name of the main TeX file (e.g. thesis).
  • For detailed documentation of the metadata fields and the process, please refer to sections 2.2-2.3 of pdfx.pdf.
  • The following code (thesis.xmpdata) is provided as an example:

    \Title{Baking through the ages}
    \Author{A. Baker\sep C. Kneader}
    \Language{en-GB}
    \Keywords{cookies\sep muffins\sep cakes}
    \Publisher{University of Turku}
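
Because the metadata is read from a file named after the main TeX file, a misnamed or missing .xmpdata file is easy to overlook. The following is a minimal sketch of a pre-compile check. The helper names xmpdata_for and check_xmpdata are hypothetical and not part of the template or of pdfx.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the template or of pdfx).

# Derive the expected metadata file name from the main TeX file:
# thesis.tex -> thesis.xmpdata (same directory, same base name).
xmpdata_for() {
    printf '%s\n' "${1%.tex}.xmpdata"
}

# Succeed only if the metadata file exists and is non-empty.
check_xmpdata() {
    xmp=$(xmpdata_for "$1")
    if [ -s "$xmp" ]; then
        echo "found $xmp"
    else
        echo "missing or empty: $xmp" >&2
        return 1
    fi
}
```

One possible use is to gate the build on the check, for example: check_xmpdata thesis.tex && latexmk -pdf -shell-escape thesis.tex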

Tools

Note: PDF/A support is a fairly recent feature in LaTeX; for example, version 1.6.3 of the pdfx package was announced on 2019/2/27. If the document fails to validate, consider compiling with a more recent LaTeX distribution. The template has been tested with TeX Live 2017 and later.

VeraPDF (generic instructions)

The free VeraPDF tool can be used to test whether the resulting PDF files conform to PDF/A. A snapshot of VeraPDF is hosted at UTU GitLab.

The following script shows how to invoke the VeraPDF tool:

$ latexmk -pdf -shell-escape thesis.tex

$ wget https://tech.utugit.fi/soft/thesis/veraPDF-apps/files/greenfield-apps-latest.jar -O validator.jar
$ java -cp validator.jar org.verapdf.apps.GreenfieldCliWrapper --format text -v thesis.pdf

Note: A Java 11 (or later) runtime is required to run the validator.
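
The Java 11 requirement can be checked before the validator is launched. The sketch below assumes the usual first-line format of java -version (a quoted version string such as "11.0.2" or the legacy "1.8.0_292"). The function names java_major, installed_java_version, and java_ok are hypothetical and not part of VeraPDF or the template.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VeraPDF or the template).

# Extract the major version from a Java version string.
# Legacy scheme: "1.8.0_292" -> 8; current scheme: "11.0.2" -> 11, "17" -> 17.
java_major() {
    v=$1
    case $v in
        1.*) v=${v#1.} ;;   # drop the legacy "1." prefix
    esac
    v=${v%%[!0-9]*}         # keep only the leading digits
    printf '%s\n' "$v"
}

# Read the quoted version string printed by the java binary on PATH.
# java -version writes to stderr, hence the redirection.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed if the given version string satisfies the Java 11 requirement.
java_ok() {
    major=$(java_major "$1")
    [ -n "$major" ] && [ "$major" -ge 11 ]
}
```

A possible guard before the validation step: java_ok "$(installed_java_version)" || echo "Java 11 or later is required" >&2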

VeraPDF (Docker image)

A headless Java 11 JRE is provided in the Docker image for VeraPDF-powered PDF/A validation. The UTU build of the VeraPDF tool is also included in the LaTeX Docker image and can be run by invoking:

$ pdfa-validate thesis.pdf

The GitLab CI script in this project demonstrates the validation of CI/CD generated PDFs with this tool.

Limitations

  • pdfTeX should not have any significant limitations.
  • LuaLaTeX expects all input documents to be UTF-8 encoded (this mainly concerns Windows users).
  • Documents generated with XeLaTeX may fail validation when OTF fonts are used (pdfx manual, section 3.1.1).
  • XeLaTeX requires the extra parameter -output-driver="xdvipdfmx -z 0", which is included in the provided latexmkrc.
  • XeLaTeX will not generate conforming PDF/A documents when the output-directory is changed in latexmk.
  • Configuring a default action with hyperref for opening the PDF will produce non-conforming PDF files.

What are these PDF/A conformance rules?

The VeraPDF page contains a list of validation rules.

Related: