PDF/A - An Overview
PDF/A is a specialized version of the popular PDF format designed specifically for long-term document preservation. The “A” in PDF/A stands for “Archive”, signifying its purpose to ensure that documents are preserved and displayed consistently over time, regardless of changes in software, hardware, or operating systems.
Key Features of PDF/A
One of the major advantages of PDF/A is its ability to always render exactly the same, no matter the system or software used for viewing. This is achieved through its unique design principles:
Self-Contained Files: A PDF/A file is 100% self-contained. This means that all information needed to display the document—fonts, images, color profiles, and any other essential data—is embedded directly in the file itself.
No Reliance on External Sources: Unlike standard PDFs, PDF/A does not depend on external resources to render the content. This makes it an ideal format for archiving important documents for long-term use.
PDF/A Standards
There are four versions of PDF/A, each defined by its own ISO standard. A higher version number means that the file format supports more features.
PDF/A-1
Original PDF/A standard, the most restrictive and the most commonly used today.
Based on PDF 1.4.
ISO 19005-1
PDF/A-2
Adds support for transparency, layers, improved image compression and attachments (provided that those attachments are in PDF/A format).
Based on PDF 1.7
ISO 19005-2
PDF/A-3
Permits attachments of any file type.
Based on PDF 1.7
ISO 19005-3
PDF/A-4
Based on the newest PDF standard, PDF 2.0.
ISO 19005-4
PDF/A Conformance Levels
Level B (Basic)
PDF/A-1B, PDF/A-2B, PDF/A-3B
B-level conformance requires only that documents meet the guidelines for reliable viewing, so it is the easiest level to achieve.
Level A (Accessible)
PDF/A-1A, PDF/A-2A, PDF/A-3A
“Accessible” conformance is a superset of B-level conformance. It adds requirements for information intended to preserve a document’s logical structure, semantic content, and natural reading order.
Level U (Unicode)
PDF/A-2U, PDF/A-3U
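Like level A, U-level conformance requires that all text can be mapped to Unicode. Unlike level A, it does not add the logical structure requirements, so it sits between level B and level A.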
Level E
PDF/A-4E
Level E introduces support for RichMedia and 3D type annotations as well as embedded files to create a PDF/A version compatible with modern geospatial, construction, and engineering workflows. (The “E” stands for engineering).
Level F
PDF/A-4F
Allows files of any format to be embedded in the PDF.
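The version and level combinations listed above can be captured in a small lookup table, which may be useful for sanity-checking a designation string before a file is handed to a validator. The following is a minimal Python sketch; the names `VALID_LEVELS` and `parse_pdfa_designation` are hypothetical (they are not part of JavaUtils, PDFBox, or veraPDF), and the table covers only the designations named in this overview.

```python
import re

# Conformance levels named in this overview, keyed by PDF/A version number.
VALID_LEVELS = {
    1: {"A", "B"},
    2: {"A", "B", "U"},
    3: {"A", "B", "U"},
    4: {"E", "F"},
}

# Matches strings such as "PDF/A-1B" or "PDF/A-2u".
_DESIGNATION = re.compile(r"^PDF/A-(\d)([A-Za-z])$")


def parse_pdfa_designation(text):
    """Return (version, level) for a designation listed in this overview, else None."""
    match = _DESIGNATION.match(text.strip())
    if match is None:
        return None
    version = int(match.group(1))
    level = match.group(2).upper()
    if level not in VALID_LEVELS.get(version, set()):
        return None
    return version, level
```

For example, `parse_pdfa_designation("PDF/A-2U")` returns `(2, "U")`, while `parse_pdfa_designation("PDF/A-1U")` returns `None` because level U is listed only for PDF/A-2 and PDF/A-3.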
PDF/A Requirements
A PDF/A compliant file must meet the requirements of the PDF/A standard. Some requirements prohibit certain functions that could hinder long-term archiving, while others guarantee reliable reproduction.
Minimum requirements include:
All content must be embedded (fonts, colors, text, images, etc.) and must not reference external content
The file does not contain audio or video (unless PDF/A-4F)
The file does not contain JavaScript
The file does not use LZW compression
The file is not encrypted or password protected
Metadata in the file is encoded using Extensible Metadata Platform (XMP) technology
The file does not use XFA forms
Interactive form fields must have an appearance dictionary
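Several of the prohibitions above correspond to well-known PDF name tokens (`/JavaScript`, `/LZWDecode`, `/Encrypt`, `/XFA`), so a rough pre-check can look for them in the raw file bytes. The sketch below is only a heuristic and an assumption about how such a pre-check might look, not a substitute for a real validator: it misses anything stored inside compressed object streams and can report false positives when a token happens to appear in stream data. The names `PROHIBITED_TOKENS` and `find_prohibited_tokens` are hypothetical.

```python
# Raw PDF name tokens associated with features that the list above prohibits.
# Heuristic only: compressed object streams can hide these tokens.
PROHIBITED_TOKENS = {
    b"/JavaScript": "JavaScript action",
    b"/LZWDecode": "LZW compression",
    b"/Encrypt": "encryption",
    b"/XFA": "XFA form",
}


def find_prohibited_tokens(pdf_bytes):
    """Return a sorted list of prohibited features whose tokens occur in the raw bytes."""
    return sorted(
        label for token, label in PROHIBITED_TOKENS.items() if token in pdf_bytes
    )
```

A file could be checked with `find_prohibited_tokens(open(path, "rb").read())`, where `path` is a placeholder for the document being examined. An empty list means only that none of these tokens were found in plain view, not that the file is compliant.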
PDF/A Verification
JavaUtils can detect whether a document is PDF/A-1A or PDF/A-1B. It uses PDFBox preflight, which will be deprecated in the future, so veraPDF is recommended instead.
PDFBox preflight, veraPDF, and Adobe Pro can all verify whether a document is PDF/A compliant, but they sometimes give different results. For example, PDFBox preflight (used by our JavaUtils) may report that a document is PDF/A-1B compliant while Adobe Pro reports that it is not.
Adobe Reader (the free version) cannot verify whether a document is PDF/A compliant. If a document claims to be PDF/A compliant, Adobe Reader shows a message bar with this information when the document is opened.
Adobe Pro (the paid version) has a dedicated tool that can verify whether a document truly is PDF/A compliant.
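A document "claims" to be PDF/A through its XMP metadata: the PDF/A identification schema defines the properties `pdfaid:part` and `pdfaid:conformance`. The Python sketch below reads that claim from the raw bytes. It assumes the XMP packet is stored uncompressed, encoded as UTF-8, and uses the conventional `pdfaid` prefix. The function name `claimed_pdfa` is hypothetical. Like the Adobe Reader message bar, it reports only what the file declares and does not verify compliance.

```python
import re

# XMP may serialize a property as an attribute (pdfaid:part="1")
# or as an element (<pdfaid:part>1</pdfaid:part>); both forms are matched.
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONFORMANCE = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def claimed_pdfa(pdf_bytes):
    """Return the designation the file declares (e.g. 'PDF/A-1B'), or None if there is no claim."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conformance = _CONFORMANCE.search(pdf_bytes)
    # The conformance property may be absent, so fall back to no level letter.
    level = conformance.group(1).decode("ascii").upper() if conformance else ""
    return "PDF/A-" + part.group(1).decode("ascii") + level
```

If this returns a designation but a validator such as veraPDF rejects the file, the file declares PDF/A conformance without actually meeting it.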