NewFormat AB
PDF standards make the world work
|
Our solutions are based on
tested, reliable, and highly accessible software.
NewFormat is an appointed reseller and solution partner in the
Nordic region (Sweden, Denmark, Finland, Iceland, Norway)
and the
Baltic region (Estonia, Latvia, Lithuania)
for the leading PDF and PDF/A solutions from callas software GmbH.
|
PDF Workflow Technology
Your PDF problems solved
callas software finds simple ways to handle complex PDF challenges
Solutions to preflight, validate, optimize, correct and repurpose PDF files
for document archiving/e-archiving and long-term preservation
callas software plug-ins for Adobe Acrobat
enhance native capabilities and features of Adobe Acrobat.
|
callas pdfaPilot
The all-round tool for
long-term preservation/archiving according to the ISO Standard PDF/A
and
accessible PDF according to the ISO Standard PDF/UA
|
Accessible PDF
A fully PDF/UA compliant PDF can be just as
accessible as a WCAG compliant website.
|
News - New key features in callas pdfaPilot
callas software blog
- callas pdfaPilot supports ISO PDF/A-4, PDF/X-6 and PDF/VT-3
- callas pdfaPilot converts files to both PDF/A-4 and PDF/X-6 simultaneously,
and users can take advantage of improved PDF/A conversion for fillable forms.
- callas pdfaPilot supports PDF/UA,
the ISO Standard (ISO 14289-1) for accessible PDF
callas pdfaPilot Desktop comes with powerful PDF tools for free!
- No license activation or other obligations.
These tools provide experienced users and developers with
a wide range of information about the internal structure of
PDF files, making those files easier to process.
- Full-text recognition (OCR) included for over 100 languages
- callas pdfaPilot can be used as an application for
- The QuickFix Engine integrated into pdfaPilot also enables
very fast PDF processing, as well as scaling, sorting and deleting pages.
- callas pdfaPilot supports the standard data model for e-invoices:
- Compatible with the veraPDF project
- Since version 7, callas pdfaPilot has been fully compatible with the veraPDF project,
which is currently promoted by the European Union.
callas License Server supports callas pdfaPilot
- A dynamic way of licensing processing capacity,
based on peak performance needs,
on Amazon Web Services, Microsoft Azure or other platforms.
- callas pdfaPilot is also available in the cloud or as a
web service for automated PDF editing:
See also:
|
callas pdfaPilot
Tools to convert documents to
the PDF/A Standard for Long-Term Archiving
Conversion of documents to PDF/A
PDF/A
The recommended document format
for long-term preservation of digital documents
PDF/A is the recommended document file format for
long-term document archiving and preservation, and ensures that
documents will be readable and reusable over a long period of time.
PDF/A-1, PDF/A-2, PDF/A-3 and PDF/A-4 are complementary PDF formats
and ISO standards (ISO 19005-1, 19005-2, 19005-3 and 19005-4)
used for long-term archiving of digital documents.
callas pdfaPilot - Tools for converting documents to PDF/A
callas pdfaPilot is a family of solutions for converting PDF documents or
native (born-digital) documents and emails with attachments into
robust, searchable PDF/A files for long-term archiving.
callas pdfaPilot provides dynamic technical preflight/validation and
correction, and optimization of PDF files for document management,
document archiving/e-archiving and long-term preservation.
Server-based automation of
conversion and validation in one go
callas pdfaPilot facilitates long-term e-archiving for enterprises,
public authorities and agencies, organizations and others,
regardless of size or sector.
In principle, it is advisable to keep all documents in one format,
and PDF is the first choice here as the lowest common denominator
for digitally generated or paper-based documents.
In order to relieve employees of the burden of conversion and at
the same time ensure that all PDF/A files are of a consistently high quality,
decision-makers should rely on server-based solutions that include both
conversion and validation and also provide features that automate
the processes surrounding the processing of PDFs:
Benefits of centralised server-based
conversion to PDF/A
Dietrich von Seggern, Managing Director callas software,
explains why you should not leave the generation of PDF/A files
to individual staff members:
What callas pdfaPilot can do
callas pdfaPilot turns all kinds of PDFs into files that conform to the
ISO standards PDF/A-1, PDF/A-2, PDF/A-3, PDF/A-4, PDF/UA and PDF/X.
Typical PDF/A, PDF/UA and PDF/X documents
- PDFs created in Office applications
- e-mails with attachments
- interactive forms
- invoices / e-Invoices
- construction drawings
- technical documentation
- presentations
- newspapers
- books
callas pdfaPilot converts native digital office documents created with
Microsoft Word, PowerPoint, Excel, Publisher, Visio, Project,
Apache OpenOffice, LibreOffice, Pages (macOS only),
PostScript and EPS, PNG, JPG and TIFF images into
archive-ready PDF/A files.
Non-compliant PDF files can easily be fixed
and saved as valid PDF/A documents.
callas pdfaPilot takes care of
quality, safety, speed and efficiency
Whenever PDF files need to be prepared on a large scale,
callas pdfaPilot takes care of quality, safety, speed and efficiency:
- automate and standardize output processes
- provide fully standard-compliant PDF files
- increase productivity
- reduce errors
- validate and correct inaccuracies for compliance with
the PDF/UA standard for universally accessible PDF
Practical use of callas pdfaPilot:
callas pdfaPilot is compatible with Adobe® Acrobat® DC
(in fact it uses the same rock-solid PDF/A technology),
but in addition allows you to automate and batch process PDF files.
callas pdfaPilot - Solves Long-Term Archiving Challenges
Long-term preservation challenges for archives
Electronic archives must prove electronic documents are authentic,
reliable, complete, unaltered, and usable across time.
Make documents and emails last a lifetime
In many environments, there are regulatory requirements that
all communication regarding specific topics has to be archived for
a certain period of time.
How long that is can differ widely, from a few years to
several decades, or even indefinitely.
The format of choice for archiving is defined in
the ISO standard PDF/A ("PDF for Archiving").
This ISO standard defines four different PDF/A versions, and
callas pdfaPilot can automatically convert all of your documents and
emails into the PDF/A flavor required for your purposes.
Additional job information (metadata) is fully supported, and
the solution is configurable enough to adapt to your workflow
instead of the other way around.
Archiving documents is a critical operation of modern workflows;
for that reason callas pdfaPilot has been extensively tested
and complies with the ISO standards as verified with
the veraPDF test corpus.
The fact that the very same technology is used in the market's
de facto standard, Adobe Acrobat, is testament to its quality.
Archiving typical office documents
The best way to have good PDF/A files is to
properly create them to begin with.
For this, callas pdfaPilot can automatically convert
Microsoft Word, Excel, PowerPoint, Project,
Publisher and Visio files into quality PDF files for you.
Simply drag them onto the document window of
callas pdfaPilot Desktop and the conversion
takes place in the best possible way.
Apache OpenOffice and LibreOffice documents are
fully supported too, and on macOS, Pages and Keynote
documents work as well.
Using the integrated Adobe engine, callas pdfaPilot even does
quality conversion of EPS and PostScript files for you,
without requiring Adobe Acrobat Distiller to be installed.
These days, emails are an essential part of
business communication in most organizations.
Many countries have laws and regulations around the archival of
such communication, either to be used in potential future litigation
or as part of sound accounting practices.
Because emails are usually handled by email servers with limited
storage capacities, there are no guarantees that an email you
receive today will still be on the server ten years from now.
When you download the email from your server it is possibly
converted into the proprietary format of the email client, and
there is no guarantee that you will have such a client in the future.
In addition, email attachments require a suitable viewer
for every file format they may contain.
Luckily callas pdfaPilot lets you store these emails including
all attachments in a PDF/A format that will be available and
readable many years from now.
But even in environments where such regulatory guidelines don’t apply,
there are good reasons to archive emails.
In our modern society an amazing amount of business intelligence
is captured in emails. Being able to maintain that information in a
structured way, in the same format that is used for other
documents, and to include emails and their
attachments efficiently will only become more important.
callas pdfaPilot supports PDF/UA,
the ISO Standard (ISO 14289-1) for accessible PDF
Accessibility is important for digital PDF documents to ensure that
people who need to use assistive technology (such as screen readers)
can access and assimilate the content of these documents.
In addition, local laws and regulations in many countries require
that PDF files be accessible to everyone in society.
As a result, checking against PDF/UA,
the ISO standard that defines what an accessible PDF file is,
has gained importance for service providers, governments and
enterprise customers alike.
callas software provides a number of software tools that support
the PDF/UA standard for digital PDF accessibility.
A successful validation of a PDF document against PDF/UA-1
always requires the PDF document to pass at least two stages of
validation before the document is published or shared with others:
- a programmatic syntax test,
which can be performed by trusted software, and
- an interactive semantic test,
carried out manually by a human.
While callas pdfaPilot is a commercial tool,
the PDF/UA verification part is always free to use and
includes help with the human verification part as well,
by converting the PDF document into an HTML representation
(See "Man versus Machine" below).
- Checking for universal accessibility
Making sure PDF files conform to the PDF/UA standard
for universal accessibility is easily done with
the Preflight feature in callas pdfaPilot.
The solution contains quality control profiles that contain all
the rules for PDF/UA and can verify them in processed PDF files.
This gives a first indication of whether the PDF documents
at least follow the basic rules and implement the necessary
constructs for accessibility.
It checks whether all content in the document is tagged,
whether images have alternate descriptions,
whether a language is identified for all text, etc.
- Man versus Machine
While most other quality control processes
can be entirely done using computer software,
this is much more difficult for PDF/UA.
The PDF/UA accessibility standard for example states
that all images must have a proper description.
Of course software can check that a description is available,
but is it really good enough for someone who cannot see the image?
For this reason, the callas solutions also provide functionality
such as the conversion of PDF/UA into color-coded HTML, which
a human can use to easily check that the document structure
was applied correctly, that all image descriptions make sense,
and so on.
This is a free/no-cost feature in
callas pdfaPilot, pdfToolbox and pdfGoHTML.
This feature is available not only in the main callas pdfaPilot solution,
but also in callas pdfGoHTML, a free and separate plug-in for
Adobe Acrobat DC, making it possible for everyone to quickly
evaluate the quality of the tagging in an accessible PDF document.
Thanks to its unique checking feature for the tagging structure in a PDF file,
callas pdfaPilot also offers optimized exporting to PDF/UA.
- Not Restricted to PDF
The input documents used to create PDF/UA compatible
PDF files are often not PDF files.
For that reason, the callas solutions also support
the conversion of other file formats into PDF.
This is supported for Microsoft Office documents,
Apache OpenOffice and LibreOffice documents,
various image file formats and emails.
After conversion to PDF, the documents can then be
made compliant with the appropriate PDF/UA version.
- Export of tagged PDF to EPUB
callas pdfaPilot lets you take advantage of PDF tagging
in a very different area as well:
The tagging structure can be used in order to
automatically create EPUB files from PDF.
This PDF to EPUB feature converts PDFs into eBook files
that can immediately be used on mobile devices such as
smartphones or tablets.
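The machine-checkable part of a PDF/UA check, described above under "Checking for universal accessibility", can be illustrated with a small Python sketch. It applies three of the rules named there (all content tagged, images with alternate descriptions, a language identified) to a hypothetical, heavily simplified structure tree. The `StructElem` type, the `check_accessibility` function and the count of untagged content items are assumptions made for illustration only; the sketch does not read real PDF files and does not reflect how callas pdfaPilot implements its profiles.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Hypothetical, heavily simplified node of a PDF structure tree."""
    role: str                    # structure type, e.g. "Document", "P", "Figure"
    alt: Optional[str] = None    # alternate description (relevant for figures)
    children: List["StructElem"] = field(default_factory=list)


def check_accessibility(root: StructElem, doc_lang: Optional[str],
                        untagged_items: int) -> List[str]:
    """Return machine-detectable problems (an empty list means none were found)."""
    problems: List[str] = []
    # Rule 1: all page content must be tagged (the count is assumed to be known).
    if untagged_items:
        problems.append(f"{untagged_items} content item(s) not tagged")
    # Rule 2: a language must be identified for the text.
    if not doc_lang:
        problems.append("no document language set")

    # Rule 3: every figure needs a non-empty alternate description.
    def walk(elem: StructElem) -> None:
        if elem.role == "Figure" and not (elem.alt and elem.alt.strip()):
            problems.append("Figure without alternate description")
        for child in elem.children:
            walk(child)

    walk(root)
    return problems


doc = StructElem("Document", children=[
    StructElem("P"),
    StructElem("Figure"),                      # missing alt text
    StructElem("Figure", alt="Company logo"),
])
print(check_accessibility(doc, doc_lang="en-US", untagged_items=0))
# ['Figure without alternate description']
```

Note that such a check can only confirm that an alternate description exists, not that it is meaningful to someone who cannot see the image; that judgment remains the human part of the validation.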
Use the right PDF standard for e-archiving:
How do you choose between variants of PDF/A standards:
PDF/A-1, PDF/A-2, PDF/A-3 or PDF/A-4?
(callas software blog, March 15, 2023)
How PDF/A-2 and PDF/A-4
are better than PDF/A-1 (2021)
callas software presents the advantages of PDF/A-2 and PDF/A-4 over PDF/A-1,
and how to avoid problems in PDF/A-1 conversion:
PDF/A validation with callas pdfaPilot
Common question: What exactly does a PDF/A validator validate?
The rule of thumb is:
A PDF/A validator (callas pdfaPilot) checks the aspects
that are specifically governed by the PDF/A standard.
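A PDF/A file declares the part and conformance level it claims (for example PDF/A-2a) in its XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties; a validator needs this claim to know which set of rules to check against. The Python sketch below only reads that claim from an XMP packet that is assumed to have already been extracted from the PDF. The function names are hypothetical, and reading the claim is not the same as validating the file.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def _prop(desc: ET.Element, name: str) -> Optional[str]:
    """Read a pdfaid property written either as an attribute or as a child element."""
    value = desc.get(f"{{{PDFAID_NS}}}{name}")
    if value is None:
        child = desc.find(f"{{{PDFAID_NS}}}{name}")
        value = child.text if child is not None else None
    return value.strip() if value else None


def claimed_pdfa_level(xmp: str) -> Optional[Tuple[str, str]]:
    """Return (part, conformance) as claimed in an XMP packet, or None if no claim."""
    root = ET.fromstring(xmp)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        part = _prop(desc, "part")
        if part:
            return part, _prop(desc, "conformance") or ""
    return None


SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>A</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(claimed_pdfa_level(SAMPLE_XMP))  # ('2', 'A')
```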
Preflight example:
Validation against the PDF/A-2a Standard with callas pdfaPilot:
Report: PDF/A-2a - Summary of Checks Carried Out (pdf)
Use of callas pdfaPilot in practice:
Use cases: Preflight, validation, correction and conversion
|
callas pdfaPilot Webinars/Video Recordings, Tutorials, Demonstrations
Video Demonstrations of callas pdfaPilot
callas pdfaPilot - Tutorials
|
Datasheets/User Manuals
callas pdfaPilot Desktop, Server, CLI and SDK
|
callas pdfaPilot Training
callas pdfaPilot - Instructor-led Training
callas pdfaPilot Self-Training
callas pdfaPilot and its sister product callas pdfToolbox
share the same concepts, basic architecture and many features.
The self-training below, written for users of callas pdfToolbox, is
also applicable to and recommended for users of callas pdfaPilot.
This PDF-based, self-paced training helps users fully
understand and utilize the power of callas pdfToolbox.
The training material is focused on profile creation and
includes exercises with solutions and explanations.
Simply download the following PDF training manual,
open it in a PDF viewer and you are good to go:
|
Price information
callas pdfaPilot software can be obtained by purchasing or renting a license.
Licenses are per desktop/laptop with an unlimited number of documents, or
per server with an unlimited number of users and documents.
A volume discount applies for 5 or more desktop licenses!
|
Reference Cases/Customer Testimonials
callas pdfaPilot Desktop, Server, CLI and SDK
2021: Debeka, Germany
Debeka archives emails in compliance with ISO rules thanks to callas pdfaPilot.
The Debeka Group is one of the five biggest players in the insurance and
building-society savings sector in Germany, offering a wide range of
insurance and financial services.
Now, the group is introducing a central document conversion solution which will
process emails and their attachments as they are readied for long-term archiving.
Debeka uses PDF/A as its chosen file format.
Every day, it now converts an average of approximately 2,400 emails and
their associated attachments into the ISO standard for document archiving.
Longer-term, Debeka plans to expand the solution step by step to cover
additional archiving processes. In total, the group expects that by the final phase,
it will need to convert 300,000 to 400,000 documents into PDF/A every day.
Integrator: actino software, Germany (also a cooperating partner of NewFormat).
For more details, please see:
or contact NewFormat.
2019: Beuth Verlag
Beuth Verlag trusts callas software for quality assurance of PDF/A files.
callas pdfaPilot and callas pdfToolbox ensure that the DIN standards published
by Beuth Verlag comply with the ISO standards and are therefore suitable
for long-term archiving.
At the same time, a high degree of automation has significantly reduced
the amount of manual work required.
Beuth Verlag, a subsidiary of DIN, the German Institute for Standardization,
distributes national and international standards and develops specialized
publications for industries, scientific research, trade, the service sector,
academic studies and crafts.
For more details, please see:
or contact NewFormat.
- 2023: Malmö University, Sweden
- 2023: University of Eastern Finland, Finland
- 2023: Verified Global AB, Sweden
- 2022: Wellspect AB, Mölndal, Sweden
- 2019: University of Turku, Finland
- 2018: Tampere University, Finland
- 2018: Nobel Media AB / Nobel Prize Outreach AB, Stockholm, Sweden
- 2017: Dentsply IH AB, Mölndal, Sweden
- 2016: ProReNata AB, Stockholm, Sweden
- 2016: University of Skövde, School of Informatics, Sweden
- 2014: The Centre for Business History / Centrum för Näringslivshistoria, Sweden
- 2013: Region- and City Archives of Göteborg, Sweden
- 2013: Region Västra Götaland, Sweden
2013: callas pdfaPilot integration with Alfresco ECM (press release)
Integrator: Redpill Linpro (Göteborg, Sweden):
2013: Mercedes-Benz Schweiz AG, Switzerland:
Fully automatic conversion to PDF/A with callas pdfaPilot SDK
for electronic long-term archiving - Datasheet (pdf)
- 2012: The National Archives of Sweden, Sweden
2012: Rosendahls, Denmark, print producer for public sector:
Conversion of office documents to PDF/A with callas pdfaPilot,
including PDF tagging using axaio MadeToTag to make PDF content
accessible for all - Datasheet (pdf)
2010: Chamber of Commerce, Italy:
2010: EU Publications Office, Luxembourg:
Strategic Decision: Publish only in PDF/A format.
Converted 11 million paper pages with callas pdfaPilot
to digital PDF/A documents for long-term archiving
in its digital library/e-archive - Datasheet (pdf)
|
Products
|
callas pdfaPilot
PDF/A Technology Solutions for Document Archiving and Preservation
|
Click Here for Free Trial of callas pdfaPilot Desktop, Server, CLI and SDK
(You will be asked to fill in a trial request form.
To help us identify your software download and support you,
please enter the code NewFormat
in the form field named Preferred Reseller).
|
Overview
|
callas pdfaPilot Desktop Switchboard
|
callas pdfaPilot: the professional tool to create
ISO standard compliant PDF/A files for long-term archiving
callas pdfaPilot is an easy-to-use, quick and reliable tool for PDF/A,
available as a fast standalone application or
as a plug-in inside Adobe® Acrobat®.
callas pdfaPilot is compatible with Adobe® Acrobat® DC.
callas pdfaPilot Desktop installs standalone or
as a plugin inside Adobe® Acrobat®:
- When installed as a standalone application,
pdfaPilot completely replaces the inbuilt
preflight capability of Adobe Acrobat;
in such cases there is no need to also
buy / install Adobe Acrobat.
- When installed as a plugin into Adobe Acrobat,
pdfaPilot vastly extends and enhances the basic,
inbuilt preflight capability of Adobe Acrobat.
For businesses and government agencies, archives and libraries
that store, preserve and maintain documents over the long term,
callas pdfaPilot is the professional tool to
validate and/or create PDF/A files.
callas pdfaPilot – Features overview
callas pdfaPilot is easy to learn and use while bringing
numerous features to create ISO standard compliant PDF/A files.
Whether you are looking for a specific color feature,
some document initial view setup, a font repair feature or
just a check to find all red text larger than 12 pt
- the feature lists shown below will make finding
what you are looking for a breeze:
- "Switchboard actions": frequently needed features,
as easy to use as a phone dial.
- "Testable properties": can be used to analyze a PDF
or find objects in a PDF.
- "Configurable operations": can be used to create your own
corrections and adjustments for PDFs.
- "Configured profiles": combined checks and fixups suitable for
many use cases and industries.
- "Configured checks": can be used for a quick analysis of PDFs,
or for finding that specific object in your multi-page PDF.
- "Configured fixups": allow you to fix or adjust a certain aspect
in a PDF by just running the fixup.
|
callas pdfaPilot comes in several flavours
for specific needs and automation requirements
callas pdfaPilot product family
The callas pdfaPilot product family provides fully PDF/A compliant
archiving solutions and is the easiest yet most powerful PDF/A
preflight and correction application on the market:
callas pdfaPilot Desktop
For individual and interactive use on desktops.
Desktop application for interactive conversion to PDF/A,
standalone or installed as plugin in Adobe® Acrobat®.
Aimed at bringing fully compliant/reliable PDF/A-based
long-term archiving to a variety of industries.
Offers a neat batch feature to run a quality control profile or
a PDF fix-up on a whole folder of files.
Combines the analysis and validation of existing PDF files
against the PDF/A standard, corrects non-compliant
PDF content and converts it to PDF/A.
callas pdfaPilot Server & CLI
Fully customizable server solutions that
allow for automation of workflows through
watched folders or ready for integration into
existing systems through a command line interface.
callas pdfaPilot Server
Used to automate callas pdfaPilot or to
integrate callas pdfaPilot into an existing workflow.
Professional server solution to automatically create
ISO standard compliant PDF/A files.
Hot folder based automation of PDF/A processes.
Processes high volumes of PDF files:
- PDF/A verification
- comprehensive correction of typical errors
- conversion of PDF files according to PDF/A standard.
Provides clear reports with extensive configuration options.
Provides full multi-processor support for checking and
fixing PDF/A files.
callas pdfaPilot CLI: command-line-based automation of
PDF/A processes is the fastest way for developers to integrate
callas PDF technologies into custom solutions.
High volume PDF/A verification, comprehensive correction of
typical errors, conversion and optimization of PDF files
according to PDF/A standard.
Clear reports with extensive configuration options.
Additional production and/or developer licenses on request.
callas pdfaPilot SDK Software Development Kit
Programming library/software development kit for developers
with a need for PDF/A optimization, validation and correction.
Facilitates the development of applications that
quickly incorporate PDF/A file creation, optimization, validation and
correction for long-term archiving into custom solutions and workflows.
|
PDF/A in One Click
callas pdfaPilot saves
non-compliant PDF files as valid PDF/A documents
for archiving with the touch of a button.
callas pdfaPilot converts
Microsoft Office and OpenOffice documents
directly for archiving in PDF/A.
callas pdfaPilot converts
e-mails with attachments for archiving in PDF/A.
callas pdfaPilot converts
documents to PDF/A and PDF/X at the same time.
Useful for organisations that archive (PDF/A) and also need to print (PDF/X).
callas pdfaPilot Desktop validates
PDF files for conformance with PDF/UA,
the ISO Standard for Accessible PDF.
|
Automation
- callas pdfaPilot provides several alternatives for automation:
- callas pdfaPilot Desktop offers a neat batch feature that
lets you run a quality control profile or a PDF fix-up on a
whole folder of files (up to 100 files)
– easy to use and quick as lightning.
- callas pdfaPilot Server frees up operators through hands-off automation
to process higher volumes of files (unlimited number of files).
The hot folder interface lets you set up different watched folders
that automatically pick up files and process them.
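The watched-folder idea described above can be sketched as a simple polling loop in Python. This is a generic illustration, not how callas pdfaPilot Server implements its hot folders; `scan_hot_folder`, `watch` and the `handle` callback are hypothetical names.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List, Set


def scan_hot_folder(folder: Path, seen: Set[str],
                    suffixes: Iterable[str] = (".pdf",)) -> List[Path]:
    """Return files in `folder` not seen before whose suffix matches (case-insensitive)."""
    wanted = {s.lower() for s in suffixes}
    new_files: List[Path] = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in wanted and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


def watch(folder: Path, handle: Callable[[Path], None], interval: float = 2.0) -> None:
    """Poll forever; `handle` is a hypothetical callback that processes one file."""
    seen: Set[str] = set()
    while True:
        for path in scan_hot_folder(folder, seen):
            handle(path)
        time.sleep(interval)
```

A production hot folder would also have to wait until a file has been completely copied into the folder (for example, until its size stops changing) before picking it up; the sketch leaves that out.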
Process Plans
- callas pdfaPilot comes with full support for process plans.
- Process Plans take automation flexibility to the next level:
- They allow defining a number of steps a PDF file
should be processed through.
Each step can be a preflight profile,
a single preflight check or fix, or an action
(such as creating images, creating a booklet,
exporting as PostScript file…).
- Process plans allow generating preflight reports
based on the result of each step, or can jump between
different steps based on the processing result of a previous step.
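Conceptually, a process plan is a list of steps with optional jumps that depend on each step's result. The Python sketch below models that idea with hypothetical `Step` and `run_plan` names and toy actions operating on a dictionary; it does not use or reproduce callas pdfaPilot's actual process plan format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[dict], bool]     # returns True if the step succeeded
    on_success: Optional[str] = None   # step to jump to; None = next step in the list
    on_failure: Optional[str] = None   # step to jump to; None = stop the plan


def run_plan(steps: List[Step], doc: dict, max_steps: int = 100) -> List[str]:
    """Run the steps, following jumps, and return a simple processing log."""
    index = {step.name: i for i, step in enumerate(steps)}
    log: List[str] = []
    i = 0
    for _ in range(max_steps):          # guard against endless jump loops
        if i >= len(steps):
            break
        step = steps[i]
        ok = step.action(doc)
        log.append(f"{step.name}: {'ok' if ok else 'failed'}")
        target = step.on_success if ok else step.on_failure
        if target is not None:
            i = index[target]
        elif ok:
            i += 1
        else:
            break
    return log


# Hypothetical toy actions standing in for a check, a fixup and a report.
def check_fonts(doc: dict) -> bool:
    return doc["fonts_embedded"]


def embed_fonts(doc: dict) -> bool:
    doc["fonts_embedded"] = True
    return True


def write_report(doc: dict) -> bool:
    doc["report"] = "passed"
    return True


plan = [
    Step("check", check_fonts, on_success="report", on_failure="fix"),
    Step("fix", embed_fonts, on_success="check"),
    Step("report", write_report),
]
print(run_plan(plan, {"fonts_embedded": False}))
# ['check: failed', 'fix: ok', 'check: ok', 'report: ok']
```

Here a failed check jumps to a fix step, which jumps back to the check, so the report is only written after the check has passed.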
Easy, Automated Impositioning
Impositioning is the placement of pages on a press sheet
so that, after printing and folding, the sheet carries the pages in
the correct sequence in the finished printed product.
- callas pdfaPilot scales, fits, moves, extends, rotates and
flips pages and page content easily.
- It also imposes booklets, prepares PDF documents for
double-sided printing, splits double pages into single pages,
duplicates pages, and optimizes PDFs for use as
slide presentations and handouts, etc.
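The page order for a simple saddle-stitched booklet follows a well-known pattern: the page count is padded to a multiple of four, and each sheet side pairs a page from the front of the document with one from the back. The Python sketch below computes that order. It is a generic illustration of the technique, not callas pdfaPilot's imposition engine, and the function name `booklet_order` is hypothetical.

```python
from typing import List, Tuple


def booklet_order(page_count: int) -> List[Tuple[int, int]]:
    """Return (left, right) page numbers for each sheet side of a saddle-stitched
    booklet, front side first for each sheet; 0 marks a blank padding page."""
    n = (page_count + 3) // 4 * 4               # pad up to a multiple of 4
    pages = [p if p <= page_count else 0 for p in range(1, n + 1)]
    sides: List[Tuple[int, int]] = []
    lo, hi = 0, n - 1
    while lo < hi:
        sides.append((pages[hi], pages[lo]))          # front: last, first
        sides.append((pages[lo + 1], pages[hi - 1]))  # back: second, second-to-last
        lo += 2
        hi -= 2
    return sides


print(booklet_order(8))  # [(8, 1), (2, 7), (6, 3), (4, 5)]
```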
Adobe Compliant Transparency Flattening
- Handling PDF documents with live transparency when your workflow
or output devices don’t directly support them can be really hard.
- callas pdfaPilot effortlessly flattens those transparencies
while preserving text, images and line-art as much as
possible and without changing the intent of the document.
- And you can rely on that result as it uses
proven Adobe flattening technology.
Remote Configuration and Monitoring
- Every callas pdfaPilot Server comes with
one license of callas pdfaPilot Desktop.
- callas pdfaPilot Desktop allows connecting to your
callas pdfaPilot servers; all configuration and monitoring of
callas pdfaPilot Servers on your network can be done over
the network from the comfort of your own desktop computer.
Straightforward integration into any
e-archiving or MIS environment
- Of course callas pdfaPilot Server supports hot folders.
- It also allows integration using the command-line (CLI) or an
extensive SDK ideal for embedding the latest PDF technology
straight in your e-archiving and/or administration system.
callas pdfaPilot Multi-Server Productivity via Dispatcher
- callas pdfaPilot doesn't even limit itself to one system.
By using automatic load balancing, callas pdfaPilot allows for
configuring a dispatcher that then distributes work across
the network to satellites running on different systems.
- High performance coupled with easy setup and maintenance.
- Allows for load balancing as well as for failover configuration.
- Processing of files according to "dispatcher-satellite" concept.
- Dispatcher:
A server that doesn’t process files itself,
but instead receives files in its job folders and
sends them to satellites to have them processed.
- Satellite:
A satellite is basically a regular server that will process files.
- As many "satellites" as required can connect to
callas pdfaPilot Dispatcher.
- Failover:
Processing does not stop
even if one of the satellites is switched off.
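The dispatcher-satellite concept with failover can be illustrated with a small Python sketch. The source does not describe which load-balancing strategy callas pdfaPilot Dispatcher uses, so simple round-robin is assumed here; the `Dispatcher` class and its methods are hypothetical names.

```python
from itertools import cycle
from typing import Dict, Iterable, List


class Dispatcher:
    """Hypothetical dispatcher: hands jobs to satellites round-robin, skipping offline ones."""

    def __init__(self, satellites: List[str]) -> None:
        self.satellites = list(satellites)
        self.online = {name: True for name in self.satellites}
        self._ring = cycle(self.satellites)

    def set_online(self, name: str, online: bool) -> None:
        self.online[name] = online

    def _next_satellite(self) -> str:
        # Try each satellite at most once per job; skipping offline ones is the failover.
        for _ in range(len(self.satellites)):
            candidate = next(self._ring)
            if self.online[candidate]:
                return candidate
        raise RuntimeError("no satellite available")

    def dispatch(self, jobs: Iterable[str]) -> Dict[str, str]:
        """Return a mapping of job -> satellite name (the dispatcher does no processing itself)."""
        return {job: self._next_satellite() for job in jobs}


d = Dispatcher(["sat-a", "sat-b", "sat-c"])
print(d.dispatch(["job1", "job2", "job3", "job4"]))
d.set_online("sat-b", False)          # simulate a satellite being switched off
print(d.dispatch(["job5", "job6"]))   # sat-b is skipped
```

A real system would also have to reassign jobs that were already in progress on a satellite when it went offline; the sketch only covers new jobs.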
Automation of workflows
- callas pdfaPilot integrates seamlessly with
FileTrain for automation of workflows.
|
callas pdfaPilot and Robotic Process Automation (RPA)
Robotic Process Automation (RPA) - Office workflows using PDF
The subject of Robotic Process Automation (RPA),
a rather loosely defined concept of advanced automation,
is gaining increasing attention in the software industry.
An increasing number of providers from a diverse range of sectors
are introducing applications that can complete tasks previously
handled by humans.
By adopting these RPA applications, users expect to benefit from
optimizing their processes, avoiding errors and eliminating monotonous,
repetitive tasks.
RPA processes can be created in an agile and modular manner
in individual steps without programming knowledge.
callas software has specialized in advanced automation for a long time.
Its background is prepress, but its automation engine is highly advanced,
and many of its features (conversion to PDF or PDF/A, improving internal structures,
creating booklets, OCR, making sure everything is searchable, etc.)
are relevant for office/document automation as well,
which is where the term RPA comes from.
Robotic Process Automation (RPA) can learn from prepress
What applies to PDF automation applies to every automation:
processes and processed materials have to be well-defined and standardized.
Robotic Process Automation (RPA) for office processes can build on
experiences that have been made for PDF automation in prepress and print.
How is that?
There are vertical application areas where PDFs are not only
administrative documents, but also the production material itself.
This applies e.g. to manufacturing or design/architectural industries,
where PDF-based drawings carry the actual production data.
But even more advanced are automated prepress workflows.
PDF files from arbitrary sources are prepared in a fully automated
process from which they go straight to the printing machine.
Depending on the diversity and quality of the files,
this can place high demands on the PDF software.
Print shops receive their raw material (PDF files) from numerous
companies and agencies in varying degrees of quality.
Manual corrections should be avoided in RPA, so the first step in
most RPA processes will be the standardization of input material.
This normalization makes sure that the raw material works in
all subsequent processing steps up to printing.
The PDF format provides a solid foundation for RPA applications
In order for RPA applications to work smoothly, processes must both
have a standard structure and work with standardized files.
This is the only way to maximize the number of files
processed by a single automation. In many cases, this means that
RPA can only work with data that it has itself generated.
But why not use a standard here too?
There are many good reasons to use the PDF format whenever
such processes need to interact with third-party data.
After all, PDF is the lowest common denominator for nearly all
document types processed or received in offices.
Office files, emails and even images can be easily converted to PDF,
giving RPA applications a standardized starting point for processing.
Moreover, with the many features it has accumulated over the years,
PDF is the most powerful, versatile document format in the world.
However, not every PDF meets the prerequisites for automatic
processing equally well.
This concerns not only a reliable display model for the document,
but above all the actual data the document contains.
Making PDFs RPA-ready
Data standardization ("normalization"), metadata, tagging
and adjusting of cross-media provisions make PDFs RPA-ready:
Standards, such as PDF/X, are used to streamline
the definition of normalization requirements:
if the raw material is converted to PDF/X,
subsequent steps can rely on that.
Normalization should be the first step
- at least if the source of the data (PDF) is not always the same.
That might even include the creation of PDF
from other source formats:
- Document conversion, office conversion
If the source is an image,
OCR (Optical Character Recognition) should be applied:
- Apply OCR for scanned PDFs
Because PDF is so flexible,
the input data is normalized to a PDF of a defined quality.
This could be based on internally developed standards,
but in most cases an existing official standard such as PDF/A
removes the need to develop internal standards of one's own.
If needed, its requirements can be adjusted to internal needs.
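The idea of using an official standard as a baseline and adding internal rules on top can be sketched as a simple profile merge. The requirement keys, the page-count rule and the file "facts" below are hypothetical and greatly simplified; a real check would use a pdfaPilot profile, not a Python dict. Note that an override which relaxes a PDF/A rule would mean the output is no longer PDF/A.

```python
# Hypothetical, simplified requirement keys loosely modeled on PDF/A-2u.
BASE_PDFA_2U = {
    "fonts_embedded": True,
    "unicode_mapping_required": True,
    "encryption_allowed": False,
}

# Internal rule layered on top (not part of PDF/A).
INTERNAL_OVERRIDES = {"max_page_count": 200}


def build_profile(base: dict, overrides: dict) -> dict:
    """Start from the official standard and add internal rules on top."""
    return {**base, **overrides}


def violations(file_facts: dict, profile: dict) -> list:
    """Compare (hypothetical) facts about one file against the merged profile."""
    problems = []
    if profile["fonts_embedded"] and not file_facts.get("all_fonts_embedded", False):
        problems.append("fonts not embedded")
    if profile["unicode_mapping_required"] and not file_facts.get("has_unicode_mapping", False):
        problems.append("text without Unicode mapping")
    if not profile["encryption_allowed"] and file_facts.get("encrypted", False):
        problems.append("file is encrypted")
    if file_facts.get("page_count", 0) > profile.get("max_page_count", float("inf")):
        problems.append("too many pages")
    return problems


profile = build_profile(BASE_PDFA_2U, INTERNAL_OVERRIDES)
print(violations({"all_fonts_embedded": False, "has_unicode_mapping": True,
                  "page_count": 250}, profile))
```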
In order to process PDF files automatically,
they need to meet a few requirements.
For example, after scanning a document,
it is generally necessary to fully index the text of it using OCR,
assigning Unicode characters to the extracted content.
Only then can RPA processes evaluate the actual text.
Even "born digital" PDFs may not necessarily
have full Unicode support.
This is where dedicated validation tools
(and repair tools, where necessary) come into play!
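A dedicated validator inspects the fonts' Unicode mappings inside the PDF itself. As a rough illustration only, the Python sketch below looks at text that has already been extracted from a page and flags typical symptoms of a missing Unicode mapping: no text at all, the replacement character U+FFFD, or private-use and control characters. The function name and the 1 % threshold are assumptions, and this heuristic does not replace a real PDF/A-2u validation.

```python
import unicodedata


def unicode_text_looks_usable(text: str, max_bad_ratio: float = 0.01) -> bool:
    """Heuristic: does extracted text look like real, Unicode-mapped text?"""
    if not text.strip():
        return False  # no extractable text at all: likely a scan that needs OCR
    bad = sum(
        1
        for ch in text
        if ch == "\ufffd"  # replacement character: mapping failed
        or (unicodedata.category(ch) in {"Co", "Cc"} and ch not in "\n\r\t")
    )
    return bad / len(text) <= max_bad_ratio


print(unicode_text_looks_usable("Invoice No. 1001\nTotal: 42.00 EUR"))  # True
print(unicode_text_looks_usable("\ue000\ue001\ue002 42"))               # False
```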
One very concrete example would be a print file
exported from an ERP system that consolidates
all outgoing invoices into a single PDF.
A PDF software tool will search through the text,
finding keywords or separators that it uses to split
the single PDF into separate invoices.
Naturally, this only works if the software
can recognize the keywords, and without OCR,
this is only possible if the text has already been
"translated" into Unicode.
Normalization to PDF/A-2u:
- Font embedding
- Make sure Unicode representation is present
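Once the ERP print file from the invoice example has been normalized this way, the keyword-based split can be sketched in a few lines of Python. The sketch only works on per-page text that has already been extracted (for example with pypdf's `PageObject.extract_text()`); the "Invoice No." pattern and the function name are hypothetical, and a real ERP layout needs its own separator.

```python
import re

# Hypothetical separator; a real ERP print file needs its own pattern.
INVOICE_START = re.compile(r"Invoice\s+No\.?\s*(\d+)")


def split_ranges(page_texts: list) -> list:
    """Return (invoice_no, first_page, last_page), 0-based and inclusive."""
    ranges = []
    for index, text in enumerate(page_texts):
        match = INVOICE_START.search(text)
        if match:
            # Keyword found: a new invoice starts on this page.
            ranges.append((match.group(1), index, index))
        elif ranges:
            # Continuation page: extend the current invoice.
            number, first, _ = ranges[-1]
            ranges[-1] = (number, first, index)
        # Pages before the first keyword are ignored in this sketch.
    return ranges


pages = ["Invoice No. 1001\nACME AB", "page 2 of invoice 1001", "Invoice No. 1002\nBeta Oy"]
print(split_ranges(pages))  # [('1001', 0, 1), ('1002', 2, 2)]
```

If the text layer lacks a Unicode mapping, the pattern simply never matches and no invoices are found, which is exactly why normalization has to come first.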
And then, the actual processing can take place:
- Extract text
- Embed files
- Merge files
- Split PDF pages at certain keywords
- Normalize page sizes
- Add page numbers if missing
- …
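"Normalize page sizes" usually means scaling every page uniformly so it fits a target format without distortion. The Python sketch below computes that scale factor for A4 (210 × 297 mm, expressed in PostScript points of 1/72 inch), matching orientation first. The target size and function name are assumptions for illustration; actually applying the scale to a page is left to the PDF tool.

```python
A4_PT = (595.276, 841.890)  # 210 x 297 mm in points (1 pt = 1/72 inch)


def fit_scale(width_pt: float, height_pt: float, target=A4_PT) -> float:
    """Uniform scale factor that fits a page into the target without distortion."""
    tw, th = target
    # Match orientation so a landscape page is compared with landscape A4.
    if (width_pt > height_pt) != (tw > th):
        tw, th = th, tw
    return min(tw / width_pt, th / height_pt)


print(round(fit_scale(612, 792), 4))  # US Letter onto A4: 0.9727
```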
Avoid password protection of documents.
One of the "simplest" stumbling blocks,
and in fact a more or less insurmountable one for document-based RPA,
is the password protection that is often applied to documents
without a second thought.
Technical and legal restrictions prevent content from
being extracted from password-protected files,
which instead just have to be returned to the sender.
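In an automated workflow, such files should be detected and routed back before any processing is attempted. The Python sketch below uses a deliberately crude heuristic: an encrypted PDF references an `/Encrypt` dictionary from its trailer (or cross-reference stream dictionary), so the raw bytes contain that key. The queue names are hypothetical; a PDF library such as pypdf offers a proper check via `PdfReader.is_encrypted`.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Rough heuristic: encrypted PDFs reference an /Encrypt dictionary."""
    return b"/Encrypt" in pdf_bytes


def route(pdf_bytes: bytes) -> str:
    # Hypothetical queue names for an RPA workflow.
    return "return_to_sender" if looks_encrypted(pdf_bytes) else "process"


sample = b"%PDF-1.7\n...\ntrailer\n<< /Size 10 /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF"
print(route(sample))  # return_to_sender
```

In practice the bytes would come from the incoming file, e.g. `route(Path("in.pdf").read_bytes())`.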
Integrating metadata into PDFs.
By integrating metadata into PDFs, RPA applications can be
provided with pointers that show how to process a given file.
For instance, it may make sense to extract information before
converting the source file, and then add the extracted
information to the PDF.
Consider the following example:
A retail company receives a set of product descriptions
from a supplier in PDF format.
They can add entries to the metadata,
which they then use to classify the files.
If their customers need information, these descriptions can
then be added to individual product catalogs and used to
generate a table of contents.
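PDF metadata is stored as XMP, an RDF/XML packet. The Python sketch below builds the core of such a packet with custom product properties like those in the retail example, using only the standard library. The `prod` namespace URI and the property names are hypothetical; writing the packet into the PDF (including the `<?xpacket?>` wrapper) is left to the PDF tool, and for PDF/A files custom properties must additionally be declared in a PDF/A extension schema, which this sketch omits.

```python
import xml.etree.ElementTree as ET

X = "adobe:ns:meta/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROD = "http://example.com/ns/product/1.0/"  # hypothetical custom namespace

ET.register_namespace("x", X)
ET.register_namespace("rdf", RDF)
ET.register_namespace("prod", PROD)


def product_xmp(fields: dict) -> str:
    """Build an XMP body with custom product properties (keys must be XML names)."""
    xmpmeta = ET.Element(f"{{{X}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{PROD}}}{name}").text = value
    return ET.tostring(xmpmeta, encoding="unicode")


xml_text = product_xmp({"supplier": "Acme", "category": "garden"})
print(xml_text)
```

A classification step can then read `prod:category` back from each file instead of guessing from the file name or content.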
"Tagging" of PDF files.
Ideally, the PDF files will be "tagged".
This means that not only is the text content itself represented in Unicode,
but headers, paragraphs, image descriptions and tables
are also described ("tagged")
in a structured data format.
These tags allow the RPA application to tell how to structure
text content (particularly in multi-column layouts),
extract headers and organize images using their descriptions.
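How an application can use tags is easier to see with a simplified model. In the PDF itself the tags form a structure tree with standard types such as `H1`, `P`, `Figure` and `Table`, and figures can carry an alternate description (`/Alt`). The Python sketch below models that tree as plain objects (reading it from a real file is out of scope) and walks it in tag order to get headings, figure descriptions and body text. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

HEADING_TAGS = {"H", "H1", "H2", "H3", "H4", "H5", "H6"}


@dataclass
class StructElem:
    """Simplified model of one node in a PDF structure tree."""
    tag: str                      # standard structure type, e.g. "H1", "P", "Figure"
    text: str = ""                # text content belonging to this element
    alt: Optional[str] = None     # alternate description (used for "Figure")
    kids: List["StructElem"] = field(default_factory=list)


def walk(elem: StructElem) -> Iterator[StructElem]:
    """Depth-first traversal: the logical reading order defined by the tags."""
    yield elem
    for kid in elem.kids:
        yield from walk(kid)


def headings(root: StructElem) -> List[str]:
    return [e.text for e in walk(root) if e.tag in HEADING_TAGS]


def figure_descriptions(root: StructElem) -> List[str]:
    return [e.alt or "" for e in walk(root) if e.tag == "Figure"]


def body_text(root: StructElem) -> str:
    # Paragraphs come out in tag order, wherever the columns sit on the page.
    return "\n".join(e.text for e in walk(root) if e.tag == "P")


# A two-column page as it might be tagged (hypothetical content).
page = StructElem("Document", kids=[
    StructElem("H1", "Garden furniture"),
    StructElem("P", "Left column: teak chairs."),
    StructElem("Figure", alt="Teak chair, side view"),
    StructElem("P", "Right column: matching tables."),
])
print(headings(page))   # ['Garden furniture']
print(body_text(page))
```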
Since assigning tags to PDF documents after the fact is a very
time-intensive process, AI is generally used for tasks like
reading forms correctly at the field level.
This makes it even more important to fully index
the text of PDF files as described above.
Tagged PDF is also the foundation and prerequisite for
universally accessible PDF according to the ISO Standard PDF/UA.
Conclusions on Robotic Process Automation (RPA) with callas pdfaPilot
Automated workflows can become even smarter
with the help of intelligent RPA technologies.
Businesses looking to maximize process automation
can and should first establish a framework for seamlessly
leveraging RPA-based applications.
Part of this is about building a solid foundation,
using maximally homogeneous, standardized data.
As the common denominator for Office files,
high-quality PDFs are a good starting point for this foundation.
The prerequisite is that these PDF files can be adapted
to meet the internal requirements and standards.
One of the lessons learned in prepress is that automated processes
cannot usually be created by programmers from scratch.
Real processes are often better developed, adjusted and
fine-tuned when real files in day-to-day work are used.
To bring together domain expertise and programming, the ideal tools
are those that let users create RPA processes in an agile and
modular way, step by step, without programming knowledge.
That's why callas pdfaPilot has a graphical workflow editor that enables
RPA developers to design processes visually in a drag-and-drop interface
without any programming.
Once a developed RPA process is tested and does what it should,
it can be seamlessly exported from the workflow editor to
the server responsible for the fully automated processing.
callas pdfaPilot Desktop, Server, CLI and SDK - Details
For more information contact NewFormat
NewFormat AB
Smörblommegränd 14, SE-165 72 Hässelby (Stockholm), Sweden
Tel. +46 (0)70 631 53 01
All content © copyright 2008-2024 NewFormat AB. All rights reserved.
All product names, trademarks and registered trademarks
are property of their respective owners.
callas software Partner