May 29, 2023

Efficient PDF Generation – Part 1

Problem Statement

As part of product development, developers frequently need to generate comprehensive PDF reports. Several approaches can produce the rich PDF reports a product requires, but building an efficient and elegant solution takes a certain amount of research. In our experience, such a solution proved surprisingly elusive.

This article aims to take you through the journey we undertook and the solution we decided on to develop sophisticated PDF reports for one of our clients.

High-Level Requirement

We had two broad requirements:

  1. Generate a high-fidelity PDF report, approximately 3-4 pages long, containing tabular data components, dynamically generated pie charts, and a summary report in a specific format. The designs of the PDF pages were provided by our UX designers.
  2. Ensure efficient utilization of compute and memory resources for PDF generation, so that it can scale as the product grows over time.

Approaches

Option 1: HTML to PDF

There are primarily two options available for converting HTML to PDF:

  1. Using a WebKit-based package.
  2. Utilizing a headless browser to render your HTML into the PDF layout.

In the following discussion, we will explore both approaches, examining their benefits and challenges.

Webkit rendering of HTML

WebKit is a browser engine developed by Apple and used primarily in Safari and in web browsers on iOS. WebKit rendering lays out the HTML with that engine and prints the rendered output, in our case, to PDF.

Rendering with WebKit is typically faster and more memory-efficient than using a headless browser, because it runs only the rendering engine rather than a full browser.

Packages that use WebKit rendering include wkhtmltopdf and wrappers around it, such as the PDFKit gem for Ruby and pdfkit for Python. For more details, see the wkhtmltopdf documentation.

While this approach appears straightforward, it has potential challenges. Not all fonts and CSS properties are supported. Additionally, these packages typically read template files from local storage and save the generated files to disk, which can result in a high volume of file I/O.

Therefore, before generating the PDF, you will need to write HTML specifically for the PDF, with its own custom CSS, because CSS libraries such as Bootstrap may not work as expected.

Headless browser approach

Examples include headless Chrome or Chromium driven through Puppeteer, and PhantomJS (a headless WebKit-based browser whose development has been suspended).

Generating a PDF using a headless browser is a straightforward process. You can either pass in the URL of the page you want to convert, or provide an HTML template whose placeholders have been filled in with the report values; the browser then renders it to a PDF file.
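As a minimal sketch of the second option, the placeholder step could look like the following in Python. The template text, the placeholder names (title, customer), and the render_report_html helper are assumptions made for illustration; the resulting string is what you would hand to whichever headless browser you use.

```python
from html import escape
from string import Template
from typing import Mapping

# Hypothetical report template; the placeholder names are illustrative only.
REPORT_TEMPLATE = Template(
    "<html><body>"
    "<h1>$title</h1>"
    "<p>Prepared for $customer</p>"
    "</body></html>"
)


def render_report_html(values: Mapping[str, str]) -> str:
    """Return the template with every placeholder replaced by escaped text."""
    # Escape user data so characters such as & or < cannot break the markup.
    safe_values = {name: escape(text) for name, text in values.items()}
    # substitute() raises KeyError when a placeholder has no value,
    # which surfaces template/data mismatches early.
    return REPORT_TEMPLATE.substitute(safe_values)


report_html = render_report_html({"title": "Q1 Summary", "customer": "Acme & Co"})
print(report_html)
```

Using substitute() rather than safe_substitute() is a deliberate choice here: a missing value fails loudly instead of leaving a stray placeholder in the finished PDF.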

One advantage of this approach is that it supports almost all HTML and CSS properties.

The main challenge of using this approach is its memory-intensive nature. Each request opens a new browser session, resulting in significant memory consumption.

Additionally, PDFs generated this way tend to be large. If you anticipate a high volume of requests, this approach may not scale efficiently.

Option 2: SVG to PDF

Our use case required a solution capable of scaling and handling a larger volume of PDF generation requests. Additionally, the PDF designs provided by our designer were ambitious, featuring rich designs, dynamic charts, and infographics. Consequently, we opted to explore SVG conversion.

According to Wikipedia: Scalable Vector Graphics (SVG) is an XML-based vector image format for two-dimensional graphics that supports interactivity and animation. The SVG specification is an open standard developed by the World Wide Web Consortium (W3C) since 1999. SVG images and their behaviors are defined in XML text files.

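SVG is a markup language similar to HTML and is widely used for icons, charts, and other graphical components. It has been extensively tested for rendering complex drawings on the web. Since an SVG file contains all of its drawing information within itself, it works well with SVG-to-PDFKit, an extension for the PDFKit JavaScript library. Note that this PDFKit writes vector content into the PDF directly rather than rendering it through a browser engine such as WebKit.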
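To illustrate how a dynamic chart can be expressed as plain SVG markup, here is a rough Python sketch that turns a list of values into pie-slice path strings. The function names (pie_slice_path, pie_paths) and the default centre and radius are assumptions for this example, not part of any library.

```python
import math
from typing import List, Sequence


def pie_slice_path(cx: float, cy: float, r: float,
                   start_deg: float, end_deg: float) -> str:
    """Return an SVG path for one slice, measured clockwise from 12 o'clock."""

    def point(angle_deg: float):
        # Shift by 90 degrees so that 0 points straight up; SVG's y axis
        # grows downward, so increasing angles move clockwise.
        rad = math.radians(angle_deg - 90)
        return cx + r * math.cos(rad), cy + r * math.sin(rad)

    x1, y1 = point(start_deg)
    x2, y2 = point(end_deg)
    # The large-arc flag must be 1 when the slice spans more than half the circle.
    large_arc = 1 if (end_deg - start_deg) > 180 else 0
    return (f"M {cx:.2f} {cy:.2f} L {x1:.2f} {y1:.2f} "
            f"A {r:.2f} {r:.2f} 0 {large_arc} 1 {x2:.2f} {y2:.2f} Z")


def pie_paths(values: Sequence[float], cx: float = 100.0,
              cy: float = 100.0, r: float = 80.0) -> List[str]:
    """Return one path string per value, sized proportionally to the total."""
    total = sum(values)
    paths, start = [], 0.0
    for value in values:
        sweep = 360.0 * value / total
        paths.append(pie_slice_path(cx, cy, r, start, start + sweep))
        start += sweep
    return paths


for d in pie_paths([1, 1, 2]):
    print(f'<path d="{d}" />')
```

One limitation of this sketch: a single value covering the full 360 degrees produces identical start and end points, which SVG arcs cannot draw, so that case would need a circle element instead.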

One major advantage of this approach is that you won’t need to struggle with faithfully converting designs into PDF layouts. Designers can export PSD designs directly into SVG templates, saving significant development time.

If you choose this approach, the first step is to familiarize yourself with the basics of SVG. This will enable you to efficiently update it dynamically and render the required dynamic values. You can learn the basics of SVG from the following resources:

  1. W3schools
  2. TutorialsPoint
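As a rough illustration of updating an SVG template dynamically, the sketch below parses a template with Python's standard xml.etree.ElementTree module and replaces the text of elements selected by their id attribute. The template markup, the ids (report-title, total-users), and the fill_svg helper are hypothetical; a designer-exported SVG will have its own structure.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

SVG_NS = "http://www.w3.org/2000/svg"
# Register SVG as the default namespace so the output has no "ns0:" prefixes.
ET.register_namespace("", SVG_NS)

# Hypothetical template; the element ids are illustrative only.
SVG_TEMPLATE = """<svg xmlns="http://www.w3.org/2000/svg" width="595" height="842" viewBox="0 0 595 842">
  <text id="report-title" x="40" y="60" font-size="24">TITLE</text>
  <text id="total-users" x="40" y="100" font-size="14">0</text>
</svg>"""


def fill_svg(template: str, values: Mapping[str, str]) -> str:
    """Replace the text of every element whose id appears in values."""
    root = ET.fromstring(template)
    for element in root.iter():
        element_id = element.get("id")
        if element_id in values:
            # ElementTree escapes special characters when serializing.
            element.text = values[element_id]
    return ET.tostring(root, encoding="unicode")


filled = fill_svg(SVG_TEMPLATE, {"report-title": "Usage & Billing",
                                 "total-users": "1,204"})
print(filled)
```

Working on the parsed tree rather than on raw strings keeps the document well-formed even when the data contains characters such as & or <.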

The biggest difference between HTML and SVG is how the layout responds to content. HTML is laid out as a flow, so the page adjusts dynamically to whatever content is placed in it; SVG works with absolute coordinates and lacks that flexibility. Consider, for example, a PDF report containing a table whose number of rows varies with the user's data.

Using HTML, you can render such a table using a loop, dynamically adding rows. However, in SVG, you need to precisely position each new row with absolute coordinates in the PDF document.
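The positioning work in SVG can be sketched as follows, assuming a fixed row height and fixed column offsets. The table_rows_svg helper and its default coordinates are hypothetical values chosen for this example.

```python
from typing import List, Sequence
from xml.sax.saxutils import escape


def table_rows_svg(rows: Sequence[Sequence[object]],
                   x_positions: Sequence[float],
                   first_row_y: float = 120.0,
                   row_height: float = 24.0) -> List[str]:
    """Return one SVG <text> element per cell at computed absolute coordinates."""
    elements = []
    for row_index, row in enumerate(rows):
        # Every row sits a fixed distance below the previous one.
        y = first_row_y + row_index * row_height
        for x, cell in zip(x_positions, row):
            elements.append(
                f'<text x="{x:g}" y="{y:g}" font-size="12">'
                f"{escape(str(cell))}</text>"
            )
    return elements


for line in table_rows_svg([("Alice", 42), ("Bob", 17)], x_positions=[40, 300]):
    print(line)
```

The loop itself is as simple as in HTML; the difference is that nothing reflows, so anything drawn below the table has to be moved by the same computed offset.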

Thus, SVG is less flexible when it comes to rendering repeated dynamic content. However, this need not be a weakness if your use case has a reasonable upper limit on the number of records. In such cases, you can create several templates with different numbers of rows. Depending on the user data, you choose the corresponding template and fill in the dynamic data for those rows.
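A minimal sketch of that selection step, assuming three templates with capacities of 5, 10, and 20 rows, might look like this. The file names and capacities are invented for the example.

```python
import bisect

# Hypothetical templates, keyed by the maximum number of rows each can hold.
TEMPLATE_CAPACITIES = [5, 10, 20]  # must stay sorted in ascending order
TEMPLATE_FILES = {
    5: "report_5_rows.svg",
    10: "report_10_rows.svg",
    20: "report_20_rows.svg",
}


def choose_template(row_count: int) -> str:
    """Return the smallest template that can hold row_count rows."""
    if row_count < 0:
        raise ValueError("row_count must not be negative")
    # bisect_left finds the first capacity that is >= row_count.
    index = bisect.bisect_left(TEMPLATE_CAPACITIES, row_count)
    if index == len(TEMPLATE_CAPACITIES):
        raise ValueError(f"no template can hold {row_count} rows")
    return TEMPLATE_FILES[TEMPLATE_CAPACITIES[index]]


print(choose_template(7))
```

Raising an error above the largest capacity makes the upper limit explicit, so an oversized report is rejected rather than silently truncated.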

Conclusion

Based on our exploration, we recommend Option 2, converting SVG to PDF, for the following reasons:

  1. In our evaluation, SVG to PDF conversion was significantly more efficient than HTML to PDF conversion.
  2. SVG is a vector format, so templates can be rendered at various sizes without losing quality.
  3. SVG output can be visually rich, and templates can be exported directly from design tools such as Photoshop.

In Part 2 of this series, we will explain how to create your own SVG to PDF report, with examples.