How to Generate PDF Documents in Python
Introduction
This article is the result of my first experience with PDF generation tools. It is not really about Django, but about generating standardized documents from Python using templating engines. Perhaps my experience will save someone time and spare them a search through the developers' issue pages: if you read discussions that started seven years ago, you will see that many of the problems are still unsolved today.
Below I will detail how to generate a PDF document using two different utilities as an example.
Tasks
The main task was to create a good-looking document describing a technical specification, the details of which the client submits through a form on our website.
How it looks step by step:
- The user enters data into a web form
- The server inserts this data into the template
- The server generates a PDF document from the filled template
- The server returns the PDF to the user
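The steps above can be sketched without any framework. This is only an illustration under stated assumptions: the standard library's string.Template stands in for the Django template engine, the field names client and task are made up, and render_pdf is a hypothetical callable that stands in for whichever PDF tool is used.

```python
import html
from string import Template
from typing import Callable, Dict

# Hypothetical template; in the article this is a Django template file.
SPEC_TEMPLATE = Template(
    "<html><body><h1>Technical specification</h1>"
    "<p>Client: $client</p><p>Task: $task</p></body></html>"
)


def fill_template(form_data: Dict[str, str]) -> str:
    """Step 2: insert the submitted form data into the HTML template."""
    # Escape user input so it cannot inject markup into the document.
    safe = {key: html.escape(value) for key, value in form_data.items()}
    return SPEC_TEMPLATE.substitute(safe)


def build_document(form_data: Dict[str, str],
                   render_pdf: Callable[[str], bytes]) -> bytes:
    """Steps 2-4: fill the template, render it, return the PDF bytes."""
    return render_pdf(fill_template(form_data))
```

Passing the renderer in as an argument keeps the template step independent of the PDF tool, so either of the two utilities discussed below can be plugged in.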
Tools
If you search Google for “Generating pdf documents in Python”, you will find many tools for this, and every forum recommends a different one.
I will show examples using two of them.
- WKhtmltopdf is a console-based HTML to PDF rendering utility for which there are many Python-Django wrappers.
- WeasyPrint is a visual rendering engine for HTML and CSS that can export to PDF and PNG.
Advantages and disadvantages
WKhtmltopdf is a good fit when you need to generate a document quickly, can accept that pages of text are rendered as images (in my experience, the text inside the file cannot be selected), and want to use modern CSS such as Flexbox and box-shadow.
WeasyPrint suits the opposite case: the text in the resulting document can be selected and copied, but there are restrictions on the CSS you can use, and you have to respect them so that generating the document does not take several minutes.
Layout
As an example, consider a portion of an HTML document provided by a colleague of mine.
Note: to reference a static file from CSS, for example background-image: url("{% static 'images/check.svg' %}"); all styles must be placed inside a <style> tag in the template, because template tags are only processed in the template itself and not in external CSS files.
Final template
Example for WKhtmltopdf
Note: my attempts to install and use wkhtmltopdf on alpine:3.12 failed with the error: ERROR: The command 'wkhtmltopdf index.html out.pdf' returned a non-zero code: 139.
In my experience, wkhtmltopdf does not work on every Alpine version. These versions worked without problems: 3.9.4 / 3.10.4 / 3.11.
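A note on reading that error: by shell convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 139 corresponds to signal 11, a segmentation fault inside the binary rather than a problem with the HTML. A minimal sketch of decoding such codes follows; the helper name describe_exit_code is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the shell convention 128 + signal number."""
    if code == 0:
        return "success"
    if code > 128:
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exited with error code {code}"


print(describe_exit_code(139))  # killed by SIGSEGV
```

A crash of this kind usually points at the build of wkhtmltopdf for that image, which is consistent with the problem disappearing on other Alpine versions.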
Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
django = "~=2.2"
pdfkit = "~=0.6"

[requires]
python_version = "3.6"
Code to generate PDF
import pdfkit
from django.template.loader import render_to_string


def generate_pdf(request, data):
    # Fill the Django template with the form data to get an HTML string
    html_template = render_to_string('portable_document/technical_specification_template.html', data)
    # pdfkit passes the HTML to wkhtmltopdf, which writes the result to out.pdf
    pdfkit.from_string(html_template, 'out.pdf')
Result
As mentioned above, you cannot interact with the text in this PDF file. Every word behaves like an image that can only be stretched and rotated.
Outcome
- Pick an Alpine version on which wkhtmltopdf works.
- Use whichever wrapper you like.
- Use any CSS.
- Get a PDF whose pages are images.
Example for WeasyPrint
Documentation and source
Note: first of all, we install all the dependencies this library requires. Then, since we are using the lightweight Alpine image, we need to install the default fonts.
RUN apk --update --upgrade add gcc musl-dev jpeg-dev zlib-dev libffi-dev cairo-dev pango-dev gdk-pixbuf-dev msttcorefonts-installer fontconfig
RUN update-ms-fonts
Otherwise, we would get the following picture:
Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
django = "~=2.2"
weasyprint = "~=52.1"

[requires]
python_version = "3.6"
Code to generate PDF
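The original listing for this step is not available here, so what follows is only a minimal sketch of how the conversion might look with WeasyPrint, not the author's code. The function name html_to_pdf and its parameters are assumptions; HTML(string=...) and write_pdf() are WeasyPrint calls, and write_pdf() returns the PDF as bytes when no target is given.

```python
from typing import Optional


def html_to_pdf(html: str, base_url: Optional[str] = None) -> bytes:
    """Render an HTML string to PDF bytes with WeasyPrint."""
    # Imported lazily so this module can be loaded even on machines
    # where WeasyPrint or its system libraries are not installed.
    from weasyprint import HTML

    # base_url lets WeasyPrint resolve relative links to images and CSS.
    document = HTML(string=html, base_url=base_url)
    # With no target argument, write_pdf() returns the PDF as bytes.
    return document.write_pdf()
```

In a Django view, the html argument would be the string returned by render_to_string for the same template as in the WKhtmltopdf example.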
Result
This is what the generated PDF looks like, and here we can interact with the text. To the left of the document, there is a section break that was generated automatically.
Note
As I said before, you should avoid Flexbox when using WeasyPrint: it hurts performance, as I will demonstrate next.
Also, WeasyPrint does not currently support rendering of shadows.
Total
- Install standard fonts if using Alpine.
- Do not use Flexbox or shadows in the layout; look for alternatives:
a) ‘display: flex’ -> ‘display: inline-block’;
b) ‘flex-direction: column’ -> ‘display: block’;
- As a result, we get generated PDF documents in which the text can be selected and copied.
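To catch these declarations before a template reaches WeasyPrint, a small check can scan the CSS for them. This is a rough sketch, assuming that "Shadow" refers to the box-shadow property; the names DISCOURAGED and find_discouraged_css are hypothetical, and the regular expressions are a heuristic, not a CSS parser.

```python
import re
from typing import List, Tuple

# Declarations the article recommends avoiding with WeasyPrint.
DISCOURAGED = {
    "flexbox": re.compile(r"display\s*:\s*(inline-)?flex\b", re.IGNORECASE),
    "flex-direction": re.compile(r"\bflex-direction\s*:", re.IGNORECASE),
    "box-shadow": re.compile(r"\bbox-shadow\s*:", re.IGNORECASE),
}


def find_discouraged_css(css: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, label, line_text) for each discouraged declaration."""
    findings = []
    for number, line in enumerate(css.splitlines(), start=1):
        for label, pattern in DISCOURAGED.items():
            if pattern.search(line):
                findings.append((number, label, line.strip()))
    return findings


sample = ".a { display: flex; }\n.b { display: block; }\n.c { box-shadow: 0 0 2px #000; }"
for finding in find_discouraged_css(sample):
    print(finding)
```

For the sample above, the check reports line 1 (flexbox) and line 3 (box-shadow) and leaves the display: block rule alone.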
Performance comparison
I do not claim these numbers are a rigorous benchmark, since execution time was measured as follows:
- start_time = time.clock()
- generate_pdf()
- print(time.clock() - start_time, "seconds")
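One caveat about the measurement above: time.clock() works on the Python 3.6 pinned in the Pipfile, but it was deprecated in Python 3.3 and removed in Python 3.8. On Unix it also reported processor time of the Python process, so time spent waiting on an external program may not be counted. A minimal sketch of the same measurement with time.perf_counter(), which measures elapsed wall-clock time, could look like this; the helper name time_runs is hypothetical.

```python
import time
from typing import Callable, List


def time_runs(func: Callable[[], object], runs: int = 5) -> List[float]:
    """Call func `runs` times in a row and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; in the article this would be the generate_pdf call.
for seconds in time_runs(lambda: sum(range(100_000))):
    print(f"{seconds:.4f} seconds")
```

Keeping all five durations rather than a single number makes it easier to spot a slow first run caused by caching or font loading.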
Using the same template (presented in the “Layout” section), with assets loaded over the network rather than from static files, 5 test runs in a row showed the following results:
Conclusion
The conclusion is straightforward: the tool for generating PDFs should be chosen based on the specific task.
If generation speed and the ability to use modern CSS (for example, Flexbox and box-shadow) are the priority, WKhtmltopdf will do. But the text in the resulting document cannot be selected or copied.
If you want the text to remain selectable after the file is generated, use WeasyPrint. In that case, you will have to spend more time looking for alternatives to Flexbox and box-shadow.