Converting HTML to PDF with Python and Qt

In a recent project we needed to convert HTML to PDF. We first tried a PHP-based library called tcpdf, but it has many limitations.
In the end, the solution was to use Qt. Here is the code, written in PyQt:

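import sys
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication, QPrinter
from PyQt4.QtWebKit import QWebView

# A QApplication must exist before any widget is created.
app = QApplication(sys.argv)

web = QWebView()
web.load(QUrl("http://www.google.com"))
#web.show()

# Configure the printer to write an A4 PDF file instead of printing on paper.
printer = QPrinter()
printer.setPageSize(QPrinter.A4)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName("file.pdf")

def convertIt(ok):
    # Called when the page has finished loading; ok is False if loading failed.
    web.print_(printer)
    print("Pdf generated")
    QApplication.exit()

# Render the page to PDF only after it has finished loading.
web.loadFinished.connect(convertIt)

sys.exit(app.exec_())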

To convert an HTML string rather than a page at a URL, we can use setHtml() instead of load(), which takes the URL of the page you want to convert. However, on Windows the loadFinished() signal was not emitted when we used setHtml() (possibly a bug), whereas it works on Linux.
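
Since load() takes a URL, a local HTML file has to be passed as a file:// URL rather than a bare path. Here is a minimal sketch of that step, assuming Python 3 (for pathlib); the helper name local_file_url is made up for illustration and is not part of Qt or PyQt.

```python
from pathlib import Path

def local_file_url(path):
    # Hypothetical helper: turn a local path into an absolute file:// URL string.
    # resolve() makes the path absolute; as_uri() adds the scheme and escapes
    # characters such as spaces.
    return Path(path).resolve().as_uri()

if __name__ == "__main__":
    print(local_file_url("test.html"))
```

The resulting string should then be usable in the script above as web.load(QUrl(local_file_url("test.html"))).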

7 thoughts on “Converting HTML to PDF with Python and Qt”

  1. Hi,
    I have some questions about the code above.

    1. Is it possible to convert local HTML files, such as c:\test.html?
    2. I tried to use this inside another Python program, but my Python script was terminated. Is it possible to use this code as a function?

    Thanks
    loganathan

  2. Thanks for the code. Was it easy to do? I have tried a few programs to convert html to pdf but the one I have found to be the most user friendly is http://docraptor.com. I tried the free version and it was simple.

  3. For the code to work properly, you may have to put the “def” part before the:
    app = QApplication(sys.argv)
    Otherwise it gives the following error:

    Traceback (most recent call last):
    File "./using_pyqt.py", line 21, in convertIt
    web.print_(printer)
    RuntimeError: underlying C/C++ object has been deleted
    Error in sys.excepthook:
    Traceback (most recent call last):
    File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
    if not enabled():
    File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 21, in enabled
    import re
    ImportError: No module named re

    Original exception was:
    Traceback (most recent call last):
    File "./using_pyqt.py", line 21, in convertIt
    web.print_(printer)
    RuntimeError: underlying C/C++ object has been deleted

  4. It does not convert pages served from localhost. Whenever I try to convert a localhost page to PDF, the file is generated, but it contains the following message:
    Not Found
    The requested URL /accounts/login/ was not found on this server.
    Why?
    Also, if I need to convert multiple URLs into one file, will it work?

  5. It’s very cool, after a long time of searching for other methods. Thank you very much.
