As part of my ongoing efforts to stave off living in a cardboard box in the middle of a lake, I took on a project for a friend and former cow-orker last year, to produce PDF files from a database for a client. I did it in PHP, because that was the only tool available that looked like it might work, but I now regret it. Apparently, when it tries to generate the whole 400-odd-page document on the friend's server, it crashes Apache and the whole server needs to be rebooted (!). Even running it on my laptop causes interesting side-effects, often requiring a Level Two Diagnostic*. I suspect the third-party library really isn't up to the sort of contortions I put it through, sadly.
So I'm looking for an alternative. Mr Death, the aforementioned friend, has control over the server, so he can install whatever I need, but I really should try to stick with languages that other people know. That and the Linuxness of the system mean no Delphi, no Common Lisp and no hand-coded PostScript. My best bet is probably Python. So if any Pythonistas out there can recommend a good, fast, open-source PDF generation library for Python, please let me know. The features it needs, in increasing order of trickiness, are:
- Scaled images in the text
- Table-based layouts in some sections
- Header and footer definition
- Widow/orphan prevention, so that if a paragraph is going to be split across the end of its column, it "moves" to the top of the next column.
- Two columns per page in some sections
- Image watermarks behind the text
- Multiple font styles within the one fully-justified paragraph
- Page numbers in the form "Page X of Y", filled in after you actually find out how many pages there are.
- Out-of-order page generation, since I'll need to create a table of contents on the fly and then stick it at the start of the document.
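For the last two items, here is a minimal sketch of the idea in plain Python, not tied to any real PDF library. It assumes pages can be held in memory until the end, that `paginate` stands in for real layout, and that the table of contents takes one line per section (so its page count is known before the final page numbers are). All names here (`Page`, `paginate`, `build_document`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    body: list
    footer: str = ""  # left blank until the total page count is known


def paginate(lines, per_page):
    """Split a flat list of lines into fixed-size pages (stand-in for real layout)."""
    return [Page(body=list(lines[i:i + per_page]))
            for i in range(0, len(lines), per_page)]


def build_document(sections, per_page):
    """sections is a list of (title, lines) pairs. Returns finished pages, TOC first."""
    body_pages = []
    starts = []  # (title, zero-based index of the section's first body page)
    for title, lines in sections:
        starts.append((title, len(body_pages)))
        body_pages.extend(paginate(lines, per_page))

    # One TOC line per section, so the TOC length does not depend on the numbers in it.
    toc_page_count = -(-len(starts) // per_page)  # ceiling division
    toc_lines = [f"{title} ... {toc_page_count + index + 1}"
                 for title, index in starts]

    # The TOC is generated last but placed first.
    pages = paginate(toc_lines, per_page) + body_pages

    # Second pass: only now is "Y" known, so fill in every footer.
    total = len(pages)
    for number, page in enumerate(pages, start=1):
        page.footer = f"Page {number} of {total}"
    return pages
```

The point of the design is that nothing is written out until every page exists, which costs memory but avoids generating the document twice just to learn the page count.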
The library I used (and extended) had these features, but it got horribly slow, largely because the widow/orphan control forced it to generate nearly every paragraph twice. The underlying library didn't have caching of pre-calculated paragraphs, and the code is so badly written that I couldn't begin to add that sort of feature.
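The missing caching could look roughly like the sketch below. It is plain Python rather than the PHP library's real code: `wrap_paragraph` is a hypothetical stand-in for the expensive layout step (here just greedy word wrapping by character count), and `functools.lru_cache` remembers its result so that measuring a paragraph and then placing it only does the work once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def wrap_paragraph(text, width):
    """Hypothetical expensive layout step: greedy word wrap to `width` characters."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return tuple(lines)  # a tuple, so callers cannot alter the cached value


def paragraph_height(text, width):
    """Measure a paragraph in lines; shares the cache with the placement step."""
    return len(wrap_paragraph(text, width))


def fill_columns(paragraphs, width, column_height):
    """Place whole paragraphs into columns, never splitting one across columns."""
    columns = [[]]
    for text in paragraphs:
        remaining = column_height - len(columns[-1])
        # First look: would the paragraph be split? If so, move it to a new column.
        if paragraph_height(text, width) > remaining and columns[-1]:
            columns.append([])
        # Second look: the wrapped lines come from the cache, not a second layout.
        # A paragraph taller than a whole column simply overflows in this sketch.
        columns[-1].extend(wrap_paragraph(text, width))
    return columns
```

With two short paragraphs and a column four lines high, `fill_columns(["one two three four", "five six seven eight"], 9, 4)` puts each paragraph in its own column, and `wrap_paragraph.cache_info()` shows two misses and two hits: each paragraph was laid out once and looked at twice.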
Can anyone advise? I'm already aware of ReportLab, and I've asked on their mailing list if it can handle the last and greatest of those features, but actual experience from real live users would be nice too.
* For those of my audience unfamiliar with the ways of system administration:
Level One Diagnostic: is it plugged in?
Level Two Diagnostic: have you tried rebooting?
Level Three Diagnostic: we've reformatted your hard disk for you; would you like it back?
