jugad2 - Vasudev Ram on software innovation: reporting

Sunday, July 24, 2016

Control break report to PDF with xtopdf

By Vasudev Ram

Hi readers,

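Control break reports are very common in data processing, and have been since the earliest days of computing. A control break report processes records sorted on one or more key fields and starts a new group, often with a group header and subtotals, each time a key value changes; that change is the "break". It is a fundamental kind of report, and the need for it is ubiquitous across many kinds of organizations.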
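In essence, a control break report walks through records sorted on a key and starts a new group whenever the key differs from the previous record's key. Here is a minimal sketch of that logic written by hand, without itertools.groupby. It handles a single level of grouping and no subtotals, and `manual_control_break` is a hypothetical name used only for illustration, not part of xtopdf:

```python
# Hypothetical sketch of a hand-written, single-level control break.
# Records are lists whose first column is the grouping key.

def manual_control_break(rows):
    """Return report lines: a header line whenever the key changes,
    followed by each record's remaining columns, indented."""
    lines = []
    prev_key = None
    started = False  # flag: has any group been started yet?
    for row in rows:
        key = row[0]
        if not started or key != prev_key:
            # Control break: the key differs from the previous record's key.
            lines.append(str(key))
            prev_key = key
            started = True
        lines.append('    ' + ' '.join(str(col) for col in row[1:]))
    return lines

# The input must be sorted on the key, or one key can produce several groups.
data = sorted([
    ['North', 'Desktop #1', 1000],
    ['South', 'Desktop #3', 1100],
    ['North', 'Mouse #2', 50],
], key=lambda rec: rec[0])

for line in manual_control_break(data):
    print(line)
```

Even this simple version needs a flag and a remembered previous key; adding subtotals or a second level of grouping multiplies the bookkeeping.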

Here is an example program that generates a control break report and writes it to PDF, using xtopdf, my Python toolkit for PDF creation.

The program is named ControlBreakToPDF.py. It uses xtopdf to generate the PDF output, and the groupby function from the itertools module to handle the control break logic easily.
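What makes groupby a natural fit is that it starts a new group each time the key value changes between consecutive items, which is exactly the control break condition. It does not gather non-adjacent items that share a key, as this small sketch shows (the region list is made up for illustration):

```python
from itertools import groupby

# groupby only merges *adjacent* items with equal keys.
regions = ['North', 'South', 'North', 'East', 'East']

# On unsorted input, 'North' shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(regions)]

# After sorting, each key forms exactly one group.
sorted_keys = [key for key, _ in groupby(sorted(regions))]

print(unsorted_keys)  # ['North', 'South', 'North', 'East']
print(sorted_keys)    # ['East', 'North', 'South']
```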

I've written several control-break report programs before, some with the break logic implemented by hand, and getting everything just right can be fiddly, particularly when there is more than one level of nesting: you have to avoid off-by-one errors, check for various conditions, set flags, and so on.
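With groupby, a second level of nesting is just another groupby over each outer group's rows, provided the data is sorted on the full composite key. A minimal sketch, assuming a hypothetical "category" column added only to give the data a second grouping level (it is not in the program's data):

```python
from itertools import groupby

# Hypothetical two-level data: (region, category, item, value).
rows = [
    ('North', 'Computers', 'Desktop #1', 1000),
    ('North', 'Accessories', 'Mouse #2', 50),
    ('North', 'Computers', 'Laptop #7', 1200),
    ('South', 'Accessories', 'Keyboard #4', 200),
    ('South', 'Computers', 'Desktop #3', 1100),
]

# Sort on the full composite key so the groups at every level are contiguous.
rows.sort(key=lambda r: (r[0], r[1]))

report = []
for region, region_rows in groupby(rows, key=lambda r: r[0]):
    report.append(region)  # outer break: region changed
    for category, cat_rows in groupby(region_rows, key=lambda r: r[1]):
        report.append('  ' + category)  # inner break: category changed
        for _, _, item, value in cat_rows:
            report.append('    %s %d' % (item, value))

for line in report:
    print(line)
```

The inner groupby consumes each outer group's iterator before the outer loop advances, which is what groupby requires.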

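So it's nice to have Python's itertools.groupby handle this, at least for basic cases. Note that the data needs to be sorted on the grouping key for groupby to work as intended, because groupby starts a new group each time the key value changes between consecutive items; on unsorted data, the same key can show up in several separate groups. Here is the code for ControlBreakToPDF.py: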

from __future__ import print_function

# ControlBreakToPDF.py
# A program to show how to write simple control break reports
# and send the output to PDF, using itertools.groupby and xtopdf.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# http://jugad2.blogspot.com
# https://gumroad.com/vasudevram

from itertools import groupby
from PDFWriter import PDFWriter

# I hard-code the data here to make the example shorter.
# More commonly, it would be fetched at run-time from a 
# database query or CSV file or similar source.

data = \
[
    ['North', 'Desktop #1', 1000],
    ['South', 'Desktop #3', 1100],
    ['North', 'Laptop #7', 1200],
    ['South', 'Keyboard #4', 200],
    ['North', 'Mouse #2', 50],
    ['East', 'Tablet #5', 200],
    ['West', 'Hard disk #8', 500],
    ['West', 'CD-ROM #6', 150],
    ['South', 'DVD Drive', 150],
    ['East', 'Offline UPS', 250],
]

pw = PDFWriter('SalesReport.pdf')
pw.setFont('Courier', 12)
pw.setHeader('Sales by Region')
pw.setFooter('Using itertools.groupby and xtopdf')

# Convenience function to both print to screen and write to PDF.
def print_and_write(s, pw):
    print(s)
    pw.writeLine(s)

# Set column headers.
headers = ['Region', 'Item', 'Sale Value']
# Set column widths.
widths = [ 10, 15, 10 ]
# Build header string for report.
header_str = ''.join([hdr.center(widths[ind]) \
    for ind, hdr in enumerate(headers)])
print_and_write(header_str, pw)

# Function to base the sorting and grouping on.
def key_func(rec):
    return rec[0]

data.sort(key=key_func)

for region, group in groupby(data, key=key_func):
    print_and_write('', pw)
    # Write group header, i.e. region name.
    print_and_write(region.center(widths[0]), pw)
    # Write group's rows, i.e. sales data for the region.
    for row in group:
        # Build formatted row string.
        row_str = ''.join(str(col).rjust(widths[ind + 1]) \
            for ind, col in enumerate(row[1:]))
        print_and_write(' ' * widths[0] + row_str, pw)
pw.close()

Running it gives this output on the screen:

$ python ControlBreakToPDF.py
  Region        Item     Sale Value

   East
                Tablet #5       200
              Offline UPS       250

  North
               Desktop #1      1000
                Laptop #7      1200
                 Mouse #2        50

  South
               Desktop #3      1100
              Keyboard #4       200
                DVD Drive       150

   West
             Hard disk #8       500
                CD-ROM #6       150

$

And this is a screenshot of the PDF output, viewed in Foxit PDF Reader:

So itertools.groupby provides much the same functionality as SQL's GROUP BY clause (used within a complete SELECT statement). One difference is that groupby only groups the rows; any aggregation, such as a per-group sum, is code you write over each group. The bigger difference is where the work happens: with Python's groupby, you do the grouping and related processing in your own program, on data held in memory, while with SQL via a client-server RDBMS, the grouping and aggregation happen on the database server, and only the aggregated results are sent to your program for further processing. Each approach has pros and cons, depending on the needs of the application.
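To make the comparison concrete, here is a sketch that computes per-region sales totals both ways over the same data as the program above. It assumes an in-memory SQLite database (Python's built-in sqlite3 module) as a stand-in for a client-server RDBMS, and the per-region totals are an addition for illustration; the report program itself does not compute them:

```python
import sqlite3
from itertools import groupby

data = [
    ['North', 'Desktop #1', 1000],
    ['South', 'Desktop #3', 1100],
    ['North', 'Laptop #7', 1200],
    ['South', 'Keyboard #4', 200],
    ['North', 'Mouse #2', 50],
    ['East', 'Tablet #5', 200],
    ['West', 'Hard disk #8', 500],
    ['West', 'CD-ROM #6', 150],
    ['South', 'DVD Drive', 150],
    ['East', 'Offline UPS', 250],
]

def key_func(rec):
    return rec[0]

# In-program grouping: sort, group, then aggregate each group ourselves.
py_totals = [(region, sum(rec[2] for rec in group))
             for region, group in groupby(sorted(data, key=key_func),
                                          key=key_func)]

# The same aggregation expressed with SQL GROUP BY, done by the database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (region TEXT, item TEXT, value INTEGER)')
conn.executemany('INSERT INTO sales VALUES (?, ?, ?)', data)
sql_totals = conn.execute(
    'SELECT region, SUM(value) FROM sales GROUP BY region ORDER BY region'
).fetchall()
conn.close()

print(py_totals)
print(sql_totals)
```

Both lists should come out the same; the difference is only in which process does the grouping and summing.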

In my next post about Python, I'll use this program as one vehicle to demonstrate some uses of randomness in testing, continuing the series titled "The many uses of randomness", the first two parts of which are already on this blog.

- Enjoy.

- Vasudev Ram - Online Python training and consulting


Monday, September 9, 2013

Publish MongoDB data to PDF with xtopdf

By Vasudev Ram

This program, MongoDBToPDF.py, shows how to publish data from MongoDB, a document-oriented NoSQL database, to PDF, using xtopdf, my toolkit for creating PDF from other data formats.

To run this program, you need to have Python, Reportlab, xtopdf, MongoDB and the PyMongo driver (which provides the pymongo module) installed on your system.

Here is the MongoDBToPDF program:

# MongoDBToPDF.py
# Program to publish MongoDB data to PDF using xtopdf.
# Author: Vasudev Ram - http://www.dancingbison.com
# Copyright 2013 Vasudev Ram

from __future__ import print_function

from PDFWriter import PDFWriter

import pymongo

# Create a PDFWriter object and set some of its fields.
pw = PDFWriter("MongoDB_data.pdf")
pw.setFont("Courier", 12)
pw.setHeader("MongoDB data to PDF")
pw.setFooter("Generated by xtopdf")

# Connect to MongoDB database.
client = pymongo.MongoClient("localhost", 27017)
db = client.test

# Get a handle to the persons collection; MongoDB creates the
# collection on the first insert if it does not already exist.
persons = db.persons

# Add some data to it. insert_one replaces the older save() method,
# which was deprecated in PyMongo 3 and removed in PyMongo 4.
# Note: each run of the program inserts these documents again.
persons.insert_one({"Name": "Tom", "Age": 10})
persons.insert_one({"Name": "Dick", "Age": 20})
persons.insert_one({"Name": "Harry", "Age": 30})

# Loop over the collection and print the items to the screen.
for item in persons.find():
    print(item["Name"], item["Age"])

# Create an index on the Name field.
db.persons.create_index("Name")

# Loop over the sorted items and print them to PDF.
for item in db.persons.find().sort("Name", pymongo.ASCENDING):
    pw.writeLine(item["Name"] + " | " + str(item["Age"]))

pw.close()

# EOF

Save the above program as MongoDBToPDF.py.

Start the MongoDB server (daemon / service) if it is not already running, with:

mongod

You can now run the program with:

python MongoDBToPDF.py

The output will be in the file MongoDB_data.pdf, which you can view in any suitable PDF viewer, such as Adobe Reader or Foxit PDF Reader.

The above program does the following, broadly speaking:

- creates a PDFWriter instance
- connects to a MongoDB database
- creates a collection and populates it with some person data for the demo
- creates an index on the Name field
- sorts the data on the Name field and prints the data to PDF
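One way to check the formatting step without a running MongoDB server is to factor it into a plain function over a list of dicts, which is the shape of the documents that find() yields. A minimal sketch; `person_lines` is a hypothetical helper, and its in-Python sort stands in for the server-side .sort("Name", pymongo.ASCENDING) for simple ASCII names like these:

```python
# Hypothetical helper: turn person documents (dicts) into the
# "Name | Age" lines that MongoDBToPDF.py writes to the PDF.
def person_lines(docs):
    # Sort by Name, mirroring the server-side ascending sort.
    return [doc["Name"] + " | " + str(doc["Age"])
            for doc in sorted(docs, key=lambda d: d["Name"])]

# Sample documents matching the ones the program inserts.
sample_docs = [
    {"Name": "Tom", "Age": 10},
    {"Name": "Dick", "Age": 20},
    {"Name": "Harry", "Age": 30},
]

for line in person_lines(sample_docs):
    print(line)
```

In the real program, each returned line could be passed to pw.writeLine(); for large collections, though, keeping the sort on the server (where the Name index can help) is usually the better choice.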

According to the Wikipedia article on MongoDB, Craigslist, SAP, Forbes, The New York Times, SourceForge, The Guardian, CERN, Foursquare and eBay are some of the organizations that use MongoDB.


- Vasudev Ram - Dancing Bison Enterprises

