jugad2 - Vasudev Ram on software innovation: xtopdf

Managed WordPress Hosting by FlyWheel

Share |

Tuesday, December 6, 2016

[xtopdf] Wildcard text files to PDF with xtopdf and glob

By Vasudev Ram

First joker card image attribution

This is another in my series of applications that use xtopdf (source), my Python toolkit for PDF creation from other formats.

[ Here's a good overview of xtopdf, its uses, supported platforms and formats, etc. ]

I called this app WildcardTextToPDFpy.. It lets you specify a wildcard for text files, and then converts each of the text files matching the wildcard (like grades*2016.txt or monthly*sales.txt) [1], into corresponding PDF files - with the same names but with '.pdf' appended.

[1] For example, the wildcard grades*2016.txt could match grades-math-2016.txt and grades-bio-2016.txt (think a school or college), while monthly*sales.txt might match monthly-car-sales.txt and monthly-bike-sales.txt (think a vehicle dealership), so a PDF file will be generated for each text file matching the given wildcard.

The program uses the iglob function from the glob module in Python's standard library, similar to how this recent other post:

Simple directory lister with multiple wildcard arguments

used the glob function. The difference is that glob returns a list, while iglob returns a generator, so it will return the matching filenames lazily, on demand, as shown here:

$ python
>>> g1 = glob.glob('text*.txt')
>>> g1
['text1.txt', 'text2.txt', 'text3.txt']
>>>>>> g2 = glob.iglob('text*.txt')
>>> g2
<generator object iglob at 0x027C2850>
>>> next(g2)
'text1.txt'
>>> for f in g2:
...     print f
...
text2.txt
text3.txt

Here is the code for WildcardTextToPDF.py:

from __future__ import print_function

# WildcardTextToPDF.py
# Convert the text files specified by a filename wildcard,
# like '*.txt' or 'foo*bar*baz.txt', to PDF files.
# Each text file's content goes to a separate PDF file, with the 
# PDF file name being the full text file name (including the 
# '.txt' part), with '.pdf' appended.
# Requires:
# - xtopdf: https://bitbucket.org/vasudevram/xtopdf
# - ReportLab: https://www.reportlab.com/ftp/reportlab-1.21.1.tar.gz
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Product store: https://gumroad.com/vasudevram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com

import sys
import os
import glob
from PDFWriter import PDFWriter

def usage(argv):
    sys.stderr.write("Usage: python {} txt_filename_pattern\n".format(argv[0]))
    sys.stderr.write("E.g. python {} foo*.txt\n".format(argv[0]))

def text_to_pdf(txt_filename):
    pw = PDFWriter(txt_filename + '.pdf')
    pw.setFont('Courier', 12)
    pw.setHeader('{} converted to PDF'.format(txt_filename))
    pw.setFooter('PDF conversion by xtopdf: https://google.com/search?q=xtopdf')

    with open(txt_filename) as txt_fil:
        for line in txt_fil:
            pw.writeLine(line.strip('\n'))
        pw.savePage()

def main():
    if len(sys.argv) != 2:
        usage(sys.argv)
        sys.exit(0)

    try:
        for filename in glob.glob(sys.argv[1]):
            print("Converting {} to {}".format(filename, filename + '.pdf'))
            text_to_pdf(filename)
    except Exception as e:
        print("Caught Exception: type: {}, message: {}".format(\
            e.__class__, str(e)))

if __name__ == '__main__':
    main()

And here are the relevant files before and after running the program, with the program's output in between:

$ dir text*.txt/b
text1.txt
text2.txt
text3.txt

$ python WildcardTextToPDF2.py text*.txt
Converting text1.txt to text1.txt.pdf
Converting text2.txt to text2.txt.pdf
Converting text3.txt to text3.txt.pdf

$ dir text?.txt*/od/b
text1.txt
text2.txt
text3.txt
text1.txt.pdf
text2.txt.pdf
text3.txt.pdf

Finally, here's a cropped screenshot of the third output file, text3.txt.pdf, in Foxit PDF Reader (a lightweight PDF reader that I use):

Also look up:

[xtopdf] Batch convert text files to PDF (with xtopdf and fileinput)

for another variation on the approach to converting text files to PDF.

Speaking of generators, also check out this other post about them:

Python generators are pluggable

The image at the top of the post is of the earliest Joker card by Samuel Hart c. 1863, according to Wikipedia:

Joker (playing card)

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Share |

Saturday, November 5, 2016

[xtopdf] Batch convert text files to PDF (with xtopdf and fileinput)

By Vasudev Ram

file1.txt + file2.txt + file3.txt => file123.pdf

I created this new xtopdf app recently. (For those unfamiliar with it, xtopdf (source here) is my open source Python project for PDF generation from other formats and sources. Here is a good high-level overview of xtopdf, describing what it is and can do, its supported input formats, platforms (Windows, Linux, Mac OS X, Unix) and environments (CLI, GUI, Web), etc. The core of the xtopdf project is a library, and what I call xtopdf apps, are applications built using that library.)

This particular app lets you batch-convert multiple text files at a time, to a PDF file. The content of each text file starts on a new page in the PDF file. The program uses xtopdf (which uses ReportLab) and the fileinput module from Python's standard library. The program could be written without using the fileinput module too, and I've done a variant of it that way earlier, but I used fileinput this time for convenience, and to show a use of it.

(BTW, fileinput is a pretty useful module in its own right, for this sort of work - applying the same process (any process, not just PDF generation) to a bunch of input files. fileinput can also read from standard input if no input filenames are specified, but I don't use that feature here. Also, I used 4 functions from the fileinput module, on 4 consecutive lines, in this short program :) - not just for the sake of it, though; it made sense to do so.)

Here is the code, in file BatchTextToPDF.py:

from __future__ import print_function

# BatchTextToPDF.py
# Convert a batch of text files to a single PDF.
# Each text file's content starts on a new page in the PDF file.
# Requires:
# - xtopdf: https://bitbucket.org/vasudevram/xtopdf
# - ReportLab: https://www.reportlab.com/ftp/reportlab-1.21.1.tar.gz
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Product store: https://gumroad.com/vasudevram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com

import sys
import fileinput
from PDFWriter import PDFWriter

def usage(prog_name):
    sys.stderr.write("Usage: {} outfile.pdf infile1.txt ...".format(prog_name))

def main():

    if len(sys.argv) < 3:
        usage(sys.argv[0])
        sys.exit(0)

    try:
        pw = PDFWriter(sys.argv[1])
        pw.setFont('Courier', 12)
        pw.setFooter('xtopdf: https://google.com/search?q=xtopdf')

        for line in fileinput.input(sys.argv[2:]):
            if fileinput.filelineno() == 1:
                pw.setHeader(fileinput.filename())
                if fileinput.lineno() != 1:
                    pw.savePage()
            pw.writeLine(line.strip('\n'))

        pw.savePage()
        pw.close()
    except Exception as e:
        print("Caught Exception: type: {}, message: {}".format(\
            e.__class__, str(e)))

if __name__ == '__main__':
    main()

Here is a sample run of the program. I created 3 text files, text1.txt through text3.txt, with the respective number of lines in them. Then ran the command:

python BTTP123.pdf text1.txt text2.txt text3.txt

This created the PDF file BTTP123.pdf. Cropped screenshots of the 1st and 3rd (last) page of the PDF are below:

1st page:

3rd page:

In this example I've closed the PDFWriter instance manually, using pw.close(), but PDFWriter can also be used with the Python with statement, since I had added context manager support to PDFWriter earlier. I use the with statement in some of my xtopdf app examples, and not in others, to show that both possibilities exist.

Here is a Guide to installing and using xtopdf, including creating simple PDF e-books with it.

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Share |

Wednesday, October 26, 2016

Read from CSV with D, write to PDF with Python

By Vasudev Ram

CSV => PDF

Here is another in my series of applications of xtopdf, my PDF creation toolkit for Python (xtopdf source here).

This xtopdf application is actually a pipeline (nothing Unix-specific though, will work on both *nix and Windows) - a D program reading CSV data and sending it to a Python program, which writes the data to PDF.

The D program, read_csv.d, reads CSV data from a .csv file, and writes it to standard output.

The Python program, StdinToPDF.py (which is part of the xtopdf toolkit), reads its standard input (which is redirected by the pipeline to come from the D program's standard output) and writes the data it reads, to PDF.

Here is the D program, read_csv.d:

/**************************************************
File: read_csv.d
Purpose: A program to read CSV data from a file and 
write it to standard output.
Author: Vasudev Ram
Date created: 2016-10-25
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
**************************************************/

import std.algorithm;
import std.array;
import std.csv;
import std.stdio;
import std.file;
import std.typecons;

int main()
{
    try {
        stderr.writeln("Reading CSV data from file.");
        auto file = File("input.csv", "r");
        foreach (record;
            file.byLine.joiner("\n").csvReader!(Tuple!(string, string, int)))
        {
            writefln("%s works as a %s and earns $%d per year",
                     record[0], record[1], record[2]);
        }
    } catch (CSVException csve) {
        stderr.writeln("Caught CSVException: msg = ", csve.msg, 
        " at row, col = ", csve.row, ", ", csve.col);
    } catch (FileException fe) {
        stderr.writeln("Caught FileException: msg = ", fe.msg);
    } catch (Exception e) {
        stderr.writeln("Caught Exception: msg = ", e.msg);
    }
    return 0;
}

The D program is compiled as usual with:

dmd read_csv.d

I ran it first (only the D program) with an invalid CSV file (it has an extra comma at the start on line 3, which invalidates the data by making "Driver" be in the salary column position), and got the expected error message, which includes the row and column number of the place in the CSV file where the program encountered the error - this is useful for fixing the input data:

$ type input.csv
Jack,Carpenter,40000
Tom,Blacksmith,50000
,Jill,Driver,60000
$ read_csv
Reading CSV data from file.
Jack works as a Carpenter and earns $40000 per year
Tom works as a Blacksmith and earns $50000 per year
Caught CSVException: msg = Unexpected 'D' when converting from type string to type int 
at row, col = 3, 3

Then I ran it again, in the regular way, this time with a valid CSV file, and as part of a pipeline, the other pipeline component being StdinToPDF:

$ read_csv | python StdinToPDF.py csv_output.pdf
Reading CSV data from file.

And here is a cropped view of the output as seen in Foxit PDF Reader:

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Share |

Monday, October 10, 2016

PDF cheat sheet: bin/oct/dec/hex conversion (0-255)

By Vasudev Ram

Numeral system image attribution

Hi readers,

Here is another in my series of PDF-generation applications built using xtopdf, my Python toolkit for PDF creation from various data formats.

This program generates a PDF cheat sheet for conversion of the numbers 0 to 255 between binary, octal, decimal and hexadecimal numeral systems. It can be useful for programmers, electrical / electronics engineers, scientists or anyone else who has to deal with numbers in those bases.

Here is the program, in file number_systems.py:

from __future__ import print_function
from PDFWriter import PDFWriter
import sys

'''
A program to generate a table of numbers from 
0 to 255, in 4 numbering systems:
    - binary
    - octal
    - decimal
    - hexadecimal
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store on Gumroad: https://gumroad.com/vasudevram
'''

def print_and_write(s, pw):
    print(s)
    pw.writeLine(s)

sa, lsa = sys.argv, len(sys.argv)
if lsa == 1:
    sys.stderr.write("Usage: {} out_filename.pdf\n".format(sa[0]))
    sys.exit(1)

with PDFWriter(sa[1]) as pw:

    pw.setFont('Courier', 12)
    pw.setHeader('*** Number table: 0 to 255 in bases 2, 8, 10, 16 ***')
    pw.setFooter('*** By xtopdf: https://google.com/search?q=xtopdf ***')
    b = "Bin"; o = "Oct"; d = "Dec"; h = "Hex"
    header = "{b:>10}{o:>10}{d:>10}{h:>10}".format(b=b, o=o, d=d, h=h)

    for i in range(256):
        if i % 16 == 0:
            print_and_write(header, pw)
        print_and_write("{b:>10}{o:>10}{d:>10}{h:>10}".format( \
            b=bin(i), o=oct(i), d=str(i), h=hex(i)), pw)

    print_and_write(header, pw)

And here is a screenshot of first page of the PDF output, after running the program
with the command: python number_systems.py BODH-255.pdf

Ans here is a screenshot of the last page:

You can get the cheat sheet from my Gumroad store here: gum.co/BODH-255.

(You can also get email updates about my future products.)

I named the output BODH-255.pdf (from Binary Decimal Octal Hexadecimal 255), instead of BODH-0-255.pdf, because, of course, programmers count from zero, so the 0 can be implicit :)

(The formatting of the output is a little unconventional, due to the use of the header line occurring after every 16 lines in the table, but it is on purpose, because I am experimenting with different formats, to see the effect on readability / usability).

Notice 1) the smooth and systematic progression of the numbers in the vertical columns (how the values change like a car's odometer), and also 2) the relationships between numbers in different columns in the same row, when compared across any rows where one number is a multiple of another, e.g. look at the rows for decimal 32, 64, 128). Both these effects that we see are due to the inherent properties of the numbers with those values and their representation in those number bases. Effect 1) - like a car's odometer - is most noticeable in the Bin(ary) column - because of the way all the 1 bits flip to 0 when the number bumps up to a power of 2 - e.g. when binary 111 (decimal 7) changes to binary 1000 (decimal 8 == 2 ** 3), but the effect can be seen in the other columns as well.

The image at the top of the post is from this Wikipedia page:

Numeral system

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Share |

Thursday, September 29, 2016

Publish Peewee ORM data to PDF with xtopdf

By Vasudev Ram

Peewee => PDF

Peewee is a small, expressive ORM for Python, created by Charles Leifer.

After trying out Peewee a bit, I thought of writing another application of xtopdf (my Python toolkit for PDF creation), to publish Peewee data to PDF. I used an SQLite database underlying the Peewee ORM, but it also supports MySQL and PostgreSQL, per the docs. Here is the program, in file PeeweeToPDF.py:

# PeeweeToPDF.py
# Purpose: To show basics of publishing Peewee ORM data to PDF.
# Requires: Peewee ORM and xtopdf.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

from peewee import *
from PDFWriter import PDFWriter

def print_and_write(pw, s):
    print s
    pw.writeLine(s)

# Define the database.
db = SqliteDatabase('contacts.db')

# Define the model for contacts.
class Contact(Model):
    name = CharField()
    age = IntegerField()
    skills = CharField()
    title = CharField()

    class Meta:
        database = db

# Connect to the database.
db.connect() 

# Drop the Contact table if it exists.
db.drop_tables([Contact])

# Create the Contact table.
db.create_tables([Contact])

# Define some contact rows.
contacts = (
    ('Albert Einstein', 22, 'Science', 'Physicist'),
    ('Benjamin Franklin', 32, 'Many', 'Polymath'),
    ('Samuel Johnson', 42, 'Writing', 'Writer')
)

# Save the contact rows to the contacts table.
for contact in contacts:
    c = Contact(name=contact[0], age=contact[1], \
    skills=contact[2], title=contact[3])
    c.save()

sep = '-' * (20 + 5 + 10 + 15)

# Publish the contact rows to PDF.
with PDFWriter('contacts.pdf') as pw:
    pw.setFont('Courier', 12)
    pw.setHeader('Demo of publishing Peewee ORM data to PDF')
    pw.setFooter('Generated by xtopdf: slides.com/vasudevram/xtopdf')
    print_and_write(pw, sep)
    print_and_write(pw, 
        "Name".ljust(20) + "Age".center(5) + 
        "Skills".ljust(10) + "Title".ljust(15))
    print_and_write(pw, sep)

    # Loop over all rows queried from the contacts table.
    for contact in Contact.select():
        print_and_write(pw, 
            contact.name.ljust(20) + 
            str(contact.age).center(5) + 
            contact.skills.ljust(10) + 
            contact.title.ljust(15))
    print_and_write(pw, sep)

# Close the database connection.
db.close()

I could have used Python's namedtuple feature instead of tuples, but did not do it for this small program.

I ran the program with:

python PeeweeToPDF.py

Here is a screenshot of the output as seen in Foxit PDF Reader (click image to enlarge):

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Share |

Wednesday, September 14, 2016

Func-y D + Python pipeline to generate PDF

By Vasudev Ram

Hi, readers,

Here is a pipeline that generates some output from a D (language) program and passes it to a Python program that converts that output to PDF. The D program makes use of a bit of (simple) template programming / generics and a bit of functional programming (using the std.functional module from Phobos, D's standard library).

D => Python

I'm showing this as yet another example of the uses of xtopdf, my Python toolkit for PDF creation, as well as for the D part, which is fun (pun intended :), and because those D features are powerful.

The D program is derived, with some modifications, from a program in this post by Gary Willoughby,

More hidden treasure in the D standard library .

That program demonstrates, among other things, the pipe feature of D from the std.functional module.

First, the D program, student_grades.d:

/*
student_grades.d
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Adapts code from:
http://nomad.so/2015/08/more-hidden-treasure-in-the-d-standard-library/
*/

import std.stdio;
import std.algorithm;
import std.array;
import std.conv;
import std.functional;

// Set up the functional pipeline by composing some functions.
alias sumString = pipe!(split, map!(to!(int)), sum);

void main(string[] args)
{
    // Data to be transformed:
    // Each string has the student name followed by 
    // their grade in 5 subjects.
    auto student_grades_list = 
    [
        "A 1 2 3 4 5",
        "B 2 3 4 5 6",
        "C 3 4 5 6 7",
        "D 4 5 6 7 8",
        "E 5 6 7 8 9",
    ];

    // Transform the data for each student.
    foreach(student_grades; student_grades_list) {
        auto student = student_grades[0]; // name
        auto total = sumString(student_grades[2..$]); // grade total
        writeln("Grade total for student ", student, ": ", total);
    }
}

The initial data (which the D program transforms) is hard-coded, but that can easily be changed to read it from a text file, for instance. The program uses pipe() to compose [1] the functions split, map (to int), and sum (some of which are generic / template functions).

So, when it is run, each string (student_grades) of the input array student_grades_list is split (into an array of smaller strings, by space as the delimiter); then each string in the array (except for the first, which is the name), is mapped (converted) to integer; then all the integers are summed up to get the student's grade total; finally these names and totals are written to standard output. That becomes the input to the next stage of the pipeline, the Python program, which does the conversion to PDF.

Build the D program with:

dmd -o- student_grades.d

which gives us the executable, student_grades(.exe).

The Python part of the pipeline is StdinToPDF.py, one of the apps in the xtopdf toolkit, which is designed to be used in pipelines such as the above - basically, any Unix, Windows or Mac OS X pipeline, that generates text as its final output, can be further terminated by StdinToPDF, resulting in conversion of that text to PDF. It is a small Python app written using the core class of the xtopdf library, PDFWriter. Here is the original post about StdinToPDF:

[xtopdf] PDFWriter can create PDF from standard input

Here is the pipeline:

$ student_grades | python StdinToPDF.py sg.pdf

Below is a cropped screenshot of the generated output, sg.pdf, as seen in Foxit PDF Reader.

[1] Speaking of functional composition, you may like to check out this earlier post by me:

fmap(), "inverse" of Python map() function

It's about a different way of composing functions (for certain cases). It also has an interesting comment exchange between me and a reader, who showed both how to do it in Scala, and another way to do it in Python. Also, there is another way to compose functions in D, using a function called compose; it works like pipe, but the composed functions are written in the reverse order. It's in the same D module as pipe, std.functional.

Finally (before I pipe down - for today :), check out this HN comment by me (on another topic), in which I mention and link to multiple other ways of doing pipe-like stuff in Python:

Comment on Streem – a new programming language from Matz

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

My Python posts Subscribe to my blog by email

Share |

Sunday, July 24, 2016

Control break report to PDF with xtopdf

By Vasudev Ram

Hi readers,

Control break reports are very common in data processing, from the earliest days of computing until today. This is because they are a fundamental kind of report, the need for which is ubiquitous across many kinds of organizations.

Here is an example program that generates a control break report and writes it to PDF, using xtopdf, my Python toolkit for PDF creation.

The program is named ControlBreakToPDF.py. It uses xtopdf to generate the PDF output, and the groupby function from the itertools module to handle the control break logic easily.

I've written multiple control-break report generation programs before, including implementing the logic manually, and it can get a little fiddly to get everything just right, particularly when there is more than one level of nesting (i.e. no off-by-one errors, etc.); you have to check for various conditions, set flags, etc.

So it's nice to have Python's itertools.groupby functionality handle it, at least for basic cases. Note that the data needs to be sorted on the grouping key, in order for groupby to work. Here is the code for ControlBreakToPDF.py:

from __future__ import print_function

# ControlBreakToPDF.py
# A program to show how to write simple control break reports
# and send the output to PDF, using itertools.groupby and xtopdf.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# http://jugad2.blogspot.com
# https://gumroad.com/vasudevram

from itertools import groupby
from PDFWriter import PDFWriter

# I hard-code the data here to make the example shorter.
# More commonly, it would be fetched at run-time from a 
# database query or CSV file or similar source.

data = \
[
    ['North', 'Desktop #1', 1000],
    ['South', 'Desktop #3', 1100],
    ['North', 'Laptop #7', 1200],
    ['South', 'Keyboard #4', 200],
    ['North', 'Mouse #2', 50],
    ['East', 'Tablet #5', 200],
    ['West', 'Hard disk #8', 500],
    ['West', 'CD-ROM #6', 150],
    ['South', 'DVD Drive', 150],
    ['East', 'Offline UPS', 250],
]

pw = PDFWriter('SalesReport.pdf')
pw.setFont('Courier', 12)
pw.setHeader('Sales by Region')
pw.setFooter('Using itertools.groupby and xtopdf')

# Convenience function to both print to screen and write to PDF.
def print_and_write(s, pw):
    print(s)
    pw.writeLine(s)

# Set column headers.
headers = ['Region', 'Item', 'Sale Value']
# Set column widths.
widths = [ 10, 15, 10 ]
# Build header string for report.
header_str = ''.join([hdr.center(widths[ind]) \
    for ind, hdr in enumerate(headers)])
print_and_write(header_str, pw)

# Function to base the sorting and grouping on.
def key_func(rec):
    return rec[0]

data.sort(key=key_func)

for region, group in groupby(data, key=key_func):
    print_and_write('', pw)
    # Write group header, i.e. region name.
    print_and_write(region.center(widths[0]), pw)
    # Write group's rows, i.e. sales data for the region.
    for row in group:
        # Build formatted row string.
        row_str = ''.join(str(col).rjust(widths[ind + 1]) \
            for ind, col in enumerate(row[1:]))
        print_and_write(' ' * widths[0] + row_str, pw)
pw.close()

Running it gives this output on the screen:

$ python ControlBreakToPDF.py
  Region        Item     Sale Value

   East
                Tablet #5       200
              Offline UPS       250

  North
               Desktop #1      1000
                Laptop #7      1200
                 Mouse #2        50

  South
               Desktop #3      1100
              Keyboard #4       200
                DVD Drive       150

   West
             Hard disk #8       500
                CD-ROM #6       150

$

And this is a screenshot of the PDF output, viewed in Foxit PDF Reader:

So the itertools.groupby function basically provides roughly the same sort of functionality that SQL's GROUP BY clause provides (of course, when included in a complete SELECT statement). The difference is that with Python's groupby, you do the grouping and related processing in your program code, on data which is in memory, while if using SQL via a client-server RDBMS from your program, the grouping and processing will happen on the database server and only the aggregate results will be sent to your program to process further. Both methods can have pros and cons, depending on the needs of the application.

In my next post about Python, I'll use this program as one vehicle to demonstrate some uses of randomness in testing, continuing the series titled "The many uses of randomness", the earlier two parts of which are here and here.

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Follow me on Gumroad to be notified about my new products:

- Vasudev Ram - Online Python training and programming

Share |

Thursday, January 7, 2016

Code for recent post about PDF from a Python pipeline

By Vasudev Ram

In this recent post:

Generate PDF from a Python-controlled Unix pipeline ,

I forgot to include the code for the program PopenToPDF.py. Here it is now:

# PopenToPDF.py
# Demo program to read text from a shell pipeline using 
# subprocess.Popen, and write the text to PDF using xtopdf.
# Author: Vasudev Ram
# Copyright (C) 2016 Vasudev Ram - http://jugad2.blogspot.com

import sys
import subprocess
from PDFWriter import PDFWriter

def error_exit(message):
    sys.stderr.write(message + '\n')
    sys.stderr.write("Terminating.\n")
    sys.exit(1)

def main():
    try:
        # Create and set up a PDFWriter instance.
        pw = PDFWriter("PopenTo.pdf")
        pw.setFont("Courier", 12)
        pw.setHeader("Use subprocess.Popen to read pipe and write to PDF.")
        pw.setFooter("Done using selpg, xtopdf, Python and ReportLab, on Linux.")

        # Set up a pipeline with nl and selpg such that we can read from its stdout.
        # nl numbers the lines of the input.
        # selpg extracts pages 3 to 5 from the input.
        pipe = subprocess.Popen("nl -ba 1000-lines.txt | selpg -s3 -e5", \
            shell=True, bufsize=-1, stdout=subprocess.PIPE, 
            stderr=sys.stderr).stdout

        # Read from the pipeline and write the data to PDF, using the PDFWriter instance.
        for idx, line in enumerate(pipe):
            pw.writeLine(str(idx).zfill(8) + ": " + line)
    except IOError as ioe:
        error_exit("Caught IOError: {}".format(str(ioe)))
    except Exception as e:
        error_exit("Caught Exception: {}".format(str(e)))
    finally:
        pw.close()

main()

I ran it in the usual way with:

$ python PopenToPDF.py

to get the output shown in the previous post describing PopenToPDF.

Also, this is the one-off script, gen-file.py, that created the 1000 line input file:

with open("1000-lines.txt", "w") as fil:
    for i in range(1000):
        fil.write("This is a line of text.\n")
fil.close()

- Vasudev

Signup to hear about new products and services I create.

Signup to hear about new products and services I create.

Share |

Wednesday, December 23, 2015

Generate Windows Task List to PDF with xtopdf

By Vasudev Ram

While working at the DOS command line in Windows, I had the idea of using the DOS TASKLIST command along with xtopdf, my PDF generation toolkit, to generate a list of currently running Windows tasks to a PDF file, along with some other info, such as whether a task is a service or a console process, the process id, the memory usage, etc. The TASKLIST command shows all that information, by default.

I also sorted the output in ascending order by the Mem Usage field, by passing it through the DOS SORT command. (I could have sorted it by any other field such as the Image Name or the PID, of course.) I starred out some of the fields in the output.

Here are the steps to generate a Windows task list as a PDF, using xtopdf:

( I use $ as the prompt, even in DOS :)

1: Run TASKLIST and redirect its output to a text file.

$ tasklist > tasklist.out

2: Sort the file into another file.

$ sort /+65 tasklist.out > tasklist.srt

(Sort the output of TASKLIST by the character position of the Mem Usage field.)

3: Go edit tasklist to put the header lines back at the top :)
[ They get dislodged by the sort. ]

[ This is not Unix, so you can't easily do the fast, fluid command-line data munging that you can on Unix, unless you use something like Cygwin or UWin.

UWin was developed by David Korn, creator of the Korn Shell, for Windows. You can get UWin from the AT&T site here (after doing a convoluted license agreement dance, last time I checked). But IMO, the dance is not too long, and is worth it, to get a suite of Unix tools that work well on Windows, and UWin is also smaller & lighter than Cygwin, though not so comprehensive.
Be sure to read the section "Korn shell and Microsoft" at the David Korn link above :-) ]

4: Pipe the sorted task list to StdinToPDF, to generate the PDF output.

$ type tasklist.srt | python StdinToPDF.py tasklist.pdf

We just pipe the output of TASKLIST to StdinToPDF.py (an xtopdf app), which can be used at the end of any arbitrary command pipeline that generates text (on Unix / Windows / Linux / Mac OS X), to convert that text to PDF.

A screenshot of the PDF output I got (viewed in Foxit PDF Reader), is shown at the top of this post.

- Enjoy.

- Vasudev Ram - Online Python training and programming

- Vasudev Ram - Online Python training and programming

Share |

Monday, November 23, 2015

Convert XLSX to PDF with Python and xtopdf

By Vasudev Ram

XLSX => PDF

This is a simple application of my xtopdf toolkit, showing how to use it to convert XLSX data, i.e. Microsoft Excel data, to PDF (Portable Document Format). It only converts text data, not the formatting, colors, fonts, etc., that may be present in the Excel file.

For the input, I will use this small Excel file, fruits2.xlsx, which I created. A screenshot of it is below (click to enlarge):

Here is the code for XLSXtoPDF.py:

# XLSXtoPDF.py

# Program to convert the data from an XLSX file to PDF.
# Uses the openpyxl library and xtopdf.

# Author: Vasudev Ram - http://jugad2.blogspot.com
# Copyright 2015 Vasudev Ram.

from openpyxl import load_workbook
from PDFWriter import PDFWriter

workbook = load_workbook('fruits2.xlsx', guess_types=True, data_only=True)
worksheet = workbook.active

pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')

ws_range = worksheet.iter_rows('A1:H13')
for row in ws_range:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()

And here is a screenshot of the PDF output in fruits2.pdf:

There are some points worth mentioning in connection with conversion of data to and from PDF. I will discuss them in a follow-up post.

Signup to hear about new products and services I create.

Share |

Wednesday, October 7, 2015

Create PDF invoices with Python and xtopdf

- Vasudev Ram - Online Python training and programming

This post shows the basics of how to create a PDF invoice using Python and xtopdf, my Python library for PDF generation.

Here is the code, in file InvoiceToPDF.py. It's rudimentary, but shows the basics involved, and can be built upon to create more complex invoices with more information:

'''
InvoiceToPDF.py
This program shows how to convert invoice data from
Python data structures into a PDF invoice.
Author: Vasudev Ram
Copyright 2015 Vasudev Ram
'''

import sys
import time
from PDFWriter import PDFWriter

def error_exit(message):
    sys.stderr.write(message)
    sys.exit(1)

def InvoiceToPDF(pdf_filename, data):
    try:
        # Get the invoice data from the dict.
        font_name = data['font_name']
        font_size = data['font_size']
        header = data['header']
        footer = data['footer']
        invoice_number = data['invoice_number']
        invoice_customer = data['invoice_customer']
        invoice_date_time = data['invoice_date_time']
        invoice_line_items = data['invoice_line_items']
    except KeyError as ke:
        error_exit("KeyError: {}".format(ke))
    try:
        with PDFWriter(pdf_filename) as pw:
            # Generate the PDF invoice from the data.
            pw.setFont(font_name, font_size)
            pw.setHeader(header)
            pw.setFooter(footer)
            pw.writeLine('-' * 60)
            pw.writeLine('Invoice Number: ' + str(invoice_number))
            pw.writeLine('Invoice Customer: ' + invoice_customer)
            pw.writeLine('Invoice Date & Time: ' + invoice_date_time)
            pw.writeLine('-' * 60)
            pw.writeLine('Invoice line items:')
            pw.writeLine('S. No.'.zfill(5) + ' ' + 'Description'.ljust(10) + \
            ' ' + 'Unit price'.ljust(10) + ' ' + 'Quantity'.ljust(10) + ' ' + \
            str('Ext. Price').rjust(8))
            pw.writeLine('-' * 60)
            sum_ext_price = 0
            for line_item in invoice_line_items:
                id, desc, price, quantity = line_item
                pw.writeLine(str(id).zfill(5) + ' ' + desc.ljust(10) + \
                ' ' + str(price).rjust(10) + ' ' + str(quantity).rjust(10) + \
                str(price * quantity).rjust(10))
                sum_ext_price += price * quantity
            pw.writeLine('-' * 60)
            pw.writeLine('Total:'.rjust(38) + str(sum_ext_price).rjust(10))
            pw.writeLine('-' * 60)
    except IOError as ioe:
        error_exit("IOError: {}".format(ioe))
    except Exception as e:
        error_exit("Exception: {}".format(e))

def testInvoiceToPDF(pdf_filename):
    # Get the Unix-style date from the system ...
    cdt = time.ctime(time.time()).split()
    # ... and format it a little differently.
    current_date_time = cdt[0] + ' ' + cdt[1] + ' ' + cdt[2] + \
    ', ' + cdt[4] + ', ' + cdt[3][:5]
    data = { \
        'font_name': 'Courier', \
        'font_size': 12, \
        'header': 'Customer Invoice', \
        'footer': 'Generated by xtopdf: http://bit.ly/xtopdf', \
        'invoice_number': 12345, \
        'invoice_customer': 'Mr. Vasudev Ram', \
        'invoice_date_time': current_date_time, \
        'invoice_line_items': \
        [
            [ 01, 'Chair', 100, 10 ], \
            [ 02, 'Table', 200, 20 ], \
            [ 03, 'Cupboard', 300, 30 ], \
            [ 04, 'Bed', 400, 40 ], \
            [ 05, 'Wardrobe', 500, 50 ], \
        ]
    }
    InvoiceToPDF(pdf_filename, data)
    
def main():
    if len(sys.argv) != 2:
        error_exit("Usage: {} pdf_filename".format(sys.argv[0]))
    testInvoiceToPDF(sys.argv[1]) 

if __name__ == '__main__':
    main()

Run the program with a command like:

$ python InvoiceToPDF.py Invoice.pdf

Here is a screenshot of the resulting PDF invoice in Foxit Reader:

Note: The use of the dict named data is not strictly needed. I simply used it to illustrate the technique of packing the multiple data items needed for the invoice, into a single dict, and then unpack those needed items in the function that actually generates the PDF. I could have done away with the dict and just used standalone variables in this simple program. But in a larger program where the invoice data is collected / generated in one or more functions, and the PDF is generated in another function, this technique or something similar can be of use, to reduce the number of individual arguments that need to be passed around.

- Enjoy.

Dancing Bison Enterprises

Signup to hear about new products and services that I create.

Share |

Wednesday, March 11, 2015

ASCII Table to PDF with xtopdf

- Vasudev Ram - Online Python training and programming

Recently, I had the need for an ASCII table lookup, which I searched for and found, thanks to the folks here:

www.ascii-code.com

That gave me the idea of writing a simple program to generate an ASCII table in PDF. Here is the code for a part of that table - the first 32 (0 to 31) ASCII characters, which are the control characters:

# ASCIITableToPDF.py
# Author: Vasudev Ram - http://www.dancingbison.com
# Demo program to show how to generate an ASCII table as PDF,
# using the xtopdf toolkit for PDF creation from Python.
# Generates a PDF file with information about the 
# first 32 ASCII codes, i.e. the control characters.
# Based on the ASCII Code table at http://www.ascii-code.com/

import sys
from PDFWriter import PDFWriter

# Define the header information.
column_names = ['DEC', 'OCT', 'HEX', 'BIN', 'Symbol', 'Description']
column_widths = [4, 6, 4, 10, 7, 20]

# Define the ASCII control character information.
ascii_control_characters = \
"""
0    000    00    00000000    NUL    �         Null char
1    001    01    00000001    SOH             Start of Heading
2    002    02    00000010    STX             Start of Text
3    003    03    00000011    ETX             End of Text
4    004    04    00000100    EOT             End of Transmission
5    005    05    00000101    ENQ             Enquiry
6    006    06    00000110    ACK             Acknowledgment
7    007    07    00000111    BEL             Bell
8    010    08    00001000    BS             Back Space
9    011    09    00001001    HT    	         Horizontal Tab
10    012    0A    00001010    LF    
         Line Feed
11    013    0B    00001011    VT             Vertical Tab
12    014    0C    00001100    FF             Form Feed
13    015    0D    00001101    CR             Carriage Return
14    016    0E    00001110    SO             Shift Out / X-On
15    017    0F    00001111    SI             Shift In / X-Off
16    020    10    00010000    DLE             Data Line Escape
17    021    11    00010001    DC1             Device Control 1 (oft. XON)
18    022    12    00010010    DC2             Device Control 2
19    023    13    00010011    DC3             Device Control 3 (oft. XOFF)
20    024    14    00010100    DC4             Device Control 4
21    025    15    00010101    NAK             Negative Acknowledgement
22    026    16    00010110    SYN             Synchronous Idle
23    027    17    00010111    ETB             End of Transmit Block
24    030    18    00011000    CAN             Cancel
25    031    19    00011001    EM             End of Medium
26    032    1A    00011010    SUB             Substitute
27    033    1B    00011011    ESC             Escape
28    034    1C    00011100    FS             File Separator
29    035    1D    00011101    GS             Group Separator
30    036    1E    00011110    RS             Record Separator
31    037    1F    00011111    US             Unit Separator
"""

# Create and set some of the fields of a PDFWriter instance.
pw = PDFWriter("ASCII-Table.pdf")
pw.setFont("Courier", 12)
pw.setHeader("ASCII Control Characters - 0 to 31")
pw.setFooter("Generated by xtopdf: http://slid.es/vasudevram/xtopdf")

# Write the column headings to the output.
column_headings = [ str(val).ljust(column_widths[idx]) \
    for idx, val in enumerate(column_names) ]
pw.writeLine(' '.join(column_headings))

# Split the string into lines, omitting the first and last empty lines.
for line in ascii_control_characters.split('\n')[1:-1]:

    # Split the line into space-delimited fields.
    lis = line.split()

    # Join the words of the Description back into one field, 
    # since it was split due to having internal spaces.
    lis2 = lis[0:5] + [' '.join(lis[6:])]

    # Write the column data to the output.
    lis3 = [ str(val).ljust(column_widths[idx]) \
        for idx, val in enumerate(lis2) ]
    pw.writeLine(' '.join(lis3))

pw.close()

Discerning readers will notice the effect of some of the said control characters on the displayed program code :)

You can run the program with:

python ASCIITableToPDF.py

py ASCIITableToPDF.py

./ASCIITableToPDF.py

ASCIITableToPDF.py

Figuring what needs to be done for each of the above methods of invocation to work, is (as they say in computer textbooks), left as an exercise for the reader :)

(There is a bit of subtlety involved, at least for beginners, particularly if you want to do it on both Linux and Windows.)

And here is a screenshot of the PDF output of the program:

Dancing Bison Enterprises

Signup to hear about new products or services from me.

Contact Page

Share |

Wednesday, February 25, 2015

Publish SQLite data to PDF using named tuples

- Vasudev Ram - Online Python training and programming

Some time ago I had written this post:

Publishing SQLite data to PDF is easy with xtopdf.

It showed how to get data from an SQLite (Wikipedia) database and write it to PDF, using xtopdf, my open source PDF creation library for Python.

Today I was browsing the Python standard library docs, and so thought of modifying that program to use the namedtuple data type from the collections module of Python, which is described as implementing "High-performance container datatypes". The collections module was introduced in Python 2.4.
Here is a modified version of that program, SQLiteToPDF.py, called SQLiteToPDFWithNamedTuples.py, that uses named tuples:

# SQLiteToPDFWithNamedTuples.py
# Author: Vasudev Ram - http://www.dancingbison.com
# SQLiteToPDFWithNamedTuples.py is a program to demonstrate how to read 
# SQLite database data and convert it to PDF. It uses the Python
# data structure called namedtuple from the collections module of 
# the Python standard library.

from __future__ import print_function
import sys
from collections import namedtuple
import sqlite3
from PDFWriter import PDFWriter

# Helper function to output a string to both screen and PDF.
def print_and_write(pw, strng):
    print(strng)
    pw.writeLine(strng)

try:

    # Create the stocks database.
    conn = sqlite3.connect('stocks.db')
    # Get a cursor to it.
    curs = conn.cursor()

    # Create the stocks table.
    curs.execute('''DROP TABLE IF EXISTS stocks''')
    curs.execute('''CREATE TABLE stocks
                 (date text, trans text, symbol text, qty real, price real)''')

    # Insert a few rows of data into the stocks table.
    curs.execute("INSERT INTO stocks VALUES ('2006-01-05', 'BUY', 'RHAT', 100, 25.1)")
    curs.execute("INSERT INTO stocks VALUES ('2007-02-06', 'SELL', 'ORCL', 200, 35.2)")
    curs.execute("INSERT INTO stocks VALUES ('2008-03-07', 'HOLD', 'IBM', 300, 45.3)")
    conn.commit()

    # Create a namedtuple to represent stock rows.
    StockRecord = namedtuple('StockRecord', 'date, trans, symbol, qty, price')

    # Run the query to get the stocks data.
    curs.execute("SELECT date, trans, symbol, qty, price FROM stocks")

    # Create a PDFWriter and set some of its fields.
    pw = PDFWriter("stocks.pdf")
    pw.setFont("Courier", 12)
    pw.setHeader("SQLite data to PDF with named tuples")
    pw.setFooter("Generated by xtopdf - https://bitbucket.org/vasudevram/xtopdf")

    # Write header info.
    hdr_flds = [ str(hdr_fld).rjust(10) + " " for hdr_fld in StockRecord._fields ]
    hdr_fld_str = ''.join(hdr_flds)
    print_and_write(pw, '=' * len(hdr_fld_str))
    print_and_write(pw, hdr_fld_str)
    print_and_write(pw, '-' * len(hdr_fld_str))

    # Now loop over the fetched data and write it to PDF.
    # Map the StockRecord namedtuple's _make class method
    # (that creates a new instance) to all the rows fetched.
    for stock in map(StockRecord._make, curs.fetchall()):
        row = [ str(col).rjust(10) + " " for col in (stock.date, \
        stock.trans, stock.symbol, stock.qty, stock.price) ]
        # Above line can instead be written more simply as:
        # row = [ str(col).rjust(10) + " " for col in stock ]
        row_str = ''.join(row)
        print_and_write(pw, row_str)

    print_and_write(pw, '=' * len(hdr_fld_str))

except Exception as e:
    print("ERROR: Caught exception: " + e.message)
    sys.exit(1)

finally:
    pw.close()
    conn.close()

This time I've imported print_function so that I can use print as a function instead of as a statement.

Here's a screenshot of the PDF output in Foxit PDF Reader:

Dancing Bison Enterprises

Signup to hear about new products or services from me.

Contact Page

Share |

Sunday, February 22, 2015

Excel to PDF with xlwings and xtopdf