Python Pattern Module -- Per Erik Strandberg

The command line argument parser

I use the very friendly argparse library for the command line arguments. I set them up and parse them in a separate method.

from argparse import ArgumentParser

# ...

    def parse_args(self, args=None):
        """Parse command line arguments."""
        desc = "Write random numbers to a compressed csv file."

        parser = ArgumentParser(description=desc)
        parser.add_argument('--mini', '-m', type=int, default=100,
                            help="Minimal number")
        parser.add_argument('--maxi', '-M', type=int, default=200,
                            help="Maximal number")
        parser.add_argument('--numbers', '-n', type=int, default=100,
                            help="Number of numbers")
        parser.add_argument('--drift', type=float, default=10.0,
                            help="Drift")

        tgroup = parser.add_argument_group("Testing instead")
        tgroup.add_argument('--test', '-t', action="store_true", default=False,
                            help="Perform doc tests and exit instead.")

        if args:
            self.args = parser.parse_args(args)
        else:
            self.args = parser.parse_args()
        self.logger.debug(self.args)
        return

The grouping and the typical command line look and feel of the help output are excellent:

$ python template.py --help
usage: template.py [-h] [--mini MINI] [--maxi MAXI] [--numbers NUMBERS]
                   [--drift DRIFT] [--test]

Write random numbers to a compressed csv file.

optional arguments:
  -h, --help            show this help message and exit
  --mini MINI, -m MINI  Minimal number
  --maxi MAXI, -M MAXI  Maximal number
  --numbers NUMBERS, -n NUMBERS
                        Number of numbers
  --drift DRIFT         Drift

Testing instead:
  --test, -t            Perform doc tests and exit instead.
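
Because parse_args accepts an explicit list, the parser can be exercised without touching sys.argv. The following is a minimal standalone sketch, assuming Python 3. The helper name build_parser is hypothetical and not part of the recipe, and only some of the options are reproduced.

```python
from argparse import ArgumentParser


def build_parser():
    """Build a reduced version of the parser above (hypothetical helper)."""
    desc = "Write random numbers to a compressed csv file."
    parser = ArgumentParser(description=desc)
    parser.add_argument('--mini', '-m', type=int, default=100,
                        help="Minimal number")
    parser.add_argument('--drift', type=float, default=10.0,
                        help="Drift")

    # Options added to a group get their own heading in the --help output.
    tgroup = parser.add_argument_group("Testing instead")
    tgroup.add_argument('--test', '-t', action="store_true",
                        help="Perform doc tests and exit instead.")
    return parser


# Passing a list bypasses sys.argv, which is handy in doctests.
args = build_parser().parse_args('--mini 5 --drift 0.5 -t'.split())
print(args.mini, args.drift, args.test)

# An empty list yields the defaults.
defaults = build_parser().parse_args([])
```

The type= arguments do the conversion, so args.mini is an int and args.drift is a float by the time the rest of the program sees them.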

The logging

I use the vanilla logging library in Python. It is a bit hard to set up if you want custom formats (and different formats in a file and on the console), but the default formatting is no problem and quickly done.

import logging

    # ...
    self.logger = logging.getLogger(name="Data")
    logging.basicConfig(level=logging.DEBUG)

    # ...
    self.logger.debug(self.args)

The output is something like:

DEBUG:Data:Ctor OK.
DEBUG:Data:Setting up handle
DEBUG:Data:Getting 100 numbers

A good idea might be to parse the arguments before setting up logging in case you want to control the log level(s) to show.
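
One way to do that is sketched below, assuming Python 3. The --log-level option and the configure_logging helper are assumptions made for illustration; neither exists in the recipe.

```python
import logging
from argparse import ArgumentParser


def configure_logging(argv=None):
    """Parse a hypothetical --log-level option, then set up logging with it."""
    parser = ArgumentParser()
    parser.add_argument('--log-level', default='DEBUG',
                        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'],
                        help="Lowest level to show")
    args = parser.parse_args(argv)

    # Map the level name to its numeric constant, e.g. logging.WARNING.
    level = getattr(logging, args.log_level)

    # basicConfig does nothing if the root logger already has handlers,
    # so the level is also set on the named logger itself.
    logging.basicConfig(level=level)
    logger = logging.getLogger("Data")
    logger.setLevel(level)
    return logger


logger = configure_logging(['--log-level', 'WARNING'])
logger.debug("hidden at WARNING level")
logger.warning("shown at WARNING level")
```

Restricting the option with choices lets argparse reject a misspelled level before any logging is configured.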

The doctest

At the start of the file I keep a module docstring that holds the doctests:

#!/usr/bin/python

"""
An example with the typical python ingredients I often use.

These are the doctests for the module.

We first set it up and create some comma separated values:
    >>> filename = '/tmp/mydata.csv'
    >>> dm = DataMaker()
    >>> dm.parse_args('--mini 100 --maxi 100 --drift 0 --numbers 3'.split(' '))
    >>> handle = open(filename, 'w')
    >>> dm.setup_handle(handle)
    >>> dm.get_sample()
    >>> handle.flush()
    >>> handle.close()

We now open the created file and store the contents in a list called lines
    >>> f = open(filename, 'r')
    >>> lines = list()
    >>> for line in f: lines.append(line)

There are three lines plus a header
    >>> len(lines) == 4
    True

The first value is 100
    >>> float(lines[1].split(',')[1].strip()) == 100
    True

The last value is 100
    >>> float(lines[-1].split(',')[-1].strip()) == 100
    True
"""

I'm not sure it is a good idea, but I use a command line argument to start the tests if that is what the user wants.

    if dm.args.test:
        import doctest
        res = doctest.testmod()
        print("Tested %s cases, %s failed." % (res.attempted, res.failed))
        exit(0)
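
A possible refinement, offered as a sketch rather than as the recipe's behaviour: exit(0) reports success even when a doctest fails, so a calling script cannot tell the difference. The example below, assuming Python 3, runs the examples in a single docstring and derives an exit status from the failure count. The names add and run_docstring are hypothetical.

```python
import doctest


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b


def run_docstring(func):
    """Run the doctest examples in func's docstring (hypothetical helper)."""
    parser = doctest.DocTestParser()
    # The globals dict gives the examples access to the function under test.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults with .failed and .attempted


res = run_docstring(add)
print("Tested %s cases, %s failed." % (res.attempted, res.failed))

# Non-zero when something failed; a script would pass this to sys.exit().
exit_status = 1 if res.failed else 0
```

In the recipe itself the same idea would amount to using the failed count from doctest.testmod() to pick the exit code.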

I run it with the Python Code Coverage Module to also measure how effective my doctests are:

~/tmp$ coverage run template.py --test
[...]
Tested 14 cases, 0 failed.

~/tmp$ coverage report -m
Name       Stmts   Miss  Cover   Missing
----------------------------------------
template      65      5    92%   125-129

The csv file

I haven't used the csv lib much, but I'd like to start learning it since I tend to store huge amounts of csv files at work. Until now I have just used a regular file handle and taken care of my semicolons and commas myself. Setting it up is pretty simple:

import csv

    #...
        self.writer = csv.writer(filehandle)
        self.write(('timestamp', 'left', 'middle', 'right'))
        #...
        self.writer.writerow([item for item in line])
        #...
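
To see what the writer actually produces, here is a small self-contained sketch, assuming Python 3. An in-memory io.StringIO buffer stands in for the file handle, and the row values are made up for illustration.

```python
import csv
import io

# An in-memory text buffer stands in for the file handle.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(('timestamp', 'left', 'middle', 'right'))

# The writer quotes any field that contains the delimiter,
# which is the bookkeeping I used to do by hand.
writer.writerow(['2014-01-01 12:00:00', 1.5, 2.5, 'a,b'])

# Reading the text back recovers the fields, all of them as strings.
rows = list(csv.reader(io.StringIO(buffer.getvalue(), newline='')))
print(rows)
```

Note that numbers come back as strings, which is why the doctests wrap the values in float() before comparing.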

The gzip

Using gzip is in fact pretty simple in Python:

import gzip
handle = gzip.open('data.gzip', 'wb')
for d in xrange(8):
    handle.write("data: %s\n" % d)
handle.flush()
handle.close()

And I have discovered that zcat, for me, is almost as nice as the gunzip command:

$ zcat data.gzip
data: 0
data: 1
data: 2
data: 3
data: 4
data: 5
data: 6
data: 7
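
The gzip snippet above is Python 2. In Python 3 a handle opened with 'wb' only accepts bytes, so writing a str to it raises a TypeError. One way around this, sketched here under the assumption of Python 3, is to open the file in text mode ('wt') and hand it to csv.writer. The file name sketch.csv.gz and the column names are arbitrary.

```python
import csv
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sketch.csv.gz')

# 'wt' wraps the compressed stream in a text layer, so csv.writer can
# write str to it. newline='' leaves line endings to the csv module.
with gzip.open(path, 'wt', newline='') as handle:
    writer = csv.writer(handle)
    writer.writerow(('n', 'square'))
    for n in range(3):
        writer.writerow((n, n * n))

# 'rt' decompresses and decodes in one step.
with gzip.open(path, 'rt', newline='') as handle:
    rows = list(csv.reader(handle))
print(rows)
```

The with blocks close the handles, so no explicit flush() or close() is needed.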

Running it

Run the script with some arguments:

$ python template.py --mini 10 --maxi 50 --numbers 20 --drift 1
DEBUG:Data:Namespace(drift=1.0, maxi=50, mini=10, numbers=20, test=False)
DEBUG:Data:Ctor OK.
DEBUG:Data:Setting up handle
DEBUG:Data:Getting 20 numbers

Uncompress the file:
$ gunzip -v data.csv.gz
gzip: data.csv already exists; do you wish to overwrite (y or n)? y
data.csv.gz:	 57.3% -- replaced with data.csv

View and plot with LibreOffice:
$ libreoffice data.csv

The complete recipe for My Python Pattern

#!/usr/bin/python

"""
An example with the typical python ingredients I often use.

These are the doctests for the module.

We first set it up and create some comma separated values:
    >>> filename = '/tmp/mydata.csv'
    >>> dm = DataMaker()
    >>> dm.parse_args('--mini 100 --maxi 100 --drift 0 --numbers 3'.split(' '))
    >>> handle = open(filename, 'w')
    >>> dm.setup_handle(handle)
    >>> dm.get_sample()
    >>> handle.flush()
    >>> handle.close()

We now open the created file and store the contents in a list called lines
    >>> f = open(filename, 'r')
    >>> lines = list()
    >>> for line in f: lines.append(line)

There are three lines plus a header
    >>> len(lines) == 4
    True

The first value is 100
    >>> float(lines[1].split(',')[1].strip()) == 100
    True

The last value is 100
    >>> float(lines[-1].split(',')[-1].strip()) == 100
    True
"""

import logging
import csv
from argparse import ArgumentParser
import datetime
import gzip
from random import uniform


class DataMaker(object):
    """Class that spits out some data in a csv format."""

    def __init__(self):
        """Ctor sets up logging and parses the command line arguments."""
        self.writer = None
        self.args = None
        self.logger = logging.getLogger(name="Data")
        logging.basicConfig(level=logging.DEBUG)
        self.parse_args()
        self.logger.debug("Ctor OK.")
        return

    def setup_handle(self, filehandle):
        """Setup the file handle"""
        self.logger.debug("Setting up handle")
        self.writer = csv.writer(filehandle)
        self.write(('timestamp', 'left', 'middle', 'right'))
        return

    def parse_args(self, args=None):
        """Parse command line arguments."""
        desc = "Write random numbers to a compressed csv file."

        parser = ArgumentParser(description=desc)
        parser.add_argument('--mini', '-m', type=int, default=100,
                            help="Minimal number")
        parser.add_argument('--maxi', '-M', type=int, default=200,
                            help="Maximal number")
        parser.add_argument('--numbers', '-n', type=int, default=100,
                            help="Number of numbers")
        parser.add_argument('--drift', type=float, default=10.0,
                            help="Drift")

        tgroup = parser.add_argument_group("Testing instead")
        tgroup.add_argument('--test', '-t', action="store_true", default=False,
                            help="Perform doc tests and exit instead.")

        if args:
            self.args = parser.parse_args(args)
        else:
            self.args = parser.parse_args()
        self.logger.debug(self.args)
        return

    def get_sample(self):
        """Get sample based on arguments"""
        self.logger.debug("Getting %s numbers" % self.args.numbers)
        for drift in xrange(self.args.numbers):
            mini = self.args.mini + drift*self.args.drift
            maxi = self.args.maxi + drift*self.args.drift
            rands = [uniform(mini, maxi),
                     uniform(mini, maxi),
                     uniform(mini, maxi)]
            rands = sorted(rands)
            self.store(rands[0], rands[1], rands[2])
        return

    def store(self, left, mid, right):
        """Write the values with a timestamp"""
        now = datetime.datetime.now()
        self.write([str(now), left, mid, right])
        return

    def write(self, line):
        """Write a line"""
        # In Python 3 this fails if the handle was opened in binary mode ('wb'):
        # csv.writer writes str, not bytes.
        self.writer.writerow([item for item in line])
        return


if __name__ == "__main__":
    dm = DataMaker()
    if dm.args.test:
        import doctest
        res = doctest.testmod()
        print("Tested %s cases, %s failed." % (res.attempted, res.failed))
        exit(0)

    handle = gzip.open("data.csv.gz", "wb")
    dm.setup_handle(handle)
    dm.get_sample()
    handle.flush()
    handle.close()

Related entries in Min Blogg:

Python Pattern Doctest
Python Doctest And Docstring
Python Code Coverage Module
Python Command Line Arguments, where I use another argument parser (OptionParser in optparse).
Python Compressed Files, where I investigate how much compression you get depending on the patterns in the file.

See also the Standard Python Library documentation:

argparse - Parser for command-line options, arguments and sub-commands [1]
csv - CSV File Reading and Writing [2]
doctest - Test interactive Python examples [3]
gzip - Support for gzip files [4]
logging - Logging facility for Python (Make sure you read the three tutorials if you want more than plain vanilla logging - it rapidly gets complicated) [5]

See also Doug Hellmann's Python Module of the Week:

for argparse: [6]
for csv: [7]
for doctest: [8]
for gzip: [9]
for logging: [10]

Belongs in Kategori Test
Belongs in Kategori Mallar
Belongs in Kategori Programmering