Developing code

The wonder of the blogosphere is that it led me back to a paper I had been meaning to think about for a long time, in which Cook, Gelman and Rubin propose that we:

  1. Specify a model and then simulate parameters from the prior distributions (which is a bit of a nuisance if you have improper flat priors, since you cannot simulate from those)
  2. Now that you have some parameters, simulate data from the likelihood, conditional on the simulated parameters
  3. Run the MCMC sampler
  4. Compare the true parameter values to the samples. Generate a statistic for each parameter representing the proportion of samples that are greater than the true value
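  5. Repeat many times. If the software is correct, each parameter's statistic should be uniformly distributed between 0 and 1 across the repetitions, so a departure from uniformity points to a bug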
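
The five steps above can be sketched in Python. This is a minimal illustration under my own assumptions rather than Cook, Gelman and Rubin's code: the model is a hypothetical conjugate normal model (a normal prior on a mean θ and normal data with known standard deviation), and because its posterior is available in closed form, exact posterior draws stand in for the MCMC sampler of step 3. A deliberately broken sampler, with its posterior standard deviation halved, shows what a failure looks like. Because step 4 counts draws greater than the true value, the statistic is one minus the posterior quantile, which is still uniform on (0, 1) when everything is correct. Uniformity is summarised with a hand-rolled Kolmogorov-Smirnov distance; all function names are illustrative.

```python
import math
import random


def posterior_sampler(y, mu0, tau0, sigma, n_draws, rng):
    """Exact posterior draws for theta, standing in for an MCMC sampler.

    Hypothetical model: theta ~ Normal(mu0, tau0**2),
    y_i | theta ~ Normal(theta, sigma**2) with sigma known.
    """
    precision = 1.0 / tau0**2 + len(y) / sigma**2
    mean = (mu0 / tau0**2 + sum(y) / sigma**2) / precision
    sd = math.sqrt(1.0 / precision)
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def buggy_sampler(y, mu0, tau0, sigma, n_draws, rng):
    """Deliberately wrong: the posterior standard deviation is halved."""
    precision = 1.0 / tau0**2 + len(y) / sigma**2
    mean = (mu0 / tau0**2 + sum(y) / sigma**2) / precision
    sd = 0.5 * math.sqrt(1.0 / precision)  # the planted bug
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def validation_statistics(sampler, n_reps=500, n_obs=10, n_draws=500,
                          mu0=0.0, tau0=1.0, sigma=2.0, seed=1):
    rng = random.Random(seed)
    stats = []
    for _ in range(n_reps):                                  # step 5: repeat
        theta = rng.gauss(mu0, tau0)                         # step 1: prior
        y = [rng.gauss(theta, sigma) for _ in range(n_obs)]  # step 2: data
        draws = sampler(y, mu0, tau0, sigma, n_draws, rng)   # step 3: sampler
        # step 4: proportion of draws greater than the true value
        stats.append(sum(d > theta for d in draws) / n_draws)
    return stats


def ks_distance_from_uniform(values):
    """Kolmogorov-Smirnov distance between the values and Uniform(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


good = ks_distance_from_uniform(validation_statistics(posterior_sampler))
bad = ks_distance_from_uniform(validation_statistics(buggy_sampler))
print(f"correct sampler: D = {good:.3f}")
print(f"buggy sampler:   D = {bad:.3f}")
```

With 500 repetitions, a distance much above roughly 0.06 (the asymptotic 5% critical value, about 1.36/√500) is evidence against uniformity; the correct sampler should usually come in below that and the buggy one well above it.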

I think what interested me most is that it highlights a gulf between my work and that of many programmers. And it doesn't just seem to be me. We don't seem to develop statistical software the way software developers automatically would, and the main difference seems to be the lack of formal approaches to testing what we do.

Python, for example, is very well equipped with Test Driven Development (TDD) packages. Standard Python comes with unit testing built in, in the unittest module (you can find a rather elderly article at http://www.onlamp.com/pub/a/python/2004/12/02/tdd_pyunit.html). There is even a doctest facility, which runs the examples in your docstrings as tests. Now, it's certainly true that there is a unit testing package for R (http://cran.r-project.org/web/packages/svUnit/index.html), but I don't get the sense that it is used as routinely as unit testing is in Python. Yes, install.packages("foo") does download, compile, install and check something. But testing doesn't appear to be built into development with the rigour I see in many of the Python packages that I use. In fact, I think all the packages I installed this year suggested I run nose tests, nose being a Python package that extends unittest to make testing easier than is possible with standard Python alone, for example by finding and running tests automatically.
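
To make the two standard-library tools concrete, here is a minimal sketch. The function proportion_greater is a hypothetical helper computing the step 4 statistic from the list above; the examples in its docstring double as doctests, and the unittest.TestCase adds a few more checks, including how ties are handled. The load_tests hook follows the pattern in the doctest documentation so that running unittest on the file picks up the docstring examples too.

```python
import doctest
import unittest


def proportion_greater(draws, true_value):
    """Return the fraction of `draws` that are strictly greater than `true_value`.

    >>> proportion_greater([0.1, 0.5, 0.9, 1.3], 0.7)
    0.5
    >>> proportion_greater([], 0.0)
    Traceback (most recent call last):
        ...
    ValueError: draws must be non-empty
    """
    if not draws:
        raise ValueError("draws must be non-empty")
    return sum(d > true_value for d in draws) / len(draws)


class ProportionGreaterTest(unittest.TestCase):
    def test_all_draws_above(self):
        self.assertEqual(proportion_greater([2.0, 3.0], 1.0), 1.0)

    def test_ties_are_not_counted(self):
        self.assertEqual(proportion_greater([1.0, 1.0, 2.0, 0.0], 1.0), 0.25)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            proportion_greater([], 0.0)


def load_tests(loader, tests, ignore):
    """unittest hook: also run this module's docstring examples as tests."""
    tests.addTests(doctest.DocTestSuite())
    return tests
```

Saved as, say, stats_helpers.py (a hypothetical file name), everything runs with python -m unittest stats_helpers.py, and the doctests alone with python -m doctest -v stats_helpers.py.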

I appreciate that much of what I do is either visual or relies on Monte Carlo methods, so designing a test is non-trivial. But hunting around the blogosphere for TDD does suggest I could usefully learn a lot about more structured approaches to software development. I do, of course, check a lot of the intermediate steps within an algorithm, and I can do the kind of checking envisaged by the gurus mentioned at the start. But I get the impression that TDD provides a discipline for writing code that doesn't exist in my work, and that my work could benefit from it in many ways, even if only through the job satisfaction of giving myself milestones.
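
One common way round the Monte Carlo difficulty (a general technique, sketched here under my own assumptions) is to fix the random seed and compare the answer with a known value, using a tolerance set by the Monte Carlo standard error, so the test neither flakes nor passes vacuously. The function names below are hypothetical; the known value is E[X²] = 1 for a standard normal X.

```python
import math
import random
import unittest


def monte_carlo_mean(f, draw, n, rng):
    """Estimate E[f(X)] from n draws of X, returning (estimate, standard error)."""
    values = [f(draw(rng)) for _ in range(n)]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


def square(x):
    return x * x


class MonteCarloMeanTest(unittest.TestCase):
    def test_second_moment_of_standard_normal(self):
        rng = random.Random(2024)  # fixed seed makes the test reproducible
        estimate, se = monte_carlo_mean(square, standard_normal, 20_000, rng)
        # E[X^2] = 1 for X ~ N(0, 1); allow 4 Monte Carlo standard errors
        self.assertLess(abs(estimate - 1.0), 4 * se)

    def test_same_seed_gives_same_answer(self):
        first, _ = monte_carlo_mean(square, standard_normal, 1_000, random.Random(7))
        second, _ = monte_carlo_mean(square, standard_normal, 1_000, random.Random(7))
        self.assertEqual(first, second)
```

The four-standard-error tolerance is an arbitrary choice: tighter bounds make a change of seed more likely to trip the test, while looser ones let small biases slip through.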

(And, as it happens, there is another buzzword, "Agile Development", that perhaps I should check out.)

This entry was posted in pymc, Uncategorized.