Multilevel Ordinal Logistic Regression in pymc

I think I’m finally getting closer to having the code to fit a proportional odds logistic regression model in pymc.   There are still a few little places where the code could be tidied up (I’ve yet to figure out how to avoid loops when working with the linear predictors in terms of \zeta_k - \eta_i but maybe I just need to stop worrying about duplicating entries in an array).   I also suspect I’m being over-cautious in explicitly sorting the \zetas; and the slicing logic still frightens me.    But so far the results ***look*** plausibly close to the maximum likelihood estimates when tried on a few datasets.

As blogged earlier, I’m thinking of a model that has an ordinal response y_i \sim \mbox{Categorical}(\pi_{i1}, \ldots, \pi_{iK}) for individuals i=1, \ldots, n  over categories k = 1, \ldots, K.    (One of my favourite errors is forgetting to recode the y values to 0, 1, \ldots, K-1 for use in python.)    There are constraints such that \displaystyle{\sum_{k=1}^K \pi_{ik}=1} for all i.

For ordinal logistic regression we relate \pi_{ik} to a linear predictor \boldsymbol{X\beta}.   This is done in two stages.   First we can write \eta_i = \beta_1 x_{i1} + \ldots + \beta_p x_{ip}.    There’s no intercept; what makes all this tick is a set of K-1 ordered cutpoints \zeta_1 < \zeta_2 < \ldots < \zeta_{K-1}.   For ordinal logistic regression we take \mbox{logit} \big( Prob[Y_i \leq k |\boldsymbol{X}] \big) = \zeta_k -\eta_i for k = 1, \ldots, K-1.   This is then a cumulative link model.
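
Decumulating gives the individual category probabilities, which is what the deterministic nodes below have to compute: \pi_{i1} = \mbox{logit}^{-1}(\zeta_1 - \eta_i), \pi_{ik} = \mbox{logit}^{-1}(\zeta_k - \eta_i) - \mbox{logit}^{-1}(\zeta_{k-1} - \eta_i) for 1 < k < K, and \pi_{iK} = 1 - \mbox{logit}^{-1}(\zeta_{K-1} - \eta_i).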

When fitting the model we have to estimate both the “fixed effects” \boldsymbol{\beta } and the cutpoints \zeta_{k}.   The model below all ran OK in jags; now I just want it to work in pymc.

I’ve taken some data from Kevin Quinn which has a seven point scale, and (too many) predictor variables.   This is processed into a file clinton.csv referred to below.

from numpy import loadtxt

branddata = loadtxt('clinton.csv', delimiter=',')

brand = branddata[:,0]   ## ordinal response (the Categorical below wants it coded 0, ..., K-1)
X = branddata[:,1:7]     ## the six predictor variables

N = brand.size
K = 7                    ## number of response categories

from pymc import Normal, Lambda, deterministic, Categorical, invlogit
import numpy as np

## Here are the priors

#### First, try to control the initial values somewhat

initbeta = np.ones(6)/200
beta = Normal('beta', mu=1.0, tau=0.001, value=initbeta)

#### which gives rise to the linear predictor (does it need to be a lambda function?)
eta = Lambda('eta', lambda beta=beta: np.dot(X, beta))
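
To answer my own parenthetical question: the @deterministic decorator appears to do the same job, so a minimal sketch of an alternative to the Lambda above (use one or the other, not both) would be:

@deterministic
def eta(beta=beta):
    ## the linear predictor X * beta, no intercept
    return np.dot(X, beta)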

## Ditto, set initial values for zeta

initzeta = [-4,-3,-2,-1,0,1]
zeta = Normal("zeta", mu=0, tau=0.001, value = initzeta)

#### Follow the jags trick of simulating K-1 cutpoints and sorting them
#### The $0.02 question is whether I can wrap this with the instantiation of zeta
## (or whether I need it at all)

@deterministic
def zetasort(zeta=zeta):
    zsort = np.array(zeta, copy=True)
    zsort.sort()
    return zsort
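
As for the $0.02 question, one way of wrapping the sort into the node definition (a sketch I haven’t tested, used instead of the decorated function above) is a one-line Lambda:

zetasort = Lambda('zetasort', lambda zeta=zeta: np.sort(zeta))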

## cumulative probabilities: invlogit of the cumulative log odds, zeta_k - eta_i
## some legacy code in here from when I was sorting zeta in this node, but then
## couldn't figure out how to monitor it; commented out in case I can simplify later

@deterministic
def cumlogit(eta=eta, zetasort=zetasort, K=K, N=N):
    cl = np.empty((N, K-1))
    for index in range(N):
        for jndex in range(K-1):
            ## cl[index, jndex] = invlogit(zeta[jndex] - eta[index])
            cl[index, jndex] = invlogit(zetasort[jndex] - eta[index])
    return cl

## next I need to decumulate
## I don't much like the index counters here; I might have them wrong (everything counts from zero!)

@deterministic
def p(cumlogit=cumlogit, K=K, N=N):
    pee = np.empty((N, K))
    pee[:, 0] = cumlogit[:, 0]
    pee[:, 1:K-1] = cumlogit[:, 1:K-1] - cumlogit[:, 0:K-2]  ## there's a nasty slicing rule here
    pee[:, K-1] = 1 - cumlogit[:, K-2]
    return pee
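
On the loop-avoidance worry from the top: broadcasting seems to let the two loops and the decumulation collapse into a single deterministic. A sketch only (the p_vec name is just mine, it assumes the same eta and zetasort nodes, and it writes the inverse logit out by hand rather than calling invlogit on a 2-d array):

@deterministic
def p_vec(eta=eta, zetasort=zetasort, K=K, N=N):
    ## zeta_k - eta_i broadcast into an (N, K-1) array, then mapped to cumulative probabilities
    gamma = 1.0 / (1.0 + np.exp(-(zetasort[np.newaxis, :] - eta[:, np.newaxis])))
    pee = np.empty((N, K))
    pee[:, 0] = gamma[:, 0]
    pee[:, 1:K-1] = gamma[:, 1:K-1] - gamma[:, 0:K-2]
    pee[:, K-1] = 1 - gamma[:, K-2]
    return pee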

##np.sum(p.value, axis=1) ## just there to remind myself how to check

y = Categorical('y', p=p, value=brand,  observed=True)

It’s the next bit where I get rather impressed by pymc. Being able to drive the whole model-building process from the module above is something I’m really learning to appreciate.

import pymc
import clinton                        ## the model definition above, saved as clinton.py

mymodel = pymc.Model(clinton)         ## build the model straight from the module
mygraph = pymc.graph.graph(mymodel)   ## directed acyclic graph of the model

myMAP = pymc.MAP(clinton)             ## maximum a posteriori fit of the same model

M = pymc.MCMC(clinton)                ## and an MCMC sampler
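
To actually pull estimates out, something along these lines seems to work (a sketch; the iteration and burn-in numbers are just placeholders):

myMAP.fit()                        ## maximise the posterior to get sensible starting values
M.sample(iter=20000, burn=5000)    ## run the sampler
pymc.Matplot.plot(M.beta)          ## traces and histograms for the regression coefficients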

I do like being able to get the graphs out (and interrogate them).
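
For the record, the object that comes back is a pydot graph, so (assuming graphviz is on the path) it can be written straight to an image file; the filename here is just an example:

mygraph.write_png('clinton_model.png')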

Now, all that remains is to reconcile the results with those you get in R. There does seem to be some vagueness in the relationship between some of the model terms and the response.
