## Multilevel Ordinal Logistic Regression

Several projects have been making me think about ordinal logistic regression models. Rather ironically, I’ve just stopped teaching about them as I needed to weed a curriculum down to something a little more manageable for everyone.

Anyway, the models are quite easy to code in jags, but I’ve yet to figure the pymc coding. Hence this blog entry documents ongoing attempts with these models.

The model can be specified on these lines, we have an ordinal response $y_{ik} \sim (\pi_{ik})$ for individuals $i=1, \ldots, n$  for categories $k = 1, \ldots, K$.   There are constraints such that $\displaystyle{\sum_{k=1}^K \pi_{k}=1}$ for all $i$.

The key idea is that we can think of $y_{ik}$ as a coarsened realisation of some continuous latent variable with a logistic distribution (other distributions are possible such as the Normal, Cauchy, and I think the extreme value).

For ordinal logistic regression we relate $\pi_{ik}$ to a linear predictor $\boldsymbol{\beta X}$.   This is done in two stages.   First we can write $\eta_i = \beta_1 x_{i1} + \ldots + \beta_p x_{ip}$.    There’s no intercept; what makes all this tick is that we have a set of cutpoints $\zeta_k$ where $-\infty < \zeta_{k} < \infty$ for cutpoints $1\mbox{ to }K$ ( need to tidy the notation here as we only need $K-1$ cutpoints).   For ordinal logistic regression we take $\mbox{logit} \big( Prob[Y \leq k |\boldsymbol{X}] \big) = \zeta_k -\eta_i$.   This is then a cumulative link model.

When fitting the model we have to estimate both the values of the “fixed effects” $\boldsymbol{\beta }$ and these cutpoints $\zeta_{k}$.

These are readily extended into multilevel models by adding a random effect for each area/subject grouping. We can structure these random effects in a number of ways. Perhaps we have area level covariates and wish to add $\eta_i = \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \psi[l]$ where $\psi[l] = \alpha_0 + \alpha_1 z_l$.

This is easily fitted using McMC.

model{ ## Multilevel Multinomial
model{
for (i in 1:N){
for (k in 1:K-1){
logit(gamma[i,k]) <- theta[k] - mu[i]}
pi[i,1] <- gamma[i,1]
for (k in 2:K-1){pi[i,k] <- gamma[i,k] - gamma[i, k-1]}
pi[i,K] <- 1 - gamma[i, K-1]
##mu[i] <- beta1 + x[i] * beta2
mu[i] <- inprod(X[i,], beta) + wheta[Z[i]]
y[i] ~ dcat(pi[i, 1:K])
}

## priors
#beta1 ~ dnorm(0, 0.001)
#beta2 ~ dnorm(0, 0.001)

##for (reff in 1:1440){wheta[reff] ~ dnorm(0, 0.001)}
for (reff in 1:1440){wheta[reff] ~ dnorm(0, tau)}
tau ~ dgamma(0.1, 0.01)

for (j in 1:J){beta[j] ~ dnorm(0, 0.001)}
for (k in 1:4){theta0[k] ~ dnorm(0,1)}
theta <- sort(theta0)
}

This all works fine and well, except that I can’t handle the number of correlation matrices I would like (I wish to have a separate correlation matrix for each of a number of subject groups). I am also trying to code this in pymc but haven’t quite figured the equivalent of that sort trick you can pull in jags. Actually it works very well, I’ve used this with a few datasets and perhaps 40 response variables and the speed of fitting is good and the convergence is good.

I suppose the only problem that deserves closer attention is the proportional odds assumption. Essentially, the cut points are fixed for all individuals, the linear predictor shifts the latent variable up or down as appropriate. However, we might like a partial proportional odds assumption which allows some covariates to act differently on different categories.