Logistic Regressions
The goal of logistic regression is to estimate the probability p_i of a binary event i given predictor variables. For example, is success (1) or failure (0) of an animal to reproduce a function of its age? Or of other factors too? Many outcomes can be described in these terms. Logistic regression is thus flexible, widely used, and fairly simple to run, though some thought is required to express outcomes as probabilities.

A logistic relationship between p_i and the predictor variables is S-shaped, like the population growth model, where a switch from p_i = 0 to p_i = 1 takes place somewhere in the middle.

Logistic regression is based on the logit function, which is a log transformation of p_i:

logit(p_i) = ln(p_i / (1 - p_i)) = b0 + b1*x_i

where x_i is the predictor variable and b0 and b1 are regression coefficients on the log-odds scale.
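As a quick check of this transformation, here is a minimal base R sketch (the probability values are made-up examples) that computes the logit by hand and with the built-in qlogis() and plogis() functions:

p <- c(0.1, 0.5, 0.9)        # example probabilities
log(p / (1 - p))             # logit by hand: the log odds
qlogis(p)                    # the same transformation, built into R
plogis(qlogis(p))            # inverse logit recovers the original probabilities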
To compute a logistic regression, we again use a generalized linear model (glm). A glm can deal with a big problem for lm: error variance that is not evenly distributed across the range of the fitted values. Here is an example of such a variance problem for lm:
1. Import and attach the islandbird.txt data set. This includes incidence (presence = 1, absence = 0) for a bird species on islands in an archipelago, with given area (km²) and isolation (km from the nearest island) as predictors.
2. Make simple plots of incidence as a function of isolation and of area to see the data (the import and plotting commands are sketched after these steps). Do you think both predictors affect incidence?
3. Let's first try a linear model that assumes a Gaussian (i.e., normal) error variance. Enter:
liniso1 <- lm(incidence ~ isolation)                     # ordinary linear model
liniso2 <- glm(incidence ~ isolation, family=gaussian)   # the same fit via glm with Gaussian errors
summary(liniso1)
summary(liniso2)
par(mfrow=c(2,2))
plot(liniso1)                                            # diagnostic plots of the residuals
See any problems? You should! This illustrates the problem with analyzing binary data using the tools we have used so far this semester.
4. Now we try a logistic glm, where we can specify that binomial errors are to be expected with the binary data. Logistic regression simply assumes that the response variable observations are independent. That's it: no need to sweat residual distributions. Now make a new glm model with everything as in liniso2 but with family=binomial instead, and get a summary (see the sketch below). I assume you call that model logiiso.
Notice how much the coefficients changed simply by assuming a binomial distribution for the binary data. Coefficients in logistic regression indicate the effect of a one-unit change in the predictor variable on the log odds of 'success'.
5. How much better is your logistic model than the linear glm (liniso2)? Load the bbmle package and compute an AICctab with weights=TRUE to find out (also sketched below).
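For steps 1 and 2, a minimal sketch of the import and plotting commands. The column names incidence and isolation appear in the code above; the column name area and the read.table() settings are assumptions that may need adjusting for the actual file:

island <- read.table("islandbird.txt", header=TRUE)   # read the data set
attach(island)                                        # so columns can be used by name
par(mfrow=c(1,2))
plot(isolation, incidence)                            # incidence as a function of isolation
plot(area, incidence)                                 # incidence as a function of area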
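For step 4, a sketch of the logistic fit: it differs from liniso2 only in the family argument, and uses the model name logiiso suggested in the step:

logiiso <- glm(incidence ~ isolation, family=binomial)   # binomial errors, logit link by default
summary(logiiso)                                         # coefficients are on the log-odds scale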
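For step 5, one way to run the comparison, assuming the bbmle package is installed (AICctab() and its weights argument come from that package):

library(bbmle)
AICctab(liniso2, logiiso, weights=TRUE)   # AICc table with Akaike weights for the two models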