Course in generalized linear modeling with biological applications -
Spring 2006
This course is given in collaboration with the
DINA research school. The course is accepted as a PhD course (9 ECTS points) at KVL.
This page was updated: May 29, 2006
News
The course starts on Monday, 03. April 2006, at 10:00.
Participants from outside Foulum must obtain a guest card at the reception.
- Place:
- The course will be held in Foulum.
- Schedule:
- The course will consist of 4 blocks: the first block and the last two blocks last 3 days each, and the second block lasts 2 days. The dates are
03. April - 05. April; 19. April - 20. April;
01. - 03. May; 15. - 17. May
Each day will start at around 9 am and end at 4 pm
(the exact details will be announced later).
- Accommodation:
- The course is arranged in blocks of 3 days to make it easier for people
from other DIAS centres to participate: they will not have to spend too much time
on transportation, and the only additional expense is a few nights' stay in the
Foulum area. Accommodation is available at Nørresøkollegiet in
Viborg, see http://www.nkvib.dk/. For participants who come from far away, the
first day of a block can start at 10 am instead of 9 am.
Registration
Please register by 17. March 2006. To sign up, send
an e-mail to Ulrich Halekoh (e-mail:
ulrich.halekoh(a)agrsci.dk).
Course description
The fundamental focus in many experiments and studies is on relating a response
variable to one or several explanatory variables. A traditional way of
accomplishing this is through a multiple linear regression model (technically
speaking, analysis of variance is also a form of multiple linear regression).
Anyone with practical experience of regression and analysis of variance will
have met situations where the model assumptions are questionable. The data
might not be normally distributed, for example because they are counts
(0, 1, 2, 3, 4, 5, ...) or binary (sick/not sick, yes/no). It is also common
for the variance of the response variable to grow with its expected value,
or for the response variable to depend on the explanatory variables in a
nonlinear way. Starting from real data examples, the course shows how generalized
linear models (GLMs) are used to handle such data. The course also describes how
to analyze such data when they are correlated, e.g. because several measurements
are made on the same experimental unit. This is done using generalized estimating
equations (GEE). Finally, the course gives a brief introduction to the analysis of
censored data.
The course is planned so that practice and theory go hand in hand. This
means that every topic starts from practical examples, taken primarily, but
not exclusively, from the biological sciences. The necessary statistical
theory is then introduced as needed to solve the practical problems.
Topics: linear normal models, logistic regression, analysis of count data,
analysis of data with non-constant variance (in
particular data with a constant coefficient of variation), nonlinear relations
between the response and the explanatory variables, growth curve models, analysis of
correlated data (generalized mixed models, generalized estimating equations), the model concept,
statistical inference, model checking.
The computer labs will use the R program. An introduction to R
will be given on the first two days of the course.
Nevertheless,
participants are strongly recommended
to download, install and start experimenting
with R before the course starts.
Prerequisites
Working knowledge of basic mathematical and statistical tools and concepts:
solving simple equations, logarithmic and exponential functions, probability
distributions, random variables, mean, variance, the normal distribution, confidence
intervals, linear regression, analysis of variance, hypothesis testing. If you
are uncertain whether you meet these requirements, please contact the
teachers.
It may be advisable to brush up your statistical skills before the course
starts. We suggest consulting, e.g.,
- Blæsild, P. and Granfeldt, J. (2003). Statistics with Applications in
Biology and Geology. Chapman and Hall/CRC: London.
- Zar, J. H. (1999). Biostatistical Analysis. Prentice Hall.
Additional information
- Language:
- The course language will be English.
- On the web:
- The course homepage is
http://genetics.agrsci.dk/biometry/courses/phd06
Homepage of the previous
course in 2005
- Form:
- The course will consist of a mixture of lectures, exercises, and computer
practicals.
- Credit:
- The course is approved as a PhD course at RVAU (KVL) with 9 ECTS points.
- Workload:
- To complete this course you should expect to spend about 7 weeks
of full-time work on it.
- Compulsory homework:
- The take-home assignments are a very important part of the course. These are
larger assignments which must be handed in and approved. Participants can only attend the
exam if their take-home assignments have been approved.
- Exam:
- Each participant completes a project at the end of the course. The final (oral)
exam is based on that project, but a participant can only attend the exam if his or her
take-home assignments have been approved.
- Price:
- The course is free for PhD students, for other students who are
affiliated with DIAS and for DIAS employees. Participants from outside DIAS
have to pay a participation fee.
- Lecturers:
-
Course program and course material
The data sets used in the course can be installed by executing the following
command in R:
install.packages("dataRep", repos = "http://gbi.agrsci.dk/biometry/software/r/packages")
In the software folder you can find some additional software used in the course.
- DAY Click here to find material for this day
- Introduction to R: introduction to the use of the
statistical programming environment R.
We download and install R and perform basic data-analytic and graphical tasks.
- DAY Click here to find material for this day
- Linear normal models (LNM):
regression modeling based on the normal distribution. We recap
material that participants are assumed to know, but present it in a different form.
- Practical exercises on LNM in R
- DAY Click here to find material for this day
- DAY Click here to find material for this day
- Introduction to binomial data
- Principles of inference
- DAY Click here to find material for this day
- DAY Click here to find material for this day
- Poisson regression
- Gamma-distributed data
- DAY Click here to find material for this day
- Generalized linear models
- Residual analysis
- DAY Click here to find material for this day
- DAY Click here to find material for this day
- Overdispersion
- Quasi-likelihood
- DAY Click here to find material for this day
- Generalized Estimating Equations
- Final-Exam Click here to find material for this day
Homework:
- After day 3: Homework on linear normal models
- After day 5: Homework on logistic regression
- After day 8: Homework on Poisson regression and quasi-likelihood
Literature
- Notes and slides prepared by the teachers.
- Dalgaard, P. (2002). Introductory Statistics with R. Springer
Verlag. (Participants are expected to acquire this book before the course starts.)
In addition we suggest consulting:
- Blæsild, P. and Granfeldt, J. (2003). Statistics with Applications in
Biology and Geology. Chapman and Hall/CRC: London. (Chapters 9, 8 and 4 are especially
relevant for this course, and it is a very good book in general.)
- Aitkin, M., Francis, B. and Hinde, J. (2004).
Statistical Modelling in GLIM4. 2nd edition, Oxford University Press: Oxford.
- Dobson, A.J. (2002). An Introduction to
Generalized Linear Models. 2nd edition, Chapman and Hall.
- Lindsey, J. K. (1997). Applying Generalized Linear Models.
Springer Verlag: Heidelberg.
- McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models.
2nd edition, Chapman and Hall: London.
- Myers, R. H., Montgomery, D. C. and Vining, G. G. (2004).
Generalized Linear Models: with Application in Engineering and Science.
John Wiley & Sons: New York.
Useful Links