What services does ISCON Statistical Consulting Services offer?

ISCON provides a wide range of statistical consulting and analysis services, including but not limited to selecting a research design, analysing data and interpreting results. We can also help you design, conduct and analyse survey research. Our expert assistance covers an extensive range of areas, including grant proposal development, clinical trial design, scientific research design, data collection strategies, sample size and power calculations, statistical modelling, results interpretation, manuscript preparation and, to some extent, data entry and management. We specialise in epidemiological studies, clinical trials, statistical genetics, non-parametric methods, genetic epidemiology, regression and time series analysis, Bayesian methods and graphical methods. We also support non-profit organisations applying for NSF, NIH and private foundation grants.

How can I book an appointment with a statistical consultant?

You can contact us by email or phone to book a consultation.

What information should I bring to an initial consultation with a statistician?

For the initial meeting, please bring all the information related to your research that can help us understand the goals and purpose of your study. This can include research hypotheses, relevant literature review papers, proposal drafts or manuscripts, and anything else you think we should know. If you have already collected data, please give us an electronic copy of the dataset along with all relevant study information.

How much statistics do I need to know?

We expect you to have at least a basic knowledge of statistical concepts and methods. Beyond that, our expert consultants will help you explore the available options.

How much does it cost to hire a statistician?

The cost of our service depends on a variety of factors, including your data format, complexity and cleanliness, the project deliverables and the analysis required. We work with you to understand your requirements and break down the cost to meet your individual needs.

What happens at the initial consultation session with a statistician?

In the initial consultation session, the consultant introduces themselves and asks you a few general questions about your research, data, objectives and requirements. This helps us understand your needs and work with you to address your concerns.

What is the cost of the initial consultation?

At ISCON Statistics, we provide a free initial consultation at our office or by video call. However, if you would like us to come to you, you will need to cover our travel expenses.

At what stage of my research should I contact a statistician?

The best time to contact a statistician is before you design your research study. We can help you design your study and choose the most effective and powerful statistical methods. However, you can contact us at any stage of your research.

Which statistical software will the statistician use for data analysis?

Our experts have hands-on experience with a variety of current statistical software and tools, including R, WinBUGS, SPSS, SAS and Stata.

What is your approach towards data privacy, security and confidentiality?

At ISCON Statistics, we follow strict data security measures and respect your privacy and confidentiality. We never share your information and data with anyone without your permission.

Are your services available for students?

Yes, we are happy to provide statistical support to postgraduate students who do not have statistical expertise. We either give them statistical advice and guidance so they can perform their own analysis, or carry out the statistical analysis of their research for them.

What format should I use to provide you data?

You can send us data in any format; however, we recommend CSV files, Excel files or SQL-based data files.

Poisson regression: Statistical models for counts and rates

Chetan Prajapati

Founder & Statistician at ISCON

Table of Contents

What is a count?
Probability distribution(s) associated with the counts
Poisson distribution
Negative binomial distribution
Poisson Inverse Gaussian (PIG)
Generalized Negative Binomial Model (GNB)
Generalised Poisson Models
Heterogeneous Negative Binomial model (NB-H)
Exact Poisson regression

What is a count?

A count is a number that is discrete and non-negative. Discrete means the values are countable and distinct. For example, the number of road accidents can be 2 or 852 but cannot be 2.1 or 85.5, so it is discrete, i.e. an integer. This differs from a continuous numeric variable (such as blood pressure: 120.2, 135.9), which is modelled differently.

A count may also carry contextual information such as time, area or length: for example, the number of road accidents in a given year, or the number of raindrops per square metre. When a count is associated with such a denominator, a rate can be derived. For example, if we observe 25 maternity claims from 850 women in a year, the claim rate is about 29 per 1,000 women per year (25 / 850 × 1,000 ≈ 29.4).
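The rate calculation can be reproduced in R. The sketch below uses only the numbers from the example (25 claims, 850 women, one year). As an illustration, it also shows that an intercept-only Poisson regression with `log(women)` as an offset estimates the same rate; an offset is the usual way a denominator enters a Poisson model.

```r
# Numbers from the example in the text: 25 claims among 850 women in one year
claims <- 25
women  <- 850

# Crude rate per 1,000 women per year
rate_per_1000 <- claims / women * 1000
round(rate_per_1000, 1)   # 29.4

# Intercept-only Poisson regression with log(exposure) as an offset:
# log(E[claims]) = log(women) + b0, so exp(b0) is the rate per woman
fit_rate <- glm(claims ~ 1 + offset(log(women)), family = poisson())
exp(coef(fit_rate)) * 1000   # same rate per 1,000 women
```

With many groups (for example, claims and number of women per region), the same offset term lets covariates act on the rate rather than on the raw count.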

Probability distribution(s) associated with the counts

First, we need to clarify what a distribution is. A probability distribution specifies the probabilities of the values that a random variable can take in a random experiment. For example, consider the random experiment of counting the number of raindrops falling on one square metre. The random variable X is the number of raindrops. Once the experiment has been performed, we count the drops; if there are 10, the random variable X has taken the value 10. If we repeat the experiment and count 35 drops, X has taken the value 35.

Now suppose we know the average rate $\lambda$ at which some event occurs, and that events happen independently in time. Then the number of events per period follows a Poisson distribution with mean $\lambda$.
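Written out, the Poisson probability of observing exactly $k$ events is $P(X = k) = \lambda^k e^{-\lambda} / k!$ for $k = 0, 1, 2, \dots$. The short R sketch below evaluates this formula by hand and checks it against R's built-in `dpois()`, using $\lambda = 2$ to match the car example that follows.

```r
lambda <- 2        # average of 2 events per period, as in the car example
k <- 0:10          # possible counts

# Poisson probability mass function written out by hand
pmf_manual <- lambda^k * exp(-lambda) / factorial(k)

# Same probabilities from R's built-in dpois()
pmf_builtin <- dpois(k, lambda)

all.equal(pmf_manual, pmf_builtin)   # TRUE
round(pmf_builtin[1:4], 3)           # P(X = 0), P(X = 1), P(X = 2), P(X = 3)
```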

For example, suppose that on average 2 cars pass along a neighborhood road every hour. If we record the number of cars passing each hour for 24 hours, we might get numbers like the following (simulated here in R):

set.seed(125)
rpois(24,2)

 [1] 3 0 1 1 5 5 2 1 2 2 0 3 1 0 2 2 2 4 2 3 1 3 3 1

We can compute the Poisson probability of each count and plot it, with the number of cars on the X axis and its probability on the Y axis:

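set.seed(125)
X <- rpois(24, 2)                    # simulated hourly car counts, as above
counts <- 0:max(X)                   # each possible count up to the largest observed
probs <- dpois(counts, lambda = 2)   # Poisson probability of each count
library(ggplot2)

# One bar per count; plotting dpois(X, 2) directly would stack bars for repeated X values
ggplot(data.frame(counts, probs), aes(counts, probs)) +
  geom_col() +
  scale_x_continuous(breaks = counts) +
  labs(x = "Number of cars per hour", y = "Probability") +
  theme_classic()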

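# Illustration: recover known coefficients (0.2 for group A, +0.4 for group B)
x <- sample(x = c("A", "B"), size = 100, replace = TRUE, prob = c(0.5, 0.5))
linpred <- 0.2 + 0.4 * (x == "B")   # linear predictor on the log scale
y <- exp(linpred)                   # exact expected values, not simulated counts
df <- data.frame(x, y)
# y is not integer-valued, so R warns and reports AIC: Inf; the coefficients are recovered exactly
fit <- glm(y ~ x, family = poisson(), data = df)
summary(fit)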


Call:
glm(formula = y ~ x, family = poisson(), data = df)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-8.635e-09  -8.635e-09  -8.635e-09   0.000e+00   0.000e+00  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)   0.2000     0.1231   1.624   0.1043  
xB            0.4000     0.1646   2.430   0.0151 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 5.9609e+00  on 99  degrees of freedom
Residual deviance: 4.0263e-15  on 98  degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 4
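Because `y` in the example above holds exact expected values rather than counts, the fit reproduces the coefficients perfectly. A more realistic check, sketched below with the same assumed coefficients (0.2 and 0.4), draws genuine Poisson counts with `rpois()` and refits the model. With a large sample (an arbitrary choice of 5,000 observations here) the estimates should land close to, but not exactly on, the true values.

```r
set.seed(2024)                                   # arbitrary seed for reproducibility
n <- 5000
x <- sample(c("A", "B"), size = n, replace = TRUE)

# True model: log(mu) = 0.2 + 0.4 * [x == "B"]
mu <- exp(0.2 + 0.4 * (x == "B"))

# Draw genuine (integer) Poisson counts around those means
y <- rpois(n, lambda = mu)

fit_sim <- glm(y ~ x, family = poisson(), data = data.frame(x, y))
coef(fit_sim)                  # close to 0.2 and 0.4
confint.default(fit_sim)       # Wald 95% confidence intervals
```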

The Poisson is the primary distribution for counts. Other models are available to overcome two major issues with counts: over-dispersion (or, less often, under-dispersion) and excess zeros. Sometimes zero is also an impossible value for the outcome (for example, length of stay, which can never be 0); in that case a zero-truncated model is useful.

To handle over- or under-dispersion:

Negative Binomial distribution (NB)
Poisson Inverse Gaussian (PIG)
Generalized Negative Binomial Model
Generalised Poisson
Heterogeneous Negative Binomial model (NB-H)

To handle excess zeros:

zero-inflated Poisson (ZIP)
zero-inflated negative binomial (ZINB)
Hurdle model

To model outcomes for which 0 is an impossible value:

Zero-truncated models


Figure 1: Models for count data: mean-variance relationships and their parameterisation

Poisson distribution

Has a single parameter, the mean $\mu$
The mean $\mu$ is equal to the variance $\mu$, a property called equidispersion
In real-life datasets there is more often greater variability or correlation than the model allows (over-dispersion), which leads to biased (typically underestimated) standard errors and hence misleading significance of covariates
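A common informal check for over-dispersion is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. Values near 1 are consistent with the Poisson assumption; values well above 1 suggest over-dispersion. The sketch below uses simulated data (hypothetical, generated with base R's `rpois()` and `rnbinom()`) to compare a genuinely Poisson outcome with an over-dispersed one; `pearson_dispersion()` is a helper written for this illustration.

```r
set.seed(1)
n  <- 2000
x  <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)

y_pois <- rpois(n, lambda = mu)            # variance = mean
y_nb   <- rnbinom(n, size = 1, mu = mu)    # variance = mu + mu^2 (over-dispersed)

# Pearson dispersion statistic for a fitted GLM
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

fit_pois <- glm(y_pois ~ x, family = poisson())
fit_over <- glm(y_nb ~ x, family = poisson())

pearson_dispersion(fit_pois)   # close to 1
pearson_dispersion(fit_over)   # clearly above 1
```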

Negative binomial distribution

Has an additional dispersion parameter $\alpha$ to accommodate excess variability; when $\alpha = 0$, the model reduces to the Poisson
Can be parameterised as the linear negative binomial (NB1, variance $\mu + \alpha\mu$) or the quadratic negative binomial (NB2, variance $\mu + \alpha\mu^2$); NB2 is the traditional form
Corrects only over-dispersion, not Poisson under-dispersion
A Poisson-gamma mixture: the unobserved heterogeneity in the Poisson mean follows a gamma distribution whose variance is the dispersion parameter $\alpha$. Because the gamma distribution is very flexible in shape, most (though not all) over-dispersed count data are modelled well by the NB
The NB may still fail to correct the over-dispersion found in a Poisson model
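In R, the NB2 model can be fitted with `glm.nb()` from the MASS package (a recommended package shipped with R). MASS reports the dispersion as `theta`, with variance $\mu + \mu^2/\theta$, so $\alpha = 1/\theta$ in the notation used here. The sketch below is illustrative only: the data are simulated and the coefficient values (0.5, 0.3) and $\alpha = 0.5$ are arbitrary assumptions.

```r
library(MASS)   # provides glm.nb()

set.seed(42)
n  <- 5000
x  <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)

# NB2 data: variance = mu + alpha * mu^2 with alpha = 1 / size = 0.5
y <- rnbinom(n, size = 2, mu = mu)

fit_nb <- glm.nb(y ~ x)
coef(fit_nb)          # close to 0.5 and 0.3
1 / fit_nb$theta      # estimated alpha, close to 0.5
```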

Poisson Inverse Gaussian (PIG)

Similar to the negative binomial, except that the heterogeneity follows an inverse Gaussian distribution instead of a gamma distribution
Available in the R gamlss package

Generalized Negative Binomial Model (GNB)

The GNB parametrises the exponent on the second term of the negative binomial variance.
It gives an initial idea of whether the data are better suited to NB1, NB2 or NB-P.
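For reference, the NB-P variance function in which this exponent appears can be written as follows, with $\mu$ the mean and $\alpha$ the dispersion parameter. Estimating $P$ indicates whether the data look more like NB1 ($P = 1$) or NB2 ($P = 2$).

```latex
\operatorname{Var}(Y) = \mu + \alpha\,\mu^{P},
\qquad
P = 1:\ \mu + \alpha\mu \ (\text{NB1}),
\qquad
P = 2:\ \mu + \alpha\mu^{2} \ (\text{NB2})
```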

Generalised Poisson Models

Similar to the models above, but the dispersion parameter can take negative values, so the model can also correct under-dispersion if present.

Heterogeneous Negative Binomial model (NB-H)

The dispersion parameter can be modelled as a function of particular covariates, which shows which covariates bring significant extra dispersion into the model.

Exact Poisson regression

Suitable for unbalanced and sparse count data

In a count model, the log of the expected count is modelled as a linear predictor, $\log(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$. This ensures that predicted counts are always positive.
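One way to see this, sketched below with simulated data (the coefficients 0.1 and -0.8 are arbitrary), is to compare R's predictions on the link (log) scale with those on the response scale. The response-scale predictions are the exponentiated linear predictor, so they stay positive even far outside the range of the data, where a straight-line model could predict negative counts.

```r
set.seed(7)
n <- 200
x <- runif(n, min = -3, max = 3)
y <- rpois(n, lambda = exp(0.1 - 0.8 * x))

fit_log <- glm(y ~ x, family = poisson(link = "log"))

# Predictions well outside the observed range of x
new_x <- data.frame(x = c(-10, 0, 10))
eta <- predict(fit_log, newdata = new_x, type = "link")      # linear predictor (log scale)
mu  <- predict(fit_log, newdata = new_x, type = "response")  # expected counts

all.equal(unname(mu), unname(exp(eta)))   # TRUE: response = exp(link)
all(mu > 0)                               # TRUE
```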

Under a Poisson model with a mean of 5, the expected percentage of zero counts is under 1% ($e^{-5} \approx 0.7\%$). So if the mean of the count response is 5 or below and some 30% of the count observations are zeros, far more than the Poisson model expects for that mean, ZIP or ZINB will be a good choice.
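The comparison of observed and expected zeros can be sketched in a few lines of R. `zero_check()` is a hypothetical helper written for this illustration, and the data are simulated (30% extra zeros mixed with Poisson counts of mean 5), not real observations.

```r
# Expected share of zeros under a Poisson model with mean 5
dpois(0, lambda = 5)    # exp(-5), about 0.0067, i.e. under 1%

# Hypothetical helper: compare observed zeros with the Poisson expectation
zero_check <- function(y) {
  c(observed_zeros         = mean(y == 0),
    poisson_expected_zeros = dpois(0, lambda = mean(y)))
}

# Illustrative data: 30% extra zeros mixed with Poisson(5) counts
set.seed(99)
n <- 1000
y <- ifelse(runif(n) < 0.3, 0, rpois(n, lambda = 5))
zero_check(y)   # observed share of zeros far exceeds the Poisson expectation
```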

Article is being updated …

Ref:

Hilbe, J. (2014). Modeling Count Data. Cambridge: Cambridge University Press.
