Here are a couple of Simile models which try to show that statistical inference is largely about comparing data against the output of a null model. In statistics textbooks the null model is not usually mentioned explicitly. Instead the null hypothesis and the assumptions of a test are presented, but together the null hypothesis and the assumptions really specify an underlying null model.
These two Simile models look at a t-test (actually a paired t-test) and a one-way ANOVA. They use the same data set, which comprises two samples, each with 20 observations.
Both tests ask the question: are the means of the two samples the same? But they have slightly different null models. For these simple statistical tests the theory can be derived analytically, so the null model is often forgotten about, because the results of the analytical calculations (i.e. the statistical tables: Student's t-distribution for the t-test and the F-distribution for the ANOVA) already contain the null model output. But I think it is useful, for teaching statistics (or modelling), to explicitly show the null model, its role in a statistical test, and how it is used to calculate the probability that the data are consistent with the null model (the p-value).
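The same idea can be sketched outside Simile. The following Python snippet is a minimal illustration (not the Simile models themselves, and the paired differences below are made up, not the data in sample_data.txt): run the null model, i.e. "true mean difference is zero, with the same spread as the data", many times, and the p-value is simply the fraction of null-model runs whose t statistic is at least as extreme as the observed one.

```python
import math
import random
import statistics

def paired_t_stat(diffs):
    """t statistic for paired data: mean difference / its standard error."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def null_model_p_value(diffs, n_sims=10_000, seed=1):
    """Simulate the null model (true mean difference = 0, spread taken
    from the data) and count how often it produces a t statistic at least
    as extreme as the observed one: the two-sided p-value."""
    rng = random.Random(seed)
    n = len(diffs)
    sd = statistics.stdev(diffs)
    t_obs = abs(paired_t_stat(diffs))
    hits = 0
    for _ in range(n_sims):
        sim = [rng.gauss(0.0, sd) for _ in range(n)]  # one null-model run
        if abs(paired_t_stat(sim)) >= t_obs:
            hits += 1
    return hits / n_sims

# Hypothetical paired differences, for illustration only
diffs = [0.8, -0.2, 1.1, 0.5, 0.9, -0.4, 1.3, 0.7, 0.2, 0.6]
print(null_model_p_value(diffs))
```

With enough simulations this converges on the value the t tables give analytically; the ANOVA case is analogous, with the F statistic in place of t.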
I'd be interested to hear if anyone finds this approach confusing, interesting, misleading, etc.
There are 5 files.
The example data is contained in sample_data.txt
The model for the t-test is in paired_ttest.sml and the output interface is in paired_ttest.shf
The model for the ANOVA is in anova.sml and the output interface is in anova.shf
When statistical tests become more complicated, the null model often becomes more prominent. Perhaps the best example of this is Approximate Bayesian Computation (ABC), where a simulation model is explicitly part of the statistical methodology.
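To make the ABC point concrete, here is a minimal rejection-sampling sketch (my own toy example, not from the models posted here): the simulation model sits right inside the inference loop, and we keep only the parameter draws whose simulated output resembles the observed data.

```python
import random
import statistics

def abc_rejection(observed, n_draws=20_000, tol=0.1, seed=2):
    """ABC rejection sampling for the mean of a normal model:
    draw a candidate parameter from the prior, simulate data from the
    model, and accept the candidate if a summary statistic of the
    simulated data is within `tol` of the same summary of the observed
    data. The accepted draws approximate the posterior."""
    rng = random.Random(seed)
    obs_mean = statistics.mean(observed)
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5, 5)                        # flat prior on the mean
        sim = [rng.gauss(mu, 1.0) for _ in range(n)]   # the simulation model
        if abs(statistics.mean(sim) - obs_mean) < tol:
            accepted.append(mu)
    return accepted

# Made-up observations with mean about 2.0, for illustration only
observed = [1.9, 2.2, 2.0, 1.8, 2.1, 2.3, 1.7, 2.0]
posterior = abc_rejection(observed)
print(len(posterior), statistics.mean(posterior))
```

The accepted draws cluster around the observed mean, which is exactly the sense in which the simulation model, rather than an analytical distribution, carries the inference.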
I look forward to any comments and suggestions,