Carl Morris presented three hypothetical scenarios with different sample sizes in an election race between two candidates, Mr. Allen and Mr. Baker. A sample of $n$ voters is taken; let $Y$ be the number of voters favoring Allen and $\theta$ the proportion of the electorate that favors him. He would like to test $H_0: \theta \le 0.5$ against $H_1: \theta > 0.5$. The three scenarios are:

- $Y = 15$ and $n = 20$,
- $Y = 115$ and $n = 200$,
- $Y = 1046$ and $n = 2000$.

The p-values are about 0.021 in all three scenarios, and the 95% confidence intervals (CIs) are:

- [0.560,0.940],
- [0.506,0.640],
- [0.501,0.545].
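The figures above can be checked with a short script. This is only a sketch under stated assumptions: the p-value is taken to be the exact one-sided binomial tail probability at $\theta = 0.5$, and the intervals are taken to be 95% Wald intervals. Neither choice is stated in the text, and the helper names `exact_pvalue_half` and `wald_interval` are hypothetical.

```python
import math
from statistics import NormalDist

def exact_pvalue_half(y, n):
    """One-sided exact p-value P(Y >= y) when Y ~ Binomial(n, 0.5).

    Integer arithmetic keeps the tail sum exact even for n = 2000.
    """
    tail_count = sum(math.comb(n, k) for k in range(y, n + 1))
    return tail_count / 2 ** n

def wald_interval(y, n, level=0.95):
    """Wald interval: theta_hat +/- z * sqrt(theta_hat * (1 - theta_hat) / n).

    Assumption: the intervals in the text are of this type.
    """
    theta_hat = y / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat - half_width, theta_hat + half_width

scenarios = [(15, 20), (115, 200), (1046, 2000)]
for y, n in scenarios:
    p = exact_pvalue_half(y, n)
    lo, hi = wald_interval(y, n)
    print(f"Y={y:5d} n={n:5d}  p-value={p:.4f}  CI=[{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the script gives p-values close to 0.021 and reproduces the first and third intervals. For the second scenario the Wald upper limit comes out near 0.644 rather than the 0.640 listed, which may reflect rounding or a different interval construction in the source.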

He asked which of the three scenarios is most encouraging to candidate Allen (see the article). Andrew Gelman discussed this question on his blog.

I argue here that comparing observed CIs and observed p-values is not appropriate, since CIs are random intervals and, as such, are subject to random variability. Their observed values alone do not mean much without measures of their dispersion. It is like comparing the observed values of two estimators without regard to their standard errors or other measures of precision. P-values can also be regarded as random variables, so identical p-values should be compared together with a measure of their variability.

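For instance, let $H_0: \theta \in \Theta_0$ be the null hypothesis. A p-value is defined by

$$p(x) = \sup_{\theta \in \Theta_0} P_\theta\big(T(X) \ge t\big),$$

where $t = T(x)$ is the observed value of the test statistic $T(X)$, $X$ is the random sample and $P_\theta$ is the joint probability measure of the statistical model.

Define $p(X)$ as the function $p(\cdot)$ evaluated at the random sample $X$ instead of the observed sample $x$; then $p(X)$ is a random variable whose distribution depends on $\theta$, $n$ and, of course, the adopted statistical model.

It is possible to compute, e.g., $\mathrm{Var}_\theta(p(X))$. Then, by plugging in an estimate of $\theta$, we get one possible measure of variability.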

Other measures can be implemented by using this method.
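As an illustration for the election example, the sketch below computes a plug-in standard deviation of the p-value. It rests on several assumptions that go beyond the text: the model is $Y \sim \text{Binomial}(n, \theta)$, the p-value is the exact tail probability $P_{0.5}(Y \ge y)$, the variability measure is the standard deviation of $p(Y)$, and $\theta$ is replaced by the estimate $Y/n$. The function names are hypothetical.

```python
import math

def binom_logpmf(k, n, theta):
    """Log of the Binomial(n, theta) pmf at k; requires 0 < theta < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(theta) + (n - k) * math.log(1 - theta))

def pvalue_table(n, theta0=0.5):
    """p(y) = P_theta0(Y >= y) for y = 0..n, built as a reversed cumulative sum."""
    pmf0 = [math.exp(binom_logpmf(y, n, theta0)) for y in range(n + 1)]
    tail = [0.0] * (n + 1)
    acc = 0.0
    for y in range(n, -1, -1):
        acc += pmf0[y]
        tail[y] = acc
    return tail

def plugin_pvalue_sd(y_obs, n, theta0=0.5):
    """Observed p-value and the plug-in standard deviation of p(Y).

    The distribution of p(Y) is evaluated under theta_hat = y_obs / n,
    which must lie strictly between 0 and 1.
    """
    theta_hat = y_obs / n
    p = pvalue_table(n, theta0)
    weights = [math.exp(binom_logpmf(y, n, theta_hat)) for y in range(n + 1)]
    mean = sum(w * pv for w, pv in zip(weights, p))
    second_moment = sum(w * pv * pv for w, pv in zip(weights, p))
    variance = max(second_moment - mean ** 2, 0.0)  # guard against round-off
    return p[y_obs], math.sqrt(variance)

for y, n in [(15, 20), (115, 200), (1046, 2000)]:
    p_obs, sd = plugin_pvalue_sd(y, n)
    print(f"Y={y:5d} n={n:5d}  p-value={p_obs:.4f}  plug-in sd={sd:.4f}")
```

Because the sample space is finite, the moments are computed by exact summation rather than simulation; log-space pmf values avoid overflow for $n = 2000$.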

Notice that if a problem occurs at the first theory level, you go to a meta-theory level to solve it; if a problem occurs at the meta-theory level, you go to a meta-meta-theory level, and so on.

It is all too easy to find "apparent holes" in classical statistical theory, since it is a language with a huge number of concepts that go far beyond probabilistic knowledge. Unfortunately, the general recipe is: "if it appears to be probabilistically incoherent, it must be incoherent in a broad sense and should be avoided". This recipe is intellectually too weak. If you do not use an appropriate language to treat these concepts, which require other, non-probabilistic tools, you are doomed to interpret the classical concepts in a very narrow way, as seems to be the rule nowadays.