A multivariate ultrastructural errors-in-variables model with equation error

Abstract:  This paper deals with asymptotic results on a multivariate ultrastructural errors-in-variables regression model with equation errors. Sufficient conditions for attaining consistent estimators for model parameters are presented. Asymptotic distributions for the line regression estimators are derived. Applications are presented to the elliptical class of distributions with two error assumptions. Model generalizes previous results aimed at univariate scenarios.

Link to the article

Modified likelihood ratio tests in heteroskedastic multivariate regression models with measurement error


In this paper, we develop a modified version of the likelihood ratio test for multivariate heteroskedastic errors-in-variables regression models. The error terms are allowed to follow a multivariate distribution in the elliptical class of distributions, which has the normal distribution as a special case. We derive the Skovgaard adjusted likelihood ratio statistic, which follows a chi-squared distribution with a high degree of accuracy. We conduct a simulation study and show that the proposed test displays superior finite sample behavior as compared to the standard likelihood ratio test. We illustrate the usefulness of our results in applied settings using a data set from the WHO MONICA Project on cardiovascular disease.

link to the article

A heteroscedastic structural errors-in-variables model with equation error.


It is not uncommon with astrophysical and epidemiological data sets that the variances of the observations are accessible from an analytical treatment of the data collection process. Moreover, in a regression model, heteroscedastic measurement errors and equation errors are common situations when modelling such data. This article deals with the limiting distribution of the maximum-likelihood and method-of-moments estimators for the line parameters of the regression model. We use the delta method to achieve it, making it possible to build joint confidence regions and hypothesis testing. This technique produces closed expressions for the asymptotic covariance matrix of those estimators. In the moment approach we do not assign any distribution for the unobservable covariate while with the maximum-likelihood approach, we assume a normal distribution. We also conduct simulation studies of rejection rates for Wald-type statistics in order to verify the test size and power. Practical applications are reported for a data set produced by the Chandra observatory and also from the WHO MONICA Project on cardiovascular disease

Link to the article

Why does a 95% CI not imply a 95% chance of containing the mean?

On the Stack Exchange website there is a question about confidence intervals; see here: Why does a 95% CI not imply a 95% chance of containing the mean?

My answer can be accessed here.

There are many issues to be clarified in this question and in the majority of the given responses. I shall confine myself to only two of them.

 a. What is a population mean? Does a true population mean exist?

The concept of population mean is model-dependent. As all models are wrong, but some are useful, this population mean is a fiction that is defined just to provide useful interpretations. The fiction begins with a probability model.

The probability model is defined by the triplet

(\mathcal{X}, \mathcal{F}, P),

where \mathcal{X} is the sample space (a non-empty set), \mathcal{F} is a family of subsets of \mathcal{X} and P is a well-defined probability measure defined over \mathcal{F} (it governs the data behavior). Without loss of generality, consider only the discrete case. The population mean is defined by

\mu = \sum_{x \in \mathcal{X}} xP(X=x),

that is, it represents the central tendency under P and it can also be interpreted as the center of mass of all points in \mathcal{X}, where the weight of each  x \in \mathcal{X} is given by P(X=x).
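The weighted sum above can be sketched in a few lines of Python. The fair die used here is a hypothetical example, not something taken from the original question; exact fractions are used so that the check on the probabilities is not affected by rounding.

```python
from fractions import Fraction


def population_mean(pmf):
    """Return the sum of x * P(X = x) over the support of a discrete pmf."""
    if sum(pmf.values()) != 1:
        raise ValueError("the probabilities must sum to one")
    return sum(x * p for x, p in pmf.items())


# Hypothetical probability measure P: a fair six-sided die.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = population_mean(fair_die)
print(mu)  # 7/2
```

Changing the weights in `fair_die` changes `mu`, which is the sense in which the mean is the center of mass of the points of the sample space.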

In probability theory, the measure P is considered known, so the population mean is accessible through the simple operation above. In practice, however, the probability P is rarely known. Without a probability P, one cannot describe the probabilistic behavior of the data. As we cannot set a precise probability P to explain the data behavior, we set a family \mathcal{M} containing probability measures that possibly govern (or explain) the data behavior. Then, the classical statistical model emerges

(\mathcal{X}, \mathcal{F}, \mathcal{M}).

The above model is said to be a parametric model if there exists \Theta \subseteq \mathbb{R}^p with p< \infty such that \mathcal{M} \equiv \{P_\theta: \ \theta \in \Theta\}. Let us consider just the parametric model in this post.

Notice that, for each probability measure P_\theta \in \mathcal{M}, there is a corresponding mean

\mu_\theta = \sum_{x \in \mathcal{X}} x P_\theta(X=x).

That is, there is a family of population means \{\mu_\theta: \ \theta \in \Theta\} that depends tightly on the definition of \mathcal{M}. The family \mathcal{M} is defined by limited humans and therefore it may not contain the true probability measure that governs the data behavior. Actually, the chosen family will rarely contain the true measure; moreover, this true measure may not even exist. As the concept of a population mean depends on the probability measures in \mathcal{M}, the population mean is model-dependent.

The Bayesian approach considers a prior probability over the subsets of \mathcal{M} (or, equivalently, \Theta), but in this post I will concentrate only on the classical version.
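To make the family of means concrete, here is a minimal sketch that assumes, purely for illustration, that \mathcal{M} is the binomial family with n = 10 trials and \Theta = (0, 1). Neither the binomial family nor the chosen values of \theta come from the original discussion.

```python
from math import comb


def binomial_pmf(n, theta):
    """P_theta(X = x) for x = 0, ..., n under a Binomial(n, theta) model."""
    return {x: comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(n + 1)}


def mean_under(pmf):
    """mu_theta = sum of x * P_theta(X = x) over the sample space."""
    return sum(x * p for x, p in pmf.items())


n = 10
thetas = [0.1, 0.5, 0.9]  # a few members of the (assumed) parameter space

# One population mean per probability measure in the family.
means = {theta: mean_under(binomial_pmf(n, theta)) for theta in thetas}

for theta, mu_theta in means.items():
    print(f"theta = {theta}: mu_theta = {mu_theta:.4f}")
```

Each \theta gives a different mean (here n\theta up to rounding), and replacing the binomial family by another family would give a different collection of means for the same data.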

 b. What is the definition and the purpose of a confidence interval?

As mentioned above, the population mean is model-dependent and provides useful interpretations. However, we have a family of population means, because the statistical model is defined by a family of probability measures (each probability measure generates a population mean). Therefore, based on an experiment, inferential procedures should be employed in order to estimate a small set (interval) containing good candidates for the population mean. One well-known procedure is the (1-\alpha) confidence region, which is defined by a set C_\alpha such that, for all \theta \in \Theta,

P_\theta(C_\alpha(X) \ni \mu_\theta) \geq 1-\alpha  and \inf_{\theta\in \Theta} P_\theta(C_\alpha(X) \ni \mu_\theta) = 1-\alpha,

where P_\theta(C_\alpha(X) = \varnothing) = 0 (see Schervish, 1995). This is a very general definition and encompasses virtually any type of confidence interval. Here, P_\theta(C_\alpha(X) \ni \mu_\theta) is the probability that C_\alpha(X) contains \mu_\theta under the measure P_\theta. This probability should always be greater than or equal to 1-\alpha, and its infimum over \theta (the worst case) equals 1-\alpha.

Remark: Readers should notice that it is not necessary to make assumptions about the state of reality: the confidence region is defined for a well-defined statistical model without making reference to any “true” mean. Even if the “true” probability measure does not exist or is not in \mathcal{M}, the confidence region definition will work, since the assumptions are about statistical modelling rather than the states of reality.

On the one hand, before observing the data, C_\alpha(X) is a random set (or random interval) and the probability that “C_\alpha(X) contains the mean \mu_\theta” is at least (1-\alpha) for all \theta \in \Theta. This is a very desirable feature for the frequentist paradigm.

On the other hand, after observing the data x, C_\alpha(x) is just a fixed set and the probability that “C_\alpha(x) contains the mean \mu_\theta” is either 0 or 1 for each \theta \in \Theta.

That is, after observing the data x, we cannot employ probabilistic reasoning anymore. As far as I know, there is no theory to treat confidence sets for an observed sample (we are working on it and we are getting some nice results). For the time being, the frequentist must believe that the observed set (or interval) C_\alpha(x) is one of the (1-\alpha)100\% sets that contain \mu_\theta for all \theta\in \Theta.
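The contrast between the random set C_\alpha(X) and the observed set C_\alpha(x) can be illustrated by simulation. The sketch below assumes a normal model with known standard deviation and the usual z-interval for the mean; the sample size, the candidate means, the seeds and the number of replications are arbitrary choices made only for this illustration.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, alpha):
    """(1 - alpha) interval for a normal mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def coverage(mu, sigma=1.0, n=25, alpha=0.05, reps=5000, seed=1):
    """Monte Carlo estimate of P_theta(C_alpha(X) contains mu_theta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma, alpha)
        hits += lo <= mu <= hi
    return hits / reps


# Before the data: the coverage is close to 0.95 whatever the mean is.
coverages = {mu: coverage(mu) for mu in (-3.0, 0.0, 7.5)}
print(coverages)

# After the data: one fixed interval; for each candidate mean the statement
# "the interval contains mu" is simply true or false.
rng = random.Random(2)
observed = [rng.gauss(0.0, 1.0) for _ in range(25)]
lo, hi = z_interval(observed, sigma=1.0, alpha=0.05)
contained = {mu: lo <= mu <= hi for mu in (-3.0, 0.0, 7.5)}
print((lo, hi), contained)
```

The first dictionary reports relative frequencies over repeated samples, while the second contains only booleans, which mirrors the 0 or 1 statement above.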

PS: I invite any comments, reviews, critiques, or even objections to my post. Let’s discuss it in depth. As I am not a native English speaker, my post surely contains typos and grammar mistakes.


Schervish, M. (1995), Theory of Statistics, Second ed, Springer.

Severity and S-value

This post is constantly being updated…

The severity principle proposed by Deborah Mayo is used for accepting a null hypothesis H when:

1.  there is no evidence to reject H and
2.  H passes the test with high severity.

It seems to me that it is quite similar to my proposal (the s-value), but the measures involved in steps 1. and 2. are different.

An example from the normal distribution with known variance \sigma^2 = 1 follows. Consider n=100 and the hypotheses

H_0: \mu \leq 10 vs H_1: \mu > 10,

a) If the sample average is 5, then 5+50*1/10 = 10, that is, the sample average lies 50 standard errors (\sigma/\sqrt{n} = 1/10) below 10. We have no evidence against H_0 (since the sample average is corroborating H_0) and strong evidence against H_1 (since the sample average is very far away from H_1). The s-value for H_0 is one and the s-value for H_1 is almost zero (i.e., H_0 passes with high severity).

b) If the sample average is 15, then 15-50*1/10 = 10. That is, we have strong evidence against H_0 (since the sample average is very far away from H_0) and no evidence against H_1 (since the sample average is corroborating H_1). The s-value for H_0 is almost zero and the s-value for H_1 is one.

c) If the sample average is 9.9, then 9.9+1*1/10 = 10. That is, we have no evidence against H_0 (since the sample average is corroborating H_0) and only weak evidence against H_1 (since the sample average is near H_1). The s-value for H_0 is one and the s-value for H_1 is approx 0.3 (i.e., H_0 does not pass with high severity).

d) If the sample average is 10.1, then 10.1-1*1/10 = 10. That is, we do not have strong evidence against H_0 (since the sample average is near H_0) and also we do not have evidence against H_1 (since the sample average is corroborating H_1). The s-value for H_0 is approx 0.3 and the s-value for H_1 is one.
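The four cases can be reproduced numerically. The sketch below assumes that the s-value of a hypothesis is the largest level \alpha at which the likelihood-ratio confidence region for \mu still touches the hypothesis; for this normal model with known variance, that reduces to P(\chi^2_1 > n d^2/\sigma^2), where d is the distance from the sample average to the hypothesis. This formula is not spelled out in the post, so treat it as an assumption; the function names are hypothetical.

```python
from math import erfc, sqrt


def chi2_1_survival(t):
    """P(chi-squared with 1 degree of freedom > t), via the normal tail."""
    return erfc(sqrt(t / 2.0))


def s_value(distance, n=100, sigma=1.0):
    """Assumed s-value: tail probability of n * d^2 / sigma^2 (see lead-in)."""
    return chi2_1_survival(n * distance**2 / sigma**2)


def s_values(xbar, boundary=10.0):
    """s-values for H_0: mu <= boundary and H_1: mu > boundary."""
    d_h0 = max(xbar - boundary, 0.0)  # distance from xbar to {mu <= boundary}
    d_h1 = max(boundary - xbar, 0.0)  # distance from xbar to {mu > boundary}
    return s_value(d_h0), s_value(d_h1)


for label, xbar in [("a", 5.0), ("b", 15.0), ("c", 9.9), ("d", 10.1)]:
    s_h0, s_h1 = s_values(xbar)
    print(f"{label}) xbar = {xbar}: s(H_0) = {s_h0:.4g}, s(H_1) = {s_h1:.4g}")
```

Under this assumption the hypothesis that contains the sample average always gets s-value one, and the other one gets a value that shrinks with the distance measured in standard errors.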

My impression is that Deborah’s approach tries to accept H even when the data do not corroborate H.

Deborah’s severity:

For my example H_0: \mu \leq 10 vs H_1: \mu > 10:

SEV(\mu < m) = P(\bar{X} > \bar{x}; \mu \geq m) = 1 - P(\bar{X} \leq \bar{x}; \mu \geq m)

Notice that

P(\bar{X} \leq \bar{x}; \mu \geq m)

is the p-value for the following “inverted”  hypotheses

H_0^*: \mu \geq m  vs  H_1^*: \mu < m

Then, SEV is high whenever the p-value for the “inverted” hypotheses is low. This means that the event “\mu \geq m” is quite improbable for the observed data. The p-value for the inverted hypotheses H_0^* vs H_1^* is:

a) very small for m=10
b) not applicable (since we reject the null)
c) not small for m=10
d) not small for m=10.

That is, we have the same conclusions as I showed above with the s-value. If there is something wrong, please let me know.
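The p-values of the inverted hypotheses and the corresponding SEV values can be checked with a short script. It assumes the same setting as the example (n = 100, \sigma = 1, m = 10) and evaluates P(\bar{X} \leq \bar{x}; \mu \geq m) at the boundary \mu = m, where this probability is largest; the function names are hypothetical.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Standard normal distribution function, stable in the lower tail."""
    return 0.5 * erfc(-z / sqrt(2.0))


def inverted_p_value(xbar, m, n=100, sigma=1.0):
    """p-value for H_0^*: mu >= m, i.e. P(Xbar <= xbar) computed at mu = m."""
    return normal_cdf((xbar - m) * sqrt(n) / sigma)


def severity(xbar, m, n=100, sigma=1.0):
    """SEV(mu < m) = 1 - p-value of the inverted hypotheses."""
    return 1.0 - inverted_p_value(xbar, m, n, sigma)


# Cases a), c) and d); case b) is left out because H_0 is rejected there.
for label, xbar in [("a", 5.0), ("c", 9.9), ("d", 10.1)]:
    p = inverted_p_value(xbar, m=10.0)
    sev = severity(xbar, m=10.0)
    print(f"{label}) xbar = {xbar}: p-value = {p:.4g}, SEV = {sev:.4g}")
```

For instance, with a sample average of 9.9 the inverted p-value is \Phi(-1), roughly 0.159, so SEV(\mu < 10) is roughly 0.841.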

All the best,

A non-parametric method to estimate the number of clusters

“An important and yet unsolved problem in unsupervised data clustering is how to determine the number of clusters. The proposed slope statistic is a non-parametric and data driven approach for estimating the number of clusters in a dataset. This technique uses the output of any clustering algorithm and identifies the maximum number of groups that breaks down the structure of the dataset. Intensive Monte Carlo simulation studies show that the slope statistic outperforms (for the considered examples) some popular methods that have been proposed in the literature. Applications in graph clustering, in iris and breast cancer datasets are shown.” (Fujita et al., 2014)

Fujita et al. (2014). A non-parametric method to estimate the number of clusters, Computational Statistics and Data Analysis, 73, 27-39.

Link to the paper

Quantum measure theory generalizes classical probability theory

There is a quantum measure theory (an extension to the mathematical discipline called “measure theory”) that goes as follows:

If M is a quantum measure and \Omega is the universe set then:

1. M(\varnothing) = 0,
2. M(\Omega) = 1,
3. For any pairwise disjoint sets (measurable in the quantum sense) A, B and C: M(A \cup B \cup C) = M(A \cup B) + M(B \cup C) + M(A \cup C) - M(A) - M(B) - M(C)

Notice that, if A and B are disjoint sets then, in some quantum experiments, the measure of (A \cup B) cannot always be obtained from the measures of the isolated pieces A and B, as is usually assumed in classical measure theory. In these cases, we must compute a specific measure for the set (A \cup B). Naturally, if

M(A \cup B) = M(A) + M(B)

for all disjoint measurable sets A and B, then the usual probability measure emerges, but this is not the case in quantum experiments. Axiom 3 is called grade-2 additivity.
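A small numerical check may help to see how grade-2 additivity can hold while ordinary additivity fails. The sketch assumes a toy measure built from complex amplitudes, M(A) = |sum of the amplitudes of the outcomes in A|^2; this construction and the particular amplitudes are illustrative assumptions, not definitions taken from the text above.

```python
# Arbitrary complex amplitudes for three outcomes, rescaled so that the
# modulus of their total is one (which gives M(Omega) = 1).
raw = [1 + 1j, 0.5 - 2j, -0.3 + 0.4j]
scale = abs(sum(raw))
amplitudes = [a / scale for a in raw]


def M(subset):
    """Toy quantum measure: squared modulus of the summed amplitudes."""
    return abs(sum(amplitudes[i] for i in subset)) ** 2


A, B, C = {0}, {1}, {2}
omega = A | B | C

# Axiom 3 (grade-2 additivity) for the pairwise disjoint sets A, B and C.
lhs = M(omega)
rhs = M(A | B) + M(B | C) + M(A | C) - M(A) - M(B) - M(C)

# Ordinary additivity fails: the difference is the interference term.
interference = M(A | B) - (M(A) + M(B))

print(f"M(empty) = {M(set())}, M(Omega) = {lhs:.6f}")
print(f"grade-2 right-hand side = {rhs:.6f}")
print(f"M(A u B) - M(A) - M(B) = {interference:.6f}")
```

In this toy construction the cross terms between amplitudes cancel in the grade-2 identity but not in the two-set sum, which is why the interference term is non-zero.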

There is a connection between M and the wave function. For more on this, just google it: “quantum measure theory”.

Alexandre Patriota