Just a little provocation:
The definition of conditional probability is similar to the definition of division: let x be the number such that $latex P(A \cap B) = x \, P(B)$.
As $latex P(A \cap B) \leq P(B)$, and provided $latex P(B) > 0$, this value x is always a well-defined number between 0 and 1 that can be recovered from $latex P(A \cap B)$ and $latex P(B)$. We can understand this number x as the value of a function of the events A and B, $latex x = f(A,B)$, since this number varies with the events A and B.
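It is possible to show that this function is a probability measure over the first argument when B is fixed; that is, f(.,B) is a probability measure: $latex f(\emptyset,B) = 0$, $latex f(\Omega,B) = 1$, and if C and D are disjoint sets, then $latex f(C \cup D, B) = f(C,B) + f(D,B)$. Define $latex P(A|B) := f(A,B)$.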
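A minimal sketch of that check on a finite example may help. It assumes a hypothetical sample space (one roll of a fair die) and brute-forces normalization and additivity of f(.,B) over all events; the names `OMEGA`, `prob` and `all_events` are illustrative and not part of the original argument.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite sample space: one roll of a fair die.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """Uniform probability P(event) on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def f(a, b):
    """The number x solving P(A ∩ B) = x * P(B), assuming P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("x is not uniquely determined when P(B) = 0")
    return prob(a & b) / prob(b)


def all_events():
    """Every subset of OMEGA (the power set is the sigma-field here)."""
    items = sorted(OMEGA)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_probability_measure_in_first_argument(b):
    """Check f(Ω,B) = 1, f(∅,B) = 0 and additivity on disjoint pairs."""
    events = all_events()
    if f(OMEGA, b) != 1 or f(frozenset(), b) != 0:
        return False
    for c in events:
        for d in events:
            if not (c & d) and f(c | d, b) != f(c, b) + f(d, b):
                return False
    return True


B = frozenset({2, 4, 6})  # "the roll is even"
print(f(frozenset({2}), B))  # 1/3
print(is_probability_measure_in_first_argument(B))  # True
```

Exact rational arithmetic is used so that the additivity check is an equality test rather than a floating-point comparison.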
The interpretation of $latex P(A|B)$ as “the probability of A given that the event B has occurred” seems to be fictional, since $latex P(A|B)$ is just a number such that $latex P(A \cap B) = P(A|B) \, P(B)$.
One can also begin from “below”, by defining the conditional probability first and then building a joint probability space. In this case, the interpretation of P(A|B) as “the probability of A given that B has occurred” seems to be more justifiable.
Let us explore it a little bit more. If you first define the function $latex P(\cdot|B)$, then you must have a sigma-field of events for the argument of the function $latex P(\cdot|B)$, since for each fixed B this function must be a probability, and hence you must define a list of sets to be measurable by $latex P(\cdot|B)$: i.e., you must define the domain of $latex P(\cdot|B)$. The symbol “|B” seems initially to mean that the probability was built by using the *information* contained in B, and you can write $latex P_B(\cdot)$ instead. Well, this is how any type of probability is built, naturally including likelihood functions, joint probabilities, marginal probabilities and so forth. The problem is how to justify a probability space for the conditioning events, since they may not be measurable in the probabilistic sense. For instance, when the probability measure is built by employing some deterministic laws, such as differential equations, B contains our knowledge about differential equations, knowledge about the relations among the elements of interest and so on. Can it be measurable in terms of probabilities? Some conceptual discussion is needed, and maybe this is not the right place for it.
Well, you want to start from the “conditional” probability — which is not really a conditional probability in the usual sense, since B might not be measurable in terms of probability laws — to get a “joint” probability measure.
Let us assume that B is a measurable event in terms of probabilities. You must expand the initial measurable space to build a joint measurable space. First, you have to build all probability spaces $latex (\Omega, \mathcal{F}, P(\cdot|B))$, for all “conditional” sets $latex B \in K$ (K being a non-pathological sigma-field of the “conditional” sets B), and, finally, you must define a probability space to be applied to all “conditional” sets B in K, say $latex (\Omega_K, K, Q)$. Then you define $latex W(A \cap B)$ to be $latex P(A|B) \, Q(B)$; naturally, $latex P(\cdot|B)$ and $latex Q$ must both have special behaviors, otherwise W is not well defined; this was just an informal description.
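The construction from “below” can be sketched on a small discrete example. Everything here is an assumption made for illustration: the conditioning events are two hypothetical urns, `cond` holds one probability measure per conditioning event, `q` is a probability on the conditioning events, and the joint W is obtained by multiplying the two. In this finite setting the “special behaviors” reduce to each measure summing to one.

```python
from fractions import Fraction

# Hypothetical probability on the conditioning events (which urn is chosen).
q = {"urn1": Fraction(1, 4), "urn2": Fraction(3, 4)}

# One full probability measure over outcomes for each conditioning event.
cond = {
    "urn1": {"red": Fraction(1, 2), "blue": Fraction(1, 2)},
    "urn2": {"red": Fraction(1, 3), "blue": Fraction(2, 3)},
}


def build_joint(cond, q):
    """Joint measure on atoms: W(a, b) = P(a | b) * q(b)."""
    return {(a, b): p_ab * q[b]
            for b, dist in cond.items()
            for a, p_ab in dist.items()}


def marginal_of_condition(joint, b):
    """Total joint mass carried by the conditioning event b."""
    return sum(p for (_, b2), p in joint.items() if b2 == b)


def recovered_conditional(joint, a, b):
    """Conditional recovered from the joint, as in the 'from above' route."""
    return joint[(a, b)] / marginal_of_condition(joint, b)


W = build_joint(cond, q)
print(sum(W.values()))                          # 1
print(recovered_conditional(W, "red", "urn2"))  # 1/3
```

Starting from the joint W and dividing by the marginal returns the conditional measures that were put in, so in this toy case the two routes agree.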
As we saw, it is much easier to start from “above” than from “below” to build conditional probabilities. Is the interpretation “the probability of A given that the event B has occurred” for $latex P(A|B)$ a fiction? Well, we can argue that all linguistic artifacts are fictions, even this post, but some are useful and others are not.
PS: This is just a thought-provoking note, please do not be angry…
PPS: In mathematics, a definition (according to Suppes' theory) must be eliminable and also non-creative. This means that all results obtained from a specific definition should be attainable without that specific definition, otherwise contradictions emerge from this creative, non-eliminable definition. The sentence “a definition is eliminable” means that its definiendum can be replaced by its definiens. The definition of $latex P(A|B)$ must comply with these criteria; however, $latex P(A|B) = P(A \cap B)/P(B)$ is not eliminable, because we cannot substitute the definiendum by any definiens, since it has no definiens for the case $latex P(B) = 0$. On the other hand, we can define $latex P(A|B)$ as a number such that $latex P(A \cap B) = P(A|B) \, P(B)$.
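A short worked step for the zero-probability case (my own spelling-out under the usual axioms, not taken from Suppes): since $latex A \cap B \subseteq B$, monotonicity gives

```latex
P(B) = 0
\;\Longrightarrow\; 0 \le P(A \cap B) \le P(B) = 0
\;\Longrightarrow\; P(A \cap B) = 0 = x \cdot P(B)
\quad \text{for every } x \in [0, 1].
```

So under the product formulation a suitable number always exists, although it is not unique when $latex P(B) = 0$, whereas the ratio has no value at all in that case.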
P. Suppes. Introduction to Logic. Wadsworth International Group (1957).