Here is a small example from my history:
A man once sent me permanences (recorded “spins”, i.e. roulette throws from a particular wheel bowl, as the device the ball eventually rolls into is called). At first I thought: a joke, what am I supposed to investigate? But he was willing to pay for my analysis of the numbers. So I set to work briskly, admittedly mainly to collect the money; I didn’t expect to discover anything interesting. He himself, by the way, kept a low profile, saying nothing except that I should have a look at them, so that I could approach them with an open mind.
But after some research it actually turned out that the numbers in a certain sector of the wheel came up much more frequently than those in the rest of it. The man had been taking notes for a very long time, and the deviation was obvious. Now one begins to speculate and to calculate (it occurs to me that I once read a Jack London story in which a roulette wheel stood right next to the stove; for some reason certain numbers came up more often). Everyone, mathematician or not, considers a certain deviation possible or realistic and another unrealistic, too large. Even statisticians can only give vague answers to such questions. So: which deviation do we tolerate, which do we consider realistic, and which “too big”? Above all, there remains the question of preconditions, which always hangs in the room but stays permanently unanswered.
All right, I’ll try my hand at statistics: first you calculate the standard deviation. For (approximately) normally distributed measurements there is a well-known rule of thumb: 68.3% of all results should lie within one standard deviation of the mean, 95.4% within two, and 99.7% within three. So much for the mathematics on this subject. So when a so-called “outlier” occurs, you just accept it. It was unlikely, but it happened anyway.
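The 68.3/95.4/99.7 rule can be checked empirically with a few lines of code; here I use simulated normally distributed data, with `random.gauss` merely standing in for “measurement results”:

```python
import random

# A quick empirical check of the 68.3 / 95.4 / 99.7 rule using simulated
# normally distributed data (random.gauss is just a convenient stand-in
# for "measurement results" here; the seed is arbitrary).
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Share of samples falling within 1, 2 and 3 standard deviations of the mean.
shares = [sum(abs(x) <= k for x in samples) / len(samples) for k in (1, 2, 3)]
for k, share in zip((1, 2, 3), shares):
    print(f"within {k} standard deviation(s): {share:.1%}")
```

With 100,000 samples the three shares come out very close to the textbook figures.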
But what did the permanences given to me show? The values for these four numbers, considered individually, i.e. for the wheel sector under examination, were all outside, or more precisely above, three standard deviations. Each individual reading would thus have only a roughly 0.3% chance of deviating that far (strictly, about half that for the upward direction alone). But maybe it was all just a coincidence? It can happen, can’t it? To give myself (even greater) certainty, I took the four values together. This is justified insofar as they are adjacent numbers. Four numbers scattered over the wheel would not be acceptable, but numbers next to each other? I can look at the values as a sum. Does that make sense intuitively?
So by looking at these four numbers together, I increase the confidence of the statement that the wheel is biased.
The four values taken together produced a deviation beyond five standard deviations! Now my statement “biased wheel” becomes quite firm; one then likes to say “with probability bordering on certainty.” But even then one cannot be completely sure that the statement “the wheel is biased” is correct. A possibility of error always remains. That is how curious and helpless statistics is. Even what one wants to state holds only with a certain probability. I make a statement and prove it with a probability of error. Funny, isn’t it?
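Why summing adjacent numbers sharpens the test can be sketched numerically: a block of four numbers has hit probability p = 4/37 per spin, and its relative spread sigma/mean is smaller than that of a single number. The figure n = 10000 spins below is hypothetical; the length of the original permanence is not recorded.

```python
import math

# Relative spread (sigma / mean) for a single number versus a block of
# four adjacent numbers, over a hypothetical n = 10000 spins.
n = 10000
ratios = {}
for numbers in (1, 4):
    p = numbers / 37                      # hit probability per spin
    mean = n * p
    sigma = math.sqrt(n * p * (1 - p))    # binomial standard deviation
    ratios[numbers] = sigma / mean
    print(f"{numbers} number(s): mean {mean:.0f}, sigma {sigma:.1f}, "
          f"sigma/mean {sigma / mean:.3f}")
```

The block’s relative spread is roughly half that of a single number, which is exactly why the combined test gives a firmer verdict.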
The story went on like this, by the way: I gave the man this answer and calculated for him the chances of winning, taking the probability of error into account and so on. The idea was, of course, to play this game long-term with high stakes and win, what else? But he refrained from doing so anyway. A high capital investment would have been necessary, which someone on the outside would have had to provide. And in the end the uncertainty (especially for the potential sponsor) was too high that the wheel might simply be serviced or replaced at some point. So in this sense my efforts were fruitless.
Now I have carried out a small simulation of 1000 spins (the computer does all this; the apparent “random experiment” uses the random numbers provided by the computer, which are of course calculated by a function, and a function always follows a calculation rule. Just because I don’t know the rule doesn’t mean one can’t “know” what comes next. Nevertheless, 1000 spins are sufficient for our purposes here). The following diagram shows the result:
I am happy to print another one to refute the accusation of having selected a prefabricated example. To interpret this diagram: the ball was rolled 1000 times, all 37 events are (supposedly) equally probable, so the mean value is of course 1000/37, i.e. about 27.03. So far so clear.
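Such a diagram is easy to re-create; a minimal sketch of the same experiment (the seed is arbitrary, every run gives a slightly different picture):

```python
import random

# A minimal re-creation of the simulated experiment: 1000 spins of a
# 37-number wheel, tallying how often each number comes up.
random.seed(7)  # arbitrary seed; any run gives a different diagram
counts = [0] * 37
for _ in range(1000):
    counts[random.randrange(37)] += 1

print("min:", min(counts), "max:", max(counts),
      "mean:", round(sum(counts) / 37, 2))
```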
In this series of experiments the values are reasonably normally distributed. One number comes more often, another less often, and so on. Should they all come equally often? Nevertheless, there is a recognisable outlier upwards: the 37 actually came 42 times. Mean value approx. 27, one value comes 42 times. Tolerate it, accept it, or look for errors? A programming error? The statistical answer is this: you calculate the standard deviation for this distribution using the binomial distribution. Each individual number obeys its laws, because on every spin each number either comes or does not come, 1 or 0. The standard deviation of the binomial distribution can be calculated exactly. For 1000 trials it is
√(1000 · (1/37) · (36/37)) ≈ 5.13. Three standard deviations are 3 × 5.13 = 15.38, and 27.03 + 15.38 = 42.41. So the measured value, the outlier, is just within three standard deviations, but outside two.
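The calculation above in code form: each of the 1000 spins either hits a given number (p = 1/37) or misses it, so the count is binomial with standard deviation √(n·p·(1−p)).

```python
import math

# Binomial mean and standard deviation for a single roulette number
# over n = 1000 spins with hit probability p = 1/37 per spin.
n, p = 1000, 1 / 37
mean = n * p
sigma = math.sqrt(n * p * (1 - p))
print(f"mean = {mean:.2f}, sigma = {sigma:.2f}, "
      f"mean + 3*sigma = {mean + 3 * sigma:.2f}")
```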
But now we have to look even closer. The probability that a value lies beyond three standard deviations is 0.3%. But we have 37 values, so 37 attempts at 0.3% each. The probability that an event with a probability of 0.3% occurs at least once in 37 trials is already 10.5%. So in roughly one run in ten we would already expect an outlier beyond three standard deviations.
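The 10.5% figure follows from the complement rule, treating each of the 37 numbers as an independent 0.3% event (an approximation: the 37 counts are not truly independent, since they must sum to 1000):

```python
# Chance that at least one of 37 numbers produces a 3-sigma outlier,
# via the complement: 1 - P(no outlier in any of the 37).
p_outlier = 0.003
p_at_least_one = 1 - (1 - p_outlier) ** 37
print(f"{p_at_least_one:.1%}")
```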
By the way, I actually repeated the experiment 100 times afterwards (believe me, it was not mistrust of my companion, rather curiosity). And indeed: in the 100 runs such an outlier occurred 11 times.
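This repeated experiment can be sketched as a small Monte Carlo simulation: 100 runs of 1000 spins each, counting how many runs contain at least one number whose frequency exceeds mean + 3·sigma (about 42.4 hits). Around 10 of 100 is expected; the seed is arbitrary.

```python
import random

# Monte Carlo sketch: how many of 100 runs (1000 spins each) contain at
# least one number beyond mean + 3*sigma?
random.seed(42)  # arbitrary seed for reproducibility
N_RUNS, N_SPINS, NUMBERS = 100, 1000, 37
p = 1 / NUMBERS
threshold = N_SPINS * p + 3 * (N_SPINS * p * (1 - p)) ** 0.5  # about 42.4

runs_with_outlier = 0
for _ in range(N_RUNS):
    counts = [0] * NUMBERS
    for _ in range(N_SPINS):
        counts[random.randrange(NUMBERS)] += 1
    if max(counts) > threshold:
        runs_with_outlier += 1

print(runs_with_outlier, "of", N_RUNS, "runs had such an outlier")
```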
As promised, a second chart. We have “outliers” upwards and downwards. And indeed, the 42 appears as the maximum once again.
Now I’ll show you another, manipulated diagram, which may roughly correspond to what I obtained from the permanences (unfortunately the data are no longer available).
Now you just have to imagine that these 4 outliers lie next to each other on the wheel. Each one on its own would still be tolerated. But four values next to each other? If you combine the values, the following picture emerges:
Now there are only 9 values. Since 9 × 4 = 36, not 37, I had to leave one number out. The standard deviation for this experiment would be, according to the binomial distribution, √(10000 · (1/9) · (8/9)) ≈ 31.4; three standard deviations are 3 × 31.4 = 94.2. The mean value would be 10000/9 ≈ 1111, and 1111 + 94.2 = 1205.2. The maximum, however, is 1502. So intuitively I said: there must be something wrong here (there is an obvious reason why some of the other values are far too low: a few values came far too often).
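The grouped calculation in code form: the 37 numbers collapsed into 9 blocks of 4 adjacent numbers (one number left out), so over 10000 spins each block is a binomial event with p = 1/9.

```python
import math

# Grouped test: 9 blocks of 4 adjacent numbers over n = 10000 spins,
# each block a binomial event with hit probability 1/9 per spin.
n, k = 10000, 9
mean = n / k                                   # about 1111
sigma = math.sqrt(n * (1 / k) * (1 - 1 / k))   # about 31.4
print(f"mean = {mean:.1f}, sigma = {sigma:.1f}, "
      f"mean + 3*sigma = {mean + 3 * sigma:.1f}")
```

The observed maximum of 1502 lies far beyond the 3-sigma limit of about 1205, which is what made the bias so glaring.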
As I said, because I no longer have the exact values and the result here was only produced by manipulation, it is somewhat difficult to formulate an exact statement. The example essentially serves only as an illustration. However, even such a result would be possible with a perfectly correct wheel; that must be emphasised again. But the probability is very, very low.
I would just like to conclude by trying to make you aware of these concepts and the different forms of random experiments. In theory there is an experiment with n outcomes (always these variables; the mathematician in me is running away with me, just say 6, that fits), all n equally probable. This is the idealised probability space assumed for Laplace experiments.
In practice it becomes very difficult to get such an experiment to work at all. Mostly we have different outcomes, each with some probability that is unknown to us, but for which we have a good estimate because of the experimental design. In the end the values cannot be determined exactly. So when I mention a Laplace experiment in the future, I tend to mean a “Pauli experiment”. And even that is still idealised: in my experiment there are also n outcomes, all known but not equally probable. The “Pauli 2” experiment, that’s real life: there are infinitely many outcomes and not a single probability is known. But that’s just a joke. (It occurs to me that a good friend of mine occasionally offered, in order to document his lack of chances at a certain game: “Come on, we’ll play heads or tails. I’ll take edge.” So much for “there are only two exactly equally probable outcomes to a coin toss.”)