Approaching the problem of forecast quality testing
So now we have seen what types of bets there are and what kinds of betting providers there are. And we have learned that the good player is characterised by the fact that he has money: a lot of money, or even more money. The worse player has to go begging. Or learn; that always works. But still we ask ourselves: is there a method with which we can check the quality of predictions in the long run, independently of the financial outcome? In doing so we must also disregard how unfair individual game outcomes can be, and rely on what mathematics promises us in principle in such cases: in the long run, everything evens out.
As you can guess from this introduction, even a bad dramatist like me has a solution ready. When I found it back in 1993, I had no idea that it was actually a new concept. It seemed so simple in principle that I blamed my poor memory instead: I assumed I had learned the method during my studies, forgotten it, and was merely working it out again. But I was wrong.
So let me formulate the problem again: is it possible to test, in the long run, the quality of a forecast, an estimate, a probability prediction for an event whose probability of occurrence is undetermined, i.e. completely unknown? A not insignificant additional prerequisite is that the experiment is carried out only once under the given conditions. Otherwise we could approximate the probability of occurrence step by step through the relative frequency obtained by repeating the experiment (there are answers to such questions, by the way). That is not possible with an experiment that cannot be repeated.
What mathematics offers on this subject is inadequate, because it does not deal with unknown probabilities of occurrence. We move comfortably within the self-imposed, simply defined Laplacian probability space, in which all outcomes are equally likely, and need nothing beyond it. The laws are quickly derived and the statements formulated, and that’s it: the chapter on probability and statistics is done. The only place where these considerations, or rather this whole branch of mathematics, actually play a role is actuarial mathematics. And there the calculations carry such gigantic profit margins (for reasons explained much further above, and with the consent of the bettors, i.e. the policyholders) that misjudgements play a rather subordinate role.
So we are forced to find our own method. This method is actually based on concepts that have been known for a long time, so I am certainly no Einstein for having invented it. I have merely succeeded in putting one and one together, and even Adam Riese is said to have managed that…
Apart from that, in a helpless attempt to heighten the suspense, I confess that I can only answer the question formulated above with “No”. It is NOT possible, at least not for a single assessment. As said before, a single assessment remains a mere statement, in the form of the prognosis “The event occurs with a probability of x per cent; the counter-event occurs with a probability of 100 − x per cent.” Then the random experiment, the unique one, is carried out. And indeed: either the event occurs or the counter-event occurs.
So we definitely cannot test a single prediction. And since we cannot test a single one, we cannot in principle test many different ones individually either. But we can test the quality of the predictor. Suppose this person repeatedly assigns a probability to such events (one-time, non-repeatable ones). Little by little we find out whether he is a good prophet, if you like the term. And with the help of my method, a good prophet can even be identified by his own numbers.
Checking this prophet against other prophets still seems relatively simple. It goes like this: bet on your own estimates, count the money afterwards. Good prophet = lots of money. But a method that lets you measure yourself against your own numbers as well as compare yourself with others, that sounds tempting, doesn’t it? One could then make predictions on any events without any comparative figures and, using only one’s own numerical material, make the statement: the predictions were good.
Enough suspense, here is the method. All it uses is the expected value. That sounds amazingly banal. But the expected value of a probability? That sounds a bit reckless. Nevertheless, this value exists. How is it determined? In the same way as any expected value: multiply the probability of each outcome by the value the random variable takes for that outcome, and add up. Here, however, the value of the random variable is itself a probability, namely the probability that was predicted for the outcome that actually occurs. So a 70% event occurs with probability 70%, and a 30% event occurs with probability 30%. Multiplying out (it is no coincidence that the numbers in the example add up to 100%: they belong to an event and its counter-event) gives

$$0.7 \times 0.7 + 0.3 \times 0.3 = 0.49 + 0.09 = 0.58,$$

i.e. 58%.
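The calculation above fits into a one-line function. This is a minimal Python sketch; the name `expected_probability` is hypothetical, and it assumes, as the text does, that the forecast p is treated as if it were the true probability of the event.

```python
def expected_probability(p: float) -> float:
    """Expected value of the probability assigned to the outcome that occurs,
    assuming the forecast p is the true probability of the event.

    The event (forecast p) happens with probability p and then scores p;
    the counter-event (forecast 1 - p) happens with probability 1 - p and
    then scores 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p * p + (1.0 - p) * (1.0 - p)


print(round(expected_probability(0.7), 4))  # 0.58
```

For p = 0.5 the function returns 0.5, and for p = 0 or p = 1 it returns 1, so the expected probability always lies between 50% and 100%.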
So if we put the probability of an event, which we can only estimate and which will never be known, at 70%, we produce an expected value of 58%. What does this expected value mean? We can help ourselves quite simply: we pretend that we actually know the probability of occurrence and can repeat the experiment, and then see what would happen in the long run.
So suppose a random experiment really is distributed 70:30, and we check the quality of the prediction. A mathematician would see no need for this, since he already knows the probability. But we do have that need, and with a known probability of occurrence the example at least serves as an illustration. So let’s take the popular game of drawing balls again, this time with replacement. We have a pot with 10 balls, as usual red and white ones, in this case 7 red and 3 white. We draw a ball, note its colour and put it back. What do we expect now?
In the long run, we expect to have drawn 70% red balls and 30% white balls. That much is clear. We also have a predicted average expected probability: 58%, as calculated above. So for each draw we note four columns: the predicted probability, the counter-probability, the expected probability and the probability that had been predicted for the outcome that actually occurred.
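The urn experiment and its four-column record can be simulated. This is a minimal sketch, assuming Python’s `random` module as the source of draws; the function name `draw_rows` and the seed value are hypothetical choices for illustration, not part of the original method.

```python
import random

RED_BALLS = 7     # 7 red balls ...
TOTAL_BALLS = 10  # ... out of 10 in the pot; every ball is put back


def draw_rows(n: int, seed: int = 1993) -> list[tuple[float, float, float, float]]:
    """Simulate n draws with replacement and return one four-column row per draw:
    (P(red), P(white), expected probability, probability of the outcome that occurred)."""
    rng = random.Random(seed)
    p_red = RED_BALLS / TOTAL_BALLS
    p_white = 1.0 - p_red
    expected = p_red ** 2 + p_white ** 2
    rows = []
    for _ in range(n):
        # Pick one of the 10 balls uniformly at random; balls 0..6 are red.
        is_red = rng.randrange(TOTAL_BALLS) < RED_BALLS
        occurred = p_red if is_red else p_white
        rows.append((p_red, p_white, expected, occurred))
    return rows


for row in draw_rows(5):
    print(" | ".join(f"{x:.2%}" for x in row) + " |")
```

Each printed row has the same layout as the table that follows. Averaging the last column over many draws, say `draw_rows(100_000)`, should come out close to 0.58.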
So the record always looks like this (one row per draw, 20 draws):
Red | White | Expected | Occurred |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 30.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 30.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 30.00% |
70.00% | 30.00% | 58.00% | 30.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 30.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 30.00% |
70.00% | 30.00% | 58.00% | 70.00% |
70.00% | 30.00% | 58.00% | 70.00% |
Column 1 is the probability of a red ball, column 2 the probability of a white ball, column 3 the average expected probability and column 4 the probability of the outcome that occurred. Now form the sums of the last two columns: what did we expect, and what actually happened? You will notice that the two sums are exactly identical: 1160%, or 11.6. In this case that is because the 20 draws happened to produce exactly the expected split of 14 red and 6 white balls (14 × 70% + 6 × 30% = 20 × 58% = 1160%). In the long run the relative frequencies would indeed approach 70:30; over such a short run an exact match is coincidence. Nevertheless, the agreement of the numbers gives us an indication: what we expected is what happened. Our measure in the column “average expected probability” is correct, and there is a reason for that too: we calculated correctly, i.e. we determined a correct value for the average expected probability. Since it is valid for a concrete example with known probabilities of occurrence, it is of course also valid for unknown and non-repeatable events.
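The two sums can be checked exactly. This sketch transcribes the 20 outcomes from the table (R for a red ball, W for a white one) and uses Python’s `fractions.Fraction` to avoid floating-point rounding; the constant names are hypothetical.

```python
from fractions import Fraction

P_RED = Fraction(7, 10)               # column 1
P_WHITE = 1 - P_RED                   # column 2
EXPECTED = P_RED ** 2 + P_WHITE ** 2  # column 3, equal to 0.58

# Column 4 depends on the outcome: the 20 draws of the table, top to bottom.
DRAWS = "RWRRRRRWRRRWWRWRRWRR"

expected_sum = EXPECTED * len(DRAWS)
occurred_sum = sum(P_RED if d == "R" else P_WHITE for d in DRAWS)

print(float(expected_sum), float(occurred_sum))  # 11.6 11.6
print(expected_sum == occurred_sum)              # True
```

Because these draws happen to match the 14:6 expectation exactly, the sums agree exactly; another sequence of 20 draws would generally give a slightly different second sum.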
So in the long run, we can measure the quality of our predictions by our own predicted probabilities of occurrence.
We simply note for each predicted event a probability, a counter-probability, an expected probability and the probability of the outcome that occurred. At the end, we sum the last two columns and compare them.
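The bookkeeping just described can be sketched as a small ledger. The names `Entry`, `compare` and `calibrated_gap` are hypothetical, and the three-event ledger is made-up example data. `calibrated_gap` simulates a forecaster whose stated probabilities are, by construction, the true ones, so the per-event gap between the two sums should shrink towards zero as the number of events grows. Whether matching sums alone identify a good forecaster is a separate question.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One ledger row for a single, non-repeatable event (hypothetical structure)."""
    p: float        # forecast probability that the event occurs
    occurred: bool  # True if the event happened, False if the counter-event did

    @property
    def counter(self) -> float:
        return 1.0 - self.p

    @property
    def expected(self) -> float:
        return self.p ** 2 + self.counter ** 2

    @property
    def realised(self) -> float:
        # The probability the forecaster had assigned to what actually happened.
        return self.p if self.occurred else self.counter


def compare(entries: list[Entry]) -> tuple[float, float]:
    """Sum the last two columns: (expected total, realised total)."""
    return (sum(e.expected for e in entries), sum(e.realised for e in entries))


def calibrated_gap(n: int, seed: int = 0) -> float:
    """Mean (realised - expected) per event for a simulated forecaster whose
    stated p is, by construction, the true probability of each event."""
    rng = random.Random(seed)
    entries = []
    for _ in range(n):
        p = rng.choice([0.1, 0.3, 0.5, 0.7, 0.9])
        entries.append(Entry(p, rng.random() < p))
    expected, realised = compare(entries)
    return (realised - expected) / n


ledger = [Entry(0.7, True), Entry(0.5, False), Entry(0.9, False)]
expected, realised = compare(ledger)
print(f"{expected:.2f} {realised:.2f}")  # 1.90 1.30
print(calibrated_gap(100_000))
```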
Now the only remaining question is: when was the prognosis good? Or: when was the prophet good? Or: how can you recognise a good prophet? And that, as I will show in a moment, is anything but obvious.