How do you check the quality of a forecast?
The question alone sends shivers down your spine. “What now, please? Does the man want to be or become a prophet? A prediction, a prophecy comes true or not. And in terms of prophets, I’ve only known Isaiah and Jeremiah.” But that is not quite the truth. In order to understand it, one must first try to clarify the concept of prediction.
A prediction may be understood as an estimate, as exact as possible, of the probability of a future event. Once you have made one, it becomes interesting to assess its quality for yourself or, more interesting still, to compare your own prediction with that of another predictor. For this we are now looking for a scientific method.
I will try to introduce the reader to the problem gradually. First come events whose probabilities are known, or at least seem to be known; later come events whose probabilities are unknown. Once the problem of non-repeatability is added, the mathematician turns away anyway: as he understands it, nothing more can be said. But for other people, too, the problem loses its clarity, and whether the event occurs or not, they tend to refer to “fate”, “it’s all just chance”, “luck or bad luck”, or to faith, from which, however, they may then fall away again.
After studying this chapter, the reader can confirm that it is possible to make a statement, and even in a mathematically flawless form. And that the statement found even has a practical meaning.
1) Philosophical preliminary considerations and the imperfection of language
The basic line of thought is that every event lying in the future must be assigned a probability between 0 and 1. The probabilities 0 and 1 themselves are never assigned: there is no certain event (value 1) and no impossible event (value 0). There is always an imponderable. I would like to briefly mention the two most ridiculous but perhaps most immediately catchy justifications. The first: a precondition for the occurrence of a future event is that time advances. And although there is sufficient evidence that it will, I certainly cannot prove it. Since, according to the chapter “Nihil is – Nothing is (really)”, in principle one cannot prove anything at all, the same applies here: we rely on axioms. The smallest, unprovable statements are simply “swallowed”, accepted. Nevertheless, one is allowed to doubt them. Why couldn’t time stand still? Or does it run cyclically? Perhaps there is a parallel timeline, as in “Back to the Future”? The concept of time reaches the limits of human imagination. What is time? You look at the sky at night and see stars. But those stars may no longer exist, or they are somewhere other than where you see them. As evidence, note that the distance to a distant star is measured in light years. So the light travels for years, sometimes millions of years. And the speed at which light travels seems to be partly responsible for the passage of time. That we rely on this fact, although we do not understand it, rests on nothing more than experience. “Tomorrow, too, the sun will rise.” Yes, it will, it will.
The second argument, which I find powerful, is that language does not provide unambiguity. One relies on the fact that a thought can be translated into words, the words into a sentence, the sentences into a context of meaning; that this structure can then be spoken, heard by our counterpart exactly as we said it, and transformed by his brain back into an identical thought. Beautiful as this idea may be, it is simply impossible for it to happen that way. What does a thought actually look like? One should also take into account that an identically spoken sentence can take on endless complexity through gestures, facial expressions and intonation.
One such concept, for example, is the “future”: a human concept, and as imperfect as man himself. When is the future, please? Does it happen at all? Another is the “occurrence of events”. That is already quite an expression. When does an event occur? And many terms such as “love”, “God” or “happiness” lend themselves at most to notions anyway, not to fixed definitions.
Just to give a tiny example of how ambiguous language can be in connection with the “occurrence of events”, this here as a pure fairy tale:
A person throws a die. He would like a 6, because he has bet on it: 10 euros at odds of 6.0, a fair price, more or less. He apparently throws a 6 and is already cheering. Only I suddenly object: “The die is burning.” (In German dice slang, a die “burns” when it comes to rest tilted rather than flat.) The thrower is outraged. “It’s not burning, is it?” I ask him: “What is the definition of burning?” “Well, what is this nonsense? The number is clearly visible. The die is lying straight on the ground.” Me: “I have never seen a die lying straight on the ground. There is no such thing. Parallel to the ceiling? Is that what you mean?” “Well, something like that. I don’t know, it’s a six, it’s on top.” Ok, although I am in no way dissuaded from the view that the die is not lying straight and that “burning” was not defined anywhere, I relent. But I just cheekily continue asking: “Why is that actually a 6?” “Well, because there are six dots on it.” “Aha, hang on, I’ll quickly get the microscope.” I hold the die under it and count the dots I see. Oh dear, I stop counting at 85, it’s no use. “Uh, I’m at 85 now, how many did you count?” He was only at 74. But somehow also intrigued.
We don’t quite get anywhere. First he doesn’t accept that the die is burning, he literally insists that it isn’t, and then he counts 74 dots instead of 6. “I meant the big dots.” “At what point is a dot big?” But fine, I give in on this one too. Okay, he rolled a 6. But please, one more question. Sure, we had a bet, he wanted the 6 and now, according to him, gets money. I ask: “Which throw did you bet on?” “Well, now it’s getting too much for me. On the next one, of course.” “I see, and which one is the next one?” “Always the one that comes now.” “All right, then please throw again.” “I already have. The previous one was my next one earlier, when we made the bet.” Me: “Oh, now it’s getting too colourful for me as well. The terms change their meaning over time. You meant the next one and suddenly claim that the last one was the next one, but at the same time admit that it was the previous one.” He is about to throw the dice cup at my head. I remark just in time: “Besides, just to support you for fun: for me, the next one, even now, would be the one you just made, which is the previous one. It is the nearest one in time, because you haven’t made one since. But in that case it is the next one in the past. Did you want to bet on that one? Then I would have had to collect earlier, because back then the next one in the past was the one you made a fortnight ago. And that wasn’t a 6 but a 93, although only three of the big dots were on top, as I could also see. That one was clearly burning. Also, you were throwing from your hand at the time, and I meant that only throws from the cup count.” And even though he clearly lost the bet, I pay out.
You can’t. You just can’t manage to say something clear. You try again and again, you look for it and you would like to have it. But it doesn’t exist. In the end, you always have to rely on humanity. On the goodwill to want to understand. You can’t do without it. If you don’t want to understand, you’ll manage. Let me remind you of the children’s story. The child simply asks again, “Why?” to every answer. If you don’t know my texts, you will inevitably become aggressive at some point.
One should only be grateful once in a while that I didn’t happen to study law…
But I go on blithely, counting on the fact that I continue not to be understood. Every event that lies in the future has a probability between 0 and 1. I can already hear the now again constructed example: “But Christmas and Easter really can’t fall on the same day.” To this I can really calmly reply, “Wait and see.” They are only human concepts. Always remember: you can’t prove anything. You don’t know if time is moving on. And you can’t say anything definite.
Even about death, which some people are convinced is inevitable, nothing definite can be said…
How close mathematics and philosophy are to each other can be seen just one heading down. And that is a conservative estimate of a maximum of 1.5 centimetres…
2) The Laplace Experiment
The Laplace experiment basically assumes that a random experiment has n outcomes and that all of these n outcomes occur with the same probability, i.e. with 1/n. This is also called the “ideal experiment”, and the associated space the “ideal probability space”.
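As a minimal sketch of this idealisation (and only of the idealisation, not of any real die), the following Python snippet assigns each of n outcomes the probability 1/n. The function name `laplace_distribution` is an assumption introduced here for illustration.

```python
from fractions import Fraction


def laplace_distribution(outcomes):
    """Assign each of the n outcomes the same probability 1/n (the ideal model)."""
    n = len(outcomes)
    return {outcome: Fraction(1, n) for outcome in outcomes}


# An idealised six-sided die.
die = laplace_distribution([1, 2, 3, 4, 5, 6])
print(die[6])             # probability assigned to rolling a 6
print(sum(die.values()))  # all probabilities together
```

Exact fractions are used so that the probabilities add up to exactly 1, which is precisely the kind of tidiness the ideal model promises.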
This is a pure idealisation. Not only does it never exist in practice, there is also very rarely any intention that it should. Let’s take the above example again: as soon as there is a bet on an event such as “I roll a 6” or, in roulette, “the next spin will produce a black number”, somebody intends that the event should not occur with the assumed probability; somebody wishes for it to occur, even if the interests conflict. One side wishes for the 6 or a black number, the other for the opposite. If you have a good lucky charm, if the horoscope promises you the maximum of five stars for today, if you were born under a lucky star anyway like Gustav Gans (Gladstone Gander), or if you have stumbled upon luck in some other way (“today is my lucky day”, which, according to legend, is supposed to bring bad luck in love at the same time), there is the possibility that you will “outdo” the opponent’s lucky charm or horoscope and shift the odds in your own favour.
So, in the end, these intentions may just, being opposite, have no real influence (“who has the better talisman?” seems a rather curious question to me), but at least they exist, and pretty much always do. When you play the lottery, you are assured that the machine has been “checked for proper condition”, yet there is still the intention on the other side that the odds are not 1/49 for all numbers. One picks one’s lucky numbers, for example. And why are the numbers called lucky numbers? Exactly…
For me, however, two other systems would suggest themselves. One is to play numbers that other people bet on less often (usually the numbers above 31, the highest possible day of the month). This does not (discernibly) affect my chances of getting these numbers right, but if I do get them right I receive a higher payout, since I presumably have to share with fewer players. This is commonly called “improving equity”.
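A rough sketch of this first system, under loudly stated assumptions: the jackpot amount and the numbers of co-winners below are invented for illustration, and the jackpot is assumed to be split evenly among all winners. Under the ideal model the chance of six correct numbers is the same for every combination; only the share differs.

```python
from math import comb

# Under the ideal model every combination of 6 out of 49 is equally likely.
combinations = comb(49, 6)
p_six = 1 / combinations

jackpot = 1_000_000  # hypothetical prize pool


def share(jackpot, co_winners):
    """Payout if the jackpot is split evenly with the given number of co-winners."""
    return jackpot / (1 + co_winners)


popular = share(jackpot, co_winners=4)    # hypothetical: "birthday" numbers, four others also won
unpopular = share(jackpot, co_winners=0)  # hypothetical: rarely played numbers, nobody to share with

print(combinations)
print(popular, unpopular)
print(p_six * popular, p_six * unpopular)  # same chance, different expected payout
```

The winning probability is untouched; the whole effect comes from the denominator of the share.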
The other system, however, which contrary to any conventional views seems “completely primitive” and “ill-considered” — since the mathematician is so fond of referring to “normal deviations” and his beloved “standard deviation”, which simply allows for a certain tolerance — I would still prefer: I go for the so-called “relative frequencies”. Why not? I explain this with an example:
Suppose a number has come up 264 times in the last 2000 draws (that is, after all, about 20 years at two draws per week). The simple calculation that a mathematician considers skilful, “I would have expected 2000*6/49 = 244.9 times”, yields a deviation of 19.1: the number has occurred that many times “too often”. In view of the small deviation, the mathematician tolerates this as lying within the customary tolerance of a standard deviation or two and thus, from his point of view, as “completely normal”. But this standard deviation is calculated on the basis of the absolutely unrealistic assumption that all numbers are “equally probable, and accordingly will occur equally often in the long term”. The deviation therefore refers to a (false) assumption. For me, there would be no deviation at all, since for me the relative frequency (rather) corresponds to the probability.
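The arithmetic of this example can be retraced in a few lines. The standard deviation below rests on an assumption that is not spelled out in the text: each draw is treated as an independent trial in which the number appears with probability 6/49 (a binomial model), which is exactly the ideal model being criticised here.

```python
import math

draws = 2000     # number of draws considered
p = 6 / 49       # ideal-model chance that a given number is among the six drawn
observed = 264   # how often the number actually came up

expected = draws * p                      # the mathematician's expectation
deviation = observed - expected           # how much "too often" the number came
std_dev = math.sqrt(draws * p * (1 - p))  # binomial standard deviation (assumed model)
relative_frequency = observed / draws     # the alternative estimate of the probability

print(round(expected, 1), round(deviation, 1))
print(round(std_dev, 1), round(deviation / std_dev, 1))
print(round(p, 4), round(relative_frequency, 4))
```

Under this model the deviation comes out at roughly 1.3 standard deviations, a size that conventional statistics would still regard as unremarkable. The relative frequency of 0.132 is simply a different estimate from the ideal 6/49 of about 0.1224.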
Although I was particularly attentive and willing to learn at this point in school and university, I see little evidence for the idea that there is an equal distribution. It is quite possible that this number is also just a “normal deviation”, it is even possible that this number of all numbers is (much) less probable than the others and still occurred too often in the time span. However, this statement itself is also much more improbable than the statement “the number comes with a higher probability than the others.” So I look at the relative frequencies and deduce which numbers occur with what probability. Since the mathematician promises me, in the worst case, that the numbers I then play “are also only equally probable”, I take no risk in any case. But only in the case that I would play anyway. Otherwise it is the same “risk” as for any other player. And that consists of a negative equity, an expectation of loss. You play a disadvantage game.
As a winning system, however, it will certainly not suffice to rely on relative frequencies. Especially since one would always have to reckon with such considerations – keyword: the world behaves so chaotically – that one would come across imitators who would immediately start playing the more frequently occurring numbers, following my recommendation, whereby they would also increase their chances of “winning at all” by following the strategy, but at the same time the payout ratio then achieved would already result in a more unfavourable ratio than with any other combination of numbers. As a result, the really overly clever reader might even have to play the less frequently occurring numbers in order to secure a winning advantage. He would have an even smaller probability of the six, but a much higher payout. The example is only meant to illustrate that the mathematics at this point is quite shaky in principle. But somehow nobody cares.
Using the Laplace space, the shrewd mathematician can make all kinds of theoretical calculations and keep patting himself on the back with satisfaction (while I, by the way, pat my own) because he believes he has now really understood the world. Alternatively, and this is what the majority do, he turns away from this indescribably treacherous probability calculus and does “real mathematics” from now on. I recall the feel-good situation of “we now prove that this statement is true” as a favourite pastime, and opposite it my statement that “probable” only means that something seems true: unprovable, which automatically triggers unease. So it is something to run away from.
There is no such thing as equal distribution at any time. “At no time” also means that one can wait until one knows more. So if the reader would like to play the lottery, he can wait for the Saturday horoscope. Is that supposed to hurt? One can also wait to see whether one inadvertently and quite unintentionally gets out of bed with the left leg first, and then refrain from playing. Alternatively, and there is even some evidence for this that is gladly concealed, you can wait in roulette until shortly before “rien ne va plus” and only then bet, in the hope of recognising from the course of the ball in which sector of the wheel it is likely to land, and then bet on that sector (which, by way of explanation, is actually possible with the “orphelins”, “small series” and “large series” bets). Any shift in the chances before the throw is in principle irrelevant to this player. Here, the time of betting is decisive. In any case, even with these considerations, and taking into account the never uniform intentions, nothing amounts to an equal distribution. Even when throwing dice, as shown above, there are intentions, times and methods of execution (hand, cup, surface). There is no equal distribution; it is an invention. According to Laplace, “an ideal”.
As a last component, it should be mentioned that Laplace himself was aware of this circumstance: he pointed out that if all relevant parameters were known, it might be possible to calculate the result of a single experiment exactly in advance. And even if, at the latest since Heisenberg (the indeterminacy of position and velocity for very small particles), this theory is more than questionable, it is nevertheless certain that partial predictability can be achieved. Suppose the reader tried to roll more 6s than other numbers in one afternoon, with a rich reward if he succeeded. Thanks to the additional motivation it would then be conceivable that, after several hundred rolls, all made on the same surface and in the same manner, from the cup for example, he would gradually realise that a 6 comes a little more often if he puts the die into the cup in this position and then rolls with such speed along that axis. If you are interested, you are welcome to try. But I wouldn’t want to promise a reward. You would have to turn to an ambitious and stubborn mathematician.
I will now devote myself to another curiosity: card games. There are quite funny and exciting considerations there. For example, people like to talk about a “well-mixed pack of cards”. The big question that comes up is: When is a pack of cards well mixed? When the cards are as mixed up as possible? Curious: If they really are very mixed up (which in itself is not a criterion, because it is indeterminable what should be mixed up), then the next question would still follow: Do you get cards in your hand? The goal of every game must be that cards are revealed at some point. If you do get some, are they dealt one at a time? Or, as is often the case in Skat, in a 3-4-3 rhythm? What if you deal them one by one, they are totally mixed up, but by dealing them that way they are sorted back? So you get better hands than you would expect and that’s just through the dealing routine?
Or maybe it’s the opposite: you want to shuffle well, but you don’t succeed (for example, when shuffling in your hand, which is common, but because the game is supposed to go on, impatience arises and you shuffle too short), you deal in sequence, one by one. Now the effect is that, for example, tricks from before are still together, i.e. cards that match each other, and because these are dealt individually, it is more difficult for a player to get them into his hand. The result: worse hands than could be expected. Pure chaos.
But the decisive aspect is yet to come: after the cards are shuffled, the order is fixed. This means, for example, in relation to the currently most played game Texas Hold’em Poker, that there is actually no probability at all that an ace will come next. Either there is an ace there or there is none. 1 or 0. Only you don’t know and perhaps have no chance of finding out.
To add a little more fuel to the fire of indignation that has long been blazing: there are also marked cards. Maybe you can even recognise the top card? Oh, the one with the dog-ear. Maybe you want to recognise it? Maybe you marked it when it was in your own hand, intentionally or accidentally? Maybe, and this is more than just a story, there are packs of cards whose backs can be read if you choose specially made contact lenses skilfully? Who is still talking about “well shuffled” and “1/52” for the ace of clubs now? Here even more than before: is there an intention?
Nevertheless, one could say here too, purely theoretically, that among decent people without recognisable intentions the equal distribution holds approximately, that is, that in a well-shuffled pack every card lies on top with a chance of 1/52. But now we come to…
3) to the probability spaces that cannot be determined exactly.
Up to this point, I have deliberately stayed with these seemingly simple examples. After all, they clearly promise a certain estimate of how probable the individual outcomes of such a random experiment are. The drawing of the lottery numbers is “monitored”. All right, I have mentioned the relative frequency as a better approximation of the probability of occurrence than the mindless 1/49, but I by no means claim that one could win with it. In roulette, too, the wheels are regularly checked and maintained. Indeed, as Mr Rudolf Taschner writes in his book “Zahl, Zeit, Zufall” (Number, Time, Chance), the organiser really does intend to make sure that everything is correct and fair and that the probabilities of occurrence are as accurate as possible. This is by no means a guarantee of equal distribution, but it is certainly a good approximation. The factor of timing plays a role there again (can you observe anything?), but otherwise I would not advise against playing the numbers that, according to the permanences (the records of past results kept at every roulette table), have come up more often than others. If you are going to play, then play those numbers. But I wouldn’t count on it being enough to win.
In private circles, the situation is somewhat different. Sometimes things are deliberately manipulated: people try to “pull the wool over the eyes” of their playing partners, for example (as has happened to me more than once) by shifting the odds in their own favour with manipulated gaming equipment. This has happened even in backgammon, and is perhaps not even a rarity. At the very least, every player wants to roll the better numbers. And that alone can have an influence. Who knows? Of course, the opponent may in turn wish bad rolls on you or good rolls for himself.
The bottom line is that in no random experiment, no matter how well conducted, is there an exact equal distribution. Both the timing of the assessment and the intentions of the participants play a role. But even if it is agreed that all participants are absolutely honest and sincere and want to carry out a purely random experiment – no one expects an advantage, no one tries to shift the odds even with the power of thought or other aids, no one even tries to include parameters that could be known, and even the time is set in such a way that there is no knowledge whatsoever about how it will be carried out, except that it should, by agreement, be “as fair as possible”, an equal distribution does not exist. After a certain number of executions of the experiment, one has only relative frequencies with which each outcome occurred. And these are possibly better indications than the statement, “each of the n outcomes occurs exactly with the probability 1/n”. Even if the statistician could explain to you afterwards that the deviation that occurred is “normal” or “tolerable” for him. Because the following applies here: Deviation from what? From 1/n? Who has given 1/n and why? THERE IS NO 1/n!
Since no random experiment, no matter how beautifully conducted, can deliver an “exactly determinable probability” — and please note here that the concept of probability contains something reckless, something not determinable through the use of the individual terms “seems” and “true”, i.e. it only seems true –, we can confidently move on to the events for which we allow even less exact than the above admittedly good approximations. And I will take a very simple example, which can already make a lot clear: The weather forecast. And even in this case I will limit myself to a single tiny example: the probability of rain.
Since one can collide with exact definitions here (as is demonstrably always the case), I will try to ignore them. But first, consider the presumed historical derivation of how probabilities were arrived at. I could imagine that a few hypercritical listeners tormented the meteorological institutes with calls. “You said it was going to rain. But it’s been sunshine all day in our area.” All right, the caller might still be able to cope with that. But the other way round? “You promised sunshine and it rained at our place.”
To get around this, it was decided at some point, in the awareness of occasionally giving wrong information, to simply state a probability. Any outraged caller can then be told, even if 99% sunshine was promised, that the remaining 1% can also occur. The attentive reader will even note, in the spirit of “Murphy’s law”, that every event, even one with a very small probability of occurrence, must happen at some point if you only try often enough. So an 80% probability of rain does not guarantee rain, and a 10% probability does not guarantee sunshine. The viewer or listener should please accept this. If the forecast is not enough for you: we simply cannot get any closer.
Now, I sidestep the exact definitions in the following way: one takes a certain place and a certain period of time, plus a certain minimum amount of rain that has to fall in that period, so that the statement “It rained” can be evaluated as “true” or “false”. I do this knowing that it is impossible to classify every event cleanly in this way. Consider: what if the amount of rain measured corresponds almost exactly to the required minimum? (Or would the reader be satisfied if someone stood exactly where the measurement was taken, stretched out his hands and at some point said: “There, I got a drop. You too? For me, in this experiment, the statement holds: it rained.”) So the accuracy of measurement stands in the way of exactness. It is also possible that it rains right at the end of the measurement period, just a few drops, and that the threshold is exceeded exactly then (“Hey, you measured for too long. That’s the only reason you can call the day a rainy day, even though it wasn’t.”). So we now pretend that none of these problems exist. Any period of time can be evaluated as “it rained” or “it didn’t rain”, “true” or “false”, according to the given criteria.
Nevertheless, one does not know these probabilities. And even the best meteorologist could not come closer to the truth than to give a probability, be it a large one. Now a few meteorologists are arguing about who is better at estimating the probability of rain. And for the simplest case, we even assume that we know an exact value. Only to observe what then happens numerically. To check the mathematical stability of the model, so to speak. And that is legitimate.
Yes, because now I’m moving on to the measurability of the quality of predictors. There is a mathematically sound method for expressing this in numbers. Of course, I have to say right away that the statements that can be teased out are not very different from other statistics: There are possibilities of interpretation and there is the possibility of misstatement. And, to make the confusion complete: There is also a probability for the “false statement” event. This is how the statistician helps himself.
4) Rain probability prediction checked with simple methods
All in all, there are two tasks here: One tries to approximate the estimation of the probabilities. Then one tries to check one’s results. When one has mastered these two tasks to one’s satisfaction, one can pass on to measuring one’s values on oneself – my method is also suitable for this – and even then to compare them with those of other prophets.
It is almost a joke, but in a certain way it stands up to serious mathematical scrutiny: if one says every day, to oneself or aloud, “It will rain or it will not rain”, then one is, mathematically speaking, very much in the right. It will or it won’t. One has more or less said “50-50” or “I don’t know”. But it is true. As we will see later, this even stands up to a statistical test. Of course one almost inevitably asks oneself whether one might not know a little better. Funnily enough, the 50-50 forecast does exactly what was doubted above: it is based on an absolutely equal distribution.
With 50-50, one forecasts exactly what was ruled out per se: every outcome is equally probable. One cannot or does not want to “commit oneself”. And it is precisely this concept, that of commitment, that we want to derive and make mathematically tangible. If a random experiment allowed no commitment, i.e. if all n outcomes really were equally probable, then one could not commit oneself at all; mathematically speaking, doing so would be an error, as will become clear later. But with every experiment conducted in practice one can, at least theoretically, commit oneself, as explained above. So the question is not whether one can commit oneself, but only how much. And the amount of possible commitment may even depend on the quality of the predictor, if we are talking about unknown, unverifiable, non-repeatable events. One sees…
We again have a small problem of timing, which we again benevolently ignore, or rather restrict. It would be mean, of course, to see a drop falling, so to speak, and say at that very moment: “The probability that it will rain is 99%.” So, because we are very serious and even want to give people an aid for their daily planning (we work “true to life”, like real meteorologists), we restrict ourselves: we give our estimate in the morning at the latest, for the period until 8 p.m. Will it rain or will it not?
Now there are farmers who look at the clouds the night before, and there are statistics for the last year or even the last 10 years. And there is a date we want to predict. The statistics for October tell us that 26% of days were rainy. On top of that, we consult our intuition, or the farmers, who say that yesterday’s cloudy sky indicated rain; we conscientiously tap our barometer once more and throw a number into the room: “Today (between 5 a.m. and 8 p.m.) the probability of rain (a minimum amount of 1 cubic millimetre, at this place) is 45%.” In the evening we note that it really has rained and virtually pat ourselves on the back. (The 50% man, however, laughs at us, because he was better. But that’s just by the way.)
We repeat this experiment. Yet “the experiment” is only a subsequent day, which in turn has completely different preconditions compared to the previous day. We can put our accumulated experience into it for this day, or even use the event “it rained the day before” as a basis. But this day, for which we have now predicted, is over. There is no repetition of this day on which we could verify our result in the long term. We are working “without a net and a double bottom”, so to speak. There is no chance to correct a misjudgement with the help of statistics. One prediction, one event, one result. Now comes the next one. What can we say now? First of all, we approach the problem practically, according to the requirements. We simply take the next day and forecast a bold 65%. It won’t rain. Why should it?
We repeat the experiment 100 times. We take 100 days for which we use yearly statistics, year-round statistics or even an intuitive forecast based on experience with sky observations, cloud or wind movements, low-pressure areas or whatever. (I was once in Rendsburg with a friend, in summery bathing weather at a lake, when we overheard a conversation between two older men. One of them said, in broad dialect: “My little finger, when it aches, there’s a thunderstorm coming. Monday, Tuesday or so, there’ll be a thunderstorm. That’s what I’m saying.”) Good, now we have our 100 days together. Now we want to tease out a result.
Two things still need to be considered:
a) In principle, the commitment refers to the favourite event that you have decided on or can decide on. On one day this can be the event “it rains”, on another the event “it does not rain”. The closer you get to 100% with the favourite event, the higher the commitment, mathematically speaking. But this also matches intuition.
b) When forecasting rain probabilities, it is conceivable that the possible commitment will come much closer to 100%. This would happen as weather stations get a better and better grip on the parameters responsible for rain; just to give an example, the possible commitment might be 70% today and reach 85% in 100 years. This experiment still has that potential: today we have not yet teased out of the relevant parameters everything they offer us. In this respect, however, it contrasts fundamentally with football. There, it is conceivable that even the best assessment and knowledge of all possible parameters will not allow a higher commitment. The reason: the game is a game of chance. The playing strengths of the teams cannot differ so much that a winner could be determined more clearly. So it is simply not possible to commit any more strongly.
The mathematician immediately discovered a simple method that has also been used for a long time. You add up all the probabilities and read the total as the number of rainy days predicted, because the sum represents nothing else. (Think about it for a moment: if, according to the forecast, it were to rain with 50% probability on each of two days, then we would get a sum of 1, i.e. one forecast rainy day, and intuitively one would expect that, wouldn’t one? It’s like tossing a coin. Toss twice, on average once heads, once tails.) Then we compare the sum of the predicted rainy days with the sum of the rainy days that actually occurred. And let’s assume that despite our two small “misjudgements” (who knows?) we expected 24.3 rainy days in the end (sum of the probabilities) and it really rained on 25 days. Then we pat ourselves on the back again and this time not even we alone: the mathematician joins in. “Well done.”
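As a minimal sketch of the sum check just described: the five forecasts and outcomes below are invented for illustration (only the first two, 45% with rain and 65% without, echo the days above), and the function names are hypothetical.

```python
# Hypothetical data: forecast probability of rain per day, and what happened.
forecasts = [0.45, 0.65, 0.10, 0.80, 0.25]
rained = [1, 0, 0, 1, 0]  # 1 = it rained, 0 = it stayed dry


def predicted_rainy_days(probabilities):
    """The sum of the daily rain probabilities is the forecast number of rainy days."""
    return sum(probabilities)


def observed_rainy_days(outcomes):
    """Count the days on which it actually rained."""
    return sum(outcomes)


predicted = predicted_rainy_days(forecasts)
observed = observed_rainy_days(rained)
print(f"predicted {predicted:.2f} rainy days, observed {observed}")
```

The check compares only two totals. It says nothing about which individual days were judged well.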
Here, however, are a few concerns: if one were to compare only the sum of the predicted hits with the number that occurred, then I would make it easy for myself next time: I would simply take the long-term average, or even that of the period under consideration, and forecast it permanently, for each day. My result would possibly even be better in total, although I have made no effort at all. I have taken an average value, relied on it, then “dangled my legs” and sat back, let the other prophet or prophets sweat over their figures, and I am also proven right by a certain form of statistics? It may be that I was a little lucky, but nevertheless I achieve a “reasonable value” without any effort. This intuitive reasoning can, of course, be made more concrete and later also be recorded with numbers. More concretely, it would be like this: If I write down a terse, unchecked “25%, as always in November” on a day, not taking into account the clouds and the falling barometer, and one of the experts writes down a 65%, taking these parameters into account, then he is almost certainly better in his estimate for that day. Surely he should get some kind of reward for that, some mathematical recognition? And he gets it. Wait a minute…
I would stick to my ridiculous form of “non-forecast” or “average forecast” even when, on another day, sweating just as much but once again better than me, he reduces his forecast to a 3% chance of rain, while I come up with the clumsy “25%” again. He’s better again, but I remind him: “Well, that’s how it all evens out.” I make mistake after mistake and in the end I’m ahead?
There must be a way to express this superiority. And there is. I will first briefly introduce the method here and then show a few examples with concrete figures.
The determination is calculated analogously to a normal expected value. One multiplies the individual possible outcomes by their probability of occurrence. Recall that to determine the expected value when rolling a die, one multiplies the outcomes by their probability of occurrence. So since we can roll any number with probability 1/6 (let us assume so; we ignore the above concerns), we multiply and add at the same time: 1/6 * 1 + 1/6 * 2 + 1/6 * 3 + 1/6 * 4 + 1/6 * 5 + 1/6 * 6. This gives 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 21/6 = 3.5. And note again that the expected value will only be reached as a long-term average, but never in a single roll. There is no 3.5 on the die. Yet we expect this number of pips, we can’t help it.
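The arithmetic for the die can be checked with a few lines of Python. This is only a sketch under the stated assumption of a fair die; the function name and the simulated rolls (with a fixed seed) are illustrative additions.

```python
import random
from fractions import Fraction


def expected_value(outcomes, probabilities):
    """Multiply each outcome by its probability and add everything up."""
    return sum(p * x for x, p in zip(outcomes, probabilities))


faces = [1, 2, 3, 4, 5, 6]
fair = [Fraction(1, 6)] * 6  # assumption: every face has probability 1/6

ev = expected_value(faces, fair)
print(ev, float(ev))  # 7/2 3.5

# No single roll shows 3.5, but the long-run average approaches it.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
long_run_average = sum(rolls) / len(rolls)
print(round(long_run_average, 2))
```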
Here is a numerical example of this, quite practical, as I pick up a die:
[Numerical example of throwing dice]
So the analogy now for the example of the probability of rain: We calculate the “expected average probability”. And this corresponds to the “determination” that has long been intuitively described. The calculation method is the same as for the dice. The statement we get is worth thinking about for a while and also worth checking. I maintain that this method has not yet been used in mathematics. The reader may decide on its meaning.
We multiply out by analogy and add up. Since we have here only one forecast each day and one execution of the experiment under the given circumstances, I must give a fictitious number. Let’s say we estimate the number to be 25% rain, 75% non-rain. Then we get 25% * 25% + 75% * 75% or 0.25 * 0.25 + 0.75 * 0.75. That would be 0.0625 + 0.5625 = 0.625. So we would expect to get a probability of 62.5% as that of the event occurring. I’m sure this will become more understandable in a moment. I’ll just briefly mention here that not fixing would give a 0.5 * 0.5 + 0.5 * 0.5 = 0.25 + 0.25 = 0.5. And in this case, if one compares the expected probability with the one that occurred, there is always agreement. One had expected 50% and 50% also occurred. I didn’t commit myself (I couldn’t even commit myself) and the result is that I did well. I reached my expected value exactly. It could hardly be better? The question remains, however, whether another person, set to the same event, could possibly tickle out a determination after all.
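The same calculation in code, as a small sketch. The function name `determination` is chosen here for illustration; it computes the sum of the squared probabilities described above.

```python
def determination(probabilities):
    """'Expected average probability': each probability weighted by itself, summed."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * p for p in probabilities)


print(determination([0.25, 0.75]))  # 0.625
print(determination([0.5, 0.5]))    # 0.5
```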
In the case of 1/n or, in this case, 50% determination, I am talking about the “minimum determination”, namely none at all. Everything is equally probable, everything can happen, that would be the philosophical translation of this thought. However, and this is the work I am doing now, so to speak, I am also doing it by determining the possible determination inherent in the experiment as precisely as possible.
In order to explain the expected average probability, which ultimately measures the determination, I must explain the above-mentioned example a little more. If we have now made the assessment 25% rain, 75% no rain, then we have committed to 62.5%. What this means can once again be seen best in the example. And – oh wonder – as always, the illustration works best if we remove the false bottom and once again look at an ideal case that does not exist in practice. So if you really had a probability distribution of 75% to 25% on the outcome of a random experiment, then this could be checked quite well.
Once again, it is worth making a small digression here: How do you generate a certain probability in a random experiment? I cannot mention often enough that it is only apparently exact, but nevertheless. We are theoretically looking for a split into a 1/4 chance and a 3/4 chance. How can we theoretically generate this? For any chance, no matter how absurd, I always immediately think of the example: a drum with red and white balls. You put in as many red balls as you want and then as many white balls as you want so that the ratio works out as exactly as possible. This way is quite reliable and – since it is only theoretical – also “feasible”. So, if you want to map a chance of 1/10 or 1/100 and achieve the best possible results in the experiment, in the first case put one red and 9 white ones in a drum, in the second one red and 99 white ones. Theoretically, the same questions arise as above (who has what intention? How is the experiment conducted? Can and does one side want to shift the odds? For example, the red one is sorted “all the way to the bottom”, so that he is guaranteed not to “find” it, etc.), but we remain theoretical and that’s where it works.
So for the 1/4 chance we could put 3 white balls and one red ball in a drum, assure each other of absolute honesty and sincerity and start drawing. The event “a red ball is drawn” then has the probability 1/4. If we were to carry out the experiment sufficiently often under the given conditions, then every mathematician would rub his hands again: After 100 times, about 25 times “true” and 75 times “false” will come out when evaluating the event. And if we then – now I’m finally intervening – add up all the probabilities of the event occurring, then we get a total of 6250. Who doesn’t believe it?
A white ball was drawn 75 times. We had quantified the probability of this event – as best we could – with 75%. So in the ideal case that we really drew “white” 75 times in 100 attempts, we would have noted this 75 as the probability of the event occurring each time. This results in a total of 75 * 75 = 5625. 25 times red was drawn, so that 25 times a 25 appears in the column for the probability of the event occurring. That means we would have to add up 25 * 25 + 75 * 75 = 625 + 5625 in the summation. And that gives the predicted 6250. So on average our expectation of 62.5% would have occurred as the probability of the event occurring. And that is nothing other than what an expected value promises us.
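The drum experiment can be sketched in code. The ideal case (exactly 75 white, 25 red) reproduces the 62.5%. The simulated draws are an added illustration with a fixed random seed, and the helper name is hypothetical.

```python
import random


def average_hit_probability(forecast, outcomes):
    """Average of the probability that had been assigned to each outcome that occurred."""
    return sum(forecast[o] for o in outcomes) / len(outcomes)


forecast = {"white": 0.75, "red": 0.25}

# Ideal case from the text: 75 white and 25 red in 100 draws.
ideal_draws = ["white"] * 75 + ["red"] * 25
ideal_average = average_hit_probability(forecast, ideal_draws)
print(ideal_average)  # 0.625

# Simulated drum with 3 white balls and 1 red ball, drawn with replacement.
rng = random.Random(1)
simulated_draws = rng.choices(["white", "red"], weights=[3, 1], k=100_000)
simulated_average = average_hit_probability(forecast, simulated_draws)
print(round(simulated_average, 3))
```

In the simulation the average will not hit 62.5% exactly, but with many draws it should settle close to it.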
The fact that the mathematician’s usual test method is just as effective here has so far prevented my method from gaining any practical significance at all, or rather it has stood in the way of its being discovered at all. The mathematician conventionally checks the number of predicted hits and would get a very simple result: I expected to draw red 25 times and white 75 times. That is what happened. Why check any further?
I have a rationale for further checking, and have already provided it above: In practice, we forecast non-repeatable events, all of which have an unknown and unprovable, untestable probability of occurrence. Since, as also mentioned above, every experiment carried out in practice is such an event, any event would, in practical terms, lend itself to testing predictors against each other.
So if someone claimed to be able to tell from a dice thrower’s throwing axis which number will occur with which altered probability, he can work with the method given. He has new preconditions with every throw and is allowed to give his assessment afresh each time. After several rolls, one can then put his prediction, which in this case necessarily deviates from the equal distribution, “through its paces”. Has he achieved the determination he assumed in the long run? I will give examples later. The only thing that is certain is that with 1/n, i.e. with 1/6 for each number when rolling a die, one has no determination, or rather the minimum determination. The more certain one becomes with a number, with a prediction, the higher the value of the determination becomes, as a measurable quantity, via the sum of the squares of the individual probabilities (with 75 : 25 we likewise only squared the individual probabilities and added them up).
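One way such a test could look, as a sketch: for a series of throws, compare the determination the predictor assumed (average sum of squares) with what he achieved (average probability he had given to the face that actually came up). The four forecasts and outcomes below are invented, and the function names are hypothetical.

```python
def assumed_determination(forecast):
    """Sum of the squared probabilities of a single forecast."""
    return sum(p * p for p in forecast)


def evaluate(forecasts, outcomes):
    """Return (assumed, achieved) averaged over a series of forecasts.

    forecasts: one list of six face probabilities per throw
    outcomes:  index (0..5) of the face that actually came up on each throw
    """
    n = len(forecasts)
    assumed = sum(assumed_determination(f) for f in forecasts) / n
    achieved = sum(f[o] for f, o in zip(forecasts, outcomes)) / n
    return assumed, achieved


# Hypothetical predictor who tilts the probabilities depending on the throw.
favours_one = [0.25, 0.15, 0.15, 0.15, 0.15, 0.15]
favours_six = [0.10, 0.10, 0.10, 0.10, 0.10, 0.50]
forecasts = [favours_one, favours_one, favours_six, favours_six]
outcomes = [0, 3, 5, 2]  # faces 1, 4, 6, 3 came up

assumed, achieved = evaluate(forecasts, outcomes)
print(round(assumed, 4), round(achieved, 4))  # 0.2375 0.25
```

Four throws are far too few to judge anyone. The point is only the bookkeeping: both numbers are averages that can be compared once enough throws have accumulated.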
5) Forecasting sporting events, especially football matches
The events I have forecast for a living over the years have been football matches. I decided on a particular fix for each match, simply by giving the individual probabilities for 1 – X – 2. If a single match, such as Bayern – Bochum, allows a higher fix, then be my guest. If one estimates the victory of Bayern as 80% (but who doesn’t remember the 3:3 last season?), then note an 80% for a Bayern victory. One will certainly achieve a higher value in the determination than the one who notes 70% for victory. You are welcome to check that: The higher the chance of the favourite event, the higher the sum of the squares of the individual probabilities. 80% : 15% : 5% is 0.8 * 0.8 + 0.15 * 0.15 + 0.05 * 0.05 = 0.64 + 0.0225 + 0.0025 = 0.665 (thanks, Excel!). At 70% : 20% : 10% it becomes 0.7 * 0.7 + 0.2 * 0.2 + 0.1 * 0.1 = 0.49 + 0.04 + 0.01 = 0.54. And 0.54 is really considerably smaller than 0.665. Whoever is closer to reality in the long run will be ahead later. And not only financially.
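For those without Excel at hand, the two football figures can be checked in a few lines. This is a sketch; the helper name is illustrative and the probabilities are the ones given in the text, in the order 1, X, 2.

```python
def determination(probabilities):
    """Sum of the squares of the individual probabilities (1, X, 2)."""
    return sum(p * p for p in probabilities)


bayern_80 = determination([0.80, 0.15, 0.05])
bayern_70 = determination([0.70, 0.20, 0.10])

print(round(bayern_80, 4), round(bayern_70, 4))  # 0.665 0.54
```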
But if you take a balanced game, which in principle allows “no commitment” — the lottery player speaks of a “three-way game” — then you should not commit yourself either. All outcomes are more or less equally likely, you don’t know any better, the game is simply balanced, then you should also note down these probabilities and not try to predict something that you can’t do. Mathematically, such behaviour would be punished with the help of this method.
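How the punishment could show up in numbers is sketched below under an explicit assumption: that the game really is balanced (1/3 each) and that “punished” means falling short of one's own claimed determination in the long run. The overcommitted forecast and the function names are invented for illustration.

```python
def claimed_determination(forecast):
    """What the predictor expects to achieve: the sum of his squared probabilities."""
    return sum(q * q for q in forecast)


def expected_achieved(true_probs, forecast):
    """Long-run average probability given to the outcome that occurs,
    assuming outcomes really follow true_probs."""
    return sum(p * q for p, q in zip(true_probs, forecast))


balanced = [1 / 3, 1 / 3, 1 / 3]    # assumed true chances for 1, X, 2
overcommitted = [0.60, 0.25, 0.15]  # hypothetical forecast that commits too much

for name, forecast in [("balanced", balanced), ("overcommitted", overcommitted)]:
    claimed = claimed_determination(forecast)
    achieved = expected_achieved(balanced, forecast)
    print(f"{name}: claimed {claimed:.3f}, achieved {achieved:.3f}, "
          f"shortfall {claimed - achieved:.3f}")
```

Under these assumptions the balanced forecast claims 1/3 and achieves 1/3, while the overcommitted one claims 0.445 and also achieves only 1/3. The gap between claim and result is what exposes the unjustified commitment.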