As usual with competition/pot sims, I used ClubElo's expected goals formula to run 10000 simulations.

First the average points in the group stage:

France 6.880
Switzerland 4.193
Romania 3.557
Albania 2.086
-------------------------
England 5.958
Russia 3.963
Slovakia 3.645
Wales 3.014
-------------------------
Germany 6.841
Ukraine 4.201
Poland 3.752
Northern Ireland 1.942
-------------------------
Spain 5.877
Croatia 3.780
Turkey 3.642
Czech Republic 3.208
-------------------------
Belgium 5.081
Italy 4.589
Republic of Ireland 3.514
Sweden 3.273
-------------------------
Portugal 5.746
Austria 4.420
Hungary 3.297
Iceland 3.067

Eliminated in the group stage:

72.37% - Northern Ireland

69.7% - Albania

50.66% - Wales

49.48% - Iceland

47.1% - Czech Republic

46.04% - Hungary

45.41% - Sweden

41.24% - Republic of Ireland

38.88% - Turkey

38.53% - Slovakia

37.78% - Romania

36.72% - Croatia

34.1% - Poland

33.65% - Russia

26.86% - Switzerland

25.84% - Austria

25.78% - Ukraine

24.43% - Italy

18.17% - Belgium

11.04% - Portugal

10.35% - Spain

9.27% - England

3.41% - France

3.19% - Germany

Eliminated in the Round of 16:

40.4% - Austria

39.73% - Switzerland

38.7% - Ukraine

37.54% - Poland

37.44% - Romania

37.11% - Slovakia

37% - Russia

36.92% - Portugal

36.57% - Italy

36.46% - Republic of Ireland

36.35% - Belgium

35.86% - Sweden

35.76% - Croatia

35.04% - Iceland

34.8% - Hungary

34.78% - Turkey

32.93% - Wales

32.68% - Czech Republic

30.44% - England

27.24% - Spain

23.14% - Albania

21.91% - Northern Ireland

21.13% - France

20.07% - Germany

Eliminated in the quarterfinals:

25.37% - England

25.16% - Portugal

23.23% - Germany

23.16% - Spain

21.74% - Belgium

21.34% - Italy

20.88% - Ukraine

20.79% - Austria

19.13% - Russia

18.85% - France

18.5% - Switzerland

17.59% - Poland

16.35% - Slovakia

15.66% - Croatia

15.25% - Romania

15.09% - Turkey

13.11% - Hungary

12.72% - Republic of Ireland

12.03% - Czech Republic

11.72% - Sweden

11.29% - Wales

11.17% - Iceland

5.57% - Albania

4.3% - Northern Ireland

Eliminated in the semifinals:

21.72% - France

19.25% - Germany

16.12% - England

14.95% - Spain

13.31% - Portugal

11.36% - Belgium

10.01% - Italy

8.37% - Ukraine

8.16% - Switzerland

8.13% - Austria

7.49% - Poland

6.77% - Turkey

6.67% - Croatia

6.49% - Russia

6.4% - Republic of Ireland

5.75% - Slovakia

5.63% - Romania

5.16% - Czech Republic

4.43% - Hungary

4.41% - Sweden

4.13% - Wales

3.08% - Iceland

1.11% - Albania

1.1% - Northern Ireland

Losing finalist:

12.89% - France

11.78% - Germany

11.1% - Spain

9.61% - England

7.57% - Portugal

6.26% - Belgium

4.66% - Switzerland

4.47% - Italy

4.2% - Ukraine

3.09% - Croatia

3.02% - Austria

2.98% - Romania

2.55% - Turkey

2.47% - Poland

2.33% - Russia

2.06% - Republic of Ireland

2.06% - Czech Republic

1.74% - Sweden

1.55% - Slovakia

1.19% - Hungary

1.03% - Iceland

0.76% - Wales

0.42% - Albania

0.21% - Northern Ireland

Winner:

22.48% - Germany

22% - France

13.2% - Spain

9.19% - England

6.12% - Belgium

6% - Portugal

3.18% - Italy

2.1% - Croatia

2.09% - Switzerland

2.07% - Ukraine

1.93% - Turkey

1.82% - Austria

1.4% - Russia

1.12% - Republic of Ireland

0.97% - Czech Republic

0.92% - Romania

0.86% - Sweden

0.81% - Poland

0.71% - Slovakia

0.43% - Hungary

0.23% - Wales

0.2% - Iceland

0.11% - Northern Ireland

0.06% - Albania

About me:

*Christian, husband, father x 3, programmer (with CodeSoftware.Net since 2013) and Covenant Eyes user. You can find me on Twitter (@FootballRanks) and/or LinkedIn. More info in the Contact / Questions page.*

Very interesting! Thanks for sharing.

I'm not sure I agree with England being 4th favorites, as the case can be made that Italy, Belgium, and Portugal are all better teams. But of course the luck of the draw & knockout round bracket structure play a huge role in this.

I have a question that is not connected to this post, but I don't know where else to ask.

So I was wondering if Ed, Edgar and you other guys could help me. There is a club X that for 6-7 years has had unusually bad performance against club Y in a national league. I'm talking about 0W 1D 15L type of performance. Club Y is of much better quality than club X, but X is still in the upper middle class of that league, so such a bad record is not something that would normally be expected. Clubs of lower quality than X had much better performance against Y. So I would like to measure how much this record deviates from what would've normally been expected. How can I calculate and/or simulate what percentage of points would've been normal (expected) for club X to obtain against club Y over the last 6-7 years?

Hey nogomet, do you wanna try to mathematically prove an existing match-fixing scheme in 1. HNL :D

But seriously: what you need is the expected chance that team X wins/draws/loses against team Y, given the strength of both teams at the moment the match is played. The simplest indicator for this chance is the win expectancy calculated from the elo ratings of the teams. For NT-football you have the well-known elo rating. For domestic league club football you have an equivalent ClubElo rating.

In both systems the win expectancy for the home team is calculated using a relatively simple formula which takes into account the rating difference between both teams at the time the match is played and a home field advantage factor. It results in a number between 0 and 1 which indicates the probability that the home team wins. For NT-football I've established boundaries for this win expectancy (We) based on an extensive sample of matches in my database:

if We < 0.391 then the home team loses;

if We >= 0.391 and We <= 0.609 then it's a draw;

if We > 0.609 then the home team wins.

I use these boundaries to predict the results of scheduled NT-matches and subsequently predict future FIFA rankings.
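A minimal Python sketch of these two steps. The win expectancy below is the standard Elo expression, which is assumed to be the "relatively simple formula" meant here; the 100-point home field advantage is the NT-elo value quoted later in this thread. The function names are illustrative only.

```python
def elo_win_expectancy(elo_home, elo_away, hfa=100.0):
    """Standard Elo win expectancy for the home (first-named) team.

    hfa is the home field advantage in rating points (assumed 100 here);
    pass hfa=0 for a match on neutral ground.
    """
    dr = elo_home + hfa - elo_away
    return 1.0 / (10.0 ** (-dr / 400.0) + 1.0)


def predict_result(we):
    """Map a home win expectancy to a predicted result using the boundaries above."""
    if we < 0.391:
        return "home loss"
    if we <= 0.609:
        return "draw"
    return "home win"


# Ratings taken from the worked example further down in this thread.
we = elo_win_expectancy(1368, 1537)
print(round(we, 3), predict_result(we))
```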

I'm not really familiar with the clubelo system. I see that it calculates the same home team win expectancy, although the home field advantage factor seems a bit more complicated to calculate than in NT-elo. So 'all you need' is the clubelo ratings of the teams involved at the time each match you are interested in was played. Then you can determine the expected result of each match and compare that with the realised outcome of the match. With a sample of league matches over a substantial period of time, you should be able to draw sound conclusions about whether a club structurally over- or underperforms against one other club.

It is a challenging calculation exercise, but I would be very interested in your conclusions.... Of course only if you also give the names of the teams you're investigating :)

Thanks Ed. I would like to analyse all matchups in a certain league over the last 7 years and see whether some matchups stand out and substantially deviate from what would've normally been expected given the relative strengths of the clubs concerned. ClubElo publishes these probabilities for each match going back many years, so these data are not a problem to gather. What is a problem for me, since I'm not that good at calculus and probabilities, is combining all these individual probabilities for each individual match into an estimated expected number of points for each league matchup over the analyzed period. I have a strong suspicion about something, and I would like to prove it mathematically and write a paper about it. But I need help with the calculus. I can give you all the details over email if you're interested and maybe we can write a paper together.
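One possible way to combine the per-match probabilities, sketched in Python. It assumes the matches are independent and that win/draw/loss probabilities are available for each match; the probabilities used below are made-up placeholders, not real ClubElo values, and the function names are illustrative. Expected points are simply the sum of 3*P(win) + 1*P(draw) over the matches, and the exact distribution of total points shows how unlikely an observed record is.

```python
def expected_points(matches):
    """Sum of per-match expected points for club X: 3*P(win) + 1*P(draw)."""
    return sum(3 * p_win + p_draw for p_win, p_draw, _ in matches)


def points_distribution(matches):
    """Exact distribution of total points, assuming independent matches."""
    dist = {0: 1.0}
    for p_win, p_draw, p_loss in matches:
        updated = {}
        for points, prob in dist.items():
            for gain, p in ((3, p_win), (1, p_draw), (0, p_loss)):
                updated[points + gain] = updated.get(points + gain, 0.0) + prob * p
        dist = updated
    return dist


def prob_at_most(matches, observed_points):
    """Probability of collecting observed_points or fewer over all matches."""
    dist = points_distribution(matches)
    return sum(p for points, p in dist.items() if points <= observed_points)


# Hypothetical input: 16 matches, each with placeholder probabilities
# (win, draw, loss) for club X. Replace with the published per-match values.
matches = [(0.20, 0.25, 0.55)] * 16
print(expected_points(matches))
print(prob_at_most(matches, 1))  # chance of 1 point or fewer, e.g. 0W 1D 15L
```

A very small tail probability would indicate that the record is hard to explain by chance alone under these assumptions; it does not by itself say why.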

As it happens I'm sort of an expert in handling and statistically analyzing big data sets. I would like to help you in this particular case.

If you like you can send an empty e-mail to Edgar (see his contact page for his e-mail). He'll forward it to me and then I will contact you. Sorry for the workaround, but I'd rather not give my e-mail address in public.

Sounds good.

Hi,

Are you just simulating a result or actual scorelines? I've been modelling the Euros using Poisson to determine probabilities of each result within a game and then a random number to determine the result. 10k sims. The offensive and defensive exG 'power rankings' for each game I have implied based on market odds [goal seeked so that result (not scoreline) for each game is equal to the vig-free market odds]. However, the results I'm getting are quite far off what I would expect from an overall perspective (not enough wins for big favourites). Your link for the exG above doesn't work, so I'm interested to know how you are turning elo into results. For my purposes I require actual scorelines.

Thanks

Sorry - when I say overall results, I mean tournament wins. I'm happy with the win %'s for group games (agreed to market) and the knockout rounds seem reasonable.

Actual scorelines. I've changed the link with a working one. The clubelo site has been updated.

Thanks Ed - how have you amended those formulae for matches played on a neutral field? And have you given France the full benefits of a home field advantage?

Yes, and yes, and I'm not Ed, although he is a good old chap!

Haha oh OK, sorry! How have you amended the formulae for neutral field?

To complicate things, I will answer you :)

The actual scorelines are simulated, based on a probability distribution for goals scored for each team in a match (see Edgar's link for an explanation). This probability distribution is dependent on the elo win expectancy for each team in the match. In this elo win expectancy a home field advantage of 100 points is incorporated (see here for an explanation of the elo ratings).

When a match is played on a neutral field the home field advantage is just not added for the home team. And yes, France enjoys in the simulations for the coming EUROs the full home field advantage factor.
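A rough sketch of how one scoreline could be drawn once the two expected goal means are known. This is not the blog's actual simulation code: the sampler below is Knuth's textbook Poisson method, chosen only to keep the example free of dependencies, and the two means are the ones from the worked example further down in this thread.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed integer with the given mean (Knuth's method)."""
    threshold = math.exp(-mean)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def simulate_scoreline(mean_home, mean_away, rng):
    """Simulate one match as two independent Poisson draws."""
    return sample_poisson(mean_home, rng), sample_poisson(mean_away, rng)


rng = random.Random(2016)  # fixed seed so the run is repeatable
scores = [simulate_scoreline(1.184, 1.377, rng) for _ in range(10000)]
home_win_share = sum(h > a for h, a in scores) / len(scores)
print(scores[:5], home_win_share)
```

With 10000 draws the share of home wins should land close to the exact probability obtained by summing the Poisson probabilities directly.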

Thanks a lot Ed & Edgar! I'm sure I'll be back with more questions once I've had a chance to play around with this. I'm very interested to see how it compares to my model, which, when market lines are applied, should give an indication of where the market deviates from elo. I'm also interested to see which is a better historical predictor of results but one step at a time!

The link states that the expected number of goals for each team is:

Goals for the Home team:

if Proba < 0.5: Home Goals = 0.2 + 1.1*sqrt(Proba/0.5)

else: Home Goals = 1.69 / (1.12*sqrt(2 -Proba/0.5)+0.18)

Goals for the Away team:

if Proba < 0.8: Away goals = -0.96 + 1/(0.1+0.44*sqrt((Proba+0.1)/0.9))

else: Away goals = 0.72*sqrt((1 - Proba)/0.3)+0.3

"Proba" is the probability (winning expectancy) from the Elo formula, ranging from 0 to 1; "sqrt" is the square root.

I think I need the neutral field version of this formula for Poisson, unless I'm missing something!
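For reference, the formulas quoted above transcribe directly into Python as follows. The function names are my own; the constants are exactly those quoted.

```python
import math


def expected_home_goals(proba):
    """Expected goals for the home team, given the home win expectancy."""
    if proba < 0.5:
        return 0.2 + 1.1 * math.sqrt(proba / 0.5)
    return 1.69 / (1.12 * math.sqrt(2 - proba / 0.5) + 0.18)


def expected_away_goals(proba):
    """Expected goals for the away team, given the HOME win expectancy."""
    if proba < 0.8:
        return -0.96 + 1 / (0.1 + 0.44 * math.sqrt((proba + 0.1) / 0.9))
    return 0.72 * math.sqrt((1 - proba) / 0.3) + 0.3


print(expected_home_goals(0.5), expected_away_goals(0.5))
```

Note that both functions take the same input, the home team's win expectancy, and that the two branches of the home formula meet at 1.3 goals when Proba = 0.5.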

The formulas clubelo gives calculate the expected mean of goals scored by the home team (meanH) and the expected mean of goals scored by the away team in a match (meanA).

The number of goals the home team scores is poisson distributed with Lambda equal to meanH, so the chance the home team scores 0 goals = meanH^0*EXP^(-meanH)/FAC(0), scores 1 goal = meanH^1*EXP^(-meanH)/FAC(1), scores 2 goals = meanH^2*EXP^(-meanH)/FAC(2) etc.

The same for the away team: the chance the away team scores 0 goals = meanA^0*EXP^(-meanA)/FAC(0), scores 1 goal = meanA^1*EXP^(-meanA)/FAC(1), scores 2 goals = meanA^2*EXP^(-meanA)/FAC(2) etc.

btw: ^: power, EXP: Euler's number (2,71828...) and FAC: factorial.
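The same expression written as a small Python function, using only the standard library. The mean of 1.184 is just an illustrative value for meanH.

```python
import math


def poisson_pmf(goals, mean):
    """Probability of scoring exactly `goals` goals: mean^k * e^(-mean) / k!"""
    return mean ** goals * math.exp(-mean) / math.factorial(goals)


for goals in range(5):
    print(goals, round(poisson_pmf(goals, 1.184), 3))
```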

The expected mean is only dependent on Proba, the elo win expectancy for the home team. Now when a match is played on a neutral field there is no home team, so the elo win expectancy of the 'home' team (or the first mentioned team if you like) is then calculated without the home field advantage factor of 100 points.

Example: team1 and team2 play a match on the field of team1; Elo rating team1 = 1368; Elo rating team2 = 1537

Elo win expectancy is 0,402 (home field advantage for team1 included)

meanH = 1,184; meanA = 1,377

Probability that team1 scores

0 goals = 0,306

1 goal = 0,362

2 goals = 0,215

3 goals = 0,085

4 goals = 0,025

etc.

Probability that team2 scores

0 goals = 0,252

1 goal = 0,347

2 goals = 0,239

3 goals = 0,110

4 goals = 0,038

etc.

If you sum all probabilities that team1 scores more goals than team2 then you will find that team1 has 32,2% chance to win. Sum all probabilities that team2 scores more goals than team1 and you will see that team2 has 41,3% chance to win. There's a 26,5% chance that it will end in a draw.

So if the same match is played on a neutral field, the elo win expectancy for team1 is only 0,274 (home field advantage for team1 no longer included). meanH = 1,008; meanA = 1,657. After the same set of calculations you will find that team1 now has 23,0% chance to win, team2 has 52,6% chance to win and there's a 24,4% chance the teams tie.
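The summation described above can be sketched as a double loop over scorelines, assuming the two teams' goal counts are independent. Capping the loop at 20 goals per team is my own choice; the probability mass beyond that is negligible for means of this size.

```python
import math


def poisson_pmf(goals, mean):
    return mean ** goals * math.exp(-mean) / math.factorial(goals)


def outcome_probabilities(mean1, mean2, max_goals=20):
    """Return (P(team1 wins), P(draw), P(team2 wins)) for independent Poisson scores."""
    win1 = draw = win2 = 0.0
    for g1 in range(max_goals + 1):
        for g2 in range(max_goals + 1):
            p = poisson_pmf(g1, mean1) * poisson_pmf(g2, mean2)
            if g1 > g2:
                win1 += p
            elif g1 == g2:
                draw += p
            else:
                win2 += p
    return win1, draw, win2


# Means from the example above: team1 at home, then the neutral-field case.
print([round(p, 3) for p in outcome_probabilities(1.184, 1.377)])
print([round(p, 3) for p in outcome_probabilities(1.008, 1.657)])
```

One observation from redoing the arithmetic: the quoted means (1,184 / 1,377 and 1,008 / 1,657) are reproduced when the win expectancy is first rounded to 0,40 and 0,27, so feeding the unrounded 0,402 and 0,274 into the goal formulas gives slightly different values.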

Thanks Ed, I think that I understand all of that but are we not still using different formulae to calculate the meanH and meanA even though the match is played on a neutral field?

To use your example, let's start with a scenario where team A has HFA:

Team A has an elo of 1368+100=1468

Team B has an elo of 1537

Therefore Team A win expectancy is 40.2%, as you mention.

In order to then calculate meanH and meanA as you have above, we use two different formulae:

For the Home team: =IF(T4<0.5,0.2+(1.1*(SQRT(T4/0.5))),1.69/(1.12*(SQRT(2-(T4/0.5)))+0.18))

For the Away team: =IF(T4<0.8,-0.96+(1/(0.1+(0.44*SQRT((T4+0.1)/0.9)))),0.72*SQRT((1-T4)/0.3)+0.3)

T4 = 40.2%.

I then create a Poisson distribution as you have, and I get the same results.

Now, if we want to calculate meanH and meanA for the same match on a neutral field:

Team A has an elo of 1368

Team B has an elo of 1537

Therefore Team A win expectancy is 27.4%, as you mention.

In order to then calculate meanH and meanA as you have above, we use two different formulae:

For the Home team: =IF(T4<0.5,0.2+(1.1*(SQRT(T4/0.5))),1.69/(1.12*(SQRT(2-(T4/0.5)))+0.18))

For the Away team: =IF(T4<0.8,-0.96+(1/(0.1+(0.44*SQRT((T4+0.1)/0.9)))),0.72*SQRT((1-T4)/0.3)+0.3)

T4 = 27.4%.

I then create a Poisson distribution as you have, and I get the same results.

However, this methodology will yield different results on a neutral field depending on which team we classify as being the 'home' team as we are still applying a slight home field advantage to team A by using a different formula. For example, if we swap the two teams around:

Team A has an elo of 1537

Team B has an elo of 1368

Therefore Team A win expectancy is 72.6% (fine so far)

In order to then calculate meanH and meanA as you have above, we use two different formulae:

For the Home team: =IF(T4<0.5,0.2+(1.1*(SQRT(T4/0.5))),1.69/(1.12*(SQRT(2-(T4/0.5)))+0.18))

For the Away team: =IF(T4<0.8,-0.96+(1/(0.1+(0.44*SQRT((T4+0.1)/0.9)))),0.72*SQRT((1-T4)/0.3)+0.3)

T4 = 72.6%.

This yields the following %'s:

Team A win: 54.2%

Draw: 24.3%

Team B win: 21.5%

Compared to the figures calculated with team B as the favourite on a neutral field (as you did), you can see that the win % of the favourite has further increased, from 52.6% to 54.2%.

Apologies if I have misunderstood anything.

Absolutely no need to apologize. You got a valid point here.

As one would expect, the elo win expectancies are exactly complementary when the order of team1 and team2 is swapped (1 - 27,4% = 72,6%). Still, the clubelo formula slightly favours the 'home'/first-mentioned team with regard to mean goals scored and thus win percentages. Well researched and discovered. Thanks, Anonymous!

So a warning with regard to using the clubelo formulas is appropriate: for matches on neutral ground, the win percentages of each team depend on which team is mentioned first. And that's counter-intuitive. Effectively the clubelo mean goals formula can't be used for matches on neutral ground, only as an indication. Luckily almost all NT-matches for official competitions are played on a true home/away basis.
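The order dependence can be checked with a short, self-contained Python sketch. It assumes the standard Elo win expectancy with the home field advantage set to 0, the goal formulas as quoted earlier in the thread, and independent Poisson scores; all function names are illustrative.

```python
import math


def elo_win_expectancy(elo_first, elo_second, hfa=0.0):
    dr = elo_first + hfa - elo_second
    return 1.0 / (10.0 ** (-dr / 400.0) + 1.0)


def mean_goals(proba):
    """Expected goals (first-named team, second-named team) from the quoted formulas."""
    if proba < 0.5:
        first = 0.2 + 1.1 * math.sqrt(proba / 0.5)
    else:
        first = 1.69 / (1.12 * math.sqrt(2 - proba / 0.5) + 0.18)
    if proba < 0.8:
        second = -0.96 + 1 / (0.1 + 0.44 * math.sqrt((proba + 0.1) / 0.9))
    else:
        second = 0.72 * math.sqrt((1 - proba) / 0.3) + 0.3
    return first, second


def outcome_probabilities(mean1, mean2, max_goals=20):
    pmf = lambda k, m: m ** k * math.exp(-m) / math.factorial(k)
    win1 = draw = win2 = 0.0
    for g1 in range(max_goals + 1):
        for g2 in range(max_goals + 1):
            p = pmf(g1, mean1) * pmf(g2, mean2)
            if g1 > g2:
                win1 += p
            elif g1 == g2:
                draw += p
            else:
                win2 += p
    return win1, draw, win2


def neutral_outcome(elo_first, elo_second):
    proba = elo_win_expectancy(elo_first, elo_second, hfa=0.0)
    return outcome_probabilities(*mean_goals(proba))


underdog_first = neutral_outcome(1368, 1537)   # favourite's win chance is the last value
favourite_first = neutral_outcome(1537, 1368)  # favourite's win chance is the first value
print(round(underdog_first[2], 3), round(favourite_first[0], 3))
```

The favourite's win probability comes out higher when it is listed first, in line with the 54,2% versus 52,6% found above; the exact figures differ a little from the quoted ones because this sketch does not round the win expectancy before applying the goal formulas.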

Time to contact the boy(s) and/or girl(s) at clubelo. Edgar, what is your point of view on this and do you have a good contact at clubelo ?

For now, I'm just using the mean of the two as follows:

Team A: Elo 1575 Win % 23.2

Team B: Elo 1783 Win % 76.8

meanA_1:

=IF(T4<0.5,0.2+(1.1*(SQRT(T4/0.5))),1.69/(1.12*(SQRT(2-(T4/0.5)))+0.18)) = 0.949

meanA_2:

=IF(T5<0.8,-0.96+(1/(0.1+(0.44*SQRT((T5+0.1)/0.9)))),0.72*SQRT((1-T5)/0.3)+0.3) = 0.919

=average(meanA_1, meanA_2)

= 0.934

meanB_1:

=IF(T5<0.5,0.2+(1.1*(SQRT(T5/0.5))),1.69/(1.12*(SQRT(2-(T5/0.5)))+0.18)) = 1.792

meanB_2:

=IF(T4<0.8,-0.96+(1/(0.1+(0.44*SQRT((T4+0.1)/0.9)))),0.72*SQRT((1-T4)/0.3)+0.3) = 1.763

=average(meanB_1, meanB_2)

= 1.778

I'm sure this isn't accurate but it should provide a decent fix for now. In order to get an accurate formula I guess we would have to:

A. Use only results from matches played on a neutral field which severely limits sample size and probably isn't a good idea:

B. Estimate the effect of HFA (I think the clubelo guys have already done this) and effectively remove this effect from the curves

Anyway, time to run some sims!

T4 = 23.2%

T5 = 76.8%

When I contacted Lars Schiefler (owner of clubelo.com) in April 2013 about the formula, I also asked about neutral venue games. This was his answer:

good question. At the moment I take team 1 as home team and team 2 as away team and set the home field advantage for that match to 0. This is not optimal as the curves for home and away goals are not symmetrical. However, there are so few neutral ground games in club football that it does not matter too much for my purpose.

Nevertheless, I will come up with something more sound in the future including neutral ground matches. For the moment I suggest you just mirror the 2 curves one on another and take the average.

And that's what I've been using for neutral venue games.
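A sketch of the "mirror and average" idea in Python, following the spreadsheet steps shown a few comments above: each team's mean is the average of the home curve evaluated at its own win expectancy and the away curve evaluated at the opponent's. The goal formulas are the ones quoted earlier in the thread; `neutral_means` is an illustrative name, not something from clubelo.

```python
import math


def expected_home_goals(proba):
    if proba < 0.5:
        return 0.2 + 1.1 * math.sqrt(proba / 0.5)
    return 1.69 / (1.12 * math.sqrt(2 - proba / 0.5) + 0.18)


def expected_away_goals(proba):
    if proba < 0.8:
        return -0.96 + 1 / (0.1 + 0.44 * math.sqrt((proba + 0.1) / 0.9))
    return 0.72 * math.sqrt((1 - proba) / 0.3) + 0.3


def neutral_means(proba_team1):
    """Order-independent expected goals for a neutral venue.

    proba_team1 is team1's win expectancy computed WITHOUT home field advantage.
    """
    proba_team2 = 1 - proba_team1
    mean1 = (expected_home_goals(proba_team1) + expected_away_goals(proba_team2)) / 2
    mean2 = (expected_home_goals(proba_team2) + expected_away_goals(proba_team1)) / 2
    return mean1, mean2


# Win expectancy of 23.2% for Team A, as in the spreadsheet example above.
print([round(m, 3) for m in neutral_means(0.232)])
```

By construction, swapping the two teams just swaps the two means, so the win percentages no longer depend on which team is named first.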

Good to know they've suggested taking the same approach as me. Thanks for the update

Finally found the time to identify a formula based on national team matches using polynomial regression (least squares method). The coefficient of determination was higher than that of the clubelo formula, so I'll be using it from now on for simulations.

Hi Edgar,

Would you be able to filter out only the runs that match the current results, and give a new estimate of the outcome based exclusively on these runs?

No, Marko, sorry. I don't keep the "path" to certain simulation outcomes.
