An analysis of Elo in Showdown!
When we talk about climbing the ladder, whether in Showdown! or on cart, what we generally mean is having a highly ranked team. Both Showdown! and cart VGC use a widely adopted rating system known as Elo. I think most players are at least passingly familiar with it, so I won’t go into its history here. Instead, I want to get to the question at hand: does this rating mean anything in Pokemon?
The purpose of Elo is to quantify how much better one player is than another. Specifically, Elo is designed to produce the probability that one player will beat another. There’s certainly reason to be skeptical that Elo would work in Pokemon. In a game like chess (which Elo was developed for), there is basically zero chance that a player like me would ever beat Magnus Carlsen. However, it’s not hard to imagine Cybertron VGC losing to an inferior player because of hax, bad RNG, or even just normal variability in the game.
First, let’s look at the data. Here I have about 500 matches in Showdown! where I have recorded the difference in Elo between me and my opponent. As we can see in the distribution, the matchmaking algorithm in Showdown! is probably looking for an opponent within 100 points of my rating. This means we can only evaluate Elo within this range, which will probably make it look a little worse than it would across the whole ladder (imagine, for example, a 1000-rated player vs. a 1900-rated player).
The next thing we can do is look at the Elo difference vs. the actual win rate. Here I use a smoothed plot to show the win rate with a confidence interval.
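For readers who want to try this on their own match logs, here is a minimal sketch of one way to estimate win rate against rating difference. It is not the smoother used for the plot in this post: it swaps in simple fixed-width bins with a Wilson score interval, which is easier to reproduce without a plotting library. The function names, the 25-point bin width, and the sample matches are all made up for illustration.

```python
import math
from collections import defaultdict


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a win proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("need at least one match")
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def binned_win_rates(matches, bin_width=25):
    """Group (rating_diff, won) pairs into bins and summarise each bin.

    Returns rows of (bin_lower_edge, n_matches, win_rate, ci_low, ci_high),
    sorted by bin.
    """
    tallies = defaultdict(lambda: [0, 0])  # bin lower edge -> [wins, matches]
    for diff, won in matches:
        lower_edge = math.floor(diff / bin_width) * bin_width
        tallies[lower_edge][0] += int(won)
        tallies[lower_edge][1] += 1
    rows = []
    for lower_edge in sorted(tallies):
        wins, n = tallies[lower_edge]
        ci_low, ci_high = wilson_interval(wins, n)
        rows.append((lower_edge, n, wins / n, ci_low, ci_high))
    return rows


# Synthetic matches, invented purely to show the input shape:
# (my rating minus opponent's rating, did I win?)
sample = [(-60, False), (-55, True), (-10, False), (5, True),
          (12, True), (40, True), (45, False)]

for lower_edge, n, rate, ci_low, ci_high in binned_win_rates(sample):
    print(f"[{lower_edge:+d}, {lower_edge + 25:+d}) n={n} "
          f"win rate={rate:.2f} CI=({ci_low:.2f}, {ci_high:.2f})")
```

With only a handful of matches per bin the intervals come out very wide, which is the same reason the tails of a plot like this are hard to interpret.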
At first glance, this may look bad. What we would want to see is at least a monotonic relationship, meaning the lower the difference, the lower the chance of winning. However, recall from the histogram that we have very few data points beyond -100 and 100, so I would argue that in the region where it matters, there does seem to be a consistent relationship. I wouldn’t read much into what’s going on in the tails of this distribution.
Remember we said that an Elo difference can be converted to a win probability? Let’s do that and then check whether the actual results empirically validate those theoretical probabilities.
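As a reference point, the conversion can be sketched with the standard Elo expected-score formula, E = 1 / (1 + 10^(-d/400)), where d is your rating minus your opponent's. The post doesn't spell out the exact formula or scale it used, so treat the 400-point scale and the function name below as assumptions.

```python
def elo_win_probability(rating_diff, scale=400.0):
    """Expected score for a player rated `rating_diff` points above the opponent.

    Uses the standard logistic Elo curve; the 400-point scale is the
    conventional choice and is assumed here, not taken from the post.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


for diff in (-100, -75, 0, 75, 100):
    print(f"{diff:+d}: {elo_win_probability(diff):.3f}")
```

Under this formula a 75-point edge works out to a win probability of about 61%, and a 75-point deficit to about 39%.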
I actually think this looks pretty decent within about 75 points in either direction (where the predicted win rate runs from roughly 40% to 60%). So while Elo may not be perfect, I think it’s probably a good system overall. Its ease of implementation and universality make it a good option. However, players should recognize just how much variation exists within this game and not take minor differences in Elo too seriously.
Some would like to see the graph excluding the ends of the distribution. Here it is.
For attribution, please cite this work as
melondonkey (2022, April 6). Pokemon Analysis: Does Elo Mean Anything?. Retrieved from https://pokemon-data-analysis.netlify.app/posts/2022-04-06-does-elo-mean-anything/
BibTeX citation
@misc{melondonkey2022does,
  author = {melondonkey, },
  title = {Pokemon Analysis: Does Elo Mean Anything?},
  url = {https://pokemon-data-analysis.netlify.app/posts/2022-04-06-does-elo-mean-anything/},
  year = {2022}
}