Saturday, May 30, 2015

Twisted Treeline Hexakill Statistics (2015)

I gathered 95k Twisted Treeline Hexakill games played from May 29 to May 30, 2015.

1. Popularity

Champion Popularity
Ekko 62.8%
Lux 38.4%
Ezreal 37.5%
Yasuo 30.7%
Teemo 29.4%
Ziggs 28.7%
Nidalee 28.2%
Ashe 26.4%
Jinx 21.3%
Fiora 21.0%
Bard 20.7%
Master Yi 20.4%
Blitzcrank 18.7%
Shaco 17.3%
Zed 16.5%
Wukong 16.4%
LeBlanc 15.9%
Jayce 15.5%
Katarina 15.2%
Riven 15.0%
Malphite 14.6%
Sona 14.6%
Viktor 14.0%
Talon 13.5%
Cho'Gath 13.3%
Fiddlesticks 13.3%
Morgana 13.2%
Azir 13.0%
Garen 12.9%
Darius 12.8%
Annie 12.8%
Fizz 12.7%
Heimerdinger 12.6%
Ahri 11.9%
Kog'Maw 11.7%
Vayne 11.7%
Vel'Koz 11.3%
Gnar 11.0%
Karma 11.0%
Veigar 10.9%
Orianna 10.7%
Lee Sin 10.5%
Karthus 10.4%
Gangplank 10.4%
Sion 10.3%
Vladimir 9.8%
Rengar 9.2%
Brand 8.9%
Akali 8.8%
Amumu 8.6%
Xin Zhao 8.2%
Rumble 8.1%
Sejuani 8.1%
Xerath 8.1%
Caitlyn 7.9%
Thresh 7.9%
Hecarim 7.6%
Varus 7.4%
Nautilus 7.4%
Miss Fortune 7.2%
Kalista 6.9%
Diana 6.9%
Tryndamere 6.6%
Singed 6.6%
Maokai 6.5%
Volibear 6.5%
Twisted Fate 6.3%
Kha'Zix 6.3%
Leona 6.3%
Pantheon 6.3%
Malzahar 6.2%
Alistar 6.2%
Cassiopeia 6.1%
Jax 6.1%
Gragas 6.1%
Kennen 6.0%
Sivir 5.9%
Lucian 5.9%
Mordekaiser 5.8%
Soraka 5.8%
Twitch 5.6%
Zyra 5.5%
Jarvan IV 5.2%
Nunu 4.8%
Irelia 4.6%
Tristana 4.6%
Swain 4.6%
Galio 4.6%
Vi 4.6%
Dr. Mundo 4.3%
Graves 4.0%
Lissandra 4.0%
Anivia 4.0%
Draven 3.8%
Rek'Sai 3.7%
Renekton 3.7%
Nami 3.6%
Lulu 3.5%
Shyvana 3.5%
Ryze 3.4%
Zac 3.4%
Zilean 3.3%
Nasus 3.3%
Janna 3.2%
Kayle 3.1%
Poppy 3.1%
Olaf 3.1%
Braum 3.0%
Rammus 2.9%
Warwick 2.9%
Udyr 2.9%
Shen 2.9%
Kassadin 2.8%
Elise 2.6%
Evelynn 2.6%
Quinn 2.5%
Aatrox 2.5%
Syndra 2.2%
Urgot 2.1%
Corki 2.1%
Trundle 2.1%
Yorick 1.8%
Taric 1.7%
Nocturne 1.5%
Skarner 1.0%

2. Win Rate

Win rates are computed over non-mirror matchups only.

Champion Win Rate
Katarina 58.2%
Wukong 56.8%
Sion 56.6%
Lux 56.5%
Soraka 56.3%
Zyra 56.3%
Xerath 56.3%
Janna 56.0%
Sona 55.6%
Vel'Koz 55.2%
Galio 55.2%
Singed 54.7%
Leona 54.5%
Karthus 54.1%
Poppy 54.0%
Swain 54.0%
Talon 53.9%
Morgana 53.9%
Maokai 53.8%
Ziggs 53.8%
Taric 53.4%
Xin Zhao 53.3%
Fiddlesticks 53.3%
Malzahar 53.0%
Diana 53.0%
Nami 52.8%
Rammus 52.7%
Vladimir 52.6%
Kog'Maw 52.4%
Brand 52.3%
Master Yi 52.3%
Braum 52.3%
Karma 52.1%
Alistar 52.0%
Nautilus 51.9%
Cho'Gath 51.9%
Amumu 51.6%
Sejuani 51.5%
Anivia 51.4%
Fiora 51.3%
Ahri 51.3%
Heimerdinger 51.3%
Jarvan IV 51.3%
Shaco 51.1%
Trundle 51.0%
Annie 50.8%
Teemo 50.8%
Mordekaiser 50.7%
Pantheon 50.7%
Varus 50.5%
Nocturne 50.5%
Blitzcrank 50.5%
Irelia 50.4%
Jinx 50.3%
Zac 50.3%
Miss Fortune 50.1%
Garen 50.1%
Graves 50.1%
Ashe 50.0%
Caitlyn 49.9%
Rumble 49.8%
Darius 49.8%
Skarner 49.6%
Sivir 49.4%
Malphite 49.4%
Yorick 49.2%
Dr. Mundo 49.1%
Ryze 49.1%
Olaf 49.1%
Renekton 48.9%
Gangplank 48.9%
Zilean 48.9%
Hecarim 48.8%
Ekko 48.7%
Volibear 48.7%
Yasuo 48.7%
Lissandra 48.5%
Shen 48.5%
Vi 48.5%
Rengar 48.4%
Warwick 48.4%
Orianna 48.3%
Quinn 48.2%
Lucian 48.1%
Ezreal 48.1%
Gragas 48.0%
Kennen 47.9%
Fizz 47.7%
Lulu 47.7%
Draven 47.7%
Corki 47.5%
Jax 47.5%
Kassadin 47.3%
Aatrox 47.2%
Nunu 47.2%
Veigar 47.2%
Urgot 47.1%
Vayne 47.1%
Twitch 47.0%
Jayce 47.0%
Shyvana 46.8%
LeBlanc 46.8%
Tryndamere 46.8%
Twisted Fate 46.8%
Thresh 46.8%
Kayle 46.6%
Udyr 46.6%
Evelynn 46.5%
Tristana 46.4%
Cassiopeia 46.3%
Nidalee 46.3%
Rek'Sai 46.0%
Riven 45.7%
Nasus 45.6%
Syndra 45.2%
Gnar 45.1%
Viktor 45.1%
Kha'Zix 45.1%
Lee Sin 45.0%
Kalista 44.8%
Akali 44.8%
Zed 44.4%
Azir 42.6%
Bard 39.7%
Elise 37.5%
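For readers who want to reproduce tables like these, here is a minimal Python sketch of the two statistics under stated assumptions. It is not the code behind the tables: the input format (one tuple per game holding the two teams' champion lists and whether the first team won) is hypothetical, "popularity" is assumed to mean the share of games in which a champion was picked, and a "mirror matchup" is assumed to mean the champion was picked on both teams, in which case that game is excluded from its win rate.

```python
from collections import Counter


def champion_stats(games):
    """Per-champion popularity and non-mirror win rate.

    games: list of (team_a, team_b, a_won) tuples, where team_a and team_b
    are lists of champion names and a_won is True if team_a won
    (a hypothetical input format).
    """
    appearances = Counter()  # games in which the champion was picked at all
    played = Counter()       # non-mirror games the champion played
    won = Counter()          # non-mirror games the champion won
    for team_a, team_b, a_won in games:
        set_a, set_b = set(team_a), set(team_b)
        for champ in set_a | set_b:
            appearances[champ] += 1
        # A champion on both teams is a mirror matchup: one copy always wins
        # and the other always loses, so that game is skipped for win rate.
        for champ in set_a - set_b:
            played[champ] += 1
            won[champ] += int(a_won)
        for champ in set_b - set_a:
            played[champ] += 1
            won[champ] += int(not a_won)
    n_games = len(games)
    popularity = {c: k / n_games for c, k in appearances.items()}
    win_rate = {c: won[c] / played[c] for c in played}
    return popularity, win_rate
```

Because 12 champions are picked in every Hexakill game, popularity defined this way does not sum to 100% across champions.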

The statistics from last year can be found here.

Note that one of the biggest differences from last year is the introduction of bans. This has presumably curbed the popularity of Katarina (73% -> 15%).

Saturday, May 23, 2015

Chance of Winning on Blue Side Is Much Higher for Shorter Games on Summoner's Rift?

It seems that the chance of winning on blue side is a lot higher for shorter games than for longer games on Summoner's Rift. It is interesting to speculate why this is the case.

Blue Side Win Rate vs Game Length, Gold-level NA Ranked Solo Queue on Patch 5.7

For those who are unaware, it was discovered around this time last year that the blue side (bottom side) was winning 55% of the time on Summoner's Rift. At the time, there was quite a bit of speculation about why this was happening, and several theories were proposed. In the end, it seemed almost certain that the issue was due to the camera angle. In June 2014, Riot changed the matchmaking algorithm to put the higher-Elo team on the purple side.

One thing I missed last year was that this "Blue Side Advantage" seems to be stronger in shorter games. Even after the matchmaking adjustments, blue side seems to excel in shorter games, winning 53.5% of games lasting between 20 and 30 minutes. However, for longer games (over 40 minutes), blue side's chance of winning drops below 50%. A similar effect can be detected in Season 4 games played before Riot's matchmaking adjustment put the stronger team on the purple side; therefore, it is unlikely that the pattern is caused solely by stronger players being better at the later stages of the game.

Blue Side Win Rate vs Game Length, Gold-level NA Ranked Solo Queue on Patch 4.7 (before Riot's matchmaking changes)
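The binning behind plots like the two above can be sketched as follows. This is not the original analysis code: the (duration, blue_won) input format, the bin edges, and the normal-approximation 95% interval are all assumptions made for illustration.

```python
import bisect
import math


def blue_win_rate_by_length(games, edges):
    """Blue-side win rate for each game-length bin.

    games: iterable of (duration_minutes, blue_won) pairs (hypothetical format).
    edges: sorted bin edges in minutes; bin i covers [edges[i], edges[i + 1]).
    Returns (low, high, n_games, win_rate, half_width_95) for each non-empty bin.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [games, blue wins]
    for minutes, blue_won in games:
        i = bisect.bisect_right(edges, minutes) - 1
        if 0 <= i < len(counts):  # ignore games outside the binned range
            counts[i][0] += 1
            counts[i][1] += int(blue_won)
    rows = []
    for i, (n, wins) in enumerate(counts):
        if n == 0:
            continue
        p = wins / n
        # Normal-approximation 95% interval for a binomial proportion.
        half_width = 1.96 * math.sqrt(p * (1 - p) / n)
        rows.append((edges[i], edges[i + 1], n, p, half_width))
    return rows
```

The interval half-width is a rough guide to whether a bin's deviation from 50% is larger than sampling noise; bins with few games (for example, very long games) will have wide intervals.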

My personal opinion on the matter is as follows. It is possible that being on the blue side gives a sizable advantage in the early stages of the game due to the more favourable camera angle. As time goes on, this effect diminishes, and so does the advantage of playing on blue side.

Assuming my hypothesis above is true, and assuming that the same logic for solo queue data also applies to competitive games, should a professional team playing on the purple side play the game differently to compensate for its early game disadvantage? For instance, should purple side consider picking champions that are stronger in the early game? This is just a hypothesis, of course, and it is really difficult to answer in either direction. Nevertheless, it is something interesting to think about.

Monday, May 11, 2015

Comparing Viewer Base of Popular LoL Streamers Using Hierarchical Clustering

Have you ever wondered how similar or different League of Legends streamers are in terms of their audiences? How does Dyrus's viewer base differ from Froggen's?

The following diagram shows the similarities and differences between some popular LoL streamers based on their viewer data. Streamers that are closer to each other on this tree diagram are more likely to be watched by the same viewers.

Similar to what I did for Nemesis Draft, I used hierarchical clustering (with "average" linkage, in case you care about the details) to analyze 100 popular streamers using their registered-viewer data from April 27 to May 2, 2015. I have also outlined 10 major clusters/groups of these streamers.
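A minimal sketch of average-linkage clustering on viewer overlap follows. The post does not say which distance between streamers was used; here I assume the Jaccard distance between the sets of registered viewers seen in each channel, and the input dict is hypothetical. In practice one would more likely call `scipy.cluster.hierarchy.linkage` with `method="average"` and cut the tree with `fcluster`; the pure-Python version below just makes the merging rule explicit.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of viewer names."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def average_linkage(viewers, n_clusters):
    """Agglomerative clustering with average linkage (UPGMA).

    viewers: dict mapping streamer -> set of viewer names (hypothetical input).
    Repeatedly merges the two closest clusters until n_clusters remain; the
    distance between two clusters is the mean pairwise distance of their members.
    """
    names = list(viewers)
    dist = {
        frozenset((x, y)): jaccard_distance(viewers[x], viewers[y])
        for x, y in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def cluster_distance(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Recording the order of merges (instead of stopping at `n_clusters`) is what produces the full tree diagram; stopping early gives flat groups like the 10 outlined boxes.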

In case you are curious if your favourite streamer is on the graph, here's the full list.

Some quick observations:

1. There are some very obvious clusters along language lines. It's easy to see that the first box from the top consists of Chinese streamers and the second box consists of Turkish streamers and Riot Games' official Turkish channel. This is unsurprising, since a viewer who understands Turkish is far more likely to watch any of the Turkish streamers than the Chinese ones.

2. English-speaking streamers who are not professional players (box box, cowsep, nightblu3, wingsofdeaths) have a slightly different viewer base from professional-player streamers such as dyrus and wildturtle. European streamers (froggen, cyanide, rekkles) also attract a slightly different audience from the rest of the English-speaking streamers.

3. The last cluster (nicktron, kneecoleslaw, kaceytron, destiny et al.) is interesting. I have not watched all of these streams, but to the best of my understanding many of these streamers are not very high Elo players; however, they are entertaining in their own ways.

Why should you care about this graph:

1. As a viewer, you can use this graph as a reference to find your next favourite stream. For example, according to this diagram, Doublelift and Sneaky have similar audiences. So, if you enjoy Doublelift's stream, maybe you should also give Sneaky a shot; there is ample statistical evidence to support this.

2. As a streamer, you can use this kind of graph to gauge your audience. For example, if you are Doublelift and you know your audience is similar to Sneaky's, it might be worth streaming at different hours, or starting your stream right around when Sneaky ends his. While I am sure Doublelift does not need any additional viewers, this may be helpful for smaller streamers who are just starting to stream more seriously.

I would like to sincerely thank @brettfarrow, who taught me how to capture viewer data via the Twitch API. This analysis would not have been possible without his assistance. If you like Twitch statistics, you should definitely follow him on Twitter.

Sunday, May 10, 2015

AFK / 4v5 Games: An Analysis Using the Kaplan-Meier Estimator

"4v5" games are a common complaint on reddit, so let's talk about them. I will show how often "4v5" occurs and what the chance of winning a "4v5" is. I will also show the probability of a player reconnecting once he has been AFK for a significant amount of time.

The analysis uses the Kaplan-Meier estimator, a statistical method commonly used in biomedical research; I provide the details at the end.

I am going to define "4v5" strictly as a game where one player chose a champion but could not immediately enter the game. This leaves the player "AFK" and idle from the moment the game starts, creating a scenario in which one team has only four players. Anyone with some experience of the game knows that "4v5" is an uphill battle, since the opposing team's numerical advantage is very difficult to overcome.

Detecting this kind of scenario is very simple using the Riot API. Since nearly all players start the game by buying some starting items, the time a player needs to connect (or whether he connects at all) can be measured by the time at which he buys his first item. While this is not a perfect measure, it is about the best we can do with the available data.
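The first-purchase rule can be sketched as below. The event dictionaries (keys `"type"`, `"participantId"`, `"timestamp"` in milliseconds, and the `"ITEM_PURCHASED"` type string) are modelled loosely on Riot's match timeline and should be treated as hypothetical; check the actual API response before relying on them. A player who never buys an item is recorded as censored at the game length rather than dropped, which is what a censored-data analysis needs.

```python
def connect_times(events, participants, game_length_ms):
    """Estimate each participant's connect time from his first item purchase.

    events: iterable of dicts with hypothetical keys "type", "participantId"
    and "timestamp" (milliseconds since game start).
    Returns {participant: (minutes, observed)}. A participant who never buys
    an item is treated as not having connected before the game ended, i.e.
    censored at the game length (observed=False).
    """
    first_buy = {}
    for e in events:
        if e["type"] != "ITEM_PURCHASED":
            continue
        pid = e["participantId"]
        # Keep the earliest purchase, whatever order the events arrive in.
        if pid not in first_buy or e["timestamp"] < first_buy[pid]:
            first_buy[pid] = e["timestamp"]
    result = {}
    for pid in participants:
        if pid in first_buy:
            result[pid] = (first_buy[pid] / 60000, True)
        else:
            result[pid] = (game_length_ms / 60000, False)
    return result
```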

I want to be very clear that this study is only about players who are "AFK" from the beginning of the game. It does not consider scenarios in which a player plays for, say, 15 minutes and then leaves the game for any reason.

1. How Often Can You Win a 4v5? How Big Is the 4v5 Problem?

First, let's address the effect of 4v5. Unfortunately, if a player takes a long time to connect, the effect on his team is detrimental, even if he eventually connects to the game. Using 220k Silver-level ranked solo queue games played on Patch 5.7 on the NA server, I find the following (if you are interested in other Elo brackets, click here):

Time Needed to Connect | Win Rate | Occurrence (per player per game)
0-1 minutes | 50.12% | 95.617%
1-2 minutes | 49.14% | 3.605%
2-5 minutes | 43.12% | 0.552%
5-10 minutes | 35.55% | 0.117%
10-15 minutes | 29.69% | 0.030%
15-20 minutes | 24.05% | 0.011%
20-25 minutes | 27.78% | 0.004%
25+ minutes | 41.18% | 0.003%
never connects | 11.55% | 0.060%

As the table shows, in a pure 4v5 scenario where a player never connects, the chance of his team winning the game is around 11.6%. This is somewhat expected, since 4v5 is an uphill battle. Perhaps more surprising is what happens when a player connects late: if a player connects 5 to 10 minutes after the game starts, his team's chance of winning is only about 35.6%.

Additionally, we see that a "pure" 4v5, where one player never connects, is actually pretty rare: it occurs at a rate of 0.06% per player per game. Since there are 10 players in a game, this translates to only about a 0.6% chance of seeing one such case in a given game. In most cases, the player will connect at some point during the game, provided the game does not end first. Once the missing player does connect, the team's outlook recovers somewhat. On the other hand, it can be shown that if a player takes at least 5 minutes to connect (including not connecting at all), his team has an overall win rate of only 27.3%, which is still very low.
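The arithmetic in the paragraph above can be checked against the table. This sketch assumes the 10 players go AFK independently of each other, which is an assumption rather than something the data establish. With the rounded occurrences in the table, the pooled win rate for 5+ minute connects comes out near 27.7% rather than 27.3%; the small gap is presumably due to rounding in the occurrence column.

```python
# Rows copied from the Silver-level table above:
# (time needed to connect, win rate %, occurrence % per player per game).
TABLE_5_PLUS = [
    ("5-10", 35.55, 0.117),
    ("10-15", 29.69, 0.030),
    ("15-20", 24.05, 0.011),
    ("20-25", 27.78, 0.004),
    ("25+", 41.18, 0.003),
    ("never", 11.55, 0.060),
]

# Chance that at least one of the 10 players never connects, assuming
# players are independent; roughly 10 * 0.06% for such a small rate.
p_never = 0.060 / 100
p_any_never = 1 - (1 - p_never) ** 10

# Pooled statistics for "takes at least 5 minutes, or never connects":
# total occurrence, and the occurrence-weighted average win rate.
occurrence_5plus = sum(occ for _, _, occ in TABLE_5_PLUS)
win_rate_5plus = sum(wr * occ for _, wr, occ in TABLE_5_PLUS) / occurrence_5plus

print(f"P(at least one never-connect per game) = {p_any_never:.3%}")
print(f"occurrence of 5+ minute connects = {occurrence_5plus:.3f}%")
print(f"pooled win rate for 5+ minute connects = {win_rate_5plus:.1f}%")
```

The pooled occurrence of 0.225% per player is the same Silver-level figure quoted in the Elo bracket comparison.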

The take-home message is as follows. A "pure" 4v5 is fairly rare; however, even a late connect (5+ minutes) can be detrimental to the team. Therefore, the study of 4v5 should focus on how much time a player needs to connect to the game.

I will try to answer two questions in the sections below:

1. Does the frequency of AFKs, measured by the time needed to connect, differ between Elo brackets?

2. Suppose you are currently in a game and your teammate has been AFK for 15 minutes since the beginning of the game. What is the probability of him coming back in the next 5 minutes?

2. AFK Length and Elo Brackets

While the analysis above used Silver-level games, it is interesting to compare the occurrence of AFK across Elo brackets. As it turns out, Bronze-level players go AFK far more often than players in other Elo brackets, as the following table shows:

AFK Length (min) | 5+ | 10+ | 15+ | 20+ | 25+ | 30+ | 35+ | 40+

For example, 0.417% of Bronze players take at least 5 minutes after the game starts to connect (keep in mind that this includes players who never connect to the game). For Silver players, on the other hand, this probability is only 0.225%, about half the Bronze rate. Overall, as we move up the Elo brackets, AFK frequency decreases dramatically.

The following is the Kaplan-Meier curve used to compute the table above.

3. My Teammate Didn't Connect. Is He Coming Back?

Say you are 15 minutes into a game and one of your teammates remains disconnected. What is the chance of him coming back in the next 5 minutes? It turns out that this can be calculated fairly easily.

Time Already Spent AFK (minutes) | Probability of Returning in the Next 5 Minutes
5 | 50.8%
10 | 25.5%
15 | 14.5%
20 | 11.0%
25 | 6.1%
These probabilities are surprisingly similar across Elo brackets.
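These numbers can be read off a Kaplan-Meier curve by conditioning. If S(t) is the estimated probability that a player is still not connected t minutes into the game, then the chance of connecting within the next w minutes, given that he has not connected by minute t, is 1 - S(t + w) / S(t). A small sketch, assuming S is stored as a list of (time, survival) steps; that representation is my own choice for illustration, not taken from the post.

```python
import bisect


def survival_at(steps, t):
    """Evaluate a Kaplan-Meier style step function S(t).

    steps: list of (time, survival) pairs sorted by time; S drops to the
    given value at that time and stays there. S(t) = 1 before the first step.
    """
    times = [time for time, _ in steps]
    i = bisect.bisect_right(times, t)
    return 1.0 if i == 0 else steps[i - 1][1]


def prob_return_within(steps, already_afk, window=5.0):
    """P(player connects within `window` minutes | still AFK at `already_afk`)."""
    s_now = survival_at(steps, already_afk)
    if s_now == 0:
        raise ValueError("no players are still AFK at this time")
    s_later = survival_at(steps, already_afk + window)
    return 1 - s_later / s_now
```

With a constant hazard this conditional probability would stay the same at every row; the steadily falling values in the table mean that the longer a player has been gone, the less likely he is to come back soon.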

Therefore, when a player has already been disconnected for 15 minutes, the chance of him returning in the next 5 minutes is only 14.5%. With this in mind, it is my opinion that allowing players to surrender at 15 minutes instead of 20 in a 4v5 game is reasonable and could be a good thing for the game.

4. The Kaplan-Meier Estimator

As mentioned above, the focus of this study should be on the time a player needs to connect to a game. There is a small problem with this, however: if, say, a particular player needs 25 minutes to connect but the game finishes in only 20 minutes, the available data cannot pinpoint the exact time he needed; we only know that it was longer than 20 minutes.

Essentially, the data at hand are censored: we are interested in the time needed to connect, but many of these times are unobserved because the game finished first. To overcome this difficulty, the Kaplan-Meier estimator is used for the analysis of the data in Sections 2 and 3.

The Kaplan-Meier estimator is a non-parametric statistic often used to study time-to-event data with right censoring. In biomedical research, it is often used for survival analysis, where the event in "time-to-event" is death. For our purposes, however, the event of interest is not death but a player connecting to the game.
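For reference, here is a minimal pure-Python Kaplan-Meier estimator. Each player contributes a duration and a flag: the connect time if a connect was observed, or the game length if the game ended first (right censored). The estimate is S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of connects at t_i and n_i the number of players still unconnected just before t_i. The post does not state how its 95% confidence band was computed; this sketch uses Greenwood's variance formula with a plain normal approximation, which is one common choice. Libraries such as lifelines (`KaplanMeierFitter`) provide the same estimate with more options.

```python
import math
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of S(t) = P(still not connected at time t).

    durations: minutes until connect (if observed) or until the game ended
    (if censored). observed: True if the connect was seen, False if the
    game ended first (right censoring).
    Returns (t, S(t), lower95, upper95) at each event time, using a
    Greenwood-variance normal-approximation interval clipped to [0, 1].
    """
    events = Counter(t for t, o in zip(durations, observed) if o)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    s, greenwood_sum = 1.0, 0.0
    curve = []
    for t in sorted(exits):
        d = events[t]
        if d:
            s *= 1 - d / at_risk
            if at_risk > d:  # Greenwood term is undefined when S drops to 0
                greenwood_sum += d / (at_risk * (at_risk - d))
            se = s * math.sqrt(greenwood_sum)
            curve.append((t, s, max(0.0, s - 1.96 * se), min(1.0, s + 1.96 * se)))
        # Players censored at t still count as at risk at t, then leave.
        at_risk -= exits[t]
    return curve
```

Because players who never connect are censored at the game length, S(t) never reaches zero in real data, which is consistent with the small share of players who never connect at all.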

1. KM Curve with 95% CI

Data is retrieved using the Riot API.