First of all, I don't actually have data on the exact number of wins a Season 4 Gold player needs to return to Gold. All I have are snapshots of the entire ladder ranking at certain points in time - usually one snapshot per month. In other words, my data looks somewhat like this:

In this case, we see that this player was Platinum 5 on January 11th (Season 4) with 255 wins. On March 1st (Season 5), he is Gold 1 with 40 wins. Five days later he is Platinum 5 with 52 wins. This means that this player returned to his Platinum rank at some point between 40 and 52 wins.

It's not always possible to find this range for each player. Sometimes the left-hand endpoint does not exist because the player was never seen in a rank lower than his Season 4 rank. For example, for this player below:

He was a Platinum player in Season 4 and was never seen (by me) to be lower than Platinum in Season 5. Therefore, all I can say is that he returned to Platinum somewhere between 0 and 43 wins.

On the other hand, some players do not have a right-hand endpoint because they simply haven't played enough games. For example:

This player was Platinum in Season 4, but has only 5 wins and is currently in Silver in Season 5. Therefore, the number of wins he needs to return to Platinum is somewhere between 5 and infinity.
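The three cases above can be sketched in a few lines of Python. This is only an illustration, not the code used for this post: the function name `censoring_interval`, the tier list, and the `(wins, rank)` snapshot format are all assumptions made for the example, and divisions within a tier are ignored.

```python
import math

# Assumed tier ordering, lowest to highest (divisions are ignored here).
TIERS = ["Bronze", "Silver", "Gold", "Platinum", "Diamond", "Master", "Challenger"]


def tier_index(rank):
    """Map a rank string such as 'Gold 1' to the position of its tier."""
    return TIERS.index(rank.split()[0])


def censoring_interval(target_tier, snapshots):
    """Return (lower, upper) bounds on the wins needed to reach target_tier.

    snapshots is a list of (wins, rank) pairs seen in the new season.
    upper is math.inf if the player was never seen at or above the target;
    lower is 0 if the player was never seen below it before reaching it.
    """
    target = tier_index(target_tier)
    reached = [w for w, rank in snapshots if tier_index(rank) >= target]
    upper = min(reached) if reached else math.inf
    # Only snapshots taken before the target was first seen bound it from below.
    below = [w for w, rank in snapshots
             if tier_index(rank) < target and w < upper]
    lower = max(below) if below else 0
    return lower, upper


# The three players described above, as (wins, rank) snapshots in Season 5.
print(censoring_interval("Platinum", [(40, "Gold 1"), (52, "Platinum 5")]))
print(censoring_interval("Platinum", [(43, "Platinum")]))
print(censoring_interval("Platinum", [(5, "Silver")]))
```

Whether the endpoints themselves count as possible values (40 wins exactly, say) is a modeling choice that this sketch leaves open.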

Overall, we see that the data I have gathered are far less than ideal. There are several reasons for this:

1. It's not possible to track more than one million players on a game-by-game basis since the Riot API and I both have limited bandwidth.

2. Even if it were possible to do this (and technically it is, since I have a production key now), it would take too much effort and space to store and manage the data.

3. The ranking data from the Riot API is actually slightly "bugged"; as far as I understand, due to some deep level architectural design, the data pulled can behave in unexpected ways.

That being said,

__I am a strong believer that statisticians should always be ready to work with less-than-ideal data__, since "ideal data" does not require statistics to analyze. To this end, it's fairly easy to see that this data can be analyzed using an interval-censoring model - which is exactly what I have done here. I do need to assume that the censoring is independent of the time-to-event, which is probably not exactly true since many players stop playing after achieving Gold 5; however, it is an assumption which I am personally comfortable with.

There were more problems, however. To the best of my knowledge, R has a package for interval-censored data called interval (described in a published paper). Unfortunately, this package seems to be written with survival analysis in mind and works very slowly with large amounts of data. Because it computes confidence intervals (CIs) by bootstrapping, computing them is all but impossible for my purpose. It is probably possible to remedy this by avoiding the bootstrap (my impression after some cursory reading on interval-censored data is that bootstrapping isn't strictly necessary), but it may take a lot of time and effort. Therefore, no CIs are plotted on the diagrams.
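To give a feel for what an interval-censoring model does with ranges like these, here is a minimal sketch of a nonparametric estimate obtained with a self-consistency (EM) iteration. It is not the R package mentioned above and not the code behind the diagrams; the function names are made up for the example, the intervals are treated as half-open (lower, upper] by assumption, and no confidence intervals are computed.

```python
import math


def npmle_interval_censored(intervals, n_iter=500):
    """Estimate the distribution of 'wins needed' from (lower, upper] intervals.

    upper may be math.inf for players who have not reached the rank yet.
    Returns (support, p), where p[j] is the estimated mass at support[j].
    """
    if any(lo >= hi for lo, hi in intervals):
        raise ValueError("each interval needs lower < upper")
    # Candidate mass points: the distinct right endpoints. Moving mass up to
    # the next right endpoint never takes it out of an interval containing it.
    support = sorted({hi for _, hi in intervals})
    # contains[i][j] is True when support[j] lies inside interval i.
    contains = [[lo < s <= hi for s in support] for lo, hi in intervals]
    n = len(intervals)
    p = [1.0 / len(support)] * len(support)
    for _ in range(n_iter):
        new_p = [0.0] * len(support)
        for row in contains:
            # Current probability that this player's event falls in his interval.
            denom = sum(pj for pj, inside in zip(p, row) if inside)
            for j, inside in enumerate(row):
                if inside:
                    # Share this player's weight among the points he could be at.
                    new_p[j] += p[j] / denom / n
        p = new_p
    return support, p


def prob_reached_by(wins, support, p):
    """Estimated probability that the rank is reached within `wins` wins."""
    return sum(pj for s, pj in zip(support, p) if s <= wins)


# The three example players from this post.
support, p = npmle_interval_censored([(40, 52), (0, 43), (5, math.inf)])
print(support, [round(pj, 3) for pj in p])
```

With only these three players the estimate is uninformative: all three intervals contain 43 wins, so essentially all of the mass ends up there. The curve only becomes meaningful with many players, which is also where a plain loop like this becomes slow.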
