Predicting OverWatch™ Match Outcomes with 90% Accuracy

Photo by Riho Kroll on Unsplash

Predicting OverWatch™ Match Outcomes with 90% Accuracy

In 2020, Philip Bubsy implemented the Weng-Lin paper (Weng, 2011) as openskill.js. The OpenSkill project was successful in predicting match outcomes with up to 54% accuracy. Fascinated by the efficiency of the project, I recently ported the project to python and subsequently wrote a prediction function that yielded a much better accuracy of up to 90%.

We are going to use the Python implementation of OpenSkill to make the predictions, but the math will work for any TrueSkill™ implementation as well. The dataset we are using is also exactly the same as that used by the JavaScript implementation of OpenSkill. Finally it is required you understand at least the general aspects of the Weng-Lin paper.

From the JavaScript version's author:

For team games, a dataset of 61867 matches from OverTrack.gg was loaded into memory. Most matches represent a team of 5 vs. 5, with no possibility for ties. Since tracking matches is volutary, a not insignificant percentage of matches (7.82%) are played against a team of unrated/unknown players. The average number of matchs per unique player is about 2.338, however a decent number of players were tracked for over a hundred matches (Busby, 2020).

Note: Some people have pointed out that OverWatch™ matches are not 5 vs 5. The actual benchmarks are on matches of 6 vs 6. This is a typo in the original source.

The original author used a regular check with the \(\mu\) values. If the value was higher then the team won. We are going to use this equation (Ibstedt, 2019) instead create a general prediction function:

\[ p(p_1 - p_2 \geq \varepsilon) = \frac{\Phi(\mu_1 - \mu_2 - \varepsilon)}{\sqrt{2\beta^2 + \sigma_1^2 + \sigma_2^2}}\] (1)

Where \(\varepsilon\) is the draw margin, \(\Phi\) is the model's distribution specific cumulative distribution function and \(\beta^2\) is the performance variance.This equation can only be used to predict 1v1 matches.

Let's generalize this function to be a bit more useful. Before doing that, we need to way to get the rating of a team. The paper (Weng, 2011) that OpenSkill is based on denotes this as:

\[\theta_i = \sum_{j=1}^{n_i} \theta_{ij}\]

Where \(\theta_i\) is the strength of team \(i\) and \(\theta_{ij}\) is the strength of player \(j\) in team \(i\).

The next step is to construct the set of all pairwise subsets of (1). We will denote this set as \(S\). If \(k<2\) where \(k\) is the number of teams, then we simply return \(S\), else we define a new variable \(h = \frac{|S|}{2}\).

Given \(h\), we use (2) to get a complete probabilities:


Algorithm 6: Prediction function where \(k>2\)


6.1 Let \(P\) be the probabilities.

6.2 \(\text{while $|S| > 0$}\)

\(Q = \{S_0\}\)

\(s = \sum_{q=0}^{h - 1} Q \)

\(m = \prod_{q=1}^{k-2} k - q\)

\(P.append(\frac{s}{m})\)

(2)

The sum of \(P\) will always be very close to 1.

This is what the equivalent python code looks like:

if k > 2:
    S = deque(S_list)
    P = []
    h = len(S) / k
    while len(S) > 0:
        s = sum([S.popleft() for q in range(int(h))])
        m = k
        for q in range(1, k - 2):
            m *= k - q
        P.append(s / m)
    return P

Here is how the actual implementation behaves:

>>> from openskill import predict_win
>>> a1 = Rating() 
>>> a2 = Rating(mu=33.564, sigma=1.123)
>>> predictions = predict_win(teams=[[a1], [a2]])
>>> predictions
[0.45110901512761536, 0.5488909848723846]
>>> sum(predictions)
1.0

The actual openskill.py benchmarks can be found here. Hopefully both TrueSkill and all OpenSkill implementations benefit from this function.


References

  1. Busby, P. (2020, June 3). Openskill vs TrueSkill implementations. Philihp Busby Blog Posts. Retrieved February 12, 2022, from philihp.com/2020/openskill.html

  2. Ibstedt, J., Rådahl, E., Turesson, E., & Voorde, M.V. (2019). Application and Further Development of TrueSkill™ Ranking in Sports.

  3. Weng, R. , & Lin, C. . (2011). A Bayesian Approximation Method for Online Ranking. 12,, 267-300.

Did you find this article valuable?

Support Vivek Joshy by becoming a sponsor. Any amount is appreciated!