Interest and brainstorming for a results-based analytical ranking system

DoubleDrewski · Feb 1, 2024

In a thread earlier today, I discussed the potential of an alternative to divs. You can read that thread for specifics, but basically, people love LUTI for the measuring stick its divisions provide, but dislike the gaps between the seasons (which, as discussed there, cannot be narrowed for well-established practical reasons). Anyway, the whole thread has me wondering whether divs could be supplanted by a system based on results in reputable tournaments, with those results weighted more heavily the more impactful and/or reputable the tournament is. I mentioned that I was drawing inspiration from complex sports statistics, which aim to combine many stats into a single, meaningful one.

Sorry if this seems very barebones, but it was suggested I make this thread for potential ideas and/or to gauge interest in the concept.

Here's the skeleton I've devised thus far:

Take a Glicko-2-esque system, where results against teams at varying degrees above and below your level are more or less impactful on your score.

For a given team, compile every result from qualifying tournaments and take 3 data points:

The "reputability" of the tournament (take the IPL as gospel, whereas small tournament results are relatively untrustworthy)
The placement of the team in that tournament
The skill level (quantified) of that placement. This could either be assigned by a TO or, with time, derived from the average rating of the teams in that tournament (if enough data were available)

From there, the formula would more or less go:

(new score) = (previous score) + (reputability) * ( [Skill value] - [previous score] )
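As a minimal sketch of that formula in Python: the update has the same shape as an exponential moving average, with reputability acting as the smoothing factor. The function names, the assumption that reputability lies between 0 and 1, and the example numbers are all hypothetical and only there to make the arithmetic concrete.

```python
def update_score(previous_score: float, skill_value: float, reputability: float) -> float:
    """Apply one step of: new = previous + reputability * (skill_value - previous)."""
    # Assumption: reputability is a fraction in [0, 1]. At 0 the result is
    # ignored; at 1 the score jumps straight to the tournament's skill value.
    if not 0.0 <= reputability <= 1.0:
        raise ValueError("reputability is assumed to lie in [0, 1]")
    return previous_score + reputability * (skill_value - previous_score)


def rate_team(initial_score: float, results) -> float:
    """Fold a chronological list of (skill_value, reputability) pairs into one score."""
    score = initial_score
    for skill_value, reputability in results:
        score = update_score(score, skill_value, reputability)
    return score


if __name__ == "__main__":
    # Hypothetical team starting at 1500: a strong placement in a trusted
    # event, followed by a weak placement in a less trusted one.
    print(rate_team(1500.0, [(2000.0, 0.5), (1000.0, 0.2)]))
```

One consequence worth noting: because each update pulls the score toward the latest skill value, recent results matter more than old ones, and a highly reputable tournament can overwrite most of a team's history in a single step.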

This is very, very rudimentary, but every idea has to start somewhere, right? In the near future (this weekend at the latest) I'd like to see if I could do a test run by scraping data from sendou, but input about any part of the idea (or specifically about how I might go about scraping data efficiently; I know Python) would be appreciated. Commentary/criticism welcome, or even just a "this seems cool and I'd be interested in seeing it come to fruition." Thanks for hearing me out!

TL;DR I have an idea for a formula-based concept that could be more universal than LUTI divs. Whaddya think?

Sambews · Feb 1, 2024

That sounds super interesting! I don't know that it needs to exist, but life is lame if you only invent the things you need. At the very least it'd probably help take some weight off the LUTI people's shoulders. You could potentially use LUTI div as a stand-in for skill until you have enough data for the system to stand on its own.

DoubleDrewski · Feb 1, 2024

Sambews said:
That sounds super interesting! I don't know that it needs to exist, but life is lame if you only invent the things you need. At the very least it'd probably help take some weight off the LUTI people's shoulders. You could potentially use LUTI div as a stand-in for skill until you have enough data for the system to stand on its own.

Part of the reason I came up with the idea was that, in the previously mentioned thread, a former LUTI TO stated that they explicitly do not endorse the use of divs as measuring sticks for skill level. So I thought, “if not divs, then what?”

tocapa · Feb 1, 2024

Believe me, I've thought a lot about ranking systems, Splat-adjacent and otherwise. While I think the idea is sound, there are a lot of obstacles in the way of any sort of rating system. The widespread use of pickups, substitutes, alternates, etc. raises questions of what constitutes a "team" for the purposes of ranking. This is complicated by the fact that the most used tournament platforms either lack player data entirely (Challonge) or lack fine-grained data on which players are actually playing (Battlefy).

Now that sendou.ink has SendouQ and direct tournament hosting, I think the sendou.ink ratings have a good chance of becoming a standard for ranking/seeding teams, at least given my understanding that tournaments hosted on sendou.ink count towards SendouQ ratings. I have concerns, but it seems like any other solution would either lack a lot of valuable data or require a lot of work.

Gh0ul1sh · Feb 1, 2024

DoubleDrewski said:
In a thread earlier today, I discussed the potential of an alternative to divs. You can read that thread for specifics, but basically People love LUTI for the measuring stick of divisions, but dislike the gaps between the seasons (which, as discussed there, cannot be narrowed for well-established practical reasons). Anyways, the whole thread has me wondering if, perhaps, divs could be supplanted by a system based on results in reputable tournaments, with those results weighted heavier based on how impactful and/or reputable the tournament is. I mentioned how I was drawing inspiration from complex sports statistics, which aim to combine many stats into a single, meaningful one.

Sorry if this seems very barebones, but it was suggested I make this thread for potential ideas and/or to gauge interest in the concept

Heres the skeleton I've devised thus far:

Take a glicko-2-esque system, where results against teams at varying degrees above and below your level are more or less impactful on your score

For a given team compile every result from qualifying tournaments and take 3 data points:

The "reputability" of the tournament (Take the IPL as gospel, whereas small tournament results are relatively untrustworthy)

The placement of the team in that tournament

the skill level (quantified) of that placement. This could either be derived by a TO, or, with time, by the average rating of the teams in that tournament (if enough data were available)

From there, the formula would more or less go:

(new score) = (previous score) + (reputability) * ( [Skill value] - [previous score] )

This is very, very rudimentary, but every idea has to start somewhere, right? In the near future (this weekend at the latest) I'd like to see if I could do a test run by scraping data from sendou, but input about any part of the idea (or how I may go about efficiently scraping data specifically, I know python) would be appreciated. Commentary/criticism welcome, or even just a "this seems cool and I'd be interested in seeing it come to fruition." Thanks for hearing me out!

TL;DR I have an idea for a formula based concept that could be more universal than LUTI divs. Whaddya think?

I had a very similar idea to this; it would be so much more beneficial than LUTI divs for gauging skill level. You could also have each team's players keep a log of their team history and those skill scores, so when a new team or pickup forms and is put in the system, it can automatically be given a rating.
I know it can be difficult to get some of the data from the sites used by some tournaments for human-done seeding, so I wonder if that’ll be a major roadblock for you too.

DoubleDrewski · Feb 1, 2024

tocapa said:
Believe me, I've thought a lot about ranking systems, Splat-adjacent and otherwise. While I think the idea is sound, there are a lot of obstacles in the way of any sort of rating system. The widespread use of pickups, substitutes, alternates, etc. raises questions of what constitutes a "team" for the purposes of ranking. This is complicated by the fact that the most used tournament platforms either lack player data entirely (Challonge) or lack fine-grained data on which players are actually playing (Battlefy).

Now that sendou.ink has SendouQ and direct tournament hosting, I think the sendou.ink ratings have a good chance of becoming a standard for ranking/seeding teams, at least given my understanding that tournaments hosted on sendou.ink count towards SendouQ ratings. I have concerns, but it seems like any other solution would either lack a lot of valuable data or require a lot of work.

The system I’m thinking of would be team-specific, with pickups handled on a case-by-case basis. Pickups and the like are actually a problem I have with SendouQ: the scores are much more oriented to the individual than the team. Also, it only ranks based on Sendou results, which aren’t necessarily reflective of a team’s tournament performance.

Using already established results would be ideal for a ranking system. It’s how divs currently function (or at least how they’re perceived) and it’s what I’d want a new system to replicate. Honestly, the only major roadblock to its creation (not necessarily its acceptance) is a database of tournament results. Battlefy may suffice, but on a large scale idk how I could scrape results. I’ve thought about perhaps allowing voluntary input, but then nobody would ever put a bad result into the system. Perhaps an inflationary system where inaction is a net negative could resolve that, but I haven’t thought that far just yet.

Step 1 will be testing the formula in the first place. If anyone would like to help with any part of that, please lmk.

Joy · Feb 1, 2024

I have a lot of thoughts on this but it requires a much greater amount of infrastructure to make doable. I can go into the dirty details, but it’s very math heavy. The long and short of it is, this is just a very hard problem and someone will always be unhappy with the results. SendouQ’s rating system that’s underpinning it (OpenSkill) is actually performing extremely well (I’m the guy analyzing the system health from a math perspective) but I have my doubts it can extend to the structured team-based rating system I’m interested in developing.

Gh0ul1sh · Feb 1, 2024

DoubleDrewski said:
The system I’m thinking of would be team-specific, with pickups handled on a case-by-case basis. Pickups and the like are actually a problem I have with SendouQ: the scores are much more oriented to the individual than the team. Also, it only ranks based on Sendou results, which aren’t necessarily reflective of a team’s tournament performance.

Using already established results would be ideal for a ranking system. It’s how divs currently function (or at least how they’re perceived) and it’s what I’d want a new system to replicate. Honestly, the only major roadblock to its creation (not necessarily its acceptance) is a database of tournament results. Battlefy may suffice, but on a large scale idk how I could scrape results. I’ve thought about perhaps allowing voluntary input, but then nobody would ever put a bad result into the system. Perhaps an inflationary system where inaction is a net negative could resolve that, but I haven’t thought that far just yet.

Step 1 will be testing the formula in the first place. If anyone would like to help with any part of that, please lmk.

You could maybe have like a list of teams for manual input so if team B puts a result against team A it would subtract the points from team A and also add points to team B. The main issue I think could come up would be people faking any manual results or teams just not being registered in the system.

DoubleDrewski · Feb 1, 2024

Joy said:
I have a lot of thoughts on this but it requires a much greater amount of infrastructure to make doable. I can go into the dirty details, but it’s very math heavy. The long and short of it is, this is just a very hard problem and someone will always be unhappy with the results. SendouQ’s rating system that’s underpinning it (OpenSkill) is actually performing extremely well (I’m the guy analyzing the system health from a math perspective) but I have my doubts it can extend to the structured team-based rating system I’m interested in developing.

If you have any interest in further discussion, I’m very open. Math-heavy is the goal, not an obstacle for me ;).

As for the “someone will always be unhappy” point, that’s why I like the idea of an open-source system. It’s hard to get blindly mad at an algorithm.

DoubleDrewski · Feb 1, 2024

Gh0ul1sh said:
You could maybe have like a list of teams for manual input so if team B puts a result against team A it would subtract the points from team A and also add points to team B. The main issue I think could come up would be people faking any manual results or teams just not being registered in the system.

We more or less have that now and it’s SendouQ. If you want a skill level assigned from scrim results, sendou has you covered. But tournament results? That’s what I’m mainly discussing here.

Gh0ul1sh · Feb 1, 2024

DoubleDrewski said:
We more or less have that now and it’s SendouQ. If you want a skill level assigned from scrim results, sendou has you covered. But tournament results? That’s what I’m mainly discussing here.

I’m assuming this is about my other post, because this reply makes more sense for it. And yeah, it is similar to sendou. What I’m thinking is that it would probably be easier to just have a SendouQ ranking on a team’s profile to gauge their collective skill, or a way to input tournament results into it to get a more accurate ranking, since I know many low-level players like myself do not enjoy SendouQ.

I personally feel like what you want to do is kinda similar, but I do get that it’s more focused on team results as a whole and not individually

DoubleDrewski · Feb 1, 2024

Gh0ul1sh said:
I’m assuming this is about my other post, because this reply makes more sense for it. And yeah, it is similar to sendou. What I’m thinking is that it would probably be easier to just have a SendouQ ranking on a team’s profile to gauge their collective skill, or a way to input tournament results into it to get a more accurate ranking, since I know many low-level players like myself do not enjoy SendouQ.

I love your idea and if it ever comes to fruition I will absolutely love it

Apologies, I probably should’ve been more clear. What I meant was that a system where “Team A wins and gets points, team B loses and loses points” already kind of exists. I’m talking more on a macro level, in a “yeah you lost that set but won everything else in that tournament so it wasn’t really relevant” kinda way, you know what I mean?

But I absolutely agree SendouQ is borderline unusable for lower levels. Waiting in the queue for a team anything less than gold is a fool’s errand. Maybe my whole system would be redundant if A. SendouQ participation increased and B. Team participation in sendouQ specifically increased. But even if that were to happen, yeah it would be nice to have a way to input tournament results. Maybe using a similar system to how the badge rewards are given? Especially since some tournaments are starting to be run through sendou.ink

joe??? · Feb 1, 2024

@Roundabout did this in Splatoon 2 and despite how good it was it received more hate than support lol. Doubt anyone will attempt it again. But maybe as more tournaments get hosted on sendou.ink SP can become more representative of tourney results and become a soft version of this.

DoubleDrewski · Feb 1, 2024

joe??? said:
@Roundabout did this in Splatoon 2 and despite how good it was it received more hate than support lol. Doubt anyone will attempt it again. But maybe as more tournaments get hosted on sendou.ink SP can become more representative of tourney results and become a soft version of this.

Tbh I started this whole concept totally forgetting that sendou was integrating tournaments. If that ecosystem does end up happening and thriving, then I’d be totally down for SP as a standard measuring system. The only issue is how they weigh Q results against tournament results, and how they determine the weight of those results.

Alphine_Agnitio · Feb 1, 2024

While the concept on paper is sound imo, the logistical challenges sound like a nightmare that most teams would be unwilling to go through. I think SendouQ integration is the easiest way though.

DoubleDrewski · Feb 1, 2024

Alphine_Agnitio said:
While the concept on paper is sound imo, the logistical challenges sound like a nightmare that most teams would be unwilling to go through.

Hence why I brought up the potential of automation. People like LUTI divs because, for at least a time, they’re comprehensive. But the whole system involves human input for results and human judging for the placement. If sendou does become the premier tournament site, then results would be fairly easy to compile without human intervention, I’d imagine. The only debates would be about how it’s calculated, but I imagine those debates could be a lot more fruitful than complaining about what div you got months ago.

Joy · Feb 1, 2024

DoubleDrewski said:
If you have any interest in further discussion, I’m very open. Math heavy is the goal, not an obstacle for me ;).

as for the “someone will always be unhappy” thats why I like the idea of an open source system. It’s hard to get blindly mad at an algorithm.

No, people will still be mad at it. I often answer concerns and complaints about sendouq on the helpdesk, and even knowing how the sausage is made people don’t like it. It’s not just a mathematical/systems issue, but also one of player perception and perspective. Both sides of the coin are equally valid.

That said, the idea I have is generally representing all matches as a multidigraph. Every player is a node, with an unweighted connection from the loser of a match to the winner. The edge contains important information, namely the date, tournament ID, and tournament stage index (the first round of a tournament would have an index of 0, the second an index of 1, …). When a match is completed, you create an edge from each node of the losing team to each node of the winning team. What I’m unsure about is whether to also add an extra node representing the team itself, but more on that later.

Ranking is done via an offline ranking algorithm, such as PageRank. It doesn’t natively support multidigraphs, so we’ll have to consolidate the parallel edges between each pair of nodes into a single weighted edge. To do this, we take the edge age into account (decreasing influence the older the edge is), the tournament stage index (without this step, players who advance further will be weighted more heavily), and potentially tournament weight (matching the ID with total entrants, possibly).
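A minimal sketch of what that consolidation could look like in Python. Everything specific here is an assumption made for illustration: the `MatchEdge` record, the `consolidate` function, an exponential half-life for edge age, a 1 / (1 + stage index) factor for the stage, and a lookup table of per-tournament weights. None of these formulas are specified above; they are placeholders to be swapped out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MatchEdge:
    """One loser -> winner edge of the multidigraph (hypothetical record layout)."""
    loser: str
    winner: str
    played_on: date
    tournament_id: str
    stage_index: int  # 0 for the first round, 1 for the second, ...


def consolidate(edges, today, half_life_days=180.0, tournament_weights=None):
    """Collapse parallel loser -> winner edges into one weight per ordered pair."""
    tournament_weights = tournament_weights or {}
    weights = defaultdict(float)
    for edge in edges:
        # Assumed age rule: an edge loses half its influence every half_life_days.
        age_days = max((today - edge.played_on).days, 0)
        age_factor = 0.5 ** (age_days / half_life_days)
        # Assumed stage rule: later rounds are scaled down so that deep runs
        # do not dominate purely by contributing more edges.
        stage_factor = 1.0 / (1 + edge.stage_index)
        # Assumed event rule: unknown tournaments get a neutral weight of 1.
        event_factor = tournament_weights.get(edge.tournament_id, 1.0)
        weights[(edge.loser, edge.winner)] += age_factor * stage_factor * event_factor
    return dict(weights)


if __name__ == "__main__":
    today = date(2024, 2, 1)
    edges = [
        MatchEdge("player_a", "player_b", date(2024, 2, 1), "t1", 0),
        MatchEdge("player_a", "player_b", date(2023, 8, 5), "t2", 1),
    ]
    print(consolidate(edges, today))
```

The output is a plain mapping from (loser, winner) to a single weight, which is the shape a weighted-digraph ranking algorithm expects.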

Once edges are consolidated, the algorithm is run. There won’t be a visible “score” associated with it, since it’s not interpretable and just serves as a way to rank players. Teams will be evaluated as the [sum/mean/median] of their players’ results (I haven’t tested this yet, so the aggregate measure is still up in the air), then ranked once more. The extra team node would perhaps account for any team synergies or anti-synergies, but I’m still not sure how useful it might be.
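As a rough sketch of the ranking step, here is a plain power-iteration PageRank over already-consolidated edges given as a `{(loser, winner): weight}` mapping, followed by a per-team aggregate. The function names, the damping value of 0.85, the iteration count, and the mean as the default aggregate are all assumptions for illustration; the extra team node is left out entirely.

```python
from statistics import mean


def pagerank(weighted_edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over {(loser, winner): weight}; rank flows to winners."""
    edges = {pair: w for pair, w in weighted_edges.items() if w > 0}
    nodes = sorted({player for pair in weighted_edges for player in pair})
    if not nodes:
        return {}
    out_weight = {n: 0.0 for n in nodes}
    for (loser, _winner), w in edges.items():
        out_weight[loser] += w
    n_nodes = len(nodes)
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        # Players with no recorded losses have no outgoing edges, so the rank
        # they hold is redistributed evenly (standard dangling-node handling).
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = {n: base for n in nodes}
        for (loser, winner), w in edges.items():
            new_rank[winner] += damping * rank[loser] * w / out_weight[loser]
        rank = new_rank
    return rank


def rank_teams(player_rank, rosters, aggregate=mean):
    """Order team names best-first by an aggregate of their players' values."""
    team_value = {
        team: aggregate([player_rank[p] for p in players])
        for team, players in rosters.items()
    }
    return sorted(team_value, key=team_value.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical data: c beat both a and b, and b beat a.
    edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
    players = pagerank(edges)
    print(rank_teams(players, {"Team X": ["c"], "Team Y": ["a", "b"]}))
```

Passing `sum` or `statistics.median` as `aggregate` tries the other candidates. Only the resulting order is meant to be used; the raw values are not meaningful as a displayed score.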

This has the major benefit of ranking players with a low amount of representation (say a JP team with three entries that smashed all the NA teams) much more accurately than an online ranking algorithm like OpenSkill would.

Still needs work, primarily on the infrastructure underlying said system.

Gh0ul1sh · Feb 1, 2024

DoubleDrewski said:
Apologies, I probably should’ve been more clear. What I meant was that a system where “Team A wins and gets points, team B loses and loses points” already kind of exists. I’m talking more on a macro level, in a “yeah you lost that set but won everything else in that tournament so it wasn’t really relevant” kinda way, you know what I mean?

But I absolutely agree SendouQ is borderline unusable for lower levels. Waiting in the queue for a team anything less than gold is a fool’s errand. Maybe my whole system would be redundant if A. SendouQ participation increased and B. Team participation in sendouQ specifically increased. But even if that were to happen, yeah it would be nice to have a way to input tournament results. Maybe using a similar system to how the badge rewards are given? Especially since some tournaments are starting to be run through sendou.ink

Oooooooooooooh okay yeah that makes sense. An issue that I think people would have with that is that sometimes a tournament’s turnout makes getting gold bracket, or placing in one of the others, easier than in other weeks. To counter that, I guess you could look at the number of sets played and rank the value of the win against that, if that makes sense? (Or is that already what you’re talking about lol.) I ask bc my team got like 2nd in a bronze bracket we didn’t deserve, because very few teams signed up.

DoubleDrewski · Feb 1, 2024

Joy said:
No, people will still be mad at it. I often answer concerns and complaints about sendouq on the helpdesk, and even knowing how the sausage is made people don’t like it. It’s not just a mathematical/systems issue, but also one of player perception and perspective. Both sides of the coin are equally valid.

That said, the idea I have is generally representing all matches as a multidigraph. Every player is a node, with an unweighted connection from the loser of a match to the winner. The edge contains important information, namely the date, tournament ID, and tournament stage index (the first round of a tournament would have an index of 0, the second an index of 1, …). When a match is completed, you create an edge from each node of the losing team to each node of the winning team. What I’m unsure about is whether to also add an extra node representing the team itself, but more on that later.

Ranking is done via an offline ranking algorithm, such as PageRank. It doesn’t natively support multidigraphs, so we’ll have to consolidate the parallel edges between each pair of nodes into a single weighted edge. To do this, we take the edge age into account (decreasing influence the older the edge is), the tournament stage index (without this step, players who advance further will be weighted more heavily), and potentially tournament weight (matching the ID with total entrants, possibly).

Once edges are consolidated, the algorithm is run. There won’t be a visible “score” associated with it, since it’s not interpretable and just serves as a way to rank players. Teams will be evaluated as the [sum/mean/median] of their players’ results (I haven’t tested this yet, so the aggregate measure is still up in the air), then ranked once more. The extra team node would perhaps account for any team synergies or anti-synergies, but I’m still not sure how useful it might be.

This has the major benefit of ranking players with a low amount of representation (say a JP team with three entries that smashed all the NA teams) much more accurately than an online ranking algorithm like OpenSkill would.

Still needs work, primarily on the infrastructure underlying said system.

That is so much more complex than what I originally outlined, but holy cow I love that. The issue I saw with sendou was that it felt too individual-oriented, but that extra node being able to account for how players change when on a team is invaluable imo. The whole idea of treating players as nodes and gathering meaningful data from the edges sounds amazing.

Maybe it’s niche but if you ever felt inclined to share on the development or decision making process for the system as it’s created, you’d at least have an audience of 1 lol.

GreGuru · Feb 2, 2024

As Joe said, Roundy used to publish a ranked list of teams updated monthly. My memory is fuzzy, so maybe I’m wrong, but I don’t think he was using a ton of math so much as sizing up results like how college football teams are ranked. The problems he ran into are why I think a SendouQ based system is better with regards to teams being a sum of their players.

For one, some teams just didn’t play. He required a minimum number of tournaments to make the leaderboard, and a lot of teams, especially at the end of Splatoon 2’s life, didn’t meet that. So people would constantly ask “where’s X team?”. But can you really move a team up or down based on fewer than 2 sets? Should there be some decay if they don’t play at all? If FTWin loses to no one and then doesn’t play a tournament in 2 months, is it fair to say they aren’t still number 1?

Secondly, people hide results. Everyone posts the good stuff on Twitter, but ignores the bad. Plenty of “teams” are pickup groups that consist of mostly the same people, but are under a different name every tournament, which makes any automation difficult.

Finally, this is a game of pickups and unless we score players on an individual level, how do we fairly score pickups? Does a win against Jackpot not count at all if they have a sub? Even if that sub is Kiver? What if there are two subs?

It makes the situation murky because for every team that replaces one of their actual members with a sub of equivalent skill level, you have the opposite end of the spectrum too. Maybe I’m someone who consistently makes silver bracket of SOS every week, but this week I’m playing with two irl friends who play the game casually, but this is their first tournament. Is it a huge underperformance and black mark against me if the team only makes bronze bracket this week? You definitely don’t want to create a system that discourages people from playing with a broader range of players.

Of course there are other complications, like how do you handle or recognize mid-tournament subs after the fact, how do you count tournaments with nonstandard rule sets, and are all wins treated equally (is me beating you 3-0 the same as me beating you 2-1 when you had a DC in game 1?).

I don’t want to take the wind out of your sails, just relaying how complicated this actually is.
