Valve accidentally gives us a unique and accurate look at Steam numbers.

[Office building, interior]

A lone game developer/journalist/gaming enthusiast/researcher sits at their desk.

Protagonist

“Ugh, why can’t I get more reliable data?! I need to know how many people are playing ‘Roman Legionnaire Battle Royale’ so I can read the market/write this article/argue with people on Reddit! Why, Valve, why isn’t your API showing me the exact numbers? Why am I forced to rely on approximations, guesses, deductions, and percentages that are rounded off at the 2nd decimal!?”

The protagonist stares at the screen in front of them and begins sobbing.

[Transition]
[Valve’s Office, Gabe Newell’s room, interior]

Gabe Newell sits behind his desk, cleaning his knife-collection while enjoying his exclusive access to all the data Valve has collected.

Gabe

“Yes, yeeeeesss! I hold ALL THE INFORMATION! I, and I alone, know how many people are playing Civilization VI, on what continent Counter-Strike sold the most copies, and how many people defeated Ornstein and Smough in Dark Souls Remastered. The knowledge is like nectar, and only I can access it. Truly, I have become the god of PC gaming-related data!”

Gabe sets aside his 14th-century South Indian katar and reaches for a new knife to clean, but accidentally and unwittingly hits a key on his keyboard that opens up his exclusive access to the data to everyone using the API.

[The End]

This is how users of the Steam API managed to temporarily get their hands on accurate data about all the games on Steam. Probably… Maybe…

You see, Valve has always been very protective of the data it collects on Steam, so websites and services that used the Steam API had to rely on estimates and incomplete information. Steamspy, to name a popular example, had to randomly sample a (small) portion of all accounts, look at the games they owned, and deduce the total numbers from that. In other words: if 350 people in a sample of 10,000 own Sheep Herding Simulator 2016, that should mean that 3.5% of all Steam users have that game. This is already somewhat inaccurate, but to add more obscurity-salt to the vague-soup, the API rounds the numbers to the 2nd decimal. And if you’re working with millions of users, those decimals become more and more important. Unfortunately, Valve recently made it harder for these services to access the data, although they promised to offer a new and better way to access and utilize data from the Steam API.
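For the number-lovers among us, the sampling arithmetic above can be sketched in a few lines of Python. This is only a rough illustration and not Steamspy's actual code: the function names are made up for this example, and the total of 100 million Steam accounts is an invented round figure, not a number from Valve.

```python
def estimate_owners(sample_owners, sample_size, total_users):
    """Scale the share of owners found in a random sample up to all users."""
    share = sample_owners / sample_size
    estimated_owners = round(sample_owners * total_users / sample_size)
    return share, estimated_owners


def rounding_step_in_users(total_users, decimals=2):
    """How many users hide behind one step of a percentage with `decimals` places."""
    # One step of a 2-decimal percentage is 0.01%, i.e. 1 / 10**(2 + 2) of the total.
    return total_users / 10 ** (decimals + 2)


# ASSUMPTION: 100 million accounts is a placeholder, not an official figure.
TOTAL_USERS = 100_000_000

share, owners = estimate_owners(350, 10_000, TOTAL_USERS)
print(f"Share of owners in the sample: {share:.1%}")
print(f"Estimated owners overall: {owners:,}")
print(f"Users per 0.01% step: {rounding_step_in_users(TOTAL_USERS):,.0f}")
```

With that placeholder total, a single step in the second decimal already stands for thousands of accounts, which is why the rounding stings so much for smaller games.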

Ars Technica made this graphic in 2014 using the old method.

On to the fun stuff

“But wait, Bram”, I can hear you think (don’t pay attention to the mind reader I’ve recently installed on your PC), “you mentioned that there was accurate data, the weird story at the beginning mentions Gabe Newell accidentally giving everyone access, and the title contains the word ‘leaky’. My amazing brain has added 1 + 1 and concluded that somehow the more accurate data got leaked!” Well done, dear reader, you are completely right! Recently, Valve accidentally stopped rounding the numbers in the public API, meaning that people could work out the player numbers down to the individual user. Quick-thinking minds promptly grabbed all the available data and stored it in databases, giving us a very detailed and accurate look at player numbers.
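How do you get from unrounded percentages to a head count? One way to picture it (a sketch of the general idea only, not necessarily the exact method people used during the leak): every achievement percentage is really "players with the achievement" divided by "total players", so you can look for the smallest total that turns every percentage into a whole number of people. The function name, the search limit, and the tolerance below are all assumptions made for this example.

```python
def smallest_player_count(percentages, max_players=1_000_000, tolerance=1e-6):
    """Find the smallest total N for which every percentage is a whole number of players.

    `percentages` are unrounded achievement percentages (0 to 100) for one game.
    Returns None if no total up to `max_players` fits.
    """
    for n in range(1, max_players + 1):
        fits = True
        for p in percentages:
            players = p * n / 100
            # A real player count cannot be fractional, so reject totals
            # that would leave us with a fraction of a person.
            if abs(players - round(players)) > tolerance:
                fits = False
                break
        if fits:
            return n
    return None


# Made-up game: 40 players, of whom 10 and 3 unlocked two achievements.
precise = [10 / 40 * 100, 3 / 40 * 100]   # 25.0% and 7.5%
print(smallest_player_count(precise))
```

The answer is the smallest total that fits, so the real number could in principle be a multiple of it; the more achievements a game has, the less likely that becomes. Once the percentages are rounded to two decimals, far too many totals fit and the trick stops working.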

But on July the 4th, also known as “Teach the sky who’s boss using fireworks”-day in the USA, Valve plugged the hole and cut off this new method, thereby throwing our advanced modern scientific method of reading numbers about people who managed to click the right thing at the right time in an audio-visual experience all the way back to the early Medieval method of estimating things (probably using some kind of horrific blood-sacrifice ritual. I don’t know, I’m just the writer here. Who knows what those programmers do!). However, when Valve removed the old method, they promised to provide a more accurate replacement, so hopefully this leak is a sign of things to come.

Spreadsheets for everyone!

So now, we can estimate the total unique players of every Steam game with achievements. The ones without achievements don’t show up in the API because those aren’t tied to the user’s account: game ownership is a different dataset from achievements gained. The top three might not surprise you:

Team Fortress 2                     50,191,347

Counter-Strike: Global Offensive    46,305,966

PLAYERUNKNOWN’S BATTLEGROUNDS       36,604,134

That’s a LOT of players shooting each other in the face! And if you keep scrolling down the list, you’ll see more and more games in which people shoot and/or stab each other in the face. Gamers are a violent bunch! Portal 2, in 13th place, is the first game that doesn’t involve violence. It’s pretty safe to conclude that if you want your game to succeed, it needs guns, a low price (or no price at all), or to be made by Valve. So eh… get to it, developers!

Replaying History

But not everything’s guns and fantasy! VALUE-favourite Civilization V sits at an impressive 14th place, and its successor Civilization VI sits at rank 90. Speaking of Civ6, have you watched our own Dr. Random and Ymir’s playthrough yet? Catch up on One More Turn to see their version of Persia develop through time, and learn interesting facts at the same time! You can find the archived videos on our YouTube page.

But we already know that the Civilization series is a big player on the gaming market. What about other historically themed games? Are these games good and/or interesting enough to drag players away from Counter-Strike and PAYDAY 2? The Assassin’s Creed series is mentioned so often in discussions about historicity that you might think it’s the only game series with a historic setting. And yet, its games show up at 216 (Black Flag) and 326 (Origins), a far cry from the top. Keep in mind, however, that these are just Steam’s numbers: Ubisoft has its own client, which might have more players who aren’t launching the games through Steam.

War Thunder, a WW2/Cold War air/tank/ship game, is high on the list at rank 20, but the fact that it’s free-to-play might give it an unfair advantage. Besides, it’s historically accurate, but not historical: the tanks might have the correct guns and the planes might have the correct flight model, but it doesn’t do anything with history. Further down the list are WW2 shooters, Age of Empires II HD, some Total War games, Europa Universalis, Stronghold Crusader, and many others. But all are vastly outnumbered by the myriad fantasy, modern, and sci-fi games. Space marines and dragons seem to be more popular than historically accurate depictions of Medieval Europe.

But let’s go back to Civilization V for a second: the fact that it has 12.7 million(!!!) copies sold on Steam means that (most of) all those players have played a very educational (and fun) game that utilises history VERY well.

Ars Technica gives a nice overview of the leak, and has a very long list of games and their numbers on their website.

And their other article from 2014 showcases what you can do with all the data, such as total play time per user per game and number of unplayed games. Bear in mind that this data is 4 years old, but it’s fun to see how they calculated this data back then and how much the numbers have changed. Head on over here for that.

If you want to read more about the mathematics behind the list, you might want to read Tyler Glaiel’s post on his blog.