Accelerating tabletop game design with Python
Exploring the game design questions Python can (and can't) help us answer when making a new board game or TTRPG.
Last week we explored how player count can affect whether a game is strategic or tactical, using In Too Deep as an example.
This week we are looking at how Python can help with game design… and where it can’t.
Eleventh Beast
In Eleventh Beast, you are a monster hunter in 18th century London. As a member of a secret and ancient order, you know that the Beast shows up every 13 years… but you aren’t sure exactly when. So you must collect rumors, analyze information, and prepare for the hunt.
The core game loop is pretty simple:
Roll dice and add rumors (or the Beast) to the map.
The Beast moves closer.
Take two actions: Move, Investigate, Verify, or Hunt.
Update your notebook.
Still alive? Begin the next day.
I wanted to flip the usual idea of TTRPG combat on its head. Instead of kicking open the door and taking turns swiping at each other, I wanted the game to focus on the preparation for combat. It is careful planning and research that ensures victory, not brute force.
The inevitable combat only happens at the end of the game. It’s intentionally brief:
Gain one d6 for each Ward or Weapon researched and acquired during the game.
Roll the dice and use the single lowest value to determine the outcome (slay the beast, take a wound, or take two wounds). At three wounds you die.
If necessary, continue for another round until you win or lose.
Design questions
The Hunt uses a variable-size dice pool, keeps only the lowest die, and succeeds on a 1 or 2. This raises some interesting questions:
What is the chance of Hunt success given N dice rolled?
How many dice are required to have a reasonable chance of success?
Should there be a maximum dice pool size?
Should success be only on a 1 or 2?
How do we cap the chance of success at around 90%?
We have a few ways to answer these questions…
Possible solutions
The two common solutions are:
Extensive playtesting: Either as the designer or with a group, play the game over and over again. This must be done tens or hundreds of times.
Math and statistics: Use advanced math and statistical tests and methods to determine the most likely results — factorials, combinations, and permutations.
But a third way exists. If you, like me, don’t have easy access to a large pool of playtesters and you can’t remember what the null hypothesis is for a t-test… you can use Python.
This method really clicked for me after watching Jake VanderPlas’ PyCon 2016 “Statistics for Hackers” presentation. He made a compelling case that while in the past we had to rely on potentially complex and ambiguous statistical tests and methods, we now have access to powerful computers. Using numerical simulations (e.g. the Monte Carlo method) we can make quick work of some of the above questions.
Using Python to answer design questions
Step 1 is to create a model of the game in Python. In any sufficiently complex game, it will be hard (if not impossible) to model the entire game — nor is that necessary. Instead we can usually take one tiny piece of the game’s mechanisms and simply model that part. In the case of Eleventh Beast, we can model just the combat dice rolls.
Step 2 is to run the model many, many times. With even a moderately powerful computer, we can simulate thousands or even millions of dice rolls in just a few moments. Totaling the results gives us the approximate chance of success or failure, without having to do all that math by hand. You can see an example of this in Therg Fights a Skeleton.
Step 3 is to run that same model across many conditions. We can vary the number of dice in the dice pool from 0 to some arbitrarily large number. We can change the threshold of success from 1-2 to 1-3 or higher. Total the successes and failures, and we begin to see the options available for the game’s design.
In the case of Eleventh Beast, if I wanted the maximum chance of success to never exceed 90%, that meant limiting the maximum dice pool size to 5d6.[1]
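The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the code actually used for Eleventh Beast: the function names (`hunt_succeeds`, `estimate_success`, `exact_success`) are made up for this example, and it models only a single roll of the pool (lowest die at or below a threshold), not wounds or follow-up rounds. It also assumes that a pool of zero dice simply fails.

```python
import random


def hunt_succeeds(pool_size, success_max=2, rng=random):
    """Roll pool_size d6 and report whether the lowest die is <= success_max."""
    if pool_size < 1:
        return False  # assumption: with no dice there is nothing to roll
    lowest = min(rng.randint(1, 6) for _ in range(pool_size))
    return lowest <= success_max


def estimate_success(pool_size, success_max=2, trials=100_000, seed=None):
    """Step 2: run the model many times and return the fraction of successes."""
    rng = random.Random(seed)
    wins = sum(hunt_succeeds(pool_size, success_max, rng) for _ in range(trials))
    return wins / trials


def exact_success(pool_size, success_max=2):
    """Sanity check: 1 minus the chance that every die misses the threshold."""
    return 1 - (1 - success_max / 6) ** pool_size


if __name__ == "__main__":
    # Step 3: sweep across pool sizes and success thresholds.
    for success_max in (2, 3):
        print(f"Success on 1-{success_max}")
        for pool_size in range(1, 9):
            simulated = estimate_success(pool_size, success_max, trials=20_000, seed=11)
            exact = exact_success(pool_size, success_max)
            print(f"  {pool_size}d6: simulated {simulated:.1%}, exact {exact:.1%}")
```

For a mechanism this simple the closed-form check is easy to write, which makes it a handy way to confirm the simulation is modeling the rule correctly before trusting it on messier mechanisms where no tidy formula exists.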
The risk of being eaten (or mutated)
In You are a Muffin, customers come in, they place an order, determine if they will eat you, and the next hour begins. As a pastry, you slowly become more stale as you watch all of this from your place on the counter.
Both You are a Muffin and Exclusion Zone Botanist use a Risk Value (RV) that increases as time passes vs. a 2d6 dice roll. Every round, the chance of being eaten by a customer or being mutated by the forest becomes more likely.
I used Python to help answer some of the key design questions such as:
What is the average length of a game?
What are the chances of the game running too short?
What are the chances of the game running too long?
How should the RV be distributed across the elapsed hours?
Should the trigger be both dice rolling less than the RV, or just one die?
In Muffins and the Risk of Being Eaten, I walk through some of the code used to answer these questions using Python simulations.
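As a rough sketch of how those questions can be put to a simulation, the snippet below plays many games against a rising Risk Value. Everything specific here is an assumption made for illustration: `RV_SCHEDULE` is a hypothetical schedule (not the published one), the names `is_eaten`, `game_length`, `length_distribution`, and `mean_length` are invented, and the two rules ("both" dice below the RV, or "any" die below the RV) are simply the two options raised in the last question above.

```python
import random
from collections import Counter

# HYPOTHETICAL Risk Value for each hour; swap in any schedule you want to test.
RV_SCHEDULE = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]


def is_eaten(risk_value, rule, rng):
    """Roll 2d6 and apply the chosen rule: 'both' dice below RV, or 'any' die below RV."""
    dice = (rng.randint(1, 6), rng.randint(1, 6))
    below = [die < risk_value for die in dice]
    return all(below) if rule == "both" else any(below)


def game_length(schedule, rule, rng):
    """Return the hour the game ends, or len(schedule) + 1 if the muffin survives."""
    for hour, risk_value in enumerate(schedule, start=1):
        if is_eaten(risk_value, rule, rng):
            return hour
    return len(schedule) + 1


def length_distribution(schedule, rule, trials=50_000, seed=None):
    """Play many games and count how often each game length occurs."""
    rng = random.Random(seed)
    return Counter(game_length(schedule, rule, rng) for _ in range(trials))


def mean_length(distribution):
    """Average game length from a Counter of {hour: count}."""
    total = sum(distribution.values())
    return sum(hour * count for hour, count in distribution.items()) / total


if __name__ == "__main__":
    for rule in ("both", "any"):
        dist = length_distribution(RV_SCHEDULE, rule, trials=20_000, seed=7)
        total = sum(dist.values())
        early = sum(count for hour, count in dist.items() if hour <= 3) / total
        late = sum(count for hour, count in dist.items() if hour >= 10) / total
        print(f"{rule:>4}: mean {mean_length(dist):.1f} hours, "
              f"ends by hour 3: {early:.1%}, lasts to hour 10+: {late:.1%}")
```

Keeping the schedule and the rule as plain inputs means each design option is a one-line change, so the average length and the "too short" and "too long" tails can be compared side by side.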
The ocean calls
The examples of Eleventh Beast, You are a Muffin, and Exclusion Zone Botanist are all focused on dice rolls. While I prefer to use Python to simulate these types of problems, you can also use online tools such as AnyDice. Less coding knowledge is required, although you still need to learn the syntax.
In the case of Music in its Roar, it was more complicated than just rolling dice. The system uses a hex flower game engine.[2] Each die roll moves a tracking token on a hex grid. The token winds its way from the bottom (Hex 1) ultimately to the top (Hex 19). How long (on average) it takes to move from start to finish depends on the dice, path, and probability.
Using Python, I was able to model the hex flower as a matrix.[3] I could then simulate die rolls that moved a virtual token through the hexes, totaling up the number of steps and how many dead ends were encountered.
Run that simulation 100,000 times and we can see that it takes 23 steps on average from start to finish. It could take as few as 4 steps and as many as 238+ steps. The typical number of steps to finish was around 9-14.
This type of design information would have required either a brutal number of manual playtests or some rather sophisticated math. Knocking it out in a Python simulation took probably less than an hour.
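To give a feel for the approach, here is a minimal sketch of a 19-hex flower stored as a list of lists, with a token walked from the bottom hex to the top. It is not the Music in its Roar engine: the 2d6-to-direction table (`ROLL_TO_DIRECTION`) is hypothetical, hexes are indexed from 0 in an arbitrary order rather than numbered 1 to 19, and a move that would leave the flower is assumed to leave the token in place and count as a dead end. The function names are also invented for this example.

```python
import random
from statistics import mean, median

# Axial (q, r) offsets on a flat-top hex grid, in the order N, NE, SE, S, SW, NW.
DIRECTIONS = [(0, -1), (1, -1), (1, 0), (0, 1), (-1, 1), (-1, 0)]

# HYPOTHETICAL mapping from a 2d6 total to an index into DIRECTIONS.
ROLL_TO_DIRECTION = {2: 4, 3: 4, 4: 5, 5: 5, 6: 0, 7: 0,
                     8: 1, 9: 1, 10: 2, 11: 2, 12: 3}


def build_flower(radius=2):
    """Return (neighbors, start, goal); neighbors[i][d] is the hex reached from i in direction d."""
    cells = [(q, r) for q in range(-radius, radius + 1)
             for r in range(-radius, radius + 1) if abs(q + r) <= radius]
    index = {cell: i for i, cell in enumerate(cells)}
    neighbors = []
    for q, r in cells:
        # Assumption: an off-map move points back at the current hex (a dead end).
        neighbors.append([index.get((q + dq, r + dr), index[(q, r)])
                          for dq, dr in DIRECTIONS])
    return neighbors, index[(0, radius)], index[(0, -radius)]


def walk(neighbors, start, goal, rng):
    """Move the token until it reaches the goal; return (steps, dead_ends)."""
    position, steps, dead_ends = start, 0, 0
    while position != goal:
        total = rng.randint(1, 6) + rng.randint(1, 6)
        next_position = neighbors[position][ROLL_TO_DIRECTION[total]]
        if next_position == position:
            dead_ends += 1
        position = next_position
        steps += 1
    return steps, dead_ends


def simulate_walks(trials=10_000, seed=None):
    """Run many walks on the default 19-hex flower."""
    rng = random.Random(seed)
    neighbors, start, goal = build_flower()
    return [walk(neighbors, start, goal, rng) for _ in range(trials)]


if __name__ == "__main__":
    results = simulate_walks(trials=10_000, seed=19)
    steps = [s for s, _ in results]
    print(f"mean {mean(steps):.1f}, median {median(steps)}, "
          f"min {min(steps)}, max {max(steps)}")
    print(f"average dead ends per walk: {mean(d for _, d in results):.2f}")
```

Because the movement table lives in one dictionary, trying a different die or a different set of directions is a small edit, and the same loop reports how the step counts shift.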
Some important notes and cautions
Being able to quickly and easily simulate parts of game mechanisms is very cool. It’s a powerful tool, but it is just a tool. It needs to be applied properly to make sense:
You don’t need to be great at writing code. You do, however, need to carefully test your code to make sure it’s modeling the game correctly. Start small and check one loop before running it a million times.[4]
Focus on rules rather than elegance. Real coders would be appalled by my code, but it doesn’t matter. It’s better to write explicitly and clearly to follow the game mechanisms than to write efficient code.
If you can model it all, question how much player agency exists. Being able to model a single die roll is fine, but the whole game? I’d question how much the player’s choices matter in that case.[5]
Visualize your data! Beware the datasaurus and look at a picture of your data. Make charts and graphs. The average path length for the hex flower game engine above was 23 steps, but the most common values were about 9-14 steps. Averages are not robust against outliers.
None of this replaces actual playtesting with humans. I consider Python simulations to be a quick tool to use very early in the design process. It gets you in the ballpark perhaps and helps find weird edge cases. Beyond that, you still need to have real people try your game and then ask them good questions.
Applied correctly, Python allows designers to iterate faster and make that first human playtest a better experience for the players.
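The point about visualizing data does not require a plotting library. As a small sketch, the snippet below prints a text histogram next to the mean and median for a deliberately skewed toy process (counting d6 rolls until the first 6). The process and the names `rolls_until_six` and `text_histogram` are assumptions chosen only to illustrate the idea; any list of simulation results could be passed in instead.

```python
import random
from collections import Counter
from statistics import mean, median


def rolls_until_six(rng):
    """Count d6 rolls up to and including the first 6 (a skewed toy example)."""
    rolls = 1
    while rng.randint(1, 6) != 6:
        rolls += 1
    return rolls


def text_histogram(values, width=50):
    """Return one line per value, with a bar scaled to the most common value."""
    counts = Counter(values)
    tallest = max(counts.values())
    lines = []
    for value in range(min(counts), max(counts) + 1):
        bar = "#" * round(width * counts[value] / tallest)
        lines.append(f"{value:>3} | {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    rng = random.Random(42)
    samples = [rolls_until_six(rng) for _ in range(10_000)]
    print(f"mean {mean(samples):.1f}, median {median(samples)}")
    print(text_histogram(samples))
```

Seeing the long tail in the bars makes it obvious why the average sits above the most common results, which is the same pattern described for the hex flower path lengths.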
Conclusion
Some things to think about:
Watch out for edge cases: Without noticing, your game might have a weird 1 in 1000 chance of ending prematurely on the second turn. Or there could be a random chance of it going on far too long. Python can help find those quickly.
Iterate faster: Iterative design (i.e. making revisions, testing, and revising again) is a proven way to create well-designed games. Python, AnyDice, and similar systems are tools that can help iterate faster. It makes sense to give your playtesters the best version of your game that you can as early as possible.
Not everything can (or should) be modeled: The best parts of games, like freeform problem solving and bluffing, can’t be modeled in Python.[6] I’d argue that the more mechanisms in a game that can’t be easily modeled, the better.
What do you think? Have you ever used Python or another programming language to assist with early game design questions? What are some things that you don’t think could ever be modeled in this way?
— E.P. 💀
Skeleton Code Machine is a production of Exeunt Press.
Skeleton Code Machine and TUMULUS are written, augmented, purged, and published by Exeunt Press. No part of this publication may be reproduced in any form without permission. TUMULUS and Skeleton Code Machine are Copyright 2025 Exeunt Press.
For comments or questions: games@exeunt.press
[1] At 5d6 the chance of success is about 87%. At 6d6 it is just over 91%.
[2] Tumulus Issue 01, “Do not trust robots,” has more on hex flower game engines, including a playable example about a malfunctioning (and increasingly hostile) robot.
[3] More precisely, as a list of lists: one large list containing 19 sublists that indicate possible next steps from the current hex.
[4] Learn Python the Hard Way is a good crash course if you want to get started.
[5] Personally, if TTRPG combat can be fully modeled in Python, that might mean it’s simply a slugfest of taking turns hitting each other. I hit the monster. The monster hits me. Eventually one of us dies. That is, in my opinion, rarely fun over the long term. A notable exception is that I think combat in Dark Fort is fast and fun.
[6] Of course “can’t” and “never” are strong words. I’m sure it can be modeled, but that would be well beyond my capabilities. I’d also question how good a model could ever be.
I will always remember that scene (based on real events) where the McDonald’s founders had a kitchen layout drawn on a tennis court and hired actors to enact the parts, thus modeling their kitchen and process.
I also used Python to model combats with no extra options, just to test the main mechanic with asymmetrical combatants. I wanted to know how combat played out when a lower-HP enemy using a higher-level skill that does more damage fought a PC using a lower-tier skill that does less damage but with more than double the HP.
This helped in modeling things in the earlier stages, until the game had enough substance for solo play, before releasing a beta version for public testing.
Thank you Python.