Random Musings; A Dice Set Review
Posted by thehydradm
Per the title, here is some work I’ve done recently on randomness as inspired by this post at Gnomestew (which is a wonderful site that you should all go read if you don’t already). In this post I will be examining several dice using statistics to validate (or invalidate) their relative randomness, including an overview/review of my recently purchased GameScience dice. The results of a GameScience d20 (20-sided die) will be compared against a random number generator “d20” and an old red d20 I found lying around my house. A more contemporary d6, taken from a set of Monopoly, will then be compared against the GameScience d6 as well as an RNG d6 to see if the newer “normal dice” are more or less accurate than older dice. This approach is fairly anecdotal in terms of the number of dice compared, but in terms of the number of throws the results are relatively conclusive.
To begin, my own explanation of the chi-squared test. This is going to be annoying and complicated, because I’ve gotten some flak recently about not actually explaining how a Pearson chi-squared test works. This is less easy to understand than my previous explanation, but it is more technically accurate.
The Pearson chi-squared test, in short, involves finding a test statistic, representative of the deviation of your experimental results from your expected results, and comparing this test statistic with a critical value determined by a combination of a P-value and a number of degrees of freedom. For a given number of degrees of freedom, therefore, and a given value of P, there will be a given critical value to which you can compare your test statistic.
The P-value is the probability, assuming your theory (the null hypothesis) is true, of getting experimental results at least as far from the theoretical results as the ones you actually got. A P-value of 1 would mean perfectly matching results between theoretical expectation and practical result. A P-value near 0, in contrast, would mean results so far from the theoretical expectation that chance alone would almost never produce them. This would seem to make P a sliding scale of confidence in your theory based on the experiment from 0 to 100%, although in reality this is technically not the case. This is the statement that can get you in trouble, so pay attention: although the P-value appears to correlate with how likely the null hypothesis (theoretical results) is to be true, it is not the probability that the null hypothesis is true. I have since revised this post specifically to address that complaint, because people who were actually fans of statistics or the science that uses them thought I was being a moron.
The degrees of freedom in this case are related to the number of faces on the die. It’s the number of different discrete outcomes, basically, but minus 1. The reasoning behind that is mathematically complicated and I’ll spare you from it here.
By combining a given P-value and number of degrees of freedom you can then find the critical value, which again is derived through complicated math that I will not burden you with.
Finally, by doing some simple math to the theoretical and experimental results you can come up with a rough gauge of how distant they are from each other in an absolute sense, which is known as the test statistic.
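The “simple math” behind the test statistic can be sketched in a few lines of Python. This is only an illustration, assuming a fair die so that every face has the same expected count. The function name and the example tallies are made up for the sketch and are not data from these tests.

```python
def chi_squared_statistic(observed_counts):
    """Pearson test statistic for a die assumed to be fair.

    observed_counts[i] is how many times face i+1 came up.
    """
    total_rolls = sum(observed_counts)
    # A fair die should show every face equally often.
    expected = total_rolls / len(observed_counts)
    # Sum of (observed - expected)^2 / expected over all faces.
    return sum((obs - expected) ** 2 / expected for obs in observed_counts)


# Hypothetical tallies for 60 rolls of a d6 (illustrative only).
example_counts = [8, 12, 9, 11, 10, 10]
statistic = chi_squared_statistic(example_counts)
degrees_of_freedom = len(example_counts) - 1  # number of faces minus 1
print(statistic, degrees_of_freedom)
```

A die that lands on every face exactly as often as expected scores 0, and the score grows as the tallies drift away from an even split.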
In this study I will be treating the test statistic as if it were a critical value and working backwards to the P-value that goes with it, which I will then compare to arbitrary threshold P-values at which I can challenge the null hypothesis (this threshold is known as alpha, which is a Greek letter). I will not be concerning myself with significant digits, as I always thought they were annoying anyway and I’m doing this for fun, not academic publication.
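For readers who want to see the step from test statistic to P-value, here is a rough sketch using only the Python standard library. It sums a textbook series for the regularized incomplete gamma function; the function name is hypothetical. A spreadsheet or statistics package would normally do this for you, and this is not necessarily how the figures in this post were produced.

```python
import math


def chi2_p_value(statistic, dof):
    """Upper-tail probability of a chi-squared distribution (the P-value)."""
    if statistic <= 0:
        return 1.0
    a = dof / 2.0
    x = statistic / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x):
    # P(a, x) = e^-x * x^a / Gamma(a) * sum_{n>=0} x^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# A statistic of 18.2 on a d20 (19 degrees of freedom) lands near the middle.
print(round(chi2_p_value(18.2, 19), 3))
```

The result is then compared against alpha: a P-value below alpha is grounds to challenge the null hypothesis, while one above alpha is not.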
Procedure and Findings
I did seven chi-squared goodness of fit tests, two of which were using a Random Number Generator (Build 86 of the popular virtual tabletop software program “Map Tool”, one test with a D6 analog and the other with a D20 analog), two of which were using a GameScience Smoke Quartz Pre-Inked D20 (one with the sprue improperly sanded, so I had to do a second test), one with a red D20 that was created sometime in the 1980s (it does not adhere to the modern ‘opposite faces sum to 21’ dicemaking practice but is still 1-20 rather than 1-10 twice, which gives me a rough clue to its age, but not exactly), one with a GameScience D6 (the same pre-inked Smoke Quartz gem dice set), and one with a D6 I took from a Monopoly set. The first test (RNG D20) is not included in the Excel data at the end of this post (a file saving snafu, unfortunately). The P value of this first test, which consisted of 200 “rolls”, was very high (.61), which compares favorably with an alpha of .1 or .05, since this means the null hypothesis cannot be rejected, and by a fairly wide margin (in other words, the theoretical results and the experimental results match pretty well). Again, I would caution the reader that although P-value correlates with the relationship to the null hypothesis, this test is based on probabilities, and thus even a P-value of .999 only means the results are consistent with the null hypothesis, not that the null hypothesis has been proven correct.
The next test I performed was 200 rolls of the GameScience D20 that I recently received in the mail, thrown on a hard, flat surface (my desk) with all rolls coming to a natural halt via surface friction rather than obstruction. Unfortunately, I did a right cruddy job of sanding down the molding sprue, and therefore got results where 14 (the opposite face of 7, the one with the sprue) did not appear very often (2 times out of 200). Still, the test statistic of that test was 18.2, which means with 19 degrees of freedom that test resulted in a P value of ~.509. Not too bad, and it certainly doesn’t show a statistically significant pattern of being non-random. Therefore, the GameScience D20 is what I would call “extremely random”, just like the Map Tool d20 function, which, as you may suspect, is good for a polyhedral device that claims to generate random numbers!
After finishing sanding the sprue properly on my GameScience D20 I managed to improve the results of rolling the 14 during the next test (6 out of 200). In non-recorded testing I managed to get it going about 1 for every 20 rolls over the course of about 300 rolls, so the sprue can be mostly mitigated (although it left something of a small pit in the die on the 7 face near the vertex, which worries me a little). In this case the P value was surprisingly slightly worse at P = ~.382, which still isn’t actually that bad, but does highlight the problem of working with the vagaries of random numbers – if the variance in P is that large it’s pretty important that we avoid assigning absolute results to these tests. Regardless, the two tests on the GameScience D20 had mean results of ~10.8 and ~10.3 respectively, with no consistent mode. The median was 10 both times. Suffice to say that the GameScience D20 would be a legal die at any table I played at because it is quite clearly very random over the long term.
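To put the 14 face in perspective, a small sketch can work out how much a single face contributes to the test statistic. With 200 rolls of a d20 each face is expected 10 times; the helper name below is hypothetical and the calculation assumes a fair die.

```python
def face_contribution(observed, total_rolls, faces):
    """One face's share of the Pearson statistic, assuming a fair die."""
    expected = total_rolls / faces
    return (observed - expected) ** 2 / expected


# The 14 face: 2 appearances before proper sanding, 6 after.
before_sanding = face_contribution(observed=2, total_rolls=200, faces=20)
after_sanding = face_contribution(observed=6, total_rolls=200, faces=20)
print(before_sanding, after_sanding)
```

In the first run the 14 alone accounts for 6.4 of the 18.2 total, roughly a third of the whole statistic. After sanding, that face contributes only 1.6, so the slightly worse P-value in the second run has to come from scatter across the other nineteen faces rather than from the sprue.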
In comparison, I took an old red D20 that I had lying around. It has a visible sprue blemish on the 1 (at least I’m assuming it’s a sprue blemish given the size and location) but is completely smooth there and is thus merely a discoloration. To give you an idea of its age I’m pretty sure it was from a Red Box set that I found lying around the house when I was a kid. It’s not “numbered 1-10 twice” old, but it’s old, notably because the opposite faces do not sum to 21 (a modern die-making convention that helps to ensure a more even distribution in the event of manufacturing imperfections).
The results for this old red D20 are, frankly, horrific. The mean was 8.92, the median 9, and mode 2. No eighteen appeared until the 127th roll, and no second eighteen appeared until the 191st roll. There was no protruding blemish, yet it rolled eighteens as poorly as the improperly sanded GameScience d20 did fourteens! The P value of the red D20 was ~.002, which means the die is extremely likely to be dodgy in comparison to even an alpha value of .05 (which is a pretty common one used in scientific studies). It’s not even the good kind of rigged, it rolls low!
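As a rough cross-check on those means, a fair d20 averages 10.5, and the average of many rolls should sit within a few standard errors of that. The sketch below uses the standard variance formula for a fair n-sided die, (n² − 1)/12. It assumes the red D20 was also rolled 200 times, which is not stated above, and the function name is made up for the example.

```python
import math


def fair_die_mean_and_se(faces, rolls):
    """Expected mean of a fair die and the standard error of the average."""
    mean = (faces + 1) / 2
    variance = (faces ** 2 - 1) / 12  # variance of a single fair roll
    return mean, math.sqrt(variance / rolls)


mean, se = fair_die_mean_and_se(faces=20, rolls=200)  # 200 rolls is assumed for the red die
for label, observed in [
    ("GameScience, first run", 10.8),
    ("GameScience, second run", 10.3),
    ("old red D20", 8.92),
]:
    # How many standard errors the observed mean sits from 10.5.
    print(label, round((observed - mean) / se, 2))
```

Both GameScience averages land within one standard error of 10.5, while the red die sits almost four standard errors low, which fits the chi-squared verdict.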
As a final exam, of sorts, I took the GameScience D6, which was in much better shape than the D20. I didn’t even need to sand the sprue at all, it was completely flat from the get-go. In this case the die actually beat the random number generator, with a P value of .608 compared to the RNG D6-analog’s .527 (after 150 rolls each). Compare this to the result of a D6 I snatched out of a board game, which got .17; the Monopoly die wasn’t exactly non-random, but it was clearly not as good as an RNG or the GameScience die, at least in this test. Again, however, given the vagaries of fate displayed earlier with the re-test of my GameScience d20 I would like to remind the reader that comparisons like these are very tricky to make with any degree of accuracy. This test with this number of inputs (throws) is primarily useful for determining if dice are obviously rigged or not, not to determine comparisons between dice that do not appear rigged in the first place for practical purposes.
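The point about vagaries can be illustrated with a quick simulation. The sketch below rolls a perfectly fair virtual d6 150 times, repeats that 2000 times, and checks how often the test would flag it at an alpha of .05. It uses Python’s built-in random module rather than Map Tool, and 11.07 is the standard table critical value for 5 degrees of freedom at .05; the function names and the trial count are arbitrary choices for the example.

```python
import random


def chi_squared_statistic(counts):
    """Pearson test statistic against a fair-die expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def simulate_fair_die_statistics(faces, rolls, trials, seed=0):
    """Test statistics from repeated runs of a perfectly fair virtual die."""
    rng = random.Random(seed)
    stats = []
    for _ in range(trials):
        counts = [0] * faces
        for _ in range(rolls):
            counts[rng.randrange(faces)] += 1
        stats.append(chi_squared_statistic(counts))
    return stats


CRITICAL_5DF_ALPHA_05 = 11.07  # table value: 5 degrees of freedom, alpha = .05
stats = simulate_fair_die_statistics(faces=6, rolls=150, trials=2000)
flagged = sum(s > CRITICAL_5DF_ALPHA_05 for s in stats) / len(stats)
print(f"fair d6 runs flagged at alpha=.05: {flagged:.1%}")
print(f"smallest and largest statistic: {min(stats):.2f}, {max(stats):.2f}")
```

Even a perfect die gets flagged about one run in twenty, and its P-values land all over the range from 0 to 1, so a gap like .608 versus .527 says very little about which die is better.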
In conclusion: GameScience dice are, indeed, very random dice, and seem to be at least somewhat more random than normal dice, though any result beyond “GameScience dice seem to be sufficiently random for practical purposes” is a bit of an educated guess. However, they do come with some significant drawbacks, which I will explain in the review. You can acquire my results in Excel spreadsheet format here.
With the tests out of the way, now I can get on to a quick review of GameScience pre-inked gem dice.
I bought a 7 die GameScience set in Smoke Quartz from a reseller on Amazon, pre-inked. I took Smoke Quartz because I wanted a slightly darker die color than “clear” to aid in contrast. Low contrast dice, so I’m told, are difficult to read. I don’t own any low contrast dice so I can’t comment there. They arrived in a small plastic box that broke apart easily (in a good way – no struggling with the packaging) with a small insert talking about how there’s a sprue that you can sand down, that the dice constitute a choking hazard for children under 3 years of age, and, most curiously of all, a bar code label “7-pc Smoke Quartz No Ink”. Well, there’s ink in these, so that’s kind of strange. Perhaps the reseller is pulling the wool over my eyes? Who knows.
Anyway, the quality of the plastic is pretty good – it’s translucent (nearing transparent) as promised in the pictures online. The edges are extremely sharp, and a particularly nice feature I found was that the corners of the D4 are cut off to prevent it from turning into a caltrop (a good thing, because let me tell you these dice have edges sharp enough to kill!). There were a few notable defects. The inking, for one, is somewhat inconsistent. The 9 on the D12, for instance, is a bit light on ink in the tail of the number. The same goes for the bottom of the 3 on the D6. There is also a slight overuse of ink on the 2 of the D10 (outside of the number), some slight imperfections in the inking on one face of the D4, a little ink missing in the tail of the 5 on the D8, and some marring of the 6 on the D-tens (the D10 used for percentile dice to roll the tens place).
The D20 deserves its own mini section due to the outrageously bad quality, however. To begin, the 11 is still empty. The 9 might as well be. The same goes for the 1, and the 13 is missing about half of its ink. Only the bar of the 5 still has ink in it, and the top half of the 12 is fading fast. The 13 and 11 faces additionally have fairly deep scratch marks across them. I know that the disclaimer says these dice won’t ever be as “pretty” as the rock-polished dice of, say, Chessex, but really? This looks downright ugly, and sort of ruins what would’ve been an otherwise very aesthetically pleasing set. I shall have to order another D20 individually to replace it, I think, which is rather expensive and irksome in comparison (overall it’s still only a couple dollars, but really I shouldn’t have to do that if I want dice that are free from deep scratches and missing paint).
Some sprues came better than others – the one on the D20 was a real pain in the neck, to the point that I had to go get coarse grit sandpaper and have at it rather than use the ultra fine sandpaper I had been using. Unfortunately in my zeal with the ultra fine sandpaper before I realized I needed to just get coarser stuff I slightly rounded the vertex between faces 7 and 4. Alas, problem exists between chair and keyboard on that account, and since I’m getting another D20 anyway that’s not such a big deal, but be warned to exercise caution in sanding down the sprues where you have to.
As an addendum to the review as I revise the chi-squared portion of the test, I have recently ordered the aforementioned replacement d20 and am currently working on finding the best way to ink the dice. You can expect a review of different inking methods in the future. The ink that came on my pre-inked dice seems to be water-soluble, as a sweaty finger can wipe it right out of the groove, something I learned now that I no longer depend on the singular screwed up d20 and can conduct tests like that without fear of having an unreadable die causing problems.
All in all, the inking work was a bit spotty (primarily on the D20), which means you’re probably better off inking them yourself using any pigment-applying method with which you are familiar (paint on a toothpick, paint pen, crayon wax, etc.). The fact that you have to sand down the sprues yourself, and that your dice will never be free of the slight visual blemish that results, is a bit of a chore. The dice are kind of pricey, too, though only relative to other dice – practically speaking it’s the difference between getting a cup of coffee at Starbucks or not. However, if you don’t mind paying a slight premium, and you don’t mind going at your dice like you would a miniature, GameScience dice are, as far as I can tell, almost certainly the premier polyhedral random number generating option. Being used to using a random number generator as I am online, when I play in person I’ll be sure to use GameScience dice in the future to achieve consistently random (now there’s an oxymoron if I ever heard one) results, and I’m glad that I invested in the purchase. Now I just need to get an un-inked set and be a bit more careful with the sandpaper…
-The Hydra DM