Wednesday, February 29, 2012

Example: Working With RAND()

Question: How do I generate random integers in the range 1 through 6?

Answer:  

Oh, you mean "How do I use SQL Anywhere to generate random numbers in the range 1 through 6?"

Well, you can call RAND()... which gives you a number between 0 and 1, but doesn't every actually give you a zero or a one, and take it from there.

Gee, thanks for all the help...

OK, OK, let's take it a bit further... let's call EXPRTYPE() to see what RAND() returns:
SELECT EXPRTYPE ( 'SELECT RAND()', 1 ) AS "data type";

data type
double

Great... RAND() returns a honking big floating point number.

Well, what do they look like? Let's use sa_rowgenerator() to call RAND() ten times:
SELECT RAND() AS random FROM sa_rowgenerator ( 1, 10 );

random
0.7364153851458874
0.9333781469303082
0.28651545768907083
0.46529738021329853
0.25306924490866683
0.33479917996320835
0.969817641642791
0.7251030903892141
0.8076401715202444
0.00836274074779951

Wonderful! That's so far away from 1, 2, 3, 4, 5, 6 it's not even funny!

Why does everything have to be so hard?

Sigh... let's get started... multiply by 6 and add 1, to change the interval from "0 to 1" to "1 to 7":
SELECT ( RAND() * 6 ) + 1 AS random 
  FROM sa_rowgenerator ( 1, 10 );

random
4.5213047105452535
5.568270134072876
3.916143362836979
4.621499201106605
3.5370729987216523
5.585889514808492
6.045075386317947
1.5820178457452068
2.9739334396896573
2.899320864071753

Much better... now the results are in the range 1-point-something to 6-point-something, but don't forget: it's never exactly 1 or exactly 7.

Now let's use CAST to drag the values down to 1 through 6, where the values can sometimes be exactly 1 or exactly 6:
SELECT CAST ( ( RAND() * 6 ) + 1 AS INTEGER ) AS random 
  FROM sa_rowgenerator ( 1, 10 );

random
3
1
4
3
3
6
2
3
4
6

MUCH better... now let's see if it generates the same number of each digit 1 through 6.

Let's generate a million numbers and count how many of each digit we get:
SELECT @@VERSION,
       random              AS digit,
       COUNT(*)            AS actual_occurrences,
       ( 1000000 / 6.0 )   AS perfect_occurrences
  FROM ( SELECT CAST ( ( RAND() * 6 ) + 1 AS INTEGER ) AS random 
           FROM sa_rowgenerator ( 1, 1000000 ) 
       ) AS generated 
 GROUP BY random
 ORDER BY random;

@@VERSION,digit,actual_occurrences,perfect_occurrences
'12.0.1.3298',3,1000000,166666.67

Oops, that's not right.

Try it again and you get a million fours. Or a million ones.

But just a single digit, not an even distribution across all 6 digits.

What we want is 166,666 ones, 166,666 twos, and so on... like we (almost) get in SQL Anywhere Version 9:
@@VERSION,digit,actual_occurrences,perfect_occurrences
'9.0.2.3951',1,166070,166666.67
'9.0.2.3951',2,166786,166666.67
'9.0.2.3951',3,166835,166666.67
'9.0.2.3951',4,166467,166666.67
'9.0.2.3951',5,166944,166666.67
'9.0.2.3951',6,166898,166666.67

This Is Why We Test...

...to make sure our crap stuff actually works.

Somewhere, deep down inside the SQL Anywhere query engine, the promise that RAND() "is treated as a non-deterministic function" is being broken... perhaps it's because of all the GROUP BY and COUNT(*) stuff that was added to the SELECT.

Or maybe it's a feature... in this case, it doesn't matter, we just need it to work.

And here's how: Bite the bullet, save the million RAND() results in a simple temporary table, then run the query:
SELECT RAND() AS random 
  INTO #generated
  FROM sa_rowgenerator ( 1, 1000000 );

SELECT @@VERSION,
       CAST ( ( random * 6 ) + 1 AS INTEGER )  AS digit,
       COUNT(*)                                AS actual_occurrences,
       ( 1000000 / 6.0 )                       AS perfect_occurrences
  FROM #generated 
 GROUP BY digit
 ORDER BY digit;

@@VERSION,digit,actual_occurrences,perfect_occurrences
'12.0.1.3298',1,166612,166666.67
'12.0.1.3298',2,166700,166666.67
'12.0.1.3298',3,166727,166666.67
'12.0.1.3298',4,166496,166666.67
'12.0.1.3298',5,166801,166666.67
'12.0.1.3298',6,166664,166666.67

That works in all versions of SQL Anywhere from 9 through 12, and the result looks pretty good: for all intents and purposes the number of ones, twos, threes, ... are the same.

So there's the answer: Call RAND(), multiply it to expand the range, add to it to shift the range, and truncate it to get an integer.

Tip: Truncate the number, don't ROUND() it, that might generate an incorrect number of 1s and 6s. That's what happened with this expression: CAST ( ROUND ( ( RAND() * 5 ) + 1, 0 ) AS INTEGER )

Are you sure that's random?


Scott Adams has the answer...

Dilbert.com


No comments: