Cody

Problem 44393. Testing for randomness: uniform distribution of integers

On Cody several problems have been set up asking players to generate one or more 'random' numbers. Usually they are asking for numbers from a uniform distribution function (UDF). However, Test Suites do not always test very carefully to see how 'random' the sequences generated by the submitted codes are. Indeed, rigorous testing of randomness is a sophisticated field of endeavour. (See e.g. TestU01 and the Diehard tests.)

MATLAB provides access to several very good pseudo-random number generators. Even these do not generate truly random number sequences — however, they are good enough for most purposes.

Here we will be considering only integer sequences. Your task will be to distinguish those that are 'practically' random — in this case being uniformly distributed — from those that aren't.

You must return:

  • the probability of the given sequence being generated (a scalar float); and
  • your assessment of whether the sequence could 'plausibly' have been produced by a true random number generator (a scalar Boolean).

Input sequences will be vectors of various lengths containing only the integers 1, 2, 3 and 4.

Here are a few cases to consider:

EXAMPLE ONE — Random sequence (UDF)

 Input = [3 4 3 3 4 2 2 2 1 3 2 2 1 3 3 2 4 3 1 2 3 2 1 4 2 1 4 4 4 2 3 1 2 1 2 2 1]
 prob ~ 5.29E-23
 isRandom = true

EXAMPLE TWO — Non-uniform distribution

 Input = [1 3 3 1 1 1 1 3 1 1 4 1 2 1 1 1 1 1 3 1 3]
 % Notice that there are far too many ones, and too few twos and fours.
 prob ~ 9.09E-13
 isRandom = false

EXAMPLE THREE — Blocked randomisation (or 'permuted block randomisation')

 Input = [1 3 2 4, 1 4 2 3, 3 1 4 2, 2 1 3 4, 1 4 3 2, 3 2 4 1, 4 1 3]
 % Notice that each of the four digits appears in groups of four.  In such cases the identity of neighbouring digits is often biased too, although it isn't apparent in the short sequence above.  
 prob ~ 5.55E-17
 isRandom = false

EXAMPLE FOUR — Repeated pattern

 Input = [1 3 1 4 2 4 2 3, 1 3 1 4 2 4 2 3, 1 3 1 4 2 4 2 3, 1 3 1 4 2 4 2 3, 1 3 1 4 2]
 % Notice the pattern of eight digits that is repeated.  Notice also that the identity of neighbouring digits is biased, as e.g. 4 is only ever followed by 2, and 3 is only ever followed by 1.  
 prob ~ 5.29E-23
 isRandom = false

EXAMPLE FIVE — Partial repeated sequence, too few runs

 Input = [1 2 3 4 2 3 4 1 3 4 1 2 4 1 2 3 1 2 3 1 2 4 1 3 4 2 3 4 1 2 3 4]
 % Notice that the sequence "1 2 3 4" is repeated in part or in full.  Notice also that the neighbouring digits are biased — e.g. 2 most commonly follows 1, but 2 is never followed by 1 or 2.  And finally notice that there are no runs (repeated occurrences of the same numeral).  
 prob ~ 5.42E-20
 isRandom = false

EXAMPLE SIX — Partially segregated sequence, runs too long

 Input = [1 1 4 1 1 1 1 1 1 3 3 3 3 1 3 3 3 3 4 4 4 4 2 4 4 4 4 2 2 2 2 2 2 3 2 2]
 % Notice that the sequence unfolds as mostly 1, then mostly 3, then mostly 4, and mostly 2 at the end.  Notice also that the neighbouring digits are biased — each number most commonly follows itself.  And finally notice that the runs are longer than we would expect from a truly random (UDF) sequence.  
 prob ~ 2.12E-22
 isRandom = false

As will be apparent, there are various tests that could be applied: inspecting occurrence of numerals, pairs of neighbouring numerals, runs of numerals (length of runs, number of runs), repeated sequences, etc. Your job is just to return the correct outputs. You are not necessarily required to implement every test.

Note: the sequences in the examples above were for illustrative purposes; most sequences in the Test Suite are considerably longer.

See also:

Solution Stats

37.5% Correct | 62.5% Incorrect
Last solution submitted on Nov 30, 2018

Solution Comments