TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol. 25, pp. 3673-3683, 2017 (Journal Indexed in SCI)
Random numbers and random sequences are used to produce vital parts of cryptographic algorithms, such as encryption keys; the generation of random sequences and the evaluation of their randomness are therefore essential. Test suites consisting of a number of statistical randomness tests are used to detect nonrandom characteristics of sequences. Constructing a test suite is not an easy task. On the one hand, the coverage of a suite should be wide; that is, it should compare the sequence under consideration with true random sequences from many different points of view. On the other hand, an overpopulated suite is expensive in terms of running time and computing power. Unfortunately, this trade-off is not addressed in detail in most of the suites in use. An efficient suite should avoid similar tests while still containing sufficiently many. A single statistical test gives one measure of the randomness of the data, and a collection of tests in a suite gives a collection of measures. Obtaining a single value from this collection is difficult, and so far there is no conventional or strongly recommended method for this purpose. This work focuses on evaluating the randomness of data so as to give a unified result that considers all the statistical information obtained from the different tests in the suite. A natural starting point for research in this direction is to investigate the correlations between test results and to study the independence of each test from the others. The study therefore starts with the concept of independence. Since working with even one test function is complicated, a theoretical investigation of the dependence among many of them in terms of conditional probabilities is a much more difficult task. With this motivation, this work seeks experimental results that may lead to theoretical results in future work.
Because experimental results may reflect properties of the data set under consideration, the experiments are carried out on various types of large data sets, in the hope of obtaining results that give clues about the theoretical ones. As the collection of statistical randomness tests, the tests in the NIST test suite are considered. Tests in the NIST suite that can be applied to sequences shorter than 38,912 bits are analyzed. Based on the correlation of the tests at extreme values, the dependencies among the tests are found. Based on the coverage of a test suite, a new concept, the coverage efficiency of a test suite, is defined; using this concept, the most efficient, the least efficient, and the optimal subsuites of the NIST suite are determined. Moreover, the marginal benefit of each test, which also helps one to understand the contribution of each individual test to the coverage efficiency of the NIST suite, is found. Furthermore, an efficient subsuite that contains five statistical randomness tests is proposed.
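The idea of comparing tests at extreme values can be illustrated with a minimal Python sketch. It implements two of the simplest NIST SP 800-22 tests, the frequency (monobit) test and the runs test, and then measures how often both tests reject the same sequences. The function names, the rejection threshold of 0.01, and the use of joint rejection rates as a dependence indicator are assumptions made for illustration; this is not the procedure or the definition used in the paper.

```python
import math
import random


def monobit_p(bits):
    """P-value of the NIST SP 800-22 frequency (monobit) test."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)  # +1 for each one, -1 for each zero
    return math.erfc(abs(s) / math.sqrt(n) / math.sqrt(2))


def runs_p(bits):
    """P-value of the NIST SP 800-22 runs test."""
    n = len(bits)
    pi = sum(bits) / n  # proportion of ones
    # Prerequisite frequency check from the NIST description; the extra
    # guard avoids a division by zero for very short constant sequences.
    if pi in (0.0, 1.0) or abs(pi - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    v = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)  # number of runs
    num = abs(v - 2 * n * pi * (1 - pi))
    den = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(num / den)


def joint_rejection(p_a, p_b, alpha=0.01):
    """Return (rate A rejects, rate B rejects, rate both reject).

    Hypothetical helper: alpha is an assumed threshold for "extreme" p-values.
    If the tests were independent, the joint rate would be close to the
    product of the two individual rates.
    """
    m = len(p_a)
    rej_a = sum(p < alpha for p in p_a) / m
    rej_b = sum(p < alpha for p in p_b) / m
    both = sum(a < alpha and b < alpha for a, b in zip(p_a, p_b)) / m
    return rej_a, rej_b, both


if __name__ == "__main__":
    rng = random.Random(0)  # pseudorandom data, used only for demonstration
    seqs = [[rng.getrandbits(1) for _ in range(1024)] for _ in range(2000)]
    pa = [monobit_p(s) for s in seqs]
    pb = [runs_p(s) for s in seqs]
    print(joint_rejection(pa, pb))
```

A joint rejection rate well above the product of the two individual rates would suggest that the two tests tend to flag the same sequences, so that keeping both adds little coverage. Any real analysis would need far more sequences and data sources than this demonstration uses.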