Systematically Producing Test Orders to Detect Order-Dependent Flaky Tests

Li C., Khosravi M. M., Lam W., Shi A.

32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023, Washington, Amerika Birleşik Devletleri, 17 - 21 Temmuz 2023, ss.627-638, (Tam Metin Bildiri)

Yayın Türü: Bildiri / Tam Metin Bildiri
Doi Numarası: 10.1145/3597926.3598083
Basıldığı Şehir: Washington
Basıldığı Ülke: Amerika Birleşik Devletleri
Sayfa Sayıları: ss.627-638
Anahtar Kelimeler: flaky test detection, order-dependent flaky test
Açık Arşiv Koleksiyonu: AVESİS Açık Erişim Koleksiyonu
Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

Software testing suffers from the presence of flaky tests, which can pass or fail when run on the same version of code. Order- dependent tests (OD tests) are flaky tests whose outcome depends on the order in which they are run. An OD test can be detected if specific tests are run or not run before it, resulting in a difference in test outcome. While prior work has proposed rerunning tests in different random test orders, this approach does not provide guarantees toward detecting all OD tests. Later work that proposed a more systematic approach to ordering tests still fails to account for the relationships between all tests in the test suite. We propose three new techniques to detect OD tests through a more systematic means of producing test orders. Our techniques build upon prior work in Tuscan squares to cover test pairs in a minimal set of test orders while also obeying the constraints of how tests can be positioned in a test order w.r.t. their test classes. Further, as there are many test pairs that need to be covered, we develop a technique that can take a specified set of test pairs to cover and produce test orders that aim to cover just those test pairs. Our evaluation with 289 known OD tests across 47 test suites from open-source projects shows that our most cost-effective technique can detect 97.2% of the known OD tests with 104.7 test orders, on average, per subject. While all techniques produce a relatively large number of test orders, our analysis of the minimal set of test orders needed to detect OD tests shows a tremendous reduction in the test orders needed to detect OD tests - representing an opportunity for future work to prioritize test orders.