Systematically Producing Test Orders to Detect Order-Dependent Flaky Tests


Creative Commons License

Li C., Khosravi M. M., Lam W., Shi A.

32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023, Washington, United States of America, 17-21 July 2023, pp. 627-638

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1145/3597926.3598083
  • City: Washington
  • Country: United States of America
  • Page Numbers: pp.627-638
  • Keywords: flaky test detection, order-dependent flaky test
  • Middle East Technical University Affiliated: Yes

Abstract

Software testing suffers from the presence of flaky tests, which can pass or fail when run on the same version of code. Order-dependent tests (OD tests) are flaky tests whose outcome depends on the order in which they are run. An OD test can be detected if specific tests are run or not run before it, resulting in a difference in test outcome. While prior work has proposed rerunning tests in different random test orders, this approach does not provide guarantees toward detecting all OD tests. Later work that proposed a more systematic approach to ordering tests still fails to account for the relationships between all tests in the test suite. We propose three new techniques to detect OD tests through a more systematic means of producing test orders. Our techniques build upon prior work in Tuscan squares to cover test pairs in a minimal set of test orders while also obeying the constraints of how tests can be positioned in a test order w.r.t. their test classes. Further, as there are many test pairs that need to be covered, we develop a technique that can take a specified set of test pairs to cover and produce test orders that aim to cover just those test pairs. Our evaluation with 289 known OD tests across 47 test suites from open-source projects shows that our most cost-effective technique can detect 97.2% of the known OD tests with 104.7 test orders, on average, per subject. While all techniques produce a relatively large number of test orders, our analysis of the minimal set of test orders needed to detect OD tests shows a tremendous reduction in the test orders needed to detect OD tests, representing an opportunity for future work to prioritize test orders.
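To make "covering test pairs in a minimal set of test orders" concrete, the sketch below uses the classic Williams (row-complete Latin square) construction. When the number of tests n is even, this construction yields a Tuscan square: n test orders in which every ordered pair (a, b) of distinct tests appears with a immediately before b exactly once. This illustrates only the underlying combinatorial object and is not the authors' techniques. It assumes a pair counts as covered only when the two tests are adjacent, it handles only even n, and it ignores the constraint that tests from the same test class must be positioned together. The function names tuscan_square_even and covered_pairs are hypothetical.

```python
"""Sketch: Tuscan-square test orders via the Williams construction (even n only).

Assumption: a test pair (a, b) counts as covered when a runs immediately
before b. Function names are hypothetical, not taken from the paper.
"""


def tuscan_square_even(n: int) -> list[list[int]]:
    """Return n orders of tests 0..n-1 covering every ordered adjacent pair once."""
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction requires an even n >= 2")
    # Base sequence 0, 1, n-1, 2, n-2, ...: its consecutive differences mod n
    # are all distinct and nonzero, which is what makes the square row-complete.
    base = [0]
    low, high = 1, n - 1
    for k in range(1, n):
        if k % 2 == 1:
            base.append(low)
            low += 1
        else:
            base.append(high)
            high -= 1
    # Row i is the base sequence shifted by i (mod n); each row is a permutation.
    return [[(x + i) % n for x in base] for i in range(n)]


def covered_pairs(orders: list[list[int]]) -> set[tuple[int, int]]:
    """Collect every ordered pair (a, b) where a runs immediately before b."""
    pairs = set()
    for order in orders:
        pairs.update(zip(order, order[1:]))
    return pairs


if __name__ == "__main__":
    tests = ["testA", "testB", "testC", "testD"]
    orders = tuscan_square_even(len(tests))
    for order in orders:
        print([tests[i] for i in order])
    # 4 orders x 3 adjacent slots = 12 = all ordered pairs of 4 distinct tests
    print(len(covered_pairs(orders)), "of", len(tests) * (len(tests) - 1), "pairs covered")
```

Every one of the n(n-1) ordered pairs must appear, and each order supplies only n-1 adjacent slots. The number of orders therefore grows linearly with the number of tests. This cost is what the abstract's pair-targeting technique aims to reduce by covering only a specified set of test pairs.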