Publication Date
5-2025
Date of Final Oral Examination (Defense)
3-3-2025
Type of Culminating Activity
Thesis
Degree Title
Master of Science in Computer Science
Department
Computer Science
Supervisory Committee Chair
Elena Sherman, Ph.D.
Supervisory Committee Member
Jim Buffenbarger, Ph.D.
Supervisory Committee Member
Bogdan Dit, Ph.D.
Abstract
Data-driven software engineering has become a well-recognized approach to improving software engineering processes. By observing and recording variable values at runtime, we can reason about program semantics. Dynamic invariant inference is one application of data-driven software engineering: tools such as Daikon or DIG infer program invariants from these recorded values. However, the quality of the inferred invariants depends on the diversity of the dynamic data. Producing a diverse set of data points with coverage-adequate automated test case generation tools such as EvoSuite is challenging.
In this thesis, we propose novel approaches that guide automated test case generators to produce diverse data points using feedback from previously observed data points. We implement two approaches, the Datagen framework and the Multiset Genetic Algorithm (MultisetGA), and evaluate them in the context of program invariant inference with dynamic invariant generation tools. Our results show that while Datagen validates our data-diversity heuristic, MultisetGA demonstrates significant improvements in both the speed and diversity of data generated.
DOI
10.18122/td.2363.boisestate
Recommended Citation
Bhusal, Sandesh, "Diverse Program Input Generation for Dynamic Invariant Inference" (2025). Boise State University Theses and Dissertations. 2363.