Publication Date

5-2025

Date of Final Oral Examination (Defense)

3-3-2025

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

Elena Sherman, Ph.D.

Supervisory Committee Member

Jim Buffenbarger, Ph.D.

Supervisory Committee Member

Bogdan Dit, Ph.D.

Abstract

Data-driven software engineering has become a well-recognized approach to improving software engineering processes. By observing and recording variable values at runtime, we can reason about program semantics. Dynamic invariant inference is one application of data-driven software engineering: tools such as Daikon or DIG infer program invariants from these recorded values. However, the quality of the inferred invariants depends on the diversity of the dynamic data, and producing a diverse set of data points with coverage-adequate automated test case generation tools such as EvoSuite is challenging.

In this thesis, we propose novel approaches that guide automated test case generators to produce diverse data points using feedback from previously observed data points. We implement and evaluate two approaches: the Datagen framework and the Multiset Genetic Algorithm (MultisetGA). Both are evaluated in the context of program invariant inference with dynamic invariant generation tools. Our results show that while Datagen validates our data-diversity heuristic, MultisetGA demonstrates significant improvements in both the speed of data generation and the diversity of the generated data.

DOI

10.18122/td.2363.boisestate

Available for download on Saturday, May 01, 2027
