Publication Date
5-2023
Date of Final Oral Examination (Defense)
3-10-2023
Type of Culminating Activity
Thesis
Degree Title
Master of Science in Computer Science
Department
Computer Science
Supervisory Committee Chair
Edoardo Serra, Ph.D.
Supervisory Committee Member
Francesca Spezzano, Ph.D.
Supervisory Committee Member
Liljana Babinkostova, Ph.D.
Abstract
Detecting malicious behavior is becoming increasingly crucial as the internet becomes more prevalent. This problem can be formulated as an anomaly detection task on provenance data, where attacks appear as anomalies in the system's behavior. System-level data is far less available than network data, and correspondingly little research has been carried out on system-level logs. However, monitoring the operating system's processes during program execution and identifying anomalous behavior in system calls can be beneficial because it offers broad coverage and generality: a wide variety of malicious applications could be identified this way. Furthermore, logs of system processes and events are provenance data, that is, a graph describing the relationships among all the elements that contributed to the creation of the data, which makes a Graph Neural Network (GNN) better suited to the task. Moreover, such data may contain metadata that tends to be complex and makes feature engineering difficult, so these features often end up being used only to a limited extent.
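A minimal sketch of the graph view of logs described above, assuming log records with hypothetical `processId`, `parentProcessId`, and `eventName` fields (loosely modeled on process-level logs such as BETH; the records themselves are invented). Processes and events become nodes, parent-child and process-event links become edges, and a single untrained mean-aggregation step stands in for one GNN layer. This is not the thesis's trained GNN; it only illustrates how repeated neighbor aggregation lets an event's vector absorb information about nearby events.

```python
from collections import defaultdict

# Hypothetical log records (invented values); field names are an assumption
# loosely modeled on process-level logs such as BETH.
records = [
    {"processId": 1, "parentProcessId": 0, "eventName": "execve"},
    {"processId": 2, "parentProcessId": 1, "eventName": "openat"},
    {"processId": 2, "parentProcessId": 1, "eventName": "connect"},
    {"processId": 3, "parentProcessId": 1, "eventName": "openat"},
]


def build_provenance_graph(records):
    """Undirected adjacency map with process nodes and one node per event.

    Edges: parent process -- child process, and process -- each event it enacts.
    """
    adj = defaultdict(set)
    for i, r in enumerate(records):
        proc = ("proc", r["processId"])
        parent = ("proc", r["parentProcessId"])
        event = ("event", i)
        adj[parent].add(proc)
        adj[proc].add(parent)
        adj[proc].add(event)
        adj[event].add(proc)
    return adj


def one_hot_features(adj, records):
    """One-hot event-name vectors for event nodes; zero vectors for processes."""
    vocab = sorted({r["eventName"] for r in records})
    index = {name: k for k, name in enumerate(vocab)}
    feats = {}
    for node in adj:
        vec = [0.0] * len(vocab)
        if node[0] == "event":
            vec[index[records[node[1]]["eventName"]]] = 1.0
        feats[node] = vec
    return feats, vocab


def mean_aggregate(adj, feats):
    """One untrained message-passing step: each node averages its own vector
    with its neighbors' vectors (a self-loop, as in GCN-style layers)."""
    out = {}
    for v in adj:
        vecs = [feats[v]] + [feats[u] for u in adj[v]]
        out[v] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return out


adj = build_provenance_graph(records)
feats, vocab = one_hot_features(adj, records)
h1 = mean_aggregate(adj, feats)
h2 = mean_aggregate(adj, h1)
print(vocab, h2[("event", 1)])  # prints ['connect', 'execve', 'openat'] [0.125, 0.0, 0.375]
```

After two rounds, the `openat` event of process 2 carries a nonzero `connect` component contributed by its sibling event. A learned GNN layer would apply trainable weights and nonlinearities to this kind of neighborhood signal instead of a plain mean.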
In this thesis, we address these issues in two ways. First, we exploit the graph-like structure of the logs, in which processes enact events and spawn additional processes, and use a graph neural network to learn, in an unsupervised manner, a representation of each event that encodes information about its neighboring events. Second, we make use of complex features such as command arguments, which vary widely and cannot be used in their raw form as features in typical machine learning algorithms. If these features are instead encoded by a system composed of transformer and Variational Autoencoder (VAE) models, they can then be used in downstream algorithms such as a GNN or an anomaly detector. Combined, these two approaches improve anomaly detection AUC-ROC on the BETH dataset by around 8 percent compared with the manually engineered features alone.
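The reported gain is measured in AUC-ROC. As a sketch of what that metric measures, the function below computes it from its rank interpretation (equivalent to the normalized Mann-Whitney U statistic): the probability that a randomly chosen anomaly receives a higher anomaly score than a randomly chosen normal event, with ties counted as one half. The scores and labels are invented, and the abstract does not say which implementation the thesis used; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be preferred.

```python
def auc_roc(scores, labels):
    """AUC-ROC as P(score of an anomaly > score of a normal sample).

    labels: 1 = anomaly, 0 = normal. Ties contribute 1/2.
    Pairwise O(P * N) version, fine for small illustrations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented anomaly scores: 3 of the 4 anomaly/normal pairs are ranked correctly.
print(auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # prints 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to a detector that ranks every anomaly above every normal event, which is why better event representations show up directly as a higher AUC-ROC.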
DOI
https://doi.org/10.18122/td.2065.boisestate
Recommended Citation
Lakha, Bishal, "Anomaly Detection Using Graph Neural Network" (2023). Boise State University Theses and Dissertations. 2065.
https://doi.org/10.18122/td.2065.boisestate