Publication Date
12-2020
Date of Final Oral Examination (Defense)
10-22-2020
Type of Culminating Activity
Dissertation
Degree Title
Doctor of Philosophy in Computing
Department
Computer Science
Supervisory Committee Chair
Edoardo Serra, Ph.D.
Supervisory Committee Member
Maria Soledad Pera, Ph.D.
Supervisory Committee Member
Marion Scheepers, Ph.D.
Supervisory Committee Member
Francesca Spezzano, Ph.D.
Abstract
Unsupervised Graph Representation Learning methods learn a numerical representation of the nodes in a graph. The generated representations encode meaningful information about the nodes' properties, making them a powerful tool for tasks in many areas of study, such as social sciences, biology, or communication networks. These methods are particularly interesting because they facilitate the direct use of standard Machine Learning models on graphs. Graph representation learning methods can be divided into two main categories depending on the information they encode: methods preserving nodes' connectivity information, and methods preserving nodes' structural information. Connectivity-based methods focus on encoding relationships between nodes, with neighboring nodes being closer together in the resulting latent space. Structure-based methods, on the other hand, generate a latent space where nodes serving a similar structural function in the network are encoded close to each other, regardless of whether they are connected or even close to each other in the graph. While many works focus on preserving nodes' connectivity information, only a few study the problem of encoding nodes' structure, especially in an unsupervised way.
In this dissertation, we demonstrate that properly encoding nodes' structural information is fundamental for many real-world applications, as it can be leveraged to successfully solve many tasks where connectivity-based methods fail. We first present one concrete example, in which the task consists of detecting malicious entities in a real-world financial network. We show that connectivity information alone is not enough to solve this problem, and that leveraging structural information provides considerable performance improvements. This example highlights both the need for further research in the area of structural graph representation learning and the limitations of the previous state of the art. We use the acquired knowledge as a starting point and inspiration for the research and development of three independent unsupervised structural graph representation learning methods: Structural Iterative Representation learning approach for Graph Nodes (SIR-GN), Structural Iterative Lexicographic Autoencoded Node Representation (SILA), and Sparse Structural Node Representation (SparseStruct). We show how each of our methods tackles specific limitations of the previous state of the art in structural graph representation learning, such as scalability, representation meaning, and the lack of a formal proof that guarantees the preservation of structural properties. We provide an extensive experimental section where we compare our three proposed methods to the current state of the art in both connectivity-based and structure-based representation learning. Finally, we look at extensions of the basic structural graph representation learning problem: we study the problem of temporal structural graph representation and provide a method for representation explainability.
DOI
10.18122/td/1754/boisestate
Recommended Citation
Joaristi, Mikel, "Unsupervised Structural Graph Node Representation Learning" (2020). Boise State University Theses and Dissertations. 1754.
10.18122/td/1754/boisestate