Degree Title

Doctor of Philosophy in Electrical and Computer Engineering


Electrical and Computer Engineering

Major Advisor

Nader Rafla, Ph.D.


Kurtis Cantley, Ph.D.


Edoardo Serra, Ph.D.


Neural networks are extensively used in software and hardware applications. Hardware applications require a compact, hardware-accelerated, and configurable architecture that can be easily embedded in hardware devices to execute the required neural network with high performance. Such a configurable architecture allows the user to implement neural networks with different structures and to modify them as needed.

In this dissertation, three architectures, each consisting of three layers, were designed using a system-on-chip approach and implemented on a Field Programmable Gate Array (FPGA) to realize and accelerate three types of neural networks: Fully Connected Neural Networks (FCNN), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). The first layer of each architecture is a Python software layer containing a function that serves as the architecture's user interface. The function accepts the description of the neural network structure and its training parameters as inputs and generates three binary files as outputs, holding the network description, the weights, and the biases in a specific format. The second layer is an embedded software layer running on the on-chip ARM microcontroller. It reads the binary files generated by the Python function and transfers the parameters and configuration of each neural network layer to the third layer, the hardware layer. The embedded layer also monitors one or more status registers built into the hardware layer to determine when to send the next layer's parameters and configuration. The third layer is a hardware Intellectual Property (IP) core implemented on the FPGA fabric; it is configured by the embedded layer to execute the neural network's layers one after another.

The first architecture supports FCNNs with up to 1024 layers of up to 1024 neurons each, and four activation functions (ReLU, Sigmoid, Tanh, and SoftMax). The design also supports Residual Neural Networks (ResNet).

The second architecture supports RNNs built from three layer types: Recurrent, Attention, and Fully Connected (FC). A Recurrent layer can be implemented using either a Long Short-Term Memory (LSTM) model or a Gated Recurrent Unit (GRU), with up to 100 elements in each of the input and hidden vectors. An Attention layer can take up to 64 input vectors with a maximum vector length of 100 elements. FC layers support input vectors of up to 256 elements and up to 256 neurons per layer, and each FC layer can use either the ReLU or the SoftMax activation function.

Finally, the third architecture supports complete CNNs built from three layer types: Convolution, Pooling, and FC. The convolution layer supports five different filter sizes along with configurable stride and padding values. The CNN hardware IP also supports two types of pooling (average and maximum) with various pooling window and stride sizes. FC layers in this architecture support input and output vector lengths of up to 4096 elements and two activation functions (ReLU and SoftMax).
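In a CNN like the one described above, each convolution and pooling setting (filter or window size, stride, padding) changes the spatial size of the data handed to the next layer. A host-side tool that describes consecutive layers therefore has to track these sizes. The Python sketch below illustrates the standard output-size arithmetic, floor((n + 2p - k) / s) + 1, applied layer by layer. The layer classes `Conv`, `Pool`, and `FC` and the function `trace_shapes` are hypothetical illustrations, not the dissertation's actual Python interface or file format. The sketch assumes square kernels, floor rounding, unpadded pooling, and that no Conv/Pool layer follows an FC layer; the hardware IP may handle these cases differently. Only the 4096-element FC limit is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

FC_MAX_LEN = 4096  # FC vector-length limit stated for the CNN architecture


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one spatial axis: floor((n + 2p - k) / s) + 1."""
    if kernel <= 0 or stride <= 0 or padding < 0:
        raise ValueError("kernel and stride must be positive, padding non-negative")
    span = size + 2 * padding - kernel
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


def pool_out(size: int, window: int, stride: int) -> int:
    """Pooling output length, assuming no padding; same for max and average."""
    return conv_out(size, window, stride, 0)


@dataclass
class Conv:  # hypothetical layer description
    filters: int
    kernel: int
    stride: int = 1
    padding: int = 0


@dataclass
class Pool:  # hypothetical; `kind` does not affect the output shape
    window: int
    stride: int
    kind: str = "max"  # "max" or "average"


@dataclass
class FC:  # hypothetical layer description
    neurons: int


Layer = Union[Conv, Pool, FC]


def trace_shapes(input_hwc: Tuple[int, int, int], layers: List[Layer]) -> List[Tuple[int, ...]]:
    """Return each layer's output shape, checking the FC length limit."""
    h, w, c = input_hwc
    flat: Optional[int] = None  # set once an FC layer flattens the feature map
    shapes: List[Tuple[int, ...]] = []
    for layer in layers:
        if isinstance(layer, FC):
            n_in = h * w * c if flat is None else flat
            if n_in > FC_MAX_LEN or layer.neurons > FC_MAX_LEN:
                raise ValueError(f"FC layer exceeds {FC_MAX_LEN} elements")
            flat = layer.neurons
            shapes.append((flat,))
            continue
        if flat is not None:
            raise ValueError("Conv/Pool layers cannot follow an FC layer")
        if isinstance(layer, Conv):
            h = conv_out(h, layer.kernel, layer.stride, layer.padding)
            w = conv_out(w, layer.kernel, layer.stride, layer.padding)
            c = layer.filters
        else:  # Pool keeps the channel count
            h = pool_out(h, layer.window, layer.stride)
            w = pool_out(w, layer.window, layer.stride)
        shapes.append((h, w, c))
    return shapes


if __name__ == "__main__":
    net = [Conv(6, 5, 1, 2), Pool(2, 2), Conv(16, 5), Pool(2, 2), FC(120), FC(10)]
    print(trace_shapes((28, 28, 1), net))
    # [(28, 28, 6), (14, 14, 6), (10, 10, 16), (5, 5, 16), (120,), (10,)]
```

Because max and average pooling yield the same output size, the pooling type is recorded but does not enter the calculation. The flattened 5 x 5 x 16 = 400-element input to the first FC layer is well within the 4096-element limit.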


Available for download on Sunday, December 01, 2024

Files over 30MB may be slow to open. For best results, right-click and select "save as..."