Publication Date

12-2024

Date of Final Oral Examination (Defense)

8-14-2024

Type of Culminating Activity

Dissertation

Degree Title

Doctor of Philosophy in Computing

Department

Computer Science

Supervisory Committee Chair

Tim Andersen, Ph.D.

Supervisory Committee Member

Grady Wright, Ph.D.

Supervisory Committee Member

Casey Kennington, Ph.D.

Supervisory Committee Member

Hoda Mehrpouyan, Ph.D.

Abstract

This dissertation presents a framework for developing deep learning models tailored to dynamic web tasks by leveraging generalized pre-trained multimodal transformers. It introduces a task generation framework, applied to multiple web datasets, that facilitates instruction fine-tuning of models to execute multi-step web workflows. This approach improves the adaptability of pre-trained models to a range of novel web tasks, which is vital for the reliable operation of web agents.

Moreover, this work proposes an encoding scheme that extends the Decision Transformer through targeted modality tokenization, improving the adaptability of these models to downstream tasks and broadening their practical applicability. These enhancements improve the functionality of models acting as agents, which is essential for their deployment in real-world applications.

This work presents methods and tools for aggregating complex web-based datasets and contributes to the computational infrastructure for curating multi-action tasks in web environments. The findings indicate that deep learning models are increasingly capable of practical deployment, enabling agents to carry out effective interactions across a variety of web-based tasks. By detailing the implementation of advanced learning models as agents within web environments, this work advances the deployment and use of AI in complex digital landscapes.

DOI

https://doi.org/10.18122/td.2343.boisestate
