Blockchain Based Privacy Preserving Framework for Distributed RAG
Faculty Mentor Information
Dr. Min Long (Mentor), Boise State University; and Dr. Gaby Dagher (Mentor), Boise State University
Abstract
As Large Language Models (LLMs) are adopted across a growing range of applications, Retrieval Augmented Generation (RAG) is increasingly used to help them give more recent and accurate responses. Although RAG has significantly improved the response accuracy of LLMs, it remains susceptible to inaccurate and maliciously manipulated data. In this paper, we propose Distributed RAG, a novel distributed blockchain framework that increases the integrity of RAG. Distributed RAG replaces RAG’s database with specialized communities, each consisting of a database and a permissioned blockchain. Each blockchain requires data to be verified by experts in the relevant field through a privacy-preserving consensus protocol before it is added to the database. The consensus protocol for these blockchains will be double-blind: the identities of the proposer and the validators are hidden using Zero-Knowledge Proofs and Mix Networks. A retrieval blockchain is also incorporated, which communicates with the communities by retrieving documents for each query and ranking them using an LLM. Once these rankings are agreed upon, the top-ranked document is retrieved by the LLM to generate a response. The proposed framework will increase the trust and security of RAG-incorporated LLMs.
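The data flow described above can be illustrated with a minimal Python sketch. It is not the framework's implementation: the names `Community`, `propose`, `retrieve`, and `overlap_score` are hypothetical, the `min_approvals` threshold and the majority vote over rankers are assumed stand-ins for the consensus protocol, and a word-overlap score stands in for the LLM ranker. The blockchain itself, Zero-Knowledge Proofs, and Mix Networks are deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Community:
    """One specialized community: a database gated by expert validation."""
    name: str
    validators: List[Callable[[str], bool]]  # each returns True to approve
    min_approvals: int                       # assumed threshold, not from the paper
    database: List[str] = field(default_factory=list)

    def propose(self, document: str) -> bool:
        """Add the document only if enough validators approve it."""
        approvals = sum(1 for validate in self.validators if validate(document))
        accepted = approvals >= self.min_approvals
        if accepted:
            self.database.append(document)
        return accepted


def overlap_score(query: str, document: str) -> int:
    """Toy stand-in for an LLM ranker: count the words shared with the query."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def retrieve(
    communities: List[Community],
    query: str,
    rankers: List[Callable[[str, str], int]],
) -> Optional[str]:
    """Gather validated documents from all communities, let every ranker
    nominate its top document, and return the most frequently nominated one."""
    candidates = [doc for community in communities for doc in community.database]
    if not candidates:
        return None
    nominations = Counter(
        max(candidates, key=lambda doc: rank(query, doc)) for rank in rankers
    )
    return nominations.most_common(1)[0][0]
```

In this sketch a document reaches a community's database only through `propose`, so `retrieve` never sees unvalidated data. Using several independent rankers and taking the majority nomination is one simple way to model rankings being "agreed upon"; the actual agreement rule may differ.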