privateGPT Clear Ingested Files

uright | Jan 27, 2024 min read

privateGPT: An Overview and Recent Updates

As an avid follower of privateGPT, I have been closely monitoring its evolution. Recently, privateGPT has undergone significant enhancements, notably the integration of a web-based User Interface (UI) and a robust backend Application Programming Interface (API).

Addressing a Common Challenge: Efficiently Managing Ingested Documents

A frequent scenario for privateGPT users involves the need to interact with a fresh set of documents. However, the current UI lacks a direct feature for clearing previously ingested files. Traditionally, this would require stopping the server and manually deleting the /local_data/private_gpt/qdrant folder to reset the vector database. For those seeking a more streamlined approach that avoids server restarts, I have developed a concise Python script. This script leverages the newly introduced ingest API to efficiently manage your documents.

The Solution: A Python Script for Simplified File Management

Below is the Python script I crafted, titled delete-ingested.py. This script is designed to list and delete all documents that have been ingested into privateGPT, thereby simplifying the file management process:

# Install required dependency
echo "requests" >> requirements.txt
pip install -r requirements.txt
# `delete-ingested.py`
import requests

def get_doc_ids():
    url = "http://localhost:8001/v1/ingest/list"
    headers = {
        "Accept": "application/json"
    }

    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        data = response.json()
        doc_ids = [doc["doc_id"] for doc in data["data"]]
        return doc_ids
    else:
        print("Error: API call failed with status code:", response.status_code)
        return []

# Write a function to delete all the documents
# that have been ingested into the system
def delete_docs(doc_ids):
    url = "http://localhost:8001/v1/ingest/" + doc_ids
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
    }

    response = requests.delete(url, headers=headers)

    if response.status_code == 200:
        print("All documents deleted successfully")
    else:
        print("Error: API call failed with status code:", response.status_code)


doc_ids = get_doc_ids()

# Loop through the doc_ids and delete them
for doc_id in doc_ids:
    delete_docs(doc_id)

Executing the Script

To clear the ingested/indexed files, simply execute the script with the following command:

python3 delete-ingested.py
comments powered by Disqus