Contratus Rag Svelte Python

igorjm

Contratus AI is a modern platform for semantic processing and querying of contracts. Built with Python, Pinecone, and SvelteKit, it enables natural language search and question-answering on contract documents.

#daisyui #fastapi #langchain #openai #pinecone #python #sveltekit #tailwindcss #uvicorn

Download

Contratus AI - Contract Query System

A modern system for semantic processing and querying of contracts, using Python, Pinecone, and SvelteKit. It allows semantic search and natural language questions about contracts.

Project Structure

├── api_pinecone.py       # Main API with FastAPI
├── api_upload.py         # API for contract uploads
├── llm_router.py         # Router for LLM-based questions
├── pinecone_utils.py     # Pinecone utilities
├── processar_contrato.py # Contract processing
├── shared.py             # Shared functions and models
├── contratos/            # Directory to store contracts
├── frontend/             # SvelteKit application
├── diagrama_modulos.md   # Diagram of module relationships
└── .env                  # Environment variables

For a detailed view of how the backend modules are related, refer to the file diagrama_modulos.md.

Backend (Python + FastAPI)
- processar_contrato.py: Automatic processing of PDFs and embedding generation
- api_pinecone.py: REST API for semantic contract search using Pinecone
- api_upload.py: API for uploading and automatically processing new contracts
- pinecone_utils.py: Utility library for interacting with Pinecone
- llm_router.py: Router for processing questions with LLM
- shared.py: Shared functions across modules
Frontend (SvelteKit)
- Modern and responsive interface with Tailwind CSS and DaisyUI
- Semantic search in natural language
- Contract visualization
- Question mode for natural language queries

Backend Code Explanation

1. `pinecone_utils.py`

Purpose: Utility library for interacting with Pinecone (vector database).

Key functionalities:

Connecting to Pinecone
Generating embeddings using OpenAI (model text-embedding-3-small)
Document indexing
Semantic document search
Listing documents in the index

This file serves as a library of helper functions and does not need to be executed directly.

2. `processar_contrato.py`

Purpose: Processing PDF contracts and indexing them in Pinecone.

Key functionalities:

Loads PDF files using LangChain
Splits documents into smaller chunks
Generates embeddings for each chunk
Indexes the chunks in Pinecone with metadata
Can process a single contract or all contracts in a folder

This file can be executed directly to process contracts:

To process all contracts: python processar_contrato.py
To process a specific contract: python processar_contrato.py path/to/contract.pdf

3. `api_pinecone.py`

Purpose: REST API for semantic contract search using FastAPI.

Key functionalities:

Endpoint to list all contracts
Endpoint for semantic search with natural language queries
Endpoint to list unique files in the index
Manages connection to Pinecone
Integrates with the LLM router for natural language questions

This file starts a web server on port 8000 when executed: python api_pinecone.py

4. `api_upload.py`

Purpose: REST API for uploading and automatically processing new contracts.

Key functionalities:

Endpoint for uploading PDF files
Asynchronous processing of uploaded contracts
Endpoint to list available contracts in the folder

This file starts a web server on port 8001 when executed: python api_upload.py

5. `llm_router.py`

Purpose: Router for processing questions using LLM (Large Language Model).

Key functionalities:

Endpoint to answer questions about contracts using the LLM
Integrates with semantic search to provide context to the LLM
Formats responses with source citations

This file is imported by api_pinecone.py and does not need to be executed directly.

6. `shared.py`

Purpose: Shared functions and models across different modules.

Key functionalities:

Shared function for contract search
Data models for API responses

This file is imported by other modules and does not need to be executed directly.

Requirements

Python 3.8+
Node.js 18+
Pinecone account (https://www.pinecone.io/)
OpenAI API key (https://platform.openai.com/)

Setup

Backend

# Install dependencies
pip install -r requirements.txt

# Configure environment variables (.env)
OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_HOST=your_pinecone_host
PINECONE_INDEX_NAME=brito-ai
OPENAI_MODEL=gpt-4o-mini  # Optional, default is gpt-4o-mini

# Process existing contracts
python processar_contrato.py

# Process a specific contract
python processar_contrato.py contratos\EDUARD ROCHA FONTENELE.pdf

# Start the semantic search API with Uvicorn
uvicorn api_pinecone:app --host 127.0.0.1 --port 8000 --reload

# Start the upload API (in another terminal window)
uvicorn api_upload:app --host 127.0.0.1 --port 8001 --reload

Important Note: The vector index in Pinecone (brito-ai) must be manually created through the Pinecone dashboard, using the text-embedding-3-small model from OpenAI with a dimension of 1536.

Frontend

# Navigate to the frontend directory
cd frontend

# Install dependencies
npm install

# Start in development mode
npm run dev

Recommended Execution Order

Initial contract processing (if necessary):
```
python processar_contrato.py
```
This step is optional if the contracts have already been processed and indexed in Pinecone.
Start the semantic search API:
```
uvicorn api_pinecone:app --host 127.0.0.1 --port 8000 --reload
```
This API is essential for the frontend to function, as it provides semantic search and question mode with LLM.
Start the upload API (optional, if you want to allow new contract uploads):
```
uvicorn api_upload:app --host 127.0.0.1 --port 8001 --reload
```
This step is optional if there is no need to upload new contracts.
Start the frontend:
```
cd frontend
npm install (if dependencies are not yet installed)
npm run dev
```
This will start the SvelteKit development server on port 5173.

After these steps, you can access the application at http://localhost:5173 and perform semantic queries on contracts already indexed in Pinecone.

Usage

Contract Processing

Place the PDF contract files in the contratos/ folder.
Run python processar_contrato.py to process all contracts in the folder.
Alternatively, process a specific contract: python processar_contrato.py path/to/contract.pdf.

Uploading New Contracts

Start the upload API: uvicorn api_upload:app --host 127.0.0.1 --port 8001 --reload.
Upload new contracts via the POST /upload/contrato endpoint.
Uploaded contracts will be processed automatically in the background.

Contract Querying

Access the frontend at http://localhost:5173.
Use the search bar to query contracts in natural language.
View results sorted by relevance.
Alternatively, use the API directly via the GET /contratos/busca?q=your query endpoint.

Question Mode (LLM)

Access the frontend at http://localhost:5173.
Enable the "Question Mode" switch next to the search bar.
Type your question in natural language (e.g., "What is the rent amount Eduardo pays?" or "Do we have invoice information?").
Click "Search" to get a detailed response based on relevant contracts.

Alternatively, use the API directly via the POST /llm/ask endpoint with a JSON payload:

{
  "question": "Your question here",
  "max_results": 3
}

The system uses the model configured in the OPENAI_MODEL environment variable (default: gpt-4o-mini) to generate detailed responses based on contracts found in the semantic search.

Technologies

Backend:
- Python
- FastAPI
- Uvicorn (ASGI server)
- Pinecone (vector database)
- OpenAI Embeddings (model text-embedding-3-small)
- OpenAI Chat Completions (default model: gpt-4o-mini)
- LangChain (document processing)
Frontend:
- SvelteKit
- Tailwind CSS
- DaisyUI

API Documentation

Semantic Search API (`api_pinecone.py`)

Port: 8000

Endpoint	Method	Description	Parameters
`/`	GET	Checks the API status and Pinecone connection	-
`/contratos/lista`	GET	Lists all available contracts	`skip`: number of records to skip `limit`: maximum number of records to return
`/contratos/busca`	GET	Performs a semantic search on contracts	`q`: search query `limit`: maximum number of results
`/contratos/arquivos`	GET	Lists all unique file names in the index	-
`/llm/ask`	POST	Answers questions about contracts using the LLM	Body JSON: `{"question": "string", "max_results": int}`

Upload API (`api_upload.py`)

Port: 8001

Endpoint	Method	Description	Parameters
`/upload/contrato`	POST	Uploads a new PDF contract and processes it automatically	Form Data: `file`: PDF file
`/contratos/lista`	GET	Lists all contracts available in the contracts folder	-

Starting with Uvicorn

Uvicorn is a high-performance ASGI (Asynchronous Server Gateway Interface) server recommended for FastAPI applications. To start the APIs with Uvicorn, follow the commands below:

Semantic Search API

uvicorn api_pinecone:app --host 127.0.0.1 --port 8000 --reload

Important options:

--host 127.0.0.1: Restricts access to localhost only.
--port 8000: Sets port 8000 for the API.
--reload: Enables auto-reload mode (useful for development).

Upload API

uvicorn api_upload:app --host 127.0.0.1 --port 8001 --reload

For production, remove the --reload flag and consider using --host 0.0.0.0 if the API needs to be accessed from other devices on the network.

Recent Updates

Question Mode Fix (LLM): Resolved an issue affecting question mode after processing new contracts. The solution involved modifying llm_router.py to directly access the buscar_documentos function from the pinecone_utils.py module, bypassing format incompatibility with the buscar_contratos function.
Improved Error Handling: Implemented more robust error handling across the application, with clearer messages and detailed logs for easier debugging.
Frontend Optimization: Enhanced error handling in the frontend to display clearer messages to users.
New Contract Processing: Added and processed new contracts (contrato_joao_silva.pdf and contrato_maria_oliveira.pdf).
Documentation Update: Improved documentation with detailed instructions for initialization and system usage.

Top categories

tailwind daisyui admin template popup mdsvex portfolio blog form ecommerce ui carousel auth dark seo image routing

Contratus Rag Svelte Python

Contratus AI - Contract Query System

Project Structure

Backend Code Explanation

1. pinecone_utils.py

2. processar_contrato.py

3. api_pinecone.py

4. api_upload.py

5. llm_router.py

6. shared.py

Requirements

Setup

Recommended Execution Order

Usage

Contract Processing

Uploading New Contracts

Contract Querying

Question Mode (LLM)

Technologies

API Documentation

Semantic Search API (api_pinecone.py)

Upload API (api_upload.py)

Starting with Uvicorn

Semantic Search API

Upload API

Recent Updates

Top categories

1. `pinecone_utils.py`

2. `processar_contrato.py`

3. `api_pinecone.py`

4. `api_upload.py`

5. `llm_router.py`

6. `shared.py`

Semantic Search API (`api_pinecone.py`)

Upload API (`api_upload.py`)