A modern system for semantic processing and querying of contracts, using Python, Pinecone, and SvelteKit. It allows semantic search and natural language questions about contracts.
├── api_pinecone.py # Main API with FastAPI
├── api_upload.py # API for contract uploads
├── llm_router.py # Router for LLM-based questions
├── pinecone_utils.py # Pinecone utilities
├── processar_contrato.py # Contract processing
├── shared.py # Shared functions and models
├── contratos/ # Directory to store contracts
├── frontend/ # SvelteKit application
├── diagrama_modulos.md # Diagram of module relationships
└── .env # Environment variables
For a detailed view of how the backend modules are related, refer to the file diagrama_modulos.md.
Backend (Python + FastAPI)
processar_contrato.py: Automatic processing of PDFs and embedding generation
api_pinecone.py: REST API for semantic contract search using Pinecone
api_upload.py: API for uploading and automatically processing new contracts
pinecone_utils.py: Utility library for interacting with Pinecone
llm_router.py: Router for processing questions with an LLM
shared.py: Shared functions across modules
Frontend (SvelteKit)
pinecone_utils.py
Purpose: Utility library for interacting with Pinecone (vector database).
Key functionalities: embedding generation with OpenAI (text-embedding-3-small) and document search in Pinecone (buscar_documentos).
This file serves as a library of helper functions and does not need to be executed directly.
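To make the idea of vector search concrete, here is a minimal sketch of ranking documents by cosine similarity, the kind of comparison a vector database performs at scale. It is not the project's code: the `rank_by_similarity` helper, the document ids, and the toy three-dimensional vectors are illustrative assumptions (real text-embedding-3-small vectors have 1536 dimensions), and the use of cosine similarity as the index metric is also assumed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, documents, top_k=3):
    """Return (doc_id, score) pairs, most similar first (hypothetical helper)."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings" standing in for real contract vectors.
docs = {
    "contract_a": [1.0, 0.0, 0.0],
    "contract_b": [0.7, 0.7, 0.0],
    "contract_c": [0.0, 0.0, 1.0],
}
results = rank_by_similarity([1.0, 0.1, 0.0], docs, top_k=2)
print(results)
```

The query vector points almost in the same direction as `contract_a`, so that document ranks first and `contract_b` second.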
processar_contrato.py
Purpose: Processing PDF contracts and indexing them in Pinecone.
This file can be executed directly to process contracts:
python processar_contrato.py
python processar_contrato.py path/to/contract.pdf
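The two invocations above (no argument for the whole folder, one argument for a single file) could be handled along these lines. This is only a sketch of the argument handling, not the actual contents of processar_contrato.py; the `select_targets` function is a hypothetical name, and the real PDF parsing and indexing steps are left out.

```python
import sys
from pathlib import Path

CONTRACTS_DIR = Path("contratos")  # folder name taken from the project layout


def select_targets(args, contracts_dir=CONTRACTS_DIR):
    """Return the PDF paths to process for the given command-line arguments."""
    if args:
        # One explicit path: process only that file.
        return [Path(args[0])]
    # No arguments: process every PDF found in the contracts folder.
    return sorted(contracts_dir.glob("*.pdf"))


if __name__ == "__main__":
    for pdf_path in select_targets(sys.argv[1:]):
        # Placeholder for the real extraction, embedding, and indexing steps.
        print(f"Would process: {pdf_path}")
```

Keeping the selection logic in its own function makes it easy to check without touching Pinecone or OpenAI.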
api_pinecone.py
Purpose: REST API for semantic contract search using FastAPI.
This file starts a web server on port 8000 when executed: python api_pinecone.py
api_upload.py
Purpose: REST API for uploading and automatically processing new contracts.
This file starts a web server on port 8001 when executed: python api_upload.py
llm_router.py
Purpose: Router for processing questions using LLM (Large Language Model).
This file is imported by api_pinecone.py and does not need to be executed directly.
shared.py
Purpose: Shared functions and models across different modules.
This file is imported by other modules and does not need to be executed directly.
# Install dependencies
pip install -r requirements.txt
# Configure environment variables (.env)
OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_HOST=your_pinecone_host
PINECONE_INDEX_NAME=brito-ai
OPENAI_MODEL=gpt-4o-mini # Optional, default is gpt-4o-mini
# Process existing contracts
python processar_contrato.py
# Process a specific contract
python processar_contrato.py "contratos\EDUARD ROCHA FONTENELE.pdf"
# Start the semantic search API with Uvicorn
uvicorn api_pinecone:app --host 127.0.0.1 --port 8000 --reload
# Start the upload API (in another terminal window)
uvicorn api_upload:app --host 127.0.0.1 --port 8001 --reload
Important Note: The vector index in Pinecone (brito-ai) must be manually created through the Pinecone dashboard, using the text-embedding-3-small model from OpenAI with a dimension of 1536.
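The environment variables listed in the backend setup can be read and validated at startup roughly as follows. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical names, treating the four non-optional variables as required is a guess, and the .env file is assumed to have been loaded into the process environment already (for example by a dotenv loader).

```python
import os
from dataclasses import dataclass

EMBEDDING_DIMENSION = 1536  # dimension of text-embedding-3-small, per the note above
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_HOST",
    "PINECONE_INDEX_NAME",
)


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    pinecone_api_key: str
    pinecone_host: str
    pinecone_index_name: str
    openai_model: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
        pinecone_host=env["PINECONE_HOST"],
        pinecone_index_name=env["PINECONE_INDEX_NAME"],
        # OPENAI_MODEL is optional and falls back to gpt-4o-mini.
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
    )


# Example with an explicit mapping instead of the real environment.
example = load_settings({
    "OPENAI_API_KEY": "your_openai_api_key",
    "PINECONE_API_KEY": "your_pinecone_api_key",
    "PINECONE_HOST": "your_pinecone_host",
    "PINECONE_INDEX_NAME": "brito-ai",
})
print(example.openai_model)
```

Failing fast on a missing key gives a clearer error than a failed request to OpenAI or Pinecone later on.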
# Navigate to the frontend directory
cd frontend
# Install dependencies
npm install
# Start in development mode
npm run dev
Initial contract processing (if necessary):
python processar_contrato.py
This step is optional if the contracts have already been processed and indexed in Pinecone.
Start the semantic search API:
uvicorn api_pinecone:app --host 127.0.0.1 --port 8000 --reload
This API is essential for the frontend to function, as it provides semantic search and question mode with LLM.
Start the upload API (optional, if you want to allow new contract uploads):
uvicorn api_upload:app --host 127.0.0.1 --port 8001 --reload
This step is optional if there is no need to upload new contracts.
Start the frontend:
cd frontend
npm install # if dependencies are not yet installed
npm run dev
This will start the SvelteKit development server on port 5173.
After these steps, you can access the application at http://localhost:5173 and perform semantic queries on contracts already indexed in Pinecone.
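To add contracts manually, place the PDF files in the contratos/ folder, then run python processar_contrato.py to process all contracts in the folder, or python processar_contrato.py path/to/contract.pdf to process a specific contract.
To add contracts through the API, start the upload API with uvicorn api_upload:app --host 127.0.0.1 --port 8001 --reload and send the PDF to the /upload/contrato endpoint.
To run a semantic search, open the frontend at http://localhost:5173 or call the /contratos/busca?q=your query endpoint.
To ask questions in natural language, open the frontend at http://localhost:5173. Alternatively, use the API directly via the POST /llm/ask endpoint with a JSON payload: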
{
"question": "Your question here",
"max_results": 3
}
The system uses the model configured in the OPENAI_MODEL environment variable (default: gpt-4o-mini) to generate detailed responses based on contracts found in the semantic search.
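A question can be sent to POST /llm/ask from Python with the standard library alone. The sketch below only builds the request; the `build_ask_request` helper is a hypothetical name, the base URL assumes the API was started with the Uvicorn command shown earlier, and the response format is not documented here, so the sending step is left as a comment.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # host and port from the Uvicorn command


def build_ask_request(question, max_results=3, base_url=API_BASE):
    """Build a POST /llm/ask request with the documented JSON payload."""
    payload = json.dumps(
        {"question": question, "max_results": max_results}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/llm/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ask_request("Your question here")
print(request.get_method(), request.full_url)

# To send it (requires the semantic search API to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```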
Embeddings: OpenAI text-embedding-3-small
Semantic search API (api_pinecone.py)
Port: 8000
Endpoint | Method | Description | Parameters |
---|---|---|---|
/ | GET | Checks the API status and Pinecone connection | - |
/contratos/lista | GET | Lists all available contracts | skip: number of records to skip; limit: maximum number of records to return |
/contratos/busca | GET | Performs a semantic search on contracts | q: search query; limit: maximum number of results |
/contratos/arquivos | GET | Lists all unique file names in the index | - |
/llm/ask | POST | Answers questions about contracts using the LLM | Body JSON: {"question": "string", "max_results": int} |
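Search queries usually contain spaces or accented characters, so the q parameter of /contratos/busca should be URL-encoded. A minimal sketch, assuming the API runs on 127.0.0.1:8000 as in the Uvicorn commands; `build_search_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlencode

API_BASE = "http://127.0.0.1:8000"  # assumed from the Uvicorn command


def build_search_url(query, limit=None, base_url=API_BASE):
    """Return the GET /contratos/busca URL with properly encoded parameters."""
    params = {"q": query}
    if limit is not None:
        params["limit"] = limit
    return f"{base_url}/contratos/busca?{urlencode(params)}"


print(build_search_url("termination clause", limit=5))
# prints http://127.0.0.1:8000/contratos/busca?q=termination+clause&limit=5
```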
Upload API (api_upload.py)
Port: 8001
Endpoint | Method | Description | Parameters |
---|---|---|---|
/upload/contrato | POST | Uploads a new PDF contract and processes it automatically | Form Data: file: PDF file |
/contratos/lista | GET | Lists all contracts available in the contracts folder | - |
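The upload endpoint expects multipart form data with a field named file. The sketch below builds such a request by hand with the standard library; `build_upload_request` is a hypothetical helper, the URL assumes the upload API runs on 127.0.0.1:8001, and the file name is assumed to contain no double quotes. In practice an HTTP client library would normally handle the encoding.

```python
import urllib.request
import uuid

UPLOAD_URL = "http://127.0.0.1:8001/upload/contrato"  # assumed host and port


def build_upload_request(filename, pdf_bytes, url=UPLOAD_URL):
    """Build a multipart/form-data POST carrying one PDF in the 'file' field."""
    boundary = uuid.uuid4().hex  # random separator between body parts
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


upload = build_upload_request("contract.pdf", b"%PDF-1.4 placeholder")
print(upload.get_method(), upload.full_url, len(upload.data), "bytes")

# To send it (requires the upload API to be running):
# with urllib.request.urlopen(upload) as response:
#     print(response.status)
```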
Uvicorn is a high-performance ASGI (Asynchronous Server Gateway Interface) server recommended for FastAPI applications. To start the APIs with Uvicorn, follow the commands below:
uvicorn api_pinecone:app --host 127.0.0.1 --port 8000 --reload
Important options:
--host 127.0.0.1: Restricts access to localhost only.
--port 8000: Sets port 8000 for the API.
--reload: Enables auto-reload mode (useful for development).
uvicorn api_upload:app --host 127.0.0.1 --port 8001 --reload
For production, remove the --reload flag and consider using --host 0.0.0.0 if the API needs to be accessed from other devices on the network.
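The difference between the development and production invocations can be captured in a small launcher script. This is an illustrative sketch only: `build_uvicorn_command` and its parameters are hypothetical, and it uses nothing beyond the Uvicorn flags already shown in this section.

```python
def build_uvicorn_command(app_path, port, reload=True, expose_on_network=False):
    """Return the uvicorn command line as a list of arguments."""
    # 0.0.0.0 listens on all interfaces; 127.0.0.1 restricts access to localhost.
    host = "0.0.0.0" if expose_on_network else "127.0.0.1"
    command = ["uvicorn", app_path, "--host", host, "--port", str(port)]
    if reload:
        command.append("--reload")  # auto-reload, for development only
    return command


dev_command = build_uvicorn_command("api_pinecone:app", 8000)
prod_command = build_uvicorn_command(
    "api_pinecone:app", 8000, reload=False, expose_on_network=True
)
print(" ".join(dev_command))
print(" ".join(prod_command))

# To launch the server from Python:
# import subprocess
# subprocess.run(dev_command, check=True)
```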
Question Mode Fix (LLM): Resolved an issue affecting question mode after processing new contracts. The solution involved modifying llm_router.py to directly access the buscar_documentos function from the pinecone_utils.py module, bypassing a format incompatibility with the buscar_contratos function.
Improved Error Handling: Implemented more robust error handling across the application, with clearer messages and detailed logs for easier debugging.
Frontend Optimization: Enhanced error handling in the frontend to display clearer messages to users.
New Contract Processing: Added and processed new contracts (contrato_joao_silva.pdf and contrato_maria_oliveira.pdf).
Documentation Update: Improved documentation with detailed instructions for initialization and system usage.