Lakevision

Lakevision is a tool which provides insights into your Apache Iceberg based Data Lakehouse.

#iceberg #lakehouse #datalakehouse #apache #aws-s3 #carbon-components-svelte #carbon-design-system #daft #fast-api #mcp-server

Demo Download

lakevision logo

Lakevision

Lakevision is a tool that provides insights into your Data Lakehouse based on the Apache Iceberg table format.

It lists every namespace and table in your Lakehouse—along with each table’s schema, properties, snapshots, partitions, sort-orders, references/tags, and sample data—and supports nested namespaces. This helps you quickly understand data layout, file locations, and change history.

Lakevision is built with pyiceberg, a FastAPI backend, and a SvelteKit frontend, keeping other dependencies to a minimum.

https://github.com/user-attachments/assets/b6b2eef5-9f27-40ca-a80d-27b88d4a8cfd

Features

Search and view all namespaces in your Lakehouse
Search and view all tables in your Lakehouse
Display schema, properties, partition specs, and a summary of each table
Show record count, file count, and size per partition
List all snapshots with details
Graphical summary of record additions over time
OIDC/OAuth-based authentication support
Pluggable authorization
Lakehouse Health Feature
Optional “Chat with Lakehouse” capability

⚙️ Environment Setup

Before running Lakevision, you'll need to create and configure your local .env file:

cp my.env .env

Then edit .env to provide values for:

Your Iceberg catalog configuration (URI, warehouse path, etc.)

🧪 Don’t have a catalog yet? You can start with a sample one. See make make sample-catalog in the Makefile section.
Authentication details (e.g., token or credentials)
Optional cloud settings (S3, GCP, etc.)

This avoids modifying my.env, which is version-controlled and serves as a template.

🚀 Quick Start (Docker)

The easiest way to run Lakevision is with Docker.

Clone the repository and cd into the project root.
Build the image
```
docker build -t lakevision:1.0 .
```
Run the container

Make sure you’ve completed the Environment Setup step first.
```
docker run --env-file .env -p 8081:8081 lakevision:1.0 /app/start.sh
```
Run the health worker container

If the health functionality is enabled you need to start the container for the health worker.
```
docker run --env-file .env lakevision:1.0 /app/worker.sh
```

Once started, the backend listens on port 8000 and Nginx runs on port 8081. Visit http://localhost:8081 to explore the UI.

✅ Tested on Linux and macOS with the Iceberg REST catalog. Other PyIceberg-compatible catalogs should work too.

🧪 Want to try the in-memory sample catalog?

To build the image with the sample in-memory Iceberg catalog included:

docker build --build-arg ENABLE_SAMPLE_CATALOG=true -t lakevision:1.0 .

In your .env, comment out the default catalog settings and uncomment the sample catalog lines.
Then run the container as above

🛠️ Running Locally (Terminal or VS Code)

Prerequisites

Python 3.10+
Node.js 18+
A running Iceberg catalog

🔀 With Makefile (recommended)

Make sure you’ve completed the Environment Setup step first.

You can use the Makefile to automate common setup steps:

make init-be           # Set up Python backend
make sample-catalog    # Populate a local Iceberg catalog with sample data
make init-fe           # Install frontend dependencies
make run-be            # Start backend (FastAPI)
make run-fe            # Start frontend (SvelteKit)
make help              # List all Makefile commands

Once running, visit http://localhost:8081 to use the app.

🔧 Manual Setup (for advanced use)

1. Configure environment

Make sure you’ve completed the Environment Setup step first.

💡 Frontend note: All environment variables that begin with PUBLIC_ must be available in a separate .env file inside the /fe folder. You can do this manually, or by running:

make prepare-fe-env

This ensures the frontend build system (Vite) can access the variables during development.

2. Backend

cd be
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
set -a; source ../.env; set +a
PYTHONPATH=app uvicorn app.api:app --reload --port 8000

3. Frontend

cd ../fe
npm install
npm run dev -- --port 8081

🔧 Manual Setup with Authz (for advanced use)

1. Implement your Authz class

Implement your custom implementation module in the backend, must follow app/be/authz.py

init: Authz class configuration
has_access: Determines if user has access to a specific table
get_namespace_special_properties: Provide specific namespace properties from the Authz point of view. E.g.: Namespace's owners.
get_table_special_properties: Provide specific tble properties from the Authz point of view. E.g.: Table is restricted, table's owners, etc.

2. Enable Authz

Configure the following properties in your environment.

PUBLIC_AUTH_ENABLED=true
PUBLIC_OPENID_CLIENT_ID=
OPEN_ID_CLIENT_SECRET=
PUBLIC_OPENID_PROVIDER_URL=
PUBLIC_REDIRECT_URI=http://localhost:8081 #E.g. for local usage (or https://localhost:8081)
AUTHZ_MODULE_NAME=my_authz
AUTHZ_CLASS_NAME=MyAuthz

and run the be. E.g. make run-be

3. Run your Sveltekit localhost server with HTTPS (optional)

In case you need to run the frontend with https you can follow this simple steps:

Install a compatible plugin-basic-ssl to the vite version in the fe.

Add "@vitejs/plugin-basic-ssl": "^1.2.0" under devDependency in the package.json and install dependencies. Refers to: Running Locally section.

Update the vite config (vite.config.js):

... 
import basicSsl from '@vitejs/plugin-basic-ssl';

export default defineConfig({
plugins: [
  sveltekit(),
  // Optimize CSS from `carbon-components-svelte` when building for production.
  optimizeCss(),
  basicSsl()
],
...

This auto-generates a self-signed cert for dev. You’ll get a warning page you can bypass.

Run the frontend. E.g.: make run-fe

☸️ Kubernetes Deployment

Want to deploy Lakevision on Kubernetes or OpenShift?
Sample manifests are provided in k8s/, including example Deployment, Service, ConfigMap, and Secret YAMLs for running the unified (backend + frontend) container.

See k8s/README.md for quickstart instructions and customization notes.
You’ll need to edit the image name and environment variables before deploying.

Lakehouse Health Feature

The Lakehouse Health feature provides a system for running, scheduling, and monitoring data quality and health checks across your lakehouse.

When enabled, it adds two main UI components:

A "Health Check" tab on the main table-details page.
A dedicated "Lakehouse Health" page for a high-level overview of all known issues.

Architecture

This feature is disabled by default and operates as a small, services-oriented system. It relies on a central database and two independent background processes to function.

Main API (api.py): This is the main web server. It serves the frontend UI, handles user-triggered actions (e.g., "Run Health Check Now"), and reads from the database to display results.
Scheduler (scheduler.py): This is a lightweight, separate background process. Its only job is to run periodically (e.g., every 10 minutes), check for any scheduled jobs that are due, and enqueue them as tasks in the database.
Worker (worker.py): This is the heavy-lifting background process. It constantly polls the database task queue. When it finds a new task, it executes the actual health check against the Iceberg table, generates the results, and writes them back to the database. This would ideally run in another container, so that you can scale and have multiple workers active.

This separation ensures that a long-running health check (e.g., on a huge table) does not block or slow down the main API server.

Configuration

To enable this feature, you must set two environment variables.

PUBLIC_LAKEVISION_HEALTH_ENABLED
- true: Enables the feature in both the frontend and backend.
- false (or not set): Disables the feature entirely.
  - The UI will show a "Feature is disabled" message.
  - The backend API routes (/api/insights, /api/jobs) will not be loaded.
  - The scheduler.py and worker.py scripts will exit immediately if you try to run them.
LAKEVISION_DATABASE_URL
- This is only required if PUBLIC_LAKEVISION_HEALTH_ENABLED is true.
- It must be a connection string to a persistent database (e.g., PostgreSQL, MySQL).
- This database is used to store all health results, schedules, and the task queue. All three processes (API, Scheduler, and Worker) must be able to connect to it.

Running the Feature

When the health feature is enabled, you must run three separate processes for it to function correctly. Beside the main backend, you need to run 2 additional processes:

# 1. The scheduler process
scheduler: python -m app.scheduler

# 2. The worker process
worker: python -m app.worker


## 🧭 Roadmap

* Chat with Lakehouse capability using an LLM
* Table-level reports (most snapshots, partitions, columns, size, etc.)
* Optimization recommendations
* Limited SQL capabilities ✅
* Partition details (name, file count, records, size) ✅
* Sample data by partition ✅
* Table-level insights
* Time-travel queries

## 🤝 Contributing

Contributions are welcome!

1. **Fork** the repository and clone it locally.
2. **Create** a branch for your change, referencing an issue if one exists.
3. **Add tests** for new functionality where appropriate.
4. **Open a pull request** with a clear description of the changes.

Top categories

tailwind daisyui admin template popup mdsvex portfolio blog form ecommerce ui carousel auth dark seo image routing