This project transforms 3D renders of office interiors into photorealistic images using a LoRA adapter for Stable Diffusion XL.
Frontend view:
Example transformations:
| Input Photo | Output Photo |
|---|---|
| ![]() | ![]() |
| ![]() | ![]() |
| ![]() | ![]() |
Images by PerOla Hammar and Gia Tu Tran, posted on Unsplash.
Create and activate a virtual environment, then install the requirements:

```bash
# Create and activate a virtual environment
python -m venv venv

# For Mac/Linux:
source venv/bin/activate
# For Windows:
venv\Scripts\activate

# Install the requirements
pip install -r requirements.txt
```
You can work with the script using one of three methods:

Option A: Batch Processing Script
- Place the input images directly in the `task-images` folder (images in subfolders will not be processed).
- Run `python scripts/enhance-image-lora_sdxl_render2photo_enhanced.py`.
- The results are written to the `processed-images` folder.
- A `settings.json` file will be generated for reproducibility.

Option B: API
- Run `python api.py`.
- Send requests using the `enhance_image.postman_collection.json` Postman collection (link); a hedged client sketch follows the options below.

Option C: Web-Interface
- Run `python api.py`.
- Start the frontend:

  ```bash
  cd website/render2image-processor/
  npm i
  npm run dev -- --host 0.0.0.0
  ```
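For quick testing of the API without Postman, here is a minimal client sketch. The host, port, route, and upload field name are assumptions, not taken from `api.py`; check `enhance_image.postman_collection.json` for the real values.

```python
# Hypothetical client sketch: the URL, "/enhance_image" route, and "file" field
# are assumptions; adjust them to match the Postman collection and api.py.
import requests

with open("task-images/render.png", "rb") as f:
    response = requests.post("http://localhost:8000/enhance_image", files={"file": f})

response.raise_for_status()
with open("processed-images/render.png", "wb") as out:
    out.write(response.content)  # assumes the API returns the enhanced image bytes
```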
Optional: after you have finished working, clear the Hugging Face cache to free up disk space (this removes the downloaded Stable Diffusion models, which can occupy significant space):

```bash
rm -rf ~/.cache/huggingface/hub/models--stabilityai--sdxl-vae
rm -rf ~/.cache/huggingface/hub/models--stabilityai--stable-diffusion-xl-base-1.0
```
To train your own LoRA adapter, do the following:

1. Collect image pairs (3D renders and the corresponding photos) and upload them to Hugging Face as a dataset with the following structure:

   ```
   {
     {
       id: 0,
       photo_path: "path",
       render_path: "path",
       ...
     },
     ...
   }
   ```
2. Add a Hugging Face token that has access to this dataset to the `.env` file as the `HUGGING_FACE_TOKEN` variable (a minimal upload/auth sketch follows these steps).
3. Run the script `scripts/final-fine-tune-lora-with-descaled-aspect-ratio.py`. This produces a LoRA adapter that you can use with the inference script.
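A minimal sketch of steps 1 and 2, assuming the dataset is built with the `datasets` library and the token is read from `.env` via `python-dotenv`; the repository name and file paths are placeholders, and the training script may expect a different column layout:

```python
# Minimal sketch (not the training script): build the paired dataset and push it
# to the Hugging Face Hub. "your-user/render2photo-pairs" and the paths are placeholders.
import os

from datasets import Dataset
from dotenv import load_dotenv

load_dotenv()  # reads HUGGING_FACE_TOKEN from the .env file

pairs = [
    {"id": 0, "photo_path": "photos/office_0.jpg", "render_path": "renders/office_0.png"},
    {"id": 1, "photo_path": "photos/office_1.jpg", "render_path": "renders/office_1.png"},
]

dataset = Dataset.from_list(pairs)
dataset.push_to_hub("your-user/render2photo-pairs", token=os.environ["HUGGING_FACE_TOKEN"])
```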
The following parameters can be customized:
| Parameter | Default | Description |
|---|---|---|
| `BASE_MODEL_PATH` | `stabilityai/stable-diffusion-xl-base-1.0` | Base model for image generation |
| `VAE_PATH` | `stabilityai/sdxl-vae` | Improved VAE for better color reproduction |
| `LORA_DIR` | `lora_sdxl_render2photo_enhanced/lora-weights-epoch-15` | Directory containing trained LoRA weights |
| `INPUT_DIR` | `task-images` | Directory for input images |
| `OUTPUT_DIR` | `processed-images` | Directory for processed images |
| `PROMPT` | high quality photograph, photorealistic, masterpiece, high quality, detailed, realistic, photorealistic, consistent shapes, consistent lighting, consistent shadows, preserve as many details from the original image as possible, 8k, 4k, sharp focus | General prompt for image enhancement |
| `FACE_PROMPT` | high quality photograph, photorealistic, masterpiece, perfect face details, realistic face features, high quality, detailed face, ultra realistic human face, perfect eyes, perfect skin texture, perfect facial proportions, clean render | Specialized prompt for face enhancement |
| `NEGATIVE_PROMPT` | low quality, bad anatomy, bad hands, text, error, blurry, out of focus, low resolution, cropped, worst quality, jpeg artifacts, signature, watermark, distorted | Characteristics to avoid in generation |
| `FACE_NEGATIVE_PROMPT` | low quality, bad anatomy, distorted face, deformed face, disfigured face, unrealistic face, bad eyes, crossed eyes, misaligned eyes, bad nose, bad mouth, bad teeth, bad skin | Face-specific characteristics to avoid |
| `STRENGTH` | 0.4 | General processing strength |
| `FACE_STRENGTH` | 0.35 | Face processing strength |
| `GUIDANCE_SCALE` | 6.0 | Strength of prompt adherence |
| `FACE_GUIDANCE_SCALE` | 8.0 | Face-specific prompt adherence |
| `RESIZE_LIMIT` | 2048 | Maximum dimension for image resizing |
| `SEED` | 42 | Random seed for reproducible results |
| `UNET_RANK` | 32 | UNet rank from training script |
| `TEXT_ENCODER_RANK` | 8 | Text encoder rank from training script |
| `LORA_SCALE` | 0.8 | LoRA influence for main image processing |
| `FACE_LORA_SCALE` | 0.3 | LoRA influence for face regions |
| `NUM_STEPS` | 400 | Number of diffusion steps for main image |
| `FACE_NUM_STEPS` | 200 | Number of steps for face regions |
| `USE_CUSTOM_NOISE` | `True` | Enable custom noise initialization |
| `MIXED_PRECISION` | `fp16` | Precision setting for inference |
| `GRADIENT_CHECKPOINTING` | `True` | Memory optimization technique |
| `POST_PROCESS` | `False` | Enable post-processing |
| `CONTRAST_FACTOR` | 1.2 | Contrast enhancement factor |
| `SHARPNESS_FACTOR` | 1.7 | Sharpness enhancement factor |
| `SATURATION_FACTOR` | 1.1 | Saturation enhancement factor |
| `FACE_DETECTION_CONFIDENCE` | 0.7 | Confidence threshold for detection |
| `FACE_PADDING_PERCENT` | 30 | Percentage to expand face crop area |
| `ENABLE_FACE_ENHANCEMENT` | `False` | Enable specialized face processing |
| `DEBUG_MODE` | `True` | Visualize face detection |
| `USE_DNN_FACE_DETECTOR` | `True` | Use robust DNN face detector |
| `FACE_DETECTOR_MODEL_PATH` | `models/opencv_face_detector_uint8.pb` | Path to detector model |
| `FACE_DETECTOR_CONFIG_PATH` | `models/opencv_face_detector.pbtxt` | Path to detector config |
| `MAX_IMG_SIZE` | 2048 | Maximum image size for resizing |
| `USE_EMA` | `True` | Enable Exponential Moving Average |
| `EMA_DECAY` | 0.9995 | EMA decay rate from training script |
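As a rough illustration of how the main parameters fit together, here is a minimal sketch built directly on the `diffusers` library. It is not the project's enhancement script (it omits face handling, custom noise initialization, and post-processing), and the exact call signature may vary across `diffusers` versions:

```python
# Minimal sketch, not the project's script: maps the main defaults above onto a
# plain diffusers SDXL img2img pipeline.
import torch
from PIL import Image
from diffusers import AutoencoderKL, StableDiffusionXLImg2ImgPipeline

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae", torch_dtype=torch.float16)  # VAE_PATH
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # BASE_MODEL_PATH
    vae=vae,
    torch_dtype=torch.float16,                   # MIXED_PRECISION = fp16
).to("cuda")
pipe.load_lora_weights("lora_sdxl_render2photo_enhanced/lora-weights-epoch-15")  # LORA_DIR

render = Image.open("task-images/render.png").convert("RGB")  # an image from INPUT_DIR
result = pipe(
    prompt="high quality photograph, photorealistic, masterpiece, ...",       # PROMPT (truncated)
    negative_prompt="low quality, bad anatomy, bad hands, text, error, ...",  # NEGATIVE_PROMPT (truncated)
    image=render,
    strength=0.4,                                        # STRENGTH
    guidance_scale=6.0,                                  # GUIDANCE_SCALE
    num_inference_steps=400,                             # NUM_STEPS
    cross_attention_kwargs={"scale": 0.8},               # LORA_SCALE
    generator=torch.Generator("cuda").manual_seed(42),   # SEED
).images[0]
result.save("processed-images/render.png")  # OUTPUT_DIR
```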