Hunyuan Video 720p on RTX 3090: Full On-Premise AI Media Pipeline E2E

Jörg Fuchs


Running AI video generation on consumer hardware: here is our full E2E pipeline that generates photos and videos without any cloud APIs.

Hardware

  • RTX 3090 24GB VRAM
  • Intel i7-14700F (20 cores)
  • 30GB RAM allocated to WSL2 (critical for Hunyuan)

Photo Pipeline (FLUX Dev FP8)

  • Resolution: 1344x768
  • Generation time: ~44 seconds
  • Quality: Professional stock photo level
  • Guidance: 3.5 via FluxGuidance node
  • CFG: 1.0 (FLUX ignores traditional CFG)
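The photo settings above translate directly into a ComfyUI API submission. Below is a minimal sketch that posts a workflow to ComfyUI's /prompt endpoint; the node IDs, graph wiring, and step count are illustrative placeholders (not our actual workflow export), and we assume ComfyUI on its default port 8188.

```python
import json
import urllib.request

COMFYUI_URL = "http://localhost:8188"  # assumption: default ComfyUI port

def flux_photo_payload(seed: int = 0) -> dict:
    """Build a minimal FLUX Dev payload with the settings listed above.

    Node IDs and graph shape are placeholders for illustration only.
    """
    return {
        "prompt": {
            "5": {  # latent at the 1344x768 resolution we generate at
                "class_type": "EmptyLatentImage",
                "inputs": {"width": 1344, "height": 768, "batch_size": 1},
            },
            "6": {  # FluxGuidance node carries guidance=3.5
                "class_type": "FluxGuidance",
                "inputs": {"guidance": 3.5, "conditioning": ["clip_out", 0]},
            },
            "7": {  # CFG pinned at 1.0, since FLUX ignores traditional CFG
                "class_type": "KSampler",
                "inputs": {"cfg": 1.0, "seed": seed, "steps": 20,
                           "latent_image": ["5", 0]},
            },
        }
    }

def submit(payload: dict) -> bytes:
    """POST the workflow graph to ComfyUI's /prompt endpoint."""
    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

The important part is that guidance lives on the FluxGuidance node, not the sampler; the sampler's cfg stays at 1.0.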

Video Pipeline (Hunyuan Video FP8)

  • Resolution: 1280x720
  • Frames: 13 (~1.1s at 12fps)
  • Generation time: ~7.8 minutes
  • Key fix: quantization=fp8_e4m3fn keeps the model at ~12GB on the GPU

Critical Learning

The pre-quantized FP8 Hunyuan model with quantization=disabled causes OOM because HyVideoModelLoader upcasts weights to bf16 (~24GB). Setting quantization to fp8_e4m3fn keeps it in FP8 format (~12GB), leaving room for VAE and sampling.
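The arithmetic behind the OOM is just parameter count times bytes per weight. A back-of-the-envelope sketch, assuming a transformer on the order of 12B parameters (the exact count is our assumption here, not a measured figure):

```python
def model_weight_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate resident weight size in GB (ignores VAE, text encoders,
    and sampling activations, which all need headroom on top of this)."""
    return num_params * bytes_per_param / 1e9

# Assumption: roughly 12B transformer parameters for illustration.
params = 12e9
bf16_gb = model_weight_gb(params, 2)  # upcast path: ~24 GB, OOMs a 24GB card
fp8_gb = model_weight_gb(params, 1)   # fp8_e4m3fn path: ~12 GB of headroom
```

bf16 stores 2 bytes per weight, fp8_e4m3fn stores 1, so keeping the checkpoint in FP8 halves the resident footprint, which is exactly the ~24GB vs ~12GB difference we saw.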

VRAM Management

We built a custom VRAM Guard service that coordinates GPU access between Ollama (LLM) and ComfyUI (media generation). Before video generation, Ollama models are unloaded and ComfyUI cached models are freed.
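The guard's pre-video sequence can be sketched with two public API calls: Ollama evicts a model when you send keep_alive=0 to /api/generate, and ComfyUI's /free endpoint drops cached models. This is a minimal sketch assuming default ports and a placeholder model name; it is not our actual VRAM Guard implementation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"   # assumption: default Ollama port
COMFYUI_URL = "http://localhost:8188"   # assumption: default ComfyUI port

def _post(url: str, body: dict) -> None:
    """Small JSON POST helper."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req).read()

def ollama_unload_payload(model: str) -> dict:
    """keep_alive=0 tells Ollama to evict the model immediately."""
    return {"model": model, "keep_alive": 0}

def comfyui_free_payload() -> dict:
    """Ask ComfyUI to unload cached models and release VRAM."""
    return {"unload_models": True, "free_memory": True}

def guard_before_video(llm_model: str = "llama3") -> None:
    """Run before queueing a Hunyuan job: evict the LLM, then flush ComfyUI."""
    _post(f"{OLLAMA_URL}/api/generate", ollama_unload_payload(llm_model))
    _post(f"{COMFYUI_URL}/free", comfyui_free_payload())
```

Ordering matters: the LLM is evicted first so ComfyUI's model load never races Ollama for the same VRAM.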

Pipeline Architecture

ComfyUI API → n8n workflow orchestration → Social Poster service → auto-post to Twitter, LinkedIn, Reddit, Dev.to
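The hand-off between the first two stages can be sketched as: poll ComfyUI's /history endpoint until the job's prompt ID appears (which happens once execution completes), then build the body for the poster service. The Social Poster URL and payload shape below are hypothetical placeholders, not our service's real API.

```python
import json
import time
import urllib.request

COMFYUI_URL = "http://localhost:8188"          # assumption: default port
POSTER_URL = "http://social-poster:8080/post"  # hypothetical service endpoint

def poll_history(prompt_id: str, timeout_s: int = 600) -> dict:
    """Poll ComfyUI's /history endpoint until the job finishes."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        with urllib.request.urlopen(f"{COMFYUI_URL}/history/{prompt_id}") as r:
            history = json.loads(r.read())
        if prompt_id in history:  # entry appears once execution completed
            return history[prompt_id]
        time.sleep(5)
    raise TimeoutError(f"job {prompt_id} did not finish in {timeout_s}s")

def poster_payload(media_path: str, caption: str) -> dict:
    """Hand-off body for the (hypothetical) Social Poster service."""
    return {"media": media_path, "caption": caption,
            "targets": ["twitter", "linkedin", "reddit", "devto"]}
```

In practice n8n owns the polling loop and retries; the sketch just shows the shape of each stage boundary.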

All running on Docker Swarm across 6 nodes. No cloud dependencies.