Hunyuan Video 720p on RTX 3090: Full On-Premise AI Media Pipeline E2E

Jörg Fuchs


Running AI video generation on consumer hardware: here is our full E2E pipeline that generates photos and videos without any cloud APIs.

Hardware

  • RTX 3090 24GB VRAM
  • Intel i7-14700F (20 cores)
  • 30GB RAM allocated to WSL2 (critical for Hunyuan)

Photo Pipeline (FLUX Dev FP8)

  • Resolution: 1344x768
  • Generation time: ~44 seconds
  • Quality: Professional stock photo level
  • Guidance: 3.5 via FluxGuidance node
  • CFG: 1.0 (FLUX ignores traditional CFG)
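The photo settings above translate directly into a ComfyUI API submission. Below is a minimal sketch that posts a workflow to ComfyUI's /prompt endpoint; the node IDs, graph wiring, and step count are illustrative placeholders (not our actual workflow export), and we assume ComfyUI on its default port 8188.

```python
import json
import urllib.request

COMFYUI_URL = "http://localhost:8188"  # assumption: default ComfyUI port

def flux_photo_payload(seed: int = 0) -> dict:
    """Build a minimal FLUX Dev payload with the settings listed above.

    Node IDs and graph shape are placeholders for illustration only.
    """
    return {
        "prompt": {
            "5": {  # latent at the 1344x768 resolution we generate at
                "class_type": "EmptyLatentImage",
                "inputs": {"width": 1344, "height": 768, "batch_size": 1},
            },
            "6": {  # FluxGuidance node carries guidance=3.5
                "class_type": "FluxGuidance",
                "inputs": {"guidance": 3.5, "conditioning": ["clip_out", 0]},
            },
            "7": {  # CFG pinned at 1.0, since FLUX ignores traditional CFG
                "class_type": "KSampler",
                "inputs": {"cfg": 1.0, "seed": seed, "steps": 20,
                           "latent_image": ["5", 0]},
            },
        }
    }

def submit(payload: dict) -> bytes:
    """POST the workflow graph to ComfyUI's /prompt endpoint."""
    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

The important part is that guidance lives on the FluxGuidance node, not the sampler; the sampler's cfg stays at 1.0.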

Video Pipeline (Hunyuan Video FP8)

  • Resolution: 1280x720
  • Frames: 13 (~1.1s at 12fps)
  • Generation time: ~7.8 minutes
  • Key fix: quantization=fp8_e4m3fn keeps the model at ~12GB on the GPU

Critical Learning

The pre-quantized FP8 Hunyuan model with quantization=disabled causes OOM because HyVideoModelLoader upcasts weights to bf16 (~24GB). Setting quantization to fp8_e4m3fn keeps it in FP8 format (~12GB), leaving room for VAE and sampling.
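The arithmetic behind the OOM is just parameter count times bytes per weight. A back-of-the-envelope sketch, assuming a transformer on the order of 12B parameters (the exact count is our assumption here, not a measured figure):

```python
def model_weight_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate resident weight size in GB (ignores VAE, text encoders,
    and sampling activations, which all need headroom on top of this)."""
    return num_params * bytes_per_param / 1e9

# Assumption: roughly 12B transformer parameters for illustration.
params = 12e9
bf16_gb = model_weight_gb(params, 2)  # upcast path: ~24 GB, OOMs a 24GB card
fp8_gb = model_weight_gb(params, 1)   # fp8_e4m3fn path: ~12 GB of headroom
```

bf16 stores 2 bytes per weight, fp8_e4m3fn stores 1, so keeping the checkpoint in FP8 halves the resident footprint, which is exactly the ~24GB vs ~12GB difference we saw.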

VRAM Management

We built a custom VRAM Guard service that coordinates GPU access between Ollama (LLM) and ComfyUI (media generation). Before video generation, Ollama models are unloaded and ComfyUI cached models are freed.
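The guard's pre-video sequence can be sketched with two public API calls: Ollama evicts a model when you send keep_alive=0 to /api/generate, and ComfyUI's /free endpoint drops cached models. This is a minimal sketch assuming default ports and a placeholder model name; it is not our actual VRAM Guard implementation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"   # assumption: default Ollama port
COMFYUI_URL = "http://localhost:8188"   # assumption: default ComfyUI port

def _post(url: str, body: dict) -> None:
    """Small JSON POST helper."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req).read()

def ollama_unload_payload(model: str) -> dict:
    """keep_alive=0 tells Ollama to evict the model immediately."""
    return {"model": model, "keep_alive": 0}

def comfyui_free_payload() -> dict:
    """Ask ComfyUI to unload cached models and release VRAM."""
    return {"unload_models": True, "free_memory": True}

def guard_before_video(llm_model: str = "llama3") -> None:
    """Run before queueing a Hunyuan job: evict the LLM, then flush ComfyUI."""
    _post(f"{OLLAMA_URL}/api/generate", ollama_unload_payload(llm_model))
    _post(f"{COMFYUI_URL}/free", comfyui_free_payload())
```

Ordering matters: the LLM is evicted first so ComfyUI's model load never races Ollama for the same VRAM.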

Pipeline Architecture

ComfyUI API → n8n workflow orchestration → Social Poster service → auto-post to Twitter, LinkedIn, Reddit, Dev.to
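The hand-off between the first two stages can be sketched as: poll ComfyUI's /history endpoint until the job's prompt ID appears (which happens once execution completes), then build the body for the poster service. The Social Poster URL and payload shape below are hypothetical placeholders, not our service's real API.

```python
import json
import time
import urllib.request

COMFYUI_URL = "http://localhost:8188"          # assumption: default port
POSTER_URL = "http://social-poster:8080/post"  # hypothetical service endpoint

def poll_history(prompt_id: str, timeout_s: int = 600) -> dict:
    """Poll ComfyUI's /history endpoint until the job finishes."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        with urllib.request.urlopen(f"{COMFYUI_URL}/history/{prompt_id}") as r:
            history = json.loads(r.read())
        if prompt_id in history:  # entry appears once execution completed
            return history[prompt_id]
        time.sleep(5)
    raise TimeoutError(f"job {prompt_id} did not finish in {timeout_s}s")

def poster_payload(media_path: str, caption: str) -> dict:
    """Hand-off body for the (hypothetical) Social Poster service."""
    return {"media": media_path, "caption": caption,
            "targets": ["twitter", "linkedin", "reddit", "devto"]}
```

In practice n8n owns the polling loop and retries; the sketch just shows the shape of each stage boundary.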

All running on Docker Swarm across 6 nodes. No cloud dependencies.