Auto-detect hardcoded subtitles, watermarks, logos, and timestamps. Generate masks and inpaint — no manual work needed. Multilingual out of the box.
A complete pipeline from frame sampling to mask generation to background restoration. Built for production use.
A DBNet-based text detector samples frames across the video, finds text regions, and clusters them by position. No manual mask drawing. Works with Chinese, English, Korean, Burmese, and more.
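To make "clusters them by position" concrete, here is a minimal sketch of one way detections from different frames could be grouped: boxes that overlap strongly (by intersection over union) land in the same cluster. The `Box` type, `cluster_by_position`, and the 0.5 threshold are illustrative assumptions, not VideoWipe's actual code or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned text box in pixel coordinates (hypothetical type)."""
    x1: int
    y1: int
    x2: int
    y2: int


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, 0.0 when they do not overlap."""
    iw = min(a.x2, b.x2) - max(a.x1, b.x1)
    ih = min(a.y2, b.y2) - max(a.y1, b.y1)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    return inter / (area_a + area_b - inter)


def cluster_by_position(boxes, threshold=0.5):
    """Greedy clustering: a box joins the first cluster whose first member
    it overlaps by at least `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for box in boxes:
        for cluster in clusters:
            if iou(box, cluster[0]) >= threshold:
                cluster.append(box)
                break
        else:
            clusters.append([box])
    return clusters


# Made-up detections from two sampled frames.
detections = [
    Box(100, 620, 540, 680),  # subtitle line, frame 1
    Box(104, 622, 548, 682),  # same subtitle position, frame 2
    Box(560, 20, 630, 50),    # corner watermark, frame 1
    Box(561, 20, 631, 50),    # corner watermark, frame 2
]
clusters = cluster_by_position(detections)
print(len(clusters))  # number of distinct on-screen positions
```

A region that recurs at the same position across many frames, such as a subtitle band or a corner watermark, ends up as one cluster, which is what makes a single mask per region possible.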
The default STTN backend runs on CPU. Swap in ProPainter, LaMa, or any external model via --external-command. Same mask pipeline, better quality when you have a GPU.
Tell it what to remove in plain English: --intent "remove bottom Chinese subtitles". Optional OCR reads the text content. LLM-backed selection is available via --agent.
The full pipeline runs in one command. Here's what happens under the hood.
Sample frames across the video. Run DBNet text detection on each. Cluster results by spatial position and select the best preview frame.
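The sampling and preview-selection steps can be sketched as follows. This is an assumption-laden illustration: `sample_frame_indices` and `pick_preview_frame` are hypothetical helpers, and picking the frame with the most detections is just one plausible definition of "best preview frame".

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return up to `num_samples` frame indices spread evenly over the video,
    each taken from the middle of its segment."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    n = min(num_samples, total_frames)
    step = total_frames / n
    return [int(step * i + step / 2) for i in range(n)]


def pick_preview_frame(boxes_per_frame: dict[int, int]) -> int:
    """Pick the sampled frame with the most detected boxes.
    Ties go to the earliest frame. (Assumed heuristic.)"""
    return max(boxes_per_frame, key=lambda idx: (boxes_per_frame[idx], -idx))


indices = sample_frame_indices(total_frames=100, num_samples=5)

# Made-up detection counts standing in for real detector output.
counts = dict(zip(indices, [1, 3, 3, 0, 2]))
preview = pick_preview_frame(counts)
print(indices, preview)
```

Sampling from the middle of each segment avoids the first and last frames, which are often black or mid-transition.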
Classify regions as subtitle, watermark, logo, or timestamp. Optional OCR reads the text. The intent parser maps your instructions to specific targets.
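As a rough illustration of mapping an instruction to targets, the sketch below uses plain keyword matching over already-classified regions. The `Region` type, `parse_intent`, and `select_targets` are hypothetical names, and the real intent parser may work quite differently (for example, it may also use the language and OCR text, which this sketch ignores).

```python
from dataclasses import dataclass

KINDS = ("subtitle", "watermark", "logo", "timestamp")
POSITIONS = ("top", "bottom", "left", "right")


@dataclass(frozen=True)
class Region:
    """A classified region (hypothetical type)."""
    kind: str       # one of KINDS
    position: str   # coarse location, one of POSITIONS


def parse_intent(intent: str) -> dict:
    """Extract the region kinds and positions named in a plain-English intent."""
    words = intent.lower().replace(",", " ").split()
    kinds = {k for k in KINDS if k in words or k + "s" in words}
    positions = {p for p in POSITIONS if p in words}
    return {"kinds": kinds, "positions": positions}


def select_targets(regions, intent: str):
    """Keep regions that match every constraint the intent mentions.
    A constraint the intent does not mention is not applied."""
    wanted = parse_intent(intent)
    return [
        r for r in regions
        if (not wanted["kinds"] or r.kind in wanted["kinds"])
        and (not wanted["positions"] or r.position in wanted["positions"])
    ]


regions = [
    Region("subtitle", "bottom"),
    Region("watermark", "top"),
    Region("subtitle", "top"),
]
targets = select_targets(regions, "remove bottom Chinese subtitles")
print(targets)
```

Here only the bottom subtitle region is selected; the top subtitle and the watermark are left alone.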
Generate masks from selected regions. Fill in using temporal information from neighboring frames via STTN or your preferred external model.
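Mask generation from selected regions can be pictured as rasterizing each box into a binary image, with a little padding so that glyph edges and outlines are covered too. The function below is a dependency-free sketch under that assumption; `boxes_to_mask` and the `pad` parameter are illustrative, and the inpainting step itself (STTN or an external model) is not shown.

```python
def boxes_to_mask(width: int, height: int, boxes, pad: int = 2):
    """Return a height x width grid of 0/1 values, with 1 marking pixels to
    inpaint. Each box is (x1, y1, x2, y2) with exclusive right/bottom edges,
    grown by `pad` pixels on every side and clipped to the frame."""
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        left, top = max(0, x1 - pad), max(0, y1 - pad)
        right, bottom = min(width, x2 + pad), min(height, y2 + pad)
        for y in range(top, bottom):
            for x in range(left, right):
                mask[y][x] = 1
    return mask


# A tiny 16x8 frame with one subtitle box near the bottom edge.
mask = boxes_to_mask(16, 8, [(4, 5, 12, 7)], pad=1)
covered = sum(map(sum, mask))
for row in mask:
    print("".join("#" if v else "." for v in row))
```

Clipping matters for subtitles and timestamps, which tend to sit close to the frame border where padding would otherwise run off the image.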
Two ways to use VideoWipe — pick whichever fits your workflow.
Install with pip, run one command. Auto-detect handles the rest. Works on CPU, scales to GPU.
If VideoWipe saves you time on subtitle, watermark, or text-overlay cleanup, support helps keep model packaging, Docker images, detection tuning, and documentation maintained.