Abstract

Automating the colorization of line art in videos is crucial for streamlining animation production workflows and reducing labor costs. However, challenges such as the misalignment between character design art and line art sketches, as well as the need for temporal consistency, hinder automation. Previous methods often require manually colored keyframes and dense line art guidance, which increases the artist's workload, and they suffer from color information leakage because they condition on non-binarized sketches. We propose a novel all-in-one model that leverages the priors of a video diffusion model to automate the colorization process. Our approach introduces an explicit correspondence mechanism with an injection module that aligns color information from reference images to the input sketches, improving color accuracy. A two-stage training strategy teaches the model to interpolate between keyframes, reducing the need to sketch intermediate frames. By conditioning on binarized sketches and applying data augmentation, we improve training stability. Our method demonstrates superior quantitative and qualitative results, offering an effective solution for automatic line art video colorization and improving the efficiency of animation production.
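To illustrate why binarized sketch conditioning can limit color leakage: line art extracted from colored frames often keeps gray-level intensities that correlate with the original fill colors, and thresholding maps every pixel to pure ink or pure paper, discarding those cues. The following is a minimal sketch under stated assumptions, not AniDoc's preprocessing code: images are nested lists of grayscale values in [0, 1], and the function name `binarize_sketch` and the 0.5 threshold are hypothetical choices.

```python
from typing import List

# Grayscale image: values in [0, 1], where 0.0 is black ink and 1.0 is white paper.
Image = List[List[float]]


def binarize_sketch(sketch: Image, threshold: float = 0.5) -> Image:
    """Map every pixel to pure ink (0.0) or pure paper (1.0).

    Hypothetical helper: the default threshold is an assumption,
    not a value taken from AniDoc.
    """
    return [[0.0 if px < threshold else 1.0 for px in row] for row in sketch]


# Two strokes drawn at different gray levels (0.10 and 0.40), plus slight paper noise (0.95).
# Before binarization the gray levels could hint at the underlying colors.
sketch = [
    [1.0, 0.10, 1.0, 0.40, 1.0],
    [0.95, 0.10, 1.0, 0.40, 1.0],
]
print(binarize_sketch(sketch))
# → [[1.0, 0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0, 1.0]]
```

After thresholding, both strokes carry identical values, so the condition describes only where lines are, not what color lies beneath them.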

Gallery

[Gallery videos. Columns: Reference, Lineart, Result]

Flexible Usage

Same Reference with Varying Sketches

Given the same reference, our model generates consistent colorizations across different video clips, even when the sketches differ significantly in pose or scale.

Satoru Gojo from Jujutsu Kaisen

Yuji Itadori from Jujutsu Kaisen

Same Sketch with Different References

When different reference images are applied to the same sketch sequence, our method preserves the character's identity while adapting finer details, such as lighting and background, to the distinct style of each reference.

Anya Forger from Spy x Family

Yuji Itadori from Jujutsu Kaisen

Sparse Input Sketches

Thanks to our two-stage training strategy, our method supports sparse input sketches. Given only the start and end sketches, the model interpolates the intermediate frames and produces smooth, coherent animation.
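One way to picture sparse sketch input is as a per-frame condition paired with an availability mask: the start and end frames carry sketches, and the frames in between are left for the model to fill. The following is a minimal sketch under stated assumptions; the helper names `sparse_from_endpoints` and `build_sparse_condition`, the mask convention (1 = sketch given), and the use of a blank white frame for missing sketches are all hypothetical, not AniDoc's actual interface.

```python
from typing import List, Optional, Tuple

# Grayscale sketch frame: values in [0, 1], 1.0 is white paper.
Frame = List[List[float]]


def sparse_from_endpoints(start: Frame, end: Frame, num_frames: int) -> List[Optional[Frame]]:
    """Keep only the first and last sketches of a clip; None marks a missing sketch."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    return [start] + [None] * (num_frames - 2) + [end]


def build_sparse_condition(
    keyframes: List[Optional[Frame]], height: int, width: int
) -> Tuple[List[Frame], List[int]]:
    """Return per-frame sketch conditions and a mask (1 = sketch given, 0 = missing).

    Hypothetical convention: missing frames are filled with a blank white frame.
    """
    conditions: List[Frame] = []
    mask: List[int] = []
    for frame in keyframes:
        if frame is None:
            conditions.append([[1.0] * width for _ in range(height)])
            mask.append(0)
        else:
            conditions.append(frame)
            mask.append(1)
    return conditions, mask


frames = sparse_from_endpoints([[0.0, 1.0]], [[1.0, 0.0]], num_frames=5)
conditions, mask = build_sparse_condition(frames, height=1, width=2)
print(mask)  # → [1, 0, 0, 0, 1]
```

Keeping an explicit mask, rather than relying on blank frames alone, lets a model distinguish "no sketch provided" from "a sketch that happens to be empty".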

Multiple Characters

Although our work focuses on a single reference image and includes no training or processing specific to multiple references, we found that when the reference image contains more than one character, our model automatically distinguishes the characters by their corresponding features and colors each one correctly, even when the characters' poses, angles, or relative positions differ significantly between the reference and the line art.
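A toy picture of how explicit correspondence can route colors to the right character: each sketch location is matched to its most similar reference location in feature space, and that location's color is copied. This is a minimal sketch of generic nearest-neighbor matching by cosine similarity, not AniDoc's correspondence or injection module. The features here are hand-written 2-D vectors and the colors are hypothetical; in practice, features would come from a learned encoder.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Color = Tuple[int, int, int]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def transfer_colors(
    sketch_feats: List[Vector], ref_feats: List[Vector], ref_colors: List[Color]
) -> List[Color]:
    """For each sketch location, copy the color of the most similar reference location."""
    if not ref_feats or len(ref_feats) != len(ref_colors):
        raise ValueError("need one color per reference feature")
    out: List[Color] = []
    for f in sketch_feats:
        best = max(range(len(ref_feats)), key=lambda j: cosine(f, ref_feats[j]))
        out.append(ref_colors[best])
    return out


# Hypothetical reference: character A's hair feature and color, then character B's.
ref_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_colors = [(255, 255, 255), (255, 182, 193)]

# Two sketch locations whose features lean toward B and A respectively.
print(transfer_colors([[0.1, 0.9], [0.8, 0.2]], ref_feats, ref_colors))
# → [(255, 182, 193), (255, 255, 255)]
```

Because matching depends on feature similarity rather than on position, the same rule assigns the right colors even if the two characters swap places between the reference and the line art.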

References with Different Backgrounds

When references with different backgrounds are used, our model transfers the style of each reference to the generated background.

Chun from Big Fish & Begonia

Comparisons

Comparisons with previous methods

[Comparison videos, two examples. Columns: Reference, LVCD, LVCD + IP-Adapter, ID-Animator, ToonCrafter + IP-Adapter, AniDoc (Ours)]

Limitations

Although our method can colorize multiple clips of the same character from a single character design sheet while maintaining good character consistency, it still has the following limitations.

First, when a line art clip contains objects that do not appear in the reference, the model struggles to determine appropriate colors for them. It can only infer colors from the color information available in the reference, which leads to inaccurate colorization.

Second, when a character's clothing in the line art clip differs from that in the reference image (even though it is the same character), our model can only infer plausible colors from the color patterns of the character's clothing in the reference image.

AniDoc: Animation Creation Made Easier