Automating the colorization of line art in videos is crucial for streamlining animation production workflows and reducing labor costs. However, challenges such as misalignment between character design art and line art sketches, as well as the need for temporal consistency, hinder automation efforts. Previous methods often require manually colored keyframes and dense line art guidance, which increases the artist's workload, and they suffer from color information leakage due to non-binarized sketch conditioning. We propose a novel all-in-one model that leverages priors from a video diffusion model to automate the colorization process. Our approach introduces an explicit correspondence mechanism with an injection module to align color information from reference images to input sketches, enhancing color accuracy. A two-stage training strategy learns to interpolate between keyframes, reducing the need for sketching intermediate frames. By conditioning on binarized sketches and employing data augmentation techniques, we improve training stability. Our method demonstrates superior quantitative and qualitative results, offering an effective solution for automatic line art video colorization and advancing the efficiency of animation production.
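To make the idea of binarized sketch conditioning concrete, here is a minimal sketch of thresholding a grayscale line art frame so that every pixel is either pure black or pure white. The function name `binarize_sketch`, the nested-list image representation, and the threshold of 128 are illustrative assumptions; the text above does not specify how binarization is actually performed.

```python
def binarize_sketch(gray, threshold=128):
    """Map each grayscale pixel (0-255) to pure black (0) or pure white (255).

    Hypothetical helper: the threshold value is an assumption, not a
    setting taken from the method described above.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


# A tiny 2x4 "sketch" with soft shading (e.g. 90 or 200) that could
# otherwise carry hints about the original fill colors.
soft = [
    [12, 90, 200, 255],
    [130, 127, 64, 250],
]
hard = binarize_sketch(soft)
print(hard)  # [[0, 0, 255, 255], [255, 0, 0, 255]]
```

After thresholding, intermediate gray values no longer exist, so the sketch can only convey where lines are, not how the original frame was shaded.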
[Demo columns: Reference, Lineart, Result]
By using the same reference, our model is able to generate consistent colorizations across different video clips, even when the sketches differ significantly in terms of pose or scale.
When applying different reference images to the same sketch sequence, our method preserves the identity of the character while adapting the finer details, such as lighting and background, according to the distinct styles of the references.
Thanks to our two-stage training strategy, our method supports animation with sparse sketches. By using only the start and end sketches, the model effectively produces smooth and coherent animations.
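As a rough illustration of the sparse-sketch setting, the snippet below builds a per-frame conditioning list in which only the first and last frames carry a sketch, together with a mask marking which frames are guided. The function `sparse_sketch_condition`, the `None` placeholders, and the mask are assumptions made for this example; they are not a description of how the model actually consumes its inputs.

```python
def sparse_sketch_condition(sketches):
    """Keep only the start and end sketches of a clip.

    Returns (conditions, mask): conditions[i] is the sketch for frame i,
    or None if that frame is left unguided; mask[i] is 1 where a sketch
    is present and 0 otherwise. Hypothetical helper for illustration only.
    """
    n = len(sketches)
    if n == 0:
        return [], []
    keep = {0, n - 1}  # indices of the start and end frames
    conditions = [s if i in keep else None for i, s in enumerate(sketches)]
    mask = [1 if i in keep else 0 for i in range(n)]
    return conditions, mask


# Strings stand in for sketch images in this toy example.
conditions, mask = sparse_sketch_condition(["s0", "s1", "s2", "s3"])
print(conditions)  # ['s0', None, None, 's3']
print(mask)        # [1, 0, 0, 1]
```

Under this framing, the model is responsible for filling in the unguided frames between the two retained sketches.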
Although our work focuses on a single reference image and does not include specific training or processing for multiple references, we found that when the reference image contains more than one character, our model can automatically distinguish between the characters based on their corresponding features and color each one correctly, even when the poses, angles, or relative positions of the characters differ significantly between the reference and the line art.
When using images with different backgrounds as references, our model can transfer the style from the reference image to generate backgrounds with different styles.
Although our method can colorize multiple clips containing the same character based on a single character design sheet while maintaining good character consistency, it still has certain limitations.
First, when a line art clip contains objects that are not present in the reference, the model struggles to determine the appropriate colors for these objects. It can only infer colors based on the color information available in the reference, leading to inaccuracies in the colorization.
Second, when the clothing of a character in the line art clip differs from that in the reference image (even though it is the same character), our model can only infer reasonable colors based on the color patterns of the character's clothing in the reference image.