ComfyUI Qwen 3 VL Create Powerful Prompts For Image And Video Generation
Hey everyone, let's check out Qwen3-VL today in ComfyUI, running locally alongside your other diffusion models, and see how you can generate something like this using the vision-language model Qwen3-VL. We're also comparing this model with other recently launched vision-language models available in ComfyUI and looking at the generated output. I've got Qwen3-VL here, and above it we have MiniCPM-V, another vision-language model that was released recently and can also be integrated with other diffusion models in ComfyUI. I used the exact same text prompt for both, instructing each model to create dialogue for the video events, generating a line of dialogue every 3 seconds about what's happening during that window. As you can see, Qwen3-VL provides more information in its response, covering the full 0 to 12 seconds here.

Later in this video, I'm going to take Qwen3-VL and integrate it with a very simple image generation workflow and a video generation workflow, showing how you can work with Qwen3-VL by feeding an input image or input video into this node. For example, I've got this clip from the Singin' in the Rain video, and I'm creating text dialogue from it for generating video in ComfyUI.

First, we're going to do it the easy way with image generation. These templates are available in the Browse Templates section of ComfyUI. You can use the Qwen Image text-to-image template, or try other Qwen image templates with ControlNet. We'll use this default template with all the preset values; it takes us from an empty workflow to a working result faster.

On top of this text-to-image workflow, I added the Qwen-VL advanced node. We don't need the predefined text prompt, so we can empty it and use Qwen-VL as the prompt generator: this AI model receives the image or video, then follows instructions to create the text prompt, and that's good enough. I'm going to connect an image using Load Image. This is an AI image I generated in a previous video, and I'll try it out with Qwen3-VL.

I'm using the Qwen3-VL 4B Instruct model in FP8, which should load on most local PCs, and I'm setting the resolution to 720p, which is common for images and also works well for AI video generation. I'm also adding a LoRA, the Qwen Image Lightning 4-step LoRA, and connecting it into the model pipe.
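If you want a feel for what that prompt-generator step amounts to outside ComfyUI, here's a minimal Python sketch using the generic Hugging Face transformers image-text-to-text API. The checkpoint name, input file name, and generation settings here are my assumptions for illustration, not details taken from the node itself:

```python
# Minimal sketch: using a Qwen3-VL checkpoint as a prompt generator,
# roughly the role the ComfyUI node plays. Checkpoint name, file name,
# and dtype are assumptions, not taken from the video.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "Qwen/Qwen3-VL-4B-Instruct"  # assumed checkpoint name

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "input.png"},  # your Load Image input
        {"type": "text", "text": (
            "Describe this image as a detailed text-to-image prompt: "
            "subject, style, lighting, composition."
        )},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
prompt_text = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(prompt_text)  # this text is what gets wired into the diffusion prompt
```

Inside ComfyUI the node handles all of this for you; the point is just that the VLM's decoded text is what replaces the hand-written prompt in the workflow.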
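If you'd rather script the whole loop instead of clicking through the graph, ComfyUI's local server accepts workflow JSON on its /prompt endpoint. Here's a rough sketch, assuming you've exported your workflow with Save (API Format); the node IDs and the LoRA filename are placeholders that depend entirely on your own export:

```python
# Rough sketch: pushing a VLM-generated prompt into a saved ComfyUI
# workflow and queueing it over the local API. Node IDs ("6", "10")
# and filenames are placeholders -- check your exported workflow JSON.
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI address

prompt_text = "a man dancing joyfully in heavy rain, cinematic lighting"  # e.g. VLM output

with open("qwen_image_workflow_api.json") as f:  # exported via Save (API Format)
    workflow = json.load(f)

# Overwrite the positive CLIPTextEncode node's text with the VLM output.
workflow["6"]["inputs"]["text"] = prompt_text  # node ID is an assumption

# The Lightning LoRA sits in a LoraLoader node in the model pipe; the
# filename must match a file in ComfyUI/models/loras (name assumed).
workflow["10"]["inputs"]["lora_name"] = "Qwen-Image-Lightning-4steps.safetensors"

req = urllib.request.Request(
    COMFY_URL,
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # returns a prompt_id you can poll for results
```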
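For the video side, the "dialogue every 3 seconds" idea comes down to sampling one frame per window and sending them to the model in a single message. A sketch of that below; the frame-sampling strategy is my assumption about how the node slices the clip, and the clip filename is a placeholder:

```python
# Sketch of per-window dialogue generation: grab one frame per 3-second
# window with OpenCV, then ask the VLM for a line of dialogue per window.
# Checkpoint name and sampling strategy are assumptions for illustration.
import cv2
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "Qwen/Qwen3-VL-4B-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

cap = cv2.VideoCapture("singin_in_the_rain_clip.mp4")  # placeholder filename
frames, t = [], 0.0
while True:
    cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)  # seek to t seconds
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    t += 3.0  # one frame per 3-second window
cap.release()

content = [{"type": "image", "image": img} for img in frames]
content.append({"type": "text", "text": (
    "These frames are 3 seconds apart. For each window, write one line "
    "of dialogue about what is happening, formatted like '0-3s: ...'."
)})

inputs = processor.apply_chat_template(
    [{"role": "user", "content": content}],
    add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=300)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```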