BatchFrames Join the waitlist
Prompt craft

The anatomy of a great AI image prompt for video

A vague prompt gives you a wish. A structured one gives you a shot. Here is the four-part formula we use for cinematic, consistent prompts, with examples you can copy.

Ask two people to prompt the same shot and you will usually get two completely different images. It is rarely about taste. One of them gave the model a structure to work from, and the other gave it a wish. “A man walking down a city street, cinematic” is a wish. The model fills in every missing decision for you, and it makes a different decision every time you run it.

A good prompt for video is not a clever sentence. It is a small, repeatable structure. Once you can see the structure, every shot in your script gets faster to write and far more consistent from one frame to the next. Here is the shape we use, four parts in the same order, every single time.

The four parts of a cinematic prompt: subject, camera, light, and style, stacked into one shot

1

Subject: who, and what they are doing

Start with the thing the shot is actually about, described in concrete nouns and one clear action. Not “a person.” A 40-year-old man with a short grey beard, in a navy wool coat, reading a folded newspaper. The specifics are not decoration. They are the anchor the model returns to, and they are what you will reuse to keep the same character looking the same across a hundred shots.

Two rules that matter more than they should:

2

Camera: how the shot is framed

This is the part most people skip, and it is the fastest way to make a flat image look like a frame from a film. You are telling the model where the camera sits and what lens it is using. You do not need film school for this, just a handful of terms used consistently.

“Medium close-up, 85mm lens, shallow depth of field, slight low angle” does more for the look of a shot than another paragraph of adjectives about the subject.

3

Light: the mood, in physical terms

Mood words like “dramatic” or “moody” are weak prompts because the model has to guess what you mean. Describe light the way it actually behaves instead. Where is it coming from, how hard is it, what color.

“Lit by a single desk lamp from below, hard shadows, warm tungsten color” is a mood. “Dramatic lighting” is a coin flip.

4

Style: the look that ties every shot together

Style is the layer that makes ten separate shots feel like one piece. This is your film stock, your color grade, your reference. Set it once and repeat it word for word on every shot in the video.

The trick is that the style block should be almost identical across your whole script. The subject and camera change shot to shot. The style does not.

One habit that pays off: write your subject description and your style block once, then paste them word for word into every shot. The boring repetition is exactly what keeps a character and a look consistent across hundreds of frames.

Putting it together

Here is a vague prompt and the same shot written with the structure. Same idea, very different odds of getting a usable frame.

Vague: “A detective in his office at night, cinematic, moody.”

Structured: “A 40-year-old detective with a short grey beard in a navy wool coat, sitting at a cluttered desk reading a file. Medium close-up, 85mm lens, shallow depth of field, slight low angle. Lit by a single desk lamp from below, hard shadows, warm tungsten color. Photoreal, 35mm film grain, muted teal and amber grade.”

The second one still leaves room for the model to surprise you, but every important decision is yours, and it will hold up when you run it again for the next shot in the scene.

The words that keep characters consistent

Consistency is the thing this audience asks about more than anything else, and most of it comes down to one unglamorous habit: exact-word repetition. Models latch onto specific tokens. “Navy bomber jacket” repeated identically across forty prompts holds far better than forty slightly reworded descriptions of a blue jacket. The same goes for faces, hair, and any signature prop. Pick the words once, lock them, and stop paraphrasing yourself.

This is also why writing prompts shot by shot, by hand, gets dangerous on a long script. By shot 80 your wording has drifted without you noticing, and the drift shows up as a character whose face and wardrobe slowly change. The fix is structural, not artistic: keep the subject and style blocks fixed, and only change what the shot actually requires.

There is already an automated way to do this

You just read the manual version. BatchFrames turns your whole script into structured prompts like these, keeps your characters worded identically across every shot, and routes the set straight to your image and video tools.

Where to go from here

You do not need a hundred adjectives. You need four parts, in the same order, with the subject and style locked so your shots feel like one film instead of a stack of unrelated images. Write one shot with the structure, then the next, and you will feel how much faster it gets once the decisions are made up front.

In the next post we will go deeper on character consistency specifically: reference images versus text, and how to keep one character steady across a few hundred shots without re-describing them every time.