"A flat image and a deep one can have the exact same content. The difference is whether the light tells you how far away everything is."
Depth comes from how light behaves across space. Layers, falloff, and atmosphere tell the eye what is near, middle, and far. For prompts, describe those depth cues directly instead of asking for a frame to feel three-dimensional.
AI builds a wall.
You want a room.
Ask a model for a scene and it tends to give you a single flat plane: subject and background sitting at the same distance, the whole frame equally lit, like a sticker pressed onto a backdrop. Real spaces don't look like that. They recede — there's a near, a middle, and a far, and your eye feels the distance between them. That feeling of three-dimensional space is depth, and like separation, it isn't something you add to the subject. It's something you build into the space with light.
The reason depth matters so much for AI video is that flatness is the single biggest giveaway of a synthetic image. A real camera in a real room records how light falls off across distance, how far objects lose contrast, how lights at different depths stack up. Models, left alone, skip all of it. The good news: depth comes from a few cues you can describe, and cinematographers lean on three of them constantly. Here they are, one frame each — though in practice they often work together; each frame below just lets one of them lead.
Layering: lights at different distances.
The most direct way to build depth is to put light at several distances and let the eye connect them. A lit thing close, another further, another far — each at its own size and brightness — and the gaps between them become space. The key is that the lights sit at different depths, so they read as a near, a middle, and a far.

Nothing here is doing anything fancy to the subject — there barely is a subject. The depth comes entirely from where the lights are. The foreground car reads as close because it's large and its taillights are bright; the streetlamps march backward, each smaller and a touch dimmer; the smokestack and the blue tower sit far off, separately lit so they don't just vanish into black. Roger Deakins built this in layers on purpose — boosting the real streetlamps so they'd carry, hiding powerful units on the far rooftops so the background had its own glow. Take away any one layer and the space gets shallower. This is the move to reach for first: don't light a subject against a backdrop, light a near, a middle, and a far.
Falloff: light fades with distance.
The second cue is what light does on its own: it gets dimmer the farther it travels. If you let that happen across a space — bright where the light is, darkening smoothly as it recedes — the gradient itself reads as depth. Where layering uses separate lights at different distances, falloff is about a single fall of light weakening across distance.

This is the opposite of a flat, evenly-lit frame. The light is strongest on the near floor and the left wall, and it falls off — continuously, not in jumps — toward the center, where it surrenders to near-black. Your eye reads that smooth darkening as receding distance, the way it reads a fading sound as moving away. The two figures are placed exactly where the gradient bottoms out, which is why they feel impossibly far inside the room. The lesson for a prompt: don't ask for the whole space to be evenly lit. Ask for light to come from one side or one source and fall off into shadow — the falloff is what carves the depth.
Atmosphere: distance you can see.
The third cue needs something in the air — fog, haze, snow, smoke, dust. Once the air isn't perfectly clear, light scatters through it, and far-off things lose contrast, go paler, and soften. The eye reads that as distance directly: the more washed-out a thing is, the farther away it must be. And this is a cue a cinematographer can either find or make.


Those two frames are the same depth cue from opposite directions: Tokyo's smog was there and the shot used it; Skyfall's haze was pumped into a room on purpose. Either way, particles in the air scatter the light, and distance becomes something you can literally see — less contrast, less color, less sharpness the farther back you look. Here's the good news, and it's the rare case where the AI creator has it easier than the cinematographer: a DP has to wait for weather or run a haze machine to get this. You just write it. Adding haze, fog, mist, atmospheric depth to a prompt is the cheapest depth you can buy.
Depth isn't in the subject, it's in how light behaves across the space — stacked in layers, falling off with distance, or scattering through the air. Real shots usually use more than one at once.
From cue to prompt.
You can't place lights at distances in a prompt, but you can describe the depth cues directly — where the layers are, how the light falls off, what's in the air. Stop describing a subject in front of a background; describe a space that recedes.
Neither prompt mentions "3D" or "depth" as an adjective — they describe the causes of depth, and let the model build the space from them. And note the cheapest move of all is in that second prompt: the word haze. It costs nothing and pushes the background back instantly.
The Core Ideas
- Depth isn't in the subject. It's in how light behaves across the space — flatness is the biggest giveaway of an AI image.
- Layering: put light at different distances — a near, a middle, a far — and the gaps between them read as space.
- Falloff: let one fall of light weaken smoothly across distance instead of lighting everything evenly. The gradient is the depth.
- Atmosphere: haze, fog, or smoke makes distance visible — far things lose contrast and color. A DP can find it or make it with a haze machine.
- The cues overlap. Real shots usually stack two or three at once; each example here just lets one lead.
- For prompts, atmosphere is nearly free. A DP waits for weather; you just write "haze" — the cheapest depth you can buy.
Study the light in every frame
you can't stop thinking about.
Save stills, break down the lighting setup, and build a private library of the frames that teach you to see.
Start Studying