SellerVisor Blog

How to Take Full Control of Product Videos with GPT Image 2 + SellerVisor

Published 2026/5/1 23:02:25

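Let me be frank.

Over the past few weeks, I burned through a large number of credits testing AI video generation tools. Most of the results were unusable.

Product labels came out blurry. Tablets were the size of a pea in one frame and the size of a coin in the next. Brand names were altered for no apparent reason, and the video as a whole did not look like a single product. Even when a result happened to turn out well, that was luck, not technique.

I doubt I am the only one who has run into this.

So I asked a fundamental question: why does this keep happening, and is there a real fix?

After several weeks of experiments, I found a method. It completely changed how I approach AI product videos. Today I am sharing the workflow step by step.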

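Why this method matters

First, let's be clear about the underlying problem with AI video generation.

When you make a product video from a text prompt alone, the AI has no visual reference point, so it imagines the product from scratch in every frame. The product therefore differs slightly from frame to frame: the label font changes, the colors drift, and the proportions break down.

This leads to the key insight.

Don't let the AI imagine your product. Show it the product.

What if you generated each scene as a still image before generating the video? Instead of letting the AI reinterpret the product in every frame, you build the video on images you have already locked in.

That is the core of the GPT Image 2 + SellerVisor workflow.

There is also a very practical reason for this approach.

AI video generation consumes a lot of credits. If you keep regenerating whenever you are unhappy with the result, your credits vanish quickly. Locking in the storyboard images first reduces the number of video generation attempts, because the odds of getting the result you want on the first or second try go up considerably. This is a matter of quality and of cost control.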
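To make the cost argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes each video attempt succeeds independently with a fixed probability, so the expected number of attempts is 1 divided by that probability. Every number in it (credit prices, success rates, image retries) is hypothetical and chosen only for illustration; none of it reflects SellerVisor's actual pricing or measured results.

```python
def expected_credits(video_cost, success_rate, image_cost=0.0, image_attempts=0):
    """Expected credits spent until the first acceptable video.

    Assumes each video attempt succeeds independently with probability
    success_rate, so the expected number of attempts is 1 / success_rate.
    Image generation is modeled as a fixed up-front cost.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return image_cost * image_attempts + video_cost / success_rate


# Hypothetical numbers, for illustration only.
text_only = expected_credits(video_cost=100, success_rate=0.2)
storyboard_first = expected_credits(
    video_cost=100, success_rate=0.5, image_cost=5, image_attempts=4
)

print(f"text prompt only: about {text_only:.0f} credits")
print(f"storyboard first: about {storyboard_first:.0f} credits")
```

Under these made-up inputs, the storyboard images add a small fixed cost, and the higher per-attempt success rate is what brings the total down. Plug in your own prices and observed retry counts to see whether the trade-off holds for you.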

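Step 1. Generate storyboard images in GPT Image 2 mode

ChatGPT's Image 2 mode has been getting a lot of attention lately. The quality is genuinely good, and I am now comfortable relying on it. Until recently I was using Google's nanobanana, but I now get higher-quality images from GPT's Image 2 mode.

There is one thing to do here: generate every scene you want in the video as a still image first.

A 15-second ad typically consists of 6 scenes, for example: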

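Scene 1. Hook / problem: show the target customer's pain point

Scene 2. Product reveal: a clean, studio-style hero shot

Scene 3. Key ingredients or how it works: visualize what sets the product apart

Scene 4. How to use it: a real-life lifestyle scene

Scene 5. Trust elements: certifications, data, customer reviews, and so on

Scene 6. Final hero shot + CTA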

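The prompt I actually used

I want to make a 15-second ad video for the attached product. It will be presented in vertical 9:16 format. Based on an ad concept designed to maximize sales, please generate a high-quality, professional storyboard image that shows all scenes consistently in a single image, for use in a real ad.

When you prompt GPT Image 2 for each scene, be sure to follow two rules.

First, spell out the product specifications. Describe the container color, label text, dimensions, material, and so on in as much detail as you can. For example: "a white ceramic-style round container, labeled Tidalove Fluoride Toothpaste Tablets Cool Mint, about 8 cm tall".

Second, specify the camera angle and lighting. Shooting directions such as "frontal hero shot, soft natural light, cream background" make the scenes flow together more naturally when they are later stitched into a video.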
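One way to apply both rules consistently is to keep the product specification and the shot direction as structured data and render every scene prompt from them, so the label text is never retyped by hand. The Python sketch below is only an illustration: `ProductSpec`, `ShotSpec`, and `build_scene_prompt` are hypothetical names invented here, not part of ChatGPT or SellerVisor, and the output is plain text that you would paste into the image tool yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductSpec:
    """Rule 1: the product details that must be identical in every scene."""
    container: str
    label_text: str
    height_cm: float


@dataclass(frozen=True)
class ShotSpec:
    """Rule 2: camera angle and lighting for one scene."""
    angle: str
    lighting: str
    background: str


def build_scene_prompt(scene: str, product: ProductSpec, shot: ShotSpec) -> str:
    """Render one scene prompt as plain text (hypothetical helper)."""
    return (
        f"Scene: {scene}. "
        f"Product: {product.container}, label reads '{product.label_text}', "
        f"about {product.height_cm:g} cm tall. "
        f"Shot: {shot.angle}, {shot.lighting}, {shot.background}. "
        "Keep the label text exactly as written."
    )


# Values taken from the examples in the text above.
product = ProductSpec(
    container="white ceramic-style round container",
    label_text="Tidalove Fluoride Toothpaste Tablets Cool Mint",
    height_cm=8,
)
hero_shot = ShotSpec(
    angle="frontal hero shot",
    lighting="soft natural light",
    background="cream background",
)

prompt = build_scene_prompt("Product reveal", product, hero_shot)
print(prompt)
```

Because the same `ProductSpec` feeds every scene, a change to the label or size only has to be made in one place.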

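If you are not happy with a generated image, fix it on the spot. The key is to finalize the images before moving to the video stage. Redoing one image costs a fraction of what redoing the whole video costs.

Once all the scene images are done, they are your storyboard.

(Watch the video)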

https://youtube.com/shorts/JO1b9Um2Z54?si=Kg0_FvMWEaHt2OAF

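Step 2. Build the video from the storyboard in SellerVisor

Now switch to SellerVisor's video creation feature.

This is where the difference between conventional AI video generation and this workflow becomes clear.

The conventional approach is to generate a video from a text prompt alone. The AI imagines everything from scratch.

This approach is different. You attach the storyboard image from Step 1 and also upload a product reference image. Then you write a high-quality prompt that reflects the storyboard as closely as possible. The prompt I actually wrote is shared below.

Step 3. Sharing the prompt

Here is the prompt structure I actually used in testing. You can adapt it to your own product and use it directly.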

A premium 15-second wellness supplement commercial in cinematic 9:16 vertical format.

Soft natural lighting, clean modern aesthetic, sage green and warm cream color palette.

Smooth professional camera movements only — no glitches, no distortion.

Product bottle and all on-screen text remain perfectly stable, sharp, and unaltered throughout.

0-2s (Hook / Problem):

Medium shot of a woman in her 30s sitting at a bright minimalist kitchen counter,

resting her chin on her hand with a tired, conflicted expression.

A plate of cookies, a bowl of chips, and a slice of brownie sit in front of her.

She holds a small cookie near her mouth, hesitating.

Soft morning light from the left. Subtle shallow depth of field.

Camera: slow 5% push-in, locked and steady.

2-4s (Product Reveal):

Smooth dissolve to a clean studio scene.

A dark amber glass supplement bottle labeled "BIOMA GLP-1 BOOSTER"

stands centered on a sage green cylindrical pedestal against a soft gradient sage backdrop.

A single monstera leaf is visible behind it. Two cream-colored capsules rest beside the base.

Soft rim light from behind, gentle key light from front-left.

Camera: slow 360° orbit at 30% speed around the bottle, then settles to a frontal hero angle.

The bottle, label text, and capsule shapes stay completely stable — no morphing, no warping.

4-7s (Formula / How It Works):

The bottle remains anchored on the left third of the frame, perfectly still.

On the right side, three soft circular icons gently fade in one by one in sequence:

first a probiotic icon, then a prebiotic leaf icon, then a postbiotic dot pattern icon.

Tiny floating particles of light drift slowly upward between the icons.

Camera: completely locked off, no movement. Bottle and label remain crystal sharp.

7-10s (Daily Use / Lifestyle):

Cut to a split-screen lifestyle moment.

Left half: the same woman, now smiling and refreshed, taking two cream capsules with a glass of water in soft morning kitchen light.

Right half: the same woman outdoors, eyes closed, breathing deeply against a soft-focus tropical green background, looking calm and energized.

Camera: gentle 3% push-in on both halves. Natural skin tones, warm golden hour lighting.

10-13s (Trust / Quality Proof):

Return to the studio product shot — bottle centered on the sage pedestal, completely stable.

Small minimalist trust badges fade in softly around the bottle in a balanced layout:

Made in USA, Vegetarian, Non-GMO, Stimulant-Free, 1M+ Customers, 14-Day Guarantee.

Camera: extremely slow 2% push-in. Bottle and all text remain razor sharp and unchanged.

13-15s (Final Hero / CTA):

Final hero shot. The bottle stands centered on the pedestal, surrounded by a few scattered cream capsules and a soft monstera leaf shadow.

Gentle volumetric light beams from the upper right.

Subtle floating dust particles catch the light.

Camera: ultra-slow pull-back revealing the full composition, ending on a perfectly composed beauty shot.

The bottle, label, and brand name stay completely intact and legible the entire time.

Overall mood: clean, premium, trustworthy, modern wellness.

Reference quality: high-end skincare and supplement TV commercials (Olay, Ritual, Seed).

Strict rules: no glitch effects, no morphing, no text distortion, no product transformation,

no extreme camera moves. Bottle label "BIOMA GLP-1 BOOSTER" must remain readable in every frame.
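The prompt above follows a fixed shape: a global style header, six time-coded segments that cover 0 to 15 seconds without gaps, and a closing list of strict rules. If you adapt it for several products, a small script can assemble the text and catch timing mistakes before you spend credits. The Python sketch below is an illustration only: `Segment`, `check_timeline`, and `render_prompt` are hypothetical helpers, the segment descriptions are shortened from the prompt above, and the result is plain text to paste into the video tool by hand.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int        # seconds
    end: int          # seconds
    title: str
    description: str


def check_timeline(segments, total_seconds):
    """Raise ValueError unless segments are contiguous and span 0..total_seconds."""
    expected_start = 0
    for seg in segments:
        if seg.start != expected_start or seg.end <= seg.start:
            raise ValueError(f"bad timing for segment {seg.title!r}")
        expected_start = seg.end
    if expected_start != total_seconds:
        raise ValueError("segments do not cover the full duration")


def render_prompt(header, segments, rules, total_seconds=15):
    """Join header, time-coded segments, and strict rules into one prompt."""
    check_timeline(segments, total_seconds)
    blocks = [header]
    for seg in segments:
        blocks.append(f"{seg.start}-{seg.end}s ({seg.title}):\n{seg.description}")
    blocks.append("Strict rules: " + ", ".join(rules) + ".")
    return "\n\n".join(blocks)


# Timings and titles copied from the prompt above; descriptions shortened.
segments = [
    Segment(0, 2, "Hook / Problem", "Woman at a kitchen counter, hesitating over a cookie."),
    Segment(2, 4, "Product Reveal", "Bottle centered on a sage green pedestal."),
    Segment(4, 7, "Formula / How It Works", "Three icons fade in beside the bottle."),
    Segment(7, 10, "Daily Use / Lifestyle", "Split-screen lifestyle moment."),
    Segment(10, 13, "Trust / Quality Proof", "Trust badges fade in around the bottle."),
    Segment(13, 15, "Final Hero / CTA", "Final hero shot with slow pull-back."),
]
rules = ["no glitch effects", "no morphing", "no text distortion"]

prompt = render_prompt(
    "A premium 15-second wellness supplement commercial in cinematic 9:16 vertical format.",
    segments,
    rules,
)
print(prompt)
```

The timeline check is the useful part: reordering or resizing a segment raises an error instead of silently producing a prompt whose scenes overlap or stop short of 15 seconds.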

Watch the video

https://youtube.com/shorts/cQVBoZynCjk?si=MM8PmRBMx2m-LMMa

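Closing thoughts

While testing this workflow, I made ad videos for two brands: GLP-1 Booster and Tidalove Toothpaste Tablets. Once I confirmed the storyboard images before generating video, product distortion dropped noticeably and the shots I wanted came out more readily.

Having AI generate a video from a text prompt alone is like rolling dice. Creating storyboard images first and then carrying them into the video is like drawing the blueprint before building the house.

Amazon sellers, your credits are valuable. Your time is even more valuable.

Start with this approach.

Bopyo, Co-founder of SellerVisor

Create your first storyboard in SellerVisor's GPT Image 2 mode now.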

https://sellervisor.com/?utm_source=blog&utm_medium=post&utm_campaign=storyboard_workflow&utm_content=cta_bottom
