Graphics Research for ReelsBuilder.ai
Technical Specification and Implementation Guide for Automated Short-Form Video Graphics Rendering
1. Introduction: The Technical Imperative of Short-Form Automation
The contemporary digital media landscape has been fundamentally reshaped by the dominance of vertical, short-form video content. Platforms such as Instagram Reels, TikTok, and YouTube Shorts have established a new visual vernacular that demands high-velocity production without sacrificing high-fidelity presentation. For automated systems like ReelsBuilder.ai, the challenge extends beyond simple video concatenation; it requires a sophisticated graphics engine capable of programmatic compositing, dynamic typography, and context-aware branding. The implementation of such a system necessitates a rigorous understanding of coordinate geometry, video signal processing, and the syntactical complexities of FFmpeg and Advanced Substation Alpha (ASS) scripting.
This report provides an exhaustive technical analysis of the graphics subsystem required for a production-grade automated video platform. It serves as a comprehensive instruction manual for configuring AI coding assistants—specifically Claude 3.5 Sonnet or Gemini 1.5 Pro—to generate robust, error-free rendering code. We will explore the precise dimensional constraints of 2025 mobile interfaces, the mathematical logic required for dynamic animations within FFmpeg’s filter complex, and the nuanced typography standards that drive viewer retention in the current algorithmic era.
The focus is on "Hormozi-style" editing—a high-retention aesthetic characterized by rapid-fire captions, dynamic progress indicators, and bold visual interruptions. Achieving this programmatically requires moving beyond basic text overlays into the realm of complex filter graphs and subtitle rendering libraries. By the conclusion of this document, the reader will possess the necessary architectural blueprints to implement a graphics engine that is not only functional but resilient to the varying edge cases of user-generated content and platform-specific UI variations.
2. Geometric Constraints and Platform-Specific Safe Zones
The foundational constraint of any automated graphics system is the "Safe Zone." This is the renderable area of the video canvas that remains unobscured by the native user interface elements of the hosting platform. While the raw video resolution is standardized at 1080x1920 pixels (a 9:16 aspect ratio), the functional canvas is significantly smaller and irregularly shaped. Failing to respect these boundaries results in critical metadata—such as captions or branding—being covered by "Like" buttons, descriptions, or platform navigation bars, which immediately degrades the perceived quality of the content.
2.1 The Vertical Canvas Standard
The industry has coalesced around the 1080x1920 pixel format.[1] This resolution offers a pixel density sufficient for high-definition mobile displays while maintaining a manageable bitrate for streaming. However, unlike traditional 16:9 horizontal video where the "title safe" area is a simple concentric rectangle, vertical video safe zones are defined by asymmetrical UI overlays that vary by platform and even by user interaction (e.g., expanding descriptions).
2.2 Detailed Analysis of Platform Overlays
2.2.1 Instagram Reels
Instagram presents perhaps the most intrusive UI environment. The top of the screen is dominated by a persistent header containing the creator's profile picture and handle, while the bottom is heavily obscured by the video description, audio track information, and interactive buttons. Research indicates that the top 250 pixels are strictly reserved for the header interface.[1] Placing any graphical element in this zone guarantees conflict with the username display.
The bottom region is even more restrictive. The bottom 350 pixels are covered by a gradient overlay that houses the caption and audio ticker. Furthermore, the right side of the screen features a vertical column of interaction icons (Like, Comment, Share, More). To avoid visual collision, a right-side margin of 120 pixels is recommended.[4] The primary "safe" area on Instagram is often cited as the central 1080 x 1350 pixels—a 4:5 aspect ratio centered within the 9:16 frame. Content within this 4:5 box is generally safe from UI obstructions and also ensures visibility if the Reel is viewed in the profile grid, which crops the preview to a square or 4:5 ratio.[5]
2.2.2 TikTok
TikTok's interface shares similarities with Instagram but introduces its own unique constraints. The top "Following/For You" tab navigation occupies roughly the top 130 pixels.[2] While smaller than Instagram's header, it is still a no-go zone for branding. The bottom area is highly variable; the caption overlay expands upward as the text length increases. A conservative estimate suggests avoiding the bottom 350 pixels entirely to account for the caption, the rotating record disc animation, and the home navigation bar.[1]
Crucially, TikTok's right-side interaction column is dense. The buttons for profile, like, comment, save, and share create a blind spot on the right edge. While the raw width is 1080 pixels, the functional width for text should be considered closer to 900 pixels, leaving significant padding on the right.[3]
2.2.3 YouTube Shorts
YouTube Shorts integrates with the broader YouTube mobile ecosystem, which brings distinct UI elements. The top 200 pixels are often obscured by the channel name and a prominent "Subscribe" button overlay.[2] This is a critical interaction point for YouTube, and obscuring it can negatively impact channel growth.
The bottom of the Shorts interface is arguably the most cluttered. It includes the video title, channel navigation, and a persistent progress bar (scrubber). Research suggests that the bottom 350 to 480 pixels should be treated as "dead space".[6] Additionally, the right-side icons on Shorts function similarly to TikTok and Reels, necessitating a right-margin buffer.
2.3 The Universal Safe Zone Strategy
For an automated platform like ReelsBuilder.ai, maintaining distinct rendering pipelines for each destination platform introduces unnecessary complexity and potential points of failure. The superior architectural approach is to define a "Universal Safe Zone"—a single geometric intersection of all valid render areas across the three major platforms.
This universal area is defined by the most restrictive constraints of each platform.
- Top Margin: Determined by Instagram's 250-pixel header.
- Bottom Margin: Determined by the combination of YouTube Shorts' deep UI and Instagram's bottom gradient, necessitating a floor at roughly pixel y=1420 (leaving the bottom ~500 pixels clear).
- Side Margins: Determined by the right-side interaction columns common to all three, requiring a right margin of roughly 120 pixels and a symmetrical left margin for aesthetic balance.
Recent composite analysis of these overlays indicates that the resulting "Universal Safe Zone" is a central rectangle of approximately 900 x 1170 pixels, positioned starting at y=250. This creates a "safe coordinate space" where graphical elements are guaranteed to remain unobstructed regardless of where the content is syndicated. When instructing the AI to generate layout code, the logic must strictly enforce these bounds, clamping any vertical position y such that 250 < y < 1420.
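The clamping rule described above can be expressed as a small helper. The zone constants come from the figures in this section; the function and constant names are illustrative, not part of any existing codebase:

```python
# Universal Safe Zone bounds for a 1080x1920 canvas (Section 2.3).
SAFE_TOP = 250       # Instagram header floor
SAFE_BOTTOM = 1420   # floor imposed by Shorts/Instagram bottom UI
SAFE_LEFT = 120
SAFE_RIGHT = 1080 - 120

def clamp_to_safe_zone(x: int, y: int) -> tuple[int, int]:
    """Clamp an overlay anchor point into the Universal Safe Zone."""
    return (
        max(SAFE_LEFT, min(x, SAFE_RIGHT)),
        max(SAFE_TOP, min(y, SAFE_BOTTOM)),
    )
```

Every generated layout coordinate would pass through a gate like this before being interpolated into a filter string.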
3. The Graphics Pipeline: FFmpeg Filter Complex Architecture
The implementation of these graphics relies entirely on FFmpeg, the industry-standard framework for multimedia handling. While simple operations can be performed with basic flags, the requirements of ReelsBuilder.ai—dynamic overlays, watermarking, and complex animations—necessitate the use of the filter_complex. This feature allows for the creation of a directed acyclic graph (DAG) of video filters, where multiple inputs can be processed, split, merged, and layered in a single pass.
3.1 Principles of the Filter Graph
In a standard transcoding operation, audio and video streams flow linearly from input to output. The filter_complex disrupts this linearity. It treats inputs (the main video, logo images, generated color solids) as nodes that can be injected into the graph.
The syntax relies on "link labels"—named variables enclosed in brackets (e.g., [v1])—to pass
video data between filters. A typical command structure for a graphics pipeline in ReelsBuilder.ai would resemble the following logic:
1. Input 0: The main video file.
2. Input 1: The watermark image.
3. Filter Chain:
   - Take Input 0 ([0:v]).
   - Take Input 1 ([1:v]) and resize it (scale2ref).
   - Output the resized logo as a temporary node [logo].
   - Overlay [logo] onto [0:v] to create [outv].
4. Map: Direct [outv] to the final file.
This modular approach allows for infinite extensibility. One could add a third input for a "Subscribe" animation, a fourth for a lower third, and chain them sequentially within the same complex filter string.
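A minimal sketch of this graph assembled from Python. The file paths, link labels, and function name are placeholders; the filter string follows the scale2ref and overlay syntax discussed in Sections 4.1 and 4.2:

```python
def build_watermark_graph(video: str, logo: str, out: str) -> list[str]:
    """Assemble an ffmpeg argv whose filter_complex scales a logo
    against the main video (scale2ref) and overlays the result."""
    filter_graph = (
        # Scale the logo (input 1) to 15% of the reference video's height.
        "[1:v][0:v]scale2ref=h=ih*0.15:w=oh*mdar[logo][base];"
        # Composite the scaled logo into the bottom-right safe area.
        "[base][logo]overlay=x=W-w-50:y=H-h-350[outv]"
    )
    return [
        "ffmpeg", "-i", video, "-i", logo,
        "-filter_complex", filter_graph,
        "-map", "[outv]", "-map", "0:a?",  # carry audio through if present
        out,
    ]
```

The function only builds the argument vector; executing it (e.g., via subprocess.run) requires ffmpeg on the host.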
3.2 Dynamic Coordinate Systems and Expressions
A critical feature of FFmpeg's filter engine is its ability to evaluate mathematical expressions for positioning. Hardcoding pixel values (e.g., x=500) is brittle; it fails if the input video resolution changes (e.g., from 1080p to 720p). To build a robust engine, the AI must be instructed to use FFmpeg's internal variables:
- W and H: The width and height of the main video (background).
- w and h: The width and height of the overlay element (foreground).
- t: The timestamp in seconds.
- n: The frame number.
Using these variables allows for "responsive design" within the video. For example, centering an overlay horizontally is achieved with the expression x=(W-w)/2. This ensures the graphic is perfectly centered regardless of whether the video is 1080 pixels wide or 720 pixels wide. Similarly, positioning an element 50 pixels from the bottom edge is expressed as y=H-h-50. This dynamic logic is essential for the "Universal Safe Zone" implementation, where margins should be calculated as percentages or relative offsets rather than absolute pixels.
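As a worked check of this responsiveness, the expressions can be evaluated for two canvas widths with plain arithmetic, mirroring what FFmpeg's expression evaluator computes per frame (helper names are illustrative):

```python
def centered_x(W: int, w: int) -> float:
    """Python equivalent of the FFmpeg expression x=(W-w)/2."""
    return (W - w) / 2

def bottom_y(H: int, h: int, margin: int = 50) -> float:
    """Python equivalent of y=H-h-margin."""
    return H - h - margin

# A 200px-wide overlay stays centered at either resolution:
print(centered_x(1080, 200))  # 440.0
print(centered_x(720, 200))   # 260.0
print(bottom_y(1920, 100))    # 1770
```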
4. Branding Implementation: Watermarking and Logos
Watermarking is a fundamental requirement for SaaS-generated content. It serves both as a branding mechanism and, in freemium models, as a conversion driver. The technical implementation involves two primary challenges: ensuring the watermark scales proportionally to the video resolution and positioning it within the safe zone without obscuring content.
4.1 Proportional Scaling with scale2ref
The standard scale filter in FFmpeg is insufficient for automated pipelines because it operates on absolute dimensions. If a user uploads a 4K video, a 200-pixel watermark will appear microscopic. Conversely, on a 720p video, it may dominate the screen.
The scale2ref filter offers the solution.[7] This filter accepts two inputs: the image to be scaled and a "reference" video. It allows the manipulation of the image's dimensions relative to the reference. The recommended syntax for the AI to generate is:

[logo][video]scale2ref=h=ih*0.15:w=oh*mdar[logo_scaled][video_ref]

In this expression:
- h=ih*0.15: Sets the watermark's height to exactly 15% of the input video's height (ih).
- w=oh*mdar: Calculates the watermark's width based on the output height (oh) multiplied by the watermark's main display aspect ratio (mdar). This ensures the logo preserves its original aspect ratio and does not appear stretched or squashed.
4.2 Safe Zone Positioning Logic
Once the logo is scaled, it must be composited onto the video using the overlay filter. The positioning logic must incorporate the safe zone constraints discussed in Section 2. For a watermark positioned in the bottom-right corner (a standard location), the coordinates must clear the right-side interaction column and the ~350px bottom margin. The FFmpeg expression for this placement is: [video_ref][logo_scaled]overlay=x=W-w-50:y=H-h-350
- W-w: Aligns the right edge of the logo with the right edge of the video.
- -50: Adds a 50-pixel buffer from the right edge.
- H-h: Aligns the bottom edge of the logo with the bottom edge of the video.
- -350: Shifts the logo up by 350 pixels to clear the bottom UI overlay.
This logic ensures that even if the watermark image size changes (due to the scale2ref operation), it remains anchored to the correct relative position.
5. Dynamic Progress Indicators: The "Hormozi" Retention Mechanic
A staple of high-retention short-form content is the visual progress bar. This element provides the viewer with a sense of completion, subtly encouraging them to watch until the end of the loop. Implementing this purely in code—without requiring the system to download and manage external progressbar.mov assets—is highly efficient and customizable.
5.1 The "Color Source" Generation Technique
FFmpeg allows for the generation of synthetic video streams using the color source filter.[10] This enables the creation of a solid color bar of any dimension and color code on the fly.
To generate a red progress bar that spans the full width of a 1080p video, the command syntax is: color=c=red:s=1080x20[bar]. This creates a red stream (c=red) with a size of 1080x20 pixels (s=1080x20) and labels it [bar].
5.2 Equations of Motion: The Sliding Overlay
Animating this static bar requires determining how its position or dimensions change over time. The most performant method is the "Sliding Overlay." In this approach, the bar is generated at full width but is initially positioned "off-screen" to the left. As the video plays, the bar slides into the frame from left to right.
The equation for the x-coordinate is derived from the linear interpolation of time: x = -w + (w / Duration) * t
- At t=0: x = -w. The bar is shifted left by its entire width, making it invisible.
- At t=Duration: x = -w + w = 0. The bar is fully visible, covering the width of the screen.
- w / Duration: This represents the velocity of the bar in pixels per second.
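Combining the color source with the sliding-overlay equation yields a single filter string. The duration must be supplied explicitly (the reason is explained in Section 5.3); the function name and defaults are illustrative:

```python
def progress_bar_filter(duration_s: float, width: int = 1080,
                        height: int = 20, color: str = "red") -> str:
    """Build a filter_complex string for a bar that slides in from the
    left and exactly fills the frame width at t == duration_s."""
    return (
        f"color=c={color}:s={width}x{height}[bar];"
        f"[0:v][bar]overlay=x='-w+(w/{duration_s})*t':y=H-h:shortest=1"
    )

print(progress_bar_filter(15.5))
```

shortest=1 stops the synthetic color stream when the main video ends, preventing the output from running forever.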
5.3 The Two-Pass Logic Requirement
A significant technical hurdle in FFmpeg is that filter expressions often cannot access the total duration (D) of the input video dynamically within the filter graph itself.[12] The duration variable is not consistently exposed to the overlay filter in all FFmpeg builds. If the code attempts to use (w/duration)*t, it will likely fail or default to zero.
To robustly implement this, the AI must be instructed to perform a Two-Pass Operation:
1. Probe Pass: Execute ffprobe to extract the exact duration of the input video in seconds (e.g., 15.5).
2. Construction Pass: Inject this retrieved value into the FFmpeg command string as a hardcoded constant.
   - Constructed Command: overlay=x='-w+(w/15.5)*t'
This approach ensures that the animation is perfectly synchronized with the video length, regardless of the input file's duration.
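The two passes can be sketched as a pair of helpers. The ffprobe flags used here are standard, but the overall wiring (function names, JSON field access) is an illustrative assumption, not a prescribed API:

```python
import json
import subprocess

def probe_duration(path: str) -> float:
    """Pass 1: ask ffprobe for the container duration in seconds."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(json.loads(out)["format"]["duration"])

def sliding_bar_overlay(duration_s: float) -> str:
    """Pass 2: bake the probed duration into the overlay expression."""
    return f"overlay=x='-w+(w/{duration_s})*t':y=H-h"
```

Usage would be sliding_bar_overlay(probe_duration("input.mp4")), executed on a host with ffprobe installed.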
6. Advanced Typography and "Hormozi-Style" Captions
The "Hormozi" caption style—characterized by bold, sans-serif fonts, rapid word-by-word animation, pop colors (yellow/green), and dynamic highlighting—is the gold standard for engagement in 2025. Implementing this requires moving beyond basic FFmpeg drawtext filters, which are too static and cumbersome for rapid word-level timing, to the Advanced Substation Alpha (ASS) subtitle format.
6.1 Limitations of Standard Subtitles (SRT)
SubRip (SRT) is the most common subtitle format but is technically insufficient for this use case. SRT supports basic timing and rudimentary HTML-like tags (`<b>`, `<i>`, `<u>`) but lacks control over absolute positioning, layer stacking (Z-index), and precise animation.[13] It relies heavily on the video player's default rendering settings, which vary wildly between devices.
For "burned-in" captions that are part of the video aesthetic, absolute consistency is required.
6.2 The Power of Advanced Substation Alpha (ASS)
The ASS format provides the granular control necessary for high-end motion graphics. It allows for the definition of "Styles" (presets for font, border, shadow, alignment) and "Events" (individual lines of text). Crucially, it supports a rich tag set for overriding these styles on a per-character basis. Key capabilities include:
- Absolute Positioning (\pos(x,y)): Placing text exactly at specific pixel coordinates.
- Karaoke Tags (\k): The engine for word-level animation.
- Drawing Mode: The ability to draw vector shapes directly in the subtitle track.
6.3 Implementing the Karaoke Highlight Effect
The core of the "Hormozi" style is the active word highlight, where the spoken word changes color (e.g., to yellow) while the rest of the sentence remains white. This is achieved using the Karaoke tag family.[15]
- \k<duration>: This tag marks a syllable or word with a specific duration in centiseconds.
- Effect Logic: In standard karaoke mode, the text is initially filled with a "Secondary Color." As the timer for each \k tag elapses, the text fills with the "Primary Color."
- \k: Immediate switch from Secondary to Primary color (step fill).
- \kf or \K: Smooth gradient sweep from left to right (sweep fill).
The "Hormozi" Configuration: To achieve the distinct "pop" effect:
1. Style Definition: Define a style where the Primary Color is Yellow (&H00FFFF - BGR format) and the Secondary Color is White (&HFFFFFF).
2. Tagging: Prepend each word with \k and its duration.
   - Example Line: {\k10}This {\k20}is {\k15}automated.
   - Result: "This" turns yellow instantly. 10cs later, "is" turns yellow. 20cs later, "automated" turns yellow.
Advanced "Current Word Only" Highlighting: A common variation requires only the current word to be highlighted, with previous words reverting to white. The standard \k tag does not support this "reset" behavior; it keeps the text filled. To implement the "Current Word Only" style, the automation script must explicitly set colors using the \1c (Primary Color) tag for every word block.[17]
- Script Logic: {\c&H_WHITE_}Previous {\c&H_YELLOW_}Current {\c&H_WHITE_}Future
- This approach is more verbose but offers total control, allowing for effects where the current word is yellow, previous words are gray, and future words are white.
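The per-word color tagging can be generated mechanically. The colors below use the ASS BGR notation described in this section; the helper name is invented for illustration:

```python
WHITE = r"\c&HFFFFFF&"
YELLOW = r"\c&H00FFFF&"   # yellow in ASS BGR order

def highlight_current_word(words: list[str], current: int) -> str:
    """Build the text of one ASS event in which only the active word
    is yellow; all other words are explicitly reset to white."""
    parts = []
    for i, word in enumerate(words):
        color = YELLOW if i == current else WHITE
        parts.append("{" + color + "}" + word)
    return " ".join(parts)

print(highlight_current_word(["This", "is", "automated."], 1))
```

The generator would emit one such event per word, each covering that word's start-to-end interval, so the highlight appears to jump from word to word.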
6.4 Automated Pipeline: Whisper to ASS
The workflow for ReelsBuilder.ai to generate these captions automatically involves a multi-step transformation pipeline:
1. Transcription: Utilize OpenAI's Whisper model with word-level timestamps enabled. The output provides a JSON array containing start, end, and word strings for every element.[18]
2. Segmentation Logic: A Python or Node.js script must parse this JSON. It groups words into "Lines" based on constraints:
   - Maximum Duration: e.g., 3 seconds per line.
   - Maximum Length: e.g., 20 characters per line. "Hormozi" style favors short, punchy lines (2-4 words) centered in the video.
3. Tag Injection: The script calculates the duration of each word in centiseconds (Duration = End Time - Start Time) and injects the corresponding \k tags between words.
4. Header Generation: The script writes the standard ASS header, defining the "Hormozi" style (Font: TheBoldFont, Border: 4px Black, Shadow: 0, Alignment: 5).
5. Rendering: The final .ass file is passed to FFmpeg's ass filter.
   - Command: ffmpeg -i input.mp4 -vf "ass=subtitles.ass" output.mp4
Critical Dependency: FFmpeg must be compiled with --enable-libass.[13] While standard in many builds, any custom Docker container used by ReelsBuilder.ai must verify this library is present to render the complex formatting correctly.
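The segmentation and tag-injection steps above can be sketched as follows. The word-timestamp shape (start/end/word keys) matches step 1; the thresholds and function names are illustrative assumptions:

```python
def group_words(words, max_dur=3.0, max_chars=20):
    """Group word timestamps into short caption lines (step 2)."""
    lines, current = [], []
    for w in words:
        candidate = current + [w]
        dur = candidate[-1]["end"] - candidate[0]["start"]
        chars = len(" ".join(x["word"] for x in candidate))
        if current and (dur > max_dur or chars > max_chars):
            lines.append(current)   # close the line, start a new one
            current = [w]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

def to_karaoke_text(line):
    """Inject \\k tags, in centiseconds, before each word (step 3)."""
    parts = []
    for w in line:
        cs = round((w["end"] - w["start"]) * 100)
        parts.append("{\\k%d}%s" % (cs, w["word"]))
    return "".join(parts)
```

The remaining steps (writing the [Script Info]/[V4+ Styles] header and invoking the ass filter) are straightforward string templating around these two functions.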
7. Lower Thirds and UI Graphics
Lower thirds—graphic overlays that display names, titles, or calls to action—add a layer of professional polish that distinguishes high-quality content. In a broadcast environment, these are often heavy video files with alpha channels (ProRes 4444). For a cloud-based automated builder, bandwidth and rendering speed are paramount, necessitating lightweight, procedural solutions.
7.1 Pure FFmpeg "Drawbox" Graphics
The most efficient method for creating simple background plates for text is the drawbox filter. This allows the system to draw geometric shapes directly onto the video frames without loading external image assets.
- Command: drawbox=x=0:y=1400:w=1080:h=200:color=black@0.5:t=fill
- This draws a semi-transparent black box (0.5 opacity) at the bottom of the safe zone, providing a high-contrast background for white text.
7.2 Rounded Corners: The geq vs. Masking Debate
Modern UI design trends (iOS, Instagram) favor rounded corners over sharp rectangles. Implementing this in FFmpeg presents a trade-off between CPU cycles and complexity.
- The geq (Generic Equation) Filter: This filter allows for pixel-level manipulation using mathematical formulas. It can create a rounded rectangle by calculating the distance of every pixel from a center point.[20]
  - Pros: No external assets required.
  - Cons: Extremely CPU-intensive. It evaluates a complex distance formula for every single pixel in the bounding box for every frame. For 1080p video, this can significantly slow down rendering.
- The Image Mask Method (Recommended): A more performant approach is to generate a single, small PNG asset of a rounded rectangle (e.g., a white rounded box with a transparent center). The system uses scale2ref to stretch this PNG to the desired size of the lower third. This relies on simple image scaling rather than per-pixel mathematical evaluation, resulting in faster render times.[22]
7.3 Procedural Animation
Animating these elements (e.g., making the lower third slide up from the bottom) is achieved using the same expression logic as the progress bar.
- Ease-Out Animation: To achieve a professional "slide," linear motion is often insufficient. An "Ease-Out" effect (where the object moves fast and slows down as it settles) can be simulated using an exponential decay formula in the y coordinate expression.
  - Target Y: 1400
  - Start Y: 1920 (off-screen)
  - Expression: y='1400+(1920-1400)*exp(-5*t)'
  - This formula calculates a position that starts at 1920 and exponentially approaches 1400 as t increases, creating a smooth, decelerating entry animation.
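The decay formula can be sanity-checked numerically; this mirrors the expression FFmpeg evaluates per frame (the defaults reflect the values used above):

```python
import math

def eased_y(t: float, target: float = 1400.0, start: float = 1920.0,
            rate: float = 5.0) -> float:
    """Python form of y = target + (start - target) * exp(-rate * t)."""
    return target + (start - target) * math.exp(-rate * t)

print(eased_y(0.0))            # 1920.0 (fully off-screen)
print(round(eased_y(1.0), 1))  # 1403.5 (nearly settled after one second)
```

The element never mathematically reaches 1400, but after roughly one second the remaining gap is under four pixels, which reads as "settled" on screen.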
8. Subscribe Animations and Chroma Keying
Calls to Action (CTAs) like "Subscribe" or "Like" buttons are critical for conversion. These are often complex animations (bells ringing, thumbs clicking) that cannot be drawn procedurally. They are typically imported as green-screen video assets.
8.1 Chroma Key Implementation
To overlay a green-screen asset, ReelsBuilder.ai must use the chromakey filter.
- Command Structure: [cta]chromakey=0x00FF00:0.1:0.2[transparent_cta];[base][transparent_cta]overlay
- 0x00FF00: The hex code for pure green.
- 0.1 (Similarity): Determines how close a color must be to green to be keyed out. A low value (0.01) matches only exact green; a high value (0.3) might remove parts of the button itself.
- 0.2 (Blend): Smooths the edges of the key to prevent jagged artifacts.
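Because similarity and blend typically need tuning per asset, these parameters are worth templating rather than hardcoding. A sketch, with an illustrative function name and a centered placement chosen arbitrarily for the example:

```python
def chromakey_overlay(similarity: float = 0.1, blend: float = 0.2,
                      key: str = "0x00FF00") -> str:
    """Key the green screen out of input 1 and overlay it on input 0,
    centered horizontally and lifted above the bottom UI margin."""
    return (
        f"[1:v]chromakey={key}:{similarity}:{blend}[cta];"
        "[0:v][cta]overlay=x=(W-w)/2:y=H-h-450[outv]"
    )
```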
8.2 Subscribe Button "Pop" Animation
If a static image is used for the subscribe button, a "pop" animation (scaling up and down) can be simulated using the zoompan filter or dynamic scale expressions. However, for a simple "scale up on entry" effect, the scale filter combined with the enable timeline editing is efficient.
- Expression: scale=w='if(lt(t,0.5),iw*t*2,iw)':h=-1:eval=frame
- This scales the image from 0 to full width (iw) over the first 0.5 seconds (t*2 reaches 1 at t=0.5), effectively creating a "grow in" animation. Note that eval=frame is required for the scale filter to re-evaluate t on every frame.
9. Performance Optimization and Instruction Sets
For a SaaS platform, the efficiency of the rendering pipeline directly correlates with infrastructure costs. Inefficient filter graphs can lead to excessive memory copying and slow encode times.
9.1 Filter Complex Optimization
Chaining multiple distinct filters (e.g., scale -> overlay -> drawtext -> overlay) creates a long pipeline where data is copied between buffers at each step.
- Consolidation: Whenever possible, consolidate processing. For example, if scaling and cropping are required, do them in a single chain before the overlay.
- Pre-rendering: Assets that do not change (like the static geometry of a "Subscribe" button background) should be pre-rendered as high-efficiency WebM files with alpha channels rather than being generated geometrically every time.
9.2 Font Management Instructions
FFmpeg relies on access to font files (.ttf or .otf) to render text. A common failure mode in containerized environments (Docker) is the system failing to locate fonts referenced by name (e.g., "Arial").
- Instruction to AI: The AI must be explicitly instructed to use absolute file paths for fonts in the generated code.
- Bad: fontfile=TheBoldFont.ttf
- Good: fontfile=/app/assets/fonts/TheBoldFont.ttf
- This removes ambiguity and prevents "Font not found" errors that abort the render process.
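A defensive check before render start catches this failure mode early instead of mid-encode; the helper name is illustrative:

```python
import os

def resolve_fontfile(path: str) -> str:
    """Fail fast, before the render begins, if the font file is absent;
    otherwise return the drawtext/ASS-ready fontfile fragment."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Font not found: {path}")
    return f"fontfile={path}"
```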
10. Conclusion and AI Prompt Strategy
The construction of the ReelsBuilder.ai graphics engine is a synthesis of rigid geometric constraints and flexible programmatic rendering. By strictly enforcing the Universal Safe Zone (Top: 250px, Bottom: 500px), the system ensures content validity across all platforms. The use of FFmpeg's filter_complex enables resolution-independent branding and procedural animations, while libass/ASS provides the typographic precision required for "Hormozi-style" captions.
To successfully implement this via AI coding assistants, the user must provide structured, logic-heavy prompts. The prompts should not merely ask for "code for a watermark," but rather provide the architectural constraints: "Generate an FFmpeg filter graph that overlays a watermark. Use scale2ref to resize it to 15% of the video height. Calculate the position using (W-w)/2 for centering, but ensure the y coordinate does not exceed H-350 to respect the safe zone."
By embedding these specific technical requirements into the AI's context window, ReelsBuilder.ai can achieve a level of automated production quality that rivals manual professional editing.
Works cited
1. Safe Zones for TikTok, Instagram & Facebook Stories 2025 - UGC Factory, accessed January 6, 2026, https://www.ugcfactory.io/blog/the-ultimate-guide-to-safe-zones-for-tiktok-facebook-and-instagram-stories-reels-2025
2. Understanding the Safe Zone in TikTok, Instagram, and YouTube Videos - Ramdam for Creators, accessed January 6, 2026, https://creators.ramd.am/creator-school/understanding-the-safe-zone
3. Stay Within Safe Zones – TikTok, Facebook and Instagram Reels/Stories - House of Marketers, accessed January 6, 2026, https://houseofmarketers.com/guide-to-safe-zones-tiktok-facebook-instagram-stories-reels/
4. Instagram Safe Zone Explained: Dimensions, Best Practices & Tips - Outfy, accessed January 6, 2026, https://www.outfy.com/blog/instagram-safe-zone/
5. Instagram Reel Size Guide: Ratios, Specs, and More - LitCommerce, accessed January 6, 2026, https://litcommerce.com/blog/instagram-reel-size/
6. Safe Zones… Guides, Free Overlays for Reels, TikTok, and Shorts - 2025 - Orson Lord, accessed January 6, 2026, https://orsonlord.com/articles/free-safe-zone-overlays-for-reels-tiktok-and-shorts
7. Scaling - FFmpeg, accessed January 6, 2026, https://trac.ffmpeg.org/wiki/Scaling
8. Scaling a logo to be consistent between video sizes - Super User, accessed January 6, 2026, https://superuser.com/questions/1745388/scaling-a-logo-to-be-consistent-between-video-sizes
9. How can I integrate 'scale2ref' to scale a watermark while preserving its display aspect ratio, into this line of code - Super User, accessed January 6, 2026, https://superuser.com/questions/1438160/how-can-i-integrate-scale2ref-to-scale-a-watermark-while-preserving-its-displa
10. Using -filter_complex after concatenating images with -f concat - r/ffmpeg, Reddit, accessed January 6, 2026, https://www.reddit.com/r/ffmpeg/comments/1lltbk5/using_filter_complex_after_concatenating_images/
11. Showing in-video visual progress bar with FFMPEG? - Stack Overflow, accessed January 6, 2026, https://stackoverflow.com/questions/62989964/showing-in-video-visual-progress-bar-with-ffmpeg
12. Which FFmpeg variable gives the total duration of a video? - Video Production Stack Exchange, accessed January 6, 2026, https://video.stackexchange.com/questions/35062/which-ffmpeg-variable-gives-the-total-duration-of-a-video
13. How to Add Subtitles to a Video File Using FFmpeg - Bannerbear, accessed January 6, 2026, https://www.bannerbear.com/blog/how-to-add-subtitles-to-a-video-file-using-ffmpeg/
14. Set ASS subtitle color with ffmpeg? - Reddit, accessed January 6, 2026, https://www.reddit.com/r/ffmpeg/comments/q1yv89/set_ass_subtitle_color_with_ffmpeg/
15. ASS Override Tags - Aegisub, accessed January 6, 2026, https://aegisub.org/docs/latest/ass_tags/
16. ASS Tags - Aegisub 手册, accessed January 6, 2026, https://aegi.vmoe.info/docs/3.1/ASS_Tags/?ref=deadsuperhero.com
17. How to make words in ASS subtitles appear one at a time? - Stack Overflow, accessed January 6, 2026, https://stackoverflow.com/questions/60608765/how-to-make-words-in-ass-subtitles-appear-one-at-a-time
18. Subtitle elements - JSON2Video, accessed January 6, 2026, https://json2video.com/docs/tutorial/subtitle-elements/
19. [Solved] How to enable libass to burn ass subtitles in videos? - Arch Linux Forums, accessed January 6, 2026, https://bbs.archlinux.org/viewtopic.php?id=228878
20. Give a video rounded transparent edges so that it can be overlayed on another video using FFMPEG - Stack Overflow, accessed January 6, 2026, https://stackoverflow.com/questions/32859841/give-a-video-rounded-transparent-edges-so-that-it-can-be-overlayed-on-another-vi
21. How to draw text on a rectangle with rounded corners using ffmpeg? - Stack Overflow, accessed January 6, 2026, https://stackoverflow.com/questions/75598230/how-to-draw-text-on-a-rectangle-with-rounded-corners-using-ffmpeg
22. How to draw a round rectangle on the video with FFmpeg? - Super User, accessed January 6, 2026, https://superuser.com/questions/1504881/how-to-draw-a-round-rectangle-on-the-video-with-ffmpeg
This is a comprehensive technical specification designed to be fed directly into an AI coding agent (Claude 3.5 Sonnet, Gemini 1.5 Pro, or GPT-4o). It provides the exact constraints, mathematical formulas, and library choices necessary to generate broadcast-grade graphics for 1080x1920 vertical video.
Technical Directive: Broadcast Graphics Engine for ReelsBuilder.ai
Role: Senior Video Engineer / Graphics Programmer
Objective: Implement a "Pixel-Perfect" automated graphics pipeline using FFmpeg and Python.
Target Output: 1080x1920 (9:16) Vertical Video, 30/60 FPS.
Aesthetic Standard: High-retention social media (Hormozi/TikTok) mixed with premium broadcast authority (CNN/Bloomberg).
1. The Canvas: "Universal Safe Zone" Implementation
Constraint: All generated graphics MUST reside within the Universal Safe Zone to prevent UI occlusion on Instagram, TikTok, and YouTube Shorts.
- Canvas Size: $W=1080$, $H=1920$
- Safe Zone Logic:
  - Top Margin: $250px$ (Avoids Profile/Search UI)
  - Bottom Margin: $450px$ (Avoids Captions/Scrubbers)
  - Side Margins: $60px$ (Avoids Interaction Buttons)
- Renderable Area: A central box defined as $x \in [60, 1020]$, $y \in [250, 1470]$.
- Implementation Directive: Define these constants globally. Any coordinate calculation for overlays ($x, y$) must pass through a clamping function:

```python
def get_safe_y(y_desired):
    return max(250, min(y_desired, 1470))
```
2. Component A: "Hormozi-Style" Dynamic Captions
Requirement: High-impact, word-by-word animated subtitles.
Technology: Libass (ASS/SSA format). Do not use FFmpeg drawtext for main captions (insufficient animation control).
Specification
- Font: Montserrat ExtraBold or TheBoldFont.
- Primary Color: White (&HFFFFFF).
- Highlight Color: Electric Yellow (&H00FFFF in BGR hex) or Green (&H00FF00).
- Outline: 3px Black Border (\bord3) + 2px Drop Shadow (\shad2).
- Animation Logic (The "Pop" Effect):
  - Use \k tags for timing.
  - Active Word Transform: Scale the active word up to 115% instantly and change color.
- ASS Code Pattern:

```
{\an5\pos(540,960)\bord3\shad2\fs90}{\c&HFFFFFF&}This {\t(0,200,\fscx115\fscy115\c&H00FFFF&)\t(200,400,\fscx100\fscy100\c&HFFFFFF&)}IS {\c&HFFFFFF&}{\t(0,200,\fscx115\fscy115\c&H00FFFF&)\t(200,400,\fscx100\fscy100\c&HFFFFFF&)}AUTOMATED
```
- Positioning: Center screen (\an5), anchored at $y=960$ or $y=800$.
3. Component B: The "Bloomberg" Rolling Ticker
Requirement: Smooth, 60fps scrolling stock/news ticker at the bottom of the screen.
Technology: Python (Pillow) for asset generation + FFmpeg overlay.
Specification
1. Strip Generation (Python):
   - Do not render text directly in FFmpeg. Generate a single wide PNG image (e.g., $5000 \times 80$ px).
   - Background: Dark Navy Blue Gradient (Left: #001133, Right: #002244).
   - Content: Rich text support (White Symbol, Green Price, Red Down-Arrow).
   - Output: ticker_strip.png.
2. Compositing (FFmpeg):
   - Loop: The strip must loop seamlessly.
   - Scroll Speed: 150 pixels/second.
   - Math Formula: $$x = W - \mathrm{mod}(t \times 150, \mathrm{strip\_w} + W)$$
   - Filter Command:

```
[0:v][ticker]overlay=x='W-mod(t*150, overlay_w+W)':y=H-120
```

   - Glass Backdrop: Behind the ticker, apply a "frosted glass" effect to the video. Chain: crop=1080:120:0:1800,boxblur=20:2 then overlay the PNG strip on top.
4. Component C: "Breaking News" Lower Third
Requirement: Animated entrance, "glass" background, pulsing "Live" indicator.
Technology: FFmpeg filter_complex.
Specification
- Geometry: Rounded Rectangle, $w=900, h=150$.
- Backdrop:
  - Generate a color=c=black@0.5:s=900x150 source.
  - Rounded Corners: Use a mask image or geq filter (if GPU allows) to round edges. Preferred: Use a pre-made white rounded-rectangle PNG as a mask.
- Animation (Ease-Out Slide):
  - Slide from Left (off-screen) to Center ($x=90$).
  - Equation: $$x = -900 + (900 + 90) \cdot \min\!\left(1, \sin\!\left(t \cdot \tfrac{\pi}{2}\right)\right)$$
  - This creates a smooth, decelerating entry over 1 second.
- "LIVE" Pulse:
  - Red circle icon ($r=10$).
  - Opacity animation: alpha='0.5+0.5*sin(2*PI*t)'.
5. Component D: The "Netflix" Progress Bar
Requirement: A retention-driving progress indicator attached to the bottom frame.
Technology: FFmpeg drawbox or overlay with color source.
Specification
- Color: #E50914 (Netflix Red).
- Height: $10px$.
- Position: Bottom of the video ($y=1910$).
- Logic: Width increases linearly with time.
- Formula:
  - Must extract Video Duration ($D$) first via ffprobe.
  - Width Expression: $w = \frac{W \times t}{D}$
- FFmpeg Command (with the probed duration substituted for D):

```
color=c=#E50914:s=1080x10[bar];
[bar]crop=w='iw*(t/D)':h=10:x=0:y=0[bar_grow];
[0:v][bar_grow]overlay=x=0:y=1910:shortest=1[video_out]
```

  Note: Cropping a full-width color source is more performant than re-scaling it on every frame.
6. Implementation Workflow for Coding Agent
Step 1: Asset Generation (Python)
- Use PIL.ImageDraw to create:
  - ticker_strip.png (rich text strip).
  - mask_rounded_rect.png (alpha mask for lower third).
Step 2: Subtitle Generation (Python)
- Parse Whisper timestamp JSON.
- Write .ass file with a Script Info header setting PlayResX=1080, PlayResY=1920.
- Inject \k tags and \t transforms for the "Hormozi" pop effect.
Step 3: The Filter Graph (FFmpeg Construction)
- Construct a single filter_complex chain:
  1. Input 0: Main Video.
  2. Input 1: Ticker Strip (Loop).
  3. Input 2: Lower Third Backplate.
  4. Chain:
     - [0:v] boxblur (region-specific) -> [glass_bg]
     - [glass_bg][2:v] overlay (slide-in math) -> [bg_with_l3]
     - [bg_with_l3] ass=subtitles.ass -> [text_layer]
     - [text_layer][1:v] overlay (scroll math) -> [final_comp]
     - [final_comp] drawbox/overlay (progress bar) -> [output]
Instruction to Agent: "Generate Python code using ffmpeg-python or raw subprocess calls. Prioritize libass for text rendering to ensure correct kerning and vector scaling. Use the specific hex codes provided. Ensure all math expressions in FFmpeg are escaped correctly for the shell environment."