LTX 2.3 Guide, Features, Low VRAM Options and Setup Tips
What Is LTX 2.3
LTX 2.3 is the newest release in the LTX model family from Lightricks: a production-grade AI foundation model built for video generation with synchronized audio, released with open weights and practical, production-ready workflows. The model is positioned as more than a research demo. It is designed for creators, developers, and teams that want an open system they can test locally, integrate through an API, or use inside creative tools such as ComfyUI and LTX Studio.
What makes LTX 2.3 important is the combination of visual quality, control, and production-quality output. Official materials describe sharper detail, cleaner audio, stronger motion, and much better prompt adherence. In real terms, that means the model is trying to solve the problems users care about most: soft visuals, unstable movement, ignored prompts, weak vertical video, and unusable sound.
Why LTX 2.3 Matters Right Now
The linked tutorial arrives at the right moment because LTX 2.3 is one of the most meaningful AI video upgrades of 2026 so far. The release improves the full model generation pipeline, not just one benchmark, which is why many users now see it as a more serious generation model for real projects. LTX says it rebuilt the latent space with an updated VAE for finer detail, introduced a 4x larger text connector for better understanding of complex prompts, improved image-to-video motion, upgraded audio generation, and added native portrait video support for vertical content.
That last point matters more than it may seem. A huge amount of modern video is created for mobile-first platforms such as Instagram Reels, TikTok, and YouTube Shorts. Native 9:16 generation, including native portrait video workflows such as 1080×1920, means creators are no longer forced to rely on awkward crops from landscape output. For social media teams, short-form editors, and ad creators, that is a direct workflow advantage.
The Biggest Upgrades in LTX 2.3
The first major upgrade is detail quality. LTX 2.3 uses a rebuilt latent space and updated VAE, which the company says preserves fine textures, cleaner edges, hair detail, and even small text more effectively throughout generation. The second is prompt adherence. Official release material says the text connector is four times larger, allowing the model to follow complex multi-subject instructions with better visual consistency across spatial relationships, motion, and style.
The third improvement is image-to-video behavior. LTX specifically claims fewer frozen frames, less fake zoom motion, and stronger consistency with the source image. The fourth is audio quality, where filtered training data and a new vocoder are meant to reduce artifacts and tighten the match between visual events and generated sound. Put together, these changes make LTX 2.3 feel like a more usable engine for video with synchronized audio, not just a prettier one.
Core LTX 2.3 Workflows
For most people, LTX 2.3 begins with two core use cases, text-to-video and image-to-video. Those are also the focus of the linked tutorial. Text-to-video is best for ideation, concept shots, stylized scenes, ad experiments, and cinematic moodboards. You describe the subject, action, environment, camera, and style, then let the model build the clip from scratch.
Image-to-video is a different strength. It starts from a reference image and animates it according to your prompt. This is often the better choice when you already have concept art, a product image, a storyboard frame, or a character portrait that needs motion. Because LTX 2.3 specifically improves motion consistency from the input frame, this mode is one of the biggest reasons people are paying attention to the release.
Text-to-Video
Text-to-video works best when prompts are specific. Instead of a generic line such as “a man walking in a city,” users get better results by describing wardrobe, lighting, camera angle, lens feel, location, time of day, and motion. LTX 2.3 is built to reward that extra detail.
Image-to-Video
Image-to-video works best when the starting image is clear and compositionally strong. The prompt should explain what is supposed to move, what should remain stable, and how the camera should behave. That is the best way to avoid random motion that breaks the original frame.
Example Workflow
Please note that this is an example workflow created by RuneXX on Hugging Face.
Here is the list of files you will need to download for this workflow to work (note that these are FP8 diffusion models; if you are low on VRAM, download the GGUF models instead):
LTX 2.3 Model Files and Correct ComfyUI Folders
Below are the required LTX 2.3 model files, their correct ComfyUI folder locations, and direct download links.
| Category | Folder Name | Model Name | Download |
|---|---|---|---|
| Diffusion Model | diffusion_models | ltx-2.3-22b-distilled_transformer_only_fp8_scaled.safetensors | Download |
| LoRA | loras | ltx-2.3-22b-distilled-lora-dynamic_fro09_avg_rank_105_bf16.safetensors | Download |
| Text Encoder | text_encoders | ltx-2.3_text_projection_bf16.safetensors | Download |
| Text Encoder | text_encoders | gemma_3_12B_it_fpmixed.safetensors | Download |
| VAE | vae | LTX23_audio_vae_bf16.safetensors | Download |
| VAE | vae | LTX23_video_vae_bf16.safetensors | Download |
| TinyVAE | vae | taeltx2_3.safetensors | Download |
| Upscaler | latent_upscale_models | ltx-2.3-spatial-upscaler-x2-1.0.safetensors | Download |
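Before loading the example workflow, it can save time to confirm that every file from the table above landed in the right ComfyUI folder. The script below is a hypothetical helper, not part of ComfyUI; the base path is an assumption and should be adjusted to your install.

```python
from pathlib import Path

# Adjust to your ComfyUI install location (assumption: default layout).
COMFYUI_MODELS = Path("ComfyUI/models")

# Folder -> expected files, mirroring the table above.
REQUIRED_FILES = {
    "diffusion_models": ["ltx-2.3-22b-distilled_transformer_only_fp8_scaled.safetensors"],
    "loras": ["ltx-2.3-22b-distilled-lora-dynamic_fro09_avg_rank_105_bf16.safetensors"],
    "text_encoders": ["ltx-2.3_text_projection_bf16.safetensors",
                      "gemma_3_12B_it_fpmixed.safetensors"],
    "vae": ["LTX23_audio_vae_bf16.safetensors",
            "LTX23_video_vae_bf16.safetensors",
            "taeltx2_3.safetensors"],
    "latent_upscale_models": ["ltx-2.3-spatial-upscaler-x2-1.0.safetensors"],
}

def missing_files(root: Path) -> list[str]:
    """Return relative paths of any expected model files that are absent."""
    missing = []
    for folder, names in REQUIRED_FILES.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing

if __name__ == "__main__":
    for path in missing_files(COMFYUI_MODELS):
        print("missing:", path)
```

A silent run means all eight files are in place; anything printed is a file the workflow will fail to load.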
Fast, Pro, Dev, and Distilled: Which Version Should You Use?
LTX 2.3 is not a single one-size-fits-all model. In the API, it comes in Fast and Pro variants. Fast is optimized for speed, quick brainstorming, and lower-cost iteration. Pro is intended for higher fidelity and more stable final output, and it is required for some advanced endpoints such as audio-to-video, retake, and extend. For many teams, the smart workflow is to explore with Fast and then switch to Pro for the final render.
On the open model side, the Hugging Face model card lists a 22B dev checkpoint, a 22B distilled checkpoint, a distilled LoRA, and separate spatial and temporal upscalers. The distilled version is aimed at faster use, while the dev version is closer to the full model checkpoint for users who want maximum flexibility. This gives advanced local users a real choice depending on whether they care more about speed, quality, or experimentation.
ComfyUI, Open Weights, and Local Use
One reason LTX 2.3, often referred to in the ecosystem as ltx-video, has spread quickly is that it is not trapped inside a closed web interface. LTX publishes open weights, code, and documentation, and there is official support for ComfyUI. The ComfyUI-LTXVideo repository includes example workflows for text-to-video, image-to-video, distilled single-stage pipelines, two-stage upsampling workflows, and additional control setups using LoRAs.
That makes the linked video useful as an entry point. Users can start from a working graph rather than inventing a workflow from nothing. The same repository also explains which checkpoint files, upscalers, and text encoder files are needed. For people who want open infrastructure, this matters almost as much as image quality.
System Requirements and Hardware Reality
LTX 2.3 is powerful, but it is not lightweight. Official documentation lists an NVIDIA GPU with at least 32 GB of VRAM, 32 GB of system RAM, 100 GB of storage, CUDA 11.8 or higher, and Python 3.10 or higher as the minimum for the open-source route. Recommended setups are much stronger, including A100 or H100 class GPUs, 64 GB or more of RAM, and larger SSD storage.
That means local use is realistic for serious enthusiasts, workstation users, and cloud GPU renters, but not for every casual creator. The good news is that LTX also supports managed usage through the API and LTX Studio. There are also low-VRAM loader notes in the official ComfyUI repository that help some workflows fit into 32 GB cards more reliably.
Resolution, Duration, and Format Support
LTX 2.3 supports serious output formats rather than only small preview clips. According to the official model support documentation, the API offers portrait and landscape generation up to 4K, cinematic frame rates including up to 50 fps in supported modes, and first-to-last frame control for image-to-video. Fast and Pro differ in which combinations of resolution, frame rate, and duration they support, but the overall message is clear: this model is built for production-minded output.
Fast is the better option for rapid exploration, and the documentation notes support for longer quick-iteration runs, including up to 20 seconds at 1080p in certain settings. Pro focuses on high-fidelity video and more polished results. For creators working on ads, short films, product visuals, mobile campaigns, or social clips, native support for both 16:9 and 9:16 is one of the most practical strengths in the release.
Prompting Best Practices for LTX 2.3
Because prompt adherence is one of the headline upgrades, users should write prompts like directors, not keyword stuffers. A strong prompt should define the main subject, action, environment, camera movement, lighting, mood, and style. If the shot is vertical, say so. If you want a tracking shot, say so. If the scene should feel cinematic, documentary, dreamy, or commercial, make that clear.
For image-to-video, prompts should also protect the source image. Tell the model what should stay fixed and what should move. For text-to-video, think in layers, subject first, then action, then camera, then mood. LTX 2.3 is better at understanding complex prompts, but that only helps when the prompt itself is precise.
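The layered structure described above (subject first, then action, then camera, then mood) can be sketched as a small helper. This is an illustrative snippet, not an LTX API; the field names are assumptions for organizing a prompt string.

```python
def build_prompt(subject: str, action: str, environment: str,
                 camera: str, lighting: str, style: str) -> str:
    """Assemble a director-style prompt from layered components.

    Ordering mirrors the advice above: subject first, then action,
    then setting, then camera, then look.
    """
    parts = [subject, action, environment, camera, lighting, style]
    # Drop empty layers so optional fields can be omitted.
    return ", ".join(p.strip() for p in parts if p.strip())

prompt = build_prompt(
    subject="a man in a worn leather jacket",
    action="walking slowly through a rain-soaked street",
    environment="neon-lit downtown alley at night",
    camera="handheld tracking shot, 35mm lens feel",
    lighting="cold blue light with warm signage reflections",
    style="cinematic, shallow depth of field, vertical 9:16",
)
```

The point of the helper is less the code than the discipline: filling each slot forces the specificity that LTX 2.3's larger text connector is built to reward.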
Where LTX 2.3 Fits in the Market
LTX 2.3 stands out because it combines open access with a real production story. It can serve solo creators making social content, studios building storyboards, agencies testing ad concepts, and developers embedding a single model video generation system into products. LTX also frames the model as the engine behind LTX Desktop, which shows that the company is thinking about editing workflows, not only raw generation.
That mix of openness, quality, and deployment flexibility is what gives the model weight in today’s market. Many tools can make short AI clips, but few combine high-fidelity, audio-synchronized video with open weights, local execution, ComfyUI integration, native vertical support, and serious API options in one ecosystem.
LTX 2.3 Low VRAM Guide: What “Low VRAM” Really Means
LTX 2.3 is powerful, but it is still a very large 22B model. In the official ComfyUI-LTXVideo repo, Lightricks says low VRAM mode is about making generation fit into 32 GB VRAM by using special loader nodes from low_vram_loaders.py and by reserving VRAM with the --reserve-vram startup flag in ComfyUI. In other words, low VRAM for LTX 2.3 does not mean 8 GB or 12 GB in the official workflow; it means optimizing a heavy model so it can run more safely on 32 GB cards instead of needing even more headroom.
For readers, this is an important expectation-setting point. Many people hear “low VRAM” and assume LTX 2.3 will run like a lightweight image model on mainstream gaming GPUs. That is not what the official documentation promises. The official low VRAM path is mainly about aggressive offloading, correct execution order, and memory reservation, not magic compression.
The most practical advice is to separate “official low VRAM” from “community low VRAM.” Official low VRAM means using Lightricks’ own ComfyUI nodes, offloading, and 32 GB class hardware. Community low VRAM means using quantized community releases such as FP8 or GGUF, where users trade some flexibility, and sometimes compatibility, for lower memory use.
LTX 2.3 GGUF Versions Explained
One of the biggest points of interest around LTX 2.3 right now is GGUF. The official Lightricks base model page lists the main 22B dev and distilled checkpoints, plus upscalers and LoRAs, but GGUF versions are currently being distributed by community publishers such as Unsloth and QuantStack rather than being presented as the main official Lightricks checkpoint format. That means GGUF is real and already being used, but readers should understand it is part of the broader ecosystem, not the default official install path from Lightricks.
The appeal of GGUF is simple: lower memory usage. On the Unsloth LTX-2.3-GGUF page, the quantized files are listed at a fraction of the size of the full-precision releases, while the full F16 and BF16 files are listed at about 42 GB. QuantStack’s GGUF page shows the same trend, though its published sizes are somewhat larger for equivalent quant levels.
That size gap is exactly why GGUF is getting attention. It gives users a path to try LTX 2.3 on more modest hardware, or at least reduce memory pressure on stronger GPUs. But the tradeoff is that GGUF workflows are more fragile right now. They often depend on ComfyUI-GGUF, KJNodes, and matching embeddings and VAE files, so installation complexity is higher than with standard safetensors checkpoints.
A useful way to think about GGUF is this:
GGUF is the “make it fit” option, not always the “cleanest install” option. It is attractive for low VRAM users, but it is also where people are currently seeing more size mismatch errors and workflow confusion.
FP8 vs FP16 in LTX 2.3
Lightricks now has an official FP8 model page for LTX 2.3. The model card says this is the FP8 version of the model, derived from the base model, and lists ltx-2.3-22b-dev-fp8 as the available checkpoint, while ltx-2.3-22b-distilled-fp8 is marked as coming soon on that page. It also notes that training is still recommended on the BF16 model, with FP8 training recipes welcomed as community contributions.
In the official LTX-2 code repo, Lightricks specifically recommends FP8 quantization as an optimization tip for lower memory footprint. The repo also says users can enable FP8 with --quantization fp8-cast, and on Hopper GPUs with TensorRT-LLM they can use --quantization fp8-scaled-mm for FP8 scaled matrix multiplication.
Advanced users also compare FP8 and FP16 versions when choosing the right setup. FP8 can reduce memory pressure, while FP16 is often preferred when users want the full model experience with fewer compatibility questions. Under the hood, the ltx-2 model follows a diffusion transformer, or DiT, design implemented in PyTorch, which helps explain why it is discussed as a serious generation model rather than a simple demo.
The simplest comparison is this:
FP16 or BF16
Best for maximum compatibility, training, and the least amount of ecosystem weirdness. It is heavier on VRAM, but usually easier to reason about. The official base LTX 2.3 checkpoints are presented this way.
FP8
Best for reducing memory footprint while still staying closer to the official ecosystem than GGUF. It is a strong middle ground for users who want something lighter than full precision without jumping all the way into community GGUF workflows. But FP8 is still new enough that users are already reporting edge-case issues, including an open issue about fp8-scaled-mm crashing with a TypeError.
GGUF
Best for aggressive size reduction and experimentation on lower-end hardware, but it currently comes with the highest setup risk. It is the most likely route to trigger mismatch and loader errors if the exact files, nodes, and workflow are not aligned.
Which Version of LTX 2.3 Should You Choose?
For most users, the choice breaks down like this.
Choose the official base dev or distilled safetensors version if you want the cleanest setup and are using recommended hardware. The base LTX-2.3 model card lists the 22B dev checkpoint, the 22B distilled checkpoint, the distilled LoRA, and spatial and temporal upscalers as the official core assets.
Choose FP8 if you want to lower memory use but still remain closer to the official Lightricks-supported path. This is especially attractive for people who are already comfortable with LTX’s codebase or ComfyUI ecosystem.
Choose GGUF if your main goal is squeezing LTX 2.3 onto tighter hardware, and you are comfortable dealing with a more experimental setup. Readers should know that this path is fast-moving, community-led, and not yet as smooth as the official safetensors route.
Common LTX 2.3 Errors People Are Running Into
1. Out of memory errors, especially with Gemma and custom scripts
OOM is one of the most common LTX 2.3 pain points. The official issue tracker already shows users reporting out-of-memory problems when trying to load Gemma alongside the main model, and another new issue says that when pipelines are used from external Python scripts instead of the CLI, the __call__ path may not be wrapped in torch.inference_mode(), which can keep about 37 GB of graph and activations alive and cause OOM when switching components.
Why it happens: LTX 2.3 is big, audio-video capable, and often paired with Gemma text encoder assets. Memory can spike badly if the workflow does not offload correctly or if custom scripts keep unnecessary activations alive.
What usually helps: low VRAM loaders, --reserve-vram, FP8, distilled pipelines, and wrapping custom Python inference in torch.inference_mode().
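For custom scripts, the fix reported in that issue can be applied from the caller’s side by wrapping the pipeline call in torch.inference_mode(). The snippet below is a generic sketch, not the LTX pipeline API; pipe and its arguments are placeholders.

```python
import torch

def run_pipeline(pipe, **kwargs):
    """Call a diffusion pipeline without retaining autograd state.

    torch.inference_mode() disables gradient tracking, so intermediate
    activations are freed instead of being kept alive for a backward
    pass that will never happen.
    """
    with torch.inference_mode():
        return pipe(**kwargs)

# Inside the context, new tensors do not require gradients:
with torch.inference_mode():
    frames = torch.randn(8, 3, 64, 64)  # stand-in for generated latents
    assert not frames.requires_grad
```

The same pattern applies whether the pipeline is called once or its components are swapped between stages; anything allocated inside the context carries no autograd graph.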
2. GGUF or embeddings connector size mismatch
This is already showing up in multiple places. On the LTX-2 GitHub issues page, users are reporting embeddings_connector size mismatch errors, including mismatched tensor sizes like 4096 versus 3840. Community GGUF users are also reporting transformer block mismatch errors when combining loaders and model files.
Why it happens: the checkpoint, connector, loader, or text encoder files do not belong to the same version or expected architecture. This is especially easy to trigger when mixing LTX 2.0 style pieces with LTX 2.3 assets, or when using community GGUF files with the wrong accompanying embeddings.
What usually helps: redownload all matching 2.3 assets, avoid mixing old connectors with new checkpoints, and use the exact workflow the model publisher provides before editing anything.
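One way to diagnose a mismatch before loading anything is to read tensor shapes straight from a safetensors header, which is a length-prefixed JSON block at the start of the file. The parser below is a minimal sketch based on the published safetensors format; the tensor name in the example is hypothetical.

```python
import json
import struct

def safetensors_shapes(raw: bytes) -> dict[str, list[int]]:
    """Map tensor names to shapes from the start of a .safetensors file.

    The format begins with an unsigned 64-bit little-endian header
    length, followed by that many bytes of JSON metadata.
    """
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8:8 + header_len])
    return {name: meta["shape"]
            for name, meta in header.items()
            if name != "__metadata__"}

# A tiny synthetic header to demonstrate (for a real file, read the
# first few kilobytes with open(path, "rb")):
meta = {"connector.weight": {"dtype": "F32", "shape": [4096, 3840],
                             "data_offsets": [0, 0]}}
blob = json.dumps(meta).encode()
raw = struct.pack("<Q", len(blob)) + blob
print(safetensors_shapes(raw))  # {'connector.weight': [4096, 3840]}
```

Comparing the connector’s shape in the checkpoint against what the loader expects makes a 4096-versus-3840 mismatch visible before ComfyUI throws the error mid-load.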
3. Bright flash, artifact, or random overlay at the end of the video
This is one of the most talked-about LTX 2.3 visual issues right now. Recent Hugging Face discussions and GitHub issues describe bright flashes, unwanted text-like overlays, and strange artifacting in the final second or final frames of generated videos, especially in longer clips and in some two-stage upscaling workflows.
Why it happens: based on user reports, it seems tied to some two-stage workflows and upscaler behavior, especially with longer outputs. Several users say shorter clips are less affected, and some say the x1.5 upscaler behaves better than x2 in their tests.
What usually helps: trying the x1.5 upscaler instead of x2, updating sigma and preprocess settings to match newer workflows, or trimming the final frames as a temporary workaround. These are community workarounds, not official fixes, so treat them as temporary measures.
4. Save video or audio export errors, including NaN/Inf
There are current reports of export failures with errors like [aac] Input contains (near) NaN/+-Inf, both in the ComfyUI-LTXVideo repo and in the LTX issues. This usually appears at the end of generation when the system tries to combine or encode video and audio.
Why it happens: the generation may succeed, but the audio export or final muxing step fails. Since LTX 2.3 is a joint audio-video model, failures can show up later in the pipeline, even if the visual part looks okay.
What usually helps: testing a simpler video combine node, temporarily disabling the problematic save path, or using known-working workflows from the official repo or trusted community examples until the pipeline matures.
5. Wrong dimensions or frame counts
This is one of the easiest errors to prevent, and many beginners will hit it. The official model card says width and height must be divisible by 32, and frame count must be a multiple of 8 plus 1 (that is, of the form 8n + 1, such as 97 or 121). If not, the input should be padded and then cropped.
Why it happens: LTX 2.3 has strict latent and temporal structure requirements. Random dimensions that work in other video tools may not work here.
What usually helps: using approved workflow presets, standard resolutions, and frame counts that match the model’s structure from the start.
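The divisibility rules above are easy to enforce up front. The helper below is an illustrative sketch, rounding width and height up to the next multiple of 32 and the frame count up to the nearest valid 8n + 1 value.

```python
def valid_dims(width: int, height: int, frames: int) -> tuple[int, int, int]:
    """Snap generation settings to LTX 2.3's structural requirements:
    width and height divisible by 32, frame count of the form 8n + 1."""
    def round_up(value: int, multiple: int) -> int:
        return -(-value // multiple) * multiple  # ceiling division

    w = round_up(width, 32)
    h = round_up(height, 32)
    f = round_up(max(frames - 1, 0), 8) + 1
    return w, h, f

print(valid_dims(1080, 1920, 120))  # (1088, 1920, 121)
```

Note that even the common portrait size 1080×1920 needs padding on the short edge, which is exactly the pad-then-crop behavior the model card describes.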
Common LTX 2.3 Errors and Fixes
Users testing LTX 2.3 are already reporting several recurring problems, especially in ComfyUI and custom local installs. The most common include out of memory errors, GGUF size mismatch problems, embeddings connector mismatch errors, final-frame bright flash artifacts, unwanted text overlays from some upscaler workflows, and audio export failures such as NaN/Inf errors during save. Many of these issues are not caused by the model alone, but by version mismatches between checkpoints, connectors, text encoders, upscalers, and custom nodes. The safest approach is to use a fully matched LTX 2.3 workflow, keep dimensions valid, and avoid mixing older LTX 2.0 assets into a 2.3 pipeline.
Final Verdict
LTX 2.3 is one of the most important AI video releases to study right now. It improves the visual core of generation, strengthens prompt handling, pushes image-to-video forward, upgrades audio, and treats portrait output as a native format instead of an afterthought. The linked tutorial and YouTube video are practical places to start because they show how people can use the model today in ComfyUI, but the bigger story is that LTX 2.3 is shaping up as an open, production-grade video engine with real depth behind it.
FAQs About LTX 2.3
What is LTX 2.3?
LTX 2.3 is an open AI video model from Lightricks designed for video generation, with synchronized audio support in many workflows, better detail, stronger motion, improved prompt adherence, and native portrait output.
Can I run LTX 2.3 locally?
Yes. LTX publishes open weights and official ComfyUI support, but local use has serious hardware needs, with 32 GB VRAM listed as the minimum starting point in the open-source documentation.
Does it support both text-to-video and image-to-video?
Yes. Both are core workflows, and image-to-video is one of the areas that received a major upgrade in this release.
What is the difference between Fast and Pro?
Fast is meant for speed and experimentation. Pro is aimed at higher fidelity and final output quality, and it unlocks some advanced endpoints in the API.
Is LTX 2.3 good for vertical video?
Yes. Native 9:16 generation is one of its signature improvements, which makes it especially useful for Reels, Shorts, and mobile-first ad content.
Can I use LTX 2.3 in ComfyUI, and is it free?
Yes, you can use LTX 2.3 in ComfyUI. The weights, code, and tooling are openly available, and the LTX-2 community license is free for companies under the allowed terms, although it is not a fully unrestricted permissive license.
