Introduction
We are thrilled to announce the release of Qwen-Image, a 20B MMDiT image foundation model that achieves significant advances in complex text rendering and precise image editing. It is built to handle multilingual text rendering and sophisticated image manipulation tasks.
Core Capabilities
Superior Text Rendering
Qwen-Image excels at complex text rendering, including multi-line layouts, paragraph-level semantics, and fine-grained details. The model supports both alphabetic languages (e.g., English) and logographic languages (e.g., Chinese) with exceptional fidelity, making it ideal for creating content that requires accurate text representation.
Consistent Image Editing
Through an enhanced multi-task training paradigm, Qwen-Image achieves exceptional performance in preserving both semantic meaning and visual realism during editing operations. This capability enables users to perform precise modifications while maintaining the integrity of the original image.
Strong Cross-Benchmark Performance
Evaluated on multiple public benchmarks, Qwen-Image consistently outperforms existing models across diverse generation and editing tasks, establishing it as a strong foundation model for image generation.
Technical Performance
Benchmark Results
Qwen-Image has been comprehensively evaluated across multiple public benchmarks:
- General Image Generation: GenEval, DPG, and OneIG-Bench
- Image Editing: GEdit, ImgEdit, and GSO
- Text Rendering: LongText-Bench, ChineseWord, and TextCraft
The model achieves state-of-the-art performance on all benchmarks, demonstrating its strong capabilities in both image generation and editing. Particularly noteworthy is its exceptional performance in Chinese text generation, outperforming existing state-of-the-art models by a significant margin.
Key Technical Features
- 20B Parameter Model: Large-scale architecture for superior performance
- MMDiT Architecture: Advanced transformer-based design
- Multi-task Training: Enhanced paradigm for diverse capabilities
- Cross-language Support: Seamless handling of multiple languages
Demo Showcases
Chinese Text Rendering Excellence
Example 1: Miyazaki-Style Scene
One of Qwen-Image's standout capabilities is high-fidelity text rendering across different scenarios. Let's examine a Chinese rendering case:
Miyazaki Anime Style Scene

Scene featuring shop signs like "云存储", "云计算", and "云模型" with realistic depth of field and accurate character poses.
The model not only captures Miyazaki's anime style accurately but also renders the shop signs "云存储", "云计算", and "云模型", as well as "千问" on the wine jars, realistically and accurately, with proper depth of field. The poses and expressions of the characters are also well preserved.
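For readers who want to try a similar case, here is a minimal sketch of how such a prompt might be assembled. It assumes that wrapping each string in double quotes is a reasonable way to mark text that should appear verbatim. `TextElement` and `build_prompt` are hypothetical helpers, not part of any Qwen API, and the scene wording is illustrative rather than the prompt used for the image above.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    """A piece of text that should appear verbatim in the image."""
    text: str       # exact string to render
    location: str   # where it appears, in plain words


def build_prompt(style: str, scene: str, elements: list[TextElement]) -> str:
    """Compose a prompt that wraps every string to render in double quotes."""
    parts = [f"{style}. {scene}."]
    for el in elements:
        parts.append(f'The {el.location} reads "{el.text}".')
    return " ".join(parts)


# The four strings come from the example above; style and scene text are illustrative.
elements = [
    TextElement("云存储", "first shop sign"),
    TextElement("云计算", "second shop sign"),
    TextElement("云模型", "third shop sign"),
    TextElement("千问", "label on the wine jars"),
]
prompt = build_prompt(
    "Miyazaki anime style",
    "A street lined with small shops, with wine jars in the foreground",
    elements,
)
print(prompt)
```

Keeping the strings in a list separate from the scene description makes it easy to reuse them later when checking the generated image.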
Example 2: Traditional Chinese Couplets
Let's look at another example of Chinese rendering:
Traditional Chinese Couplets

Accurately rendered couplets with calligraphy effects and Yueyang Tower in the center.
The model accurately draws the left and right couplets and the horizontal scroll, applies calligraphy effects, and accurately generates the Yueyang Tower in the middle. The blue and white porcelain on the table also looks very realistic.
English Text Rendering Capabilities
Example 3: Bookstore Window Display
So, how does the model perform on English? Let's look at an English rendering example:
Bookstore Window Display

Accurately generated "New Arrivals This Week" and book titles including "The light between worlds", "When stars are scattered", "The silent patient", and "The night circus".
In this example, the model not only accurately outputs "New Arrivals This Week" but also accurately generates the cover text of four books: "The light between worlds", "When stars are scattered", "The silent patient", and "The night circus".
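One way to check a result like this, sketched under assumptions, is to compare each expected string with a transcript of the generated image. The `transcript` below is a hypothetical, hand-written stand-in for OCR output and contains a deliberate typo. `normalize` and `match_score` are illustrative helpers, not the metric used by any of the benchmarks listed earlier.

```python
import difflib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not count as errors."""
    return " ".join(text.lower().split())


def match_score(expected: str, transcript: str) -> float:
    """Best similarity (0.0 to 1.0) between `expected` and any window of
    `transcript` that has the same number of words."""
    exp_words = normalize(expected).split()
    words = normalize(transcript).split()
    if not exp_words or not words:
        return 0.0
    n = len(exp_words)
    target = " ".join(exp_words)
    return max(
        difflib.SequenceMatcher(None, target, " ".join(words[i:i + n])).ratio()
        for i in range(max(1, len(words) - n + 1))
    )


# Expected strings are taken from the example above.
expected = [
    "New Arrivals This Week",
    "The light between worlds",
    "When stars are scattered",
    "The silent patient",
    "The night circus",
]

# Hypothetical OCR transcript; "Cirus" is a deliberate error for illustration.
transcript = (
    "NEW ARRIVALS THIS WEEK The Light Between Worlds "
    "When Stars Are Scattered The Silent Patient The Night Cirus"
)

scores = {title: match_score(title, transcript) for title in expected}
for title, score in scores.items():
    status = "ok" if score == 1.0 else "check"
    print(f"{status:5} {score:.2f}  {title}")
```

A score of 1.0 means the string was found exactly, ignoring case and whitespace. Anything lower flags a title worth inspecting by eye.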
Example 4: Complex Infographic Layout
Let's look at a more complex case of English rendering:
Emotional Wellbeing Infographic

Complex layout with 6 submodules, each with icons, titles, and descriptive text.
In this case, the model needs to generate six submodules, each with its own icon, title, and corresponding introductory text. Qwen-Image completes the layout cleanly.
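The structure of such a request can be sketched as data: a list of submodules, each with an icon, a title, and a body text, turned into one prompt. This is only an illustration. `Submodule` and `layout_prompt` are hypothetical names, the grid arrangement is an assumption, and the six entries are placeholders because the original prompt's titles and texts are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submodule:
    icon: str   # short description of the icon
    title: str  # heading to render verbatim
    body: str   # introductory text to render verbatim


def layout_prompt(topic: str, modules: list[Submodule], columns: int = 3) -> str:
    """Describe a grid infographic, one line per submodule."""
    if len(modules) % columns != 0:
        raise ValueError("modules must fill the grid evenly")
    rows = len(modules) // columns
    lines = [
        f"An infographic about {topic}, laid out as a {rows}x{columns} grid "
        f"of {len(modules)} submodules."
    ]
    for i, m in enumerate(modules, start=1):
        lines.append(
            f'Submodule {i}: an icon of {m.icon}, the title "{m.title}", '
            f'and the text "{m.body}".'
        )
    return "\n".join(lines)


# Placeholder content only; replace with the real titles and texts.
modules = [
    Submodule(icon=f"placeholder icon {i}", title=f"Title {i}", body=f"Short description {i}.")
    for i in range(1, 7)
]
prompt = layout_prompt("emotional wellbeing", modules)
print(prompt)
```

Describing each submodule on its own line keeps the icon, title, and text for one module together, which makes a missing or swapped element easier to spot when reviewing the output.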
Small Text and Bilingual Capabilities
Example 5: Handwritten Poetry
What about smaller text? Let us test it:
Handwritten Poetry on Paper

Accurate rendering of handwritten poetry despite the text occupying less than one-tenth of the image.
In this case, the paper occupies less than one-tenth of the entire image and the paragraph of text is relatively long, yet the model still accurately generates the text on the paper.
Example 6: Bilingual Content
What if it's bilingual? For the same scenario, let's try this prompt:
Bilingual Handwritten Content

Seamless switching between English and Chinese in handwritten text rendering.
As you can see, the model can switch between the two languages at any point while rendering text.
Creative Applications
Movie Poster Creation
Qwen-Image's text capabilities make it easy to create posters, such as:
Movie Poster: "Imagination Unleashed"

Complex poster with multiple text elements including title, subtitle, cast, director, and release information.
Professional Presentation Design
Since the model can make posters, it can also make presentation slides (PPTs) directly. Let's look at an example of a PPT slide in Chinese:
Professional PPT Design

Enterprise-quality PPT with Alibaba branding, technical content, and traditional Chinese cultural elements.
Advanced Capabilities
General Image Generation
Beyond text rendering, Qwen-Image also excels at general image generation and supports a wide range of artistic styles. From photorealistic scenes to impressionistic paintings, from anime styles to minimalist designs, the model responds flexibly to varied creative prompts, making it a versatile tool for artists, designers, and storytellers.
Image Editing Features
In terms of image editing, Qwen-Image supports a variety of operations, including:
- Style Transfer: Transform images into different artistic styles
- Additions and Deletions: Add or remove objects from images
- Detail Enhancement: Improve image quality and details
- Text Editing: Modify or add text within images
- Character Pose Adjustment: Change poses and expressions
This allows even ordinary users to easily achieve professional-level image editing.
Use Cases
Creative Content Production
- Poster Design: Create marketing materials with accurate text
- Presentation Graphics: Generate professional slides and infographics
- Book Covers: Design covers with precise text rendering
- Social Media Content: Create engaging visual content
Commercial Applications
- Marketing Materials: Generate promotional content with accurate branding
- Product Visualization: Create product images with detailed specifications
- Brand Identity: Develop consistent visual branding elements
- Advertising: Create compelling ad visuals with text elements
Educational Content
- Learning Materials: Generate educational graphics with text
- Infographics: Create informative visual content
- Tutorial Graphics: Develop step-by-step visual guides
- Multilingual Content: Create content in multiple languages
Entertainment
- Game Assets: Generate game graphics with text elements
- Animation Frames: Create animated content with text
- Comic Creation: Develop comics with accurate text bubbles
- Story Illustrations: Generate book illustrations with text
Advantages
- Superior Text Rendering
  - Complex multi-line layouts
  - Multi-language support
  - Fine-grained detail preservation
  - Accurate font and style rendering
- Advanced Image Editing
  - Precise object manipulation
  - Style transfer capabilities
  - Detail enhancement
  - Semantic preservation
- Cross-Benchmark Excellence
  - State-of-the-art performance
  - Consistent quality across tasks
  - Strong foundation model capabilities
  - Reliable and reproducible results
- User-Friendly Interface
  - Accessible through Qwen Chat
  - No complex setup required
  - Immediate access to capabilities
  - Intuitive user experience
Getting Started
To try the latest Qwen-Image model:
- Visit Qwen Chat
  - Navigate to the Qwen Chat platform
  - Select the "Image Generation" option
  - Start creating with text prompts
- Experiment with Text Rendering
  - Try Chinese and English text prompts
  - Test complex layouts and designs
  - Explore different artistic styles
- Explore Image Editing
  - Test various editing operations
  - Experiment with style transfers
  - Create professional-quality content
Conclusion
Qwen-Image represents a significant advancement in AI-powered image generation technology. Its exceptional text rendering capabilities, combined with powerful image editing features and strong cross-benchmark performance, make it a valuable tool for creators, businesses, and developers alike.
The model's ability to handle complex multilingual text rendering while maintaining high-quality image generation opens up new possibilities for content creation across various industries. From marketing materials to educational content, from entertainment to commercial applications, Qwen-Image provides the tools needed to create professional-quality visual content with unprecedented accuracy and flexibility.
We hope that Qwen-Image can further promote the development of image generation, lower the technical barriers to visual content creation, and inspire more innovative applications. At the same time, we also look forward to the active participation and feedback of the community to jointly build an open, transparent, and sustainable generative AI ecosystem.
Written by Qwen Image on Sun Aug 10 2025