Introduction

We are thrilled to announce the release of Qwen-Image, a 20B MMDiT image foundation model that achieves significant advances in complex text rendering and precise image editing. The model handles multilingual text rendering and fine-grained image manipulation tasks that image generators have typically struggled with.

Core Capabilities

Superior Text Rendering

Qwen-Image excels at complex text rendering, including multi-line layouts, paragraph-level semantics, and fine-grained details. The model supports both alphabetic languages (e.g., English) and logographic languages (e.g., Chinese) with exceptional fidelity, making it ideal for creating content that requires accurate text representation.

Consistent Image Editing

Through an enhanced multi-task training paradigm, Qwen-Image achieves exceptional performance in preserving both semantic meaning and visual realism during editing operations. This capability enables users to perform precise modifications while maintaining the integrity of the original image.

Strong Cross-Benchmark Performance

Evaluated on multiple public benchmarks, Qwen-Image consistently outperforms existing models across diverse generation and editing tasks, establishing itself as a strong foundation model for image generation.

Technical Performance

Benchmark Results

Qwen-Image has been comprehensively evaluated across multiple public benchmarks:

General Image Generation: GenEval, DPG, and OneIG-Bench
Image Editing: GEdit, ImgEdit, and GSO
Text Rendering: LongText-Bench, ChineseWord, and TextCraft

The model achieves state-of-the-art performance on all benchmarks, demonstrating its strong capabilities in both image generation and editing. Particularly noteworthy is its exceptional performance in Chinese text generation, outperforming existing state-of-the-art models by a significant margin.
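To make "accurate text rendering" concrete, here is a minimal sketch of one way to score a rendered string against the text the prompt asked for. It is an illustration only, not the metric used by LongText-Bench, ChineseWord, or TextCraft. The function names are hypothetical, and the `recognized` string is assumed to come from an OCR step that is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def rendering_accuracy(target: str, recognized: str) -> float:
    """Return 1.0 for an exact match, falling toward 0.0 as errors accumulate."""
    if not target and not recognized:
        return 1.0
    longest = max(len(target), len(recognized))
    return 1.0 - edit_distance(target, recognized) / longest


# Hypothetical OCR readings of generated images (assumed inputs, not real results).
exact = rendering_accuracy("云计算", "云计算")
one_letter_dropped = rendering_accuracy("New Arrivals This Week", "New Arivals This Week")
print(exact, round(one_letter_dropped, 3))
```

Because the score works per character, it treats a logographic string such as "云计算" and an alphabetic string such as "New Arrivals This Week" the same way.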

Key Technical Features

20B Parameter Model: Large-scale architecture for superior performance
MMDiT Architecture: Multimodal Diffusion Transformer design
Multi-task Training: Enhanced paradigm for diverse capabilities
Cross-language Support: Seamless handling of multiple languages

Demo Showcases

Chinese Text Rendering Excellence

Example 1: Miyazaki-Style Scene

One of Qwen-Image's outstanding capabilities is its ability to achieve high-fidelity text rendering in different scenarios. Let's examine a Chinese rendering case:

Miyazaki Anime Style Scene

Miyazaki-style scene with Chinese text rendering, featuring shop signs like "云存储", "云计算", and "云模型" with realistic depth of field and accurate character poses.

The model not only accurately captures Miyazaki's anime style but also features shop signs like "云存储", "云计算", and "云模型" as well as "千问" on wine jars, all rendered realistically and accurately with proper depth of field. The poses and expressions of the characters are also perfectly preserved.

Example 2: Traditional Chinese Couplets

Let's look at another example of Chinese rendering:

Traditional Chinese Couplets

Accurately rendered couplets with calligraphy effects and Yueyang Tower in the center.

The model accurately draws the left and right couplets and the horizontal scroll, applies calligraphy effects, and accurately generates the Yueyang Tower in the middle. The blue-and-white porcelain on the table also looks very realistic.

English Text Rendering Capabilities

Example 3: Bookstore Window Display

So, how does the model perform on English? Let's look at an English rendering example:

Bookstore Window Display

Bookstore window with accurately generated "New Arrivals This Week" and book titles including "The light between worlds", "When stars are scattered", "The silent patient", and "The night circus".

In this example, the model not only accurately outputs "New Arrivals This Week" but also accurately generates the cover text of four books: "The light between worlds", "When stars are scattered", "The silent patient", and "The night circus".

Example 4: Complex Infographic Layout

Let's look at a more complex case of English rendering:

Emotional Wellbeing Infographic

Complex infographic layout with 6 submodules, each with icons, titles, and descriptive text.

In this case, the model needs to generate six submodules, each with its own icon, title, and introductory text. Qwen-Image lays all of them out cleanly.
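A layout like this is easier to request when the prompt is assembled from structured data rather than written by hand. The sketch below shows one way to do that. The `Submodule` class, the `build_infographic_prompt` function, the sentence template, and the sample content are all hypothetical. Wrapping the text to be rendered in double quotes is an assumed convention, not a documented requirement of the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submodule:
    icon: str         # short description of the icon to draw
    title: str        # text that should appear verbatim as the heading
    description: str  # text that should appear verbatim under the heading


def build_infographic_prompt(heading: str, modules: List[Submodule], style: str) -> str:
    """Compose one prompt that spells out every piece of text in double quotes."""
    lines = [
        f'An infographic titled "{heading}", {style}.',
        f"The layout contains {len(modules)} submodules arranged in a grid.",
    ]
    for index, module in enumerate(modules, start=1):
        lines.append(
            f'Submodule {index}: {module.icon} icon, title "{module.title}", '
            f'description "{module.description}".'
        )
    return " ".join(lines)


# Placeholder content for illustration; a real infographic would list all six submodules.
prompt = build_infographic_prompt(
    "Daily Habits",
    [
        Submodule("moon", "Sleep", "Rest seven to nine hours."),
        Submodule("sneaker", "Movement", "Walk every day."),
    ],
    "flat pastel illustration style",
)
print(prompt)
```

Keeping titles and descriptions in separate fields makes it straightforward to check afterwards that each string appears in the generated image.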

Small Text and Bilingual Capabilities

Example 5: Handwritten Poetry

What about smaller text? Let us test it:

Handwritten Poetry on Paper

Man with handwritten poetry on yellowed paper; the poetry is rendered accurately despite the text occupying less than one-tenth of the image.

In this case, the paper is less than one-tenth of the entire image, and the paragraph of text is relatively long, but the model still accurately generates the text on the paper.

Example 6: Bilingual Content

What if the text is bilingual? Let's try the same scenario with a bilingual prompt:

Bilingual Handwritten Content

Seamless switching between English and Chinese in handwritten text rendering.

As you can see, the model can switch between the two languages at any point while rendering text.

Creative Applications

Movie Poster Creation

Qwen-Image's text capabilities make it easy to create posters, such as:

Movie Poster: "Imagination Unleashed"

Sci-fi movie poster with a complex layout of multiple text elements, including title, subtitle, cast, director, and release information.

Professional Presentation Design

Since the model can make posters, it can also produce presentation slides (PPTs) directly. Let's look at a Chinese-language example:

Professional PPT Design

Enterprise-quality PPT slide with Chinese text and graphics, Alibaba branding, technical content, and traditional Chinese cultural elements.

Advanced Capabilities

General Image Generation

Beyond text rendering, Qwen-Image also excels at general image generation and supports a wide range of artistic styles. From photorealistic scenes to impressionistic paintings, from anime styles to minimalist designs, the model responds flexibly to creative prompts, making it a versatile tool for artists, designers, and storytellers.

Image Editing Features

In terms of image editing, Qwen-Image supports a variety of operations, including:

Style Transfer: Transform images into different artistic styles
Additions and Deletions: Add or remove objects from images
Detail Enhancement: Improve image quality and details
Text Editing: Modify or add text within images
Character Pose Adjustment: Change poses and expressions

This lets users without specialist skills achieve professional-level image editing.

Use Cases

Creative Content Production

Poster Design: Create marketing materials with accurate text
Presentation Graphics: Generate professional slides and infographics
Book Covers: Design covers with precise text rendering
Social Media Content: Create engaging visual content

Commercial Applications

Marketing Materials: Generate promotional content with accurate branding
Product Visualization: Create product images with detailed specifications
Brand Identity: Develop consistent visual branding elements
Advertising: Create compelling ad visuals with text elements

Educational Content

Learning Materials: Generate educational graphics with text
Infographics: Create informative visual content
Tutorial Graphics: Develop step-by-step visual guides
Multilingual Content: Create content in multiple languages

Entertainment

Game Assets: Generate game graphics with text elements
Animation Frames: Create animated content with text
Comic Creation: Develop comics with accurate text bubbles
Story Illustrations: Generate book illustrations with text

Advantages

Superior Text Rendering
- Complex multi-line layouts
- Multi-language support
- Fine-grained detail preservation
- Accurate font and style rendering

Advanced Image Editing
- Precise object manipulation
- Style transfer capabilities
- Detail enhancement
- Semantic preservation

Cross-Benchmark Excellence
- State-of-the-art performance
- Consistent quality across tasks
- Strong foundation model capabilities
- Reliable and reproducible results

User-Friendly Interface
- Accessible through Qwen Chat
- No complex setup required
- Immediate access to capabilities
- Intuitive user experience

Getting Started

To try the latest Qwen-Image model:

Visit Qwen Chat
- Navigate to the Qwen Chat platform
- Select the "Image Generation" option
- Start creating with text prompts

Experiment with Text Rendering
- Try Chinese and English text prompts
- Test complex layouts and designs
- Explore different artistic styles

Explore Image Editing
- Test various editing operations
- Experiment with style transfers
- Create professional-quality content
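Qwen Chat needs no setup. For readers who prefer a scripted workflow, the sketch below shows one possible route through the Hugging Face diffusers library. Several things here are assumptions rather than statements from this announcement: that the weights are available under the model ID `Qwen/Qwen-Image`, that the listed canvas sizes are suitable, and that a GPU with enough memory for a 20B model is on hand. The helper names are hypothetical.

```python
from typing import Any, Dict

# Illustrative canvas sizes (assumptions, not official limits).
CANVAS_SIZES = {"1:1": (1328, 1328), "16:9": (1664, 928), "9:16": (928, 1664)}


def build_generation_kwargs(prompt: str, aspect_ratio: str = "16:9", steps: int = 50) -> Dict[str, Any]:
    """Collect the keyword arguments for one text-to-image call."""
    if aspect_ratio not in CANVAS_SIZES:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    width, height = CANVAS_SIZES[aspect_ratio]
    return {"prompt": prompt, "width": width, "height": height, "num_inference_steps": steps}


def generate(prompt: str, aspect_ratio: str = "16:9"):
    """Run the pipeline once; requires torch, diffusers, and a large GPU."""
    import torch  # imported lazily so the helper above works without it
    from diffusers import DiffusionPipeline

    # Assumption: the weights are published under this Hugging Face model ID.
    pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")
    return pipe(**build_generation_kwargs(prompt, aspect_ratio)).images[0]
```

A call such as `generate('A bookstore window with a sign that reads "New Arrivals This Week"').save("window.png")` would then write the result to disk. The argument-building helper is kept separate so that prompts and sizes can be checked before any model is loaded.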

Conclusion

Qwen-Image represents a significant advancement in AI-powered image generation technology. Its exceptional text rendering capabilities, combined with powerful image editing features and strong cross-benchmark performance, make it a valuable tool for creators, businesses, and developers alike.

The model's ability to handle complex multilingual text rendering while maintaining high-quality image generation opens up new possibilities for content creation across various industries. From marketing materials to educational content, from entertainment to commercial applications, Qwen-Image provides the tools needed to create professional-quality visual content with unprecedented accuracy and flexibility.

We hope that Qwen-Image can further promote the development of image generation, lower the technical barriers to visual content creation, and inspire more innovative applications. We also look forward to the community's active participation and feedback as we jointly build an open, transparent, and sustainable generative AI ecosystem.
