This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.
ONE is short for "ONE for all".
- [2025.12.24] We release v0.5.0, adding compatibility with 🤗 Transformers v4.57.1 (70+ new models) and 🤗 Diffusers v0.35.2, plus previews of v0.36 pipelines such as Flux2, QwenImageEditPlus, Lucy, and Kandinsky5. It also introduces initial ComfyUI integration. Happy exploring!
- [2025.11.02] v0.4.0 is released, with 280+ transformers models and 70+ diffusers pipelines supported. See here
- [2025.04.10] We release v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora2.0, Movie Gen 30B, CogVideoX 5B~30B. Have fun!
- [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here
- [2024.11.06] v0.2.0 is released
To install v0.5.0, first install MindSpore (any version from 2.6.0 to 2.7.1), then run `pip install mindone`.
Alternatively, to install the latest version from the master branch, please run:
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
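After installing, it can help to confirm that the installed MindSpore version falls in the range quoted above. The following is a minimal sketch that uses only the Python standard library. The helper names `installed_version`, `parse_release`, and `in_supported_range` are hypothetical and not part of mindone, and the default bounds are simply the 2.6.0 to 2.7.1 range stated for v0.5.0.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_release(version):
    """Turn '2.7.1' (or '2.7.1rc1') into a tuple of at least three integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '2.6' compares like '2.6.0'
    return tuple(parts)


def in_supported_range(version, low="2.6.0", high="2.7.1"):
    """Check low <= version <= high on the numeric release components."""
    return parse_release(low) <= parse_release(version) <= parse_release(high)


if __name__ == "__main__":
    for name in ("mindspore", "mindone"):
        print(name, installed_version(name))
    ms_version = installed_version("mindspore")
    if ms_version is not None:
        print("MindSpore in tested range:", in_supported_range(ms_version))
```

The check compares numeric release components only, so pre-release suffixes such as `rc1` are ignored; treat it as a quick sanity check rather than a full version parser.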
We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.
Hello MindSpore from Stable Diffusion 3!
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers",
mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
image = pipe(prompt)[0][0]
image.save("sd3.png")

- mindone diffusers is under active development; most tasks were tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
- compatible with 🤗 diffusers v0.35.2, with preview support for SoTA v0.36 pipelines; see support list
- 18+ training examples - controlnet, dreambooth, lora and more
- mindone transformers is under active development; most tasks were tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
- compatible with 🤗 transformers v4.57.1
- provides 350+ state-of-the-art machine learning models for inference across text, computer vision, audio, video, and multimodal tasks; see support list
| task | model | inference | finetune | pretrain | institute |
|---|---|---|---|---|---|
| Text/Image-to-Video | wan2.1 🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text/Image-to-Video | wan2.2 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Audio/Image-Text-to-Text | qwen2_5_omni 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Image/Video-Text-to-Text | qwen2_5_vl 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Any-to-Any | qwen3_omni_moe 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Image-Text-to-Text | qwen3_vl/qwen3_vl_moe 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | qwen_image 🔥🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Text-to-Text | minicpm 🔥🔥 | ✅ | ✖️ | ✖️ | OpenBMB |
| Any-to-Any | janus | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var | ✅ | ✅ | ✅ | ByteDance |
| Text-to-Image | omnigen2 🔥 | ✅ | ✅ | ✖️ | VectorSpaceLab |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B | ✅ | ✅ | ✅ | Zhipu |
| Image/Text-to-Text | glm4v 🔥 | ✅ | ✖️ | ✖️ | Zhipu |
| Text-to-Video | open sora plan 1.3 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo | ✅ | ✅ | ✅ | Tencent |
| Image-to-Video | hunyuanvideo-i2v 🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text-to-Video | movie gen 30B | ✅ | ✅ | ✅ | Meta |
| Segmentation | lang_sam 🔥 | ✅ | ✖️ | ✖️ | Meta |
| Segmentation | sam2 | ✅ | ✖️ | ✖️ | Meta |
| Text-to-Video | step_video_t2v | ✅ | ✖️ | ✖️ | StepFun |
| Text-to-Speech | sparktts | ✅ | ✖️ | ✖️ | Spark Audio |
| Text-to-Image | flux | ✅ | ✅ | ✖️ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 | ✅ | ✅ | ✖️ | Stability AI |
| task | model | inference | finetune | pretrain | features |
|---|---|---|---|---|---|
| Image-Text-to-Text | pllava | ✅ | ✖️ | ✖️ | support video and image captioning |
Introduces DiT inference acceleration methods (DiTCache, PromptGate, and FBCache with TaylorSeer), tested on SD3 and FLUX.1.
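To give a feel for how this kind of caching can save work, here is a toy, pure-Python illustration of the general first-block caching idea behind FBCache. It is a sketch under stated assumptions and not mindone's implementation. At each denoising step only the first block is run; if its residual has barely changed since the last fully computed step, the cached residual of the remaining blocks is reused instead of running them. The class `FirstBlockCache`, the helpers `make_block`, `full_forward`, and `relative_change`, and the `threshold` parameter are all hypothetical names chosen for this example, and the "blocks" are simple functions on lists of floats rather than real transformer layers.

```python
import math


def make_block(scale):
    """Toy residual block: v -> v + scale * tanh(v), applied elementwise."""
    return lambda v: [a + scale * math.tanh(a) for a in v]


def full_forward(blocks, x):
    """Reference path: run every block in order with no caching."""
    h = x
    for block in blocks:
        h = block(h)
    return h


def relative_change(new, old):
    """Summed absolute difference between two vectors, relative to |old|."""
    diff = sum(abs(a - b) for a, b in zip(new, old))
    scale = sum(abs(b) for b in old)
    return diff / scale if scale > 0 else float("inf")


class FirstBlockCache:
    """Hypothetical wrapper that skips blocks[1:] when the first block's
    residual is close to the one seen at the last fully computed step."""

    def __init__(self, blocks, threshold=0.05):
        self.blocks = blocks
        self.threshold = threshold
        self.prev_first_residual = None   # first-block residual at last full step
        self.cached_tail_residual = None  # output of blocks[1:] minus their input
        self.skipped_steps = 0

    def __call__(self, x):
        first_out = self.blocks[0](x)
        first_residual = [o - i for o, i in zip(first_out, x)]
        can_reuse = (
            self.prev_first_residual is not None
            and relative_change(first_residual, self.prev_first_residual)
            < self.threshold
        )
        if can_reuse:
            # Cheap path: approximate the remaining blocks with the cached residual.
            self.skipped_steps += 1
            return [f + r for f, r in zip(first_out, self.cached_tail_residual)]
        # Full path: run the remaining blocks and refresh both caches.
        h = first_out
        for block in self.blocks[1:]:
            h = block(h)
        self.prev_first_residual = first_residual
        self.cached_tail_residual = [o - f for o, f in zip(h, first_out)]
        return h


if __name__ == "__main__":
    blocks = [make_block(0.5), make_block(0.3), make_block(0.2)]
    model = FirstBlockCache(blocks)
    for x in ([1.0, -2.0, 0.5], [1.01, -2.01, 0.505], [3.0, 1.0, -4.0]):
        approx = model(x)
        exact = full_forward(blocks, x)
        error = max(abs(a - e) for a, e in zip(approx, exact))
        print(f"skipped so far: {model.skipped_steps}, max error: {error:.5f}")
```

The threshold trades speed for fidelity: a larger value skips more steps but lets the reused residual drift further from what the full model would compute.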
