mindspore-lab/mindone

MindSpore ONE

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

News

  • [2025.12.24] We release v0.5.0, adding compatibility with 🤗 Transformers v4.57.1 (70+ new models) and 🤗 Diffusers v0.35.2, plus previews of v0.36 pipelines such as Flux2, QwenImageEditPlus, Lucy, and Kandinsky5. This release also introduces initial ComfyUI integration. Happy exploring!
  • [2025.11.02] v0.4.0 is released, with 280+ transformers models and 70+ diffusers pipelines supported. See here
  • [2025.04.10] We release v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora2.0, Movie Gen 30B, CogVideoX 5B~30B. Have fun!
  • [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here
  • [2024.11.06] v0.2.0 is released

Quick tour

To install v0.5.0, first install MindSpore 2.6.0 - 2.7.1, then run pip install mindone.

Alternatively, to install the latest version from the master branch, please run:

git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
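Before running the examples, it can help to confirm that the installed MindSpore falls inside the 2.6.0 - 2.7.1 range quoted above. The following is a minimal sketch using only the Python standard library; the helper names (`parse_release`, `is_supported`, `check_mindspore`) are illustrative and not part of mindone, and it assumes the package is installed under the distribution name `mindspore`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Supported range quoted in this README for mindone v0.5.0.
MIN_SUPPORTED = (2, 6, 0)
MAX_SUPPORTED = (2, 7, 1)


def parse_release(version_string):
    """Return the leading numeric release as a 3-tuple, e.g. '2.7.0rc1' -> (2, 7, 0)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_supported(version_string):
    """True when the release falls inside the documented MindSpore range."""
    return MIN_SUPPORTED <= parse_release(version_string) <= MAX_SUPPORTED


def check_mindspore():
    """Describe the installed MindSpore version, or report that it is missing."""
    try:
        installed = version("mindspore")  # assumed distribution name
    except PackageNotFoundError:
        return "mindspore is not installed"
    status = "supported" if is_supported(installed) else "outside the tested range"
    return f"mindspore {installed}: {status}"


if __name__ == "__main__":
    print(check_mindspore())
```

Pre-release suffixes such as `rc1` are ignored by this check, so treat the result as a hint rather than a guarantee.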

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The call returns a tuple; its first element is the list of generated images.
image = pipe(prompt)[0][0]
image.save("sd3.png")
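To render several prompts, the pipeline can be loaded once and reused. The sketch below relies only on the calls shown in the example above. The helper names `image_path` and `generate_images`, the `outputs` directory, and the file naming scheme are illustrative assumptions, not part of mindone.

```python
import re
from pathlib import Path

MODEL_ID = "stabilityai/stable-diffusion-3-medium-diffusers"


def image_path(output_dir, index, prompt):
    """Build a file name from the prompt text (hypothetical helper)."""
    # Keep lowercase letters and digits, collapse everything else into dashes.
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower())[:40].strip("-")
    return Path(output_dir) / f"{index:02d}-{slug or 'image'}.png"


def generate_images(prompts, output_dir="outputs"):
    """Load the SD3 pipeline once and save one image per prompt (hypothetical helper)."""
    # Imported lazily so this module can be imported without MindSpore installed.
    import mindspore
    from mindone.diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID,
        mindspore_dtype=mindspore.float16,
    )
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        image = pipe(prompt)[0][0]  # same indexing as the single-prompt example
        path = image_path(output_dir, index, prompt)
        image.save(path)
        saved.append(path)
    return saved
```

Calling `generate_images` with a list of prompt strings returns the paths of the saved files. Loading the weights dominates start-up time, which is why the pipeline is created outside the loop.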

run hf diffusers on mindspore

  • mindone diffusers is under active development; most tasks have been tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
  • compatible with 🤗 diffusers v0.35.2, with preview support for SoTA v0.36 pipelines; see support list
  • 18+ training examples - controlnet, dreambooth, lora and more

run hf transformers on mindspore

  • mindone transformers is under active development; most tasks have been tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
  • compatible with 🤗 transformers v4.57.1
  • provides 350+ state-of-the-art machine learning models for inference across text, computer vision, audio, video, and multimodal tasks; see support list

supported models under mindone/examples

| task | model | inference | finetune | pretrain | institute |
| --- | --- | --- | --- | --- | --- |
| Text/Image-to-Video | wan2.1 🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text/Image-to-Video | wan2.2 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Audio/Image-Text-to-Text | qwen2_5_omni 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Image/Video-Text-to-Text | qwen2_5_vl 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Any-to-Any | qwen3_omni_moe 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Image-Text-to-Text | qwen3_vl/qwen3_vl_moe 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | qwen_image 🔥🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Text-to-Text | minicpm 🔥🔥 | ✅ | ✖️ | ✖️ | OpenBMB |
| Any-to-Any | janus | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var | ✅ | ✅ | ✅ | ByteDance |
| Text-to-Image | omnigen2 🔥 | ✅ | ✅ | ✖️ | VectorSpaceLab |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B | ✅ | ✅ | ✅ | Zhipu |
| Image/Text-to-Text | glm4v 🔥 | ✅ | ✖️ | ✖️ | Zhipu |
| Text-to-Video | open sora plan 1.3 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo | ✅ | ✅ | ✅ | Tencent |
| Image-to-Video | hunyuanvideo-i2v 🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text-to-Video | movie gen 30B | ✅ | ✅ | ✅ | Meta |
| Segmentation | lang_sam 🔥 | ✅ | ✖️ | ✖️ | Meta |
| Segmentation | sam2 | ✅ | ✖️ | ✖️ | Meta |
| Text-to-Video | step_video_t2v | ✅ | ✖️ | ✖️ | StepFun |
| Text-to-Speech | sparktts | ✅ | ✖️ | ✖️ | Spark Audio |
| Text-to-Image | flux | ✅ | ✅ | ✖️ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 | ✅ | ✅ | ✖️ | Stability AI |

supported captioner

| task | model | inference | finetune | pretrain | features |
| --- | --- | --- | --- | --- | --- |
| Image-Text-to-Text | pllava | ✅ | ✖️ | ✖️ | support video and image captioning |

training-free acceleration

Training-free DiT inference acceleration is available through DiTCache, PromptGate, and FBCache with TaylorSeer, tested on SD3 and FLUX.1.
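For readers unfamiliar with this family of methods, the general first-block-cache idea can be illustrated with a toy example: if the first transformer block's residual barely changes between denoising steps, the remaining blocks are skipped and their previously computed residual is reused. The sketch below is a pure-Python illustration under that assumption. It is not mindone's implementation; the class name, the threshold value, and the list-based "blocks" are all hypothetical, and TaylorSeer-style forecasting is not covered.

```python
class FirstBlockCache:
    """Toy first-block cache over a list of block functions (hypothetical helper)."""

    def __init__(self, blocks, threshold=0.05):
        self.first_block, *self.remaining_blocks = blocks
        self.threshold = threshold
        self.prev_first_residual = None   # first-block residual at the last full step
        self.cached_tail_residual = None  # combined residual of the remaining blocks
        self.skipped_steps = 0

    def __call__(self, hidden):
        first_out = self.first_block(hidden)
        first_residual = [o - h for o, h in zip(first_out, hidden)]

        if self._can_reuse(first_residual):
            # Cheap path: skip the remaining blocks and add their cached residual.
            self.skipped_steps += 1
            return [o + r for o, r in zip(first_out, self.cached_tail_residual)]

        # Full path: run every remaining block and refresh both caches.
        out = first_out
        for block in self.remaining_blocks:
            out = block(out)
        self.prev_first_residual = first_residual
        self.cached_tail_residual = [o - f for o, f in zip(out, first_out)]
        return out

    def _can_reuse(self, first_residual):
        """Relative L1 change of the first-block residual versus the cached one."""
        if self.prev_first_residual is None:
            return False
        diff = sum(abs(a - b) for a, b in zip(first_residual, self.prev_first_residual))
        norm = sum(abs(b) for b in self.prev_first_residual)
        return norm > 0 and diff / norm < self.threshold


# Two stand-in "blocks" acting on a list of floats.
blocks = [lambda h: [x * 1.5 for x in h], lambda h: [x * 2.0 for x in h]]
cache = FirstBlockCache(blocks, threshold=0.05)
step1 = cache([1.0, 2.0])  # full computation, fills the cache
step2 = cache([1.0, 2.0])  # identical input, remaining block is skipped
step3 = cache([2.0, 2.0])  # input changed a lot, full computation again
```

The threshold trades speed for fidelity: a larger value skips more steps but lets the reused residual drift further from the exact result.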