Diffusion-Type AIGC Request Scheduling with Inference Sharing
H Yang, Y Hou, Y Zheng, Z Li - 2025 IEEE/ACM 33rd …, 2025 - ieeexplore.ieee.org
2025 IEEE/ACM 33rd International Symposium on Quality of Service …, 2025
AIGC-as-a-Service (AaaS) enables diverse and high-quality content creation. Due to the computational intensity and high costs of model inference, efficiently scheduling generation requests is non-trivial for AIGC Service Providers (ASPs). A judicious balance is required between generation quality, limited resources, and service delay, while coping with online arrivals and operational constraints. Existing scheduling systems often neglect the quality metrics of generated content, and fail to deploy the latest architecture in model inference. To address these challenges, we introduce a distributed diffusion framework that reduces resource consumption by sharing inference steps across different inference tasks. On this basis, we establish a quality model using the refined CLIP Score, and formulate the scheduling problem as a mixed-integer nonlinear program. We first develop a prompt similarity-based algorithm to determine the number of shared inference steps within each request group. Adopting a primal-dual framework, we then design an online algorithm to dynamically manage request admission and scheduling, maximizing social welfare of the AIGC ecosystem while ensuring generation quality. Extensive real-world trace-driven experiments demonstrate that our approach improves social welfare by up to 38.2% compared to the state-of-the-art method.
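The abstract does not spell out how prompt similarity is turned into a number of shared inference steps, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a bag-of-words cosine similarity as a stand-in for a learned text embedding, and a hypothetical linear rule with a hypothetical cap `max_shared_steps`: the more similar the least similar pair of prompts in a group, the more early denoising steps the group shares.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two prompts.

    Assumption: a real system would more likely compare learned text embeddings.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = sum(c * c for c in va.values())
    nb = sum(c * c for c in vb.values())
    if na == 0 or nb == 0:
        return 0.0
    return min(1.0, dot / math.sqrt(na * nb))


def shared_steps(prompts: list[str], max_shared_steps: int = 30) -> int:
    """Hypothetical rule: scale the cap by the lowest pairwise similarity in the group."""
    if len(prompts) < 2:
        return 0  # nothing to share with a single request
    min_sim = min(
        cosine_similarity(p, q)
        for i, p in enumerate(prompts)
        for q in prompts[i + 1:]
    )
    return round(max_shared_steps * min_sim)


def steps_saved(group_size: int, shared: int) -> int:
    """Shared steps run once for the group instead of once per request."""
    return max(group_size - 1, 0) * shared
```

Under these assumptions, a group of three identical prompts shares all 30 capped steps and saves 60 step executions, while prompts with no words in common share none. The trade-off the abstract points to is that sharing more steps lowers resource use but can reduce per-request quality, which is why the paper pairs step sharing with a CLIP Score-based quality model.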