After model training is complete, you can use Elastic Algorithm Service (EAS) to quickly deploy your model as an online inference service or a web application. EAS supports heterogeneous resources and combines capabilities such as automatic scaling, one-click stress testing, canary release, and real-time monitoring to ensure service stability and business continuity in high-concurrency scenarios at a lower cost.
EAS architecture
Supported regions
EAS is available in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Heyuan), China (Guangzhou), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Indonesia (Jakarta), US (Silicon Valley), US (Virginia), and Germany (Frankfurt).
Billing
See Billing of EAS.
Usage
Step 1: Preparations
Prepare inference resources.
Select an EAS resource group. EAS provides three types of resources: public resources, dedicated resources, and Lingjun resources. Dedicated resources and Lingjun resources must be purchased before you can use them. For guidance on resource selection and purchase configurations, see Overview of EAS resource groups.
Prepare the model, pre-processing and post-processing code files, and other required files.
Prepare the trained model and the processing code files that you developed, upload them to a supported cloud storage service, and access them in the service through storage mounting.
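For example, if you store the files in Object Storage Service (OSS), you can upload them with the oss2 Python SDK before mounting. A minimal sketch; the credentials, endpoint, bucket, and file paths are placeholders.

    import oss2

    # Placeholder credentials and bucket details; replace them with your own.
    auth = oss2.Auth("<ACCESS_KEY_ID>", "<ACCESS_KEY_SECRET>")
    bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-model-bucket")

    # Upload the trained model and the processing code that the service will mount.
    bucket.put_object_from_file("models/my_model/model.pth", "local/model.pth")
    bucket.put_object_from_file("models/my_model/processor.py", "local/processor.py")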
Step 2: Deploy a service
Deployment tools: EAS supports deploying and managing services through a graphical interface (the PAI console) or through the command line (the EASCMD client). The two methods differ in their deployment processes and operational details. A deployment sketch follows this comparison.
Deploy a service
GUI method: Deploy services on the Elastic Algorithm Service (EAS) page of the PAI console.
Command line method: Deploy services by using the EASCMD client. For more information, see Run commands to use the EASCMD client.
Manage a service
GUI method: On the Inference Service tab of the Elastic Algorithm Service (EAS) page, you can manage EAS services, including:
View model calling information.
View logs, monitoring information, and deployment information.
Scale in, scale out, start, stop, and delete model services.
Command line method: Manage model services through the EASCMD client. For more information, see Run commands to use the EASCMD client.
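With the command line method, you describe the service in a JSON file and pass it to EASCMD. The following is a minimal sketch, assuming a processor-style service; the field values are placeholders, and the exact schema and subcommands should be verified against Run commands to use the EASCMD client.

    import json

    # Placeholder service description; see the EASCMD documentation for the full schema.
    service_desc = {
        "name": "demo_model_service",
        "processor": "pmml",                                     # prebuilt processor type
        "model_path": "oss://my-model-bucket/models/my_model/",  # placeholder OSS path
        "metadata": {"instance": 1, "cpu": 2, "memory": 4000},   # replicas and resources (MB)
    }

    with open("service.json", "w") as f:
        json.dump(service_desc, f, indent=2)

    # Typical EASCMD lifecycle commands (run in a shell):
    #   eascmd create service.json        # deploy the service
    #   eascmd desc demo_model_service    # view deployment information
    #   eascmd stop demo_model_service    # stop the service
    #   eascmd delete demo_model_service  # delete the service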
Deployment methods: EAS supports image deployment (recommended) and processor deployment. For the differences between them, see Parameters for custom deployment in the console. A sketch contrasting the two styles follows this comparison.
Image deployment (recommended)
Images ensure consistency between the development environment and the deployment environment.
EAS provides official images that are suitable for various scenarios. You can use an official image for one-click deployment.
You can also deploy a model service from a custom image without modifying the image.
Processor deployment
EAS provides prebuilt processors for common frameworks, such as PMML and XGBoost. Prebuilt processors let you start a service quickly but may not meet requirements that are specific to your scenario.
You can also develop custom processors to implement more flexible processing logic.
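To make the contrast concrete, the following sketch shows how the two styles differ in the service description that EAS consumes. The processor-style fields follow the commonly documented schema; the image-style field names (containers, image, command, port) and all values are assumptions to verify against the deployment documentation.

    # Processor deployment: EAS loads the model and runs it with a prebuilt processor.
    processor_style = {
        "name": "pmml_service",
        "processor": "pmml",                                      # prebuilt PMML processor
        "model_path": "oss://my-model-bucket/models/scorecard/",  # placeholder model location
    }

    # Image deployment: EAS runs your container image, which serves the model itself.
    image_style = {
        "name": "image_service",
        "containers": [
            {
                "image": "registry.cn-hangzhou.aliyuncs.com/my-ns/my-infer:v1",  # placeholder image
                "command": "python /app/server.py",  # placeholder start command
                "port": 8000,                        # placeholder serving port
            }
        ],
    }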
Step 3: Call and stress test the service
Deploy a model as a web UI application: You can open the web application in a browser from the console and interact with the inference service.
Deploy a model as an API service:
After deployment, you can use the online service debugging feature to send HTTP requests to the service and verify that it performs inference as expected.
Call the service to perform online inference or asynchronous inference. EAS services support multiple calling methods, such as an Internet endpoint, a VPC endpoint, and VPC direct connection. A minimal calling sketch follows this list.
For information about stress testing, see Automatic service stress testing.
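As referenced above, an API call is an HTTP POST to the service endpoint with the service token in the Authorization header. The endpoint, token, and payload below are placeholders; copy the real values from the service's calling information in the console. A minimal sketch:

    import requests

    # Placeholder endpoint and token; copy the real values from the service's
    # calling information on the Inference Service tab.
    url = "http://<endpoint>.<region>.pai-eas.aliyuncs.com/api/predict/<service_name>"
    token = "<service_token>"

    # The payload format depends on the model and processor; JSON is used here for illustration.
    resp = requests.post(url, headers={"Authorization": token}, json={"inputs": [[1.0, 2.0, 3.0]]})
    resp.raise_for_status()
    print(resp.text)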
Step 4: Monitor and scale the service
After the service is running normally, you can enable service monitoring alerts to track the usage of service resources.
You can also enable horizontal or scheduled auto scaling to adjust the computing resources of online services in real time; a sketch of such policies follows.
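As an illustration of what horizontal and scheduled scaling policies express (instance bounds plus a trigger), consider the following hypothetical sketch. The field names are illustrative assumptions, not the actual EAS schema; consult the auto scaling documentation for the real configuration.

    # Hypothetical horizontal auto scaling policy (field names are illustrative).
    horizontal_policy = {
        "min_instances": 1,    # never scale below one instance
        "max_instances": 10,   # cap the instance count to bound cost
        "trigger": {
            "metric": "cpu_utilization",  # scale on average CPU usage
            "threshold": 70,              # add instances above 70% utilization
        },
    }

    # Hypothetical scheduled policy that pins capacity for a known traffic peak.
    scheduled_policy = {
        "schedule": "0 9 * * *",   # cron-style: every day at 09:00
        "target_instances": 5,     # hold five instances during the peak
    }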
Step 5: Asynchronous inference
In scenarios where inference takes a long time, use the queue service and asynchronous inference. When your inference service receives a large number of requests, an input queue stores them; after the requests are processed, the results are saved to an output queue and returned asynchronously. This prevents unprocessed requests from being discarded. In addition, EAS supports multiple methods of sending request data to the queue service and can automatically scale the inference service based on the amount of data in the queue, which effectively controls the number of service instances. For more information, see Asynchronous inference services. A client sketch follows.
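The following sketch illustrates the asynchronous pattern: send a request to the input queue, then poll the output queue for the result. It is a simplified illustration using plain HTTP, not the exact queue protocol; the endpoints and the "/sink" output-queue path are assumptions, and the real interface (including the dedicated SDK for the queue service) is described in Asynchronous inference services.

    import time
    import requests

    # Placeholder endpoints and token; take the real values from the async
    # service's calling information.
    token = "<service_token>"
    input_queue = "http://<endpoint>/api/predict/<service_name>"        # input queue
    output_queue = "http://<endpoint>/api/predict/<service_name>/sink"  # output queue (assumed path)

    # Enqueue a request for asynchronous processing.
    put = requests.post(input_queue, headers={"Authorization": token}, data=b"<request payload>")
    put.raise_for_status()

    # Poll the output queue until a result is available (simplified polling loop).
    while True:
        got = requests.get(output_queue, headers={"Authorization": token})
        if got.status_code == 200 and got.content:
            print("result:", got.content)
            break
        time.sleep(1)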
References
For information about EAS use cases, see EAS use cases.
Data Science Workshop (DSW) of PAI is a cloud-based interactive integrated development environment (IDE) for machine learning. You can use notebooks to read data, develop algorithms, and train and deploy models. For more information, see DSW overview.
Visualized Modeling (Designer) of PAI provides hundreds of algorithm components. It supports large-scale distributed training for traditional machine learning, deep learning, and reinforcement learning, as well as streaming training and batch training. For more information, see Designer overview.