To ensure that data synchronization tasks and scheduling tasks in DataWorks run as expected, you must establish a network connection between the virtual private cloud (VPC) with which your resource group is associated and the data source that you want to access. The data source can be a database, a data service, or another data store in any network environment. This topic describes network connectivity solutions for data sources that are deployed in different types of network environments.
Background information
Many DataWorks features and services depend on data sources or computing resources, such as adding data sources, data synchronization, data analytics, data collection, and DataService Studio. If the data source that you want to access is not deployed in the VPC with which your resource group is associated, you must select a network connectivity solution that connects this VPC to the network environment in which the data source is deployed. For example, if the data source is deployed in another VPC or in a data center, you must establish a network connection between that VPC or data center and the VPC with which the resource group is associated.
Likewise, when you configure a data synchronization task, the VPC with which your resource group is associated must have a network connection to both the source and the destination.
Prerequisites
A resource group with appropriate specifications is purchased. For more information, see Create and use a serverless resource group.
For more information about resource groups, see Overview.
The network connectivity solutions provided in this topic are suitable for serverless resource groups and the following types of old-version resource groups: exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio.
Precautions
You can associate a serverless resource group with a VPC so that the resource group can access data sources or addresses in complex network environments over an internal network. However, serverless resource groups cannot access the Internet by default. To use a serverless resource group to access a data source or a network environment over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate an elastic IP address (EIP) with the Internet NAT gateway. For more information, see Solution 6: Establish a network connection between a resource group and a data source that is deployed on the Internet.
If a data synchronization task transfers data over the Internet, its efficiency and stability cannot be guaranteed. We recommend that you synchronize data over an internal network or through Cloud Enterprise Network (CEN).
Network connectivity directly affects whether your tasks run successfully.
Network connections cannot be established between a resource group and data sources that are deployed in the classic network. If the data source or business that you want to access is deployed in the classic network, we recommend that you migrate the data source or business to a VPC.
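To check whether a network path works before you run a task, you can try to open a TCP connection to the data source endpoint. The following Python function is a minimal, generic sketch and is not a DataWorks API: the name `can_connect` and the 3-second default timeout are illustrative assumptions. The result is meaningful only if the check runs from a host inside the VPC with which your resource group is associated (for example, an ECS instance in that VPC, which is an assumption about your setup). A successful TCP handshake shows only that the port is reachable; it does not confirm credentials or access whitelists.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability.
    """
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("192.168.0.10", 3306)` would test a hypothetical MySQL endpoint at a private address. If the call returns False from inside the associated VPC, review the routes and security settings of the solution that you selected.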
Network connectivity solutions
The network connectivity solution that you can use varies based on the relationship between your data source and your resource group. You can select a network connectivity solution based on your business requirements:
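The selection rules of the six solutions in this topic can be summarized as a small decision function. This Python sketch only restates the scenarios described below; the `Location`, `DataSource`, and `ResourceGroup` types and the `choose_solution` function are hypothetical and do not correspond to any DataWorks SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Location(Enum):
    """Where the data source is deployed (hypothetical categories)."""
    ALIBABA_CLOUD_SERVICE = "alibaba_cloud_service"  # Solutions 1 to 3
    ECS_INSTANCE = "ecs_instance"  # Solution 4
    DATA_CENTER = "data_center"  # Solution 5
    INTERNET = "internet"  # Solution 6


@dataclass
class DataSource:
    location: Location
    account_id: Optional[str] = None  # owning Alibaba Cloud account, if any
    region: Optional[str] = None  # region ID, if any


@dataclass
class ResourceGroup:
    account_id: str
    region: str


def choose_solution(src: DataSource, rg: ResourceGroup) -> Tuple[int, str]:
    """Map a data source and a resource group to a solution number and approach."""
    if src.location is Location.INTERNET:
        return 6, "Internet NAT gateway with an EIP for serverless resource groups"
    if src.location is Location.DATA_CENTER:
        return 5, "Express Connect to the resource group's VPC"

    same_account = src.account_id == rg.account_id
    same_region = src.region == rg.region

    if src.location is Location.ECS_INSTANCE:
        if same_account and same_region:
            return 4, "same VPC for the resource group and the ECS instance"
        return 4, "CEN or VPC peering to the ECS instance's VPC"

    # The data source is an Alibaba Cloud service.
    if not same_account:
        return 3, "CEN or VPC peering between the VPCs of the two accounts"
    if same_region:
        return 1, "same VPC for the resource group and the data source"
    return 2, "CEN or VPC peering between the VPCs"
```

For example, `choose_solution(DataSource(Location.ALIBABA_CLOUD_SERVICE, "account-a", "cn-shanghai"), ResourceGroup("account-a", "cn-hangzhou"))` returns `(2, "CEN or VPC peering between the VPCs")`. The account IDs are placeholders.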
Solution 1: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account and reside in the same region
Scenarios
If your data source and DataWorks workspace meet the following conditions at the same time, we recommend that you use this solution.
The data source belongs to an Alibaba Cloud service.
The data source and DataWorks workspace belong to the same Alibaba Cloud account.
The data source and DataWorks workspace reside in the same region.
Solution description
If your data source and resource group belong to the same Alibaba Cloud account and reside in the same region, we recommend that you deploy the resource group and data source in the same VPC. This way, the resource group can access the data source over a VPC.
Diagram for network connectivity
Configure network connectivity
For more information about the solution description and configuration steps, see Solution 1: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account and reside in the same region.
Solution 2: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account but reside in different regions
Scenarios
If your data source and DataWorks workspace meet the following conditions at the same time, we recommend that you use this solution.
The data source belongs to an Alibaba Cloud service.
The data source and DataWorks workspace belong to the same Alibaba Cloud account.
The data source and DataWorks workspace reside in different regions.
Solution description
If your data source and resource group belong to the same Alibaba Cloud account but reside in different regions, we recommend that you use Cloud Enterprise Network (CEN) or a VPC peering connection to establish a network connection between the resource group and the VPC in which the data source is deployed. This way, the resource group can access the data source over a VPC.
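Before you connect two VPCs, or a data center and a VPC, it is generally advisable to confirm that their CIDR blocks do not overlap, because overlapping address ranges make routes between the networks ambiguous. See the CEN, VPC peering connection, and Express Connect documentation for the exact constraints. The following Python sketch uses the standard `ipaddress` module; the function name and the network names and CIDR blocks in the usage example are hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of network names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    # Compare each unordered pair of networks exactly once.
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]
```

For example, `find_overlaps({"rg-vpc": "192.168.0.0/16", "source-vpc": "172.16.0.0/12", "idc": "192.168.10.0/24"})` returns `[("rg-vpc", "idc")]`, which indicates that the data center range must be changed, or routes must be scoped more narrowly, before the networks are connected. The same check applies to Solutions 3, 4, and 5.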
Diagram for network connectivity
Configure network connectivity
For more information about the solution description and configuration steps, see Solution 2: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account but reside in different regions.
Solution 3: Establish a network connection between a resource group and a data source that belong to different Alibaba Cloud accounts
Scenarios
If your data source and DataWorks workspace meet the following conditions at the same time, we recommend that you use this solution.
The data source belongs to an Alibaba Cloud service.
The data source and DataWorks workspace belong to different Alibaba Cloud accounts.
Solution description
If your data source belongs to Alibaba Cloud Account A and your resource group belongs to Alibaba Cloud Account B, we recommend that you use Cloud Enterprise Network (CEN) or a VPC peering connection to connect the VPC with which the resource group is associated to the VPC in which the data source is deployed. This way, the resource group can access the data source over a VPC.
Diagram for network connectivity
Configure network connectivity
For more information about the solution description and configuration steps, see Solution 3: Establish a network connection between a resource group and a data source that belong to different Alibaba Cloud accounts.
Solution 4: Establish a network connection between a resource group and a data source that is hosted on an ECS instance
Scenarios
If your data source meets the following condition, we recommend that you use this solution.
The data source is hosted on an Elastic Compute Service (ECS) instance.
Solution description
If the ECS instance on which the data source is hosted and DataWorks belong to the same Alibaba Cloud account and reside in the same region, we recommend that you deploy the resource group and the ECS instance in the same VPC. This way, the resource group can access the data source over a VPC.
If the ECS instance on which the data source is hosted and DataWorks belong to different Alibaba Cloud accounts or belong to the same Alibaba Cloud account but reside in different regions, we recommend that you use Cloud Enterprise Network (CEN) or a VPC peering connection to establish a network connection between the resource group and the VPC in which the ECS instance is deployed. This way, the resource group can access the data source over a VPC.
Diagrams for network connectivity (three cases: same Alibaba Cloud account and same region; same Alibaba Cloud account but different regions; different Alibaba Cloud accounts)
Configure network connectivity
For more information about the solution description and configuration steps, see Solution 4: Establish a network connection between a resource group and a data source that is hosted on an ECS instance.
Solution 5: Establish a network connection between a resource group and a data source that is deployed in a data center
Scenarios
If your data source meets the following condition, we recommend that you use this solution.
The data source is deployed in an on-premises data center.
Solution description
If your data source is deployed in an on-premises data center, we recommend that you use Express Connect to establish a network connection between the network environment in which the data source resides and the VPC with which the resource group is associated. This way, the resource group can access the data source over a VPC.
Diagram for network connectivity
Configure network connectivity
For more information about the solution description and configuration steps, see Solution 5: Establish a network connection between a resource group and a data source that is deployed in a data center.
Solution 6: Establish a network connection between a resource group and a data source that is deployed on the Internet
Scenarios
If your data source meets the following condition, we recommend that you use this solution.
A public endpoint is configured for the data source.
Solution description
By default, serverless resource groups cannot access the Internet. If you want to use a serverless resource group to access a data source that is deployed on the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate an EIP with the Internet NAT gateway.
Old-version resource groups can access the Internet by default and can connect to the data source directly over the Internet.
Note: Old-version resource groups are being phased out. We recommend that you use serverless resource groups.
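Whether this solution applies depends on whether the data source is reached through a public address. The following Python sketch classifies an already-resolved IP address with the standard `ipaddress` module and reports the configuration that this topic calls for. The function name `internet_requirement` and the returned strings are illustrative assumptions, and the sketch expects an IP address, not a host name.

```python
import ipaddress


def internet_requirement(address: str, serverless: bool) -> str:
    """Describe what a resource group needs to reach `address` (illustrative only)."""
    ip = ipaddress.ip_address(address)
    if not ip.is_global:
        # Private or otherwise non-public address: use an internal-network solution.
        return "internal network (Solutions 1 to 5)"
    if serverless:
        # Serverless resource groups cannot access the Internet by default.
        return "Internet NAT gateway with an EIP on the associated VPC"
    # Old-version resource groups can reach the Internet directly.
    return "direct Internet access"
```

For a serverless resource group and a public address, the sketch returns the NAT gateway requirement, which matches the configuration described in this solution.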
Diagram for network connectivity
The diagram for this solution applies only to serverless resource groups. EIPs are associated with old-version resource groups by default, so old-version resource groups can connect to data sources over the Internet directly.
Configure network connectivity
For more information about the solution description and configuration steps, see Solution 6: Establish a network connection between a resource group and a data source that is deployed on the Internet.
References
For more information about resource groups, see Overview.
For more information about how to create and use a resource group, see Create and use a serverless resource group.
For more information about how to associate a resource group with a VPC, see Associate a resource group with a VPC.
For more information about how to configure an SNAT entry on an Internet NAT gateway for the VPC and vSwitch with which the resource group is associated, see Use the SNAT feature of an Internet NAT gateway to access the Internet.
For more information about network connectivity issues, see Network connectivity and operations on resource groups.