
DataWorks: Data source management

Last Updated: Jul 04, 2025

To work with data in your database or data warehouse, such as a MaxCompute project, in DataWorks, you must first add the database or data warehouse to DataWorks as a data source. You do this on the Data Sources page of Management Center in the DataWorks console, and then use the data source in the DataWorks services that need it. For example, to synchronize data from a MaxCompute project, add the MaxCompute project as a data source. When you configure a synchronization task in Data Integration, you can then select the data source as the source or destination of the task.

Notes

Data sources that you add by following this topic cannot be used to develop periodically scheduled tasks. To process data with a specific engine service in DataWorks and schedule the engine tasks periodically, add the engine service to DataWorks as a computing resource instead.

Permission management

Only the following users can add data sources: workspace members who are assigned the O&M or Workspace Administrator role, and RAM users to which the AliyunDataWorksFullAccess or AdministratorAccess policy is attached. For information about authorization, see Manage permissions on workspace-level services and Grant permissions to a RAM user.

In addition to the preceding permissions, other permissions may also be required for adding specific types of data sources. You can perform the authorization based on the instructions displayed in the DataWorks console.

Data source isolation

A workspace in standard mode supports the data source isolation feature. You can add a data source separately in the development environment and production environment. This way, the data source used for testing and the data source used for task scheduling in the production environment are isolated. This ensures data security in the production environment. For more information, see Appendix: Environments of data sources.

  • Data sources in the development environment: You can select such a data source when you create a synchronization node and use it in the development environment. You cannot commit it to the production environment or use it in the production environment.

  • Data sources in the production environment: You cannot select such a data source when you configure a synchronization node. You can use such a data source only in the production environment.

Supported data source types

For information about the data source types that are supported by DataWorks, see Supported data source types and synchronization operations.

Note

The data sources that can be used for different modules of DataWorks vary.

Add a data source

  1. Go to the Management Center page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

  2. In the left-side navigation pane, choose Data Sources > Data Sources. The Data Sources page appears.

  3. On the Data Sources page, click Add Data Source or Batch Add Data Sources based on your business requirements.

    Note

    For information about the data source types that are supported by DataWorks, see the Supported data source types section in this topic.

    Add a data source

    1. Click Add Data Source. In the Add Data Source dialog box, click the desired data source type. On the page that appears, configure the parameters to add a data source of the selected type. The parameters that you must configure when you add different types of data sources vary. You can view the infotip of each parameter on the configuration page of the related data source.

    2. Optional. Test the network connectivity between the resource group and the data source.

      In the Connection Configuration section of the Add Data Source dialog box, find the resource group that is associated with the workspace and click Test Network Connectivity in the Connection Status column.

      Note

      Different resource groups have different properties and characteristics. For more information, see Overview.

      • If Connected is displayed in the Connection Status column, click Complete Creation.

      • If Connection Failed is displayed in the Connection Status column, the resource group cannot be connected to the data source. In this case, tasks that use the data source cannot be run.

        Note

        You can perform the following operations to troubleshoot connectivity issues:

        • Click Self-service Troubleshoot to troubleshoot connectivity issues in the Network Connectivity Diagnostic Tool panel.

        • If the Network Connectivity Diagnostic Tool does not provide a solution, check the parameters that you configured, such as the account, password, and connection address, and make sure that the IP address of the resource group is added to the IP address whitelist of the data source. For more information, see Network connectivity.

        • By default, serverless resource groups cannot access the Internet. To use a serverless resource group to access a data source over the Internet, configure an Internet NAT gateway and an elastic IP address (EIP) for the VPC with which the resource group is associated.

    Add multiple data sources at a time

    Click Batch Add Data Sources and perform the following operations. Batch addition supports only MySQL, Hive, SQL Server, and Oracle data sources.

    1. In the Batch Add Data Sources dialog box, select the desired data source type and download the configuration template for this data source type.

      The information that you must configure in the template varies based on the data source type and on the mode that you select: Connection String Mode or Instance Mode. You can view the required information in the DataWorks console.

    2. Configure data source information in the template.

    3. After you configure the data source information, upload the template. The system then adds the data sources in the template to DataWorks in a batch.

      When the system adds the data sources, you can view the progress and details in the Batch Add Data Sources dialog box. If specific data sources fail to be added, you can troubleshoot the issue based on the error message.
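Before you upload a filled-in template, a quick local check can catch obvious mistakes. The following Python sketch assumes, hypothetically, that you saved the template as CSV with the columns `name`, `type`, `jdbc_url`, `username`, and `password`; the real template is downloaded from the console, and its columns vary with the data source type and mode. Only the set of batch-supported types comes from this topic.

```python
import csv
import io

# Types that support batch addition, per this topic.
BATCH_SUPPORTED_TYPES = {"MySQL", "Hive", "SQL Server", "Oracle"}

# Hypothetical column names; the real template defines its own columns.
REQUIRED_COLUMNS = ["name", "type", "jdbc_url", "username", "password"]


def validate_template(csv_text: str) -> list[str]:
    """Return the problems found in a filled-in template (empty list if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems: list[str] = []
    # Line 1 is the header row, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if row["type"] not in BATCH_SUPPORTED_TYPES:
            problems.append(f"line {line_no}: type {row['type']!r} cannot be batch-added")
        # Every other required column must be non-empty; short rows yield None.
        for column in REQUIRED_COLUMNS:
            if column != "type" and not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
    return problems
```

Checking locally only catches formatting problems. Errors such as wrong credentials or blocked networks still surface in the Batch Add Data Sources dialog box after upload.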

Note
  • DataWorks allows you to add a data source in Connection String Mode or Instance Mode. You can select a mode based on your business requirements. The parameters that you must configure vary based on the mode that you select.

    If you add a data source in Connection String Mode, DataWorks parses the JDBC URL of the data source. If the JDBC URL contains parameters that are not supported by DataWorks, DataWorks automatically removes the parameters. If you want to retain the unsupported parameters in the JDBC URL, submit a ticket to contact technical support personnel.

  • You can configure different data source information for the development environment and production environment by using the same data source name. Data source configurations in different environments are independent of each other.
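The following Python sketch illustrates the kind of JDBC URL clean-up that Connection String Mode performs. It is not DataWorks code: the `SUPPORTED_PARAMS` allowlist is hypothetical because this topic does not list the parameters that DataWorks supports, and the sketch only handles URL-style JDBC strings such as `jdbc:mysql://host:3306/db?...`, not formats such as Oracle's `jdbc:oracle:thin:@...` or SQL Server's semicolon-separated properties.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist for illustration; not the list that DataWorks uses.
SUPPORTED_PARAMS = {"useSSL", "characterEncoding", "serverTimezone"}


def strip_unsupported_params(jdbc_url: str) -> tuple[str, list[str]]:
    """Drop query parameters that are not in SUPPORTED_PARAMS.

    Returns the cleaned JDBC URL and the names of the removed parameters.
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL that starts with 'jdbc:'")
    # Parse the part after "jdbc:", e.g. "mysql://host:3306/db?useSSL=false".
    inner = urlsplit(jdbc_url[len(prefix):])
    kept: list[tuple[str, str]] = []
    removed: list[str] = []
    for key, value in parse_qsl(inner.query, keep_blank_values=True):
        if key in SUPPORTED_PARAMS:
            kept.append((key, value))
        else:
            removed.append(key)
    cleaned = urlunsplit(inner._replace(query=urlencode(kept)))
    return prefix + cleaned, removed
```

Because `urlencode` re-encodes the kept values, a value such as `Asia/Shanghai` comes back as `Asia%2FShanghai`.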

Manage data sources

On the Data Sources page, you can filter by Data Source Type and Data Source Name to find the data source that you want to manage. You can also perform the following operations on a data source:

  • Modify Data Source: You can modify the configuration information of a data source based on your business requirements. You cannot change the name or environment of a data source.

    Note

    You cannot directly edit data sources that are automatically created when you bind computing resources in Resource Management. To modify such a data source, go to the Resource Management page.

  • Delete Data Source: You can delete a data source that is no longer required. The impacts of deleting a data source vary based on the environment of the data source and the modules that use it, as described below.

    Note
    • If you authorize a member in Workspace A to use a data source in Workspace B and you delete the data source, tasks that use the data source across the workspaces fail.

    • You cannot directly delete data sources that are automatically created when you bind computing resources in Resource Management. Instead, in the left-side navigation pane of the Management Center page, click Computing Resource, find the computing resource from which the data source was created, and click Disassociate. After the disassociation is complete, the data source is automatically deleted.

    • Impact on the Data Integration module

      Before you delete a data source, check whether it is being used by synchronization tasks in the production environment. The impact depends on the environment of the data source that you delete:

      • Development environment and production environment: The deletion operation is irreversible. If synchronization tasks configured for the data source are used in the production environment and you delete the data source, the following issues occur:

        • The synchronization tasks in the production environment cannot run as expected. Before you delete the data source, delete the synchronization tasks that use it.

        • The data source is not available when you configure a synchronization task in the development environment.

      • Development environment: The deletion operation is irreversible. If synchronization tasks configured for the data source are used in the production environment and you delete the data source, the following issues occur:

        • The synchronization tasks in the production environment can run as expected. However, you cannot obtain metadata when you modify the synchronization tasks.

        • The data source is not available when you configure a synchronization task in the development environment.

      • Production environment: If synchronization tasks configured for the data source are used in the production environment and you delete the data source, the following issues occur:

        • The synchronization tasks in the production environment cannot run as expected. Before you delete the data source, delete the synchronization tasks that use it.

        • If you configure a synchronization task for the data source in the development environment, you cannot commit or deploy the synchronization task to the production environment.

      Solution that can be applied before data source deletion: Go to the Batch Operation-Data Development tab on the DataStudio page, change the data source used by the synchronization tasks in a batch, and then commit and deploy the synchronization tasks.

    • Impacts of data source deletion on other modules

      • Operation Center (risk level: High)

        • Impact: The related tasks fail to run.

        • Solution before deletion: Go to the Batch Operation-Data Development tab on the DataStudio page, change the data source used by the tasks in a batch, and then commit and deploy the tasks.

      • Data Service API (risk level: High)

        • Impact: The related tasks fail to call DataService Studio APIs.

        • Solution before deletion: Change the data source of the DataService Studio APIs.

      • DataAnalysis (risk level: Medium)

        • Impact: The related query tasks fail to run.

        • Related tasks: Query tasks that are run in DataAnalysis.

        • Solution before deletion: Change the data source of the SQL queries.

      • Data Quality (risk level: Medium)

        • Impact: Errors occur when the related tasks are checked.

        • Related tasks: Tasks for which Data Quality monitoring rules are configured. For more information, see View the details of a monitor.

        • Solution before deletion: Go to Operation Center and disassociate the Data Quality monitoring rules from the tasks. For more information, see View and manage auto triggered tasks.

  • Clone Data Source: You can use the cloning feature to quickly generate a new data source whose configuration information is the same as an existing data source.

    Note

    The name of the new data source must be different from that of the existing data source.

  • Permission Management: You can use the Permission Management feature to grant permissions on a data source in the current workspace to a member in another workspace. After the permissions are granted to the member, the member can view and use the data source but cannot modify the data source. For more information, see Manage permissions on data sources.

    Note

    If you grant permissions on a data source to a workspace, all members in the workspace can view and use the data source.
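The deletion impacts listed for Data Integration can be turned into a pre-deletion checklist. The following Python sketch is hypothetical: the `Task` records stand in for a task inventory that you would assemble yourself, and the impact strings paraphrase this topic rather than come from any DataWorks API.

```python
from dataclasses import dataclass

# Data Integration impacts, paraphrased from this topic and keyed by the
# environment(s) from which the data source is deleted.
DI_IMPACTS = {
    frozenset({"dev", "prod"}): [
        "Production synchronization tasks cannot run as expected.",
        "The data source is unavailable when you configure synchronization tasks in development.",
    ],
    frozenset({"dev"}): [
        "Production synchronization tasks still run, but metadata cannot be obtained when you modify them.",
        "The data source is unavailable when you configure synchronization tasks in development.",
    ],
    frozenset({"prod"}): [
        "Production synchronization tasks cannot run as expected.",
        "Synchronization tasks configured in development cannot be committed or deployed to production.",
    ],
}


@dataclass(frozen=True)
class Task:
    """Hypothetical task record from an inventory that you maintain yourself."""
    name: str
    module: str  # for example "Data Integration" or "DataAnalysis"
    data_source: str


def deletion_report(source: str, envs: set[str], tasks: list[Task]) -> dict:
    """Group the tasks that reference `source` by module and attach DI impacts."""
    affected: dict[str, list[str]] = {}
    for task in tasks:
        if task.data_source == source:
            affected.setdefault(task.module, []).append(task.name)
    return {
        "affected_tasks": affected,
        "data_integration_impacts": DI_IMPACTS[frozenset(envs)],
    }
```

Grouping affected tasks by module lines them up with the per-module solutions above, so each group can be repointed to another data source before the deletion.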

Appendix: Environments of data sources

In a workspace in standard mode, a data source has two independent sets of configurations: one for the development environment and one for the production environment. The two sets correspond to two databases or data warehouses at the underlying layer. Configuring different information for each environment isolates the data source used for testing from the data source used for task scheduling in the production environment, which helps protect production data. For example, if you specify different databases for the development environment and production environment when you add a data source, a batch synchronization task that uses the data source accesses a different database depending on the environment in which the task runs. This way, the data in the development environment and the data in the production environment are isolated.

Note

Example

In a workspace in standard mode, a task accesses different data sources when it is run in different environments:

  • When the task is run in DataStudio or in Operation Center in the development environment, the task accesses the data source in the development environment by default.

  • When the task is run in Operation Center in the production environment, the task accesses the data source in the production environment by default.
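The default environment routing in this example can be modeled as a lookup keyed by where the task runs. This Python sketch is a conceptual model only; `DataSource`, `ConnectionConfig`, `DEFAULT_ENV`, and `resolve` are hypothetical names, not DataWorks APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionConfig:
    jdbc_url: str
    username: str


@dataclass
class DataSource:
    """One data source name with independent per-environment configurations."""
    name: str
    configs: dict[str, ConnectionConfig]  # keys: "dev" and/or "prod"


# (module, environment the task runs in) -> configuration used by default.
DEFAULT_ENV = {
    ("DataStudio", "dev"): "dev",
    ("Operation Center", "dev"): "dev",
    ("Operation Center", "prod"): "prod",
}


def resolve(source: DataSource, module: str, run_env: str) -> ConnectionConfig:
    """Return the configuration that a task uses by default in the given context."""
    env = DEFAULT_ENV[(module, run_env)]
    if env not in source.configs:
        raise LookupError(f"{source.name} has no {env} configuration")
    return source.configs[env]
```

In this model, a data source that has only a development configuration raises `LookupError` when resolved for production, which mirrors the rule that development-only data sources cannot be used in the production environment.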

Note
  • When you add a data source, you must check whether the database or data warehouse to which the data source in the development environment or production environment corresponds meets your business requirements. If the configurations of the data source in the development environment and those of the data source in the production environment are different, such as different database usernames and passwords, the following issues may occur:

    • The related task is successfully run in DataStudio but fails to be scheduled in the production environment.

    • The volume of data that is generated when the related task is run in DataStudio and the volume of data that is generated when the task is scheduled to run in the production environment are different.

    You can compare the run logs generated in the development environment and production environment for the task to troubleshoot the issue.

  • If the configurations of a data source in the development environment and those of the data source in the production environment are different, you must make sure that your resource group can separately connect to the data source in the development environment and the data source in the production environment.
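The two checks in this note, comparing the development and production configurations and confirming that both endpoints are reachable, can be sketched in Python as follows. The configuration dictionaries and their keys are hypothetical. A TCP check run from your own host does not prove that the DataWorks resource group can connect, because the resource group has its own network path and IP address; use Test Network Connectivity in the console for that.

```python
import socket

SECRET_KEYS = {"password"}  # values that should never be printed


def config_differences(dev: dict, prod: dict) -> dict:
    """Return {key: (dev_value, prod_value)} for every key whose values differ."""
    diffs = {}
    for key in sorted(dev.keys() | prod.keys()):
        dev_value, prod_value = dev.get(key), prod.get(key)
        if dev_value != prod_value:
            # Report that a secret differs without revealing it.
            diffs[key] = ("***", "***") if key in SECRET_KEYS else (dev_value, prod_value)
    return diffs


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A difference in `username` or `password` is a common reason why a task succeeds in DataStudio but fails when it is scheduled in the production environment, so those keys are worth checking first.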