Configure an assignment node - DataWorks

If you want a node to pass its output to a node that directly depends on it, you can configure the current node as an assignment node. Assignment nodes support the Shell, ODPS SQL, and Python languages. An assignment node assigns the output of its last statement to outputs, a built-in output parameter of the node, so that its child nodes can reference the value of the outputs parameter. This topic describes how to use an assignment node.

Precautions

Features
- Assignment nodes can pass data only to their level-1 child nodes.
- An assignment node can use the outputs parameter to pass only the output of its last statement to its descendant nodes.
- Assignment nodes run Python code with Python 2.
- Do not add comments to the code of an assignment node. Comments may cause incorrect results.
Edition and the outputs parameter
- For some node types, you do not need an assignment node to pass data between nodes. Instead, you can manually add the outputs parameter to Output Parameters or Input Parameters of the node, and the parameter works the same way as it does for an assignment node. Examples include the EMR Hive, EMR Spark SQL, ODPS Script, Hologres SQL, AnalyticDB for PostgreSQL, ClickHouse SQL, and MySQL nodes. For more information about how to add the outputs parameter, see Configure input and output parameters.
- Only DataWorks Standard Edition or a more advanced edition supports assignment nodes and lets you use the outputs parameter for the EMR Hive, EMR Spark SQL, ODPS Script, Hologres SQL, AnalyticDB for PostgreSQL, and MySQL nodes. For information about how to activate DataWorks, see Purchasing guide.
To make sure that a node that depends on an assignment node can obtain the result set of the outputs parameter, run the workflow that contains both nodes after you configure them.
- For information about how to debug and run a node, see the Debug and run nodes section in this topic.
- After the dependent node references the result set of the outputs parameter, you can commit the dependent node and the assignment node to Operation Center in the development environment and test whether the referenced data is correct. For more information, see the Test the result set passed from an assignment node section in this topic.
  Note
  All nodes that depend on an assignment node can obtain the result set of the outputs parameter from the assignment node. No limits are imposed on the node type. This topic uses ODPS SQL and Shell nodes as examples to describe how to obtain the result set of the outputs parameter from an assignment node.
To try out how an assignment node passes the output of its last statement to its descendant nodes, you can use an extract, transform, and load (ETL) workflow template.
- Only users that are assigned the Workspace Administrator role can import an ETL workflow template to a workspace. For more information about how to assign the Workspace Administrator role, see Manage permissions on workspace-level services.
- For information about how to import an ETL workflow template, see Quick experience with ETL workflow.
- For quick access to the ETL workflow template, click Assignment node application.

How it works

In DataWorks, input and output parameters are used to transmit parameter settings between ancestor and descendant nodes. An assignment node can assign the output of its last statement to the outputs parameter. If a node depends on the assignment node, you can configure the outputs parameter of the assignment node as an input parameter of the current node. This way, the current node can obtain the result set of the outputs parameter from the assignment node.

You cannot modify the outputs parameter of an assignment node. The value of the outputs parameter is determined by the output of the last statement in the code.
For a node to obtain the result set of the outputs parameter from an assignment node, the node must be a level-1 child node of the assignment node, and the outputs parameter must be added to Input Parameters of the node. You can specify a custom name for the added input parameter, such as sql_inputs in the preceding figure.
The output format of the outputs parameter depends on the assignment language: the result set is passed to child nodes as a one-dimensional or two-dimensional array. A child node references the whole result set, or specific data in it, in the ${Parameter name} format.
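The two conditions above can be modeled as a small check. The following is a minimal Python sketch, not a DataWorks API. The `can_read_outputs` function, the `parents` and `input_params` dictionaries, the `<node>.outputs` naming, and the `grandchild` node are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical model of the two conditions for reading an assignment node's
# outputs parameter. Not a DataWorks API; all names are illustrative.


def can_read_outputs(
    parents: Dict[str, Set[str]],
    input_params: Dict[str, Dict[str, str]],
    assigner: str,
    consumer: str,
) -> bool:
    """Return True if `consumer` can reference the outputs of `assigner`.

    parents:      node -> set of nodes it directly depends on
    input_params: node -> {custom input name: "<node>.outputs"} (assumed naming)
    """
    # Condition 1: consumer is a level-1 child, i.e. it depends on assigner directly.
    is_level1_child = assigner in parents.get(consumer, set())
    # Condition 2: assigner's outputs parameter is added to consumer's Input Parameters.
    is_bound = f"{assigner}.outputs" in input_params.get(consumer, {}).values()
    return is_level1_child and is_bound


parents = {
    "fuzhi_sql": {"start"},
    "down_compare": {"fuzhi_sql"},
    "grandchild": {"down_compare"},  # hypothetical level-2 descendant
}
input_params = {
    "down_compare": {"sql_inputs": "fuzhi_sql.outputs"},
    "grandchild": {"sql_inputs": "fuzhi_sql.outputs"},
}

print(can_read_outputs(parents, input_params, "fuzhi_sql", "down_compare"))  # True
print(can_read_outputs(parents, input_params, "fuzhi_sql", "grandchild"))  # False
```

The hypothetical `grandchild` node fails the first condition even though its input parameter is bound. This mirrors the level-1 restriction described in Precautions.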

Procedure of using an assignment node

1. Configure an assignment node: Define the result set of the outputs parameter of the assignment node. In this phase, you select the assignment language and determine the output of the last statement in the code of the assignment node.
2. Configure scheduling dependencies: Configure scheduling dependencies so that a node directly depends on the assignment node.
3. Obtain the result set: A downstream node obtains the result set of the assignment node through Node Context > Add Node Input Parameter and then references it in code in the `${ParameterName}` format. To retrieve specific data from the result set, index it as a one-dimensional or two-dimensional array, based on the assignment language of the assignment node.
  - Example 1 of obtaining the result set passed from an assignment node
  - Example 2 of obtaining the result set passed from an assignment node
4. Debug and run the descendant node: Run the workflow that contains the descendant node to check whether the reference results are as expected.
5. Test the obtained result set: After the descendant node references the result set passed from the assignment node, commit the descendant node and the assignment node to Operation Center in the development environment and test whether the referenced data is correct.

Go to an entry point for creating an assignment node

1. Go to the DataStudio page.
  Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
2. Go to an entry point for creating an assignment node.
  In the Scheduled Workflow pane of the DataStudio page, find the desired workflow and create an assignment node in the workflow. Configure basic information for the node, such as the name and storage path. The following figure shows the entry points.
In this example, three assignment nodes that use the Python, ODPS SQL, and Shell languages are created. The node names are fuzhi_python, fuzhi_sql, and fuzhi_shell.
The output format of the outputs parameter varies based on the assignment language. For more information, see the Output format of the outputs parameter section in this topic.
In addition, you can configure basic properties, time properties, and resource properties for all the nodes as needed. For more information, see Configure basic properties, Configure time properties, and Configure resource properties.

Output format of the outputs parameter

Assignment nodes support the Shell, ODPS SQL, and Python languages, and the output format of the outputs parameter varies based on the assignment language. Descendant nodes receive the result set as a one-dimensional or two-dimensional array and reference it, or specific data in it, in the ${Parameter name} format.

Assignment language	Value of the outputs parameter	Output format of the outputs parameter	Size limit on the value of the outputs parameter
ODPS SQL	The output of the last SELECT statement is used as the value of the outputs parameter, so that other nodes can reference it.	Passed to descendant nodes as a two-dimensional array.	The value cannot exceed 2 MB. If the value exceeds 2 MB, the assignment node fails to run.
Shell	The output of the last echo statement is used as the value of the outputs parameter, so that other nodes can reference it.	Passed to descendant nodes as a one-dimensional array whose elements are separated by commas (,).
Python	The output of the last print statement is used as the value of the outputs parameter, so that other nodes can reference it.	Passed to descendant nodes as a one-dimensional array whose elements are separated by commas (,).
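The table's array shapes can be illustrated with a minimal Python sketch. It assumes that the last echo or print output is a single line and that an ODPS SQL result arrives as rows of string fields. The function names and the sample rows are hypothetical and do not reflect DataWorks internals.

```python
from typing import List

# Hypothetical sketch of how an assignment node's last output maps to the array
# shapes in the table above. Not DataWorks code; the row values are made up.


def shell_or_python_outputs(last_output: str) -> List[str]:
    """Shell/Python: split the last echo/print output on commas -> 1-D array."""
    # echo and print end their output with a newline, so drop it before splitting.
    return last_output.rstrip("\n").split(",")


def odps_sql_outputs(rows: List[List[str]]) -> List[List[str]]:
    """ODPS SQL: the last SELECT result -> 2-D array indexed [row][field]."""
    return [list(row) for row in rows]


one_d = shell_or_python_outputs("a,b,c\n")
print(one_d[1])  # b, the value that ${name[1]} stands for

two_d = odps_sql_outputs([["u001", "Alice", "20191008"], ["u002", "Bob", "20191008"]])
print(two_d[1][2])  # 20191008, the value that ${name[1][2]} stands for
```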

Example 1 of obtaining the result set passed from an assignment node

In this example, draw lines on the canvas to make the start node the ancestor node of all assignment nodes and the down_compare node the descendant node of all assignment nodes. The down_compare node is a Shell node. It references the result sets passed from the assignment nodes fuzhi_sql, fuzhi_python, and fuzhi_shell, or specific data in them, in the ${Parameter name} format, indexing them as one-dimensional or two-dimensional arrays.

Assignment nodes (fuzhi_python, fuzhi_sql, and fuzhi_shell): contain a built-in output parameter named outputs.
Descendant node (down_compare): After you configure the scheduling dependency, add the outputs parameter to Node Context > Input Parameters For This Node. You can specify a custom name for the input parameter.

The following sections describe the procedure details.

ODPS SQL assignment language

This section describes how to configure Output Parameters for fuzhi_sql and Input Parameters for down_compare.

Configure the upstream assignment node.
1. In the desired workflow, find the ODPS SQL node fuzhi_sql and double-click its name.
2. On the configuration tab of fuzhi_sql, select ODPS_SQL for Language and write the assignment code.
  Sample code:
```
select * from xc_dpe_e2.xc_rpt_user_info_d where dt='20191008' limit 10;
```
3. In the right-side navigation pane, click Properties. Then, configure Output Parameters in the Parameters section of the Properties tab.
  fuzhi_sql assigns the output of the code to the outputs parameter.

Configure down_compare.

1. In the desired workflow, find down_compare and double-click its name.
2. On the configuration tab of down_compare, write code.
  Sample code:
```
echo '${sql_inputs}';
echo 'Retrieve the first row of output from the upstream SQL node: '${sql_inputs[0]};
echo 'Retrieve the second row of output from the upstream SQL node: '${sql_inputs[1]};
echo 'Retrieve the second field of the first row from the upstream SQL node output: '${sql_inputs[0][1]};
echo 'Retrieve the third field of the second row from the upstream SQL node output: '${sql_inputs[1][2]};
```
3. In the right-side navigation pane, click Properties. Then, configure Input Parameters in the Parameters section of the Properties tab.
  Add the outputs parameter of fuzhi_sql to Input Parameters for down_compare and rename the parameter sql_inputs.
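The following minimal Python sketch shows what each `${sql_inputs[i][j]}` reference in the sample code stands for by resolving two-dimensional references against a result set. The `resolve_cells` function and the sample rows are hypothetical. The sketch models the substitution described in this topic and is not DataWorks code. Whole-row references such as `${sql_inputs[0]}` are deliberately left untouched because this topic does not specify how a row is rendered.

```python
import re
from typing import List

# Hypothetical resolver for ${name[i][j]} references. It models, but is not,
# the substitution that DataWorks performs before the Shell code runs.
CELL_REF = re.compile(r"\$\{(\w+)\[(\d+)\]\[(\d+)\]\}")


def resolve_cells(code: str, name: str, table: List[List[str]]) -> str:
    """Replace each ${name[i][j]} in `code` with table[i][j]."""

    def substitute(match):
        if match.group(1) != name:
            return match.group(0)  # leave references to other parameters alone
        row, field = int(match.group(2)), int(match.group(3))
        return table[row][field]

    return CELL_REF.sub(substitute, code)


# Made-up result set standing in for the output of the fuzhi_sql query.
sql_result = [["u001", "Alice", "20191008"], ["u002", "Bob", "20191008"]]
line = "echo 'Second field of the first row: '${sql_inputs[0][1]};"
print(resolve_cells(line, "sql_inputs", sql_result))
# echo 'Second field of the first row: 'Alice;
```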

Debug and run the descendant node.

Python assignment language

This section describes how to configure Output Parameters for fuzhi_python and Input Parameters for down_compare.

Configure the upstream assignment node.
1. In the desired workflow, find the Python node fuzhi_python and double-click its name.
2. On the configuration tab of fuzhi_python, select Python for Language and write the assignment code.
  For example:
```
print "a,b,c";
```
3. In the right-side navigation pane, click Properties. Then, configure Output Parameters in the Parameters section of the Properties tab.
  fuzhi_python assigns the output of the code to the outputs parameter. In this example, the output is a,b,c.
  When Python is the assignment language, the output is split by commas (,) into a one-dimensional array that is assigned to the outputs parameter in Output Parameters.
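Only the last statement's output is assigned to the outputs parameter, and this rule can be modeled locally. The following minimal sketch runs with Python 3, whereas assignment nodes themselves use Python 2. It assumes that each print statement emits exactly one line and that blank lines are ignored. `outputs_from_stdout` is a hypothetical helper.

```python
from typing import List

# Hypothetical model of "only the last statement's output is assigned to outputs".
# Assumes each print emits exactly one line; blank lines are skipped.


def outputs_from_stdout(stdout: str) -> List[str]:
    """Keep the last non-empty line of captured stdout and split it on commas."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        return []
    return lines[-1].split(",")


# Two print statements: only the second one determines the outputs parameter.
captured = "checking input\na,b,c\n"
print(outputs_from_stdout(captured))  # ['a', 'b', 'c']
```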
Configure down_compare.
1. In the desired workflow, find down_compare and double-click its name.
2. On the configuration tab of down_compare, write code.
  For example:
```
echo 'Output of the upstream Python node: '${python_inputs};
echo 'First element of the upstream Python node output: '${python_inputs[0]};
echo 'Second element of the upstream Python node output: '${python_inputs[1]};
```
3. In the right-side navigation pane, click Properties. Then, configure Input Parameters in the Parameters section of the Properties tab.
  Add the outputs parameter of fuzhi_python to Input Parameters for down_compare and rename the parameter python_inputs.
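In a one-dimensional array, each `${python_inputs[i]}` reference stands for one comma-separated element. The following minimal Python sketch models that substitution for the `a,b,c` output of fuzhi_python. The same indexing applies to `shell_inputs`. `resolve_items` is a hypothetical helper, not DataWorks code, and the sketch does not model the whole-array reference `${python_inputs}`.

```python
import re
from typing import List

# Hypothetical resolver for ${name[i]} references to a 1-D outputs array.
# It models the substitution only; it is not DataWorks code.
ITEM_REF = re.compile(r"\$\{(\w+)\[(\d+)\]\}")


def resolve_items(code: str, name: str, values: List[str]) -> str:
    """Replace each ${name[i]} in `code` with values[i]."""

    def substitute(match):
        if match.group(1) != name:
            return match.group(0)  # leave references to other parameters alone
        return values[int(match.group(2))]

    return ITEM_REF.sub(substitute, code)


python_inputs = "a,b,c".split(",")  # outputs of fuzhi_python in this example
line = "echo 'Second element of the upstream Python node output: '${python_inputs[1]};"
print(resolve_items(line, "python_inputs", python_inputs))
# echo 'Second element of the upstream Python node output: 'b;
```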
Debug and run the descendant node.

Shell assignment language

This section describes how to configure Output Parameters for fuzhi_shell and Input Parameters for down_compare.

Configure the upstream assignment node.
1. In the desired workflow, find the Shell node fuzhi_shell and double-click its name.
2. On the configuration tab of fuzhi_shell, select SHELL for Language and write the assignment code.
  For example:
```
echo "hello,world";
```
3. In the right-side navigation pane, click Properties. Then, configure Output Parameters in the Parameters section of the Properties tab.
  fuzhi_shell assigns the output of the code to the outputs parameter. In this example, the output is hello,world.
  When the assignment language is Shell, the output is split by commas (,) into a one-dimensional array and assigned to the outputs parameter in Output Parameters For This Node.
Configure down_compare.
1. In the desired workflow, find down_compare and double-click its name.
2. On the configuration tab of down_compare, write code.
  For example:
```
echo 'Output of the upstream Shell node: '${shell_inputs};
echo 'First element of the upstream Shell node output: '${shell_inputs[0]};
echo 'Second element of the upstream Shell node output: '${shell_inputs[1]};
```
3. In the right-side navigation pane, click Properties. Then, configure Input Parameters in the Parameters section of the Properties tab.
  Add the outputs parameter of fuzhi_shell to Input Parameters for down_compare and rename the parameter shell_inputs.
Debug and run the descendant node.

Example 2 of obtaining the result set passed from an assignment node

The following table describes the value assignment cases of the outputs parameter for the assignment nodes that use different languages.


Assignment language	Value of the outputs parameter	Configuration of scheduling parameters for assignment nodes	Configuration of scheduling parameters for descendant nodes	Method for descendant nodes to obtain data	Returned result of descendant nodes
ODPS SQL	Query the fuzhi_tb table. Statement: `SELECT * FROM fuzhi_tb;`.	By default, an output parameter named outputs is generated for an assignment node in the Scheduling Configuration > Node Context section. On the configuration tab of the assignment node, click the icon to commit the assignment node. For more information about how to configure input and output parameters, see Configure input and output parameters.	For example, if the assignment node uses ODPS SQL, configure scheduling dependencies between the assignment node and its descendant nodes, and then add an input parameter named `inputs_odps_sql` in the Parameters section of the Properties tab.	Code that references the data, by descendant node type: ODPS SQL: `select '${inputs_odps_sql[0][0]}';`. Shell: `echo '${inputs_shell[0]}';`. PyODPS 3: `print ('${inputs_python[0]}');`.	Hello
Shell	Example: `echo 'Data','I am assignment node 2 using Shell language';`.				Data
Python	Example: `print "Works!,I am assignment node 3 using Python language";`.				Works!
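A minimal Python sketch can check the Returned result column against the comma-splitting behavior. The table documents only the first cell of the ODPS SQL result (Hello), so the two-dimensional array in the sketch is a placeholder that contains just that cell. In the Shell example, `'Data','I am assignment node 2 using Shell language'` is a single shell word. As a result, echo prints the comma literally and the first element is Data.

```python
# Reproduces the "Returned result" column under the comma-splitting model.
# Only cell [0][0] of the ODPS SQL result ("Hello") is documented, so the
# 2-D array below is a placeholder holding just that cell.

# The adjacent quoted strings 'Data','I am ...' form one shell word, so echo
# prints them joined by a literal comma.
shell_stdout = "Data,I am assignment node 2 using Shell language"
python_stdout = "Works!,I am assignment node 3 using Python language"

inputs_odps_sql = [["Hello"]]             # placeholder 2-D array
inputs_shell = shell_stdout.split(",")    # 1-D array
inputs_python = python_stdout.split(",")  # 1-D array

print(inputs_odps_sql[0][0])  # Hello
print(inputs_shell[0])        # Data
print(inputs_python[0])       # Works!
```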

Debug and run nodes

After a node that depends on an assignment node references the result set passed from the assignment node, double-click the name of the workflow that contains the node to open the configuration tab of the workflow. Then, click the icon in the top toolbar of the configuration tab to run the workflow and check whether the reference results are as expected.

Note

If the descendant node of an assignment node is a for-each node or a do-while node, you must go to Operation Center to run the descendant node and view the reference results.
For best practices of using an assignment node together with a for-each node or a do-while node, see Configure a for-each node and Configure a do-while node.

Test the result set passed from an assignment node

After a node that depends on an assignment node references the result set passed from the assignment node, you can commit the dependent node and the assignment node to the development environment. Then, go to Operation Center in the development environment and backfill data to test whether the obtained result set is correct.