DataWorks provides Hologres Reader and Hologres Writer for you to read data from and write data to Hologres data sources. This topic describes the capabilities of synchronizing data from or to Hologres data sources.
Supported Hologres versions
Hologres V0.7, V0.8, V0.9, V0.10, V1.1, V1.2, and V1.3
Limits
Batch data read and write
Hologres data sources support serverless resource groups (recommended) and exclusive resource groups for Data Integration.
Hologres Writer cannot write data to Hologres foreign tables.
The endpoint of a Hologres data source is obtained based on the following logic when a network connection is established between the Hologres data source and a resource group:
If the Hologres data source resides in the same region as the resource group, the endpoint of the Hologres data source is obtained based on a region-local order of precedence.
If the Hologres data source resides in a different region from the resource group, a different order of precedence is used to obtain the endpoint.
Real-time data write in a database
You can use serverless resource groups (recommended) or exclusive resource groups for Data Integration to run real-time synchronization tasks.
A real-time synchronization task cannot be used to synchronize data from a table that has no primary key.
Supported data types
Data type | Hologres Reader for batch data read | Hologres Writer for batch data write | Hologres Writer for real-time data write |
UUID | Not supported | Not supported | Not supported |
CHAR | Supported | Supported | Supported |
NCHAR | Supported | Supported | Supported |
VARCHAR | Supported | Supported | Supported |
LONGVARCHAR | Supported | Supported | Supported |
NVARCHAR | Supported | Supported | Supported |
LONGNVARCHAR | Supported | Supported | Supported |
CLOB | Supported | Supported | Supported |
NCLOB | Supported | Supported | Supported |
SMALLINT | Supported | Supported | Supported |
TINYINT | Supported | Supported | Supported |
INTEGER | Supported | Supported | Supported |
BIGINT | Supported | Supported | Supported |
NUMERIC | Supported | Supported | Supported |
DECIMAL | Supported | Supported | Supported |
FLOAT | Supported | Supported | Supported |
REAL | Supported | Supported | Supported |
DOUBLE | Supported | Supported | Supported |
TIME | Supported | Supported | Supported |
DATE | Supported | Supported | Supported |
TIMESTAMP | Supported | Supported | Supported |
BINARY | Supported | Supported | Supported |
VARBINARY | Supported | Supported | Supported |
BLOB | Supported | Supported | Supported |
LONGVARBINARY | Supported | Supported | Supported |
BOOLEAN | Supported | Supported | Supported |
BIT | Supported | Supported | Supported |
JSON | Supported | Supported | Supported |
JSONB | Supported | Supported | Supported |
How Hologres Reader and Hologres Writer work
Batch data read
Hologres Reader reads data from Hologres tables by using PostgreSQL statements. The number of parallel threads that are used to read data is based on the number of shards in the Hologres table from which you want to read data. One SELECT statement is executed for each shard.
When you execute the CREATE TABLE statement to create a table in Hologres, you can use the CALL set_table_property('table_name', 'shard_count', 'xx') command to configure the number of shards for the table. By default, the shard_count property is set to the default number of table shards for your Hologres database, which is determined by the specifications of your Hologres instance.
Each SELECT statement queries the data of a single shard, which is identified by the built-in hg_shard_id field of the source Hologres table.
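The following SQL is a minimal sketch of this behavior. The table name, columns, and shard count are illustrative and are not part of the samples elsewhere in this topic:
begin;
create table holo_reader_shard_demo(
    id int not null,
    title text,
    primary key (id));
-- Illustrative: configure 4 shards for the table at creation time.
call set_table_property('holo_reader_shard_demo', 'shard_count', '4');
commit;

-- Conceptually, Hologres Reader issues one SELECT per shard. For shard 0:
select id, title from holo_reader_shard_demo where hg_shard_id = 0;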
Batch data write
Hologres Writer obtains data from a reader and writes data to Hologres based on the conflict processing mode that is configured for the related batch synchronization task.
You can configure a conflict processing mode for the batch synchronization task to specify how conflicting data is handled when a primary key conflict occurs.
A conflict processing mode can be configured only for a table that has a primary key. For information about how data is written to Hologres when a primary key conflict occurs and about data write performance, see How it works.
If you configure the replace mode for the batch synchronization task, new data overwrites the existing data and all fields in the row are overwritten. Fields that are not mapped to source fields are set to NULL.
If you configure the update mode for the batch synchronization task, new data overwrites the existing data, but only the mapped fields are overwritten.
If you configure the ignore mode for the batch synchronization task, the existing data is retained and the new data is discarded.
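The three modes map conceptually to PostgreSQL-style upsert statements. The following sketch is for illustration only; the table and column names are hypothetical, and the actual statements issued by Hologres Writer may differ:
-- replace: the whole row is overwritten; unmapped fields are written as NULL
insert into holo_sink_demo (tag, id, title, body)
values ('foo', 1, 'new title', null)
on conflict (tag, id) do update
set title = excluded.title, body = excluded.body;

-- update: only the mapped fields are overwritten
insert into holo_sink_demo (tag, id, title)
values ('foo', 1, 'new title')
on conflict (tag, id) do update
set title = excluded.title;

-- ignore: the existing row is kept and the new row is discarded
insert into holo_sink_demo (tag, id, title)
values ('foo', 1, 'new title')
on conflict (tag, id) do nothing;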
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a synchronization task
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Configure a real-time synchronization task to synchronize data of a single table or synchronize all data in a database
For more information about the configuration procedure, see Configure a real-time synchronization task in DataStudio.
Configure synchronization settings to implement real-time synchronization of full data or incremental data in a single table or a database
For more information about the configuration procedure, see Configure a synchronization task in Data Integration.
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Code for Hologres Reader
Read data from a non-partitioned Hologres table
In the following code, a synchronization task is configured to read data from a non-partitioned Hologres table:
{ "transform": false, "type": "job", "version": "2.0", "steps": [ { "stepType": "holo", "parameter": { "datasource": "holo_db", "envType": 1, "column": [ "tag", "id", "title", "body" ], "where": "", "table": "holo_reader_basic_src" }, "name": "Reader", "category": "reader" }, { "stepType": "stream", "parameter": { "print": false, "fieldDelimiter": "," }, "name": "Writer", "category": "writer" } ], "setting": { "executeMode": null, "failoverEnable": null, "errorLimit": { "record": "0" }, "speed": { "concurrent": 2, "throttle": false } }, "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] } }
The following sample DDL statements are used to create a non-partitioned Hologres table:
begin;
drop table if exists holo_reader_basic_src;
create table holo_reader_basic_src(
    tag text not null,
    id int not null,
    title text not null,
    body text,
    primary key (tag, id));
call set_table_property('holo_reader_basic_src', 'orientation', 'column');
call set_table_property('holo_reader_basic_src', 'shard_count', '3');
commit;
Read data from a partition in a partitioned Hologres table
In the following code, a synchronization task is configured to read data from a single partition of a partitioned Hologres table.
Note: Exercise caution when you configure the partition parameter.
{ "transform": false, "type": "job", "version": "2.0", "steps": [ { "stepType": "holo", "parameter": { "selectedDatabase": "public", "partition": "tag=foo", "datasource": "holo_db", "envType": 1, "column": [ "tag", "id", "title", "body" ], "tableComment": "", "where": "", "table": "public.holo_reader_basic_part_src" }, "name": "Reader", "category": "reader" }, { "stepType":"stream", "parameter":{}, "name":"Writer", "category":"writer" } ], "setting":{ "errorLimit":{ "record":"0" }, "speed":{ "throttle":true, "concurrent":1, "mbps":"12" } }, "order":{ "hops":[ { "from":"Reader", "to":"Writer" } ] } }
The following sample DDL statements are used to create a partitioned Hologres table:
begin;
drop table if exists holo_reader_basic_part_src;
create table holo_reader_basic_part_src(
    tag text not null,
    id int not null,
    title text not null,
    body text,
    primary key (tag, id))
partition by list( tag );
call set_table_property('holo_reader_basic_part_src', 'orientation', 'column');
call set_table_property('holo_reader_basic_part_src', 'shard_count', '3');
commit;

create table holo_reader_basic_part_src_1583161774228 partition of holo_reader_basic_part_src for values in ('foo');

-- Make sure that the child partitioned table from which you want to read data is created and data is inserted into the child partitioned table.
postgres=# \d+ holo_reader_basic_part_src
                 Table "public.holo_reader_basic_part_src"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 tag    | text    |           | not null |         | extended |              |
 id     | integer |           | not null |         | plain    |              |
 title  | text    |           | not null |         | extended |              |
 body   | text    |           |          |         | extended |              |
Partition key: LIST (tag)
Indexes:
    "holo_reader_basic_part_src_pkey" PRIMARY KEY, btree (tag, id)
Partitions: holo_reader_basic_part_src_1583161774228 FOR VALUES IN ('foo')
Parameters in code for Hologres Reader
Parameter | Description | Required | Default value |
database | The name of the source database in the Hologres instance. | Yes | No default value |
table | The name of the table from which you want to read data. If the table is a partitioned table, specify the name of the table instead of the name of the partition from which you want to read data. | Yes | No default value |
column | The names of the columns from which you want to read data. The names of the primary key columns in the source table must be included. If you want to read data from all columns in the source table, set this parameter to ["*"]. | Yes | No default value |
partition | The partition key column and the related value of the source table, in the format of partition_column=partition_value. Example: tag=foo. Configure this parameter only if the source table is a partitioned table. | No | Left empty, which indicates that the source table is a non-partitioned table |
Code for Hologres Writer
Write data to a non-partitioned table
In the following code, a synchronization task is configured to write the data obtained from MySQL to a non-partitioned Hologres table in JDBC mode:
{ "type": "job", "version": "2.0", "steps": [ { "stepType": "mysql", "parameter": { "envType": 0, "useSpecialSecret": false, "column": [ "<column1>", "<column2>", ......, "<columnN>" ], "tableComment": "", "connection": [ { "datasource": "<mysql_source_name>",//MySQL data source name "table": [ "<mysql_table_name>" ] } ], "where": "", "splitPk": "", "encoding": "UTF-8" }, "name": "Reader", "category": "reader" }, { "stepType": "holo", "parameter": { "selectedDatabase":"public", "schema": "public", "maxConnectionCount": 9, "truncate":true,//Cleanup rule "datasource": "<holo_sink_name>",//Hologres data sink name "conflictMode": "ignore", "envType": 0, "column": [ "<column1>", "<column2>", ......, "<columnN>" ], "tableComment": "", "table": "<holo_table_name>" }, "name": "Writer", "category": "writer" } ], "setting": { "executeMode": null, "errorLimit": { "record": "0" }, "locale": "zh_CN", "speed": { "concurrent": 2,//Number of concurrent jobs "throttle": false } }, "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] } }
The following sample DDL statements are used to create a non-partitioned Hologres table:
begin;
drop table if exists mysql_to_holo_test;
create table mysql_to_holo_test(
    tag text not null,
    id int not null,
    body text not null,
    birth date,
    primary key (tag, id));
call set_table_property('mysql_to_holo_test', 'orientation', 'column');
call set_table_property('mysql_to_holo_test', 'distribution_key', 'id');
call set_table_property('mysql_to_holo_test', 'clustering_key', 'birth');
commit;
Write data to a partitioned table
Note: Hologres supports only list partitioning. You can specify only one column as the partition key column, and the data type of the partition key column must be INT4 or TEXT.
The value of the partition parameter must match the partition filter expression in the DDL statements that are used to create the destination Hologres table.
In the following code, a synchronization task is configured to write the data obtained from MySQL to a child partitioned table of a partitioned Hologres table:
{ "type": "job", "version": "2.0", "steps": [ { "stepType": "mysql", "parameter": { "envType": 0, "useSpecialSecret": false, "column": [ "<column1>", "<column2>", ......, "<columnN>" ], "tableComment": "", "connection": [ { "datasource": "<mysql_source_name>", "table": [ "<mysql_table_name>" ] } ], "where": "", "splitPk": "<mysql_pk>",//PK field in MySQL "encoding": "UTF-8" }, "name": "Reader", "category": "reader" }, { "stepType": "holo", "parameter": { "selectedDatabase": "public", "maxConnectionCount": 9, "partition": "<partition_key>",//Partition key in Hologres "truncate": "false", "datasource": "<holo_sink_name>",//Hologres data sink name "conflictMode": "ignore", "envType": 0, "column": [ "<column1>", "<column2>", ......, "<columnN>" ], "tableComment": "", "table": "<holo_table_name>" }, "name": "Writer", "category": "writer" } ], "setting": { "executeMode": null, "failoverEnable": null, "errorLimit": { "record": "0" }, "speed": { "concurrent": 2,//Number of concurrent jobs "throttle": false } }, "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] } }
The following sample DDL statements are used to create a partitioned Hologres table:
BEGIN;
CREATE TABLE public.hologres_parent_table(
    a text,
    b int,
    c timestamp,
    d text,
    ds text,
    primary key(ds, b)
)
PARTITION BY LIST(ds);
CALL set_table_property('public.hologres_parent_table', 'orientation', 'column');
CREATE TABLE public.holo_child_1 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201215');
CREATE TABLE public.holo_child_2 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201216');
CREATE TABLE public.holo_child_3 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201217');
COMMIT;
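As an illustration of how the partition parameter maps to the DDL above, a value such as ds=20201215 (assuming the same column=value form that the Hologres Reader sample uses) targets the child table public.holo_child_1, which was created with FOR VALUES IN('20201215'). The following queries are one way to check that the written rows landed in the expected child partition:
-- Count the rows in the child partition for ds = '20201215'.
select count(*) from public.holo_child_1;

-- The same rows are visible through the parent table when filtered on the partition key.
select count(*) from public.hologres_parent_table where ds = '20201215';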
Parameters in code for Hologres Writer
Parameter | Description | Required | Default value |
database | The name of the destination database in the Hologres instance. | Yes | No default value |
table | The name of the Hologres table to which you want to write data. You can specify the table name in the format of schema_name.table_name. | Yes | No default value |
conflictMode | The conflict processing mode. Valid values: replace, update, and ignore. For more information, see How Hologres Reader and Hologres Writer work. | Yes | No default value |
column | The names of the columns to which you want to write data. The names of the primary key columns in the destination Hologres table must be included. If you want to write data to all the columns in the destination Hologres table, set this parameter to ["*"]. | Yes | No default value |
partition | The partition key column and the related value in the destination Hologres table. Configure this parameter in the format of partition_column=partition_value. Example: ds=20201215. Configure this parameter only if the destination Hologres table is a partitioned table. | No | Left empty, which indicates that the destination Hologres table is a non-partitioned table |
truncate | Specifies whether to delete the existing data in the destination Hologres table before Hologres Writer writes data to the table. Valid values: true and false. | No | false |