Dataflow managed I/O for databases

Dataflow supports reading from and writing to databases using Java Database Connectivity (JDBC). Dataflow uses the Apache Beam JdbcIO connector to connect to JDBC data sources and sinks. Managed I/O for Databases supports standard JDBC drivers for various databases, including PostgreSQL, MySQL, and Microsoft SQL Server.

Requirements

The following SDKs support managed I/O for databases:

  • Apache Beam SDK for Java version 2.69.0 or later
  • Apache Beam SDK for Python version 2.69.0 or later

Configuration

Managed I/O for Databases supports the following configuration parameters:

POSTGRES Read

Configuration Type Description
jdbc_url str Connection URL for the JDBC source.
connection_properties str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
fetch_size int32 This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location str Name of the table to read from.
num_partitions int32 The number of partitions
output_parallelization boolean Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column str Name of a column of numeric type that will be used for partitioning.
password str Password for the JDBC source.
read_query str SQL query used to query the JDBC source.
username str Username for the JDBC source.

POSTGRES Write

Configuration Type Description
jdbc_url str Connection URL for the JDBC sink.
autosharding boolean If true, enables using a dynamically determined number of shards to write.
batch_size int64 n/a
connection_properties str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location str Name of the table to write to.
password str Password for the JDBC source.
username str Username for the JDBC source.
write_statement str SQL query used to insert records into the JDBC sink.

MYSQL Read

Configuration Type Description
jdbc_url str Connection URL for the JDBC source.
connection_init_sql list[str] Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit boolean Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
fetch_size int32 This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location str Name of the table to read from.
num_partitions int32 The number of partitions
output_parallelization boolean Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column str Name of a column of numeric type that will be used for partitioning.
password str Password for the JDBC source.
read_query str SQL query used to query the JDBC source.
username str Username for the JDBC source.

MYSQL Write

Configuration Type Description
jdbc_url str Connection URL for the JDBC sink.
autosharding boolean If true, enables using a dynamically determined number of shards to write.
batch_size int64 n/a
connection_init_sql list[str] Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location str Name of the table to write to.
password str Password for the JDBC source.
username str Username for the JDBC source.
write_statement str SQL query used to insert records into the JDBC sink.

SQLSERVER Read

Configuration Type Description
jdbc_url str Connection URL for the JDBC source.
connection_properties str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit boolean Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
fetch_size int32 This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location str Name of the table to read from.
num_partitions int32 The number of partitions
output_parallelization boolean Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column str Name of a column of numeric type that will be used for partitioning.
password str Password for the JDBC source.
read_query str SQL query used to query the JDBC source.
username str Username for the JDBC source.

SQLSERVER Write

Configuration Type Description
jdbc_url str Connection URL for the JDBC sink.
autosharding boolean If true, enables using a dynamically determined number of shards to write.
batch_size int64 n/a
connection_properties str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location str Name of the table to write to.
password str Password for the JDBC source.
username str Username for the JDBC source.
write_statement str SQL query used to insert records into the JDBC sink.

What's next

For more information and code examples, see the following documents: