Documentation Index

Fetch the complete documentation index at: https://docs.dataddo.com/llms.txt

Use this file to discover all available pages before exploring further.

Database Replication

Prev Next

Dataddo offers multiple data replication (extraction) methods across our supported database connectors (including PostgreSQL, MySQL, Microsoft SQL Server, Oracle, and MariaDB). Choosing the right method ensures efficient data syncing, maintains high data integrity, and minimizes performance overhead on your production database.

When configuring your database dataset, you can choose from four distinct extraction strategies based on your table structure and business requirements.

Comparison Overview

Replication Method Captures New Rows? Captures Updated Rows? Captures Deleted Rows? Performance Impact Best For
Table Replication by Timestamp Yes Yes No Low to Medium Frequently edited/updated tables
Table Replication by Row ID Yes No No Low Append-only logs and transactions
Log-based Replication Yes Yes Yes Very Low High-volume, real-time syncs
Custom SQL Query Highly Flexible Highly Flexible Highly Flexible Variable Complex filters and table joins

Table Replication by Timestamp

This method tracks changes based on a specific date, time, or timestamp column in your database table. During each sync, Dataddo queries only for rows where the timestamp value is newer than the last successful extraction window.

  • How it works: Dataddo isolates and loads data within a specific moving time window defined by your configuration.
  • Best for: Tables where existing records are regularly updated, modified, or cycled over time (e.g., user profiles, active order statuses, or inventory levels).
DATADDO TIP

To use this method effectively, your table must feature a column (such as last_updated, modified_at, or timestamp) that automatically updates via database triggers or application logic every time a row is modified.

Table Replication by Row ID

This method tracks changes using a sequential, continuously increasing numerical value. Dataddo stores the highest pointer value (high-watermark ID) from the previous sync and starts the next extraction from that exact point.

  • How it works: Dataddo checks for any row IDs greater than the last known index (WHERE id > last_stored_id), ensuring seamless data continuity.
  • Best for: Append-only tables where new records are constantly inserted, but historical records are never changed or deleted (e.g., transaction histories, immutable system logs, or event streams).

This method cannot capture updates made to historical rows. If an existing row with an older ID is modified, Dataddo will not sync the change. If your data is subject to post-insert modifications, use Table Replication by Timestamp or Log-based Replication instead.

Log-based Replication

Log-based replication utilizes Change Data Capture (CDC) to track data engine events directly from the database's internal transaction log rather than querying the active storage tables.

  • How it works: It acts as an event-driven system, instantaneously reading the log files to capture every INSERT, UPDATE, and DELETE action.
  • Best for: High-volume, mission-critical tables where near-real-time syncing is required, or where running heavy analytical queries causes too much performance degradation on the production master instance.

Custom SQL Query

For advanced scenarios, Dataddo allows you to completely bypass automated replication logic and write a custom SQL statement to control exactly how your data is extracted from the source engine.

  • How it works: Dataddo executes your exact custom script during the scheduled extraction window. You can leverage dynamic parameters like {{dateRangeStart}} and {{dateRangeStop}} to implement manual incremental logic within your text query.
  • Best for: Complex data architectural requirements, such as joining multiple relational tables together, applying sophisticated multi-conditional WHERE clauses, or pre-aggregating metrics before they reach your destination storage.
DATADDO TIP

Ensure your custom query is highly optimized. Poorly indexed queries or heavy, unindexed joins can result in slow extraction times or timeout errors on large datasets, affecting overall pipeline performance.