Dataddo offers multiple data replication (extraction) methods across our supported database connectors (including PostgreSQL, MySQL, Microsoft SQL Server, Oracle, and MariaDB). Choosing the right method ensures efficient data syncing, maintains high data integrity, and minimizes performance overhead on your production database.
When configuring your database dataset, you can choose from four distinct extraction strategies based on your table structure and business requirements.
Comparison Overview
| Replication Method | Captures New Rows? | Captures Updated Rows? | Captures Deleted Rows? | Performance Impact | Best For |
|---|---|---|---|---|---|
| Table Replication by Timestamp | Yes | Yes | No | Low to Medium | Frequently edited/updated tables |
| Table Replication by Row ID | Yes | No | No | Low | Append-only logs and transactions |
| Log-based Replication | Yes | Yes | Yes | Very Low | High-volume, real-time syncs |
| Custom SQL Query | Highly Flexible | Highly Flexible | Highly Flexible | Variable | Complex filters and table joins |
Table Replication by Timestamp
This method tracks changes based on a specific date, time, or timestamp column in your database table. During each sync, Dataddo queries only for rows where the timestamp value is newer than the last successful extraction window.
- How it works: Dataddo isolates and loads data within a specific moving time window defined by your configuration.
- Best for: Tables where existing records are regularly updated, modified, or cycled over time (e.g., user profiles, active order statuses, or inventory levels).
To use this method effectively, your table must feature a column (such as last_updated, modified_at, or timestamp) that automatically updates via database triggers or application logic every time a row is modified.
Table Replication by Row ID
This method tracks changes using a sequential, continuously increasing numerical value. Dataddo stores the highest pointer value (high-watermark ID) from the previous sync and starts the next extraction from that exact point.
- How it works: Dataddo checks for any row IDs greater than the last known index (
WHERE id > last_stored_id), ensuring seamless data continuity. - Best for: Append-only tables where new records are constantly inserted, but historical records are never changed or deleted (e.g., transaction histories, immutable system logs, or event streams).
This method cannot capture updates made to historical rows. If an existing row with an older ID is modified, Dataddo will not sync the change. If your data is subject to post-insert modifications, use Table Replication by Timestamp or Log-based Replication instead.
Log-based Replication
Log-based replication utilizes Change Data Capture (CDC) to track data engine events directly from the database's internal transaction log rather than querying the active storage tables.
- How it works: It acts as an event-driven system, instantaneously reading the log files to capture every
INSERT,UPDATE, andDELETEaction. - Best for: High-volume, mission-critical tables where near-real-time syncing is required, or where running heavy analytical queries causes too much performance degradation on the production master instance.
Custom SQL Query
For advanced scenarios, Dataddo allows you to completely bypass automated replication logic and write a custom SQL statement to control exactly how your data is extracted from the source engine.
- How it works: Dataddo executes your exact custom script during the scheduled extraction window. You can leverage dynamic parameters like
{{dateRangeStart}}and{{dateRangeStop}}to implement manual incremental logic within your text query. - Best for: Complex data architectural requirements, such as joining multiple relational tables together, applying sophisticated multi-conditional
WHEREclauses, or pre-aggregating metrics before they reach your destination storage.
Ensure your custom query is highly optimized. Poorly indexed queries or heavy, unindexed joins can result in slow extraction times or timeout errors on large datasets, affecting overall pipeline performance.