# Spring Boot + MyBatis-Plus + Flyway: Bulk SQL Server to PostgreSQL Data Migration
For large-volume data migration we need to optimize throughput and guard against memory problems. Below is a complete solution:
## 1. Optimized Dependency Configuration (pom.xml)
```xml
<dependencies>
    <!-- Spring Boot starters -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-jdbc</artifactId>
    </dependency>
    <!-- MyBatis-Plus -->
    <dependency>
        <groupId>com.baomidou</groupId>
        <artifactId>mybatis-plus-boot-starter</artifactId>
        <version>3.5.3.1</version>
    </dependency>
    <!-- Database drivers -->
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>com.microsoft.sqlserver</groupId>
        <artifactId>mssql-jdbc</artifactId>
        <scope>runtime</scope>
    </dependency>
    <!-- Flyway -->
    <dependency>
        <groupId>org.flywaydb</groupId>
        <artifactId>flyway-core</artifactId>
    </dependency>
    <!-- Batch-processing support (optional for the JdbcTemplate approach below) -->
    <dependency>
        <groupId>org.springframework.batch</groupId>
        <artifactId>spring-batch-core</artifactId>
    </dependency>
    <!-- Connection pool (already transitive via spring-boot-starter-jdbc) -->
    <dependency>
        <groupId>com.zaxxer</groupId>
        <artifactId>HikariCP</artifactId>
    </dependency>
    <!-- Utilities -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
```
## 2. Data Source Configuration (application.yml)
```yaml
spring:
  datasource:
    source:
      jdbc-url: jdbc:sqlserver://localhost:1433;databaseName=source_db
      username: sa
      password: your_password
      driver-class-name: com.microsoft.sqlserver.jdbc.SQLServerDriver
      # @ConfigurationProperties binds these straight onto HikariDataSource,
      # so pool settings sit directly under this prefix (no "hikari:" sub-key)
      maximum-pool-size: 10
      connection-timeout: 30000
    target:
      jdbc-url: jdbc:postgresql://localhost:5432/target_db
      username: postgres
      password: postgres
      driver-class-name: org.postgresql.Driver
      maximum-pool-size: 15
      connection-timeout: 30000
  flyway:
    enabled: true
    locations: classpath:db/migration
    baseline-on-migrate: true
    # Migrate the target data source only
    url: ${spring.datasource.target.jdbc-url}
    user: ${spring.datasource.target.username}
    password: ${spring.datasource.target.password}
```
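Because `spring.flyway.url` is set explicitly, Spring Boot's Flyway auto-configuration opens its own connection from these properties rather than reusing the `@Primary` data source, so schema migration touches only the PostgreSQL side.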
## 3. Multi-Data-Source Configuration
```java
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.jdbc.core.JdbcTemplate;
import javax.sql.DataSource;

@Configuration
public class DataSourceConfig {

    @Bean(name = "sourceDataSource")
    @ConfigurationProperties(prefix = "spring.datasource.source")
    public DataSource sourceDataSource() {
        return DataSourceBuilder.create().type(HikariDataSource.class).build();
    }

    @Primary
    @Bean(name = "targetDataSource")
    @ConfigurationProperties(prefix = "spring.datasource.target")
    public DataSource targetDataSource() {
        return DataSourceBuilder.create().type(HikariDataSource.class).build();
    }

    @Bean(name = "sourceJdbcTemplate")
    public JdbcTemplate sourceJdbcTemplate(@Qualifier("sourceDataSource") DataSource dataSource) {
        return new JdbcTemplate(dataSource);
    }

    @Bean(name = "targetJdbcTemplate")
    public JdbcTemplate targetJdbcTemplate(@Qualifier("targetDataSource") DataSource dataSource) {
        return new JdbcTemplate(dataSource);
    }
}
```
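The `mybatis-plus-boot-starter` from section 1 auto-configures a single `SqlSessionFactory`, which is ambiguous once two data sources exist. If MyBatis-Plus mappers are needed (for example against the target database), a minimal sketch of the explicit wiring follows; the class name `MybatisPlusConfig` is illustrative:

```java
import com.baomidou.mybatisplus.extension.spring.MybatisSqlSessionFactoryBean;
import org.apache.ibatis.session.SqlSessionFactory;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import javax.sql.DataSource;

@Configuration
public class MybatisPlusConfig {

    // Bind MyBatis-Plus explicitly to the target (PostgreSQL) data source
    @Bean
    public SqlSessionFactory sqlSessionFactory(
            @Qualifier("targetDataSource") DataSource dataSource) throws Exception {
        MybatisSqlSessionFactoryBean factoryBean = new MybatisSqlSessionFactoryBean();
        factoryBean.setDataSource(dataSource);
        return factoryBean.getObject();
    }
}
```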
## 4. Bulk Migration Service
```java
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;
import org.springframework.util.StopWatch;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Slf4j
@Service
public class BulkDataMigrationService {

    private static final int BATCH_SIZE = 5000;

    private final JdbcTemplate sourceJdbcTemplate;
    private final JdbcTemplate targetJdbcTemplate;
    private final TransactionTemplate targetTxTemplate;

    public BulkDataMigrationService(
            @Qualifier("sourceJdbcTemplate") JdbcTemplate sourceJdbcTemplate,
            @Qualifier("targetJdbcTemplate") JdbcTemplate targetJdbcTemplate,
            @Qualifier("targetTransactionManager") PlatformTransactionManager targetTxManager) {
        this.sourceJdbcTemplate = sourceJdbcTemplate;
        this.targetJdbcTemplate = targetJdbcTemplate;
        this.targetTxTemplate = new TransactionTemplate(targetTxManager);
    }

    public void migrateTable(String tableName, List<String> columnNames) {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        log.info("Starting migration of table: {}", tableName);

        // 1. Count the source rows
        long totalCount = sourceJdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM " + tableName, Long.class);
        log.info("Table {} contains {} rows", tableName, totalCount);

        // 2. Migrate in batches
        long migratedCount = 0;
        int offset = 0;
        String columns = String.join(", ", columnNames);
        String placeholders = columnNames.stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));

        while (offset < totalCount) {
            // Page through the source to avoid loading everything into memory.
            // Note: OFFSET-FETCH requires an ORDER BY; "(SELECT NULL)" compiles
            // but does not guarantee a stable order across pages -- order by a
            // unique key in production.
            List<Map<String, Object>> batchData = sourceJdbcTemplate.queryForList(
                    "SELECT " + columns + " FROM " + tableName +
                    " ORDER BY (SELECT NULL) OFFSET ? ROWS FETCH NEXT ? ROWS ONLY",
                    offset, BATCH_SIZE);
            if (batchData.isEmpty()) {
                break;
            }

            // Bulk-insert the batch into the target database
            batchInsert(tableName, columns, placeholders, columnNames, batchData);

            migratedCount += batchData.size();
            offset += BATCH_SIZE;
            log.info("Migrated {}/{} rows ({}%)",
                    migratedCount, totalCount,
                    String.format("%.2f", migratedCount * 100.0 / totalCount));
        }

        stopWatch.stop();
        log.info("Finished migrating table {}: {} rows in {}s",
                tableName, migratedCount, stopWatch.getTotalTimeSeconds());
    }

    private void batchInsert(String tableName, String columns, String placeholders,
                             List<String> columnNames, List<Map<String, Object>> batchData) {
        String sql = String.format("INSERT INTO %s (%s) VALUES (%s)",
                tableName, columns, placeholders);
        // @Transactional on a private, self-invoked method is never proxied by
        // Spring, so each batch is wrapped in an explicit transaction instead
        targetTxTemplate.executeWithoutResult(status ->
                targetJdbcTemplate.batchUpdate(sql, batchData.stream()
                        .map(row -> columnNames.stream()
                                .map(col -> convertDataType(row.get(col)))
                                .toArray())
                        .collect(Collectors.toList())));
    }

    // Hook for source-specific type conversions. Most values (Timestamp,
    // String, numerics, byte[]) bind to PostgreSQL natively and pass through.
    // Extend for special cases, e.g. SQL Server DATETIMEOFFSET values
    // (microsoft.sql.DateTimeOffset) can be converted via getOffsetDateTime().
    private Object convertDataType(Object value) {
        return value;
    }
}
```
## 5. Flyway Migration Script (V1__Create_schema.sql)
```sql
-- PostgreSQL schema for the target database
CREATE TABLE IF NOT EXISTS users (
    id BIGSERIAL PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100),
    age INTEGER,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
-- Create the remaining tables...
-- Mind the data type differences between SQL Server and PostgreSQL
```
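For reference, a few common mappings when porting the DDL (not exhaustive; verify against the actual schema):

| SQL Server | PostgreSQL |
|---|---|
| `DATETIME` / `DATETIME2` | `TIMESTAMP` |
| `DATETIMEOFFSET` | `TIMESTAMPTZ` |
| `NVARCHAR(n)` / `NTEXT` | `VARCHAR(n)` / `TEXT` |
| `UNIQUEIDENTIFIER` | `UUID` |
| `BIT` | `BOOLEAN` |
| `MONEY` | `NUMERIC(19,4)` |
| `TINYINT` | `SMALLINT` |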
## 6. Main Application Class
```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import java.util.Arrays;

@SpringBootApplication
public class BulkDataMigrationApplication {

    public static void main(String[] args) {
        SpringApplication.run(BulkDataMigrationApplication.class, args);
    }

    @Bean
    public CommandLineRunner run(BulkDataMigrationService migrationService) {
        return args -> {
            // Configure the tables and columns to migrate
            migrationService.migrateTable("users",
                    Arrays.asList("username", "email", "age", "created_at", "updated_at"));
            // Add more tables as needed
            // migrationService.migrateTable("products", ...);
        };
    }
}
```
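Note that Spring Boot runs Flyway migrations during startup, before any `CommandLineRunner` executes, so the target schema from section 5 is guaranteed to exist when the data migration begins.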
## 7. Transaction Manager Configuration
```java
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;
import javax.sql.DataSource;

@Configuration
public class TransactionManagerConfig {

    @Bean(name = "targetTransactionManager")
    public PlatformTransactionManager targetTransactionManager(
            @Qualifier("targetDataSource") DataSource targetDataSource) {
        return new DataSourceTransactionManager(targetDataSource);
    }
}
```
```
## Optimization Strategies for Bulk Migration
1. **Batched reads**: page the source with OFFSET-FETCH queries to avoid running out of memory
2. **Bulk inserts**: use JdbcTemplate's batchUpdate method (see the driver-level sketch after this list)
3. **Connection pooling**: size the HikariCP pools appropriately
4. **Parallelism**: migrate unrelated tables concurrently
5. **Type conversion**: handle the data type differences between SQL Server and PostgreSQL
6. **Error recovery**: record how far each table has migrated so work can resume from a checkpoint
7. **Monitoring**: track migration speed with StopWatch
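For strategy 2, one driver-level tweak is worth knowing: the PostgreSQL JDBC driver's `reWriteBatchedInserts=true` URL parameter collapses `batchUpdate` batches into multi-row INSERT statements, often a large throughput win. The flag can simply be appended to the `jdbc-url` in `application.yml`; below is a programmatic sketch of the same target pool (values mirror section 2, and the class name is illustrative):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class TargetPoolFactory {

    // Sketch only: reWriteBatchedInserts=true lets the PostgreSQL driver
    // collapse JdbcTemplate.batchUpdate() batches into multi-row INSERTs
    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(
                "jdbc:postgresql://localhost:5432/target_db?reWriteBatchedInserts=true");
        config.setUsername("postgres");
        config.setPassword("postgres");
        config.setMaximumPoolSize(15);
        return new HikariDataSource(config);
    }
}
```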
## Extended Features
### 1. Checkpointing (Resumable Migration)
```java
import lombok.Data;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import java.util.List;

@Data
public class MigrationState {
    private String tableName;
    private long lastMigratedId;
    private int migratedCount;
}

@Service
public class MigrationStateService {

    private final JdbcTemplate targetJdbcTemplate;

    public MigrationStateService(@Qualifier("targetJdbcTemplate") JdbcTemplate targetJdbcTemplate) {
        this.targetJdbcTemplate = targetJdbcTemplate;
    }

    // The migration_state table (table_name unique, last_id, count) should be
    // created by a Flyway script alongside the schema
    public void saveState(MigrationState state) {
        targetJdbcTemplate.update(
            "INSERT INTO migration_state (table_name, last_id, count) VALUES (?, ?, ?) " +
            "ON CONFLICT (table_name) DO UPDATE SET last_id = EXCLUDED.last_id, count = EXCLUDED.count",
            state.getTableName(), state.getLastMigratedId(), state.getMigratedCount());
    }

    public MigrationState loadState(String tableName) {
        // Load the checkpoint; null means the table has not been migrated yet
        List<MigrationState> states = targetJdbcTemplate.query(
            "SELECT table_name, last_id, count FROM migration_state WHERE table_name = ?",
            (rs, rowNum) -> {
                MigrationState s = new MigrationState();
                s.setTableName(rs.getString("table_name"));
                s.setLastMigratedId(rs.getLong("last_id"));
                s.setMigratedCount(rs.getInt("count"));
                return s;
            }, tableName);
        return states.isEmpty() ? null : states.get(0);
    }
}
```
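A minimal resume sketch follows, assuming an ascending integer primary key `id` on the source table; `sourceJdbcTemplate` and `stateService` are the beans defined above, and the method name is illustrative. Keyset pagination also fixes the unstable `ORDER BY (SELECT NULL)` paging noted in section 4:

```java
// Resume from the last checkpoint using keyset pagination: unlike OFFSET,
// "WHERE id > ?" stays correct even after a restart
public void resumeMigration(String tableName) {
    MigrationState state = stateService.loadState(tableName);
    long lastId = (state != null) ? state.getLastMigratedId() : 0L;
    List<Map<String, Object>> batch = sourceJdbcTemplate.queryForList(
            "SELECT TOP (?) * FROM " + tableName + " WHERE id > ? ORDER BY id",
            5000, lastId);
    // ...insert the batch, then saveState(...) with the highest id seen
}
```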
### 2. Parallel Migration Controller
```java
import org.springframework.stereotype.Service;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

@Service
public class ParallelMigrationController {

    private final ExecutorService executor = Executors.newFixedThreadPool(5);
    private final BulkDataMigrationService migrationService;

    public ParallelMigrationController(BulkDataMigrationService migrationService) {
        this.migrationService = migrationService;
    }

    public void migrateAllTables() {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        futures.add(CompletableFuture.runAsync(() -> migrationService.migrateTable(
                "users", Arrays.asList("username", "email", "age", "created_at", "updated_at")),
                executor));
        // futures.add(CompletableFuture.runAsync(() ->
        //         migrationService.migrateTable("products", ...), executor));
        // Wait for every table to finish, then release the worker threads
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        executor.shutdown();
    }
}
```
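One sizing note: keep the worker pool at or below the target pool's `maximum-pool-size` (15 in section 2), otherwise parallel batches queue waiting for connections.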
## Caveats
1. **Type mapping**: pay special attention to converting datetime, text, uniqueidentifier, and similar types
2. **Indexes and constraints**: create indexes on the target only after the data load finishes; inserting into index-free tables is significantly faster
3. **Memory**: monitor JVM memory usage and tune the batch size accordingly
4. **Network bandwidth**: if the databases sit on different servers, account for transfer speed
5. **Logging**: record the migration in detail to simplify troubleshooting
6. **Verification**: check data consistency once the migration completes (see the sketch below)
This approach scales to tens of millions of rows and beyond: paged reads and bulk inserts keep throughput high while avoiding out-of-memory failures.