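Elasticsearch Pagination and Export Problems in an Anti-Fraud System: Analysis and Solutions

Article link: https://blog.csdn.net/qq_25385555/article/details/149357922

How I Used Redis to Cache Customer Age Information in an Anti-Fraud System and Sped Up Exports (Lessons from Practice)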

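I. Background

While building the account-opening record export for an anti-fraud system, I needed to export about 70,000 account-opening records. Each record contains a customer number and an account-opening date, and has to be enriched with the customer's age.

The age comes from the customer master data, which is stored in Elasticsearch as the customer's date of birth.

At first we looked up all customer numbers in a single query, but after release we found:

Most age fields were empty, so the exported data was wrong!

Investigation showed that we were passing all 70,000 customer numbers to ES in one request. Because index.max_result_window defaults to 10000, a single search returned only the first 10,000 hits. ES also limits the number of terms in one terms query (index.max_terms_count, 65536 by default), which is another reason not to send every customer number at once.

So we made the following optimizations:

Query customer information in batches (2000 per batch)
Use a filter query, which skips scoring
Fetch only the custNo and birthDate fields
Use Java 8 parallel streams to speed up the queries
Cache customer birth dates in Redis, with an expiry time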

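II. The Problems I Ran Into

1. Customer information lookups were slow

Every export queried ES from scratch, which performed poorly
Customer birth dates are static data, so querying them repeatedly wastes resources

2. A single query for all customer numbers returned incomplete data

All 70,000 customer numbers were passed in at once, but by default ES returns at most 10,000 hits
As a result the age field was missing for most records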
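
To make the batching idea concrete, here is a minimal sketch of splitting a list of customer numbers into fixed-size batches using only the JDK. The class name BatchPartitionSketch and its partition method are hypothetical helpers introduced for illustration; Guava's Lists.partition does the same job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class BatchPartitionSketch {

    /** Splits ids into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }
}
```

With 70,000 customer numbers and a batch size of 2000, partition produces 35 batches, each well below the 10,000-hit window.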

3. No cache, so the same data was fetched again and again

Customer birth dates almost never change, yet every export queried ES for them again

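III. What I Did

In the end I settled on the following optimizations:

Optimization	Description
Batched queries	Query 2000 customer numbers at a time, staying under ES's index.max_result_window limit
Filter query	No scoring, so the query is cheaper
Fetch only custNo and birthDate	Less data transferred
Java 8 parallel streams	Speeds up the batched queries
Redis cache for customer birth dates	Fewer repeated queries
Cache expiry (for example 1 day)	Keeps the cached data reasonably fresh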
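
The cache part of this plan follows the usual cache-aside pattern: read the cache, load only the misses, then write them back. Below is a minimal sketch of that flow. It assumes an in-memory HashMap as a stand-in for Redis (so expiry is not modelled) and a hypothetical loader function standing in for the batched ES lookup; the class name CacheAsideSketch is also made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of cache-aside; not the production code.
final class CacheAsideSketch {

    // Stand-in for Redis. A real cache would also apply an expiry time.
    private final Map<String, String> cache = new HashMap<>();

    // Stand-in for the batched ES query: customer numbers in, birth dates out.
    private final Function<List<String>, Map<String, String>> loader;

    CacheAsideSketch(Function<List<String>, Map<String, String>> loader) {
        this.loader = loader;
    }

    Map<String, String> lookup(List<String> custNos) {
        Map<String, String> result = new HashMap<>();
        List<String> misses = new ArrayList<>();
        for (String custNo : custNos) {
            String cached = cache.get(custNo);
            if (cached != null) {
                result.put(custNo, cached);   // cache hit
            } else {
                misses.add(custNo);           // must be loaded
            }
        }
        if (!misses.isEmpty()) {
            Map<String, String> loaded = loader.apply(misses);
            cache.putAll(loaded);             // write back for the next export
            result.putAll(loaded);
        }
        return result;
    }
}
```

On a second lookup for the same customer numbers the loader is not called at all, which is where the saving on repeated exports comes from.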

IV. Implementation (Java + Elasticsearch + Redis)

✅ Redis cache helper class (using Spring Data Redis):

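import java.util.List;
import java.util.concurrent.TimeUnit;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class RedisCache {

    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    // Store a value together with an expiry time.
    public void setWithExpire(String key, String value, long timeout, TimeUnit unit) {
        redisTemplate.opsForValue().set(key, value, timeout, unit);
    }

    // Read a single value; returns null on a cache miss.
    public String get(String key) {
        return redisTemplate.opsForValue().get(key);
    }

    // Read many values at once. The result follows the order of the keys,
    // with null in the position of every key that is not cached.
    public List<String> multiGet(List<String> keys) {
        return redisTemplate.opsForValue().multiGet(keys);
    }
}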
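
Because a batch read returns values in the same order as the keys, with null for every miss, the caller has to pair them up by position. The following is a small sketch of that pairing; the class MultiGetAlignmentSketch and its toHitMap method are hypothetical names used only for illustration, and the null check on the whole list is a defensive assumption (Spring's multiGet is documented to return null when used inside a pipeline or transaction).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class MultiGetAlignmentSketch {

    /**
     * Pairs customer numbers with the values returned by a batch read.
     * values.get(i) belongs to custNos.get(i); null means "not cached".
     */
    static Map<String, String> toHitMap(List<String> custNos, List<String> values) {
        Map<String, String> hits = new HashMap<>();
        if (values == null) {
            return hits; // treat a missing reply as "nothing cached"
        }
        int n = Math.min(custNos.size(), values.size());
        for (int i = 0; i < n; i++) {
            String value = values.get(i);
            if (value != null) {
                hits.put(custNos.get(i), value);
            }
        }
        return hits;
    }
}
```

Every customer number that does not appear in the returned map is a cache miss and still has to be fetched from ES.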

✅ Batched customer lookup + parallel stream + Redis cache:

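import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import com.google.common.collect.Lists;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class CustomerInfoService {

    @Autowired
    private RestHighLevelClient esClient;

    @Autowired
    private RedisCache redisCache;

    private static final int BATCH_SIZE = 2000;
    private static final String CACHE_KEY_PREFIX = "cust_birthdate_";

    public Map<String, String> getCustomerBirthDates(List<String> customerNos) throws IOException {
        // Batches are processed in parallel, so the shared map must be thread-safe.
        Map<String, String> result = new ConcurrentHashMap<>();

        // Deduplicate the customer numbers.
        List<String> uniqueCustNos = customerNos.stream().distinct().collect(Collectors.toList());

        // Check the Redis cache first.
        List<String> cacheKeys = uniqueCustNos.stream()
                .map(custNo -> CACHE_KEY_PREFIX + custNo)
                .collect(Collectors.toList());
        List<String> cachedValues = redisCache.multiGet(cacheKeys);

        // Values come back in key order; null marks a cache miss.
        List<String> notCached = new ArrayList<>();
        for (int i = 0; i < uniqueCustNos.size(); i++) {
            String custNo = uniqueCustNos.get(i);
            String cachedValue = cachedValues == null ? null : cachedValues.get(i);
            if (cachedValue != null) {
                result.put(custNo, cachedValue);
            } else {
                notCached.add(custNo);
            }
        }

        // Query ES in batches for the customer numbers that were not cached.
        List<List<String>> batches = Lists.partition(notCached, BATCH_SIZE);

        try {
            batches.parallelStream().forEach(batch -> {
                try {
                    SearchRequest searchRequest = new SearchRequest("customer_info_index");
                    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

                    // Filter context: no scoring is computed.
                    sourceBuilder.query(QueryBuilders.boolQuery()
                            .filter(QueryBuilders.termsQuery("custNo", batch)));

                    // Return only the two fields we need.
                    sourceBuilder.fetchSource(new String[]{"custNo", "birthDate"}, null);
                    // Assumes at most one document per customer number.
                    sourceBuilder.size(batch.size());

                    searchRequest.source(sourceBuilder);

                    SearchResponse response = esClient.search(searchRequest, RequestOptions.DEFAULT);

                    for (SearchHit hit : response.getHits()) {
                        Map<String, Object> source = hit.getSourceAsMap();
                        Object custNo = source.get("custNo");
                        Object birthDate = source.get("birthDate");
                        if (custNo == null || birthDate == null) {
                            continue; // skip incomplete documents
                        }

                        // Add to the result and write back to the cache.
                        result.put(custNo.toString(), birthDate.toString());
                        redisCache.setWithExpire(CACHE_KEY_PREFIX + custNo, birthDate.toString(),
                                1, TimeUnit.DAYS);
                    }
                } catch (IOException e) {
                    // Propagate instead of swallowing, so a failed batch
                    // does not silently produce empty age fields.
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause();
        }

        return result;
    }
}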
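
Since the batches run on a parallel stream, the map that collects their results is written from several threads at once, and a plain HashMap is not safe for that. Here is a minimal stand-alone sketch of the collection pattern with a ConcurrentHashMap. The class ParallelBatchSketch and the fetchBatch function (a stand-in for the ES call) are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration of collecting parallel batch results safely.
final class ParallelBatchSketch {

    /**
     * Runs fetchBatch for every batch in parallel and merges the results.
     * fetchBatch stands in for the ES query and must not return null
     * keys or values, because ConcurrentHashMap rejects them.
     */
    static Map<String, String> fetchAll(
            List<List<String>> batches,
            Function<List<String>, Map<String, String>> fetchBatch) {
        Map<String, String> result = new ConcurrentHashMap<>();
        batches.parallelStream()
               .forEach(batch -> result.putAll(fetchBatch.apply(batch)));
        return result;
    }
}
```

One design note: parallel streams share the common fork-join pool by default, so blocking I/O inside them competes with any other parallel stream in the same JVM. For heavier workloads a dedicated executor may be a better fit.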

✅ Enriching records with the age field:

public List<OpenAccountRecord> enrichWithAge(List<OpenAccountRecord> records) throws IOException {
    List<String> customerNos = records.stream()
            .map(OpenAccountRecord::getCustNo)
            .distinct()
            .collect(Collectors.toList());

    // Look up birth dates (Redis cache first, ES on a cache miss).
    Map<String, String> birthDateMap = getCustomerBirthDates(customerNos);

    // Fill in the age field; records without a known birth date are left unchanged.
    for (OpenAccountRecord record : records) {
        String custNo = record.getCustNo();
        String birthDate = birthDateMap.get(custNo);
        if (birthDate != null) {
            int age = calculateAge(birthDate);
            record.setAge(age);
        }
    }

    return records;
}

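// Requires java.time.LocalDate, java.time.Period and java.time.format.DateTimeFormatter.
private int calculateAge(String birthDateStr) {
    LocalDate birthDate = LocalDate.parse(birthDateStr, DateTimeFormatter.ofPattern("yyyy-MM-dd"));
    // Count completed years. Subtracting the calendar years alone
    // overstates the age by one before this year's birthday.
    return Period.between(birthDate, LocalDate.now()).getYears();
}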
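
Age calculation is easy to get subtly wrong: the difference between two calendar years is not the same as the number of completed years. The sketch below contrasts the two approaches with a fixed reference date so the result is reproducible. The class AgeSketch and its method names are hypothetical and used only for illustration.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical illustration of two ways to derive an age.
final class AgeSketch {

    /** Difference of calendar years only; ignores month and day. */
    static int yearDifference(LocalDate birth, LocalDate today) {
        return today.getYear() - birth.getYear();
    }

    /** Number of full years completed between birth and today. */
    static int fullYears(LocalDate birth, LocalDate today) {
        return Period.between(birth, today).getYears();
    }
}
```

For a customer born on 2000-12-31 and a reference date of 2024-01-01, yearDifference gives 24 while fullYears gives 23, which is the age most business rules expect.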

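V. Summary of the Optimizations

Optimization	Description
Batched queries	Avoids exceeding ES limits in a single query
Filter query	No scoring, better performance
Fetch only the required fields	Less data transferred
Parallel stream processing	Faster batched queries
Redis cache for customer birth dates	Fewer repeated queries; the cache is shared across application instances
Cache expiry (1 day)	Keeps the cached data reasonably fresh
Lookup map from custNo to birth date	Makes field enrichment straightforward and keeps the code clear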