Test case: Spark Structured Streaming writing to Elasticsearch with dynamic index names

ES official documentation: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-sql-streaming

Input source: CSV files. Sample data:
liwei,20,中国,2019-05-14
liwei,10,中国,2019-06-15
zhangsan,20,中国,2019-05-16
zhangsan,10,中国,2019-06-17
Output sink: Elasticsearch

The output index is configured dynamically as "structured.es.example.{name}.{date|yyyy-MM}/_doc"

Official documentation on the multi-resource write format: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#cfg-multi-writes-format

The resulting Elasticsearch indices are:
structured.es.example.zhangsan.2019-06
structured.es.example.zhangsan.2019-05
structured.es.example.liwei.2019-06
structured.es.example.liwei.2019-05
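How each row maps to an index name can be illustrated with a small pattern resolver. This is only a sketch of the `{field}` / `{field|format}` substitution described in the es-hadoop docs, not the library's actual implementation; `IndexPatternSketch` and `resolve` are hypothetical names:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object IndexPatternSketch {

  // Replace "{field}" and "{field|dateFormat}" placeholders with values
  // from a row. es-hadoop performs this substitution per document.
  def resolve(pattern: String, row: Map[String, String]): String =
    "\\{([^}|]+)(?:\\|([^}]+))?\\}".r.replaceAllIn(pattern, m => {
      val value = row(m.group(1))
      Option(m.group(2)) match {
        // A format suffix means the field holds a `yyyy-MM-dd` date to reformat
        case Some(fmt) => LocalDate.parse(value).format(DateTimeFormatter.ofPattern(fmt))
        case None      => value
      }
    })

  def main(args: Array[String]): Unit = {
    val row = Map("name" -> "liwei", "date" -> "2019-05-14")
    println(resolve("structured.es.example.{name}.{date|yyyy-MM}", row))
    // structured.es.example.liwei.2019-05
  }
}
```

Applied to the four sample rows above, this yields exactly the four target indices listed.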
Example code:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode
import org.apache.spark.sql.types.StructType

/**
 * Test case: Spark Structured Streaming writing to Elasticsearch with dynamic index names.
 *
 * @author wei.Li by 2019-05-24
 */
object StructuredEs {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .config(
        new SparkConf()
          .setAppName("StructuredEs")
          .setMaster("local")
          .set("es.nodes", "ES-IP:9201")
          .set("es.nodes.wan.only", "true")
      )
      .getOrCreate()

    // Schema for the test data. Note: the `date` column uses the format `yyyy-MM-dd`.
    val userSchema = new StructType()
      .add("name", "string")
      .add("age", "integer")
      .add("address", "string")
      .add("date", "string")

    spark
      .readStream
      .option("sep", ",")
      .schema(userSchema)
      .csv("/data/csv/") // directory containing the CSV files
      .writeStream
      .format("es")
      .outputMode(OutputMode.Append())
      .option("checkpointLocation", "file:/data/job/spark/checkpointLocation/example/StructuredEs")
      .start("structured.es.example.{name}.{date|yyyy-MM}") // target index pattern; ES 7+ needs no type
      .awaitTermination()
  }
}
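Once the job has processed the sample files, one way to confirm that the dynamic indices were created is Elasticsearch's `_cat/indices` API with a wildcard (assuming the same cluster address, ES-IP:9201, used in the config above):

```shell
# List all indices produced by the dynamic pattern; `?v` adds column headers.
curl "http://ES-IP:9201/_cat/indices/structured.es.example.*?v"
```

With the sample data, the listing should show the four indices enumerated earlier.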