The Elasticsearch website provides a set of interfaces for writing to ES from Spark.
See: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html
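For reference, writing a DataFrame through that official connector looks roughly like the sketch below. It is only an illustration under stated assumptions: Spark 2.x or later with a matching elasticsearch-spark (elasticsearch-hadoop) artifact on the classpath, and a reachable cluster. The object name, the sample rows, and the index/type name demo_index/demo_type are hypothetical; the node address reuses the one from the snippet later in this post, with the default HTTP port 9200.

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrame

object OfficialConnectorSketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run standalone
    val spark = SparkSession.builder()
      .appName("es-write-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real result DataFrame
    val df = Seq(("1", "alice"), ("2", "bob")).toDF("id", "name")

    // The connector talks to ES over its HTTP REST interface (port 9200 by default)
    val esConf = Map(
      "es.nodes" -> "10.200.8.187",
      "es.port"  -> "9200"
    )
    df.saveToEs("demo_index/demo_type", esConf)

    spark.stop()
  }
}
```

The es.nodes and es.port settings point the connector at the HTTP endpoint, which is the path the rest of this post replaces with the transport client.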
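In practice, writing to ES from Spark Streaming through this interface performed very poorly. On investigation, it turned out that the interface is built on top of ES's underlying HTTP RESTful API.
I took a different route and wrote to ES over TCP, using the transport client, and performance improved dramatically.
I'm sharing the experience here.
The code snippet for writing to ES is as follows: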
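import java.net.InetAddress
import java.util
import org.apache.spark.sql.Row
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.settings.Settings
import org.elasticsearch.common.transport.InetSocketTransportAddress

resDF.foreachPartition { (iterRecords: Iterator[Row]) =>
  // One transport client (TCP, port 9300) and one bulk request per partition
  val settings = Settings.settingsBuilder.put("cluster.name", "myES").build
  val client = TransportClient.builder.settings(settings).build
  client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("10.200.8.187"), 9300))
  val bulkRequest = client.prepareBulk
  iterRecords.foreach((row: Row) => {
    val jsonMap = new util.HashMap[String, String]
    index_field.foreach { case (index,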