Structured Streaming outputMode: the difference between complete and append

This post examines the two output modes of Apache Spark Structured Streaming, complete and append. It compares how they behave when processing streaming data, particularly with respect to aggregation, and uses concrete code examples to show the exceptions each mode raises when misused, giving developers guidance for choosing the right output mode.


1. complete requires an aggregation and aggregates the data from all previous batches together with the current batch; append, by contrast, cannot be used with aggregations.
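To make point 1 concrete, here is a plain-Scala sketch (no Spark involved; the batch contents are invented for illustration) of what the two modes conceptually deliver to the sink on each trigger:

```scala
// Simulates a running word count over two micro-batches and shows
// what each output mode would hand to the sink per trigger.
object OutputModeDemo {
  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq("a", "a", "b"), Seq("a"))
    // Running counts kept across batches, like Spark's streaming state
    var state = Map.empty[String, Int]

    for ((batch, i) <- batches.zipWithIndex) {
      batch.foreach(w => state = state.updated(w, state.getOrElse(w, 0) + 1))
      // complete: the sink receives the whole re-aggregated result table
      println(s"batch $i complete -> $state")
      // append (no aggregation): the sink receives only the new input rows
      println(s"batch $i append   -> $batch")
    }
  }
}
```

After the second batch, complete mode re-emits the full table with the updated count for "a", while append would only ever emit the new raw rows, which is why the two modes are tied to the presence or absence of an aggregation.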

2. What happens if append replaces complete; code example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.ProcessingTime

def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder().master("local[1]").getOrCreate()
  import spark.implicits._

  // groupBy + count is a streaming aggregation
  val wordCounts = spark.readStream.text("D:\\tmp\\streaming\\struct")
    .as[String].flatMap(_.split(" "))
    .groupBy("value").count()

  val query = wordCounts.writeStream
    .foreach(new TestForeachWriter())
    .outputMode("complete") // change to "append" to reproduce the error below
    .trigger(ProcessingTime("10 seconds"))
    .start()
  query.awaitTermination()
}
// Runs normally. If complete is changed to append, the following error is thrown:
org.apache.spark.sql.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets;;
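The example passes a TestForeachWriter to foreach, but its definition is not shown in the post. A minimal stand-in that just prints each row (the body here is an assumption, not the author's actual class):

```scala
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sketch of the TestForeachWriter used above:
// opens unconditionally, prints every row, and does no cleanup.
class TestForeachWriter extends ForeachWriter[Row] {
  override def open(partitionId: Long, version: Long): Boolean = true
  override def process(row: Row): Unit = println(row)
  override def close(errorOrNull: Throwable): Unit = ()
}
```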

3. What happens if complete replaces append; code example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.ProcessingTime

def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder().master("local[1]").getOrCreate()
  import spark.implicits._

  // No aggregation here: each word is just mapped to a T1(word, 1) row
  val wordCounts = spark.readStream.text("D:\\tmp\\streaming\\struct")
    .as[String].flatMap(_.split(" ")).map(T1(_, 1)).toDF()

  val query = wordCounts.writeStream
    .foreach(new TestForeachWriter())
    .outputMode("append") // change to "complete" to reproduce the error below
    .trigger(ProcessingTime("10 seconds"))
    .start()
  query.awaitTermination()
}

case class T1(value: String, num: Int)
// If append is changed to complete, the following error is thrown:
org.apache.spark.sql.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;;

4. Source code of outputMode (from Spark's DataStreamWriter):

/**
   * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
   *   - `append`:   only the new rows in the streaming DataFrame/Dataset will be written to
   *                 the sink
   *   - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink
   *                 every time these is some updates
   *
   * @since 2.0.0
   */
  def outputMode(outputMode: String): DataStreamWriter[T] = {
    this.outputMode = outputMode.toLowerCase match {
      case "append" =>
        OutputMode.Append
      case "complete" =>
        OutputMode.Complete
      case _ =>
        throw new IllegalArgumentException(s"Unknown output mode $outputMode. " +
          "Accepted output modes are 'append' and 'complete'")
    }
    this
  }

 
