Spark / SPARK-55052

Add AQEShuffleRead properties to Physical Plan Tree

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2.0
    • Fix Version/s: 4.2.0
    • Component/s: SQL

    Description

      An AQEShuffleRead node can carry one of the following properties when reading shuffle files: local, coalesced, skewed, or coalesced and skewed. The details section of the plan already shows these properties for each AQEShuffleRead node, but when the query plan tree is very complex (e.g. composed of 1000+ physical nodes), it is hard to correlate a node in the tree with its details entry and tell which AQEShuffleRead has a local read or skewed partitions. This change shows the properties directly at the physical plan tree level. For example, in the skewed sort-merge join (SMJ) case shown under "New Physical Plan Tree", the tree itself makes it clear which SMJ leg has an AQEShuffleRead with skew.
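
The four properties can be read as a small decision over three flags. The sketch below is illustrative Python, not Spark's Scala implementation: the function names, the flag names, and the assumption that a local read takes precedence over the other properties are all hypothetical.

```python
def aqe_shuffle_read_label(is_local: bool, has_coalesced: bool, has_skewed: bool) -> str:
    """Return the property label for one AQEShuffleRead node ("" if none applies)."""
    # Assumption: a local read is reported on its own and wins over the other flags.
    if is_local:
        return "local"
    if has_coalesced and has_skewed:
        return "coalesced and skewed"
    if has_coalesced:
        return "coalesced"
    if has_skewed:
        return "skewed"
    return ""


def tree_line(node_name: str, operator_id: int, label: str) -> str:
    """Format a plan tree entry, appending the label only when there is one."""
    base = f"{node_name} ({operator_id})"
    return f"{base}, {label}" if label else base


# Reproduces the two AQEShuffleRead entries from the new plan tree in this issue.
print(tree_line("AQEShuffleRead", 6, aqe_shuffle_read_label(False, True, False)))
print(tree_line("AQEShuffleRead", 13, aqe_shuffle_read_label(False, True, True)))
```

A node with none of the properties keeps its current form, `AQEShuffleRead (6)`, so plans without these properties would read the same as before under this sketch.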

      Current Physical Plan Tree:

      == Physical Plan ==
      AdaptiveSparkPlan (24)
      +- == Final Plan ==
         ResultQueryStage (17), Statistics(sizeInBytes=8.0 EiB)
         +- * Project (16)
            +- * SortMergeJoin(skew=true) Inner (15)
               :- * Sort (7)
               :  +- AQEShuffleRead (6)
               :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
               :        +- Exchange (4)
               :           +- * Project (3)
               :              +- * Filter (2)
               :                 +- * Range (1)
               +- * Sort (14)
                  +- AQEShuffleRead (13)
                     +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
                        +- Exchange (11)
                           +- * Project (10)
                              +- * Filter (9)
                                 +- * Range (8)

      New Physical Plan Tree:

      == Physical Plan ==
      AdaptiveSparkPlan (24)
      +- == Final Plan ==
         ResultQueryStage (17), Statistics(sizeInBytes=8.0 EiB)
         +- * Project (16)
            +- * SortMergeJoin(skew=true) Inner (15)
               :- * Sort (7)
               :  +- AQEShuffleRead (6), coalesced
               :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
               :        +- Exchange (4)
               :           +- * Project (3)
               :              +- * Filter (2)
               :                 +- * Range (1)
               +- * Sort (14)
                  +- AQEShuffleRead (13), coalesced and skewed
                     +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
                        +- Exchange (11)
                           +- * Project (10)
                              +- * Filter (9)
                                 +- * Range (8)
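
With the properties on the tree line itself, the AQEShuffleRead nodes of interest can be picked out of the plan text with a simple scan. A minimal sketch, assuming the tree-line format shown above (`AQEShuffleRead (<id>)`, optionally followed by `, <properties>`); the helper name `aqe_read_properties` is hypothetical and not part of Spark.

```python
import re
from typing import Dict

# Matches e.g. "AQEShuffleRead (13), coalesced and skewed" or "AQEShuffleRead (6)".
_AQE_READ = re.compile(r"AQEShuffleRead \((\d+)\)(?:, (.+))?$")


def aqe_read_properties(plan_text: str) -> Dict[int, str]:
    """Map each AQEShuffleRead operator id to its property label ("" if absent)."""
    result: Dict[int, str] = {}
    for line in plan_text.splitlines():
        match = _AQE_READ.search(line.rstrip())
        if match:
            result[int(match.group(1))] = match.group(2) or ""
    return result


# Excerpt of the new plan tree from this issue.
plan = """\
+- * SortMergeJoin(skew=true) Inner (15)
   :- * Sort (7)
   :  +- AQEShuffleRead (6), coalesced
   +- * Sort (14)
      +- AQEShuffleRead (13), coalesced and skewed
"""

props = aqe_read_properties(plan)
skewed_ids = [op_id for op_id, label in props.items() if "skewed" in label]
print(props)       # {6: 'coalesced', 13: 'coalesced and skewed'}
print(skewed_ids)  # [13]
```

Here the scan points at operator 13, the AQEShuffleRead under Sort (14), as the SMJ leg that reads skewed partitions.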

            People

              erenavsarogullari Eren Avsarogullari