Spark 4.0: Add removed_delete_files_count to result of RewriteDataFilesProcedure #13657
"failed_data_files_count", DataTypes.IntegerType, false, Metadata.empty()) | ||
"failed_data_files_count", DataTypes.IntegerType, false, Metadata.empty()), | ||
new StructField( | ||
"removed_delete_files_count", DataTypes.IntegerType, false, Metadata.empty()) |
Should we update the rewrite_data_files section in spark-procedures.md?
https://iceberg.apache.org/docs/latest/spark-procedures/#output_8
Good catch. If we update the docs, do we also need to update the behavior for other Spark versions to keep things consistent? Otherwise, it might be confusing for users.
Yes, these will be follow-up PRs if this change is accepted by the community.
I think it is reasonable to expose removedDeleteFilesCount in the output. cc @dramaticlly
I think this change makes sense. Once we complete the backport to all Spark versions (Spark 3.4/3.5), we can update the Spark procedures documentation to reflect the new field in the output.
Thanks @manuzhang for the improvement, and everyone for the review.
…ataFilesProcedure Back-port of apache#13657