Spark 4.0: Add removed_delete_files_count to result of RewriteDataFilesProcedure #13657

Merged: 1 commit merged into apache:main on Aug 18, 2025

manuzhang
Collaborator

No description provided.

github-actions bot added the spark label on Jul 24, 2025
-     "failed_data_files_count", DataTypes.IntegerType, false, Metadata.empty())
+     "failed_data_files_count", DataTypes.IntegerType, false, Metadata.empty()),
+ new StructField(
+     "removed_delete_files_count", DataTypes.IntegerType, false, Metadata.empty())
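To make the effect of this schema change concrete, here is a minimal plain-Java sketch of the procedure's output row with the new trailing column. It is hypothetical and not Iceberg's code: the real procedure builds Spark rows against a `StructType`, whereas this sketch uses `Object[]` so it runs without Spark. The names of the first three columns and the `long` type of `rewritten_bytes_count` are assumptions taken from Iceberg's documented `rewrite_data_files` output; `RewriteOutputSketch`, `Summary`, and `toOutputRow` are made-up names for illustration.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Iceberg's implementation): models the output row of the
 * rewrite_data_files procedure after this change, using plain Java types instead of
 * Spark's StructType/InternalRow so it runs without Spark on the classpath.
 */
public class RewriteOutputSketch {

  // Column names in output-schema order (first three are assumed from Iceberg docs).
  // removed_delete_files_count is appended last, as in the diff, so readers that
  // access the existing columns by position are unaffected.
  public static final List<String> OUTPUT_COLUMNS = List.of(
      "rewritten_data_files_count",
      "added_data_files_count",
      "rewritten_bytes_count",
      "failed_data_files_count",
      "removed_delete_files_count");

  // Hypothetical stand-in for the counts exposed by the rewrite action's result.
  public record Summary(
      int rewrittenDataFilesCount,
      int addedDataFilesCount,
      long rewrittenBytesCount,
      int failedDataFilesCount,
      int removedDeleteFilesCount) {}

  // Builds one output row whose values follow the order of OUTPUT_COLUMNS.
  public static Object[] toOutputRow(Summary s) {
    return new Object[] {
      s.rewrittenDataFilesCount(),
      s.addedDataFilesCount(),
      s.rewrittenBytesCount(),
      s.failedDataFilesCount(),
      s.removedDeleteFilesCount()
    };
  }

  public static void main(String[] args) {
    Object[] row = toOutputRow(new Summary(10, 2, 4096L, 0, 3));
    // Print each column name next to its value.
    for (int i = 0; i < OUTPUT_COLUMNS.size(); i++) {
      System.out.println(OUTPUT_COLUMNS.get(i) + " = " + row[i]);
    }
  }
}
```

Running `main` prints one `name = value` line per column, ending with `removed_delete_files_count = 3`. Appending the field rather than inserting it mid-schema keeps existing column positions stable for callers.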
ebyhr (Contributor), Jul 25, 2025

Good catch. If we update the docs, do we also need to update the behavior for other Spark versions to keep things consistent? Otherwise, it might be confusing for users.

manuzhang (Collaborator, PR author)

Yes, these will be follow-up PRs if this change is accepted by the community.

Contributor

I think it is reasonable to expose removedDeleteFilesCount to the output. cc @dramaticlly

dramaticlly (Contributor), Aug 18, 2025

I think the change makes sense. Once we complete the backport of this to all Spark versions (Spark 3.4/3.5), we can update the Spark procedures documentation to reflect this new field in the output.

stevenzwu merged commit 0d26e1f into apache:main on Aug 18, 2025
27 checks passed
stevenzwu (Contributor)

Thanks @manuzhang for the improvement, and everyone for the review.

manuzhang added a commit to manuzhang/iceberg that referenced this pull request Aug 19, 2025
stevenzwu pushed a commit that referenced this pull request Aug 19, 2025
7 participants