fix(pipelines): ensure assets with same hash but different destinations are published separately #34790

Open
wants to merge 12 commits into
base: main

Conversation

mrlikl
Contributor

@mrlikl mrlikl commented Jun 23, 2025

Issue

Closes #31070

Reason for this change

Assets fail to be published in CDK Pipelines when stacks with different synthesizers target the same account and region. When assets have identical content hashes but must be published to different destinations (different publishing role ARNs), they were incorrectly grouped together, so each asset was published to only one destination instead of all required destinations.

Description of changes

• Modified publishAsset() method in packages/aws-cdk-lib/pipelines/lib/helpers-internal/pipeline-graph.ts
• Changed asset tracking key from using only stackAsset.assetId to a composite key:
${stackAsset.assetId}:${stackAsset.assetPublishingRoleArn || 'default'}
• This ensures assets with the same content hash but different destinations are treated as separate publishing jobs
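The grouping change above can be sketched as follows. This is an illustrative stand-in, not the exact `pipeline-graph.ts` internals; the `StackAsset` interface here is simplified to the two fields that matter for the key:

```typescript
// Simplified stand-in for the pipelines asset type (illustrative only).
interface StackAsset {
  assetId: string;
  assetPublishingRoleArn?: string;
}

// Key assets by content hash *and* publishing role, so identical content
// aimed at different destinations yields separate publishing jobs.
function assetKey(asset: StackAsset): string {
  return `${asset.assetId}:${asset.assetPublishingRoleArn ?? 'default'}`;
}
```

Two assets with the same hash but different publishing roles now map to distinct keys, so the pipeline graph creates one publishing node for each.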

Describe any new or updated permissions being added

N/A

Description of how you validated changes

Checked with the code in #31070 and made sure there are 2 asset stages; locally ran the asset commands and verified that the assets are being deployed to the right buckets:

muralikl@b0be83688a18 cdk.out % cdk-assets --path "assembly-pipeline-asset-stack-Staging/pipelineassetstackStagingdevlambdastackEC748226.assets.json" --verbose publish "a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e:current_account-us-east-1"                   
verbose: Loaded manifest from assembly-pipeline-asset-stack-Staging/pipelineassetstackStagingdevlambdastackEC748226.assets.json: 2 assets found
verbose: Applied selection: 1 assets selected.
info   : [0%] start: Publishing LambdaFN/Code (current_account-us-east-1)
verbose: [0%] check: Check s3://cdk-dev-assets-123456789012-us-east-1/a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e.zip
verbose: [0%] build: Zip /Users/muralikl/Downloads/aws-cdk/packages/@aws-cdk-testing/framework-integ/cdk.out/asset.a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e -> assembly-pipeline-asset-stack-Staging/.cache/a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e.zip
verbose: [0%] upload: Upload s3://cdk-dev-assets-123456789012-us-east-1/a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e.zip
info   : [100%] success: Published LambdaFN/Code (current_account-us-east-1)

muralikl@b0be83688a18 cdk.out % cdk-assets --path "assembly-pipeline-asset-stack-Production/pipelineassetstackProductionprdlambdastack4E5ABBC0.assets.json" --verbose publish "a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e:current_account-us-west-2"
verbose: Loaded manifest from assembly-pipeline-asset-stack-Production/pipelineassetstackProductionprdlambdastack4E5ABBC0.assets.json: 2 assets found
verbose: Applied selection: 1 assets selected.
info   : [0%] start: Publishing LambdaFN/Code (current_account-us-west-2)
verbose: [0%] check: Check s3://cdk-hnb659fds-assets-123456789012-us-west-2/a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e.zip
verbose: [0%] build: Zip /Users/muralikl/Downloads/aws-cdk/packages/@aws-cdk-testing/framework-integ/cdk.out/asset.a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e -> assembly-pipeline-asset-stack-Production/.cache/a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e.zip
verbose: [0%] upload: Upload s3://cdk-hnb659fds-assets-123456789012-us-west-2/a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e.zip
info   : [100%] success: Published LambdaFN/Code (current_account-us-west-2)

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team June 23, 2025 12:06
@github-actions github-actions bot added bug This issue is a bug. effort/medium Medium work item – several days of effort p1 valued-contributor [Pilot] contributed between 6-12 PRs to the CDK labels Jun 23, 2025
@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Jun 23, 2025
@matboros matboros self-assigned this Jun 27, 2025
@@ -119,7 +119,7 @@ test('Policy sizes do not exceed the maximum size', () => {

// WHEN
const regions = ['us-east-1', 'us-east-2', 'eu-west-1', 'eu-west-2', 'somethingelse1', 'somethingelse-2', 'yapregion', 'more-region'];
- for (let i = 0; i < 70; i++) {
+ for (let i = 0; i < 60; i++) {
Contributor

?

Contributor Author

I was facing a validation error when testing this after the change: the resources in the stack exceeded the 500-resource limit.

ValidationError: Number of resources in stack 'PipelineStack': 515 is greater than allowed maximum of 500: AWS::KMS::Key (1), AWS::KMS::Alias (1), AWS::S3::Bucket (1), AWS::S3::BucketPolicy (1), AWS::IAM::Role (145), AWS::IAM::Policy (145), AWS::IAM::ManagedPolicy (7), AWS::CodePipeline::Pipeline (1), AWS::CodePipeline::Webhook (1), AWS::CodeBuild::Project (212)

@rix0rrr
Contributor

rix0rrr commented Jun 30, 2025

I'm struggling to understand the problem and the proposed solution.

The linked bug report says:

When a CDK pipeline with 2 stages targeting the same account and region but with different synthesizers, the assets are packaged only once.

What does "packaging" mean in this context?

The PR body says:

When assets have identical content hashes but need to be published to different destinations (different publishing role ARNs), they were being incorrectly grouped together

This means both are being published in the same CodeBuild project?

causing assets to only be published to one destination instead of all required destinations.

I'm not sure I follow why it would only be published once?

Can you go one level deeper and explain what you've diagnosed that is happening?

@rix0rrr rix0rrr self-assigned this Jun 30, 2025
@mrlikl
Copy link
Contributor Author

mrlikl commented Jul 1, 2025

@rix0rrr thank you for taking a look:

What does "packaging" mean in this context?

I meant packaging for the asset publishing step in the pipeline. The pipeline creates CodeBuild projects that run cdk-assets commands to upload assets to S3 buckets.

The current behaviour: consider a pipeline with 2 stages where the only difference between them is the qualifier; one uses dev, the other uses the default hnb659fds. The pipeline needs to publish assets to 2 buckets: cdk-dev-assets-123456789012-region and cdk-hnb659fds-assets-123456789012-region. However, since both stages have the same Lambda code (same asset hash), the pipeline graph creates only ONE asset publishing node, so the asset gets published to only the first destination and deployment fails with "no such key" in the second stage.

So the change creates a composite key (asset ID plus publishing role ARN), since the publishing role differs in the above scenario (cdk-dev-file-publishing-role-account-us-east-2). In the end, separate CodeBuild jobs are created for the same asset with different destinations.

rix0rrr
rix0rrr previously requested changes Jul 1, 2025
let assetNode = this.assetNodes.get(stackAsset.assetId);
// Create a unique key that includes both asset ID and destination information
// This ensures assets with the same hash but different destinations are published separately
const assetKey = `${stackAsset.assetId}:${stackAsset.assetPublishingRoleArn || 'default'}`;
Contributor

@rix0rrr rix0rrr Jul 1, 2025

If I understand correctly, the problem is that two assets in two different manifests have exactly the same identifiers, a la 35b88d91656eb5908:111111-us-east-1, but they have different properties. In your case, they have different role ARNs to publish under.

This specific fix would work if the only difference between two assets was the role ARN, but it ignores other properties that could differ between them. For example, the destination bucketName could be different: are you also going to mix the bucket name into this identifier? Or the objectKey? You should mix in the objectKey as well. And those are only for file assets; don't forget to consider container image assets.

I think a simpler and more scalable approach would probably be to make sure the destination identifier of an asset depends on the destination's properties, not just their account and region. For example, by hashing the destination object. In that case, if the role between 2 otherwise identical assets is different, the identifier will be unique and all subsequent code will just automatically pick that up.

In this example, the asset identifiers would be something like

35b88d91656eb5908:111111-us-east-1-2b6edb
35b88d91656eb5908:111111-us-east-1-84a92f
                                    ^^^ purposely keeping the hash part here short to avoid overwhelming
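The suggestion above can be sketched as follows. This is a hypothetical helper (the real destination objects have more fields, such as bucketName and objectKey); it assumes Node's built-in crypto module:

```typescript
import { createHash } from 'crypto';

// Derive the destination identifier from *all* of the destination's
// properties, not just account and region, so two logically different
// destinations in the same environment get distinct identifiers.
function destinationId(
  account: string,
  region: string,
  destination: Record<string, unknown>,
): string {
  const hash = createHash('sha256')
    .update(JSON.stringify(destination))
    .digest('hex')
    .slice(0, 6); // short suffix, as in the example identifiers above
  return `${account}-${region}-${hash}`;
}
```

With this scheme, two otherwise identical assets whose destinations differ only in role ARN get unique identifiers, and all downstream code picks that up automatically.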


Contributor Author

Thank you, I'll test this out and get back to you.

Contributor

Just to be clear, we would still expect there to be one publishing job per asset; it's just that the destination list would contain both targets (both synthesizer buckets). As such, I believe you would need to make sure the value in the assetSelector includes the hash mentioned above.

Contributor Author

Yes, with the change suggested by @rix0rrr, this is the behavior now. For the same asset, I see one file-asset stage created, but with multiple destinations:

{
    "version": "0.2",
    "phases": {
        "install": {
            "commands": [
                "npm install -g cdk-assets@latest"
            ]
        },
        "build": {
            "commands": [
                "cdk-assets --path \"assembly-pipeline-asset-stack-Staging/pipelineassetstackStagingdevlambdastackEC748226.assets.json\" --verbose publish \"a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e:current_account-us-east-1-2519ad1f\"",
                "cdk-assets --path \"assembly-pipeline-asset-stack-Production/pipelineassetstackProductionprdlambdastack4E5ABBC0.assets.json\" --verbose publish \"a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e:current_account-us-east-1-0b44228e\""
            ]
        }
    }
}

for

export class LambdaStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    new lambda.Function(this, 'LambdaFN', {
      runtime: lambda.Runtime.PYTHON_3_10,
      handler: 'index.handler',
      code: props?.env?.account == 'xxxxxxxxxxx'
        ? lambda.Code.fromAsset(path.join(__dirname, 'testhelpers', 'assets', 'test-docker-asset'))
        : lambda.Code.fromAsset(path.join(__dirname, 'testhelpers', 'assets')),
    });
  }
}

There are 2 PipelineAssetsFileAsset CodeBuild projects.

Contributor

I think if we tweak the destination ID so it cannot accidentally be the same as a logically different destination in the same account and region (by appending a hash), no other changes will be necessary. All the rest will sort itself out automatically.

It's just that right now we are deduplicating based on false information.

Contributor Author

I am testing by appending a hash created from the stack name, and it is working as expected:

  private manifestEnvName(stack: Stack): string {
    const account = resolvedOr(stack.account, 'current_account');
    const region = resolvedOr(stack.region, 'current_region');
    const destinationProps = {
      account,
      region,
      stackName: stack.stackName,
    };
    const destinationHash = crypto.createHash('sha256')
      .update(JSON.stringify(destinationProps))
      .digest('hex')
      .slice(0, 8);

    return `${account}-${region}-${destinationHash}`;
  }

Contributor

Cool! If you don't mix this into manifestEnvName but instead do the hash calculation based on destinationProps at the place where manifestEnvName is called, I think we're done.
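The refactor suggested here could look something like the following. These are hypothetical helper shapes, not the actual CDK code; the point is that manifestEnvName stays a plain `${account}-${region}` string and the hash is appended where it is consumed:

```typescript
import { createHash } from 'crypto';

// Environment name stays a plain account-region pair, unchanged.
function manifestEnvName(account: string, region: string): string {
  return `${account}-${region}`;
}

// At the call site, append a hash of the destination's own properties,
// so the suffix reflects the destination rather than the producing stack.
function destinationKey(
  account: string,
  region: string,
  destinationProps: object,
): string {
  const suffix = createHash('sha256')
    .update(JSON.stringify(destinationProps))
    .digest('hex')
    .slice(0, 8);
  return `${manifestEnvName(account, region)}-${suffix}`;
}
```

Hashing the destination's properties rather than the stack name avoids the over-strong guarantee flagged in the review below: assets shared between stacks that publish to the same destination keep a single key.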

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Jul 1, 2025
@matboros matboros removed their assignment Jul 3, 2025
@mrlikl mrlikl requested a review from a team as a code owner July 3, 2025 18:48
@mergify mergify bot dismissed rix0rrr’s stale review July 3, 2025 18:48

Pull request has been modified.

@mrlikl mrlikl requested a review from rix0rrr July 3, 2025 18:49
const destinationProps = {
account,
region,
stackName: stack.stackName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing in the stackName will make it so that assets that are duplicated between stacks must be uploaded twice. That is a stronger guarantee than we need. Try the actual destination's properties instead.

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: adac7d0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


Successfully merging this pull request may close these issues:

(pipelines): Assets are missing to be packaged since StackSynthesizer being ignored