fix(pipelines): ensure assets with same hash but different destinations are published separately #34790
Conversation
@@ -119,7 +119,7 @@ test('Policy sizes do not exceed the maximum size', () => {

  // WHEN
  const regions = ['us-east-1', 'us-east-2', 'eu-west-1', 'eu-west-2', 'somethingelse1', 'somethingelse-2', 'yapregion', 'more-region'];
- for (let i = 0; i < 70; i++) {
+ for (let i = 0; i < 60; i++) {
?
I was facing a validation error when testing this after the change. The resources in the stack exceeded the 500 limit:
ValidationError: Number of resources in stack 'PipelineStack': 515 is greater than allowed maximum of 500: AWS::KMS::Key (1), AWS::KMS::Alias (1), AWS::S3::Bucket (1), AWS::S3::BucketPolicy (1), AWS::IAM::Role (145), AWS::IAM::Policy (145), AWS::IAM::ManagedPolicy (7), AWS::CodePipeline::Pipeline (1), AWS::CodePipeline::Webhook (1), AWS::CodeBuild::Project (212)
I'm struggling to understand the problem and the proposed solution. The linked bug report says:
What does "packaging" mean in this context? The PR body says:
This means both are being published in the same CodeBuild project?
I'm not sure I follow why it would only be published once? Can you go one level deeper and explain what you've diagnosed is happening?
@rix0rrr thank you for taking a look:
I meant packaging for the asset publishing step in the pipeline. The pipeline creates CodeBuild projects that run the cdk-assets publish commands for each asset. The current behaviour: consider a pipeline with 2 stages where the only difference between the 2 stages is the qualifier, so the asset publishing role ARN differs between them. Because publishing is keyed only by the asset hash, the asset ends up being published for only one of the two destinations. So the change creates a composite key (asset id plus publishing role ARN), since the publishing role is different in the above scenario.
let assetNode = this.assetNodes.get(stackAsset.assetId);
// Create a unique key that includes both asset ID and destination information
// This ensures assets with the same hash but different destinations are published separately
const assetKey = `${stackAsset.assetId}:${stackAsset.assetPublishingRoleArn || 'default'}`;
If I understand correctly, the problem is that two assets in two different manifests have fully the same identifiers, a la 35b88d91656eb5908:111111-us-east-1, but they have different properties. In your case, they have different role ARNs to publish under.
This specific fix would work if the only difference between two assets was the role ARN. But it ignores other things that could be different between them. For example, the destination bucketName could be different. Are you also going to mix the bucket name into this identifier? Or the objectKey? You should mix in the objectKey as well. But that is only for file assets, don't forget to consider container image assets.
I think a simpler and more scalable approach would probably be to make sure the destination identifier of an asset depends on the destination's properties, not just their account and region. For example, by hashing the destination object. In that case, if the role between 2 otherwise identical assets is different, the identifier will be unique and all subsequent code will just automatically pick that up.
In this example, the asset identifiers would be something like
35b88d91656eb5908:111111-us-east-1-2b6edb
35b88d91656eb5908:111111-us-east-1-84a92f
^^^ purposely keeping the hash part here short to avoid overwhelming
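Roughly, a sketch of what that could look like (the destination field names below are illustrative, not the exact cdk-assets schema):

import * as crypto from 'crypto';

// Illustrative destination shape only; the real file-asset destination has its own schema.
interface DestinationLike {
  readonly bucketName?: string;
  readonly objectKey?: string;
  readonly assumeRoleArn?: string;
}

// Derive a short, stable suffix from the destination's own properties, so that two
// destinations in the same account and region that differ only in role ARN (or bucket,
// or object key) still get distinct identifiers.
function destinationIdSuffix(dest: DestinationLike): string {
  return crypto.createHash('sha256')
    .update(JSON.stringify(dest))
    .digest('hex')
    .slice(0, 6);
}

// e.g. `${account}-${region}-${destinationIdSuffix(dest)}`
//      -> '111111-us-east-1-2b6edb' vs '111111-us-east-1-84a92f'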
You would do that here:
Thank you, I'll test this out and revert.
Just to be clear, we would still expect there to be one publishing entry per asset, just that its destination list would contain both targets (both synthesizer buckets). As such, I believe you would need to make sure the value in the assetSelector has the hash mentioned above.
Yes, with the change suggested by @rix0rrr, this is the behavior now. For the same asset, I see one file asset stage created but multiple destinations:
{
  "version": "0.2",
  "phases": {
    "install": {
      "commands": [
        "npm install -g cdk-assets@latest"
      ]
    },
    "build": {
      "commands": [
        "cdk-assets --path \"assembly-pipeline-asset-stack-Staging/pipelineassetstackStagingdevlambdastackEC748226.assets.json\" --verbose publish \"a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e:current_account-us-east-1-2519ad1f\"",
        "cdk-assets --path \"assembly-pipeline-asset-stack-Production/pipelineassetstackProductionprdlambdastack4E5ABBC0.assets.json\" --verbose publish \"a26bd817a0dac44954b5caf83f5880a96f831e43b56157224e073b49f236eb4e:current_account-us-east-1-0b44228e\""
      ]
    }
  }
}
for the following stack:
import * as path from 'path';
import { Stack, StackProps } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export class LambdaStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    new lambda.Function(this, 'LambdaFN', {
      runtime: lambda.Runtime.PYTHON_3_10,
      handler: 'index.handler',
      code: props?.env?.account == 'xxxxxxxxxxx'
        ? lambda.Code.fromAsset(path.join(__dirname, 'testhelpers', 'assets', 'test-docker-asset'))
        : lambda.Code.fromAsset(path.join(__dirname, 'testhelpers', 'assets')),
    });
  }
}
There are 2 PipelineAssetsFileAsset CodeBuild projects.
I think if we tweak the destination ID to not accidentally be the same as a logically different destination in the same account and region (by means of appending a hash), no other changes will be necessary. All the rest will sort itself out automatically.
It's just that right now we are deduplicating based on false information.
I am testing by appending a hash derived from the stack name, and it is working as expected:
private manifestEnvName(stack: Stack): string {
  const account = resolvedOr(stack.account, 'current_account');
  const region = resolvedOr(stack.region, 'current_region');

  const destinationProps = {
    account,
    region,
    stackName: stack.stackName,
  };

  const destinationHash = crypto.createHash('sha256')
    .update(JSON.stringify(destinationProps))
    .digest('hex')
    .slice(0, 8);

  return `${account}-${region}-${destinationHash}`;
}
Cool! If you don't mix this into manifestEnvName but do the hash calculation based on destinationProps in the place where manifestEnvName is called, I think we're done.
const destinationProps = {
  account,
  region,
  stackName: stack.stackName,
Mixing in the stackName will make it so that assets that are duplicated between stacks must be uploaded twice. That is a stronger guarantee than we need. Try the actual destination's properties instead.
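As a reference point, here is a minimal sketch of hashing the destination's own properties at the call site; the property names are assumptions, not the exact cdk-assets schema:

import * as crypto from 'crypto';

// Sketch only: the property names here are assumptions, not the exact cdk-assets schema.
// Hashing the destination's properties instead of stack.stackName keeps an identical asset
// that is shared between stacks deduplicated, while destinations that differ only in role
// ARN or bucket still get distinct environment names.
interface AssumedDestinationProps {
  readonly assumeRoleArn?: string;
  readonly bucketName?: string;
  readonly objectKey?: string;
}

function destinationEnvName(account: string, region: string, dest: AssumedDestinationProps): string {
  const destinationHash = crypto.createHash('sha256')
    .update(JSON.stringify({ account, region, ...dest }))
    .digest('hex')
    .slice(0, 8);
  return `${account}-${region}-${destinationHash}`;
}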
Issue
Closes #31070
Reason for this change
Assets fail to be published in CDK pipelines when stacks with different synthesizers are used for the same account and region. When assets have identical content hashes but need to be published to different destinations (different publishing role ARNs), they were incorrectly grouped together, causing each asset to be published to only one destination instead of all required destinations.
Description of changes
• Modified the publishAsset() method in packages/aws-cdk-lib/pipelines/lib/helpers-internal/pipeline-graph.ts
• Changed the asset tracking key from only stackAsset.assetId to a composite key:
${stackAsset.assetId}:${stackAsset.assetPublishingRoleArn || 'default'}
• This ensures assets with the same content hash but different destinations are treated as separate publishing jobs
Describe any new or updated permissions being added
NA
Description of how you validated changes
Checked with the code in #31070 and made sure there are 2 asset stages; locally ran the asset commands and verified that they are being deployed to the right buckets.
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license