Reduce hashing during v2 transitive graph walks #4109

Merged
stuhood merged 2 commits into pantsbuild:master from twitter:stuhood/reduce-hashing-in-transitive-walks
Dec 1, 2016

@stuhood stuhood commented Dec 1, 2016

Problem

Transitive graph walks in the v2 engine involve many deduping merges of collections of HydratedTarget objects. Before this change, those merges accounted for about 70% of the total runtime of ./pants --enable-v2-engine list $target for a highly connected $target.

Solution

OrderedSet is significantly slower than a plain set for individual dedupe operations (particularly when it is converted back into a tuple afterward), so we switch to deduping a generator using a throwaway set and collecting the result into a tuple.
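
A minimal sketch of that dedupe pattern, for illustration only. The function name `merge_deduped` is hypothetical and is not taken from this change; the sketch only assumes that the items are hashable and that first-seen order should be preserved.

```python
from itertools import chain


def merge_deduped(collections):
    """Merge several iterables into one tuple, keeping first-seen order.

    Hypothetical helper: a plain throwaway set tracks what has been seen,
    and a generator yields each item the first time it appears.
    """
    seen = set()

    def unseen():
        for item in chain.from_iterable(collections):
            if item not in seen:
                seen.add(item)
                yield item

    # Collect the generator straight into a tuple; the set is discarded.
    return tuple(unseen())


merged = merge_deduped([[3, 1], [1, 2, 3], [4]])
```

Each item costs one hash lookup plus at most one set insertion, and ordering comes from the iteration itself rather than from an ordered container.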

Additionally, because HydratedTarget objects are guaranteed to have an Address, we implement equality/hash checks using an Address lifted from the inner structs.
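
A rough sketch of address-based equality and hashing, under stated assumptions. The class names `AddressSketch`, `AdaptorSketch`, and `HydratedTargetSketch` and all of their fields are hypothetical stand-ins, not the engine's actual definitions; the point is only that comparing and hashing one small key avoids walking the whole nested structure.

```python
from collections import namedtuple

# Hypothetical stand-ins for the engine's types (assumed field names).
AddressSketch = namedtuple('AddressSketch', ['spec_path', 'target_name'])
AdaptorSketch = namedtuple('AdaptorSketch', ['address', 'kwargs'])


class HydratedTargetSketch(object):
    """Illustrative target whose identity is its address alone."""

    def __init__(self, adaptor, dependencies):
        self.adaptor = adaptor
        self.dependencies = tuple(dependencies)
        # Lift the address out of the inner struct once, at construction.
        self.address = adaptor.address

    def __eq__(self, other):
        return type(self) is type(other) and self.address == other.address

    def __ne__(self, other):
        return not (self == other)

    def __hash__(self):
        # Hash only the address, not the adaptor or the dependencies.
        return hash(self.address)


addr = AddressSketch('src/python/example', 'lib')
leaf = HydratedTargetSketch(AdaptorSketch(AddressSketch('3rdparty', 'six'), ()), [])
a = HydratedTargetSketch(AdaptorSketch(addr, ()), [])
b = HydratedTargetSketch(AdaptorSketch(addr, ()), [leaf])
```

In this sketch `a` and `b` compare equal and collapse to a single entry in a set even though their dependency tuples differ, which is only sound if an address uniquely identifies a target.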

Result

The runtime of ./pants --enable-v2-engine list $target is improved by approximately 2x.

…d a faster hashcode impl for HydratedTarget.

@kwlzn kwlzn left a comment

lgtm!

@JieGhost JieGhost left a comment

Looks good! This reminds me of a similar change I made in fs.py. What I found there matches your finding, i.e., using set is much faster than OrderedSet.

@stuhood stuhood merged commit ce807a2 into pantsbuild:master Dec 1, 2016
lenucksi pushed a commit to lenucksi/pants that referenced this pull request Apr 25, 2017