[mypyc] Optimize str.encode with specializations for common used encodings #18232

Merged
JukkaL merged 3 commits into python:master from mypyc-str-encode
Dec 3, 2024

Conversation

svalentin
Collaborator

Tested with:

import time
start = time.time()
for i in range(20000000):
    "test".encode('utf-8')
print(time.time() - start)

With the PR applied and the module compiled with mypyc, python3 -c "import test" runs in (seconds):
0.5383486747741699
0.5224344730377197
0.555696964263916

Without PR applied:
0.7315819263458252
0.7105758190155029
0.7471706867218018

Similar times observed for "ascii"
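As a rough reading of the timings above, the three runs in each group can be averaged and compared. This is a minimal sketch that uses only the numbers quoted in this comment; the variable names are illustrative.

```python
from statistics import mean

# Wall-clock seconds copied from the runs reported above.
with_pr = [0.5383486747741699, 0.5224344730377197, 0.555696964263916]
without_pr = [0.7315819263458252, 0.7105758190155029, 0.7471706867218018]

# Ratio of mean run times, and the fraction of time saved.
speedup = mean(without_pr) / mean(with_pr)
reduction = 1 - mean(with_pr) / mean(without_pr)

print(f"mean with PR:    {mean(with_pr):.3f} s")
print(f"mean without PR: {mean(without_pr):.3f} s")
print(f"speedup: {speedup:.2f}x, time saved: {reduction:.0%}")
```

On these three-run samples that works out to roughly a 1.35x speedup, or about a quarter less time for the loop.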

svalentin requested a review from JukkaL on December 2, 2024, 18:19
Collaborator

JukkaL left a comment

I think some cases aren't covered by the logic. I've suggested some test cases that should help:

s.encode('utf-8', 'strict')
s.encode('utf-8', errors='strict')
s.encode('utf-8', 'backslashreplace')
s.encode(encoding='ascii')
s.encode('ascii', 'backslashreplace')
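For reference, the call forms above behave as follows in plain CPython, and a compiled fast path has to produce the same results. This is a small sketch with an example string chosen here for illustration; it is not a test taken from the PR.

```python
s = "café"

# Positional and keyword spellings of the default error handler are equivalent.
assert s.encode("utf-8", "strict") == s.encode("utf-8")
assert s.encode("utf-8", errors="strict") == s.encode("utf-8")
assert s.encode(encoding="utf-8") == s.encode()

# A non-default handler changes the result when a character cannot be encoded.
ascii_escaped = s.encode("ascii", "backslashreplace")
print(ascii_escaped)

# With the default strict handling, the same input raises instead.
try:
    s.encode(encoding="ascii")
except UnicodeEncodeError as exc:
    strict_error = exc
    print("strict ascii failed:", exc.reason)

# UTF-8 only fails on lone surrogates; backslashreplace escapes them.
surrogate_escaped = "\ud800".encode("utf-8", "backslashreplace")
print(surrogate_escaped)
```

The point of covering these forms is that the error handler, not just the encoding name, determines the output, so a specialization keyed on the encoding alone would be incomplete.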
Collaborator

Also add test cases where the specialization shouldn't be applied, for example s.encode(x), s.encode('a', x), s.encode('utf8', errors=x) and s.encode(errors=x), where x is not a literal.

And add test cases with two keyword args: s.encode(encoding=..., errors=...) and s.encode(errors=..., encoding=...).
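To make the literal-versus-non-literal distinction concrete, here is a hypothetical sketch of the kind of check a specializer needs. It uses the standard library `ast` module and is not mypyc's implementation. The names `FAST_ENCODINGS`, `normalize` and `can_specialize` are invented for illustration, and both the encoding set and the choice to accept only `'strict'` are assumptions, since the thread does not say which error handlers the PR specializes.

```python
import ast

# Assumption: only these encodings have a fast path. The thread mentions
# utf-8 and ascii; the real set in the PR may differ.
FAST_ENCODINGS = {"utf8", "ascii"}
PARAMS = ["encoding", "errors"]


def normalize(name: str) -> str:
    """Simplified alias folding, e.g. 'UTF-8' and 'utf_8' -> 'utf8'."""
    return name.lower().replace("-", "").replace("_", "")


def can_specialize(call_src: str) -> bool:
    """Return True if the call only uses string literals we know how to handle."""
    call = ast.parse(call_src, mode="eval").body
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and call.func.attr == "encode"):
        return False
    if len(call.args) > len(PARAMS):
        return False

    # Bind positional, then keyword arguments to (encoding, errors),
    # so keyword order does not matter.
    bound = dict(zip(PARAMS, call.args))
    for kw in call.keywords:
        if kw.arg not in PARAMS or kw.arg in bound:
            return False  # **kwargs, unknown name, or duplicate
        bound[kw.arg] = kw.value

    # Every supplied argument must be a string literal.
    values = {}
    for name, node in bound.items():
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            return False
        values[name] = node.value

    encoding = normalize(values.get("encoding", "utf-8"))
    errors = values.get("errors", "strict")
    # Assumption: only the default 'strict' handler is specialized.
    return encoding in FAST_ENCODINGS and errors == "strict"


for src in ["s.encode('utf-8', errors='strict')",
            "s.encode(errors='strict', encoding='ascii')",
            "s.encode('utf8', errors=x)",
            "s.encode(x)"]:
    print(src, "->", can_specialize(src))
```

Under these assumptions, both keyword orders are accepted, while any call with a non-literal argument falls back to the generic path.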

Collaborator Author

I've updated the logic to work out the args better and added more tests. Please take another look!

Collaborator

JukkaL left a comment

Looks good, thanks!

JukkaL merged commit e731185 into python:master on Dec 3, 2024
13 checks passed
svalentin deleted the mypyc-str-encode branch on December 17, 2024, 16:27
2 participants