Skip to content

[mypyc] Faster min #10265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Nov 12, 2021
Merged

[mypyc] Faster min #10265

merged 17 commits into from
Nov 12, 2021

Conversation

ChetanKhanna
Copy link
Contributor

Description

I have tried to specialize min(x, y) in this PR as per mypyc/mypyc#773

Test Plan

I ran the complete test suite locally as well as https://github.com/mypyc/mypyc-benchmarks
However, I'm still unsure as to how am I supposed to report benchmark results. After running the benchmark runner on both master and this branch for 5 time, I got the following results (averaged):
master: 11.7058x faster
current branch: 11.8452x faster

and expr.arg_kinds == [ARG_POS, ARG_POS]):
x, y = builder.accept(expr.args[0]), builder.accept(expr.args[1])
comparison = builder.binary_op(x, y, '<', expr.line)
if comparison == Integer(1, bool_rprimitive):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right. The if statement runs during compilation, but it should instead generate an if statement that is executed when the compiled code is run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this. However, I posted a question below. Any suggestions please?

def check_min_int() -> None:
x: int = 200
y: int = 30
assert min(x, y) == 30
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also test the case where the first argument is the minimum. I think that it's going to fail.

To help debug this, also add an irbuild test case for this. This way you can validate that the code you are generating is what you expect it to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get why you think that way. I've updated the logic a bit to get a better IR. I tested locally and the other case seem to work fine as well. I'll add that test case before pushing next time.

@TH3CHARLie
Copy link
Collaborator

If I recall correctly, the specializer logic would pick the first specializer of a function name. Because there is already a registered builtins.min for generator call, I doubt you actually specialize the call and the performance numbers look like noise to me. As Jukka suggested, write a simple IR test and see if the IR matches your desired case.

@ChetanKhanna
Copy link
Contributor Author

Okay, I think I'll have to dig in deeper then. I'll keep on updating this thread with all the progress. Thank you :)

@ChetanKhanna
Copy link
Contributor Author

Hello! Okay so I was looking into it and I have a couple of doubts:-

  1. Writing an irbuild test helped. However, I feel there is an extra block in the output and I'm not sure how to avoid it from getting generated while compiling. Here's the output and I think block L6 shouldn't be there. If anyone could suggest how to possibly handle this please?
def f(x, y):
      x, y :: int
      r0 :: int64
      r1 :: bit
      r2 :: int64
      r3, r4, r5 :: bit
      r6 :: bool
      r7 :: bit
      r8 :: object
      r9 :: str
      r10, r11, r12, r13 :: object
      r14 :: int
  L0:
      r0 = x & 1
      r1 = r0 == 0
      r2 = y & 
      r3 = r2 == 0
      r4 = r1 & r3
      if r4 goto L1 else goto L2 :: bool
  L1:
      r5 = x < y :: signed
      r6 = r5
      goto L3
  L2:
      r7 = CPyTagged_IsLt_(x, y)
      r6 = r7
  L3:
      if r6 goto L4 else goto L5 :: bool
  L4:
      return x
  L5:
      return y
  L6:
      r8 = builtins :: module
      r9 = 'min'
      r10 = CPyObject_GetAttr(r8, r9)
      r11 = box(int, x)
      r12 = box(int, y)
      r13 = PyObject_CallFunctionObjArgs(r10, r11, r12, 0)
      r14 = unbox(int, r13)
      return r14
  1. As @TH3CHARLie said that builtins.min is already registered and shouldn't be registered again on the new function. Then should the new function be shifted inside the translate_safe_generator_call which originally registers min?

@TH3CHARLie
Copy link
Collaborator

For Q1: I suggest you update your local commit so we can figure out why L6 appears. Your guess is correct, it shouldn't be here, it's a generic implementation that is usually slower than direct integer comparison.

For Q2: my advice is to make the specializers from a map of <name, function> to <name, list of functions> and update the lookup logic accordingly. But this seems a little bit heavy, not sure how @JukkaL thinks.

@ChetanKhanna
Copy link
Contributor Author

For Q1: I suggest you update your local commit so we can figure out why L6 appears. Your guess is correct, it shouldn't be here, it's a generic implementation that is usually slower than direct integer comparison.

I've pushed my local commits, although they are still WIP

For Q2: my advice is to make the specializers from a map of <name, function> to <name, list of functions> and update the lookup logic accordingly. But this seems a little bit heavy, not sure how @JukkaL thinks.

Okay, that makes sense. I can proceed that way if that's okay.

@97littleleaf11
Copy link
Collaborator

97littleleaf11 commented Apr 7, 2021

For Q2: my advice is to make the specializers from a map of <name, function> to <name, list of functions> and update the lookup logic accordingly. But this seems a little bit heavy, not sure how @JukkaL thinks.

Agree with this idea. Recent speed-up commits and issues have and will change a large range of code structure. Maybe we can open a new issue to discuss this?

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 7, 2021

Maybe we can open a new issue to discuss this?

This is a good idea. My first impression is that having a list of functions is probably fine, but it's worth thinking about it a bit more.


@specialize_function('builtins.min')
def faster_min(builder: IRBuilder, expr: CallExpr, callee: RefExpr) -> None:
if (len(expr.args) > 0
Copy link
Collaborator

@97littleleaf11 97littleleaf11 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value should be Optional[Value]. Then the call translation functions in irbuild/expression.py can get the correct result and stop further trying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was super helpful, thanks for pointing out 😄 I should have been more careful.

@ChetanKhanna ChetanKhanna changed the title [WIP][mypyc] Faster min [mypyc] Faster min May 10, 2021
@97littleleaf11
Copy link
Collaborator

lgtm! This PR brings a promising speed-up:

running min_max_pair
..........
interpreted: 0.456433s (avg of 5 iterations; stdev 3.1%)
compiled:    0.264439s (avg of 5 iterations; stdev 0.33%)

compiled is 1.726x faster

Master branch is about 0.9x.


[case testMin]
def check_min_int() -> None:
x: int = 200
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think functions with prefix check wouldn't trigger a run test? Maybe you should change it to test_*. Also, please add more run tests about other types, such as strings, floats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having some issues with my system right now. I'll try to update the branch by the weekend.

@ChetanKhanna
Copy link
Contributor Author

If this is good, then I can the max counterpart as well.

Copy link
Collaborator

@97littleleaf11 97littleleaf11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing! I have several suggesions on testing. The specializer looks good to me and you could support max in this PR.

btw, we don't recommend force-push since it would break previous reviews.

@@ -20,3 +20,9 @@ def test_abs() -> None:
assert abs(44324.732) == 44324.732
assert abs(-23.4) == 23.4
assert abs(-43.44e-4) == 43.44e-4

[case testFloatMin]
def test_float_min() -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think merging these test cases into one might be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@overload
def min(x: float, y: float) -> float: ...
@overload
def min(x: str, y: str) -> str: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use T (already defined in this file) to make them simple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

def test_str_min() -> None:
x: str = 'aaa'
y: str = 'bbb'
assert min(x, y) == 'aaa'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also test the minimun of user defined class which has __lt__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test the inverse here as well.

@JukkaL
Copy link
Collaborator

JukkaL commented Aug 11, 2021

Please also test heterogeneous operands for min (e.g. Any and int; also int and Any). The int operands needs to be coerced to object. Another interesting case is int and bool.

Finally, test Any and Any with a few different types, and test Any and Any with incompatible types (e.g. int and str).

@ChetanKhanna
Copy link
Contributor Author

Sorry for the late update. I tried all the tests as @JukkaL suggested, they all failed to compile. I get an error like:
error: assignment to ‘CPyTagged’ {aka ‘long unsigned int’} from ‘PyObject *’ {aka ‘struct _object *’} makes integer from pointer without a cast

Pushing the commit so that everyone can see as well.

The int operands needs to be coerced to object.

Do we need to explicitly perform coercion in the new faster_min function?

@ChetanKhanna
Copy link
Contributor Author

I made changes to faster_min, the tests are running fine now.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments (not a full review).

def test_str_min() -> None:
x: str = 'aaa'
y: str = 'bbb'
assert min(x, y) == 'aaa'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test the inverse here as well.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I checked that min is now much faster than previously -- it can be over 10x faster than before.

I have just a two minor comments. Feel free to merge once you've addressed them.

Extending this to support max should be pretty easy and a good follow-up PR :-)

and expr.arg_kinds == [ARG_POS, ARG_POS]):
x, y = builder.accept(expr.args[0]), builder.accept(expr.args[1])
result = Register(builder.node_type(expr))
comparison = builder.binary_op(x, y, '<', expr.line)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a quick experiment, it seems that CPython does actually evaluate y < x instead of x < y when doing min(x, y). Can you double check this and update the implementation accordingly if that's the case? (This is not a big deal, but it's better to remain as close to CPython as possible.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out! I checked the bltinmodule/min_max() in CPython and it does eval y < x when doing min(x, y). To be more specific, it evaluates arguments from last to first.

@97littleleaf11 97littleleaf11 merged commit 0b4cb1e into python:master Nov 12, 2021
TH3CHARLie pushed a commit that referenced this pull request Nov 12, 2021
Description
Closes mypyc/mypyc#773, follows up to #10265
tushar-deepsource pushed a commit to DeepSourceCorp/mypy that referenced this pull request Jan 20, 2022
Speeds up `min(x, y)` using a specializer. 

Co-authored-by: 97littleleaf11 <[email protected]>
tushar-deepsource pushed a commit to DeepSourceCorp/mypy that referenced this pull request Jan 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants