-
Notifications
You must be signed in to change notification settings - Fork 71
migrate push_queries to push_columns #553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
mypy errors: |
seanmacavaney
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could probably use a few unit tests for push_columns and pop_columns directly.
| if keep_original: | ||
| inp['query'] = inp['query_0'] | ||
| return inp | ||
| def per_element(i: IterDictRecord) -> IterDictRecord: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to try to make this a bit more efficient in the IterDict case, since it's doing a lot of work for every record. But it's not a regression, so not necessary to address in this PR.
| return push_columns_dict(inp, keep_original=keep_original, base_column="query") | ||
|
|
||
|
|
||
| def pop_columns(df: pd.DataFrame, base_column="query") -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a pop for the dict case? This also isn't a regression, just surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
| row = row.copy() | ||
| if "query" in row: | ||
| row = pt.model.push_queries_dict(row, inplace=True, keep_original=True) | ||
| row = pt.model.push_queries_dict(row, keep_original=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be inplace to avoid making an extra copy, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i disabled a lot of inplace=True, to avoid the temptation for downstream users.
however, there are places where it used to make sense, e.g. if a row was new anyway. thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel particularly strongly
in terrierteam/pyterrier_rag#24, a generic form of push_queries etc was introduced by @Parry-Parry. This merges those methods upstream.
At the same time, I've removed support for inplace editing of the dataframe.