@cmacdonald
Contributor

In terrierteam/pyterrier_rag#24, @Parry-Parry introduced a generic form of push_queries etc. This PR merges those methods upstream.

At the same time, I've removed support for in-place editing of the DataFrame.

@cmacdonald changed the title from "migrate push_queries" to "migrate push_queries to push_columns" on May 9, 2025
@cmacdonald
Contributor Author

mypy errors:
pyterrier/model.py:191: error: Incompatible return value type (got "map[Any]", expected "Tuple[str, int]") [return-value]
pyterrier/apply_base.py:379: error: Incompatible types in assignment (expression has type "Union[Dict[str, Any], Iterable[Dict[str, Any]]]", variable has type "Dict[str, Any]") [assignment]

@cmacdonald requested a review from seanmacavaney on May 13, 2025, 12:11
Collaborator

@seanmacavaney left a comment

Could probably use a few unit tests for push_columns and pop_columns directly.
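
A minimal sketch of what such tests could check. It runs against a pure-Python stand-in, not the real pt.model functions: `push_columns_dict_sketch` and its body are assumptions, modelled only on the call shape visible in this diff (`push_columns_dict(inp, keep_original=..., base_column=...)`) and on the usual push_queries behaviour of shifting `query` to `query_0`, `query_0` to `query_1`, and so on. Tests for pop_columns would presumably mirror these.

```python
import itertools
from typing import Any, Dict


def push_columns_dict_sketch(inp: Dict[str, Any], keep_original: bool = False,
                             base_column: str = "query") -> Dict[str, Any]:
    """Stand-in (assumption, not the PyTerrier implementation).

    Shifts base_column -> base_column_0 -> base_column_1 ... and returns a
    new dict, leaving the input untouched.
    """
    if base_column not in inp:
        raise KeyError(f"Expected a {base_column!r} key, found {list(inp.keys())}")
    rename = {}
    prev = base_column
    for idx in itertools.count():
        if prev not in inp:
            break
        nxt = f"{base_column}_{idx}"
        rename[prev] = nxt  # e.g. query -> query_0, query_0 -> query_1
        prev = nxt
    out = {rename.get(k, k): v for k, v in inp.items()}
    if keep_original:
        out[base_column] = inp[base_column]
    return out


def test_push_shifts_existing_generations():
    row = {"qid": "1", "query": "b", "query_0": "a"}
    assert push_columns_dict_sketch(row) == {"qid": "1", "query_0": "b", "query_1": "a"}


def test_keep_original_retains_base_column():
    row = {"qid": "1", "query": "a"}
    out = push_columns_dict_sketch(row, keep_original=True)
    assert out == {"qid": "1", "query": "a", "query_0": "a"}


def test_input_is_not_mutated():
    row = {"qid": "1", "query": "a"}
    push_columns_dict_sketch(row)
    assert row == {"qid": "1", "query": "a"}


def test_other_base_column():
    row = {"qid": "1", "query": "a", "prompt": "p"}
    out = push_columns_dict_sketch(row, base_column="prompt")
    assert out == {"qid": "1", "query": "a", "prompt_0": "p"}


def test_missing_base_column_raises():
    try:
        push_columns_dict_sketch({"qid": "1"})
    except KeyError:
        return
    raise AssertionError("expected KeyError")
```

Written as plain functions with bare asserts so they can be collected by pytest; swapping the stand-in for the real function would be the obvious next step once its exact signature is settled.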

if keep_original:
    inp['query'] = inp['query_0']
return inp
def per_element(i: IterDictRecord) -> IterDictRecord:
Collaborator

We may want to try to make this a bit more efficient in the IterDict case, since it's doing a lot of work for every record. But it's not a regression, so not necessary to address in this PR.
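
One possible direction, sketched here rather than proposed for this PR: compute the rename mapping once and reuse it for as long as the records keep the same set of keys, rebuilding only when the keys change. The names `build_rename_map` and `push_columns_iter_sketch` are hypothetical and this is not the code under review. Whether it actually saves time would need measuring.

```python
import itertools
from typing import Any, Dict, Iterable, Iterator, Optional, Set


def build_rename_map(keys: Iterable[str], base_column: str = "query") -> Dict[str, str]:
    """Map base_column -> base_column_0 -> base_column_1 ... for the given keys."""
    keys = set(keys)
    if base_column not in keys:
        raise KeyError(f"Expected a {base_column!r} key, found {sorted(keys)}")
    rename = {}
    prev = base_column
    for idx in itertools.count():
        if prev not in keys:
            break
        nxt = f"{base_column}_{idx}"
        rename[prev] = nxt
        prev = nxt
    return rename


def push_columns_iter_sketch(records: Iterable[Dict[str, Any]],
                             keep_original: bool = False,
                             base_column: str = "query") -> Iterator[Dict[str, Any]]:
    """Hypothetical IterDict push that caches the rename map between records."""
    seen_keys: Optional[Set[str]] = None
    rename: Dict[str, str] = {}
    for rec in records:
        # Rebuild the mapping only when the key set differs from the cached one.
        if seen_keys is None or rec.keys() != seen_keys:
            seen_keys = set(rec)
            rename = build_rename_map(seen_keys, base_column)
        out = {rename.get(k, k): v for k, v in rec.items()}
        if keep_original:
            out[base_column] = rec[base_column]
        yield out
```

For example, `list(push_columns_iter_sketch([{"qid": "1", "query": "a"}], keep_original=True))` gives a single record holding `qid`, `query` and `query_0`, and the input record is left unchanged.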

return push_columns_dict(inp, keep_original=keep_original, base_column="query")


def pop_columns(df: pd.DataFrame, base_column="query") -> pd.DataFrame:
Collaborator

Are we missing a pop for the dict case? This also isn't a regression, just surprising.

Contributor Author

yes
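
For reference, a dict counterpart could look roughly like the sketch below. `pop_columns_dict_sketch` is a hypothetical name and does not exist in this PR. The behaviour is assumed to mirror the DataFrame pop: drop the current base value, then shift `query_0` back to `query`, `query_1` back to `query_0`, and so on.

```python
import itertools
from typing import Any, Dict


def pop_columns_dict_sketch(inp: Dict[str, Any], base_column: str = "query") -> Dict[str, Any]:
    """Hypothetical inverse of a dict push; returns a new dict."""
    first = f"{base_column}_0"
    if first not in inp:
        raise KeyError(f"Expected a {first!r} key, found {list(inp.keys())}")
    rename = {}
    prev = base_column
    for idx in itertools.count():
        nxt = f"{base_column}_{idx}"
        if nxt not in inp:
            break
        rename[nxt] = prev  # e.g. query_0 -> query, query_1 -> query_0
        prev = nxt
    # Drop the current base value and shift every stored generation back one step.
    return {rename.get(k, k): v for k, v in inp.items() if k != base_column}


row = {"qid": "1", "query": "c", "query_0": "b", "query_1": "a"}
popped = pop_columns_dict_sketch(row)
assert popped == {"qid": "1", "query": "b", "query_0": "a"}
assert row["query"] == "c"  # the input dict is not modified
```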

  row = row.copy()
  if "query" in row:
-     row = pt.model.push_queries_dict(row, inplace=True, keep_original=True)
+     row = pt.model.push_queries_dict(row, keep_original=True)
Collaborator

This should be inplace to avoid making an extra copy, right?

Contributor Author

I disabled a lot of the inplace=True usages to avoid tempting downstream users into relying on it.

However, there are places where it used to make sense, e.g. when the row was freshly created anyway. Thoughts?

Collaborator

I don't feel particularly strongly either way.

3 participants