-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Approximation framework to support numeric search_after
queries
#18896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Approximation framework to support numeric search_after
queries
#18896
Conversation
Signed-off-by: Prudhvi Godithi <[email protected]>
Signed-off-by: Prudhvi Godithi <[email protected]>
Signed-off-by: Prudhvi Godithi <[email protected]>
Signed-off-by: Prudhvi Godithi <[email protected]>
Signed-off-by: Prudhvi Godithi <[email protected]>
Adding @kkewwei @msfroh @harshavamsi to please take a look. |
final byte[] effectiveLower = computeEffectiveLowerBound(); | ||
final byte[] effectiveUpper = computeEffectiveUpperBound(); | ||
|
||
this.pointRangeQuery = new PointRangeQuery( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to re-instantiate this PointRangeQuery
as there is a logic in ApproximatePointRangeQuery.scorerSupplier
to use pointRangeQueryWeight.scorerSupplier
when the size requested is greater than the segment docs.
if (size > values.size()) {
return pointRangeQueryWeight.scorerSupplier(context);
}
Without the re-instantiate, it will continue to use the original range bounds and not the new computed search_after
range bounds which leads to incorrect search results.
I'm open for thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Lucene PointRangeQuery now uses ConstantScoreScorerSupplier which uses DenseConjunctionBulkScorer on a specific cost. Overall I see a code matchAll
which can give good advantage. So looks like its good for us to have this if condition in ApproximatePointRangeQuery
.
{"run-benchmark-test": "id_4"} |
{"run-benchmark-test": "id_13"} |
❌ Gradle check result for a68c544: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/3963/ . Final results will be published once the job is completed. |
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/3964/ . Final results will be published once the job is completed. |
Signed-off-by: Prudhvi Godithi <[email protected]>
❌ Gradle check result for f8e0670: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/3964/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/139/
|
Signed-off-by: Prudhvi Godithi <[email protected]>
❌ Gradle check result for c5ddc30: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/3963/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/140/
|
Signed-off-by: Prudhvi Godithi <[email protected]>
❌ Gradle check result for 2f95b6e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Prudhvi Godithi <[email protected]>
❌ Gradle check result for a7b5001: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
This PR targets to support
search_after
queries from the solution described in this issue #18546.ASC
sort and upper bound forDESC
sort. Example:Adds new
byte[] encodePoint(Object value, boolean roundUp);
inNumericPointEncoder
as todaysearch_after
value is returned as object and requires converting it into the appropriate field specific byte[].search_after
withASC
sort we want values greater than thesearch_after
value and forDESC
sort we want values less than thesearch_after
.Handles the search_after conversation for dates with date math rounding operations, here is the sample query. Note the existing
parseToLong
is for date math rounding operations and not for creatingsearch_after
bounds.Coming from this PR https://github.com/opensearch-project/OpenSearch/pull/18763/files where approximation code path is disabled for multi sort, hence
search_after
should work as expected when dealing with multiple tie breakersearch_after
valuesRegarding benchmarks following are the queries we should see improvement from the existing workloads
desc_sort_with_after_timestamp
,asc_sort_with_after_timestamp
desc_sort_with_after_timestamp
,asc_sort_with_after_timestamp
asc_sort_with_after_population
,desc_sort_with_after_geonameid
,asc_sort_with_after_geonameid
Description
[Describe what this change achieves]
Related Issues
Resolves #18546
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.