Apache Geode (incubating)
OQL and Indexing
OQL
OQL (Object Query Language) is a SQL-like language with extended functionality for querying complex objects, object attributes, and methods.
Only a subset of OQL features is supported.
Advantages of OQL:
● You can query on any arbitrary object
● You can navigate object collections
● You can invoke methods and access the behavior of objects
● You are not required to declare types. Because you do not need type definitions, you can work across multiple languages
● You are not constrained by a schema
Commonly used Keywords
SELECT * or field projection
FROM “select * from /users”
WHERE “select * from /users where id = 0”
AND “select * from /users where id > 0 and age > 21”
OR “select * from /users where id != 0 or age < 21”
AS “select * from /users as u where u.id <> 0” , “select * from /users u where u.id > 0”
COUNT “select count(*) from /users”
DISTINCT “select distinct(*) from /users”, “select distinct(name) from /users”
IN “select * from /users u where u.id in set (0, 1, 2)”,
“select * from /users u where u.id in (select id from /employees e)”
LIMIT “select * from /users u limit 5”
LIKE “select * from /users u where u.name like '%a%'”
NOT “select * from /users u where NOT (u.id = 2)”
ORDER BY “select * from /users u where u.name = 'Joe' order by u.id”
TO_DATE (parsed using SimpleDateFormat) to_date('05/09/10', 'yy/dd/MM'), to_date('050910', 'yyddMM')
That’s not all! More keywords and information can be found in the Geode Documentation
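Because TO_DATE patterns are parsed with java.text.SimpleDateFormat, a pattern can be tried out in plain Java before it goes into a query. The following is a minimal sketch that runs outside Geode; the class and method names are made up for this illustration, and it uses the 'yyddMM' pattern from the list above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

class ToDatePatternSketch {
    // Parses text with a SimpleDateFormat pattern, the same pattern syntax TO_DATE accepts.
    static Date parse(String text, String pattern) {
        try {
            return new SimpleDateFormat(pattern).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("'" + text + "' does not match " + pattern, e);
        }
    }

    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(parse("050910", "yyddMM"));
        // The pattern reads the digits as yy = 05, dd = 09, MM = 10.
        System.out.println(cal.get(Calendar.YEAR) + "-"
                + (cal.get(Calendar.MONTH) + 1) + "-"
                + cal.get(Calendar.DAY_OF_MONTH));
    }
}
```

Note that the two-digit year is resolved to a century by SimpleDateFormat relative to the current date, so four-digit years are the safer choice when the century matters.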
Geode Specific Keywords
IS_DEFINED
● Query function. Returns TRUE if the expression does not evaluate to UNDEFINED.
IS_UNDEFINED
● Query function. Returns TRUE if the expression evaluates to UNDEFINED. In most queries, undefined values are not included in the query results. The IS_UNDEFINED function allows undefined values to be included, so you can identify elements with undefined values.
Geode Specific Keywords Continued
<trace>
Prefix a query with <trace> to log its execution time, row count, and the indexes it used:
“<trace> select * from /users u where u.id = 0”
Example log output:
No indexes used:
● [info 2015/05/26 10:25:35.102 PDT Server <main> tid=0x1] Query Executed in 9.619656 ms; rowCount = 99; indexesUsed(0) "select * from /users u where id > 0 and status='active'"
One index used:
● [info 2015/05/26 10:25:35.317 PDT Server <main> tid=0x1] Query Executed in 1.5342 ms; rowCount = 199; indexesUsed(1):sampleIndex-1(Results: 199) "select count * from /users u where u.id > 0"
More than one index used:
● [info 2015/05/26 10:25:35.673 PDT Server <main> tid=0x1] Query Executed in 2.43847 ms; rowCount = 199; indexesUsed(2):sampleIndex-2(Results: 100),sampleIndex-1(Results: 199) "select * from /users u where u.id > 0 OR u.status='active'"
To enable tracing for all queries, set the system property:
System.setProperty("gemfire.Query.VERBOSE","true");
<hint>
<hint 'indexName'> or <hint 'indexName1', 'indexName2'> tells the query engine to prefer the named indexes.
Example: “<hint 'nameIndex'> select * from /users u where u.name = 'Joe' and u.age > 10”
Query Bind Parameters
What
Similar to a SQL prepared statement
Each parameter is written as ‘$’ followed by its position, starting from 1
Examples:
String queryString = "SELECT DISTINCT * FROM /exampleRegion p WHERE p.status = $1 and p.symbol = $2";
...
// Values are bound by position: "sold" -> $1, "abc" -> $2
Object[] params = {"sold", "abc"};
SelectResults results = (SelectResults) query.execute(params);
Possible Exceptions
QueryParameterCountInvalidException
TypeMismatchException
Bind region as a parameter
● Binding a region as a parameter requires the actual Region object, not its string name
“SELECT DISTINCT * FROM $1 p WHERE p.status = $2”
Field visibility and Method Invocation
The query engine first tries to evaluate an attribute using the public field of that name. If no public field is found, it makes a get call using the field name with its first character uppercased (firstName resolves to getFirstName()).
Examples:
SELECT DISTINCT * FROM /users u where u.firstName = 'Joe'
SELECT DISTINCT * FROM /users u where u.getFirstName() = 'Joe'
SELECT DISTINCT * FROM /users u where u.combineFullName() = 'Joe''s Full Name'
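The lookup order described above (public field first, then a getter) can be modelled with plain Java reflection. This is only a sketch of the idea, not Geode's resolver; the User class and the resolve helper are hypothetical names introduced for the example.

```java
import java.lang.reflect.Method;

class AttributeResolutionSketch {
    // Hypothetical domain class: one public field, one private field with a getter.
    public static final class User {
        public final int id;
        private final String firstName;

        public User(int id, String firstName) {
            this.id = id;
            this.firstName = firstName;
        }

        public String getFirstName() {
            return firstName;
        }
    }

    // Resolves an attribute name: public field first, then get<Name>().
    static Object resolve(Object target, String name) {
        try {
            try {
                // getField only finds public fields.
                return target.getClass().getField(name).get(target);
            } catch (NoSuchFieldException noPublicField) {
                String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                Method method = target.getClass().getMethod(getter);
                return method.invoke(target);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot resolve attribute " + name, e);
        }
    }

    public static void main(String[] args) {
        User user = new User(7, "Joe");
        System.out.println(resolve(user, "id"));        // read from the public field
        System.out.println(resolve(user, "firstName")); // read through getFirstName()
    }
}
```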
Type conversions
The Geode query engine implicitly performs the following conversions
Binary Numeric Promotion
The query processor performs binary numeric promotion on the operands of the following operators:
● Comparison operators <, <=, >, and >=, and equality operators = and <>
1. If either operand is of type double, the other is converted to double
2. Otherwise, if either operand is of type float, the other is converted to float
3. Otherwise, if either operand is of type long, the other is converted to long
4. Otherwise, both operands are converted to type int
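The promotion rules above are checked in order, and the first one that matches decides the comparison type. A minimal Java model of that ordering is sketched below; it mirrors the list only and makes no claim about how Geode implements it internally. The class and method names are assumptions for the example.

```java
class NumericPromotionSketch {
    // Compares two numbers after promoting both to the widest type involved.
    static int compare(Number a, Number b) {
        if (a instanceof Double || b instanceof Double) {
            return Double.compare(a.doubleValue(), b.doubleValue());
        }
        if (a instanceof Float || b instanceof Float) {
            return Float.compare(a.floatValue(), b.floatValue());
        }
        if (a instanceof Long || b instanceof Long) {
            return Long.compare(a.longValue(), b.longValue());
        }
        // Everything else (Integer, Short, Byte) is compared as int.
        return Integer.compare(a.intValue(), b.intValue());
    }

    public static void main(String[] args) {
        System.out.println(compare(1, 1.5));        // int vs double: compared as double
        System.out.println(compare(2L, 2));         // long vs int: compared as long
        System.out.println(compare((short) 3, 2));  // short vs int: compared as int
    }
}
```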
Temporal Type Conversion
java.util.Date, java.sql.Date, java.sql.Time, and java.sql.Timestamp are all treated the same; when compared with each other, they are treated as nanosecond quantities
Enum conversion is not done implicitly; a toString() call is needed
Query Evaluation of Float.NaN and Double.NaN
Float.NaN and Double.NaN are not evaluated as primitives; instead, they are compared in the same manner as the JDK methods Float.compareTo and Double.compareTo
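The difference between primitive comparison and compareTo matters for NaN. The short sketch below shows the JDK behaviour the slide refers to; the class and method names are made up for the illustration.

```java
class NaNComparisonSketch {
    // Primitive comparison: NaN is never equal to anything, including itself.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // compareTo ordering: NaN equals itself and sorts above every other value.
    static int compareToOrder(Double a, Double b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(Double.NaN, Double.NaN));
        System.out.println(compareToOrder(Double.NaN, Double.NaN));
        System.out.println(compareToOrder(Double.NaN, Double.POSITIVE_INFINITY));
    }
}
```

Under compareTo semantics, a predicate such as a greater-than comparison can therefore match NaN values that a primitive comparison would never match.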
Query a Partitioned Region
Operations summary:
1.) The “coordinating” node calculates where all the data resides
2.) It creates and executes tasks to query the data on remote nodes
a.) Each node executes the query, using any indexes that node currently has
3.) It executes the query on the local node
4.) On failure, it recalculates where the failed data now resides
5.) It re-executes tasks on the remote nodes where that data now resides
6.) It combines the results and returns them
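The scatter and gather shape of these steps can be sketched in plain Java. This is a simplified model under stated assumptions: each "node" is just a list of values in a map, the query is a predicate, and the failure handling of steps 4 and 5 is left out. None of the names come from Geode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ScatterGatherSketch {
    // dataByNode is a hypothetical stand-in for "where all the data resides".
    static List<Integer> query(Map<String, List<Integer>> dataByNode, Predicate<Integer> where) {
        // Scatter: one task per node, each filtering only the data that node hosts.
        List<CompletableFuture<List<Integer>>> tasks = new ArrayList<>();
        for (List<Integer> nodeData : dataByNode.values()) {
            tasks.add(CompletableFuture.supplyAsync(
                    () -> nodeData.stream().filter(where).collect(Collectors.toList())));
        }
        // Gather: wait for every partial result and combine them.
        List<Integer> combined = new ArrayList<>();
        for (CompletableFuture<List<Integer>> task : tasks) {
            combined.addAll(task.join());
        }
        return combined;
    }
}
```

The order of the combined results depends on the order in which nodes are visited, which is one reason a query that needs ordering has to sort after the results are gathered.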
Query Monitor
Query Timeout
Set the system property gemfire.Cache.MAX_QUERY_EXECUTION_TIME (disabled by default, with a value of -1)
ResourceManager - Monitoring Queries for Low Memory
Helps prevent out-of-memory exceptions when querying or creating indexes.
This feature is enabled automatically when you set the critical-heap-percentage attribute of the resource-manager element in cache.xml, or call cache.getResourceManager().setCriticalHeapPercentage(float heapPercentage).
When it is enabled and no query timeout has been set, the timeout defaults to 5 hours.
Queries are cancelled with QueryExecutionLowMemoryException, and index creation with InvalidIndexException.
To disable it, set the system property gemfire.cache.DISABLE_QUERY_MONITOR_FOR_LOW_MEMORY to true.
Partitioned Region Queries and Low Memory
Partitioned region queries are a likely cause of out-of-memory exceptions. If query monitoring is enabled, a partitioned region query drops or ignores results being gathered from other servers when the executing server is low on memory.
Indexing
Why use an index?
● Significantly improves query speed
● Avoids iterating through the entire region when a matching index can be used
Additional Info:
● Indexed fields must implement Comparable
● Provides a simple way to index fields, nested object fields, nested collections of objects/fields, and nested maps
Types:
● Functional Index
● Functional (Compact) Index
● Map index
● Hash Index
● Primary Key Index
Functional Index
A sorted index, internally represented as a tuple and copy of the value
How to create
qs.createIndex("indexName", "d.name", "/users u, u.dependents d"); // dependents is a List or Set
qs.createIndex("indexName", "d.name", "/users u, u.dependents.values d"); // dependents is a Map
Representation
Key Values
Sonny | Collection: [(User:Joe, Sonny)]
Cheryl | Collection: [(User:Joe, Cheryl), (User:John, Cheryl)]
Example query
“select * from /users u, u.dependents d where d.name = 'Sonny'”
Restrictions:
Cannot be created on overflow regions
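The representation table above (sorted key, collection of tuples) maps naturally onto a sorted map. The sketch below models it with a TreeMap; it illustrates the data shape only and is not Geode's index implementation. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

class FunctionalIndexSketch {
    // Sorted index key -> collection of (user, indexed value) tuples.
    private final TreeMap<String, List<String>> index = new TreeMap<>();

    void add(String user, String dependentName) {
        index.computeIfAbsent(dependentName, k -> new ArrayList<>())
             .add("(User:" + user + ", " + dependentName + ")");
    }

    // Equality lookup, as in d.name = 'Sonny'.
    List<String> equalTo(String key) {
        return index.getOrDefault(key, Collections.emptyList());
    }

    // Range lookup, as in d.name > 'Cheryl'; cheap because the keys are kept sorted.
    List<String> greaterThan(String key) {
        List<String> hits = new ArrayList<>();
        index.tailMap(key, false).values().forEach(hits::addAll);
        return hits;
    }
}
```

Because every tuple carries its own copy of the key, this layout costs more memory than one that stores only region entries, which is the trade-off the compact variant addresses.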
Functional Index (Compact)
Saves memory compared with the non-compact index, at the expense of extra work during index maintenance.
How to create
qs.createIndex("user names", "u.name", "/users u");
qs.createIndex("user names", "u.nestedObject.fieldName", "/users u");
Representation
Key Values
Joe | Region Entry
John | [Region Entry, Region Entry]
Jerry | Collection(Region Entry, Region Entry)
Restrictions:
Index maintenance is synchronous
Used only when the from clause has a single iterator (example: /users u)
Additional Info:
What about updates in progress?
What about “in place modification”?
Key Index
Creating a key index makes the query service aware of the relationship between the values in the region and the keys in the region.
This allows the query service to translate a query on the key into a get.
How to create:
qs.createKeyIndex("indexName", "u.id", "/users u");
Example Query:
“select * from /users u where u.id = 1”
Restrictions:
Equality comparisons only
Hash Index
The good
Saves on memory due to not storing index key values
Hash values are computed from index key
The bad
Slower maintenance and query times
Only a slight savings in memory
Name is a bit misleading
Representation
Array: [ RE, RE, null, RE, REMOVED, null, RE, ...]
How to create
qs.createHashIndex("indexName", "u.name", "/users u");
Restrictions:
Only equality based queries
Single iterator
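The array shown under Representation (entries, nulls, and REMOVED markers) looks like an open-addressing table that stores only region entries and recomputes the key from the entry when needed. The sketch below is written under that assumption, with linear probing and no resizing; it is an interpretation of the diagram, not Geode's code, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class HashIndexSketch<E> {
    private static final Object REMOVED = new Object(); // tombstone left behind by remove()
    private final Object[] slots;
    private final Function<E, Object> keyOf; // derives the index key from an entry

    HashIndexSketch(int capacity, Function<E, Object> keyOf) {
        this.slots = new Object[capacity];
        this.keyOf = keyOf;
    }

    private int start(Object key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    void add(E entry) {
        int i = start(keyOf.apply(entry));
        for (int n = 0; n < slots.length; n++, i = (i + 1) % slots.length) {
            if (slots[i] == null || slots[i] == REMOVED) {
                slots[i] = entry; // only the entry is stored, never the key itself
                return;
            }
        }
        throw new IllegalStateException("index full (a real index would resize)");
    }

    @SuppressWarnings("unchecked")
    List<E> get(Object key) {
        List<E> hits = new ArrayList<>();
        int i = start(key);
        // Probe until an empty slot; tombstones are skipped but do not stop the scan.
        for (int n = 0; n < slots.length && slots[i] != null; n++, i = (i + 1) % slots.length) {
            if (slots[i] != REMOVED && key.equals(keyOf.apply((E) slots[i]))) {
                hits.add((E) slots[i]);
            }
        }
        return hits;
    }

    void remove(E entry) {
        int i = start(keyOf.apply(entry));
        for (int n = 0; n < slots.length && slots[i] != null; n++, i = (i + 1) % slots.length) {
            if (slots[i] == entry) {
                slots[i] = REMOVED;
                return;
            }
        }
    }
}
```

In this model every lookup has to re-derive the key from each probed entry, which is consistent with the slide's note that maintenance and query times are slower, and a table like this can only answer equality lookups.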
Map Index
Allows indexing a map field of an object
How to create:
qs.createIndex("indexName", "u.name[*]", "/users u");
qs.createIndex("indexName", "u.name['first', 'middle']", "/users u");
In Gfsh:
gfsh>create index --name="IndexName" --expression="u.name['first', 'middle']" --region="/users u"
Example of query:
“SELECT * FROM /users u WHERE u.name['first'] = 'John' OR u.name['last'] = 'Smith'”
Gotcha:
Using u.name.get('first') will not create or query the map index.
Map Index...
Each map key (Keys) points to its own range index (Values):
'first' -> Range Index
Key | Value
Joe | Collection: [(User: Joe Bob, Joe)]
John | Collection: [(User: John Jacob Schmidt, John)]
Jerry | Collection: [(User: Jerry Schmidt, Jerry)]
'middle' -> Range Index
Key | Value
Jacob | Collection: [(User: John Jacob Schmidt, Jacob)]
'last' -> Range Index
Key | Value
Bob | Collection: [(User: Joe Bob, Bob)]
Schmidt | Collection: [(User: John Jacob Schmidt, Schmidt), (User: Jerry Schmidt, Schmidt)]
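The diagram above pairs each map key with its own range index. A minimal Java model of that two-level structure is sketched here, using the same sample users; it illustrates the layout only, and the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapIndexSketch {
    // Map key ('first', 'middle', 'last') -> range index (sorted value -> users).
    private final Map<String, TreeMap<String, List<String>>> rangeIndexByMapKey = new HashMap<>();

    // Indexes every key of the user's name map.
    void add(String user, Map<String, String> name) {
        name.forEach((mapKey, value) -> rangeIndexByMapKey
                .computeIfAbsent(mapKey, k -> new TreeMap<>())
                .computeIfAbsent(value, v -> new ArrayList<>())
                .add(user));
    }

    // Models a predicate such as u.name['last'] = 'Schmidt'.
    List<String> lookup(String mapKey, String value) {
        TreeMap<String, List<String>> rangeIndex = rangeIndexByMapKey.get(mapKey);
        if (rangeIndex == null) {
            return Collections.emptyList();
        }
        return rangeIndex.getOrDefault(value, Collections.emptyList());
    }
}
```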
Multiple Index Creation
Creating multiple indexes on a populated region one at a time requires iterating over that region once per index
This has a significant impact on overflow regions
Defining the indexes first lets them be created together; the same mechanism is used internally when the cache is brought up
Example of multiple index creation:
Cache cache = new CacheFactory().create();
QueryService queryService = cache.getQueryService();
queryService.defineIndex("name1", "indexExpr1", "regionPath1");
queryService.defineIndex("name2", "indexExpr2", "regionPath2");
queryService.defineHashIndex("name3", "indexExpr3", "regionPath2");
queryService.defineKeyIndex("name4", "indexExpr4", "regionPath2");
List<Index> indexes = queryService.createDefinedIndexes();
To clear any defined indexes that have not been created yet:
queryService.clearDefinedIndexes();
Querying with Functions
Benefits:
● Allows targeting specific nodes by filtering by partitioning key
● Closer to data
● Logic and computation on results from node, possibly less to send back
Drawbacks:
● More work for users (writing the function)
● More work for users (registering the function)
Equijoin Queries
Restrictions:
● Joined regions must be colocated
Problems:
● Slow due to the cartesian product of the joined regions
● High memory usage due to temporary joined result sets
Some improvements are coming:
● Significantly reduced join time for single-iterator filters where indexes can be used:
“select * from /users u, /employees e where u.name = 'John' and u.id = e.id”
“select * from /users u, /employees e where u.name = 'John' and u.age > 21 and u.id = e.id”
“select * from /users u, /employees e, /office o where u.name = 'John' and u.id = e.id and e.location = o.location”
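Why a selective single-iterator filter helps can be seen in a small plain-Java model. The first method walks the full cartesian product; the second applies the u.name filter first and then probes a lookup table keyed by id, standing in for an index on e.id. This is a conceptual sketch only, with hypothetical names, and does not describe Geode's join execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class EquiJoinSketch {
    static final class Row {
        final int id;
        final String name;

        Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Cartesian approach: every user is compared with every employee.
    static List<Integer> cartesianJoin(List<Row> users, List<Row> employees, String name) {
        List<Integer> ids = new ArrayList<>();
        for (Row u : users) {
            for (Row e : employees) {
                if (u.name.equals(name) && u.id == e.id) {
                    ids.add(u.id);
                }
            }
        }
        return ids;
    }

    // Filter-first approach: narrow users by name, then probe employees by id.
    static List<Integer> filteredJoin(List<Row> users, Map<Integer, Row> employeesById, String name) {
        List<Integer> ids = new ArrayList<>();
        for (Row u : users) {
            if (u.name.equals(name) && employeesById.containsKey(u.id)) {
                ids.add(u.id);
            }
        }
        return ids;
    }
}
```

Both methods return the same ids when employee ids are unique, but the second does work proportional to the number of users rather than users times employees, and it never materializes the joined pairs.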
General Tips/Tricks
● From clause of the query and index expression should match
● For AND operators, put the more selective filter first in the query
● Whenever possible, provide a hint to allow the query engine to prefer a specific index

OQL querying and indexes with Apache Geode (incubating)
