BUILD A SUPERCHARGED DATAMART
WITH SOLR
Elliott Cordo
Chief Architect, Caserta Concepts
What is Solr
• Solr is an open source search platform
• Based on Core Lucene search technology
• Bundled up with an API, Management Tools, UI, scalability
But isn’t it just a search engine?
• Although Solr was primarily architected as a search
engine, there is no reason you can’t use it as a database
• The search-based application movement promotes the search
engine as a data store
• Search has long been a “cheap” option for fast and
interactive queries on NoSQL and Hadoop datastores
So why would we use it?
• Solr is fast –
• expect low-millisecond response times on simple lookups
• properly tuned, even complex queries will take less than 100 ms
• Solr scales
• High concurrency
• Scales horizontally and vertically (larger hardware)
And it has the best query flexibility of any
NOSQL store!
..and in many cases RDBMS
• Grouping and Aggregation via Facets
• Fuzzy Search
• Equality and Range queries
• Geospatial capabilities
• HIGHLY extensible!
Another Datastore to Manage??
• Polyglot persistence/polyglot programming
• Feature/function will drive which technology should be
used
• Use the right tool for the job: Relational, MPP, Hadoop,
Graph, KV, NOSQL
Thankfully Solr is pretty easy to learn and manage!
When it works well
• Search is front and center -> end users need to fuzzy
search dimensional attributes
• Flexible/sparse schema
• Need for speed -> faster queries for more user
engagement
• Concurrency -> fast queries on client facing or open web
Use Cases
• Real time analytics -> ingest incoming events from
Flume/Logstash/Custom app
• Supplement NOSQL, MPP, or Hadoop analytics
• Web facing analytics DB
What it doesn’t do well
• Joins across collections/cores (tables)
• Complex arbitrary queries
• Limited integration to standard ETL and BI frameworks
How do you get data in?
• A robust API
• Modules and libraries for just about any programming
language
• Index data from any DB via JDBC
• Pull in XML and Delimited files with Simple Posting Tool
• Flume/Logstash
NOTE: like its NoSQL cousins, Solr needs data to be
flattened!
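As a rough illustration of what flattening means here, the sketch below denormalizes one review row together with its business and user dimension rows into a single flat document. The input layout and the `flatten_review` helper are hypothetical; the output field names follow the sample response shown later in the deck.

```python
def flatten_review(review, businesses, users):
    """Denormalize one review plus its dimension rows into a flat dict."""
    business = businesses[review["business_id"]]
    user = users[review["user_id"]]
    return {
        "review_id": review["review_id"],
        "stars": review["stars"],
        "review_date": review["review_date"],
        "business_id": review["business_id"],
        "business_name": business["name"],
        "city": business["city"],
        "state": business["state"],
        "user_id": review["user_id"],
        "user_name": user["name"],
    }


# Hypothetical in-memory dimension tables keyed by their ids.
businesses = {"b1": {"name": "Wingate By Wyndham", "city": "Yuma", "state": "AZ"}}
users = {"u1": {"name": "Studl"}}
review = {
    "review_id": "r1",
    "stars": 2,
    "review_date": "2009-09-23T00:00:00Z",
    "business_id": "b1",
    "user_id": "u1",
}

doc = flatten_review(review, businesses, users)
```

Every value in `doc` is a scalar, so attributes that would sit in separate dimension tables in a relational mart are repeated on each review document.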
How do you interact with Solr?
HTTP, Nice concise query language
http://localhost:8983/solr/collection1/select?q=city:Yuma&wt=json&indent=true
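The same lookup can be issued from code. This is a minimal sketch using only the Python standard library; the host, port and `collection1` name are taken from the URL above, while `build_select_url` and `fetch_docs` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_select_url(query, **params):
    """Build a /select URL; urlencode takes care of escaping the query."""
    all_params = {"q": query, "wt": "json", "indent": "true"}
    all_params.update(params)
    return SOLR_SELECT + "?" + urlencode(all_params)


def fetch_docs(query):
    """Run the query against a live Solr and return the matching documents."""
    with urlopen(build_select_url(query)) as resp:
        body = json.load(resp)
    return body["response"]["docs"]


url = build_select_url("city:Yuma")
```

`fetch_docs("city:Yuma")` needs a running Solr instance; `build_select_url` on its own only produces the request URL, with the colon percent-encoded.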
And what does the response look like:
"responseHeader":{
"status":0,
"QTime":1,
"params":{
"indent":"true",
"q":"city:Yuma",
"wt":"json"}},
"response":{"numFound":6,"start":0,"docs":[
{
"review_id":"JhUliQTD9iyGWov2nv-ZJA",
"stars":2,
"review_date":"2009-09-23T00:00:00Z",
"business_id":"gKRUdbTPBZ7kwBRCeZDDWA",
"business_name":"Wingate By Wyndham",
"city":"Yuma",
"state":"AZ",
"longitude":"-112.09343969999999",
"latitude":"33.434925100000001",
"user_id":"AqlZdDD7NK1fpQi9ltqIXQ",
"user_name":"Studl",
"_version_":1475098783569149953},
So what about analytic queries?
select city, count(1)
from reviews
where state='AZ'
group by city
http://localhost:8983/solr/collection1/select?q=state:AZ&wt=json&indent=true&facet=true&facet.field=city&rows=0&facet.mincount=1&facet.limit=-1
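A sketch of how the SQL maps onto facet parameters, and how the counts come back. With Solr's default JSON output, each facet field is returned as a flat list that alternates value and count. The helper names are hypothetical, and the sample values reuse counts from the stats slide later in the deck purely for illustration.

```python
from urllib.parse import urlencode


def facet_params(where, group_by, min_count=1):
    """WHERE goes into q, GROUP BY into facet.field, rows=0 skips the documents."""
    return {
        "q": where,
        "wt": "json",
        "rows": 0,
        "facet": "true",
        "facet.field": group_by,
        "facet.mincount": min_count,
        "facet.limit": -1,
    }


def facet_counts(body, field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = body["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


query_string = urlencode(facet_params("state:AZ", "city"))

# Illustrative response fragment, not real query output.
sample = {"facet_counts": {"facet_fields": {"city": ["Peoria", 14, "Goodyear", 7]}}}
counts = facet_counts(sample, "city")
```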
More query fun
select city, count(1)
from reviews
where state='AZ'
and review_date between '2012-03-01' and '2012-03-06'
group by city
having count(1)>=20
http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&facet.field=city&rows=0&facet.mincount=20&facet.limit=-1
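Hand-encoding the brackets and spaces of a range clause is error prone, so here is a sketch that builds it instead. It is a variant rather than a copy of the URL above: the date restriction goes into `fq`, which Solr applies as a filter on top of `q`, and both bounds are set to midnight UTC, which is an assumption about what the SQL intends. Square brackets make both ends inclusive, like `between`. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def solr_date(dt):
    """Format a timezone-aware datetime the way Solr date fields expect."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range(field, start, end):
    """Inclusive range clause: field:[start TO end]."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


start = datetime(2012, 3, 1, tzinfo=timezone.utc)
end = datetime(2012, 3, 6, tzinfo=timezone.utc)

params = {
    "q": "state:AZ",
    "fq": date_range("review_date", start, end),
    "wt": "json",
    "rows": 0,
    "facet": "true",
    "facet.field": "city",
    "facet.mincount": 20,  # plays the role of HAVING count(1) >= 20
    "facet.limit": -1,
}
query_string = urlencode(params)
```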
Facet stats give you aggregation
http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&rows=0&facet.mincount=1&facet.limit=-1&stats=true&stats.field=stars&stats.facet=city
"stats":{
"stats_fields":{
"stars":{
"min":1.0,
"max":5.0,
"count":991,
"missing":0,
"sum":3685.0,
"sumOfSquares":15313.0,
"mean":3.7184661957618568,
"stddev":1.2754290498612053,
"facets":{
"city":{
"Peoria":{
"min":1.0,
"max":5.0,
"count":14,
"missing":0,
"sum":54.0,
"sumOfSquares":234.0,
"mean":3.857142857142857,
"stddev":1.4064216928154862,
"facets":{}},
"Goodyear":{
"min":2.0,
"max":5.0,
"count":7,
….
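The `count`, `sum` and `sumOfSquares` values are enough to rebuild the other aggregates, which is handy when partial results need to be combined client-side. The sketch below recomputes `mean` and `stddev` from the numbers in the response above; the figures shown are consistent with the sample (n - 1) standard deviation. The function name is hypothetical.

```python
import math


def mean_and_stddev(count, total, sum_of_squares):
    """Recompute mean and sample standard deviation from running sums."""
    mean = total / count
    variance = (sum_of_squares - total * total / count) / (count - 1)
    return mean, math.sqrt(variance)


# Inputs copied from the stats response in the deck.
peoria_mean, peoria_stddev = mean_and_stddev(14, 54.0, 234.0)
overall_mean, overall_stddev = mean_and_stddev(991, 3685.0, 15313.0)
```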
Facet pivots too!
http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&rows=0&facet.mincount=1&facet.limit=-1&facet.pivot=city,business_name
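The pivot response is a nested tree (city, then business_name), which most charting and reporting code would rather see as flat rows. A small recursive sketch, with `pivot_rows` as a hypothetical helper and sample nodes copied from the response shown next:

```python
def pivot_rows(pivots, prefix=()):
    """Flatten nested pivot nodes into (level1, level2, ..., count) tuples."""
    rows = []
    for node in pivots:
        key = prefix + (node["value"],)
        children = node.get("pivot")
        if children:
            rows.extend(pivot_rows(children, key))
        else:
            rows.append(key + (node["count"],))
    return rows


sample = [
    {"field": "city", "value": "Anthem", "count": 3, "pivot": [
        {"field": "business_name", "value": "Outlets At Anthem", "count": 1},
        {"field": "business_name", "value": "Q to U BBQ", "count": 1},
        {"field": "business_name", "value": "Shanghai Club", "count": 1},
    ]},
    {"field": "city", "value": "Apache Junction", "count": 1, "pivot": [
        {"field": "business_name", "value": "Lost Dutchman State Park", "count": 1},
    ]},
]

rows = pivot_rows(sample)
```

Against a live response the input would be `body["facet_counts"]["facet_pivot"]["city,business_name"]`.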
"facet_pivot":{
"city,business_name":[{
"field":"city",
"value":"Anthem",
"count":3,
"pivot":[{
"field":"business_name",
"value":"Outlets At Anthem",
"count":1},
{
"field":"business_name",
"value":"Q to U BBQ",
"count":1},
{
"field":"business_name",
"value":"Shanghai Club",
"count":1}]},
{
"field":"city",
"value":"Apache Junction",
"count":1,
"pivot":[{
"field":"business_name",
"value":"Lost Dutchman State Park",
"count":1}]},
UI please!
• Roll your own -> it’s not that hard
• How about using Python Flask to render a Solr Response
to D3 or Google Charts
• Sometimes a custom solution is the best option
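To give a feel for how little glue a custom UI needs, here is a sketch of the server-side step: reshaping facet counts into an array of rows with a header, the layout that Google Charts' `arrayToDataTable` accepts. It uses only the standard library; in a Flask app this would sit inside a view that returns the JSON. The function name and column labels are hypothetical, and the counts are illustrative.

```python
import json


def to_chart_table(counts, label, measure):
    """Header row followed by [category, value] rows, largest value first."""
    rows = [[label, measure]]
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    rows.extend([name, value] for name, value in ordered)
    return rows


table = to_chart_table({"Goodyear": 7, "Peoria": 14}, "City", "Reviews")
payload = json.dumps(table)  # what the browser-side chart code would fetch
```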
And the really easy way
Banana – A Solr port of Kibana!
Why should Elasticsearch fans have all the fun?
And it’s open source!
https://github.com/LucidWorks/banana
Banana
• An AngularJS app (pure JavaScript, runs in any
browser)
• Make a pretty dashboard with no development in a
couple of minutes
• Very user friendly: users can create their own
content
https://github.com/Caserta-Concepts/Solr-Datamart
elliott@casertaconcepts.com

Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr
