MySQL Anti-Queries
and Sphinx Search
Percona Live, MySQL Users Conference
Santa Clara, 2013
PALOMINODB
OPERATIONAL EXCELLENCE
FOR DATABASES
Vlad Fedorkov
www.palominodb.com
Let’s meet each other
• Performance geek
• Working for PalominoDB
o http://www.palominodb.com
• Twitter @vfedorkov
• Now fighting with clouds
What is MySQL
• Database we love
• Database we use
o So sometimes we hate it too
o Depends on how much data you have
• Most important
o Database that keeps improving
▪ Thanks to Percona, Oracle and MariaDB!
▪ Special thanks to the MySQL community
• Does it work in every case?
MySQL today
• Software
o Can’t complain about concurrency anymore
o Warm-up is much, much better today
o Optimizer is doing much better now
o Replication is now multi-threaded
• Environment
o Flash storage makes IO very fast and concurrent
o Cheap and fast RAM
o Awesome cloud services available
o Galera, Tungsten, MySQL cluster
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
What’s wrong with MySQL?
• One-by-one:
o ACID compliance is inconveniently slow
o B-Tree index is not perfect for some queries
o Query execution is single threaded
o Full-text indexes are not that fast
o Replication wants to keep data consistent
o A lot more inconvenient things; just ask me!
• So maybe some queries are bad, not MySQL itself?
MySQL as a microscope
• Excellent for specific operations
o Very fast at primary key lookup
o Fast enough on index lookups
▪ Some range scans too
• Makes sure your data is safe
• Optimized for transactional (InnoDB) workload
Full-table scans
• SELECT * FROM table WHERE myfunc(a) = 5
• SELECT * FROM table WHERE a NOT IN (1,2,3)
o WHERE a <> 5
• SELECT * FROM table WHERE c LIKE '%ing'
• SELECT COUNT(*) FROM …
• Reading the whole table is expensive
• Buffer pool flooded with rarely used data
• Indexes are not helpful
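The effect of these patterns can be observed with a query planner. The sketch below is only an illustration and uses SQLite (bundled with Python) as a stand-in, since MySQL's optimizer and EXPLAIN output differ in detail. The table `t`, the indexes `idx_a` and `idx_c`, and the use of `abs()` in place of `myfunc()` are all assumptions made for the example.

```python
import sqlite3

# SQLite is only a stand-in here; MySQL's optimizer differs in detail.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_a ON t (a)")
conn.execute("CREATE INDEX idx_c ON t (c)")
conn.executemany(
    "INSERT INTO t (a, c) VALUES (?, ?)",
    [(i % 10, f"word{i}ing") for i in range(1000)],
)


def plan(sql):
    """Return the planner's description of how the query would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Plain comparison on an indexed column: the index can be used.
indexed_lookup = plan("SELECT * FROM t WHERE a = 5")
# Function applied to the column: the index on `a` no longer helps.
function_on_column = plan("SELECT * FROM t WHERE abs(a) = 5")
# Leading wildcard: the index on `c` cannot narrow the search.
leading_wildcard = plan("SELECT * FROM t WHERE c LIKE '%ing'")

for label, detail in [
    ("a = 5", indexed_lookup),
    ("abs(a) = 5", function_on_column),
    ("c LIKE '%ing'", leading_wildcard),
]:
    print(f"{label:15} {detail}")
```

In this stand-in, only the first query is reported as an index search; the other two are reported as scans of the whole table, which is the behaviour the slide warns about.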
Inefficient index usage
• SELECT * FROM table WHERE enabled=0
o archived, deleted
• SELECT * FROM table WHERE sex=1
o WHERE age_range=5
• SELECT * FROM table WHERE state='CA'
o WHERE city='New York'
• Better than full-table scan but still inefficient
Some sorting operations
• ORDER BY on a non-indexed column
• ORDER BY a DESC, b ASC, c DESC
• ORDER BY RAND() ASC
o ORDER BY RAND() DESC works much better
Temporary table operations
• Some operations force MySQL to create a temporary table
o In-memory table
o On-disk table
• Subqueries
• GROUP BY
• DISTINCT + ORDER BY
o If a TEXT or BLOB column is involved, MySQL will always create an on-disk table
These queries are like nails
C’mon, you can do that!!!
… or use something more convenient?
What do we need?
• Avoid full-scans as much as we can
o Make them faster when we can’t get rid of them
• Avoid slow IO operations
o Put most data in memory
▪ It is cheap now
Let’s meet Sphinx
• Standalone daemon
o Written in C++ and very fast
• Works without MySQL
o At all
• Accepts connections from the mysql client
o PHP/Ruby/JDBC/.NET/etc
• Runs on lots of platforms
o Linux/Windows/Mac/everywhere
Sphinx
Pros
• Highly scalable
• Keeps vital data in memory
• Very fast in scans
• Powerful full-text syntax
• High availability support
Cons (yet)
• Doesn’t have B-tree indexes
• No out-of-the-box replication from MySQL
• Not completely ACID
Full-table scans
• Sphinx keeps attributes in memory at all times
• It scales queries
o Well, actually you have to tell it how
▪ But only once
• Limit the number of documents to walk through
• You can utilize full-text search power
o As a nice bonus :)
Full-Text functions
• And, Or
o hello & world, hello | world
• Not
o hello -world
• Per-field search
o @title hello @body world
• Field combination
o @(title, body) hello world
• Search within first N
o @body[50] hello
• Phrase search
o "hello world"
• Per-field weights
• Proximity search
o "hello world"~10
• Distance support
o hello NEAR/10 world
• Quorum matching
o "the world is a wonderful place"/3
• Exact form modifier
o "raining =cats and =dogs"
• Strict order
• Sentence / Zone / Paragraph
• Custom document weighting & ranking
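Quorum matching is the least obvious operator in the list above, so here is a rough Python model of what `"the world is a wonderful place"/3` asks for: a document matches when at least 3 of the quoted words occur in it. This is a conceptual sketch only. The function name `quorum_match` is hypothetical, and the sketch ignores morphology, stopwords and everything else a real Sphinx index applies.

```python
import re


def quorum_match(document, query, threshold):
    """Return True if at least `threshold` distinct query words occur in the document."""
    doc_words = set(re.findall(r"\w+", document.lower()))
    query_words = set(re.findall(r"\w+", query.lower()))
    return len(query_words & doc_words) >= threshold


query = "the world is a wonderful place"

# Shares "a", "wonderful" and "world" with the query, so the quorum of 3 is met.
first = quorum_match("What a wonderful world", query, 3)
# Shares only "world" with the query, so the quorum of 3 is not met.
second = quorum_match("Hello world", query, 3)

print(first, second)
```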
SphinxQL
MySQL:
mysql> SELECT id, ...
    -> FROM myisam_table
    -> WHERE MATCH(title, content_ft)
    -> AGAINST ('I love sphinx') LIMIT 10;
...
10 rows in set (1.18 sec)
Sphinx:
mysql> SELECT * FROM sphinx_index
    -> WHERE MATCH('I love Sphinx') LIMIT 10;
...
10 rows in set (0.05 sec)
Just like MySQL
$ mysql -h 0 -P 9306
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.1.0-id64-dev (r3028)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
What is SphinxQL?
• SQL-based query language in Sphinx
• Works without MySQL!
o MySQL client library required
▪ JDBC, .NET, PHP, Ruby
• Requires a different port
• Almost same syntax as in MySQL
o But not exactly the same
SQL & SphinxQL
● WITHIN GROUP ORDER BY
● OPTION support for fine tuning
● weights, matches and query time control
● SHOW META query information
● CALL SNIPPETS lets you create snippets
● CALL KEYWORDS for statistics
Extended syntax
mysql> SELECT …, YEAR(ts) as yr
-> FROM sphinx_index
-> WHERE MATCH('I love Sphinx')
-> GROUP BY yr
-> WITHIN GROUP ORDER BY rating DESC
-> ORDER BY yr DESC
-> LIMIT 5
-> OPTION field_weights=(title=100, content=1);
+---------+--------+------------+------------+------+----------+--------+
| id | weight | channel_id | ts | yr | @groupby | @count |
+---------+--------+------------+------------+------+----------+--------+
| 7637682 | 101652 | 358842 | 1112905663 | 2005 | 2005 | 14 |
| 6598265 | 101612 | 454928 | 1102858275 | 2004 | 2004 | 27 |
| 7139960 | 1642 | 403287 | 1070220903 | 2003 | 2003 | 8 |
| 5340114 | 1612 | 537694 | 1020213442 | 2002 | 2002 | 1 |
| 5744405 | 1588 | 507895 | 995415111 | 2001 | 2001 | 1 |
+---------+--------+------------+------------+------+----------+--------+
5 rows in set (0.00 sec)
GEO-Distance support
• Bumping up local results
o Requires coordinates for each document
o Two pairs of float values (Latitude, Longitude)
• GEODIST(Lat, Long, Lat2, Long2) in Sphinx
SELECT *, GEODIST(doc_lat, doc_long, %d1, %d2) AS dist
FROM sphinx_index
ORDER BY dist ASC
LIMIT 0, 20
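For intuition about what a geo-distance function computes, the sketch below implements the haversine great-circle formula in Python. It is one common formula and is not claimed to be the one Sphinx uses internally. It assumes coordinates in radians and a result in meters, which is how the Sphinx documentation describes GEODIST by default (worth checking for your version). The Earth radius constant is an assumed value for the sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value for this sketch)


def geodist(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Coordinates stored in degrees must be converted before calling the function.
santa_clara = (math.radians(37.3541), math.radians(-121.9552))
san_francisco = (math.radians(37.7749), math.radians(-122.4194))
print(round(geodist(*santa_clara, *san_francisco)), "meters")
```

Sorting by such a distance in ascending order puts the nearest documents first, which is what "bumping up local results" needs.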
Range support
• Price ranges (items, offers)
• Date range (blog posts and news articles)
• Ratings, review points
• INTERVAL(field, x0, x1, …, xN)
SELECT
INTERVAL(item_price, 0, 20, 50, 90) as range,
@count
FROM my_sphinx_products
GROUP BY range
ORDER BY range ASC;
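The bucketing done by INTERVAL can be modelled in a few lines of Python. The sketch assumes the semantics given in the Sphinx documentation: the result is 0 when the value is below the first point, 1 when it lies between the first and second points (lower bound inclusive), and so on. The helper name `interval` and the sample prices are made up for illustration.

```python
from bisect import bisect_right
from collections import Counter


def interval(value, *points):
    """Index of the bucket that `value` falls into, given ascending boundary points."""
    return bisect_right(points, value)


# Hypothetical item prices, bucketed with the same boundaries as the query above.
prices = [5, 19.99, 20, 45, 75, 120]
buckets = Counter(interval(p, 0, 20, 50, 90) for p in prices)

for bucket in sorted(buckets):
    print(bucket, buckets[bucket])
```

Grouping by the bucket index and counting, as the query does with GROUP BY, yields the number of items per price range in a single pass.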
Extended services
• Drill-down (narrow search, faceted search)
• Typo correction
• Search string autocompletion
• Related documents
Misspelling correction service
• Provides correct search phrase
o “Did you mean” service
• Allows replacing the user’s search on the fly
o if we’re sure it’s a typo
▪ “ophone”, “uphone”, etc.
o Saves time and makes website look smart
• Based on your actual database
o Effective if you DO have correct words in index
Autocompletion service
• Suggest search queries as user types
o Show most popular queries
o Promote searches that lead to desired pages
o Might include misspelling correction
Related search
• Improving visitor experience
o Providing easier access to useful pages
o Keep customer on the website
o Increasing sales and server’s load average
• Based on document similarity
o Different for shopping items and texts
o Ends up in data mining
Excerpts (snippets)
• BuildExcerpts() or CALL SNIPPETS
• Options
o before_match (<b>)
o after_match (</b>)
o chunk_separator (…)
o limit
o around
o force_all_words
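To make the `before_match` and `after_match` options concrete, here is a small Python model of keyword highlighting. It only covers those two options; `limit`, `around`, `chunk_separator` and `force_all_words` are not modelled. The function name `highlight` is hypothetical and the code is not how Sphinx builds excerpts internally.

```python
import re


def highlight(text, words, before_match="<b>", after_match="</b>"):
    """Wrap every whole-word, case-insensitive occurrence of `words` in the given markers."""
    if not words:
        return text
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: before_match + m.group(0) + after_match, text)


# Default markers mirror the <b> ... </b> values shown in the option list above.
snippet = highlight("I love Sphinx search", ["sphinx"])
print(snippet)
```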
Installation: How?
● From binary packages
● http://sphinxsearch.com/downloads/
● From source
● http://sphinxsearch.googlecode.com/svn/
● ./configure && make && make install
– Make sure to use --enable-id64 for huge document collections
– Already included in pre-compiled packages
Architecture sample
Initial configuration: indexing
• Where to look for data?
• How to process it?
• Where to store index?
Where to look for the data?
● MySQL
● PostgreSQL
● MSSQL
● ODBC source
● XML pipe
MySQL source
source data_source
{
…
sql_query = \
SELECT id, channel_id, ts, title, content \
FROM mytable
sql_attr_uint = channel_id
sql_attr_timestamp = ts
…
}
A complete version
source data_source
{
type = mysql
sql_host = localhost
sql_user = my_user
sql_pass = my******
sql_db = test
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, channel_id, ts, title, content \
FROM mytable \
WHERE id>=$start and id<=$end
sql_attr_uint = channel_id
sql_attr_timestamp = ts
sql_query_range = SELECT MIN(id), MAX(id) FROM mytable
sql_range_step = 1000
}
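The ranged-query settings above make the indexer fetch rows in chunks instead of one huge result set: `sql_query_range` supplies the minimum and maximum id, and `sql_query` is then run once per step with `$start` and `$end` substituted. A minimal Python sketch of how those boundaries advance is shown below. The function name `id_ranges` is hypothetical, and the inclusive boundaries follow the behaviour described in the Sphinx documentation.

```python
def id_ranges(min_id, max_id, step):
    """Yield inclusive (start, end) pairs covering min_id..max_id in chunks of `step`."""
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        yield start, end
        start = end + 1


# With MIN(id) = 1, MAX(id) = 2500 and sql_range_step = 1000:
for start, end in id_ranges(1, 2500, 1000):
    print(f"WHERE id>={start} and id<={end}")
```

Smaller steps mean shorter queries against MySQL at the cost of more round trips.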
How to process. Index config.
index my_sphinx_index
{
source = data_source
path = /my/index/path/idx
html_strip = 1
morphology = stem_en
stopwords = stopwords.txt
charset_type = utf-8
}
Indexer configuration
indexer
{
mem_limit = 512M
max_iops = 40
max_iosize = 1048576
}
Running indexer
$ ./indexer my_sphinx_index
Sphinx 2.0.2-dev (r2824)
Copyright (c) 2001-2010, Andrew Aksyonoff
Copyright (c) 2008-2010, Sphinx Technologies Inc (http://sph...
using config file './sphinx.conf'...
indexing index 'my_sphinx_index'...
collected 999944 docs, 1318.1 MB
sorted 224.2 Mhits, 100.0% done
total 999944 docs, 1318101119 bytes
total 158.080 sec, 8338160 bytes/sec, 6325.53 docs/sec
total 33 reads, 4.671 sec, 17032.9 kb/call avg, 141.5 msec/call
total 361 writes, 20.889 sec, 3566.1 kb/call avg, 57.8 msec/call
Index files
$ ls -lah idx*
-rw-r--r-- 1 vlad vlad 12M 2010-12-22 09:01 idx.spa
-rw-r--r-- 1 vlad vlad 334M 2010-12-22 09:01 idx.spd
-rw-r--r-- 1 vlad vlad 438 2010-12-22 09:01 idx.sph
-rw-r--r-- 1 vlad vlad 13M 2010-12-22 09:01 idx.spi
-rw-r--r-- 1 vlad vlad 0 2010-12-22 09:01 idx.spk
-rw-r--r-- 1 vlad vlad 0 2011-05-13 09:25 idx.spl
-rw-r--r-- 1 vlad vlad 0 2010-12-22 09:01 idx.spm
-rw-r--r-- 1 vlad vlad 111M 2010-12-22 09:01 idx.spp
-rw-r--r-- 1 vlad vlad 1 2010-12-22 09:01 idx.sps
$
Next Steps
• Run the daemon
• Connect to daemon
• Run search query
Configuring searchd
searchd
{
listen = localhost:9312
listen = localhost:9306:mysql41
query_log = query.log
query_log_format = sphinxql
pid_file = searchd.pid
}
Running sphinx daemon!
$ ../bin/searchd -c sphinx.conf
Sphinx 2.0.2-dev (r2824)
Copyright (c) 2001-2010, Andrew Aksyonoff
Copyright (c) 2008-2010, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'sphinx.conf'...
listening on 127.0.0.1:9312
listening on 127.0.0.1:9306
precaching index 'idx'
precached 1 indexes in 0.028 sec
Connecting to Sphinx
• Sphinx API
o PHP, Python, Java, Ruby and C clients are included in the distribution
o .NET and Rails (Thinking Sphinx) via third-party libraries
• SphinxQL
o MySQL-compatible protocol
• SphinxSE
o Storage engine for MySQL
Sphinx applications
• Find relevant documents
o Items in store(s)
o Articles in blog/forum/news/etc website(s)
o Pictures or photos
▪ By text, description, GEO-data, publish time, etc.
o Friends
▪ In social networks or dating websites
• Offload heavy queries from the main database
• Build advanced search and search-based services
Another way to speed up is scaling
• Combine different indexes
o Main + Delta
o On-disk + RT
o Distributed and local
▪ Don't forget about dist_threads!
• Use parallel indexing
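The Main + Delta combination can be pictured as two indexes queried together: a large main index that is rebuilt rarely and a small delta index that holds recent changes and is rebuilt often. The Python sketch below is a conceptual model only. The dictionaries, the naive word matching and the rule that any document present in delta supersedes its main copy are assumptions for illustration; in Sphinx itself, suppressing stale main copies is something you configure rather than get automatically.

```python
def search(index, word):
    """Return {doc_id: text} for the documents in one index that contain `word`."""
    return {doc_id: text for doc_id, text in index.items() if word in text.lower().split()}


def search_main_plus_delta(main, delta, word):
    """Query both indexes; any document present in delta supersedes its main copy."""
    hits = {
        doc_id: text
        for doc_id, text in search(main, word).items()
        if doc_id not in delta
    }
    hits.update(search(delta, word))
    return hits


# Document 3 was edited after the main index was built; document 4 is new.
main = {1: "old sphinx article", 2: "mysql tuning notes", 3: "sphinx basics"}
delta = {3: "search basics", 4: "new sphinx release"}

results = search_main_plus_delta(main, delta, "sphinx")
print(results)
```

Document 3 no longer mentions the search word in its current version, so it drops out of the results even though the stale copy in main still matches.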
OnDisk indexes
Bright side of scaling
• Faster search
• Better load control
• Hardware utilization
• High availability
Dark side
• Hardware faults
• Network issues
• Balancing issues
o Search time is bounded by the slowest search chunk
• Complicated operations
ToDo for attendees
• Please rate this talk!
• Submit your feedback to @vfedorkov
• Visit PalominoDB booth!
• Don’t miss Andrew Aksyonoff’s talk about new Sphinx features
o Tomorrow, 4:30pm - 5:20pm @ Ballroom E
• Questions!
Thank you!
Twitter @vfedorkov
Website: http://palominodb.com

