PgConf EU 2014 presents 
Javier Ramirez 
* in * 
PostgreSQL 
Full-text search 
demystified 
@supercoco9 
https://teowaki.com
The problem
our architecture
One does not simply 
SELECT * from stuff where 
content ilike '%postgresql%'
Basic search features 
* stemmers (run, runner, running) 
* unaccented (josé, jose) 
* results highlighting 
* rank results by relevance
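A minimal sketch of the first two features, and of why a plain ilike falls short. It assumes the built-in english configuration and that the unaccent extension can be installed; neither is named on this slide.

```sql
-- Pattern matching is literal: 'runs' does not contain 'running'.
SELECT 'She runs every day' ILIKE '%running%' AS ilike_match;

-- Full-text search compares stems, so 'runs' and 'running' meet at 'run'.
SELECT to_tsvector('english', 'She runs every day')
       @@ to_tsquery('english', 'running') AS fts_match;

-- unaccent() strips diacritics; it ships as an extension (assumed installable here).
CREATE EXTENSION IF NOT EXISTS unaccent;
SELECT unaccent('josé') AS unaccented;
```

The first query should come back false and the second true, which is the gap the stemmer closes.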
Nice to have features 
* partial searches 
* search operators (OR, AND...) 
* synonyms (postgres, postgresql, pgsql) 
* thesaurus (OS=Operating System) 
* fast, and space-efficient 
* debugging
Good News: 
PostgreSQL supports all 
the requested features
Bad News: 
unless you already know about search 
engines, the official docs are not obvious
How a search engine works 
* An indexing phase 
* A search phase
The indexing phase 
Convert the input text to tokens
The search phase 
Match the search terms to 
the indexed tokens
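In PostgreSQL terms the two phases can be sketched as follows, assuming the built-in english configuration (the sample sentence is made up for illustration):

```sql
-- Indexing phase: the input text becomes a tsvector of normalized tokens.
SELECT to_tsvector('english', 'PostgreSQL supports full-text searching');

-- Search phase: the search term is normalized the same way and matched with @@.
SELECT to_tsvector('english', 'PostgreSQL supports full-text searching')
       @@ to_tsquery('english', 'search') AS matches;
```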
indexing in depth 
* choose an index format 
* tokenize the words 
* apply token analysis/filters 
* discard unwanted tokens
the index format 
* r-tree (GiST in PostgreSQL) 
* inverted indexes (GIN in PostgreSQL) 
* dynamic/distributed indexes
dynamic indexes: segmentation 
* sometimes the token index is 
segmented to allow faster updates 
* consolidate segments to speed up 
search and account for deletions
tokenizing 
* parse/strip/convert format 
* normalize terms (unaccent, ascii, 
charsets, case folding, number precision...)
token analysis/filters 
* find synonyms 
* expand thesaurus 
* stem (maybe in different languages)
more token analysis/filters 
* eliminate stopwords 
* store word distance/frequency 
* store the full contents of some fields 
* store some fields as attributes/facets
“the index file” is really 
* a token file, probably segmented/distributed 
* some dictionary files: synonyms, thesaurus, 
stopwords, stems/lexemes (in different languages) 
* word distance/frequency info 
* attributes/original field files 
* optional geospatial index 
* auxiliary files: word/sentence boundaries, meta-info, 
parser definitions, datasource definitions...
the hardest 
part is now 
over
searching in depth 
* tokenize/analyse 
* prepare operators 
* retrieve information 
* rank the results 
* highlight the matched parts
searching in depth: tokenize 
normalize, tokenize, and analyse 
the original search term 
the result would be a tokenized, stemmed, 
“synonymised” term, without stopwords
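As a sketch of what that normalization looks like in PostgreSQL (assuming the built-in english configuration), both calls below are taken from the PostgreSQL documentation, together with the results it lists:

```sql
-- to_tsquery expects operators between terms; stop words are dropped, terms are stemmed.
SELECT to_tsquery('english', 'The & Fat & Rats');    -- docs: 'fat' & 'rat'

-- plainto_tsquery accepts free text and joins the surviving terms with AND.
SELECT plainto_tsquery('english', 'The Fat Rats');   -- docs: 'fat' & 'rat'
```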
searching in depth: operators 
* partial search 
* logical/geospatial/range operators 
* in-sentence/in-paragraph/word distance 
* faceting/grouping
searching in depth: retrieval 
Go through the token index files, use the 
attributes and geospatial files if necessary 
for operators and/or grouping 
You might need to do this in a distributed way
searching in depth: ranking 
algorithm to sort the most relevant results: 
* field weights 
* word frequency/density 
* geospatial or timestamp ranking 
* ad-hoc ranking strategies
searching in depth: highlighting 
Mark the matching parts of the results 
It can be tricky/slow if you are not storing the full contents 
in your indexes
PostgreSQL as a 
full-text 
search engine
search features 
* index format configuration 
* partial search 
* word boundaries parser (not configurable) 
* stemmers/synonyms/thesaurus/stopwords 
* full-text logical operators 
* attributes/geo/timestamp/range (using SQL) 
* ranking strategies 
* highlighting 
* debugging/testing commands
indexing in postgresql 
you don't actually need an index to use full-text search in PostgreSQL 
but unless your db is very small, you want to have one 
Choose GiST or GIN (GIN: faster search, slower indexing, 
larger index size) 
CREATE INDEX pgweb_idx ON pgweb USING 
gin(to_tsvector(config_name, body));
Two new things 
CREATE INDEX ... USING gin(to_tsvector (config_name, body)); 
* to_tsvector: PostgreSQL's way of saying “tokenize” 
* config_name: tokenizing/analysis rule set
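A concrete version of the statement, as a sketch: the table definition is hypothetical (modelled on the pgweb example in the PostgreSQL documentation) and the built-in english configuration stands in for config_name.

```sql
-- Hypothetical table; only the body column matters for the index.
CREATE TABLE pgweb (
    id    serial PRIMARY KEY,
    title text,
    body  text
);

-- In an index expression the configuration has to be named explicitly.
CREATE INDEX pgweb_idx ON pgweb USING gin (to_tsvector('english', body));

-- Only queries that repeat the same expression can use the index.
SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');
```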
Configuration 
CREATE TEXT SEARCH CONFIGURATION 
public.teowaki ( COPY = pg_catalog.english );
Configuration 
CREATE TEXT SEARCH DICTIONARY english_ispell ( 
    TEMPLATE = ispell, 
    DictFile = en_us, 
    AffFile = en_us, 
    StopWords = spanglish 
); 
CREATE TEXT SEARCH DICTIONARY spanish_ispell ( 
    TEMPLATE = ispell, 
    DictFile = es_any, 
    AffFile = es_any, 
    StopWords = spanish 
);
Configuration 
CREATE TEXT SEARCH DICTIONARY english_stem ( 
    TEMPLATE = snowball, 
    Language = english, 
    StopWords = english 
); 
CREATE TEXT SEARCH DICTIONARY spanish_stem ( 
    TEMPLATE = snowball, 
    Language = spanish, 
    StopWords = spanish 
);
Configuration 
Parser. 
Word boundaries
Configuration 
Assign dictionaries (from most specific to most generic) 
ALTER TEXT SEARCH CONFIGURATION teowaki 
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part 
    WITH english_ispell, spanish_ispell, spanish_stem, unaccent, english_stem; 
-- note: a snowball stemmer accepts every token, so dictionaries listed after spanish_stem are never consulted 
ALTER TEXT SEARCH CONFIGURATION teowaki 
    DROP MAPPING FOR email, url, url_path, sfloat, float;
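Two pieces the mapping relies on or could use, as a sketch. The unaccent dictionary comes from an extension, and synonyms (postgres, postgresql, pgsql) need a dictionary of their own. The dictionary name teowaki_synonyms and its file are hypothetical, not something from the talk.

```sql
-- The unaccent dictionary is provided by the unaccent extension.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Hypothetical synonym dictionary. It expects a file named
-- $SHAREDIR/tsearch_data/teowaki_synonyms.syn with one "word replacement"
-- pair per line, for example:
--   postgresql postgres
--   pgsql      postgres
CREATE TEXT SEARCH DICTIONARY teowaki_synonyms (
    TEMPLATE = synonym,
    SYNONYMS = teowaki_synonyms
);

-- Check what the dictionary returns for a single token.
SELECT ts_lexize('teowaki_synonyms', 'pgsql');
```

Being the most specific dictionary, a synonym dictionary like this would normally go first in the WITH list of the mapping.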
debugging 
select * from ts_debug('teowaki', 'I am searching unas búsquedas con postgresql database'); 
also ts_lexize and ts_parse
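A short sketch of the other debugging helpers, using the built-in english_stem dictionary and the default parser rather than the teowaki configuration. The inputs and the commented results for ts_lexize are the ones shown in the PostgreSQL documentation.

```sql
-- ts_lexize: ask one dictionary what it does with one token.
SELECT ts_lexize('english_stem', 'stars');   -- docs: {star}
SELECT ts_lexize('english_stem', 'a');       -- docs: {} (a stop word)

-- ts_parse: run only the parser and list the raw tokens with their type ids.
SELECT * FROM ts_parse('default', '123 - a number');

-- ts_token_type: translate those type ids into names such as asciiword.
SELECT * FROM ts_token_type('default');
```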
tokenizing 
tokens + position (stopwords are removed, tokens are folded)
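For reference, this is what such a token list looks like as text. The example and its result come from the PostgreSQL documentation and use the built-in english configuration, not teowaki.

```sql
SELECT to_tsvector('english', 'a fat cat sat on a mat - it ate a fat rats');
-- Result listed in the documentation for this input:
--   'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
-- Stop words (a, on, it) are gone but still count towards the positions,
-- and 'rats' has been reduced to 'rat'.
```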
searching 
SELECT guid, description from wakis where 
to_tsvector('teowaki',description) 
@@ to_tsquery('teowaki','postgres');
searching 
SELECT guid, description from wakis where 
to_tsvector('teowaki',description) 
@@ to_tsquery('teowaki','postgres:*');
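What the :* suffix changes, sketched with the built-in english configuration so it runs without the teowaki setup:

```sql
-- The term is still stemmed first; the documentation shows this result.
SELECT to_tsquery('english', 'postgres:*');   -- docs: 'postgr':*

-- Same document, same term, with and without prefix matching.
SELECT to_tsvector('english', 'postgresql') @@ to_tsquery('english', 'postgres')   AS exact_match,
       to_tsvector('english', 'postgresql') @@ to_tsquery('english', 'postgres:*') AS prefix_match;
```

Only the prefix form is expected to match here, because the stem of 'postgres' is a prefix of the indexed token but not equal to it.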
operators 
SELECT guid, description from wakis where 
to_tsvector('teowaki',description) 
@@ to_tsquery('teowaki','postgres | mysql');
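Besides | (OR), a tsquery also accepts & (AND), ! (NOT) and parentheses. A self-contained sketch with the built-in english configuration and a made-up sentence:

```sql
SELECT to_tsvector('english', 'PostgreSQL and MySQL are relational databases')
       @@ to_tsquery('english', '(postgresql | oracle) & !mongodb') AS matches;
```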
ranking weights 
SELECT setweight(to_tsvector(coalesce(name,'')),'A') || 
setweight(to_tsvector(coalesce(description,'')),'B') 
from wakis limit 1;
search by weight
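A sketch of restricting a search term to a weight, assuming the built-in english configuration and two made-up strings standing in for name (weight A) and description (weight B):

```sql
WITH doc AS (
    SELECT setweight(to_tsvector('english', 'PostgreSQL search'), 'A') ||
           setweight(to_tsvector('english', 'a talk about indexes'), 'B') AS tsv
)
SELECT tsv @@ to_tsquery('english', 'index:A') AS in_name,         -- only weight A tokens qualify
       tsv @@ to_tsquery('english', 'index:B') AS in_description   -- only weight B tokens qualify
FROM doc;
```

The term appears only in the B-weighted part, so just the second column should be true.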
ranking 
SELECT name, ts_rank(to_tsvector(name), query) rank 
from wakis, to_tsquery('postgres | indexes') query 
where to_tsvector(name) @@ query order by rank DESC; 
also ts_rank_cd
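A self-contained sketch of the ranking functions over weighted vectors. The two rows are made-up sample data and the built-in english configuration replaces teowaki.

```sql
WITH docs(name, description) AS (
    VALUES ('Indexes in PostgreSQL', 'GIN and GiST compared'),
           ('Cooking for engineers', 'A recipe book with two indexes')
),
weighted AS (
    SELECT name,
           setweight(to_tsvector('english', name), 'A') ||
           setweight(to_tsvector('english', description), 'B') AS doc
    FROM docs
)
SELECT name,
       ts_rank(doc, query)                          AS rank,
       -- weights are given in D, C, B, A order; these are the defaults
       ts_rank('{0.1, 0.2, 0.4, 1.0}', doc, query)  AS rank_explicit_weights,
       -- cover density ranking; normalization 32 scales to rank / (rank + 1)
       ts_rank_cd(doc, query, 32)                   AS rank_cd_scaled
FROM weighted, to_tsquery('english', 'postgres | indexes') AS query
WHERE doc @@ query
ORDER BY rank DESC;
```

Changing the weights array is the simplest way to make a match in the name count for more or less than a match in the description.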
highlighting 
SELECT ts_headline('teowaki', name, query) from wakis, 
to_tsquery('teowaki', 'game|play') query 
where to_tsvector('teowaki', name) @@ query;
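ts_headline also takes an options string that controls the markers. A sketch based on the example in the PostgreSQL documentation, with the built-in english configuration:

```sql
SELECT ts_headline('english',
    'The most common type of search is to find all documents containing given query terms and return them in order of their similarity to the query.',
    to_tsquery('english', 'query & similarity'),
    'StartSel = <, StopSel = >');
```

Because ts_headline works on the original text rather than on the tsvector, it is usually applied only to the rows that are actually shown.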
USE POSTGRESQL 
FOR EVERYTHING
When PostgreSQL is not good 
* You need to index files (PDF, Odx...) 
* Your index is very big (slow reindex) 
* You need a distributed index 
* You need complex tokenizers 
* You need advanced rankers
When PostgreSQL is not good 
* You want a REST API 
* You want sentence/proximity/range/ 
more complex operators 
* You want search autocompletion 
* You want advanced features (alerts...)
But it has been 
perfect for us so far. 
Our users don't care 
which search engine 
we use, as long as 
it works.