Automotive Information Research driven by Apache Solr
Mario-Leander Reimer
Chief Technologist, QAware GmbH
mario-leander.reimer@qaware.de
@LeanderReimer
Agenda
Reverse Data Engineering and Exploration with MIR
Aftersales Information Research with AIR
Architecture, Requirements, Challenges
Solutions for the Problem of Combinatorial Explosion
Data Consistency and Timeliness
BOM Explosions and Demand Forecasts with ZEBRA
Reverse Data Engineering and Exploration with MIR
How do we find the originating data silo for the desired data?
System A System B System C System D
Vehicle data
Other data
Where to find the vehicle data?
60 potential systems with 5,000 entities.
How do we find the hidden relations between the systems?
How is the data linked to each other?
400,000 potential relations.
Vehicle data
Other data
System A System B System C System D
Parts
Documents
Reverse Data Engineering and Analysis with MIR and Solr
MIR manages the meta information, data models, and record descriptions of all our source systems (RDBMS, XML, SOAP, …).
MIR lets you navigate and search the metadata and drill into it easily using facets.
MIR also manages the target data model and the Solr schema description.
[Screenshot of the MIR search UI: search results; tree view of systems, tables, and attributes; drill-down via facets; wildcard search that found potential synonyms for the chassis number]
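A faceted drill-down of this kind can be sketched as an ordinary Solr select request. The snippet below only builds the query string and does not talk to a server. The field names SYSTEM, TABLE, and ATTRIBUTE are hypothetical stand-ins, not the real MIR schema; only the standard Solr parameters q, fq, rows, facet, facet.field, and facet.mincount are used.

```python
from urllib.parse import urlencode


def build_metadata_search(term, drill_down=None):
    """Build the query string for a wildcard search with facet counts.

    SYSTEM, TABLE and ATTRIBUTE are hypothetical field names, not the
    real MIR schema.
    """
    params = [
        ("q", "ATTRIBUTE:*%s*" % term),   # wildcard search on attribute names
        ("rows", "20"),
        ("facet", "true"),
        ("facet.mincount", "1"),
        ("facet.field", "SYSTEM"),        # one facet per level of the tree view
        ("facet.field", "TABLE"),
    ]
    # Each selected facet value becomes a filter query (the "drill down").
    for field, value in (drill_down or {}).items():
        params.append(("fq", '%s:"%s"' % (field, value)))
    return urlencode(params)


query_string = build_metadata_search("chassis", {"SYSTEM": "System A"})
print(query_string)
```

Repeating facet.field once per field is how Solr expects several facets in one request; each click on a facet value simply adds another fq.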
Aftersales Information Research with AIR
Find the right information in less than 3 clicks.
The initial situation:
Users had to use up to 7 different applications for their daily work.
The systems were poorly integrated.
Finding the correct information was laborious and error-prone.
The project vision:
Combine the data into a consistent information network.
Make the information network and its data searchable and navigable.
Replace the existing applications with one easy-to-use application.
"But Apache Solr is only a full-text search engine. You have to use an Oracle database for your application data."
– Anonymous IT person
Solr outperformed Oracle in query time as well as index size.
| Query | System | Measured query times |
| SELECT * FROM VEHICLE WHERE VIN LIKE 'V%' | Oracle | 383 ms, 384 ms, 383 ms |
| INFO_TYPE:VEHICLE AND VIN:V* | Solr | 38 ms, 0 ms, 0 ms |
| SELECT * FROM MEASURE WHERE TEXT='engine' | Oracle | 389 ms, 387 ms, 386 ms |
| INFO_TYPE:MEASURE AND TEXT:engine | Solr | 92 ms, 0 ms, 0 ms |
| SELECT * FROM VEHICLE WHERE VIN LIKE '%X%' | Oracle | 859 ms, 379 ms, 383 ms |
| INFO_TYPE:VEHICLE AND VIN:*X* | Solr | 39 ms, 0 ms, 0 ms |
Disk space: 132 MB Solr vs. 385 MB Oracle
Test data set: 150,000 records
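The SQL patterns with % and the Solr wildcard queries above follow a simple mapping. A minimal sketch, assuming the pattern contains no whitespace and no Solr special characters other than the wildcards themselves; like_to_solr is a hypothetical helper name, not part of Solr or AIR.

```python
def like_to_solr(info_type, field, like_pattern):
    """Translate a SQL LIKE pattern into a Solr wildcard query string.

    '%' (any run of characters) becomes '*', '_' (one character) becomes '?'.
    Escaping of other Solr special characters is deliberately left out.
    """
    wildcard = like_pattern.replace("%", "*").replace("_", "?")
    return "INFO_TYPE:%s AND %s:%s" % (info_type, field, wildcard)


print(like_to_solr("VEHICLE", "VIN", "V%"))    # INFO_TYPE:VEHICLE AND VIN:V*
print(like_to_solr("VEHICLE", "VIN", "%X%"))   # INFO_TYPE:VEHICLE AND VIN:*X*
```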
The dirt race use case:
• No internet connection
• Low-end devices
Solr and AIR on a Raspberry Pi Model B worked like a charm as a PoC!
Running Debian Linux + JDK 8
Jetty servlet container with the Solr and AIR web apps deployed
A reduced offline data set with ~1.5 million Solr documents
Model B hardware specs:
ARMv6 CPU, 700 MHz
512 MB RAM
32 GB SD card
And now try this with Oracle!
A careful schema design is crucial for your Solr performance.
Naive denormalization quickly leads to combinatorial explosion!
33,071,137 Vehicles
14,830,197 Flat Rate Units
1,678,667 Packages
5,078,411 FRU Groups
18,573 Repair Instructions
648,129 Technical Documents
55,000 Parts
648,129 Measures
41,385 Types
6,180 Fault Indications
Relationship
Navigation
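To see why naive denormalization explodes, consider flattening a chain of 1..n relations into one Solr document per combination: the document count is the product of the fan-outs along the chain, while indexing each entity once only costs the sum of the entity counts. The sketch below uses made-up fan-outs and entity counts purely for illustration; none of these numbers come from the AIR data set.

```python
from math import prod


def flattened_documents(root_count, fan_outs):
    """Documents needed when every combination along the chain is its own row."""
    return root_count * prod(fan_outs)


def separate_documents(entity_counts):
    """Documents needed when each entity is indexed once and linked by reference."""
    return sum(entity_counts.values())


# Hypothetical numbers, for illustration only.
fan_outs = [50, 20, 3]  # packages per vehicle, flat rate units per package, documents per unit
entity_counts = {"vehicles": 1000, "packages": 200, "flat_rate_units": 500, "documents": 100}

print(flattened_documents(entity_counts["vehicles"], fan_outs))  # 3000000
print(separate_documents(entity_counts))                         # 1800
```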
Multi-valued fields can efficiently store 1..n relations, but may result in false positives.
{
"INFO_TYPE": "AWPOS_GROUP",
"NUMMER":    ["1134190", "1235590"],
"BAUSTAND":  ["1969-12-31T23:00:00Z", "1975-12-31T23:00:00Z"],
"E_SERIES":  ["F10", "E30"]
}
The values at index 0 belong together, as do the values at index 1. Both of the following filter queries match this document, although NUMMER 1134190 (index 0) is paired with F10 and not with E30 (index 1):
fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:F10
fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:E30
Where this is acceptable at the Solr level, post-filter the results in your application.
Alternative: current Solr versions support nested child documents; use them instead.
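The application-side post-filter can be sketched as follows: a result document only counts as a true hit if all requested values occur at the same array index. This is a minimal illustration, assuming the multi-valued fields are stored index-aligned as in the example document; matches_at_same_index is a hypothetical helper name.

```python
def matches_at_same_index(doc, conditions):
    """Return True only if all conditions hold at one common array index."""
    lengths = {len(doc[field]) for field in conditions}
    if len(lengths) != 1:
        raise ValueError("multi-valued fields must be index-aligned")
    size = lengths.pop()
    return any(
        all(doc[field][i] == value for field, value in conditions.items())
        for i in range(size)
    )


doc = {
    "INFO_TYPE": "AWPOS_GROUP",
    "NUMMER": ["1134190", "1235590"],
    "BAUSTAND": ["1969-12-31T23:00:00Z", "1975-12-31T23:00:00Z"],
    "E_SERIES": ["F10", "E30"],
}

# True hit: both values sit at index 0.
print(matches_at_same_index(doc, {"NUMMER": "1134190", "E_SERIES": "F10"}))  # True
# False positive from Solr: the values sit at different indexes.
print(matches_at_same_index(doc, {"NUMMER": "1134190", "E_SERIES": "E30"}))  # False
```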
Technical documents and their validity were expressed and stored
in a binary representation.
Validity expressions may have up to 46 characteristics
Validity expressions use 5 different boolean operators (AND, NOT, …)
Validity expressions can be nested and complex
Some characteristics are dynamic and not even known at index time
The solution: transform the validity expressions into the equivalent
ternary JavaScript terms and evaluate these terms at query time using
a custom function query filter.
Binary validity expression example.
Type(53078923) = 'Brand', Value(53086475) = 'BMW PKW'
Type(53088651) = 'E-Series', Value(53161483) = 'F10'
Type(64555275) = 'Transmission', Value(53161483) = 'MECH'
Transformation of the binary validity terms into their JavaScript equivalent at index time.
AND(Brand='BMW PKW', E-Series='F10', Transmission='MECH')
((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH'))
{
"INFO_TYPE": "TECHNISCHES_DOKUMENT",
"DOKUMENT_TITEL": "Getriebe aus- und einbauen",
"DOKUMENT_ART": "reparaturanleitung",
"VALIDITY": "((BRAND=='BMW PKW')&&((E_SERIES=='F10')&&(...))",
"BRAND": ["BMW PKW"]
}
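The index-time transformation can be sketched as a small recursive function. The slides do not show the decoded form of the binary expressions, so the nested-tuple input format, the operator names EQ, AND, OR, and NOT, and the helper names below are assumptions made for illustration, not the actual AIR implementation.

```python
def js_name(name):
    """Turn a characteristic name such as 'E-Series' into 'E_SERIES'."""
    return name.upper().replace("-", "_").replace(" ", "_")


def to_js_term(expr):
    """Render a nested expression tuple as a JavaScript boolean term."""
    op = expr[0]
    if op == "EQ":
        _, name, value = expr
        # Escape backslashes first, then single quotes, for the JS string literal.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "(%s=='%s')" % (js_name(name), escaped)
    if op == "NOT":
        return "(!%s)" % to_js_term(expr[1])
    if op in ("AND", "OR"):
        joiner = "&&" if op == "AND" else "||"
        return "(" + joiner.join(to_js_term(e) for e in expr[1:]) + ")"
    raise ValueError("unknown operator: %r" % (op,))


validity = (
    "AND",
    ("EQ", "Brand", "BMW PKW"),
    ("EQ", "E-Series", "F10"),
    ("EQ", "Transmission", "MECH"),
)
print(to_js_term(validity))
# ((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH'))
```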
The JavaScript validity term is evaluated at query time using a
custom function query.
&fq=INFO_TYPE:TECHNISCHES_DOKUMENT
&fq=DOKUMENT_ART:reparaturanleitung
&fq={!frange l=1 u=1 incl=true incu=true cache=false cost=500}
jsTerm(VALIDITY,eyJNT1RPUl9LUkFGVFNUT0ZGQVJUX01PVE9SQVJCRUlUU
1ZFUkZBSFJFTiI6IkIiLCJFX01BU0NISU5FX0tSQUZUU1RPRkZBUlQiOm51bG
wsIlNJQ0hFUkhFSVRTRkFIUlpFVUciOiIwIiwiQU5UUklFQiI6IkFXRCIsIkV
kJBVVJFSUhFIjoiWCcifQ==)
Base64-decoded:
{
"BRAND":"BMW PKW",
"E_SERIES":"F10",
"TRANSMISSION":"MECH"
}
http://qaware.blogspot.de/2014/11/how-to-write-postfilter-for-solr-49.html
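On the client side, the vehicle context has to be serialized and Base64-encoded before it is passed to the filter. A minimal sketch, assuming the context is sent as compact JSON: jsTerm is the custom function named on the slide (it is not a stock Solr function), frange is the standard Solr function range query parser, and the helper names are hypothetical.

```python
import base64
import json


def encode_context(context):
    """Serialize the vehicle context as compact JSON and Base64-encode it."""
    raw = json.dumps(context, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def validity_filter(context, field="VALIDITY"):
    """Build the fq value that evaluates the stored term against the context."""
    return (
        "{!frange l=1 u=1 incl=true incu=true cache=false cost=500}"
        "jsTerm(%s,%s)" % (field, encode_context(context))
    )


context = {"BRAND": "BMW PKW", "E_SERIES": "F10", "TRANSMISSION": "MECH"}
fq = validity_filter(context)
print(fq)
```

Standard Base64 output can contain +, / and =, so the fq value still needs normal URL encoding when it is sent over HTTP. Setting cache=false with a cost of 100 or more is what makes Solr run the filter as a post filter, after the cheaper filter queries.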
Custom ETL combined with Continuous Delivery and DevOps
ensure data consistency and timeliness.
BOM Explosions and Demand Forecasts with ZEBRA
Bills of Materials (BOMs) explained
BOMs are required for …
• Production planning
• Forecasting demand
• Scenario-based planning
• Simulations
The Big Picture of ZEBRA
[Diagram components: Parts / abstract demands; Orders / actual demands; BOMs / dependent demands; Demand Resolver; Analytics; Production Planning. Volumes shown: 7 million, 2 million, 21 billion.]
The most essential Solr optimizations in ZEBRA
Computing BOM explosions:
• Bulk RequestHandler: solve thousands of orders with one request.
• Binary DocValue support: speed up access to persisted data dramatically using binary doc values.
• Boolean interpreter as postfilter: enable Solr, via custom post filters, to filter documents using stored boolean expressions.
• Mass data binary response format: use the standard Solr binary codec with an optimized data model that reduces the amount of data by a factor of 8.
• Search components with custom JOIN algorithm: store data efficiently using our own JOIN implementation.
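Conceptually, a BOM explosion resolves each order into the parts whose stored boolean expression holds for that order's options, and a bulk request does this for many orders at once. The following in-memory sketch only illustrates that idea and is not ZEBRA's implementation; the part numbers, option codes, operator names, and the tuple-based expression format are all hypothetical.

```python
def holds(expr, options):
    """Evaluate a stored boolean expression against one order's options."""
    op = expr[0]
    if op == "HAS":
        return expr[1] in options
    if op == "NOT":
        return not holds(expr[1], options)
    if op == "AND":
        return all(holds(e, options) for e in expr[1:])
    if op == "OR":
        return any(holds(e, options) for e in expr[1:])
    raise ValueError("unknown operator: %r" % (op,))


def explode(orders, bom):
    """Resolve a whole batch of orders in one call (the 'bulk' idea)."""
    return {
        order_id: [part for part, condition in bom if holds(condition, options)]
        for order_id, options in orders.items()
    }


# Hypothetical BOM lines: (part number, condition under which the part is needed).
bom = [
    ("P-100", ("HAS", "F10")),
    ("P-200", ("AND", ("HAS", "F10"), ("HAS", "MECH"))),
    ("P-300", ("AND", ("HAS", "F10"), ("NOT", ("HAS", "MECH")))),
]
orders = {
    "order-1": {"F10", "MECH"},
    "order-2": {"F10", "AUTO"},
}
result = explode(orders, bom)
print(result)
```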
Low-level optimizations can yield great boosts in performance
[Chart: development of the processing time of the Demand Calculation Service PoC, i.e. the time to calculate the BOM for one order. Time axis: October 14, January 15, May 15, October 15. Values shown: 24 ms, 4.9 ms, 0.28 ms, 0.08 ms.]
[Chart: profiling result and some of the improvements made to reduce the query time: Scoring (-8%), Default Query Parser (-25%), Stat-Cache (-8%), String DocValues (-28%).]
Solr has become a powerful tool for building enterprise
and data analytics applications. Be creative!
Mario-Leander Reimer
Chief Technologist, QAware GmbH
mario-leander.reimer@qaware.de
https://www.qaware.de
https://slideshare.net/MarioLeanderReimer/
https://speakerdeck.com/lreimer/
https://twitter.com/leanderreimer/
