Session ID: 1495
Remember to complete your evaluation for this session within the app!
Build a DataWarehouse for your (alert!) logs
with Python, AWS Athena and AWS Glue
Wednesday, April 25 2018
Maxym Kharchenko
Sr. Database Engineer
Amazon.com
whoami
• Sr. Database Engineer @amazon.com Big Data Technologies team
• Developer <-> DBA
• OCM, ACE Associate, AWS Developer (all “alumni”)
• I have stickers!
Agenda
• Why query (alert) logs with SQL
• How to query (alert) logs with SQL
• How to make it easy and efficient with AWS Athena and Glue
• Demo
Logs are the best operational data about your system
Logs are great at simple “tactical” questions
“Why did my query fail at 17:17 yesterday?”
Sun Feb 11 17:17:04 2018
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 16: '/ora02/database/mydb/tbs12mydb_01.dbf'
“Why am I missing today’s partition?”
Thu Jan 11 11:40:55 2018
Errors in file /logs/mydb/trace/mydb-36_j005_38530.trc:
ORA-12012: error on auto execute of job "PART_ADMIN"."CREATE_PARTITION"
ORA-00028: your session has been killed
(mydb alert.log)
But not so great when questions get “broader”
“Did the last patch solve our problem?”
> grep ORA-28 alert.log
opiodr aborting process unknown ospid (3411) as a result of ORA-28
opiodr aborting process unknown ospid (65973) as a result of ORA-28
opiodr aborting process unknown ospid (56719) as a result of ORA-28
opiodr aborting process unknown ospid (129663) as a result of ORA-28
opiodr aborting process unknown ospid (11260) as a result of ORA-28
opiodr aborting process unknown ospid (22534) as a result of ORA-28
(mydb alert.log)
Or when analyzing multiple logs
“What is the timeline of the latest cluster lockup issue?”
Wed May 24 11:17:10 2017
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Wed May 24 11:17:17 2017
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Wed May 24 11:17:28 2017
Post SMON to start 1st pass IR Fix write in gcs resources
Reconfiguration complete
Or when correlating data across different logs
“Are we seeing more node crashes because of:
- Disk malfunctions?
- ASM issues?
- Network disconnects?”
> grep "WARNING: inbound connection timed out" alert*.log
> grep "corrupted block" asm*.log
> grep -P "failed|error|critical" kern*.log
> grep -P "long wait|error|disconnect" tnsping*.log
Or when looking for trends
“Has the rate of network disconnects increased over the last 6 months?”
“What databases have the highest archived log switch rate?”
“Do we see more problems in specific datacenter locations?”
“Are there times of the day with almost no user activity?”
Logs are not exactly easy to query (in bulk)
If only there were a simpler way to query all my logs …
SELECT trunc(event_time, 'DD'), db, count(1) AS errors
FROM "all my logs"
WHERE event_time > sysdate - interval '90' day
  AND (
        message LIKE '%ORA-00028%'
     OR
        message LIKE '%ORA-28%'
  )
GROUP BY trunc(event_time, 'DD'), db
ORDER BY 1, 2
/
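The query above is Oracle-flavored pseudo-SQL against an imaginary “all my logs” table. To make it concrete, here is a minimal sketch that loads a few made-up structured records into an in-memory SQLite database (a local stand-in chosen for illustration, not one of the engines in this talk) and runs the same daily count. SQLite's `date()` plays the role of `trunc(..., 'DD')`, and a fixed `since` cutoff replaces `sysdate - interval '90' day` so the result is reproducible; the `logs` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical structured log records: (event_time, db, message)
RECORDS = [
    ("2018-01-10 09:12:01", "mydb", "ORA-00028: your session has been killed"),
    ("2018-01-10 17:40:55", "mydb", "opiodr aborting process unknown ospid (3411) as a result of ORA-28"),
    ("2018-01-11 11:40:55", "mydb", "ORA-00028: your session has been killed"),
    ("2018-01-11 11:41:02", "otherdb", "Thread 32 advanced to log sequence 34018 (LGWR switch)"),
]


def daily_session_kills(records, since="2017-10-01"):
    """Count ORA-00028 / ORA-28 messages per day and database after `since`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (event_time TEXT, db TEXT, message TEXT)")
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", records)
    rows = conn.execute(
        """
        SELECT date(event_time) AS day, db, count(1) AS errors
        FROM logs
        WHERE event_time > ?
          AND (message LIKE '%ORA-00028%' OR message LIKE '%ORA-28%')
        GROUP BY day, db
        ORDER BY 1, 2
        """,
        (since,),
    ).fetchall()
    conn.close()
    return rows


print(daily_session_kills(RECORDS))  # [('2018-01-10', 'mydb', 2), ('2018-01-11', 'mydb', 1)]
```

The point is the shape of the question: once each event is a row with a timestamp, a database name and a message, “how often, per day, per database” becomes a single GROUP BY instead of a pile of grep output.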
How to query (application, db, …) logs with SQL
Is it even possible to query “unstructured text” with SQL?
SQL Engines!
“Table”
• Linux “directory”
• HDFS “folder”
• Cloud storage “folder”
Log files
(aka: “text”)
?
How to make logs “queryable”
1. Structur-ize
2. Table-ize
3. Transform and Compact-ize
Step 1: Structur-ize
“Raw” logs (e.g. alert_db.log) → “Structured” (e.g. JSON) logs
Step 1: Find “structure” in logs
Jan 11 20:30:59 host-12 kernel: [185012.404818] sd 2:0:1:168: [sdgfp] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185012.425995] sd 2:0:1:167: [sdgfn] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185012.426272] sd 2:0:1:169: [sdgfs] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185022.726345] rport-2:0-22: blocked FC remote port
Jan 11 20:30:59 host-12 kernel: [185076.513763] sd 2:0:13:0: [sdd] Write Protect is off
Thu Jan 11 17:15:54 2018
Thread 32 advanced to log sequence 34018 (LGWR switch)
Current log# 251 seq# 34018 mem# 0: +DG1/mydb-1/onlinelog/group_12.384.931698439
Thu Jan 11 17:16:25 2018
Unable to create archive log file '+DG1'
ARC1: Error 19504 Creating archive log file to '+DG1'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance mydb-1 - Archival Error
ORA-16038: log 12 sequence# 34017 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 254 thread 32: '+DG1/mydb-1/onlinelog/group_12.593.933491557'
Step 1: Make log structure explicit
#!/usr/bin/env python3
import json
import re
import sys

# Line format: <timestamp> <message>
# e.g. Jan 11 20:30:59 host-12 kernel: [185012.404818] sd 2:0:1:168: [sdgfp] Attached SCSI disk
LINE_FORMAT = re.compile(r"^(\w+\s+\d+\s+\d+:\d+:\d+)\s+(.*)$")

for line in sys.stdin:
    matched = LINE_FORMAT.match(line)
    if matched:
        # Alternative (CSV) output: print(",".join(matched.groups()))
        print(json.dumps(dict(zip(("event_time", "message"), matched.groups()))))
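The regular expression above fits syslog-style files, where every line starts with its own timestamp. The alert.log excerpts shown earlier work differently: a timestamp header line (`Thu Jan 11 17:16:25 2018`) is followed by one or more message lines. A minimal sketch, assuming exactly that header format and English day/month names, carries the most recent header forward onto each message line and normalizes it; lines that appear before the first header are skipped, and `parse_alert_log` is a hypothetical name.

```python
import json
import re
import sys
from datetime import datetime

# Assumed alert.log header format, e.g.: Thu Jan 11 17:15:54 2018
HEADER = re.compile(r"^\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4}$")


def parse_alert_log(lines):
    """Yield one record per message line, stamped with the preceding header time."""
    event_time = None
    for line in lines:
        line = line.rstrip()
        if HEADER.match(line):
            parsed = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            event_time = parsed.strftime("%Y-%m-%d %H:%M:%S.000")
        elif line.strip() and event_time is not None:
            yield {"event_time": event_time, "message": line.strip()}


if __name__ == "__main__":
    for record in parse_alert_log(sys.stdin):
        print(json.dumps(record))
```

Emitting one row per message line (rather than one row per header block) keeps each error code in its own row, which is what `message LIKE '%ORA-...%'` style queries expect.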
Step 1: Make log structure explicit
Jan 11 20:30:59 host-12 kernel: [185012.404818] sd 2:0:1:168: [sdgfp] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185012.425995] sd 2:0:1:167: [sdgfn] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185012.426272] sd 2:0:1:169: [sdgfs] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185022.726345] rport-2:0-22: blocked FC remote port
Jan 11 20:30:59 host-12 kernel: [185076.513763] sd 2:0:13:0: [sdd] Write Protect is off
{
  "message": "host-12 kernel: [185012.404818] sd 2:0:1:168: [sdgfp] Attached SCSI disk",
  "event_time": "Jan 11 20:30:59"
}
{
  "message": "host-12 kernel: [185012.425995] sd 2:0:1:167: [sdgfn] Attached SCSI disk",
  "event_time": "Jan 11 20:30:59"
}
Step 1: Make log structure explicit
Jan 11 20:30:59 host-12 kernel: [185012.404818] sd 2:0:1:168: [sdgfp] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185012.425995] sd 2:0:1:167: [sdgfn] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185012.426272] sd 2:0:1:169: [sdgfs] Attached SCSI disk
Jan 11 20:30:59 host-12 kernel: [185022.726345] rport-2:0-22: blocked FC remote port
Jan 11 20:30:59 host-12 kernel: [185076.513763] sd 2:0:13:0: [sdd] Write Protect is off
{
  "message": "host-12 kernel: [185012.404818] sd 2:0:1:168: [sdgfp] Attached SCSI disk",
  "event_time": "2018-01-11 20:30:59.000"
}
{
  "message": "host-12 kernel: [185012.425995] sd 2:0:1:167: [sdgfn] Attached SCSI disk",
  "event_time": "2018-01-11 20:30:59.000"
}
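Syslog-style stamps such as `Jan 11 20:30:59` carry no year, which is why producing the second version of the output means supplying one. A minimal sketch, assuming the year is known out of band (for example from the date the log file was collected), that rewrites the stamp into the `2018-01-11 20:30:59.000` layout shown above; both helper names are hypothetical.

```python
from datetime import datetime


def normalize_syslog_time(stamp, year):
    """Convert a year-less syslog stamp like 'Jan 11 20:30:59' to 'YYYY-MM-DD HH:MM:SS.fff'."""
    # The year must be supplied; prepending it before parsing also keeps strptime
    # from defaulting to 1900, a non-leap year that would reject 'Feb 29'.
    parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S") + f".{parsed.microsecond // 1000:03d}"


def normalize_record(record, year):
    """Return a copy of an {'event_time', 'message'} record with a normalized event_time."""
    return dict(record, event_time=normalize_syslog_time(record["event_time"], year))


print(normalize_syslog_time("Jan 11 20:30:59", 2018))  # 2018-01-11 20:30:59.000
```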
Step 2: Table-ize
Table “directory”
Table
“Metadata”
CREATE TABLE …
“Structured” (e.g. JSON) logs
Step 2: Create table and “ingest” data
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable (
  `event_time` timestamp,
  `message` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://databucket/mydb/mytable/'
;
> cp log*.json /data/mydb/mytable/
> hadoop fs -put log*.json /data/mydb/mytable/
> aws s3 cp . s3://databucket/mydb/mytable/ --recursive --exclude "*" --include "log*.json"
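The JSON SerDe named in the DDL reads one complete JSON object per line, so the multi-line objects on the earlier slides are pretty-printed for display only. A minimal sketch for writing files in that newline-delimited layout before copying them into the table location; `write_json_lines` and `read_json_lines` are hypothetical helper names.

```python
import json


def write_json_lines(records, path):
    """Write one compact JSON object per line (the layout line-oriented JSON SerDes read)."""
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            # With the default indent=None, json.dumps escapes embedded newlines
            # as \n, so every record stays on a single physical line.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_json_lines(path):
    """Read the file back, one record per non-blank line."""
    with open(path, encoding="utf-8") as src:
        return [json.loads(line) for line in src if line.strip()]
```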
Step 3: Transform (into final form)
• Rollup
• Aggregations
• Materializing complex joins
• Partitioning
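Partitioning usually means laying files out under Hive-style `key=value` prefixes so the engine can skip whole directories. A minimal sketch, assuming a hypothetical `dt` partition column derived from `event_time` and a table declared with `PARTITIONED BY (dt string)`, that groups already-normalized records by the prefix they would be written under:

```python
from collections import defaultdict


def partition_by_day(records, table_prefix):
    """Group records under Hive-style 'dt=YYYY-MM-DD/' prefixes (the 'dt' name is hypothetical)."""
    partitions = defaultdict(list)
    for record in records:
        day = record["event_time"][:10]  # 'YYYY-MM-DD' from 'YYYY-MM-DD HH:MM:SS.fff'
        partitions[f"{table_prefix.rstrip('/')}/dt={day}/"].append(record)
    return dict(partitions)
```

New partitions still have to be registered in the catalog before queries see them, for example with `MSCK REPAIR TABLE mydb.mytable` in Athena or by a Glue crawler; after that, a `WHERE dt = '2018-01-11'` filter only reads files under the matching prefix.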
Step 3: Compact-ize
“Scan all files!”
Open data formats
TSV:
• Text based
• Row-oriented
• Some compression
• Limited filtering
• Easy to make
Parquet:
• Binary
• Columnar
• Really good compression
• Advanced filtering
• More difficult to make
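A toy illustration of why a columnar layout compresses well. It is not Parquet's actual on-disk encoding (Parquet combines dictionary encoding, run-length and bit-packing schemes, plus general-purpose compression), only the core idea: storing each column contiguously puts repeated values next to each other, so even a simple run-length encoding collapses them, and a query that touches one column only has to read that one list. The function names are hypothetical.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot row-oriented records into one list per column (assumes every row has the same keys)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


rows = [
    {"db": "mydb", "severity": "ERROR"},
    {"db": "mydb", "severity": "INFO"},
    {"db": "mydb", "severity": "INFO"},
    {"db": "otherdb", "severity": "ERROR"},
]
columns = to_columns(rows)
print(run_length_encode(columns["db"]))  # [('mydb', 3), ('otherdb', 1)]
```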
Step 3: Transform and Compact-ize
JSON logs → PARQUET logs
• Format Transform
• SQL Transform
Step 4: Query
PARQUET logs
The SQL-on-logs pipeline
“Raw” logs → Structured logs → Staging table(s) → Final table(s)
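The staging-to-final hop is a SQL transform of the kind listed under Step 3 (rollups, aggregations). A minimal sketch using SQLite as a local stand-in for Athena or Glue (an assumption made so the example runs anywhere; the table names `staging_logs` and `final_daily_errors` are hypothetical) that materializes a per-day, per-error-code rollup from structured alert.log rows:

```python
import sqlite3


def build_daily_rollup(conn):
    """Materialize a 'final' rollup table from a 'staging' table of structured log rows."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS final_daily_errors;
        CREATE TABLE final_daily_errors AS
        SELECT date(event_time) AS day,
               substr(message, 1, instr(message, ':') - 1) AS error_code,
               count(*) AS occurrences
        FROM staging_logs
        WHERE message LIKE 'ORA-%:%'
        GROUP BY day, error_code;
        """
    )
    return conn.execute(
        "SELECT day, error_code, occurrences FROM final_daily_errors ORDER BY day, error_code"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_logs (event_time TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO staging_logs VALUES (?, ?)",
    [
        ("2018-01-11 17:16:25.000", "ORA-16038: log 12 sequence# 34017 cannot be archived"),
        ("2018-01-11 17:16:25.000", 'ORA-19504: failed to create file ""'),
        ("2018-01-11 17:21:25.000", "ORA-16038: log 12 sequence# 34017 cannot be archived"),
        ("2018-01-11 17:15:54.000", "Thread 32 advanced to log sequence 34018 (LGWR switch)"),
    ],
)
print(build_daily_rollup(conn))  # [('2018-01-11', 'ORA-16038', 2), ('2018-01-11', 'ORA-19504', 1)]
```

Dropping and recreating the final table keeps the step idempotent, so it can be rerun whenever new staging data arrives.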
Step 5: Make it simple with AWS
“Raw” logs → Structured logs → “Staging” S3 bucket → AWS Glue → “Final” S3 bucket → AWS Athena
AWS Athena
• “Query data in S3 using SQL”
• Serverless Presto cluster
• Rich SQL
• Supports multiple open data formats
• Fast, interactive performance
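A minimal sketch of submitting a query to Athena from Python, assuming the boto3 SDK. `start_query_execution` and `get_query_execution` are boto3 Athena client calls; the `run_athena_query` wrapper is hypothetical, and the client is passed in (e.g. `boto3.client("athena")`) so the polling logic can be exercised without AWS.

```python
import time

# Athena query states after which polling can stop
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit `sql` and poll until Athena reports a terminal state.

    `athena` is expected to behave like boto3.client("athena").
    Returns (query_execution_id, final_state).
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)
```

Usage might look like `run_athena_query(boto3.client("athena"), "SELECT count(*) FROM mydb.mytable", "mydb", "s3://databucket/athena-results/")`, where the results location is a hypothetical bucket prefix; rows can then be fetched with the client's `get_query_results(QueryExecutionId=...)`. A production version would also want a timeout on the polling loop.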
AWS Glue
• “Prepare and load data (ETL!)”
• Serverless Apache Spark
• Crawlers: “data discovery” and automatic catalog maintenance
• Job scheduling
• Integrated with many data
“sources” and “sinks”
• ETL script generation (or BYO)
Demo time
Extending the SQL-on-logs pipeline
Pre-parse logs in the cloud
“Raw” logs → S3: “Raw” logs → Lambda: to_json() → S3: “Staging” (JSON) → Glue: to_parquet() → S3: “Final” (Parquet) → Athena
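A minimal sketch of the Lambda `to_json()` step, assuming the function is triggered by S3 “object created” notifications on the raw bucket and writes newline-delimited JSON to a staging bucket. The bucket name `my-staging-bucket`, the `.json` key suffix and the `handler` name are hypothetical. The S3 client is injectable for testing and otherwise created with boto3, which the AWS Lambda Python runtime provides; the parsing reuses the line regex from the Step 1 script.

```python
import json
import re
from urllib.parse import unquote_plus

LINE_FORMAT = re.compile(r"^(\w+\s+\d+\s+\d+:\d+:\d+)\s+(.*)$")
STAGING_BUCKET = "my-staging-bucket"  # hypothetical staging bucket name


def to_json(raw_text):
    """Convert raw syslog-style text into newline-delimited JSON, dropping unmatched lines."""
    out = []
    for line in raw_text.splitlines():
        matched = LINE_FORMAT.match(line)
        if matched:
            out.append(json.dumps(dict(zip(("event_time", "message"), matched.groups()))))
    return "\n".join(out) + ("\n" if out else "")


def handler(event, context, s3=None):
    """Entry point for S3 object-created notifications on the raw bucket."""
    if s3 is None:
        import boto3  # bundled with the Lambda Python runtime
        s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8", errors="replace")
        s3.put_object(Bucket=STAGING_BUCKET, Key=key + ".json", Body=to_json(raw).encode("utf-8"))
```

Doing this conversion next to the data means hosts only ship raw files to S3; the heavier JSON-to-Parquet conversion stays in Glue.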
Build materialized views
“Raw” logs → S3: “Raw” logs → Lambda: to_json() → S3: “Staging” (JSON) → Glue: to_parquet() → S3: “Final” (Parquet) → Athena
+ Glue: make_mview()
Use different SQL front-ends
“Raw” logs → S3: “Raw” logs → Lambda: to_json() → S3: “Staging” (JSON) → Glue: to_parquet() → S3: “Final” (Parquet) → Athena
+ to_redshift() → Redshift
+ to_oracle() → RDS ORACLE
Thank you!
maxym@amazon.com
