Databricks and Logging in Notebooks
Presented By: Swantika Gupta, Software Consultant
KnolX Etiquettes
Lack of etiquette and manners is a huge turn-off.
Punctuality
Respect KnolX session timings. Please do
not join a session more than 5 minutes
after its start time.
Feedback
Submit constructive feedback for every
session; it is very helpful for the
presenter.
Silent Mode
Keep your mobile devices in silent
mode, and feel free to step out of the
session if you need to attend
an urgent call.
Avoid Disturbance
Avoid unwanted chit-chat during
the session.
Agenda
What is Databricks?
Reasons to use Azure Databricks
Azure Databricks Core Artifacts
- Workspace
- Clusters
- Data
- Notebooks
- Jobs
- Libraries
Logging in Scala Notebooks
What is Databricks?
An industry-leading, zero-management cloud platform built around Spark.
Delivers
- fully managed Spark clusters
- an interactive workspace for exploration and visualization
- a production pipeline scheduler
- a platform for powering your favorite Spark-based applications
So instead of tackling headaches like setting up infrastructure, creating data backups, and scaling
your nodes according to load, you can focus on finding answers that make an immediate
impact on your business.
It is a third-party product, but it is offered as a first-class service tightly integrated
with AWS and Azure.
Reasons to use Azure Databricks
Familiar Languages and Environment
Higher Productivity and Collaboration
Easy integration with Microsoft Stack
Extensive List of Data Sources
Suitable for Small Jobs too
Extensive Documentation and Support Available
Azure Databricks Core Artifacts
Azure Databricks Artifact - Workspace
An environment inside the Databricks
service with access to all your
Databricks resources
Organizes objects such as
notebooks and JARs into folders
Provides easy one-click access to
computational resources, like clusters,
and to stored data
Azure Databricks Artifact - Clusters
Core Component of Databricks
A set of computational resources and
configurations
Runs our Data Engineering, Data
Science and Data Analytics
workloads
Types of Clusters:
- Interactive Clusters
- Automated Clusters
Azure Databricks Artifact - Data
● Create tables directly from imported data.
The table schema is stored in the internal
Databricks metastore.
● Use Apache Spark commands to read data
from supported data sources.
● Import data into DBFS and use the DBFS
CLI, DBFS API, DBFS utilities, Spark APIs,
and local file APIs to access the data.
Azure Databricks Artifact - Notebooks
Web-Based interface to a Document
containing
- Runnable Code
- Visualizations
- Narrations
Support for multiple languages in the
same notebook
Real-time collaboration on the same
notebook
Revision History of the notebook
Azure Databricks Artifact - Jobs
As an alternative to running notebooks
interactively, you can set up a notebook or JAR
to run either immediately or on a
schedule.
Three types of tasks can be run as jobs:
- Notebooks
- JARs
- Spark Submit
Notebook and JAR jobs can be
configured by passing parameters
Spark Submit jobs can be configured too
Azure Databricks Artifact - Libraries
A library can be installed on a cluster to
make third-party or custom code
available to the running notebook or JAR
Libraries can be installed in 3 modes:
- Workspace Libraries
- Cluster Libraries
- Notebook-scoped Libraries
Importance of Logs
Spark jobs work with large amounts of data, and their tasks involve time-consuming
computations.
They also run on remote machines, so it is difficult to track the steps of an execution without
logs.
Logs show what point the execution has reached, and they help the developer find where
the job spends most of its time.
Logs often contain a lot of metadata, including timestamps, the logger name (which can be
set to the name of the logging class), and source information such as the cluster name. This data
helps in the debugging process.
Using real-time logs and the messages in them, alerting can be set up
to send a notification when a log entry containing a particular message appears.
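To make the point about tracking progress and time spent more concrete, here is a minimal sketch of a notebook cell that logs the start, end and elapsed time of a step. The names `StepTiming`, `timed` and `sum-range` are hypothetical and not from the slides. The helper takes any `String => Unit` function as its log sink, so it runs without a logging library.

```scala
import scala.collection.mutable.ArrayBuffer

object StepTiming {
  /** Runs `body`, reports start, end and elapsed milliseconds through `log`,
    * and returns the result of `body`. The END message is emitted even if
    * `body` throws. */
  def timed[T](step: String, log: String => Unit)(body: => T): T = {
    log(s"START $step")
    val startNanos = System.nanoTime()
    try body
    finally {
      val elapsedMs = (System.nanoTime() - startNanos) / 1000000L
      log(s"END $step elapsedMs=$elapsedMs")
    }
  }
}

// Usage: collect the messages in memory so the cell is self-contained.
// With a (hypothetical) Log4j logger you would pass m => logger.info(m) instead.
val messages = ArrayBuffer.empty[String]
val total = StepTiming.timed("sum-range", msg => { messages += msg; () }) {
  (1 to 1000).sum
}
```

Wrapping each stage of a job this way gives one START and one END line per stage, which is usually enough to see where the execution currently is and which stage takes the longest.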
Logging in Databricks Scala Notebooks
Databricks’ Log Delivery System
Delivers Spark driver, executor and event
logs to a location specified while configuring
the cluster
Logs are delivered every 5 minutes
In case a cluster terminates, Databricks makes
sure to deliver all logs up to the termination to
the delivery location.
Logging in Databricks Scala Notebooks
Default Log4j Properties in Databricks
Two log4j.properties files:
For Driver:
For Executors:
Logging in Databricks Scala Notebooks
Overwriting the Default Log4j Properties with init scripts
Drawback - every configuration change requires a cluster restart
Logging in Databricks Scala Notebooks
Using External Log4j configuration for
your job
- Create a Log4j properties file for your
custom loggers and appenders
- Upload the file to your required DBFS
location
- Use Log4j’s PropertyConfigurator class
to configure logging from your custom log4j
properties file
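The three steps can be sketched in a single Scala notebook cell. This is only an illustration under stated assumptions. It assumes the Log4j 1.x API (`org.apache.log4j`) is on the classpath, and the call may behave differently on runtimes that ship Log4j 2. The logger name `com.example.myjob`, the appender name `jobFile` and the file names are hypothetical. A temporary directory stands in for the DBFS location so the sketch can run anywhere. On Databricks you would point `propsPath` at your own DBFS path instead.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.log4j.{Logger, PropertyConfigurator}

// Temporary directory standing in for the DBFS location (assumption for this sketch).
val workDir: Path = Files.createTempDirectory("log4j-demo")
val logPath: Path = workDir.resolve("my-job.log")
val propsPath: Path = workDir.resolve("my-job-log4j.properties")

// Step 1: a properties file with a custom logger and its own file appender.
// All logger and appender names are hypothetical.
val customProps =
  s"""log4j.logger.com.example.myjob=INFO, jobFile
     |log4j.additivity.com.example.myjob=false
     |log4j.appender.jobFile=org.apache.log4j.FileAppender
     |log4j.appender.jobFile.File=${logPath.toString}
     |log4j.appender.jobFile.layout=org.apache.log4j.PatternLayout
     |log4j.appender.jobFile.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
     |""".stripMargin

// Step 2: place the file at the chosen location (here the stand-in directory).
Files.write(propsPath, customProps.getBytes(StandardCharsets.UTF_8))

// Step 3: apply the custom configuration in the current JVM and use the logger.
PropertyConfigurator.configure(propsPath.toString)
val jobLogger = Logger.getLogger("com.example.myjob")
jobLogger.info("Custom Log4j configuration loaded")
```

Because the configuration is applied at run time, it can be changed without restarting the cluster. Note that this call only affects the JVM in which the cell runs, so code running on executors would need to apply the configuration there as well.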
References
https://kb.databricks.com/clusters/overwrite-log4j-logs.html
https://www.youtube.com/watch?v=cxyUy1bZ9mk
https://forums.databricks.com/questions/17625/how-can-i-customize-log4j.html
https://docs.databricks.com/getting-started/concepts.html
https://blog.knoldus.com/databricks-make-log4j-configurable
Thank You!

