Introduction to SQOOP
Agenda
 What is Sqoop
 Why Sqoop?
 How Sqoop Works
 Sqoop Architecture
 Sqoop Import
 Sqoop Export
What is Sqoop
 Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and
structured datastores such as relational databases.
 Sqoop imports data from external structured datastores into HDFS or related systems like Hive and
HBase.
 Sqoop can also export data from Hadoop to external structured datastores
such as relational databases and enterprise data warehouses.
Why Sqoop?
 As more organizations deploy Hadoop to analyse vast streams of information, they may
find they need to transfer large amounts of data between Hadoop and their existing
databases, data warehouses and other data sources.
 Loading bulk data into Hadoop from production systems, or accessing it from MapReduce
applications running on a large cluster, is challenging because transferring data
using scripts is inefficient and time-consuming.
 Allows data imports from external datastores and enterprise data warehouses into
Hadoop
 Parallelizes data transfer for fast performance and optimal system utilization
 Copies data quickly from external systems to Hadoop
 Makes data analysis more efficient
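The parallel transfer mentioned above can be pictured as slicing a key range into one chunk per task. The following is a minimal sketch of that idea only, not Sqoop's actual splitter code; the column name `id`, the range 1 to 1000 and the task count of 4 are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: divide an integer key range [min, max] into near-equal slices,
# one per parallel task. Illustrative only; "id" is a hypothetical column.
split_ranges() {
  local min=$1 max=$2 tasks=$3
  local total=$(( max - min + 1 ))   # number of key values to cover
  local base=$(( total / tasks ))    # minimum slice size
  local extra=$(( total % tasks ))   # first "extra" slices get one more value
  local lo=$min hi size i
  for (( i = 0; i < tasks; i++ )); do
    size=$base
    if (( i < extra )); then size=$(( base + 1 )); fi
    if (( size == 0 )); then continue; fi   # more tasks than key values
    hi=$(( lo + size - 1 ))
    echo "WHERE id >= $lo AND id <= $hi"
    lo=$(( hi + 1 ))
  done
}

# Example: four slices over the keys 1..1000
split_ranges 1 1000 4
```

Each printed condition stands for the slice of rows one task would read, so the slices can be fetched at the same time without overlapping.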
How Sqoop Works
Sqoop Architecture
Sqoop Import
 sqoop import --connect jdbc:postgresql://hdp-master/sqoop_db --username sqoop_user --password postgres --table cities
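A possible variant of the import above keeps the password off the command line and makes the parallelism explicit. This is a sketch under assumptions: the password file path, the split column `id`, the mapper count and the target directory are hypothetical and do not come from the slides. The script only prints the command so it can be reviewed before being run on a host with Sqoop installed.

```shell
#!/usr/bin/env bash
# Assumed for illustration: password file path, split column "id",
# mapper count, and target directory.
import_cmd=(
  sqoop import
  --connect jdbc:postgresql://hdp-master/sqoop_db
  --username sqoop_user
  --password-file /user/sqoop_user/.db-password
  --table cities
  --split-by id
  --num-mappers 4
  --target-dir /user/sqoop_user/cities
)

# Dry run: show the full command. To execute it, run "${import_cmd[@]}".
echo "${import_cmd[*]}"
```

Building the command as an array keeps each option and its value together as separate arguments, which avoids quoting problems if a value later contains spaces.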
Sqoop Export
 sqoop export --connect jdbc:postgresql://hdp-master/sqoop_db --username sqoop_user --password postgres --table cities --export-dir cities
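The export above can be written the same way, stating the field delimiter of the files in the export directory explicitly. This is a sketch under assumptions: the password file path and the comma delimiter are hypothetical, and the files under `cities` are assumed to be comma-separated text. As before, the script prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Assumed for illustration: password file path and comma-delimited input files.
export_cmd=(
  sqoop export
  --connect jdbc:postgresql://hdp-master/sqoop_db
  --username sqoop_user
  --password-file /user/sqoop_user/.db-password
  --table cities
  --export-dir cities
  --input-fields-terminated-by ','
)

# Dry run: show the full command. To execute it, run "${export_cmd[@]}".
echo "${export_cmd[*]}"
```

The target table `cities` must already exist in the database, since the export writes rows into it rather than creating it.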
