Performance and
Application of GIS
and Big Data ETL
Processes Using FME
Courtney Maxson
Spatial Business Systems
About SBS
• Spatial integration firm based in Lakewood, CO
• FME Gold-level reseller
• 5 certified FME professionals on staff
• Over 15 years of FME experience
www.spatialbiz.com
Overview
• Explore process / issues / requirements of
working with Big Data
• Design / Test / Analyze various configurations
of development and production environments
Software &
Supporting Technology
Cloud-Based Data Storage
• Amazon S3
• DynamoDB
• PostGIS
Software
• ExpanDrive
• TntDrive
• FME Desktop
• FME Cloud
Constants
• Data
• Study Area
• Base FME Workspace
Data
• Landsat 8 single-band imagery
• Stored in Amazon S3 bucket
• More than 150 TB
• Divided by WRS Paths / Rows
• 11 Bands
• GeoTIFF and Textfile
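A minimal sketch of how a scene's files might be addressed when the data is divided by WRS path and row. The `L8/<path>/<row>/<scene>/` key layout, the `_B<n>.TIF` and `_MTL.txt` suffixes, and the function names are assumptions made for illustration; the slides only state that the bucket is organized by path and row, with 11 GeoTIFFs and one text file per scene.

```python
from typing import List


def scene_prefix(wrs_path: int, wrs_row: int, root: str = "L8") -> str:
    """Build the key prefix for one WRS-2 path/row (layout is an assumption)."""
    # WRS-2 defines 233 paths and 248 rows.
    if not (1 <= wrs_path <= 233 and 1 <= wrs_row <= 248):
        raise ValueError(f"path/row out of WRS-2 range: {wrs_path}/{wrs_row}")
    return f"{root}/{wrs_path:03d}/{wrs_row:03d}/"


def band_keys(prefix: str, scene_id: str) -> List[str]:
    """List the 11 single-band GeoTIFF keys expected for one scene."""
    return [f"{prefix}{scene_id}/{scene_id}_B{band}.TIF" for band in range(1, 12)]


def metadata_key(prefix: str, scene_id: str) -> str:
    """Key of the scene's metadata text file (suffix is an assumption)."""
    return f"{prefix}{scene_id}/{scene_id}_MTL.txt"
```

With a drive-mounting tool, the same relative keys would simply become paths under the mounted drive letter.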
Study Area
• Focused on Madagascar
• Minimize processing time
• 40 WRS-2 Scenes
• 1 TB Landsat data
Base FME Workspace
1. Manipulate Metadata
MetadataExtractor
2. Select Only 1 Text File Per Scene and Read in
Corresponding GeoTIFFs
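The selection in this step can be sketched outside FME as follows. This is an illustration, not the workspace's actual logic: the `SceneMetadata` record and the function name are hypothetical, and the ranking rule (lowest cloud cover first, most recent acquisition as the tie-breaker) is one reading of "most recent file with the lowest cloud coverage".

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SceneMetadata:
    """Values assumed to have been parsed from one metadata text file."""
    scene_id: str
    wrs_path: int
    wrs_row: int
    acquired: date
    cloud_cover: float  # percent, 0-100


def _rank(rec: SceneMetadata) -> Tuple[float, int]:
    # Smaller is better: less cloud first, then later acquisition date.
    return (rec.cloud_cover, -rec.acquired.toordinal())


def select_best_per_scene(records: Iterable[SceneMetadata]) -> List[SceneMetadata]:
    """Keep exactly one record per WRS path/row combination."""
    best: Dict[Tuple[int, int], SceneMetadata] = {}
    for rec in records:
        key = (rec.wrs_path, rec.wrs_row)
        current = best.get(key)
        if current is None or _rank(rec) < _rank(current):
            best[key] = rec
    return sorted(best.values(), key=lambda r: (r.wrs_path, r.wrs_row))
```

Only the GeoTIFFs belonging to the returned scene IDs would then need to be read, which is the point of the step: the small text files are scanned first so that most of the large rasters are never opened.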
3. Extract Attribute Values from GeoTIFF Filename
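As a rough illustration of this step, the sketch below pulls attributes out of a band filename with a regular expression. It assumes the pre-Collection Landsat scene ID layout (sensor, satellite, 3-digit path, 3-digit row, year, day of year, ground station, version) followed by `_B<band>.TIF`; the sample filename, the function name, and the returned attribute names are made up for illustration and are not taken from the workspace.

```python
import re
from datetime import date, timedelta

# Assumed layout: L + sensor + satellite + PPP + RRR + YYYY + DDD + GSI + VV, then _B<n>.TIF
BAND_FILE_RE = re.compile(
    r"^(?P<scene_id>L(?P<sensor>[COT])(?P<satellite>\d)"
    r"(?P<path>\d{3})(?P<row>\d{3})"
    r"(?P<year>\d{4})(?P<doy>\d{3})"
    r"(?P<station>[A-Z]{3})(?P<version>\d{2}))"
    r"_B(?P<band>\d{1,2})\.TIF$"
)


def parse_band_filename(filename: str) -> dict:
    """Extract scene attributes from a single-band GeoTIFF filename."""
    m = BAND_FILE_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized band filename: {filename!r}")
    # Convert year + day-of-year into a calendar date.
    acquired = date(int(m["year"]), 1, 1) + timedelta(days=int(m["doy"]) - 1)
    return {
        "scene_id": m["scene_id"],
        "wrs_path": int(m["path"]),
        "wrs_row": int(m["row"]),
        "acquired": acquired,
        "station": m["station"],
        "band": int(m["band"]),
    }


if __name__ == "__main__":
    # Hypothetical filename for a scene at WRS path 159, row 073.
    print(parse_band_filename("LC81590732015123LGN00_B4.TIF"))
```

Files that do not follow the numbered-band pattern (for example a quality band) are rejected with a `ValueError` rather than guessed at.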
4. Create Combined-Band Images
FullBandCombiner
5. Enhance Composite Images
Original Image
Pan-Sharpened Image
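For readers unfamiliar with pan-sharpening, here is a minimal per-pixel sketch of a weighted Brovey transform. The weights are placeholders, not the values used in the workspace, and the function names are hypothetical. It also assumes the color bands have already been resampled to the panchromatic grid (Landsat 8 Band 8 is 15 m; the visible bands are 30 m).

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def brovey_pixel(
    red: float,
    green: float,
    blue: float,
    pan: float,
    weights: Sequence[float] = (0.25, 0.5, 0.25),  # placeholder weights
) -> Pixel:
    """Scale each color band by pan / (weighted sum of the color bands)."""
    w_r, w_g, w_b = weights
    pseudo_pan = w_r * red + w_g * green + w_b * blue
    if pseudo_pan == 0:
        # Avoid division by zero on empty (nodata-like) pixels.
        return (0.0, 0.0, 0.0)
    ratio = pan / pseudo_pan
    return (red * ratio, green * ratio, blue * ratio)


def brovey_image(rgb: List[List[Pixel]], pan: List[List[float]]) -> List[List[Pixel]]:
    """Apply the transform to a small image stored as nested lists."""
    return [
        [brovey_pixel(r, g, b, p) for (r, g, b), p in zip(rgb_row, pan_row)]
        for rgb_row, pan_row in zip(rgb, pan)
    ]
```

The ratio keeps the relative proportions of the color bands while taking overall brightness, and therefore spatial detail, from the panchromatic band.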
6. Create Mosaic Images
Output Images
Case Studies
4 Case Studies
• Explore 3 Variables
1. Source Data Location
2. Processing Location
3. Output Data Format and Destination
Case Study 1
• Source Data: Local
• Processing: Local
− FME Desktop
• Output Data: Local
− GeoTIFFs
Case Study 1: Entirely On-Premises
Source Data
Directory
(E:\CapstoneData\Landsat_Data)
FME Desktop
Target Data
Directory
(E:\CapstoneData\Output_Data)
Reads
GeoTIFF /
Textfile
Writes
GeoTIFF
Case Study 2
• Source Data: Cloud
− Amazon S3 / TntDrive
• Processing: Local
− FME Desktop
• Output Data: Local
− GeoTIFFs
Amazon S3
External Bucket
(landsat-pds)
Amazon Web Services
(AWS)
Local Computer
FME Desktop
Target Data
Directory
(E:\CapstoneData\Output_Data)
Case Study 2:
Read from Cloud,
Process Locally,
Write Locally
The Cloud
Using TntDrive to Mount
S3 Bucket as a Drive
Reads
GeoTIFF / Text
Writes
Composite
Images
(GeoTIFF)
Case Study 3
• Source Data: Cloud
− Amazon S3
• Processing: Local
− FME Desktop
• Output Data: Cloud
− Part A: S3 / DynamoDB (JPEG)
− Part B: PostGIS (GeoTIFF)
Amazon S3
Personal Bucket
(maxsonbucket)
DynamoDB
The Cloud
Amazon Web Services
(AWS)
Local Computer
FME Desktop
CS3-B CS3-B
FME Cloud
PostGIS
CS3-A
Amazon S3
External Bucket
(landsat-pds)
Case Study 3:
Read from Cloud,
Process Locally,
Write to Cloud
Writes
Composite
Images
(GeoTIFF)
Write
JPEGs
Indexing and
Geohashes
Reads
GeoTIFF /
Text Using
TntDrive
Reads
GeoTIFF /
Text Using
TntDrive
CS3-A
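Case Study 3 Part A indexes the output imagery with geohashes. The deck does not say how the geohashes were computed, so the following is only a sketch of the standard public geohash encoding (interleaved longitude/latitude bisection, base-32 output); the function name and default precision are arbitrary choices for illustration.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Every 5 bits become one base-32 character.
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Because nearby locations share a common prefix, a geohash of an image's center point can serve as a key in a key-value store such as DynamoDB, with prefix matching giving a coarse spatial lookup.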
Case Study 4
• Source Data: Cloud
− Amazon S3
• Processing: Cloud
− FME Cloud (1 Engine)
− FME Cloud (8 Engines)
• Output Data: Cloud
− PostGIS
Amazon S3
External Bucket
(landsat-pds)
FME Cloud
PostGIS
CS4
Case Study 4:
Entirely In the Cloud
Amazon Web
Services
(AWS)
Runs Workspace on
FME Server
Download
GeoTIFF / Text
CS4
Writes
GeoTIFF
Results & Conclusions
Summary Statistics
Success Scores
Thank you!
Courtney Maxson
Spatial Business Systems
courtney.maxson@spatialbiz.com

Editor's Notes

  • #6: Amazon Simple Storage Service (S3): Amazon Web Services' cloud storage service, capable of storing practically unlimited amounts of data. Data is stored in containers called buckets. S3 stored the input and output data. DynamoDB: AWS's fully managed NoSQL database service, used to store geohash indexes and metadata for the output imagery. PostGIS: a spatial database extender for PostgreSQL and the most powerful open source spatial database engine, used to store the output composite Landsat imagery and metadata.
  • #7: ExpanDrive and TntDrive: provide the ability to mount an S3 bucket as a drive and to copy, move, add, and delete files through the file explorer. They were used to download data from S3 and to allow FME to read data directly from S3. FME Desktop: all ETL workflows designed for this project were developed using FME Desktop. FME Cloud: runs on AWS; an instance runs a virtual FME Server machine and a PostGIS database. It provides scalability: an instance can be expanded up to 16 cores, with up to 4 engines per core, for a maximum of 64 jobs running in parallel. It was used to run workspaces and to host the output PostGIS database. There are two payment structures: pay by the hour, or an annual subscription.
  • #10: Landsat 8 single-band imagery is stored in a public Amazon S3 bucket. Imagery for 2015 alone contains over 150 TB of data. It is catalogued using the Worldwide Reference System (WRS), which divides the satellite-imaged area into paths and rows. The Landsat 8 satellite records imagery in 11 spectral bands, and each band is stored as a separate image. In S3, the data is organized by scene (WRS path-row combination) and grouped in sets based on Landsat pass. Each sub-directory contains 11 GeoTIFFs (one for each individual Landsat 8 band) and a metadata text file.
  • #12: The base workspace reads and processes the single-band Landsat imagery and metadata text files, and outputs composite images using 11 common band combinations. It has 6 main parts.
  • #13: Uses the MetadataExtractor custom transformer, which applies regular expressions to create and populate metadata attributes such as time, date, station, ID, WRS path and row, and cloud cover.
  • #14: Developed to increase performance: it eliminates the need to read every GeoTIFF within the study area. Text files are sorted by scene and filtered to find the most recent file with the lowest cloud coverage, one per Landsat scene. Only the corresponding GeoTIFF files are read into the workspace. In FME 2016, a single AttributeManager can be used to create and rename different attributes.
  • #16: Images are filtered by band, and the bands are sent through FullBandCombiner transformers, which combine images based on band combination type, stretch pixel values to allow for color distinction, and output the combined images.
  • #17: Performs raster enhancement techniques: adds an alpha channel for transparency and applies pan-sharpening, which uses the high-resolution data from Band 8 to "sharpen" the lower-resolution data. A RasterExpressionEvaluator applies a weighted Brovey transformation equation.
  • #19: Merges all composite images with the same band combination into a single image
  • #20: List of the 11 band combinations produced by the base FME workspace. The map on the right shows an example of 4 composite images produced for Landsat scene 159-073.
  • #22: 4 case studies were developed, which explored 3 variables. The case studies range from being entirely local to entirely in the cloud.
  • #23: Case Study 1 was completed entirely on-premises. Data was read from, processed on, and written to a local machine. The study area dataset was downloaded from S3 to a local drive, using both TntDrive and ExpanDrive. The base workspace was run locally, and the output GeoTIFFs were written to a local directory.
  • #24: Case Study 2 explores the use of cloud storage for the input data, while retaining the locality of the rest of the process. The source text files and GeoTIFFs were read directly from the Amazon S3 bucket, using TntDrive. The reader and writer in the CS2 workspace are identical to those used in CS1, but the pathnames were redirected to the landsat-pds bucket, mounted as the Y: drive.
  • #25: For Case Study 3, all input and output data was stored in the cloud, and the actual image processing took place on-premises. There were 2 parts to Case Study 3: Part A wrote to Amazon S3 and DynamoDB, while Part B wrote to a PostGIS database hosted on FME Cloud. In both parts, the FME workspaces were run locally, but they read the Landsat data directly from the landsat-pds public S3 bucket using TntDrive.
  • #26: Case Study 4 was executed entirely in the cloud. Source data was read from FME Server resources, the workspaces were run using FME Server, and data was written to a PostGIS database that was hosted on FME Cloud. The process was executed twice: with 1 engine, then with 8 engines running in parallel.
  • #28: CS1: longest total time, due to downloading. CS2: fastest single-threaded process. CS4 multi-threaded: fastest total time. CS3 DynamoDB: cheapest. CS4 single-threaded: most expensive. CS4 multi-threaded: cheaper than single-threaded.
  • #29: Success was determined by analyzing a common set of success criteria, scored from 1 (best) to 5 (worst) for each criterion. Case Study 1: slowest total time, easiest setup, and largest strain on the local machine. Case Study 4: least local strain and, with 8 engines, the fastest overall time. The case study with the highest score (CS3 DynamoDB) is the least successful; the case study with the lowest score (CS4 PostGIS, 8 engines) is the most successful.