Mark Kromer
Mapping Data Flows in Azure Data Factory
Building Scalable ETL Projects in the Microsoft Cloud
Mark Kromer
SNOHOMISH, WA, USA
ISBN 978-1-4842-8611-1 e-ISBN 978-1-4842-8612-8
https://doi.org/10.1007/978-1-4842-8612-8
© Mark Kromer 2022
This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in
any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks,
service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.
The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Apress imprint is published by the registered company APress
Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY
10004, U.S.A.
This book is dedicated to my loving wife Stacy and our boys Ethan and
Jude. Thank you for putting up with my late hours working on data
analytics and writing this book!
Introduction
The ETL (extract, transform, load) process has been a cornerstone of
data warehouses, data marts, and business intelligence for decades.
ETL is how data engineers have traditionally refined raw data into
business analytics that guide the business to make better decisions.
These projects have allowed engineers to build up libraries of common
ETL processes and practices from traditional on-premises data
warehouses over the years, very commonly with data coming from
Oracle, Microsoft, IBM, or Sybase databases or business ERP/CRM
applications like Salesforce, SAP, Dynamics, etc. However, over the past
decade, our industry has seen these analytical workloads migrate to the
cloud at a very rapid pace.
To keep up with these changes, we’ve had to adjust ETL techniques
to account for more varied and larger data. The big data revolution and
cloud migrations have forced us to rethink many of our proven ETL
patterns to meet modern data transformation challenges and demands.
Today, the vast majority of data that we process exists primarily in the
cloud. And that data may not always be governed and curated by rigid
business processes in the way that our previous ETL processes could
rely on.
Familiar scenarios, such as processing well-known, hardened
schemas from SAP and CSV exports, now look different and pose new
challenges. The data sources will likely vary in shape, size, and scope
from day to day. We need to account for schema drift, data drift, and
other possible obstacles to turning raw data into refined business
analytics.
Cloud-First ETL with Mapping Data Flows
Welcome to Mapping Data Flows in Azure Data Factory! In this book, I’m
going to introduce you to Microsoft Azure Data Factory and the
Mapping Data Flows feature in ADF as the key ETL toolset to tackle
these modern data analytics challenges. As you make your way through
the book, you’ll learn key concepts, and through the use of examples,
you’ll begin to build your first cloud-based ETL projects that can help
you to unlock the potential of scaled-out big data ETL processing in the
cloud. I’ll demonstrate how to tackle the particularly difficult and
challenging aspects of big data analytics and how to prepare data for
business decision makers in the cloud.
To get the most value from this book, you should have a firm
understanding of building data warehouses and business intelligence
projects. It is not necessary to have many hours of experience building
cloud-first big data analytics projects already. However, having some
experience in cloud computing will provide valuable context that will
help you as you work through some of these new approaches.
The examples and scenarios used in this book will be patterns and
practices that are based on common ETL scenarios, so having data
engineering experience and background will also be very helpful. I’ll
help guide you along as you migrate from traditional on-premises data
engineering to the world of Azure Data Factory.
Overview of Azure Data Factory
To become familiar with the data engineering process in Microsoft
Azure, we’ll need to begin with an overview of Azure Data Factory
(ADF), which is the Azure service for building data pipelines. The first
chapter will focus on conceptual discussions of how to build a process
to transform massive amounts of data with many quality issues in
the cloud. Essentially, we need to redefine ETL for cloud-based big data,
where data volumes and veracity can change daily, and we’ll compare
and contrast the Azure mechanism for the modern data engineer with
traditional ETL. That’s where we’ll begin the process of building ETL
pipelines that will serve as the basis for your big data analytics projects.
I’m going to present a series of common use cases that will
demonstrate how to apply the concepts discussed in the earlier
chapters to practical ETL projects. From there, the focus will turn to a
deep dive on Mapping Data Flows and how to build ETL frameworks in
ADF by using the visual design-time interface to build code-free data
flows. Mapping Data Flows is primarily a code-free visual design
experience, so we’ll walk through techniques and best practices for
managing the software development life cycle of a data flow in ADF.
Data Factory provides many different means to process and transform
data that include coding and calling external compute processes.
However, in this book, the focus will be on building ETL pipelines in a
code-free style in Mapping Data Flows.
As you work your way through the early chapters in this book, you
should begin to develop an understanding of how to apply data
engineering principles in ADF and Mapping Data Flows. That’s where
we’ll begin to implement mechanisms to help organize your work and
design-time environment, preparing for eventual operationalization at
runtime. We’ll set up a Git repo for our work, as you should in real-life
scenarios. We’ll design interactive data transformation graphs using
serverless compute that can scale out as needed. You won’t need to
manage physical servers and clusters with ADF, but I will explain how
things work behind the scenes to provide this serverless compute
power for your pipelines. Behind the scenes, ADF will leverage the
Azure platform-as-a-service workflow engine Logic Apps for pipeline
execution and scheduling. The transformation engine for Mapping Data
Flows is Apache Spark. But you won’t have to learn anything about
those underlying dependent services. The Azure Integration Runtimes
will provide that compute for you dynamically in a serverless manner.
Operationalizing Data Pipelines
As you begin designing data flows for cloud-first big data workloads, you
will test and debug in nonproduction environments and then promote
that work to production environments. Execution of those jobs will be
performed via ADF data pipelines based on schedules. These chapters
will focus on operationalizing our work in a way that will become the
eventual automated ETL framework for your business analytics. A
complete end-to-end solution also requires monitoring and
management of these processes on an ongoing basis. The final chapters
will provide mechanisms in ADF that can be leveraged to monitor runs
over time and to examine the performance of your pipelines. Because
the nature of big data in the cloud is that the data will be messy and
ever-changing, it is important to establish alerts and handling for
schema and data drift. I’ll explain how to add fail-safe mechanisms,
monitoring, and traps for these common problems so that your data
pipelines can execute continuously. The frameworks needed for design,
debug, schedule, monitor, and manage are all contained inside of ADF,
and we’ll spend time digging into each one of those areas.
Goal for the Book
My goal is that by the end of this book, you’ll be able to apply the
concepts and the patterns presented here to build ETL pipelines for
your next big data analytics project in the cloud. By mapping these new,
updated approaches to processing data for analytics (a.k.a. big data
analytics) to the world of traditional ETL processing that you are
already familiar with, you will be able to use Azure Data Factory and
Mapping Data Flows to provide your business with analytics that lead to
better business decisions. Many of the patterns and
practices in this book can be applied directly to your projects where
you are beginning to build cloud-first data projects in Azure. You can
use these techniques to begin building a new set of reusable common
ETL patterns. As you work your way through the progression of this
book’s chapters, you’ll build upon the lessons learned in each chapter
with the goal of having everything you need to begin
building your own big data analytics ETL solution natively in the cloud
using Azure Data Factory with Mapping Data Flows. So welcome, and I
hope you find this book helpful as you begin building powerful ETL
solutions in the cloud!
Any source code or other supplementary material referenced by the
author in this book is available to readers on GitHub
(https://github.com/Apress). For more detailed information, please
visit http://www.apress.com/source-code.
Table of Contents
Part I: Getting Started with Azure Data Factory and Mapping Data Flows
Chapter 1: ETL for the Cloud Data Engineer
General ETL Process
Differences in Cloud-Based ETL
Data Drift
Landing the Refined Data
Typical SDLC
Summary
Chapter 2: Introduction to Azure Data Factory
What Is Azure Data Factory?
Factory Resources
Pipelines
Activities
Triggers
Mapping Data Flows
Linked Services
Datasets
Azure Integration Runtime
Self-Hosted Integration Runtime
Elements of a Pipeline
Pipeline Execution
Pipeline Triggers
Pipeline Monitoring
Summary
Chapter 3: Introduction to Mapping Data Flows
Getting Started
Design Surface
Connector Lines and Reference Lines
Repositioning Nodes
Data Flow Script
Transformation Primitives
Multiple Inputs/Outputs
Schema Modifier
Formatters
Row Modifier
Flowlets
Destination
Expression language
Functions
Input Schema
Parameters
Cached Lookup
Locals
Data Preview
Manage Compute Environment from Azure IR
Debugging from the Data Flow Surface
Debugging from Pipeline
Summary
Part II: Designing Scalable ETL Jobs with ADF Mapping Data Flows
Chapter 4: Build Your First ETL Pipeline in ADF
Scenario
Data Quality
Task 1: Start with a New Data Flow
Task 2: Metadata Checker
Task 3: Add Asserts for Data Validation
Task 4: Filter Out NULLs
Task 5: Create Full Address Field
Final Step: Land the Data As Parquet in the Data Lake
Summary
Chapter 5: Common ETL Pipeline Practices in ADF with Mapping Data Flows
Task 1: Create a New Pipeline
Task 2: Debug the Pipeline
Task 3: Evaluate Execution Plan
Task 4: Evaluate Results
Task 5: Prepare Pipeline for Operational Deployment
Summary
Chapter 6: Slowly Changing Dimensions
Building a Slowly Changing Dimension Pattern in Mapping Data Flows
Data Sources
NewProducts
ExistingProducts
Cached Lookup
Create Cache
Create Row Hashes
Surrogate Key Generation
Check for Existing Dimension Members
Set Dimension Properties
Bring the Streams Together
Prepare Data for Writing to Database
Summary
Chapter 7: Data Deduplication
The Need for Data Deduplication
Type 1: Distinct Rows
Type 2: Fuzzy Matching
Column Pattern Matching
Self-Join
Match Scoring
Scoring Your Data for Duplication Evaluation
Turn the Data Flow into a Reusable Flowlet
Debugging a Flowlet
Summary
Chapter 8: Mapping Data Flow Advanced Topics
Working with Complex Data Types
Hierarchical Structures
Arrays
Maps
Data Lake File Formats
Parquet
Delta Lake
Optimized Row Columnar
Avro
JSON and Delimited Text
Data Flow Script
Summary
Part III: Operationalize Your ETL Data Pipelines
Chapter 9: Basics of CI/CD and Pipeline Scheduling
Configure Git
New Factory
Existing Factory
Branching
Publish Changes
Pipeline Scheduling
Debug Run
Trigger Now
Schedule Trigger
Tumbling Window Trigger
Storage Events Trigger
Custom Events Trigger
Summary
Chapter 10: Monitor, Manage, and Optimize
Monitoring Your Jobs
Error Row Handling
Partitioning Strategies
Optimizing Integration Runtimes
Compute Settings
Time to Live (TTL)
Iterating over Files
Parameterizing
Pipeline Parameters
Data Flow Parameters
Late Binding
Data Profiling
Mapping Data Flow Statistics
Data Preview Statistics
Profile Stats
Power Query Activity
Transformation Optimization
byName() and byNames()
Rank and Surrogate Key
Sorting
Database Queries
Joins and Lookups
Pipeline Optimizations for Data Flow Activity
Run in Parallel
Logging Level
Database Staging
Summary
Index
About the Author
Mark Kromer has been in the data analytics product space for over 20
years and is currently a Principal Program Manager for Microsoft’s Azure
data integration products. Mark often writes and speaks on big data and
data analytics and was an engineering architect and product manager for
Oracle, Pentaho, AT&T, and Databricks before joining Microsoft Azure.
About the Technical Reviewer
Andy Leonard is a husband, dad, and grandfather; creator of – and Data
Philosopher at – DILM Suite for Data Integration Lifecycle Management
(dilmsuite.com); a blogger (andyleonard.blog); founder and Chief Data
Engineer at Enterprise Data & Analytics (entdna.com); an SSIS and Azure
Data Factory trainer, consultant, and developer; a SQL Server database
and data warehouse developer; and an author, mentor, engineer, and
farmer.
Part I
Getting Started with Azure Data Factory and Mapping Data Flows
© The Author(s), under exclusive license to APress Media, LLC, part of Springer
Nature 2022
M. Kromer, Mapping Data Flows in Azure Data Factory
https://doi.org/10.1007/978-1-4842-8612-8_1
1. ETL for the Cloud Data Engineer
Mark Kromer
SNOHOMISH, WA, USA
In the modern business data ecosystem, “digital transformation” is one
of the most prominently used terms to describe the transformation of
traditional technology practices to cloud and big data approaches. The
term has become ubiquitous in IT and has come to represent the
embrace of cloud and big data technologies in the data engineering
world.
The data part of this digital business transformation puts data
engineers at the center of the data processing value chain. The challenge
for data engineers is to effectively extract, transform, and load massive
amounts of new data points that are often unwieldy in nature. That means
we have to update our ETL processes to meet these new cloud-first big
data approaches. Digital transformation is crucial for businesses that need
to compete and grow under today’s cloud-first IT strategies, so let’s dig
into how to adjust and build comparable solutions in Azure using ADF and
Mapping Data Flows.
General ETL Process
Figure 1-1 is an example of a general ETL process from traditional
on-premises projects, where the sources are highly governed, such as data
that originates from SAP, database tables, and file extracts that abide
by well-known contracts.
Figure 1-1 Traditional ETL general process
As a data engineer working on cloud-first projects in Microsoft
Azure, you’ll employ a process similar to the diagram in Figure 1-2,
which differs only slightly from the concepts shown in Figure 1-1. But
the details in each step bring about a significant amount of change that
will be the topic of the ADF-specific chapters to come. At the end of the
day, the objective of preparing data for business decision makers, who
will use business intelligence tools, SQL queries, Excel, data science
tools, and other decision-oriented tooling, is no different from what you
see in traditional on-premises scenarios with highly curated data sources
and targets.
Figure 1-2 A general example of the ETL process in Azure
The consumers of the analytics in both of these instances are
analysts who are building reports where actual business value is
derived for business decision makers. For the data to be useful, the data
engineers, data scientists, and citizen data integrators must contribute
in a governed way to refining raw data into business-friendly models
for exploration and reporting.
Differences in Cloud-Based ETL
We’ll need to have a common understanding of what we are achieving
in this book, so let’s dive into this process in detail and identify some of
the differences between cloud-based ETL in Azure and similar traditional
on-premises ETL projects:
1. Raw data
a. Much of the data extraction in big data cloud ETL will be of
unknown quality and can change shape and size dramatically
between job executions. In ADF, we’ll make use of the Copy
Activity and Data Flow Activity connectors, linked services, and
datasets. In traditional data warehouse scenarios, you may have
found that all of your business data resides on-premises and
inside the network confines of your business. Additionally,
often that data has been curated and already refined through a
data quality process. Do not make such assumptions about data
that you’ll land in the data lake. The details of the different ADF
components will come in the next set of chapters.
2. Staging layer
a. This is where we will land an initial, lightly transformed snapshot
of the source data in a landing zone in the data lake. For most of the
demo scenarios in the book, we’ll land the data in Azure Data Lake
Storage Gen2 (ADLS Gen2 or simply ADLS). If you’ve previously designed
data warehouses with an ODS model or used database tables as staging
tables, you can think of the staging layer in the data lake as the
equivalent. Because the data volumes are expected to be very large here,
we will implement incremental data loading patterns in ADF rather
than attempt to extract the entire set of data every time.
3. Transform
a. This is where we will spend a lot of our time and attention in
this book using the code-free Mapping Data Flows feature in
ADF. We’ll build data flows that will perform all different types
of data transformations to prepare the data for consumption by
our target users. We’ll derive columns, aggregate data, and
design slowly changing dimension handlers and many more
exciting data transformations. A key difference you may find in
the transformation layer from traditional ETL projects is that
the data will not always be tabular and relational in shape.
Rather than expecting to receive database table connections
and CSV files, we will need to work with big data native file
formats like Parquet, Avro, JSON, and ORC. That can make
transformations tricky when you begin to work with arrays,
maps, structures, and hierarchies.
4. Serving layer
a. The serving layer is going to be a data store that is generally a
database like Azure SQL Database or Cosmos DB. You will also
often use an analytical database like Azure Synapse, Snowflake,
or other database targets. Another option here is to simply
leave the data in ADLS but utilize Delta Lake folders as a way to
organize your data and provide CRUD operations on your
analytical data. We’ll talk about all of these options in the book
including the benefits of both in terms of cost and effectiveness
for consumption by business users.
5. Presentation layer
a. As mentioned earlier, this is where the business users live and
how they will access the refined data to make business
decisions. Business intelligence tools will utilize the resulting
models from the ETL process and create reports and
dashboards. The end-user interaction with the resulting data
does not change dramatically with modern data approaches to
ETL. However, you should keep the requirements in mind in
terms of what BI tools will be used. Not all BI tools and
business-decision tools can read and work with data in the lake
or data stored in formats like Parquet and Delta Lake.
6. Orchestrate and monitor data pipelines in ADF
a. When thinking about a scalable framework to build and manage
complex ETL jobs, it is critical to consider operationalization
requirements. In this diagram, I particularly call out
orchestration of the pipelines and monitoring of the pipelines.
The orchestration piece is not specific just to ADF, but I will
only reference ADF techniques in this book. There are many
underlying facilities to orchestration that we’ll need to touch on
that are very important. For example, scheduling jobs,
managing the software development life cycle, version control,
CI/CD, and more that we’ll dive into in later chapters. With
common legacy ETL tools, you should already have most of
these capabilities. I believe, however, that providing a level of
governance to the big data cloud world is even more important
because the modern data estate environment can be much
more complex than traditional environments. After your
pipelines have been scheduled, you need a mechanism to
monitor the health of your ETL jobs. We’ll walk through setting
up alerts and day-to-day monitoring of tasks in ADF later.
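The staging layer described above relies on incremental data loading rather than full extracts. ADF has its own mechanisms for this, covered in later chapters; the following is only a minimal, tool-agnostic Python sketch of the common high-watermark idea, assuming a hypothetical `modified_at` change-tracking column on each source row. The `SourceRow` type and `incremental_extract` function are illustrative, not part of ADF.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass
class SourceRow:
    id: int
    modified_at: datetime  # hypothetical change-tracking column


def incremental_extract(
    rows: Iterable[SourceRow], watermark: datetime
) -> Tuple[List[SourceRow], datetime]:
    """Return only rows changed since the last run, plus the next watermark."""
    changed = [row for row in rows if row.modified_at > watermark]
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    next_watermark = max((row.modified_at for row in changed), default=watermark)
    return changed, next_watermark


if __name__ == "__main__":
    source = [
        SourceRow(1, datetime(2022, 1, 1)),
        SourceRow(2, datetime(2022, 2, 1)),
        SourceRow(3, datetime(2022, 3, 1)),
    ]
    batch, watermark = incremental_extract(source, datetime(2022, 1, 15))
    print([row.id for row in batch], watermark)  # [2, 3] 2022-03-01 00:00:00
```

Persisting the returned watermark between runs (for example, in a control table) is what lets each execution pick up only new or changed rows; where that value is stored is left open in this sketch.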
Let’s dig into each area of the ETL process, starting with the raw
data. In modern cloud-first big data ecosystems, raw data is going
to be quite varied and will range from traditional relational database
tables with well-defined schemas to raw JSON files with changing
properties. You should always expect the unexpected and design your
data extraction logic defensively. You may choose to tell ADF to fail your
pipeline when attributes or data domains are not within a specified set
of constraints. Or you can utilize the built-in concepts of “schema drift”
and “data drift” to create a more resilient pipeline that evolves with
changing source data. Schema drift occurs when the expected data
schema evolves unexpectedly by adding new columns, removing
columns, or changing columns. In ADF, you can switch on schema drift
handling very easily, and that will tell ADF to accept new or evolving
columns. This handling of evolving source data creates a very resilient
pattern where your ETL processes will not fail because new columns
have been detected. However, it can also hide underlying issues with
the source data that you may wish to tag as data quality errors. You will
need to make that architectural decision to either fail when the
incoming schema breaks the existing contract or continue processing.
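In ADF, schema drift handling is a setting rather than code, so nothing below reflects how ADF implements it. As a rough illustration of the architectural decision just described, this Python sketch compares an expected schema with an incoming one, reports added, removed, and changed columns, and either raises or lets processing continue. The `check_schema` function, the `SchemaDriftError` exception, and the column-name-to-type dictionaries are all hypothetical.

```python
from typing import Dict, List


class SchemaDriftError(Exception):
    """Raised when the incoming schema breaks the expected contract."""


def check_schema(
    expected: Dict[str, str], incoming: Dict[str, str], allow_drift: bool
) -> Dict[str, List[str]]:
    """Compare column-name -> type maps and report drift.

    allow_drift=True mimics "accept new or evolving columns";
    allow_drift=False mimics "fail when the contract breaks".
    """
    drift = {
        "added": sorted(set(incoming) - set(expected)),
        "removed": sorted(set(expected) - set(incoming)),
        "changed": sorted(
            col for col in set(expected) & set(incoming) if expected[col] != incoming[col]
        ),
    }
    if not allow_drift and any(drift.values()):
        raise SchemaDriftError(f"Incoming schema breaks the contract: {drift}")
    return drift


if __name__ == "__main__":
    expected = {"id": "int", "name": "string"}
    incoming = {"id": "long", "name": "string", "email": "string"}
    print(check_schema(expected, incoming, allow_drift=True))
    # {'added': ['email'], 'removed': [], 'changed': ['id']}
```

Returning the drift report even when drift is allowed keeps the resilient path observable, which addresses the concern that accepting drift can hide underlying source data issues.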
Data Drift
Similar to schema drift at the metadata level, data drift occurs when
values in existing columns begin to arrive outside of a defined domain or
set of boundaries. In ADF, you can establish “Assert” expectations that
define data ranges. When the data breaches those domains or ranges, you
can fail the job or tag the rows as data quality errors and make
downstream decisions on how to handle those errors. For example, you can
decide to output an alert, redirect the rows to an error log, or simply
ignore the failures and continue processing.
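ADF expresses these expectations in its Assert transformation using its own expression language, which later chapters cover. Purely to illustrate the fail-versus-tag choice in a tool-neutral way, here is a Python sketch in which each rule is a named predicate over a row; the `apply_asserts` function, the `age_in_range` rule, and the `_failed_asserts` error column are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Rule = Callable[[Row], bool]


def apply_asserts(
    rows: Iterable[Row], rules: Dict[str, Rule], fail_on_error: bool
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, errors); optionally fail on the first breach."""
    valid: List[Row] = []
    errors: List[Row] = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if not failed:
            valid.append(row)
        elif fail_on_error:
            raise ValueError(f"Assert(s) {failed} failed for row {row}")
        else:
            # Tag the row so downstream steps can alert, log, or ignore it.
            errors.append({**row, "_failed_asserts": failed})
    return valid, errors


if __name__ == "__main__":
    rules = {"age_in_range": lambda r: 0 <= r["age"] <= 120}
    rows = [{"id": 1, "age": 34}, {"id": 2, "age": -5}]
    valid, errors = apply_asserts(rows, rules, fail_on_error=False)
    print(valid)   # [{'id': 1, 'age': 34}]
    print(errors)  # [{'id': 2, 'age': -5, '_failed_asserts': ['age_in_range']}]
```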
The staging layer is where you will land data from sources into the
data lake. In the past, you may have used temporary tables in a
database as the staging area, where you would quickly land raw data
without transformation. In Azure, we’ll apply the same staging concept
using ADLS Gen2. You are going to land your data
into ADLS “Containers,” which is where you’ll define your folder
strategy. In big data storage, folders are very important because they
can be used by runtimes like Spark to define file partition strategies. A
very common methodology is to create folders based on
dates. For example, create a folder structure like this to store raw
employee data:
MyContainer/RawData/Emp/YYYY/YYYYMM.
The format of that folder structure inside of your Azure Data Lake
would look like Figure 1-3.
Figure 1-3 Example folder structure
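A small helper can generate the date-based raw-data paths described above, including one folder per month for a range of load dates. This is a hypothetical Python sketch, not an ADF feature; `raw_folder_path` and `month_paths` are illustrative names, and the code only formats dates into the YYYY/YYYYMM pattern.

```python
from datetime import date
from typing import List


def raw_folder_path(container: str, entity: str, load_date: date) -> str:
    """Build <container>/RawData/<entity>/YYYY/YYYYMM for a given load date."""
    return f"{container}/RawData/{entity}/{load_date:%Y}/{load_date:%Y%m}"


def month_paths(container: str, entity: str, start: date, end: date) -> List[str]:
    """List one folder per month from start to end (inclusive), e.g. for a backfill."""
    paths = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        paths.append(raw_folder_path(container, entity, date(year, month, 1)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return paths


if __name__ == "__main__":
    print(raw_folder_path("MyContainer", "Emp", date(2022, 7, 15)))
    # MyContainer/RawData/Emp/2022/202207
```

Because each month lands in its own folder, an incremental job can target only the folders for the months it needs to process.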
Folder partitioning can help with carving out portions of the lake for
incremental processing and for partition elimination at query time,
improving performance of the Spark engine, which is the execution
engine that we’ll use in ADF for data transformation. Another common
method to optimize your data lake folders for processing is to use
key/value pairs to store unique values in your data as folders with data
residing in the leaf-level Parquet file as in Figure 1-4.
Figure 1-4 Key value folder partitioning
In this example, my output data contains the columns
“releaseyear” and “month”. I’ve created a folder for every unique
“releaseyear” and every unique “month” value in my data using the
format of releaseyear=yyyy/month=mm. The files residing at the
leaf level in that folder structure are in Parquet format and, in this
particular example, have a friendly name of moviesoutnew.parquet.
But you cannot assume that files written by ADF and Spark, generally,
are going to use readable names like that. In fact, in most cases, it is
more efficient to allow Spark to write the file name based on the
job process ID. Don’t be surprised to find many files with GUID names
in your folders after executing your data pipelines. Throughout this
book, we’ll use samples that will output partitioned Parquet folders,
and we’ll configure ADF to automatically create that folder structure.
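To make the key/value layout and partition elimination concrete, the following Python sketch groups rows into `key=value` folders and then prunes folders by a partition value, the way a query engine can skip folders that cannot match a filter. It only computes paths in memory and does not write Parquet; the function names and the `movies` root folder are hypothetical, and in practice Spark and ADF do this work when you configure partitioned output.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def plan_partitions(rows: Iterable[Row], keys: List[str], root: str) -> Dict[str, List[Row]]:
    """Group rows into key=value folder paths, one folder per unique key combination."""
    folders: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        path = "/".join([root] + [f"{k}={row[k]}" for k in keys])
        # The partition values are encoded in the path, so they are not repeated in the row.
        folders[path].append({c: v for c, v in row.items() if c not in keys})
    return dict(folders)


def prune(paths: Iterable[str], key: str, value: Any) -> List[str]:
    """Partition elimination: keep only folders whose path has the segment key=value."""
    segment = f"{key}={value}"
    return [p for p in paths if segment in p.split("/")]


if __name__ == "__main__":
    movies = [
        {"title": "A", "releaseyear": 1999, "month": "01"},
        {"title": "B", "releaseyear": 1999, "month": "02"},
        {"title": "C", "releaseyear": 2004, "month": "01"},
    ]
    folders = plan_partitions(movies, ["releaseyear", "month"], "movies")
    print(prune(folders, "releaseyear", 1999))
    # ['movies/releaseyear=1999/month=01', 'movies/releaseyear=1999/month=02']
```

A filter on `releaseyear` only touches the matching folders, which is why this layout can improve read performance for the Spark engine.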
The transform layer is the topic we’ll focus on in depth in the
coming chapters. This is where your data transformation logic will
reside. In later chapters, we’ll design code-free graphs that will perform
common ETL operations like slowly changing dimensions, data
cleansing, aggregations, fact loading, and data preparation. These
patterns have been common throughout the history of ETL and data
engineering, and we’ll update them for the modern data landscape. In this
book, we’ll touch on data partitioning strategies, pushdown
optimizations, cluster distributions, and other topics specific to big data
in the cloud, including making use of Parquet data formats. Parquet is a
columnar, highly compressed file format that is very efficient when
used for analytics with Spark, and you’ll come across this format
throughout the book. For the purposes of ETL in ADF Mapping Data
Flows, assume Parquet is the default format for landing your
results in the lake.
Landing the Refined Data
Now that your data has been prepped, cleaned, and transformed, you
will land the refined data into an analytical data store to make it
available to your end users. This is known as the serving layer (not the
server layer), and typically a database is utilized in this layer as the data store.
The biggest change here is that the traditional relational database may
be replaced by a cloud-first database, a NoSQL data store, or even just
files in the lake. This is where the partitioned Parquet folder techniques
listed earlier come into focus. The serving layer can remain a data lake
with a computation engine (likely SQL based) serving queries to the
presentation layer.
Now that all of the hard work of the data engineer is complete and
the ETL project has been established, we reach the top of the analytics
value chain: the presentation layer. The data engineer has successfully
performed the ETL process of refining raw data into consumable
business data for decision makers and analysts, who will utilize tools
like Power BI, Excel, Looker, etc., to build reports and dashboards with
business metrics and KPIs. Another important audience for analytics in
these scenarios will be data scientists. They may use Jupyter Notebooks,
SQL queries, or data wrangling tools to build data models and to perform
additional data wrangling and exploration. Both are target
personas for the analytics that you have generated from your ETL jobs.
Typical SDLC
Let’s take a look at what the software development life cycle (SDLC)
looks like for a typical ETL project with ADF in Figure 1-5.
Figure 1-5 SDLC for ADF ETL pipeline projects
We’ll walk through configurations needed to connect your data
factory to Azure DevOps for Git support in later chapters. But for this
conceptual discussion, just focus on the distinct steps that you should
follow to produce quality ETL jobs that meet your user requirements.
1. Gather business requirements.
a. Where do you start with an ETL project? Start by talking with
your end users: the business analysts and data scientists
described in the presentation layer earlier. Essentially, you’ll
want to deeply understand the consumers of your refined data
results and the analytics that they need to drive the
business with their reports. Ask what data is important to
making the right decisions and building the best models.
Discuss ways to aggregate complex data and summarize it into
business semantics. Then begin tracking down the sources of
the data points you’ll need to provide the results they’re
looking for. This exercise should result in a list of required data
for your ETL jobs to produce as well as a list of the data sources
and access credentials required to get to the source data. Once
you’ve listed all of the sources required and the analytical
results you need to produce, you’re ready to start designing.
2. Design ETL pipelines in new Git branch.
a. We’ll walk through building a new branch in Git from ADF later.
But you can think of this as your first step on a new ADF
project. You’ll work from a new branch as your sandbox
environment. Never develop new pipelines against the live ADF
service or from an existing branch. You risk losing work and
damaging existing, working code. Now this is where the fun
begins! We’ll talk in detail soon about how to build code-free
graphs for data transformation pipelines.
3. Unit test, debug, user acceptance testing.
a. Testing your pipelines before releasing them to your production factory
is a critical element in an ETL project. It is also another great
reason to leverage Git in your ADF project: you can work in
a separate development branch, which makes it
much easier to test before deploying to production. We’ll talk
about testing strategies, debugging, previewing results, and
other important factors in this step.
4. Publish from main branch.
a. After all tests have passed, your next step is to deploy your new
pipelines to production. You’ll merge your current development
branch into the collaboration branch (typically main) and then publish
the updates to the live ADF service from that branch.
5. Operationalize and monitor.
a. The last set of operations to undertake will include setting a
schedule for your pipeline and monitoring the results. We’ll
walk through different types of schedules that most effectively
meet the update cadence uncovered by your business
requirements step. Then we’ll set alerts for pipeline failures and
check the status of our ETL jobs over time.
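The ordering and gates in these five steps can be expressed as a small checklist object. This is a minimal sketch, not an ADF or Azure DevOps API: the `EtlProject` class and the stage names are hypothetical, and it assumes the collaboration branch is named `main`.

```python
from dataclasses import dataclass, field

# The five SDLC stages above, in the order they must happen.
STAGES = [
    "gather_requirements",
    "design_in_feature_branch",
    "test_and_debug",
    "publish_from_main",
    "operationalize_and_monitor",
]


@dataclass
class EtlProject:
    branch: str                 # Git branch currently checked out
    tests_passed: bool = False  # set once unit, debug, and UAT checks pass
    completed: list[str] = field(default_factory=list)

    def advance(self, stage: str) -> None:
        """Mark `stage` complete, enforcing the order and gates of the SDLC."""
        if len(self.completed) == len(STAGES):
            raise ValueError("all stages are already complete")
        expected = STAGES[len(self.completed)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        if stage == "design_in_feature_branch" and self.branch == "main":
            raise ValueError("develop in a new branch, not in main")
        if stage == "publish_from_main":
            if not self.tests_passed:
                raise ValueError("all tests must pass before publishing")
            if self.branch != "main":
                raise ValueError("merge into main before publishing")
        self.completed.append(stage)


project = EtlProject(branch="feature/sales-etl")
project.advance("gather_requirements")
project.advance("design_in_feature_branch")
project.tests_passed = True
project.advance("test_and_debug")
project.branch = "main"  # after merging the feature branch
project.advance("publish_from_main")
project.advance("operationalize_and_monitor")
print(project.completed == STAGES)
```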
Summary
We began our journey by learning the fundamentals of ETL in the cloud
for data engineers with Azure Data Factory’s Mapping Data Flows. Now
that we have a clear understanding of the ETL process in Azure, let’s
begin diving into Azure Data Factory and apply these principles to our
first factory.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer
Nature 2022
M. Kromer, Mapping Data Flows in Azure Data Factory
https://doi.org/10.1007/978-1-4842-8612-8_2
2. Introduction to Azure Data Factory
Mark Kromer
Snohomish, WA, USA
Azure Data Factory (ADF) is the Microsoft Azure cloud service that data
engineers use to build, schedule, and execute data integration and
extract, transform, and load (ETL) processes. In this chapter, we’ll focus
on how to build cloud-first ETL projects using ADF’s Mapping
Data Flows code-free data transformation features.
What Is Azure Data Factory?
First, we need a fundamental overview of the ADF
service and its components, starting with the ADF UI. You will need an Azure
subscription, and you will need to follow the steps to create a new data factory
from the Azure portal. We’ll walk through those steps at the end of the
book when we create a sample project. For now, let’s start by looking at
the primary high-level concepts in ADF and become a bit more familiar
with the pipeline designer, as shown in Figure 2-1.
Figure 2-1 The ADF web-based user interface
Let’s dig into each one of these high-level concepts in more detail.
Figure 2-2 introduces you to each of the data factory resources, which
will be described next.
Figure 2-2 Azure Data Factory concepts
Factory Resources
The resource explorer on the far left of the ADF UI lists each of
the top-level artifacts in an ADF factory. We’re going to skip over
the Power Query and Templates artifacts in
this book. Instead, let’s talk a bit about each of the primary ADF
artifacts that are important for building ETL jobs.
Pipelines
The pipeline is the primary unit of work in ADF. Pipelines drive all of the
actions that your data integration and ETL jobs perform. A pipeline is
essentially a collection of activities that you connect together in a
meaningful pattern to create a workflow. All actions in ADF are
scheduled through a pipeline execution, including the Mapping Data
Flows that we’ll build in this book.
Activities
Pipelines are constructed from individual activities. Each activity defines
an individual action that you wish to perform. There are many
activities that you can use to compose a pipeline. Examples of activities
include copying data (Copy activity), transforming data (Mapping Data
Flows), and control flow activities such as “For Each” and “If Condition.” You can
also call out to external compute with activities like Databricks Notebook
and Azure Functions to execute custom code. For this book, we’re going
to focus on the data flow activity and building Mapping Data Flows for
ETL jobs.
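ADF stores pipeline definitions as JSON. The sketch below is a simplified Python mirror of that shape, assuming only the common `name`, `type`, `dependsOn`, and `typeProperties` fields; the pipeline, activity, and data flow names and the two helper functions are hypothetical, and real Copy and data flow activities need more settings (sources, sinks, compute) than shown here.

```python
import json

# Hypothetical pipeline: copy raw files, then run a data flow if the copy succeeds.
pipeline = {
    "name": "LoadSalesPipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy raw sales",
                "type": "Copy",
                "typeProperties": {},  # source and sink settings omitted
            },
            {
                "name": "Transform sales",
                "type": "ExecuteDataFlow",
                "dependsOn": [
                    {"activity": "Copy raw sales", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    "dataflow": {"referenceName": "CleanSales", "type": "DataFlowReference"}
                },
            },
        ]
    },
}


def activity_types(definition: dict) -> list[str]:
    """Return the type of each activity in the order it is declared."""
    return [a["type"] for a in definition["properties"]["activities"]]


def referenced_data_flows(definition: dict) -> list[str]:
    """Return the names of the data flows that data flow activities execute."""
    return [
        a["typeProperties"]["dataflow"]["referenceName"]
        for a in definition["properties"]["activities"]
        if a["type"] == "ExecuteDataFlow"
    ]


print(activity_types(pipeline))
print(referenced_data_flows(pipeline))
print(json.dumps(pipeline, indent=2))
```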
Triggers
Triggers allow you to set the conditions under which your pipeline executes.
You can create schedule triggers, tumbling window triggers, storage event
triggers, and custom event triggers. The most common are schedule triggers,
which allow you to set the execution frequency and times for your pipeline.
Tumbling window triggers fire on fixed-size, non-overlapping time intervals:
ADF establishes windows of time at the recurrence that you choose, starting
on the date that you choose. Storage event triggers run your pipeline
when a file arrives in or is deleted from a storage account. The final
type is custom event triggers: you can create custom topics in Azure
Event Grid and then subscribe to those events. When a specific event is
received by your custom event trigger, your pipeline is triggered
automatically.
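Tumbling windows are easiest to picture as a sequence of back-to-back intervals. This sketch enumerates those windows, assuming a fixed start time, a positive interval, and that a window is processed only after it has fully elapsed; the function name is hypothetical. In a real trigger, ADF exposes each window’s boundaries to the pipeline through `@trigger().outputs.windowStartTime` and `@trigger().outputs.windowEndTime`.

```python
from datetime import datetime, timedelta


def tumbling_windows(
    start: datetime, interval: timedelta, until: datetime
) -> list[tuple[datetime, datetime]]:
    """Return fixed-size, non-overlapping, contiguous windows from start up to until.

    Only windows whose end time is at or before `until` are returned, on the
    assumption that a window is processed once it has fully elapsed.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    windows = []
    window_start = start
    while window_start + interval <= until:
        windows.append((window_start, window_start + interval))
        window_start += interval
    return windows


# Six-hour windows over one day.
for ws, we in tumbling_windows(datetime(2022, 7, 1), timedelta(hours=6), datetime(2022, 7, 2)):
    print(ws.isoformat(), "->", we.isoformat())
```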
Mapping Data Flows
This is the code-free data transformation feature that we’ll focus on for
the rest of this book. Mapping Data Flows has its own browser-based designer
that opens when you create a new data flow. This is where we’ll
design data transformation graphs and then execute the data flow from
a pipeline. To execute a data flow from a pipeline, add the
data flow activity to your pipeline and then choose which data flow to
execute.
Linked Services
You will use linked services to store credentials, location, and
authentication mechanisms to connect to your data. Datasets and activities in
ADF pipelines use linked services to determine where and how to connect
to your data. You can share linked service definitions across objects in your factory.
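Sharing works because datasets refer to a linked service by name rather than embedding its connection details. The sketch below models that indirection with plain dictionaries, loosely following ADF’s `LinkedServiceReference` JSON shape. The dataset names, the placeholder connection string, and the helper functions are hypothetical; `AzureBlobStorage1` is the linked service used in the example later in this chapter.

```python
# Hypothetical factory objects: datasets point at a linked service by name.
linked_services = {
    "AzureBlobStorage1": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<placeholder, not a real secret>"},
    }
}

datasets = {
    "RawSales": {
        "linkedServiceName": {"referenceName": "AzureBlobStorage1", "type": "LinkedServiceReference"}
    },
    "RawCustomers": {
        "linkedServiceName": {"referenceName": "AzureBlobStorage1", "type": "LinkedServiceReference"}
    },
}


def resolve_linked_service(dataset_name: str) -> dict:
    """Look up the connection definition that a dataset refers to."""
    ref = datasets[dataset_name]["linkedServiceName"]["referenceName"]
    if ref not in linked_services:
        raise KeyError(f"dataset {dataset_name!r} references unknown linked service {ref!r}")
    return linked_services[ref]


def datasets_using(linked_service_name: str) -> list[str]:
    """List every dataset that shares the given linked service."""
    return sorted(
        name
        for name, ds in datasets.items()
        if ds["linkedServiceName"]["referenceName"] == linked_service_name
    )


print(datasets_using("AzureBlobStorage1"))
```

Changing the credentials in one linked service therefore updates every dataset that references it.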
Datasets
Datasets define the shape of your data. In ADF, datasets do not contain
or hold any data. Instead, they point to the data and provide ADF
information about the schema for your data. In ADF, your data does not
require a schema; you can work with data in a schema-less manner.
When you build ETL jobs using schema-less datasets, you build
data flows that use “late binding” and handle “schema
drift.” It is a very powerful and flexible concept that we’ll talk about
later, and it means that your dataset is not required to hold a specific
schema at all.
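Late binding means the columns are discovered from the data at run time instead of being fixed in the dataset. As a rough Python analogy (not how Mapping Data Flows implements schema drift, which runs on Spark and is enabled with the Allow schema drift option on sources and sinks), this sketch unions rows by column name as new columns appear; the batch contents and the `union_by_name` helper are made up for illustration.

```python
from collections.abc import Iterable
from typing import Any


def union_by_name(
    batches: Iterable[list[dict[str, Any]]],
) -> tuple[list[str], list[dict[str, Any]]]:
    """Combine rows whose columns drift between batches.

    The output schema is discovered from the data (late binding): it is the
    ordered union of every column name seen, and missing values become None.
    """
    columns: list[str] = []
    rows: list[dict[str, Any]] = []
    for batch in batches:
        for row in batch:
            for name in row:
                if name not in columns:
                    columns.append(name)
            rows.append(row)
    return columns, [{name: row.get(name) for name in columns} for row in rows]


january = [{"id": 1, "amount": 10.0}]
february = [{"id": 2, "amount": 12.5, "currency": "USD"}]  # a new column appears
cols, combined = union_by_name([january, february])
print(cols)
print(combined)
```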
Azure Integration Runtime
Throughout the book, I’ll refer to the Azure Integration Runtime as the
Azure IR or sometimes simply as IR. This is a configuration object
stored in the ADF metastore that defines the location and type of
compute that you’ll use for parts of your pipeline that require
computation. This can mean VMs for copying data, executing SSIS (SQL
Server Integration Services) packages, or cluster size and type for
Mapping Data Flows. We’re not going to talk about SSIS in this book, but
it is a very powerful feature in ADF. Basically, you can take your existing
SSIS packages from SQL Server and execute them in the cloud using an
ADF pipeline. The SSIS Integration Runtime provides the SSIS compute
on VMs in a fully managed environment.
The Azure IR also has a VNet option that allows you to execute your
pipelines using compute resources that are inside protected networks.
This is a very good option if you are working in a highly regulated
industry or your corporate network policies require all services to run
inside virtual networks (VNets). ADF is a fully managed platform-as-a-service
(PaaS) offering, so you do not manage any servers. Since the integration
runtimes are the mechanism defining the compute you wish to use for pipeline
and data flow execution, this is where you specify that you need to execute in
a protected network. Mapping Data Flows, which we will dig into
throughout this book, execute on the Spark compute that you
specify in the Azure IR. When we get to building our first pipeline with
data flows, we’ll talk about optimizations and details of the IR.
The Azure IR is a fully serverless managed microservice inside ADF
that runs in the cloud. However, you can also configure the networking
to connect to your on-premises data sources by peering your network
to the VNet created for your Azure IR. When executing data flow
activities in an ADF pipeline, you can use that technique if your data is
not in Azure (for example, when it is on-premises).
Self-Hosted Integration Runtime
Another approach to executing ADF pipeline activities in a private
network or to connect to on-premises data is by using the self-hosted
integration runtime or SHIR. This is a software download that you will
install on-premises or on a virtual machine that has visibility to your
data. ADF communicates with the SHIR to provide access to
data in your data center. Mapping Data Flows does not support the
self-hosted IR, so for data flows you’ll use the VNet option in the Azure IR
mentioned earlier instead. You manage all IRs from the Manage
section in the left-hand navigation panel of the ADF UI (see Figure 2-3).
Figure 2-3 Management screen for Integration Runtimes
Elements of a Pipeline
The ADF pipeline is the most fundamentally important artifact in your
factory, so let’s dig into a pipeline first. In Figure 2-4, you’ll see an
example of a very simple pipeline. It is made up of five activities,
interconnected with green directional edges. The Data Flow activity
has both a green and a red connector emanating from it. ADF takes
the red path if the activity execution fails and the green path if it
succeeds. The flow of execution in an ADF pipeline is left to right. If
you add activities without connecting lines, those disconnected
activities execute in parallel, at the same time as
the first node in your connected graph.
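The success and failure paths, plus the parallel start of disconnected activities, can be simulated in a few lines. This is a simplified sketch under stated assumptions: every activity is listed as a key, an activity runs only when all of its upstream edges are satisfied, and otherwise it is treated as skipped. ADF’s real dependency conditions (Succeeded, Failed, Skipped, and Completed) have more nuance than modeled here, and the activity names and the `plan_run` helper are hypothetical.

```python
# Each activity maps to its incoming edges: (upstream activity, required outcome).
Dependencies = dict[str, list[tuple[str, str]]]


def plan_run(deps: Dependencies, outcomes: dict[str, str]) -> list[list[str]]:
    """Return the waves of activities that would run, in order.

    `outcomes` gives the hypothetical result ("Succeeded" or "Failed") of each
    activity if it runs. Activities with no dependencies start together in the
    first wave. An activity runs only when every upstream activity finished with
    the outcome its edge requires; otherwise it is marked "Skipped".
    """
    finished: dict[str, str] = {}
    waves: list[list[str]] = []
    pending = set(deps)
    while pending:
        ready = sorted(n for n in pending if all(up in finished for up, _ in deps[n]))
        if not ready:
            raise ValueError("dependency cycle or unknown upstream activity")
        wave = []
        for name in ready:
            pending.remove(name)
            if all(finished[up] == cond for up, cond in deps[name]):
                finished[name] = outcomes[name]
                wave.append(name)
            else:
                finished[name] = "Skipped"
        if wave:
            waves.append(wave)
    return waves


sample = {
    "Get files": [],
    "Load data": [("Get files", "Succeeded")],
    "Send alert": [("Load data", "Failed")],       # red path
    "Log success": [("Load data", "Succeeded")],   # green path
    "Refresh cache": [],  # disconnected: starts alongside the first node
}
results = {name: "Succeeded" for name in sample}
results["Load data"] = "Failed"
print(plan_run(sample, results))
```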
Figure 2-4 Sample ADF pipeline
This sample pipeline calls a data flow inside a For Each loop to process
type 2 slowly changing dimensions (SCDs). The Get Metadata
activity at the start of the pipeline uses a dataset called
“genericfolder” (Figure 2-5). This dataset points to a folder in my Azure
Blob Storage account, and the pipeline loops through each of the files to
process a different file for each dimension in the target analytical model.
Figure 2-5 Dataset called “genericfolder”
The definition of the blob storage account is stored in the Linked
Service property of the dataset. In the linked service settings, you set
the authentication method and provide credentials for the dataset to
use when connecting to your data. In my example, “AzureBlobStorage1”,
I’m using the account key authentication method in the linked service
(see Figure 2-6).
Figure 2-6 Linked service
Note that the dataset “genericfolder” in Figure 2-5 points
to a folder in my blob store located at mycontainer/SampleData.
This tells ADF to find all files in that folder and return the list to the
Get Metadata activity. That metadata is then available to the next
activity in the pipeline, a For Each activity.
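To see what the For Each will receive, the sketch below builds a result shaped like the Get Metadata activity’s Child items output: a `childItems` list whose entries have a `name` and a `type` of `File` or `Folder`. It assumes a local temporary folder standing in for mycontainer/SampleData, uses hypothetical file names and a hypothetical `get_child_items` helper, and sorts the entries only to make the output predictable.

```python
import os
import tempfile


def get_child_items(folder: str) -> dict:
    """Mimic the shape of a Get Metadata 'Child items' result for a local folder."""
    items = []
    with os.scandir(folder) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            items.append({"name": entry.name, "type": "Folder" if entry.is_dir() else "File"})
    return {"childItems": items}


# A temporary folder stands in for mycontainer/SampleData.
with tempfile.TemporaryDirectory() as sample_data:
    for name in ("customers.csv", "products.csv"):
        open(os.path.join(sample_data, name), "w").close()  # create empty files
    output = get_child_items(sample_data)

print(output)
```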
The For Each iterates over each item in a collection. In this case, the
collection is the list of files found in the folder by the Get
Metadata activity. To set the iterator for the For Each, reference the
name of the Get Metadata activity and access the output.childItems
array from its output: @activity('Get files').output.childItems.
That array contains the list of files from the dataset folder.
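A toy resolver shows what the expression does: look up the named activity’s output and walk the dotted path. This sketch supports only the `@activity('<name>').output.<path>` form and is an illustration, not ADF’s expression engine; the `resolve` function and the output dictionary are hypothetical sample data.

```python
import re

# Matches expressions of the form @activity('<name>').output.<dotted.path>
_ACTIVITY_EXPR = re.compile(r"^@activity\('([^']+)'\)\.output((?:\.\w+)+)$")


def resolve(expression: str, activity_outputs: dict) -> object:
    """Resolve a single @activity('<name>').output.<path> reference.

    Only dotted property access on an activity's output is supported; the
    real pipeline expression language is far richer.
    """
    match = _ACTIVITY_EXPR.match(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression}")
    name, path = match.groups()
    value = activity_outputs[name]
    for key in path.strip(".").split("."):
        value = value[key]
    return value


outputs = {
    "Get files": {
        "childItems": [
            {"name": "customers.csv", "type": "File"},
            {"name": "products.csv", "type": "File"},
        ]
    }
}
for item in resolve("@activity('Get files').output.childItems", outputs):
    print(item["name"])
```

Inside the loop, ADF exposes the current element through `@item()`, so an expression such as `@item().name` would give each file name to the inner activities.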
The formulas you write in the ADF pipeline expression editor use the
pipeline expression language
(https://docs.microsoft.com/azure/data-factory/control-flow-expression-language-functions).
To enter an expression, click the “Add dynamic content” link next to
any property or field in the ADF pipeline designer that accepts
custom expressions. The expression editor slides in (see Figure 2-7),
and that is where you enter your expression. To enter the earlier
expression for the For Each iterator, go to Settings ➤ Items in the For Each
activity settings panel.
Figure 2-7 Pipeline expression editor
Figure 2-8 Get metadata settings
Now that the For Each is connected to the Get Metadata activity
(Figure 2-8), you can add activities inside the For Each. Those activities
execute once for each item in the list. Inside the For Each is a
data flow activity (Figure 2-9) that executes the first of two Mapping
Data Flows. This first data flow cleans and preps the data from each
of the files. You’ll learn how to build data cleaning, data quality, and
data prep data flows later in this book. To add activities to the For Each,
you can double-click on the node in the pipeline graph.
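The For Each semantics, one run of the inner activities per item, can be sketched with a thread pool. The sketch assumes the inner work is a plain Python function (the hypothetical `clean_file` stands in for the data flow activity); the `sequential` and `batch_count` parameters of the hypothetical `for_each` helper loosely echo the For Each activity’s Sequential and Batch count settings.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def for_each(
    items: list[dict],
    activity: Callable[[dict], str],
    sequential: bool = False,
    batch_count: int = 4,
) -> list[str]:
    """Run `activity` once per item, either one at a time or in parallel.

    Results come back in the same order as `items` regardless of scheduling.
    """
    if sequential:
        return [activity(item) for item in items]
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        return list(pool.map(activity, items))


def clean_file(item: dict) -> str:
    # Placeholder for the data flow activity that cleans and preps one file.
    return f"cleaned {item['name']}"


child_items = [
    {"name": "customers.csv", "type": "File"},
    {"name": "products.csv", "type": "File"},
]
print(for_each(child_items, clean_file))
```

Running in parallel shortens the loop when iterations are independent, as they are here where each file feeds a different dimension; sequential execution is the safer choice when iterations must not overlap.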
after returning home met an accidental death by a falling tree. The
ancestral home of the Needhams is near Frankfort, Ky. The Gundy
family is held in high esteem in their home county and the members
of the family are well respected by their friends and acquaintances.
Charles T. Gundy was educated in the rural schools and
attended the Memphis Academy for one year. Circumstances were
such that he found it necessary to do considerable studying at home
and “burned the midnight oil” in the pursuit of an education. He
fitted himself for teaching and taught for four years in the schools of
his native county. In the meantime he read law and was successful in
being admitted to the bar in 1902. For three years thereafter he
practiced his profession in Memphis. He then secured a Government
position in the postoffice department at Washington, D. C., and
pursued his law studies in the National University at Washington. He
graduated from that institution May 30, 1908. Having small desire to
become a mere cog in a great machine, as seemed to be the lot of
thousands of Government employes, he resigned his position in
October of the same year and located in Keokuk, Iowa, and had
charge of the farm loan department of the State Central Savings
Bank. He resigned this position in March of 1910 and came to
Atchison, opening an office in the Auld building on Commercial
street. Since this time he has built up an excellent practice. He was
appointed city judge in December of 1910 to fill a vacancy caused by
the resignation of Judge J. P. Adams. He was elected to the office in
1912 and again elected in 1914.
Judge Gundy was united in marriage with Eleanor M.
McCormick on August 12, 1909. Mrs. Gundy was a resident of
Washington, D. C., and is a daughter of John McCormick, who died
in 1905. Judge Gundy is a member of the Baptist church and he and
Mrs. Gundy have a wide circle of friends who esteem them for their
many likable qualities.
The Republican party has always claimed the allegiance of Judge
Gundy and he takes an active and influential interest in political
affairs.
LOUIS R. KUEHNHOFF.
Louis R. Kuehnhoff, farmer and stockman, of Lancaster
township, Atchison county, Kansas, was born January 1, 1880, on the
farm where he now resides. He is a son of Charles and Caroline
Kuehnhoff, and is one of nine children, six of whom are living. The
father was born in Germany in 1841, and left there when a boy of
sixteen years and sailed for New York. He remained there a short
time when he went west, arriving at St. Joseph, Mo. He had not been
there very long when the Civil war broke out and he enlisted at St.
Joseph in Company B of the Volunteer infantry. After the war was
over he was mustered out at Lexington, Mo., having won a
praiseworthy military record in his country’s service. He then
returned to civil life in St. Joseph, Mo., where he worked for a time
as a laborer, receiving eight dollars a month. Shortly afterward he
came to Atchison county, Kansas, and bought eighty acres of land in
section 10, Lancaster township. Using oxen, he broke the ground on
his newly acquired farm and began to improve it as rapidly as his
resources would permit. In 1894 he retired and went to live at the
National Soldiers’ Home at Leavenworth, Kan., where he died in
1903. The mother was born in Germany in 1845, and died in 1899.
Louis R. Kuehnhoff grew up on his father’s farm, and attended
Eden district school, and also District No. 3, Lancaster township. He
remained at home until he was nineteen years of age, and the next
five years worked as a farm hand, and then he bought the old home
place of 200 acres. Louis Kuehnhoff is an industrious worker. He
keeps graded stock of all kinds and takes a special interest in fine
mules. He always attends the county fairs in Atchison county and
occasionally makes entries. On April 26, 1905, he was married to
Lena Werner, who was born in Germany November 2, 1881. Her
parents were John and Marie (Earhart) Werner. The father was born
in Germany in 1815. He belonged to the Masonic lodge in Germany.
In 1889, when he was quite an old man, he came to America and
settled at Leavenworth, where he died in 1891. The mother was born
in Germany January 17, 1843, and is now living with her children, of
whom there are six, as follows: Adam, teamster, Leavenworth, Kan.;
Martha Nolan, deceased; Lizzie Loman, Bowling, Kan.; Katherine
Weimer, Wallula, Wyandotte county, Kansas; Lena, wife of Mr.
Kuehnhoff, of this review. Mrs. Kuehnhoff attended the Pleasant
Ridge school and the German school, north of Potter, Kan. She is a
good, loyal, hard-working mother, and has three children: Marie,
Edna and Edwin. The last two are twins and are three years old. In
politics Mr. Kuehnhoff is independent. He is a member of the
Independent Order of Odd Fellows. He is a progressive farmer and is
constantly on the lookout for improvements in agricultural methods.
He has a fine eight-room house and a large barn equipped with
modern conveniences. He also has a stone milk-house which was
built by his father years ago. He has a small but thriving orchard and
has twelve head of fine cattle. Besides these, he has four horses and a
span of excellent mules. Mr. Kuehnhoff takes a lively interest in his
stock and in his farm generally.
BENJAMIN FRANKLIN SANDERS.
All honor to the pioneer settlers of Kansas. It was they who
broke the way in the unpeopled wilderness and endured the
hardships and privations on the frontier of advancing civilization in
order that the path of empire might be pushed steadily westward,
ever onward toward the setting sun. Their work is done; the halcyon
pioneer days when this broad land was but a vast unbroken
wilderness of waving prairie grass, dotted here and there with belts
of timber along the streams, is no more; towns and cities have
sprung up; the locomotive shrieks its way over the ribbon-like rails,
hauling the products of the land to the millions in need of
sustenance, where once the hardy freighters drove their mule teams
and guarded the precious freight overland to the homes of the
settlers in the West. Benjamin Franklin Sanders is one of the few
remaining members of the “old guard,” who sixty years ago began the
task of reclaiming a wilderness. He is one of the ranking old pioneer
settlers of Atchison county and has lived a record which is thrilling
and interesting to a high degree. He is the only living “ye old time
fiddler” in Atchison county, who with his comrade was wont to play
at the old-time dances and “hoe downs” in northeast Kansas fifty
years and more ago.
Benjamin Franklin Sanders is now living retired in Center
township, Atchison county. He was born August 8, 1833, in Franklin
county, Missouri, and is a son of George and Elizabeth (Graham)
Sanders, who were the parents of the following children: Nancy
married William McQuillan, and by her second marriage became
Mrs. William Burns, and died in Benton county, Missouri; Robert,
deceased; Oliver died in Jewell county, Kansas; Lydia married Fred
Wilming, and died in Shannon township, Atchison county; William
died in Franklin county, Missouri; and Benjamin, the subject of this
sketch. Benjamin F. Sanders was sent to the country school in
Franklin county, Missouri, but the school was poor and the roads
were bad in the winter time, and, altogether, he had little
opportunity to learn. His whole time in school, he estimates, did not
amount to more than three months. His father was a Kentuckian and
followed farming all of his life, and died in 1856, at the age of fifty-
five years. The mother was a native of Missouri and of Scotch
descent. She died in Kansas, in 1872, at the age of seventy-six years.
B. F. Sanders
B. F. Sanders and His Great-Granddaughter, Gail Maxine Keirns, Daughter of Mr. and Mrs. Art Keirns.
At the age of twelve Benjamin F. Sanders was apprenticed to a
carriage and wagon-maker in St. Louis, Mo. He remained there
twelve years, coming to Kansas in 1856. He returned to Missouri for
a short time and then came back to Kansas the following year. He
opened a wagon-maker’s shop at Monrovia, Atchison county, which
he operated for two years. He then engaged in farming, taking up a
claim near where Effingham now stands. This was ten miles from
any settlement then and Mr. Sanders, fearing that the district would
not be settled, gave up his claim and preëmpted eighty acres one and
one-half miles north of where he now lives, in Center township, and
began his life as a real farmer. He hired a man from Iowa who had
six yoke of oxen to break up his land. He lived in the most primitive
way during the first years on this place. Coffee, for one thing, was
very high in price at that time, and there also was very little money in
the territory, so a substitute for coffee was used. They mixed wheat
and rye, calling it essence of coffee, and used this as a beverage in
place of the regular coffee. It was the same way with flour. When he
needed flour he would take a quantity of wheat to the gristmill where
it would be ground into coarse flour, nearest mills being at Valley
Falls and Kickapoo. His nearest postoffice was at Oceana, just north
of Pardee, where the postoffice was located later. In 1860 Mr.
Sanders bought more land. At one time he owned as high as 400
acres of land in Center township, Atchison county, Kansas. He went
through the whole evolution of civilization, beginning in a little log
house on his first eighty acres of land and passed through the wild
days of the border war. In 1863 he was a member of Captain
Whittaker’s company of Colonel McQuigg’s regiment of the Kansas
State militia. He participated in several skirmishes and was
honorably discharged at Ft. Leavenworth in 1864.
In 1859 Mr. Sanders married Margaret Ramsey in Putnam
county, Ohio, who was born in 1840. She was a daughter of John and
Elizabeth (Dorothy) Ramsey, natives of Ohio. She died in 1868,
leaving the following children: Ira, farmer, Whiting, Kan.; Bertha
(Mrs. C. G. Moore), deceased; William and Little Joy, both deceased.
Mr. Sanders was married a second time in 1870 to Mrs. Elizabeth
(Ramsey) Keirns, a sister of his first wife. She died in May, 1904. She
was the widow of Rufus Keirns, and by her last marriage three
children were born: Henry R., farmer, Pardee, Kan.; Mrs. Etta C.
Browne, Pardee, Kan.; Benjamin, Jr., died when seventeen years of
age.
Mr. Sanders is a Republican and a member of the Methodist
Episcopal church. He is now living with Arthur Keirns, a son of his
step-son. In these days his life is rather quiet compared with the
early-day existence which he passed through. Indians camped near
his farm when he first came to Kansas. The trail to the Kickapoo
reservation passed near his farm and the Indians were constantly
traveling back and forth along it. He has a hobby of “fiddling.” He
calls himself a “fiddler” in distinction from a violinist. He played at
the first corn carnival held in Atchison and won a prize. He used to
play with Samuel King, a well known “fiddler,” and they played for all
the old “hoe down” or “break down” dances. Although he is eighty-
three years old, he still plays his “fiddle” with as much vim as ever
and his ear is just as ready as it was when he was a young man. In
addition to being a farmer, Mr. Sanders has done a large amount of
carpenter work in Kansas. He has built a number of barns and other
buildings. Mr. Sanders was elected to the office of township trustee
and held the office two terms, having been reëlected at the close of
his first term.
KARL AUGUST KAMMER.
Karl August Kammer, farmer and stockman, Lancaster
township, Atchison county, Kansas, was born on the farm where he
now lives, October 12, 1869, and is a son of Karl and Joehanna
(Hida) Kammer. He is one of six children: Joehanna (Gutzman),
deceased; Emma (Fuhrman), Lancaster township; Karl, subject of
this sketch; Julius, Lancaster township; Bertha H. (Buttron),
Lancaster township; one child who died in infancy. The father was
born in Germany in 1840. Leaving there in 1862, he came to
Atchison county, Kansas, where he worked in a vineyard for two
years. The following four years he was employed in a brewery at
Atchison, and then farmed two years in Lancaster township. At that
time he had a chance to buy 160 acres in section 16 of Lancaster
township, and with the aid of a partner, the land was bought. He
built a one-room shanty and a thatched barn, and broke prairie with
the oxen and planted the first crop. Later a better house and barn
were built, and gradually, other improvements were added and a fine
orchard planted. At the time of his death, in October, 1910, Mr.
Kammer owned 240 acres of land. The mother was born in Germany,
February 20, 1840, and married in her native land just before
coming to America. She died in 1904.
Karl Kammer, the subject of this sketch, was reared on his
father’s farm in Lancaster township. He attended High Prairie
district school, No. 3, and remained on the home farm until he was
twenty-six years old, when he rented some land from his father, and
six years later he was able to buy the land he had been renting. He
improved the farm considerably and stocked it with graded cattle,
and now has an excellent farm, modern in every respect, consisting
of 160 acres of land, and also has a fine orchard of two acres.
Mr. Kammer was married October 23, 1895, to Emma Buttron, a
native of Lancaster township, Atchison county, born August 14, 1870.
She is a daughter of Henry and Rosa (Scheu) Buttron, the father a
native of Germany, born in 1833. When a young man he left his
native land and came to America, locating in Pennsylvania where he
worked as a blacksmith. From there he went to Elgin, Ill., and
continued at his trade, and in 1857, he moved to Atchison, Kan.,
following blacksmithing for a short time. He then preëmpted 160
acres of land in Lancaster township, where he built a house. The first
crop was destroyed by grasshoppers, and he was forced to return to
his trade during the following winter. When spring came, he went
back to his farm and that year was successful and his start was
assured. Mr. Buttron bought more land and continued to make
improvements, and after a long and prosperous career he died in
1914. Mr. and Mrs. Kammer are the parents of three children:
Katherine, Rosa and Henrietta, all living at home with their parents.
Mr. Kammer is a Republican, and is a member of the Independent
Order of Odd Fellows. Mr. and Mrs. Kammer and family are
members of the Evangelical Lutheran church of High Prairie
neighborhood.
MARSHALL J. CLOYES.
The demise of Marshall J. Cloyes May 5, 1915, marked the
passing of one of the sturdy figures who assisted in developing
Atchison county, and was one of the grand old men of the city. At the
time of his death he was probably the oldest living pioneer settler of
Atchison county, in point of age and years of residence in the county.
For over half a century he had been one of the well known and
distinguished characters whom people trusted and respected. In the
days when strong men were required to redeem a wilderness and
make it habitable for men and their progeny, Marshall Cloyes was
one of those who never gave up the fight. During the terrible drought
of 1860, when scores of families deserted their homes and left the
State, he and his family were among those who decided to remain
and win out over the vagaries of nature. His faith in the future of
Kansas was amply justified as the years rolled on and ever increasing
prosperity came to him and his, as a just and equitable reward for a
faith and confidence bestowed upon the new country during a time
which tried men’s souls and caused weaker mortals to give up the
fight.
He was born at Salisbury, Vt., October 24, 1826, and descended
from sturdy New England ancestry. His parents were Elijah and
Mary (Beach) Cloyes. On his father’s side his ancestry can be traced
back in the centuries to two brothers who settled in New England in
the seventeenth century. His grandfather was William Cloyes, who
fought for his country in the War of 1812. The boyhood days of
Marshall were spent in the town of Salisbury, where he attended the
public schools and later pursued his education in a private school. He
learned the trade of shoemaker but did not follow it to any great
extent. In 1847 he engaged in the lumber business at Ripton, Vt., and
was there for twelve years prior to coming to Kansas. From the town
in which he was born he came to Kansas, arriving here in Atchison
June 2, 1859. The following autumn his wife and sons followed him
and during the ensuing winter the family lived in a two room hut, on
the rear of the lots where Mrs. Jacob Leu’s residence now stands. On
February 21, 1860, they loaded all their goods in a wagon, and with
an ox team moved to a farm north of Lancaster. During the night an
old-time Kansas blizzard gave them a cold reception in their new
home. When Mr. Cloyes had agreed to pay $650 for his first quarter
section of land he was still shy $2.50 of the necessary amount, and
was forced to borrow this small sum from a kind neighbor. During
the following summer he worked in Oliver Davis’ sawmill and got
enough lumber to build a shanty on his farm. While this was building
the family lived in two rooms in the home of John S. Rust. In the fall
of the bad year of 1860, Mr. Cloyes decided to try to cash in on the
reputation he had left behind him in Vermont, and applied to an
uncle for a loan of $400. The uncle readily responded with the
statement in his letter, “If you are ever able, I know you will pay it
back; if you are never able to pay it back I can get along without it.”
During the summer Mr. Cloyes put in his spare time cutting prairie
hay and stacking it. When fall and winter came on, the returning
freighters from Pike’s Peak were willing to sell their oxen and wagons
for almost any price. Mr. Cloyes invested a part of his $400 capital in
these outfits, wintered the cattle on the hay, and in the spring was
able to dispose of the oxen for more than double the purchase price.
During the next two years he was enabled to pay off all of his debts,
and prosperity attended his efforts from that time on. By the hard
work and good management of himself and his two sons he increased
his holdings to an entire section of land. He remained on the farm
until 1872, then gave the farm to his sons and removed to a home at
417 North Seventh street in Atchison.
On July 5, 1848, Mr. Cloyes was married to Miss Betsy
Henderson, of Middlebury, Vt., who died in Atchison in 1893, leaving
two sons, Frank E. and Mark S. On September 15, 1909, he took a
second wife, the bride being this time Mrs. Matilda Franke, of
Atchison. She was born at Thuringen, Germany, November 16, 1855,
a daughter of John and Christiana (Temme) Franke, who
immigrated to America in 1858, making the long sea voyage in a
sailing vessel which took six long weeks to make a crossing that is now
made in six days. From New York City the Frankes came directly to
St. Louis, and there made their home until their removal to Atchison.
At the outbreak of the Civil war, John Franke volunteered his
services in defense of the Union which had given him a home. He
served in a Missouri regiment of volunteers for one year, and was
then discharged on account of serious disability, caused by the
hardships which he had undergone. He was never the same man
afterwards, and died in 1865 as a direct result of his disabilities
incurred in behalf of his adopted country. The mother and family
lived in St. Louis until 1883 when they removed to Atchison. Mrs.
Franke died some years later at the home of her daughter, Mrs.
Cloyes. In 1879 Matilda Franke was first married to Theo A. Franke, a native
of Saxony, Germany, who came to America as a
youth of eighteen and settled in Pittsburgh, Penn. Theo
A. Franke was also a veteran of the Civil war, having enlisted in 1861
in Company D, Seventy-fourth regiment, Pennsylvania infantry. He
served throughout the great conflict and was wounded several times
while participating in the battles fought by the Army of the Potomac.
He enlisted again, after being discharged on account of a serious
wound, and was a brave and valiant soldier who fought for sheer love
of his adopted country. Mr. Franke’s first trip to Atchison was made
in 1859, but he returned to Pittsburgh upon the outbreak of the Civil
war and there proffered his services as stated above. He returned to
Atchison after the close of the war and here met, in the course of
years, Matilda, who was visiting friends in Atchison. Their
acquaintance ripened into a warm friendship which gave place to
love and they were married March 10, 1879. A happy wedded life
endured until Mr. Franke’s death in 1882. Children blessed this
union as follows: Rose M., wife of Bert Gilmore, an electrician of
Atchison; Elsa, wife of Fred Moore, a railway engineer of Falls City,
Neb.; Theo Franke, of Pierce, Ariz. During Mr. Franke’s first year of
residence in Atchison he was a freighter across the plains. Upon his
return in 1865 he entered the grocery business and prospered,
accumulating considerable property interests. He was well known in
Atchison and was considered to be one of the city’s most substantial
men.
Mr. Cloyes was prominently identified with the political affairs
of the county and was an influential leader of the Republican party
for many years. Even before coming to Atchison from the farm he
had taken an active interest in politics in his home township and
county. He was elected to represent his district in the State
legislature in 1867, leaving the impress of his individuality upon laws
passed in the following session. For eight years he served in the
Atchison city council and in 1891 was elected mayor. Two years later
he was reëlected. Honorable and thoroughly upright in all his
dealings, his administrations were characterized by integrity, sound
judgment and an unusual amount of good sense. He was a member
of Washington Lodge, No. 5, Ancient Free and Accepted Masons, and
all who knew him respected him for his sterling worth.
MARK D. SNYDER.
Mark D. Snyder, retired farmer, living in Monrovia, Atchison
county, Kansas, is a native son of Kansas, having been born in
Atchison county November 2, 1858. He is a son of Hon. Solomon J.
H. Snyder, one of the influential figures of the early pioneer days of
Kansas, and who was a stanch and uncompromising adherent of the
Free State principles. The father of Mark D. was born in Washington
county, Maryland, February 7, 1812, and died at Monrovia, Atchison
county, November 28, 1873. When eight years of age he accompanied
his father to Tuscarawas county, Ohio, where he was educated in the
district schools and a graded school at Canton, Ohio. Between 1830
and 1833 he cleared a farm of 160 acres of heavily timbered land. In
1838 he married Susan Winklepleck, and until 1848 he cleared and cultivated
a tract of timber land which he had purchased. His wife died in
that year, leaving him with three small children. He sold all of his
holdings, placed his children with neighborhood families and then
traveled 4,000 miles in an endeavor to forget his great loss and
overcome his grief over the death of his wife. Later, he married Eliza
Fisher, and in 1852 removed to Indiana, and then came west to Ft.
Leavenworth in 1854. On the morning of May 4, 1854, he made the
first legal homestead claim ever entered in the State of Kansas,
comprising the land upon which the southern part of the city of
Leavenworth now stands, and then returned to Indiana for his
family. On his return to his homestead he found his claim “jumped”
and the country in the hands of border ruffians. He was driven from
the polls at the first election held in the Territory on account of his
Free Soil principles. Two other claims which he bought were wrested
from him by a pro-slavery “squatter court,” his life threatened, and
he sought refuge in an unsettled part of the State where Monrovia
now stands. Here he made his home and became prominently
identified with the politics of the new State of Kansas. In 1862 Mr.
Snyder was elected to the State legislature and served for two terms
in the house of representatives, and one in the senate, where he did
faithful and conscientious work in behalf of the people of Kansas.
Solomon J. H. Snyder was a devoted Christian, and was one of
the organizers of the first Lutheran church organization in the State,
at Monrovia, of which he remained a member until his demise. He
was a great Sunday school worker and wrote two very interesting and
valuable Sunday school books, “The Lost Children” and “Scenes in
the Far West,” and at the time of his death was engaged in the
preparation of a work entitled, “The Evidences of Christianity.” His
influence was ever in behalf of the betterment of mankind and his
Christianity was of the practical kind which introduces helpfulness,
kindness and forbearance into our daily lives. The children of S. J. H.
and Eliza (Fisher) Snyder were as follows: Angeline (Conley),
deceased; Mrs. Sarah Dunn, of Anadarko, Okla.; Mrs. Cora Shifflet,
deceased; and Mark D. The three children by his first wife were: Mrs.
Susan Reck, deceased; Mrs. Anna Berndt, of Mexico City; and J. H., of
San Diego, Cal. The mother of Mark D. was born in Ohio in
1838, and died at her home near Monrovia in 1896.
Mark D. Snyder, with whom this review is directly concerned,
was born and reared in Atchison county and reared his own family there. He
is one of the real native born citizens of the county. Upon the death
of his father he took charge of the old home place, and when his
mother died he purchased the family estate. By the exercise of
industry and economy, aided by good financial judgment, he has
become the owner of 240 acres of excellent land which is well
improved and one of the most productive tracts of land in northeast
Kansas. He cultivated his broad acres assiduously until 1909, when
he turned over the management of his farm to his son, and retired to
Monrovia, where he now resides.
Mr. Snyder was married November 30, 1881, to Helen M.
Maxfield, and this union has been blessed with eight children,
namely: Elsie and Minnie, deceased; John, who is farming the home
place; Mark, living in Omaha, Neb.; Mildred, deceased; Margaret
and Marguerette, twins, deceased; James, a boy twelve years old,
living with John on the home farm. The mother of these children was
born in Henry county, Illinois, a daughter of David and Anna
(Freeze) Maxfield, who first emigrated from Illinois to Sedgwick
county, Kansas, and in 1873 came to Atchison county. Mrs. Snyder
died in 1909. Mr. Snyder has always been a loyal supporter of the
Republican party, is an attendant of the Lutheran church, and is a
member of the Ancient Order of United Workmen, of Effingham,
Kan.
EDWARD PERDUE.
Edward Perdue, president of the First National Bank of
Atchison, and an extensive farmer of Huron, Kan., has been a resident
of Atchison county for the past forty-five years. Like other successful
men who were pioneers in Kansas, he arrived here from Canada
when a young man of twenty, without money, but
possessed of strength, a willingness to work at honest labor and an
ambition to succeed. How well he has succeeded is seen in the
substantial fortune which he has accumulated and the honors which
have been conferred upon him by his fellow citizens.
Mr. Perdue was born on a farm in Peterboro county, Ontario,
Canada, June 27, 1850, a son of Thomas and Catharine Perdue,
natives of Ireland, who left the Emerald Isle in their youth and
settled in Canada. Edward Perdue was reared to sturdy young
manhood on the parental farm and attended the country school in
the vicinity of his home as opportunity afforded. In March of 1870 he
arrived in Atchison, and during his first year worked at whatever odd jobs
presented themselves, including labor on the streets and harvesting
on the nearby farms. During the following five years he was
employed as a construction foreman on the grading and building of
the Santa Fe railroad from Atchison to the Colorado-Kansas State
line. He saved his money and by the exercise of strict economy,
which meant the denial to himself of all but the actual necessities of
life, he was enabled to accumulate sufficient funds to invest in a farm
near the town of Huron, on which he resided for the next five years.
He then sold this farm and bought another one about one and one-
half miles east from Huron, which remains his home to the present
time. Mr. Perdue has given his attention mostly to the raising and
feeding of live stock in his farming operations and has succeeded in
amassing a comfortable fortune during the forty years he has been an
agriculturist. He has increased his land holdings until at the present
time he is the owner of 1,040 acres of splendid farm lands in
Lancaster township. His home farm is one of the best improved
tracts of farm land in the county and all of his farms show the results
obtained from soil conservation and advanced methods of farming.
[Portrait: Edward Perdue]
While Mr. Perdue has been primarily a farmer, he has given his
attention to other matters as betokens a man of influence and
substance. In the year 1891 he assisted in the organization of the
Huron State Bank and is president of this thriving concern. In 1906
he took part in the organization of the Commercial State Bank of
Atchison, which was succeeded later by the First National Bank, of
which banking institution he has served as president since 1900. He
is also a stockholder of the State Savings Bank of Leavenworth,
Kansas.
Mr. Perdue was married in 1878 to Mary Viola Davey, of Brown
county, Kansas, a daughter of Charles Davey, which marriage has
resulted in the birth of seven children, as follows: Mrs. Maria
Walters, living on a farm near Huron; Edna, wife of J. M. Delaney,
merchant, of Huron, Kan.; Mrs. Mabel Schmidt, wife of the assistant
cashier of the Huron State Bank; Charles, who is cultivating the
home farm; Thomas Hendricks, at home; George, a farmer in North
Dakota; and Edward, Jr.
Mr. Perdue has been a life-long Democrat, who has always taken
a more or less active part in the political affairs of the county. He was
elected county commissioner in 1897 and served one term. In 1904
he served one term as a member of the State legislature, representing
this district, declining reëlection when his term of office expired.
While he was reared in the Catholic belief, Mr. Perdue is tolerant of
all creeds and takes a broad-minded view of religious matters. He
belongs to the Ancient Order of United Workmen and the Modern
Woodmen.
DR. CHARLES L. HIXON.
Dr. Charles L. Hixon, a leading dental practitioner of Atchison, is
a native son of Kansas and comes of a pioneer family of the State. He
was born on a farm in Jackson county, Kansas, January 14, 1872, and
is a son of John S. and Alice (Clark) Hixon. His father, John S.
Hixon, was born in Ohio in 1850, a son of Jacob and Cassandra
(Stonebraker) Hixon, who resided in Ashland county, Ohio, until
their removal to Putnam county, Indiana, in the early pioneer days
when that part of the Hoosier State was being settled by large
numbers of Ohio people. Alice Clark Hixon, mother of Dr. Hixon,
was likewise born in 1850 in Putnam county, Indiana, a daughter of
Andrew Jackson and Harriet (Mann) Clark, natives of New York
State, and also pioneer settlers of Putnam county, Indiana. While
John S. Hixon and Alice Clark were attending the district school in
the neighborhood of their respective homes, they became great
friends, and their warm friendship ripened into love, which
culminated in their marriage several years later in Jackson county,
Kansas.
The Hixons and Clarks were essentially pioneers, and the history
of the family for generations shows that some member of the family,
or several of them, have been continually pushing westward and
settling in the newer countries. Jacob Hixon was one of the first men
in his neighborhood to hearken to the call of the West, and, after
disposing of his land holdings in Putnam county, Indiana, he with all
of his family migrated to Kansas, settling in Jackson county. They
arrived in Atchison during the stormy days of the Civil war, and at a
time when the local vigilance committee was in control of
community affairs and was naturally very suspicious of all
strangers. There had been considerable lawlessness in Atchison and
neighboring towns and many outrages had been perpetrated by
border ruffians and outlaws. The vigilance committee had taken
charge of the affairs and had summarily lynched three men on the
banks of White Clay creek just previous to the arrival of the Hixon
family. Mr. Hixon was interrogated as to his loyalty to the Union and
asked his intentions. His replies being satisfactory to the members of
the committee, he was allowed to proceed on his way to Jackson
county and arrived at Holton, Kansas, without further delay. Jacob
Hixon settled on a fine farm near Holton, developed it and prospered
as the years rolled on and the country became more and more
settled. He died in 1905, at the advanced age of eighty-four years, his
wife, Cassandra, departing this life in 1885.
The Clark family came to Kansas from Indiana in 1868, and
Andrew Jackson Clark naturally settled in that part of Jackson
county where his old friend and neighbor had chosen his place of
residence. The intimacy which had existed between the two families
in Putnam county, Indiana, was renewed, and as time went on, John
S. Hixon and Alice Clark grew to maturity and were united in
marriage. Their married life has been a happy and prosperous one,
and five children have blessed this union: Dr. Charles L. Hixon, with
whom this review is directly concerned; Mrs. J. C. Neeley, of Weiser,
Idaho; Ernest H. Hixon, of Kansas City, Mo.; one child died in
infancy. John S. Hixon became prominently identified with the civic
life of Jackson county and is serving his county well and faithfully as
treasurer, having been elected to that office on the Republican ticket
in 1912 and again in 1914. Mr. and Mrs. John S. Hixon reside in
Holton, in Jackson county, and are prosperous and well respected in
the neighborhood.
Dr. C. L. Hixon spent his boyhood days on the farm and early
learned to assist in the farm work. He received his elementary
education in the district schools, and was ambitious to secure a
higher education. He has practically educated himself, and after
learning all that was possible for him to learn in the country school,
he attended Campbell College, at Holton, Kan., for two years. His
ambition was to become a dentist, and with this end in view he
matriculated in the University of Iowa in 1895. After spending two
profitable years in this institution in the study of dentistry he
returned home, and a short time later opened an office in Atchison,
where he has practiced continuously for the past eighteen years.
After seven years of practice in his first location, he opened well
equipped offices at 519 Commercial street, and remained there until
his removal to his present location at 613 Commercial street, where
he has offices equipped with all the latest appliances for facilitating
his work. Dr. Hixon is kept very busy attending to the calls made
upon him in the practice of his profession, and during the many
years he has been located in Atchison, he has built up an extensive
and lucrative practice. He finds time, however, to keep abreast of the
latest developments made in his profession, and is ever seeking to
better his skill and knowledge of dentistry. He has been distinctly
honored by the members of his profession, having served as
president of the Northeast Kansas Dental Association, and is at
present an active member of this association. He is a leading member
of the Atchison Dental Association, and ranks high in his profession,
not only as a successful practitioner, but as a citizen who has the best
interests of his home city at heart. He is a member of the Ancient
Free and Accepted Masons, Washington Lodge, No. 5, and is
fraternally affiliated with the Odd Fellows, the Modern Woodmen of
America, the Rebekah and Eastern Star lodges.
Dr. Hixon was united in marriage with Miss Inez B. Horn in
1902, and one child has been born to this union, Charles Horn
Hixon, born May 25, 1907. Mrs. Inez B. Hixon was born in Atchison
county, a daughter of J. H. and Catharine (Wallick) Horn, who reside
at 1126 North Third street, Atchison. Mrs. Horn is a daughter of
Benjamin Wallick, who served as sheriff of the county during the
time of the Civil war.
LOUIS KLOEPPER.
Louis Kloepper, farmer and stockman of Lancaster township,
Atchison county, was born January 18, 1888, on the farm where he
now lives. He is a son of William and Fredericka (Von Derahe)
Kloepper, who were the parents of four children as follows: Louis,
subject of this sketch; Emma, deceased; William, deceased; Pauline,
living at home. The father was born in Germany, December 14, 1853.
He left there in 1883 and came directly to Atchison county, Kansas,
where he bought eighty acres of land in section 27, Lancaster
township. He farmed this one year, and in 1885 returned to Germany
to be married. In 1886 he returned to his farm and began to improve
it, building a large eight-room house in 1899 in place of the little
three-room affair which stood on the place. In 1903 he built a fine
32×40 feet granary, and in 1904 he erected a large barn, 40×48 feet.
The following year he bought more land and put up additional
buildings, building in 1908 another barn, 32×40 feet. At the time of
his death, February 7, 1913, he owned 240 acres of well improved
land under cultivation, and thirteen acres of fine timber land. This
achievement is the more remarkable in view of the fact that he
landed with only $1,200. But he was industrious, and worked
faithfully to improve his farm. He was a member, trustee and
steward of the German Lutheran church. His wife was born in
Germany, February 15, 1858, and is a daughter of Henry and
Fredericka (Von Behren) Von Derahe, natives of Germany. The
mother is now living with her son, Louis.
Louis Kloepper attended the old Huron school of Lancaster
township, and grew to manhood on the farm which he now operates.
Since the death of his father he has had charge of the farm and has
worked to the extent of his ability in installing modern
improvements on his place. He owns 160 acres in section 27,
Lancaster township, in addition to the home place, and has three
acres of orchard and grove. He also has a vineyard, the feature of the
place which Louis, and his father before him, always loved most. The
vineyard has received special attention even when other things perhaps
had to be neglected, and it is the pride of Mr. Kloepper's place. He
keeps graded stock. He now operates 400 acres of land, 114 acres of
which are in corn and ninety-three acres in clover, the latter having been unusually
successful. He owns a threshing outfit and two clover hullers, a corn
shredder, and three gas engines. He utilizes these engines in
numerous ways, including pumping, threshing and plowing. Mr.
Kloepper has a modern farm in every way and has all up-to-date
labor- and time-saving improvements, as well as an
automobile. He is a stockholder in the Farmers’ Mercantile
Association of Effingham, Kan. He is a practical farmer, of the
progressive type.
In 1911 he married Marie Meier, a native of Germany, born July
3, 1888. She is a daughter of Henry and Fredericka (Finke) Meier,
and was educated in Germany and left her native land at the age of
seventeen. Mr. and Mrs. Kloepper have two children, Fredia, born
November 13, 1911, and Emma, born April 21, 1913. Mr. Kloepper is
an independent voter. He belongs to the German Lutheran church.
CHARLES W. FERGUSON.
Charles W. Ferguson, vice-president of the Atchison Savings
Bank, is one of the best known men in financial circles of
northeastern Kansas, and he is equally well known over a large
section of western Missouri. Mr. Ferguson was born at Plattsburg,
Mo., December 29, 1862, and is a son of William L. and Fannie A.
(Carpenter) Ferguson, both natives of Kentucky, whose parents were
Virginians and very early settlers of the Blue Grass State. The
Ferguson family removed from Kentucky to Missouri about 1851.
They came up the Missouri river by boat as far as Liberty Landing,
and later located in Clinton county, Missouri. The father was a
merchant and also engaged in the grain business, and was an all
around progressive business man. He was a Republican, and in 1862
was elected sheriff of Clinton county, being the first Republican
elected to office in that county within a period of twenty-five years.
During the Civil war he was captain of the Home Guards. He died in
1893, aged sixty-four. Charles W. Ferguson is one of a family of six
children, as follows: John L., assistant general passenger agent of the
Chicago & Northwestern railroad, Chicago, Ill.; Mary F., widow of M.
B. Riley, who resides in St. Joseph, Mo.; Adelia M., of Plattsburg, Mo.;
Katherine, of Plattsburg, Mo.; Charles W., the subject of this sketch;
and Louis, a conductor on the Chicago & Northwestern railroad, who
resides at Highland Park, Ill. Charles W. Ferguson attended the
public schools in Plattsburg until he was thirteen years old, and at
that early age went to work in the express office at Plattsburg, where
he remained about five years. He then entered the employ of Stonum
Brothers, remaining with that company two years. He then accepted
a position in the Plattsburg Bank, as bookkeeper and assistant
cashier, remaining with that institution for seven years. He then
went with the Schuster-Hax National Bank, St. Joseph, Mo., as
receiving teller, and served in that capacity for four years. He
resigned that position in June, 1894, to become bookkeeper of the
Exchange National Bank of Atchison. He served with that institution
in the capacity of paying teller, assistant cashier and cashier,
resigning the latter position February 1, 1914. In November, 1914, he
accepted a position with the Federal Reserve Bank of Kansas City,
Mo., and was with that institution for eight months, and in July,
1915, became vice-president of the Atchison Savings Bank. Mr.
Ferguson has had a vast experience in the field of banking, and is
well posted on the intricate problems of finance, and possesses the
keen discriminating qualities of the successful banker. Mr. Ferguson
was married April 28, 1892, to Miss Sallie Clay, of Plattsburg, Mo.
She is a daughter of James M. Clay, a member of the Kentucky
branch of the Clay family. Mr. Ferguson is a member of the Masonic
lodge, the Benevolent and Protective Order of Elks and the Modern
Woodmen of America.
EARL V. JONES.
Signal success in any one field of endeavor is worthy of
recognition by the public, whether it be professional, inventive,
mercantile or of an industrial nature. Some men are naturally gifted
with the ability to become successful in the industrial and
manufacturing field, and are mentally equipped with a certain
amount of mechanical genius, along with decided business ability to
take hold of a proposition and make it succeed despite difficulties.
E. V. Jones, treasurer and manager of the Bailor Plow Company, of
Atchison, is one of the latter type who is fast climbing to a place of
eminence in his chosen field of endeavor, and holds a high place
among the manufacturing and mercantile interests of Atchison and
the Middle West.
Mr. Jones was born in Livingston county, Missouri, January 21,
1878, a son of Charles Jones, a building contractor, who was a native
of Kentucky and a son of William Jones, owner of a large plantation
in Kentucky, which was lost as one of the misfortunes which befell
the family as a result of the Civil war’s ravages in Kentucky. Desirous
of making a new start in a land further removed from internecine
strife, and where opportunities for success seemed greater, William
Jones removed to Missouri, and here Charles, the father of E. V., was
reared and became successful in agricultural pursuits, the son, Earl
V., being reared on the family estate in Livingston county, Missouri.
The Jones family is originally of Scotch-Irish stock, the founder of
the family emigrating from the north of Ireland to this country
several generations ago. Charles Jones married Miss Jennie Wills, a
daughter of John Wills, a native of the east coast of England, who
immigrated to this country with his brother, George, and followed
his trade of wagon maker successfully. John Wills owned and
operated an extensive blacksmith and wagon maker’s shop at
Chillicothe, Mo., which did a large business and made moderate
wealth for its proprietor.
Earl V. Jones, with whom this review is directly concerned, was
educated in the common and high schools of his native county, and

More Related Content

Similar to Mapping Data Flows in Azure Data Factory 1st Edition Mark Kromer (20)

DOCX
Microsoft Fabric data warehouse by dataplatr
ajaykumar405166
 
DOCX
Discussion post· The proper implementation of a database is es.docx
madlynplamondon
 
PDF
SemTech 2010: Pelorus Platform
Clark & Parsia LLC
 
PDF
Enabling SQL Access to Data Lakes
Vasu S
 
PDF
oracle-adw-melts snowflake-report.pdf
ssuserf8f9b2
 
PDF
Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Exampl...
dphfmuw5765
 
PDF
Learning Airtable (First Early Release) Elliott Adams
krevlmammag
 
PDF
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
CLARA CAMPROVIN
 
PPTX
TAKE A LOOK AT THE TOP 7 SKILLS THAT A DATA ENGINEER CERTAINLY HAS TO HAVE
EmilySmith271958
 
PPT
Mr bi
renjan131
 
PDF
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
Jane Roberts
 
PDF
The Evolving Role of the Data Engineer - Whitepaper | Qubole
Vasu S
 
PDF
The Right Data Warehouse: Automation Now, Business Value Thereafter
Inside Analysis
 
DOCX
Data warehouse 2.0 and sql server architecture and vision
Klaudiia Jacome
 
PPTX
Mark Simpson - UKOUG23 - Refactoring Monolithic Oracle Database Applications ...
marksimpsongw
 
PDF
"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
Fwdays
 
PDF
Data warehouse pricing & cost: what you'll really spend
noviari sugianto
 
PPTX
Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...
Lucas Jellema
 
PDF
Data Mesh in Action (MEAP V04) Jacek Majchrzak
nakishouke2w
 
Microsoft Fabric data warehouse by dataplatr
ajaykumar405166
 
Discussion post· The proper implementation of a database is es.docx
madlynplamondon
 
SemTech 2010: Pelorus Platform
Clark & Parsia LLC
 
Enabling SQL Access to Data Lakes
Vasu S
 
oracle-adw-melts snowflake-report.pdf
ssuserf8f9b2
 
Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Exampl...
dphfmuw5765
 
Learning Airtable (First Early Release) Elliott Adams
krevlmammag
 
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
CLARA CAMPROVIN
 
TAKE A LOOK AT THE TOP 7 SKILLS THAT A DATA ENGINEER CERTAINLY HAS TO HAVE
EmilySmith271958
 
Mr bi
renjan131
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
Jane Roberts
 
The Evolving Role of the Data Engineer - Whitepaper | Qubole
Vasu S
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
Inside Analysis
 
Data warehouse 2.0 and sql server architecture and vision
Klaudiia Jacome
 
Mapping Data Flows in Azure Data Factory 1st Edition Mark Kromer

Mark Kromer
Mapping Data Flows in Azure Data Factory
Building Scalable ETL Projects in the Microsoft Cloud
Mark Kromer
Snohomish, WA, USA

ISBN 978-1-4842-8611-1
e-ISBN 978-1-4842-8612-8
https://doi.org/10.1007/978-1-4842-8612-8

© Mark Kromer 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
This book is dedicated to my loving wife Stacy and our boys Ethan and Jude. Thank you for putting up with my late hours working on data analytics and writing this book!
Introduction

The ETL (extract, transform, load) process has been a cornerstone of data warehouses, data marts, and business intelligence for decades. ETL is how data engineers have traditionally refined raw data into business analytics that guide the business to make better decisions. Over the years, these projects have allowed engineers to build up libraries of common ETL processes and practices from traditional on-premises data warehouses, very commonly with data coming from Oracle, Microsoft, IBM, or Sybase databases or from business ERP/CRM applications like Salesforce, SAP, Dynamics, etc. However, over the past decade, our industry has seen these analytical workloads migrate to the cloud at a very rapid pace. To keep up with these changes, we’ve had to adjust ETL techniques to account for more varied and larger data. The big data revolution and cloud migrations have forced us to rethink many of our proven ETL patterns to meet modern data transformation challenges and demands.

Today, the vast majority of data that we process exists primarily in the cloud. And that data may not always be governed and curated by rigid business processes in the way that our previous ETL processes could rely on. The common scenarios of processing well-known, hardened schemas from SAP and CSV exports now come with new challenges. The data sources will likely vary in shape, size, and scope from day to day. We need to account for schema drift, data drift, and other possible obstructions to refining data in a way that turns it into business analytics.

Cloud-First ETL with Mapping Data Flows

Welcome to Mapping Data Flows in Azure Data Factory! In this book, I’m going to introduce you to Microsoft Azure Data Factory and its Mapping Data Flows feature as the key ETL toolset for tackling these modern data analytics challenges. As you make your way through the book, you’ll learn key concepts, and through the use of examples, you’ll begin to build your first cloud-based ETL projects that can help you unlock the potential of scaled-out big data ETL processing in the cloud. I’ll demonstrate how to tackle the particularly difficult aspects of big data analytics and how to prepare data for business decision makers in the cloud.

To get the most value from this book, you should have a firm understanding of building data warehouses and business intelligence projects. It is not necessary to already have many hours of experience building cloud-first big data analytics projects. However, some experience in cloud computing will provide valuable context as you work through some of these new approaches. The examples and scenarios in this book are patterns and practices based on common ETL scenarios, so a data engineering background will also be very helpful. I’ll guide you along as you migrate from traditional on-premises data engineering to the world of Azure Data Factory.

Overview of Azure Data Factory

To become familiar with the data engineering process in Microsoft Azure, we’ll begin with an overview of Azure Data Factory (ADF), which is the Azure service for building data pipelines. The first chapter will focus on conceptual discussions of how to build a process in the cloud to transform massive amounts of data with many quality issues. Essentially, we need to redefine ETL for cloud-based big data, where data volumes and veracity can change daily, and we’ll compare and contrast the Azure mechanisms available to the modern data engineer with traditional ETL. From there, we’ll begin the process of building ETL pipelines that will serve as the basis for your big data analytics projects. I’m going to present a series of common use cases that demonstrate how to apply the concepts discussed in the earlier chapters to practical ETL projects. Then the focus will turn to a deep dive on Mapping Data Flows and how to build ETL frameworks in ADF by using the visual design-time interface to build code-free data flows.
Mapping Data Flows is primarily a code-free visual design experience, so we’ll walk through techniques and best practices for managing the software development life cycle of a data flow in ADF. Data Factory provides many different means to process and transform data that include coding and calling external compute processes.
However, in this book, the focus will be on building ETL pipelines in a code-free style in Mapping Data Flows. As you work your way through the early chapters in this book, you should begin to develop an understanding of how to apply data engineering principles in ADF and Mapping Data Flows. From there, we’ll begin to implement mechanisms that help organize your work and design-time environment, preparing for eventual operationalization at runtime. We’ll set up a Git repo for our work, as you should in real-life scenarios. We’ll design interactive data transformation graphs using serverless compute that can scale out as needed. You won’t need to manage physical servers and clusters with ADF, but I will explain how things work behind the scenes to provide this serverless compute power for your pipelines. Under the hood, ADF leverages the Azure platform-as-a-service workflow engine Logic Apps for pipeline execution and scheduling, and the transformation engine for Mapping Data Flows is Apache Spark. But you won’t have to learn anything about those underlying dependent services: the Azure Integration Runtimes provide that compute for you dynamically, in a serverless manner.

Operationalizing Data Pipelines

As you begin designing data flows for cloud-first big data workloads, we will test and debug in nonproduction environments and then promote that work to production environments. Those jobs will be executed on schedules via ADF data pipelines. These chapters will focus on operationalizing our work so that it becomes the automated ETL framework for your business analytics. A complete end-to-end solution must also include ongoing monitoring and management of these processes. The final chapters will cover mechanisms in ADF that you can use to monitor runs over time and to examine the performance of your pipelines.
Because big data in the cloud is by nature messy and ever-changing, it is important to establish alerts and handling for schema and data drift. I’ll explain how to add fail-safe mechanisms, monitoring, and traps for these common problems so that your data pipelines can execute continuously. The frameworks needed to design, debug, schedule, monitor, and manage are all contained inside ADF, and we’ll spend time digging into each of those areas.

Goal for the Book

My goal is that by the end of this book, you’ll be able to apply the concepts and patterns presented here to build ETL pipelines for your next big data analytics project in the cloud. By mapping these updated approaches to processing data for analytics (a.k.a. big data analytics) onto the world of traditional ETL processing that you are already familiar with, you will be able to use Azure Data Factory and Mapping Data Flows to provide your business with analytics that lead to better business decisions. Many of the patterns and practices in this book can be applied directly to projects where you are beginning to build cloud-first data solutions in Azure. You can use these techniques to begin building a new set of reusable, common ETL patterns. As you progress through this book’s chapters, you’ll build upon the lessons learned in each chapter, with the goal of having everything you need to begin building your own big data analytics ETL solution natively in the cloud using Azure Data Factory with Mapping Data Flows. So welcome, and I hope you find this book helpful as you begin building powerful ETL solutions in the cloud!
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub (https://github.com/Apress). For more detailed information, please visit http://www.apress.com/source-code.
Table of Contents

Part I: Getting Started with Azure Data Factory and Mapping Data Flows
Chapter 1: ETL for the Cloud Data Engineer
  General ETL Process
  Differences in Cloud-Based ETL
  Data Drift
  Landing the Refined Data
  Typical SDLC
  Summary
Chapter 2: Introduction to Azure Data Factory
  What Is Azure Data Factory?
  Factory Resources
  Pipelines
  Activities
  Triggers
  Mapping Data Flows
  Linked Services
  Datasets
  Azure Integration Runtime
  Self-Hosted Integration Runtime
  Elements of a Pipeline
  Pipeline Execution
  Pipeline Triggers
  Pipeline Monitoring
  Summary
Chapter 3: Introduction to Mapping Data Flows
  Getting Started
  Design Surface
  Connector Lines and Reference Lines
  Repositioning Nodes
  Data Flow Script
  Transformation Primitives
  Multiple Inputs/Outputs
  Schema Modifier
  Formatters
  Row Modifier
  Flowlets
  Destination
  Expression language
  Functions
  Input Schema
  Parameters
  Cached Lookup
  Locals
  Data Preview
  Manage Compute Environment from Azure IR
  Debugging from the Data Flow Surface
  Debugging from Pipeline
  Summary

Part II: Designing Scalable ETL Jobs with ADF Mapping Data Flows
Chapter 4: Build Your First ETL Pipeline in ADF
  Scenario
  Data Quality
  Task 1: Start with a New Data Flow
  Task 2: Metadata Checker
  Task 3: Add Asserts for Data Validation
  Task 4: Filter Out NULLs
  Task 5: Create Full Address Field
  Final Step: Land the Data As Parquet in the Data Lake
  Summary
Chapter 5: Common ETL Pipeline Practices in ADF with Mapping Data Flows
  Task 1: Create a New Pipeline
  Task 2: Debug the Pipeline
  Task 3: Evaluate Execution Plan
  Task 4: Evaluate Results
  Task 5: Prepare Pipeline for Operational Deployment
  Summary
Chapter 6: Slowly Changing Dimensions
  Building a Slowly Changing Dimension Pattern in Mapping Data Flows
  Data Sources
  NewProducts
  ExistingProducts
  Cached Lookup
  Create Cache
  Create Row Hashes
  Surrogate Key Generation
  Check for Existing Dimension Members
  Set Dimension Properties
  Bring the Streams Together
  Prepare Data for Writing to Database
  Summary
Chapter 7: Data Deduplication
  The Need for Data Deduplication
  Type 1: Distinct Rows
  Type 2: Fuzzy Matching
  Column Pattern Matching
  Self-Join
  Match Scoring
  Scoring Your Data for Duplication Evaluation
  Turn the Data Flow into a Reusable Flowlet
  Debugging a Flowlet
  Summary
Chapter 8: Mapping Data Flow Advanced Topics
  Working with Complex Data Types
  Hierarchical Structures
  Arrays
  Maps
  Data Lake File Formats
  Parquet
  Delta Lake
  Optimized Row Columnar
  Avro
  JSON and Delimited Text
  Data Flow Script
  Summary

Part III: Operationalize Your ETL Data Pipelines
Chapter 9: Basics of CI/CD and Pipeline Scheduling
  Configure Git
  New Factory
  Existing Factory
  Branching
  Publish Changes
  Pipeline Scheduling
  Debug Run
  Trigger Now
  Schedule Trigger
  Tumbling Window Trigger
  Storage Events Trigger
  Custom Events Trigger
  Summary
Chapter 10: Monitor, Manage, and Optimize
  Monitoring Your Jobs
  Error Row Handling
  Partitioning Strategies
  Optimizing Integration Runtimes
  Compute Settings
  Time to Live (TTL)
  Iterating over Files
  Parameterizing
  Pipeline Parameters
  Data Flow Parameters
  Late Binding
  Data Profiling
  Mapping Data Flow Statistics
  Data Preview Statistics
  Profile Stats
  Power Query Activity
  Transformation Optimization
  byName() and byNames()
  Rank and Surrogate Key
  Sorting
  Database Queries
  Joins and Lookups
  Pipeline Optimizations for Data Flow Activity
  Run in Parallel
  Logging Level
  Database Staging
  Summary

Index
About the Author

Mark Kromer has been in the data analytics product space for over 20 years and is currently a Principal Program Manager for Microsoft’s Azure data integration products. Mark often writes and speaks on big data and data analytics, and was an engineering architect and product manager for Oracle, Pentaho, AT&T, and Databricks before joining Microsoft Azure.
About the Technical Reviewer

Andy Leonard is a husband, dad, and grandfather; creator of – and Data Philosopher at – DILM Suite for Data Integration Lifecycle Management (dilmsuite.com); a blogger (andyleonard.blog); founder and Chief Data Engineer at Enterprise Data & Analytics (entdna.com); an SSIS and Azure Data Factory trainer, consultant, and developer; a SQL Server database and data warehouse developer; and an author, mentor, engineer, and farmer.
Part I: Getting Started with Azure Data Factory and Mapping Data Flows
1. ETL for the Cloud Data Engineer
Mark Kromer, Snohomish, WA, USA
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
M. Kromer, Mapping Data Flows in Azure Data Factory, https://doi.org/10.1007/978-1-4842-8612-8_1

In the modern business data ecosystem, “digital transformation” is one of the most prominently used terms to describe the shift of traditional technology practices to cloud and big data approaches. The term has become ubiquitous in IT and has come to represent the embrace of cloud and big data technologies in the data engineering world. The data part of this digital business transformation puts data engineers at the center of the data processing value chain. The challenge data engineers face is how to effectively extract, transform, and load massive amounts of new data points that are often unwieldy in nature. That means we have to update our ETL processes to meet these new cloud-first big data approaches. Digital transformation is crucial for businesses that need to compete and grow under today’s cloud-first IT strategies, so let’s dig into how to adjust and build comparable solutions in Azure using ADF and Mapping Data Flows.

General ETL Process

Figure 1-1 is an example of a general ETL process from traditional on-premises projects, where your sources are highly governed data, such as data that originates from SAP, database tables, and file extracts that abide by well-known contracts.
Figure 1-1 Traditional ETL general process

As a data engineer working on cloud-first projects in Microsoft Azure, you’ll employ a process similar to the diagram in Figure 1-2, which differs only slightly from the concepts shown in Figure 1-1. But the details in each step bring about a significant amount of change, which will be the topic of the ADF-specific chapters to come. At the end of the day, the objective of preparing data for business decision makers, who will use business intelligence tools, SQL queries, Excel, data science tools, and other decision-oriented tooling, is no different from traditional on-premises scenarios with highly curated data sources and targets.

Figure 1-2 A general example of the ETL process in Azure
In both of these instances, the consumers of the analytics are analysts who build the reports from which actual business value is derived for business decision makers. For the data to be useful, data engineers, data scientists, and citizen data integrators must contribute in a governed way to refining raw data into business-friendly models for exploration and reporting.

Differences in Cloud-Based ETL

We’ll need a common understanding of what we are achieving in this book, so let’s dive into this process in detail and identify some of the ways cloud-based ETL in Azure differs from similar traditional on-premises ETL projects:

1. Raw data
a. Much of the data extracted in big data cloud ETL will be of unknown quality and can change shape and size dramatically between job executions. In ADF, we’ll make use of the Copy Activity and Data Flow Activity connectors, linked services, and datasets. In traditional data warehouse scenarios, you may have found that all of your business data resides on-premises, inside the network confines of your business. Additionally, that data has often already been curated and refined through a data quality process. Do not make such assumptions about data that you’ll land in the data lake. The details of the different ADF components will come in the next set of chapters.
2. Staging layer
a. This is where we will land an initial, lightly transformed snapshot of the source data in a landing zone in the data lake. For most of the demo scenarios in the book, we’ll land the data in Azure Data Lake Storage Gen2 (ADLS Gen2, or simply ADLS). If you’ve previously designed data warehouses with an ODS model or used database tables as staging tables, the staging layer in the data lake is the analogous concept. Because the data volumes here are expected to be very large, we will implement incremental data loading patterns in ADF rather than attempt to extract the entire set of data every time.
3. Transform
a. This is where we will spend a lot of our time and attention in this book, using the code-free Mapping Data Flows feature in ADF. We’ll build data flows that perform all kinds of data transformations to prepare the data for consumption by our target users. We’ll derive columns, aggregate data, design slowly changing dimension handlers, and build many more data transformations. A key difference you may find in the transformation layer compared with traditional ETL projects is that the data will not always be tabular and relational in shape. Rather than expecting to receive database table connections and CSV files, we will need to work with big data native file formats like Parquet, Avro, JSON, and ORC. That can make transformations tricky when you begin to work with arrays, maps, structures, and hierarchies.
4. Serving layer
a. The serving layer is a data store, generally a database such as Azure SQL Database or Cosmos DB. You will also often use an analytical database such as Azure Synapse, Snowflake, or other database targets. Another option is to simply leave the data in ADLS but use Delta Lake folders as a way to organize your data and provide CRUD operations on your analytical data. We’ll talk about all of these options in the book, including the benefits of each in terms of cost and effectiveness for consumption by business users.
5. Presentation layer
a. As mentioned earlier, this is where the business users live and how they will access the refined data to make business decisions. Business intelligence tools will use the resulting models from the ETL process to create reports and dashboards. The end-user interaction with the resulting data does not change dramatically with modern data approaches to ETL. However, you should keep in mind the requirements of the BI tools that will be used. Not all BI tools and business-decision tools can read and work with data in the lake or data stored in formats like Parquet and Delta Lake.
6. Orchestrate and monitor data pipelines in ADF
a. When thinking about a scalable framework to build and manage complex ETL jobs, it is critical to consider operationalization requirements. In this diagram, I particularly call out orchestration and monitoring of the pipelines. The orchestration piece is not specific to ADF, but I will only reference ADF techniques in this book. Orchestration rests on many underlying facilities that are very important and that we’ll need to touch on: for example, scheduling jobs, managing the software development life cycle, version control, and CI/CD, all of which we’ll dive into in later chapters. With common legacy ETL tools, you likely already have most of these capabilities. I believe, however, that providing a level of governance in the big data cloud world is even more important, because the modern data estate can be much more complex than traditional environments. After your pipelines have been scheduled, you need a mechanism to monitor the health of your ETL jobs. We’ll walk through setting up alerts and day-to-day monitoring of tasks in ADF later.

Let’s dig into each area of the ETL process, starting with the raw data. In modern cloud-first big data ecosystems, raw data is going to be quite varied and will range from traditional relational database tables with well-defined schemas to raw JSON files with changing properties. You should always expect the unexpected and design your data extraction logic defensively.
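The staging-layer item above calls for incremental loading instead of re-extracting the entire data set on every run. One widely used way to do that is a high-watermark pattern. The Python below is a minimal, hypothetical sketch of that idea, not ADF code: it assumes the source exposes a last-modified timestamp (called `modified_at` here) and that the previous run’s watermark is persisted somewhere between runs.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class SourceRow:
    id: int
    modified_at: datetime  # hypothetical last-modified column on the source


def incremental_extract(rows: Iterable[SourceRow],
                        watermark: datetime) -> tuple[list[SourceRow], datetime]:
    """Return only rows changed since the last run, plus the watermark to store next."""
    changed = [r for r in rows if r.modified_at > watermark]
    # If nothing changed, keep the old watermark so the next run starts from the same point.
    new_watermark = max((r.modified_at for r in changed), default=watermark)
    return changed, new_watermark


source = [SourceRow(1, datetime(2022, 1, 1)),
          SourceRow(2, datetime(2022, 1, 5)),
          SourceRow(3, datetime(2022, 1, 9))]
batch, watermark = incremental_extract(source, datetime(2022, 1, 3))
print([r.id for r in batch], watermark)
# [2, 3] 2022-01-09 00:00:00
```

Calling `incremental_extract` again with the returned watermark yields an empty batch until the source changes. The strict `>` comparison assumes timestamps distinguish changes well enough; a source with coarse timestamps may need `>=` plus deduplication.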
You may choose to tell ADF to fail your pipeline when attributes or data domains are not within a specified set of constraints. Or you can use the built-in concepts of “schema drift” and “data drift” to create a more resilient pipeline that evolves with changing source data. Schema drift occurs when the expected data schema evolves unexpectedly, through columns being added, removed, or changed. In ADF, you can switch on schema drift handling very easily, which tells ADF to accept new or evolving columns. This handling of evolving source data creates a very resilient pattern in which your ETL processes will not fail because new columns have been detected. However, it can also hide underlying issues with the source data that you may wish to tag as data quality errors. You will need to make the architectural decision either to fail when the incoming schema breaks the existing contract or to continue processing.

Data Drift

Similar to metadata schema drift, data drift occurs when values inside existing columns begin to arrive outside of a set domain or boundaries. In ADF, you can establish “Assert” expectations that define data ranges. When the data breaches those domains or ranges, you can fail the job, or you can tag the rows as data quality errors and make downstream decisions on how to handle those errors. For example, you can decide to output an alert, redirect the rows to an error log, or simply ignore the failures and continue processing.

The staging layer is where you will land data from sources into the data lake. In the past, you may have used temporary tables in a database as the staging area, where you would quickly land raw data without transformation. Within Azure, we’re going to use ADLS Gen2 with that same staging analogy. You are going to land your data into ADLS “Containers,” which is where you’ll define your folder strategy.
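To make the fail-versus-continue decision concrete, here is a small Python sketch of the logic that ADF’s schema drift setting and Assert expectations express declaratively. It is not ADF code: the `EXPECTED_SCHEMA` contract, the salary range, and the function names are hypothetical, chosen only to show schema drift (added, removed, or retyped columns) being either tolerated or treated as fatal, and data drift (out-of-range values) being tagged and redirected instead of failing the batch.

```python
from __future__ import annotations

# Hypothetical source contract for raw employee rows (column -> expected type).
EXPECTED_SCHEMA: dict[str, type] = {"emp_id": int, "name": str, "salary": float}


def schema_drift(row: dict) -> dict[str, set[str]]:
    """Compare one incoming row against the contract and report any drift."""
    incoming, expected = set(row), set(EXPECTED_SCHEMA)
    retyped = {c for c in incoming & expected
               if not isinstance(row[c], EXPECTED_SCHEMA[c])}
    return {"added": incoming - expected,
            "removed": expected - incoming,
            "retyped": retyped}


def check_batch(rows: list[dict], allow_drift: bool = True,
                salary_range: tuple[float, float] = (0.0, 500_000.0)
                ) -> tuple[list[dict], list[dict]]:
    """Split rows into (good, errors).

    Schema drift either passes through (allow_drift=True) or fails the batch.
    Data drift, meaning a salary outside salary_range, tags the row as an
    error so it can be redirected to an error log instead of failing the run.
    """
    good: list[dict] = []
    errors: list[dict] = []
    low, high = salary_range
    for row in rows:
        drift = schema_drift(row)
        if not allow_drift and any(drift.values()):
            raise ValueError(f"schema contract broken: {drift}")
        salary = row.get("salary")
        in_domain = isinstance(salary, (int, float)) and low <= salary <= high
        (good if in_domain else errors).append(row)
    return good, errors
```

With the default `allow_drift=True`, a row carrying an unexpected `dept` column still flows through while a negative salary lands in the error list; passing `allow_drift=False` turns that new column into a hard failure, which mirrors the architectural choice described above.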
In big data storage, folders are very important because runtimes like Spark can use them to define file partitioning strategies. A very common methodology is to create folders based on dates. For example, to store raw employee data, create a folder structure like this: MyContainer/RawData/Emp/YYYY/YYYYMM.
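The date-based layout above is easy to generate programmatically. The Python helpers below are a hypothetical sketch (not an ADF or ADLS API): `raw_folder` builds the MyContainer/RawData/Emp/YYYY/YYYYMM path for a given load date, and `month_folders` lists the monthly folders covering a date range, which is the kind of slice an incremental run might target.

```python
from datetime import date


def raw_folder(container: str, entity: str, load_date: date) -> str:
    """Return <container>/RawData/<entity>/YYYY/YYYYMM for the given load date."""
    # date objects accept strftime codes directly inside f-string format specs.
    return f"{container}/RawData/{entity}/{load_date:%Y}/{load_date:%Y%m}"


def month_folders(container: str, entity: str, start: date, end: date) -> list:
    """List the monthly folders covering start..end (inclusive)."""
    folders = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        folders.append(raw_folder(container, entity, date(year, month, 1)))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return folders


print(raw_folder("MyContainer", "Emp", date(2022, 7, 15)))
# MyContainer/RawData/Emp/2022/202207
```

Because the year folder sits above the month folder, a range that crosses a year boundary simply spans two parent folders.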
The format of that folder structure inside your Azure Data Lake would look like Figure 1-3.

Figure 1-3 Example folder structure

Folder partitioning helps with carving out portions of the lake for incremental processing and with partition elimination at query time, improving the performance of the Spark engine, which is the execution engine that we’ll use in ADF for data transformation. Another common way to optimize your data lake folders for processing is to use key/value pairs to store the unique values in your data as folders, with the data residing in the leaf-level Parquet files, as in Figure 1-4.

Figure 1-4 Key value folder partitioning

In this example, my output data contains the columns “releaseyear” and “month”. I’ve created a folder for every unique “releaseyear” and every unique “month” value in my data using the format releaseyear=yyyy/month=mm. The files residing at the leaf level of that folder structure are in Parquet format and, in this particular example, have the friendly name moviesoutnew.parquet. But you cannot assume that files written by ADF, or by Spark generally, will use readable names like that. In fact, in most cases it is more efficient to let Spark name the files based on the job process ID. Don’t be surprised to find many files with GUID names in your folders after executing your data pipelines. Throughout this book, we’ll use samples that output partitioned Parquet folders, and we’ll configure ADF to create that folder structure automatically.

The transform layer is the topic we’ll focus on in depth in the coming chapters. This is where your data transformation logic will reside. In later chapters, we’ll design code-free graphs that perform common ETL operations like slowly changing dimensions, data cleansing, aggregations, fact loading, and data preparation. These patterns are common throughout the history of ETL and data engineering, and we’ll update them for the modern data landscape. In this book, we’ll touch on data partitioning strategies, pushdown optimizations, cluster distributions, and other topics specific to big data in the cloud, including making use of Parquet data formats. Parquet is a columnar, highly compressed file format that is very efficient when used for analytics with Spark, and you’ll come across this format throughout the book. For the purposes of ETL in ADF Mapping Data Flows, assume Parquet is the default format in which you’ll land your data results in the lake.

Landing the Refined Data

Now that your data has been prepped, cleaned, and transformed, you will land the refined data in an analytical data store to make it available to your end users. This is known as the serving layer (not the server layer), and typically a database is used as the data store in this layer. The biggest change here is that the traditional relational database may be replaced by a cloud-first database, a NoSQL data store, or even just files in the lake. This is where the partitioned Parquet folder techniques described earlier come into focus: the serving layer can remain a data lake, with a computation engine (likely SQL based) serving queries to the presentation layer.

Now that all of the hard work of the data engineer is complete and the ETL project has been established, we reach the top of the analytics value chain: the presentation layer. The data engineer has successfully performed the ETL process of refining raw data into consumable business data for decision makers and analysts, who will use tools like Power BI, Excel, Looker, etc., to build reports and dashboards with business metrics and KPIs.
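The key/value folder layout from Figure 1-4 is what lets an engine serving queries straight from the lake skip folders it does not need. The Python sketch below only simulates that idea in memory: it groups rows into `releaseyear=yyyy/month=mm` style paths, dropping the partition columns from each row payload (mirroring how Hive-style partitioned writes typically encode those values in the folder names instead), and then prunes folders by matching key=value segments. The function names and the `movies` root are hypothetical; in practice Spark and ADF write the folders and files for you.

```python
from __future__ import annotations

from collections import defaultdict


def partition_rows(rows: list[dict], root: str, keys: list[str]) -> dict[str, list[dict]]:
    """Group rows into key=value folders such as root/releaseyear=1999/month=03."""
    folders: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        path = "/".join([root] + [f"{k}={row[k]}" for k in keys])
        # Partition values are encoded in the path, so drop them from the payload.
        folders[path].append({c: v for c, v in row.items() if c not in keys})
    return dict(folders)


def prune(folder_paths: list[str], **wanted: object) -> list[str]:
    """Partition elimination: keep only folders whose key=value segments match."""
    kept = []
    for path in folder_paths:
        segments = dict(s.split("=", 1) for s in path.split("/") if "=" in s)
        if all(segments.get(k) == str(v) for k, v in wanted.items()):
            kept.append(path)
    return kept


movies = [  # month is a zero-padded string so folders match month=mm
    {"title": "A", "releaseyear": 1999, "month": "03"},
    {"title": "B", "releaseyear": 1999, "month": "04"},
    {"title": "C", "releaseyear": 2001, "month": "03"},
]
layout = partition_rows(movies, "movies", ["releaseyear", "month"])
print(prune(list(layout), releaseyear=1999))
# ['movies/releaseyear=1999/month=03', 'movies/releaseyear=1999/month=04']
```

A query filtered on `releaseyear` only has to open the matching folders; everything under `releaseyear=2001` is never read.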
Another important audience for analytics in these scenarios will be data scientist. They may use tools to build data models, additional data wrangling, and data exploration using Jupyter Notebooks, SQL queries, or data wrangling tools. Both are target personas for the analytics that you have generated from your ETL jobs.
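To make the key/value folder layout concrete, here is a minimal Python sketch (standard library only) that builds folder paths in the releaseyear=yyyy/month=mm format from row values and then parses them back to skip folders that cannot match a filter, which is the idea behind partition elimination. The function names and sample rows are hypothetical; in practice ADF and Spark create and prune these folders for you, so this only illustrates the naming convention and why it helps.

```python
from typing import Dict, Iterable, List


def partition_path(values: Dict[str, str]) -> str:
    """Join ordered key/value pairs into a folder path, e.g. 'releaseyear=1995/month=07'."""
    return "/".join(f"{key}={value}" for key, value in values.items())


def parse_partition_path(path: str) -> Dict[str, str]:
    """Recover the partition values from the key=value folder segments of a file path."""
    values: Dict[str, str] = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def prune(paths: Iterable[str], **predicate: str) -> List[str]:
    """Partition elimination: keep only files whose folder values match every predicate."""
    return [
        p for p in paths
        if all(parse_partition_path(p).get(k) == v for k, v in predicate.items())
    ]


# Hypothetical rows; month is zero-padded to match the releaseyear=yyyy/month=mm format.
movies = [
    {"title": "Movie A", "releaseyear": 1995, "month": 7},
    {"title": "Movie B", "releaseyear": 1995, "month": 12},
    {"title": "Movie C", "releaseyear": 2001, "month": 7},
]
files = [
    partition_path({"releaseyear": str(m["releaseyear"]), "month": f"{m['month']:02d}"})
    + "/moviesoutnew.parquet"
    for m in movies
]
print(files)
print(prune(files, releaseyear="1995"))
```

A query that filters on releaseyear only has to open the files under the matching folders; the other folders are eliminated from the path alone, without reading any Parquet data.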
Typical SDLC

Let’s take a look at what the software development life cycle (SDLC) looks like for a typical ETL project with ADF in Figure 1-5.

Figure 1-5 SDLC for ADF ETL pipeline projects

We’ll walk through the configuration needed to connect your data factory to Azure DevOps for Git support in later chapters. But for this conceptual discussion, just focus on the distinct steps that you should follow to produce quality ETL jobs that meet your user requirements.

1. Gather business requirements.
   a. Where do you start with an ETL project? Start by talking with your end users: the business analysts and data scientists represented in the presentation layer earlier. Essentially, you’ll want to deeply understand the consumers of your refined data results and the analytics that they need to drive the business with their reports. Ask what data is important to making the right decisions and building the best models. Discuss ways to aggregate complex data and summarize it into business semantics. Then begin tracking down the sources of the data points you’ll need in order to provide the results they’re looking for. This exercise should result in a list of the required data for your ETL jobs to produce, as well as a list of the data sources and access credentials required to get to the source data. Once you’ve listed all of the sources required and the analytical results you need to produce, you’re ready to start designing.
2. Design ETL pipelines in a new Git branch.
   a. We’ll walk through building a new branch in Git from ADF later, but you can think of this as your first step on a new ADF project. You’ll work from a new branch as your sandbox environment. Never develop new pipelines against the live ADF service or in an existing branch: you risk losing work and damaging existing, working code. Now this is where the fun begins! We’ll talk in detail soon about how to build code-free graphs for data transformation pipelines.
3. Unit test, debug, and perform user acceptance testing.
   a. Testing your pipelines before releasing them to your production factory is a critical element of an ETL project. It is also another great reason to leverage Git in your ADF project: you can work in a separate development branch, which makes it much easier to test before deploying to production. We’ll talk about testing strategies, debugging, previewing results, and other important factors in this step.
4. Publish from the main branch.
   a. After all tests have passed, your next step is to deploy your new pipelines to production. You’ll merge your current development branch into the collaboration branch and publish the updates to the live ADF service from your main branch.
5. Operationalize and monitor.
   a. The last set of operations to undertake includes setting a schedule for your pipeline and monitoring the results. We’ll walk through the different types of schedules that most effectively meet the update cadence uncovered in your business requirements step. Then we’ll set alerts for pipeline failures and check the status of our ETL jobs over time.
Summary

We began our journey by learning the fundamentals of ETL in the cloud for data engineers using Azure Data Factory’s Mapping Data Flows. Now that we have a clear understanding of the ETL process in Azure, let’s begin diving into Azure Data Factory and apply these principles to our first factory.
© The Author(s), under exclusive license to Apress Media, LLC, part of Springer Nature 2022
M. Kromer, Mapping Data Flows in Azure Data Factory
https://doi.org/10.1007/978-1-4842-8612-8_2

2. Introduction to Azure Data Factory

Mark Kromer, Snohomish, WA, USA

Azure Data Factory is the Microsoft Azure cloud service that data engineers use to build, schedule, and execute data integration and extract, transform, and load (ETL) processes. In this chapter, we’ll focus on how to build cloud-first ETL projects using ADF’s Mapping Data Flows code-free data transformation features.

What Is Azure Data Factory?

But first we need to start with a fundamental overview of the ADF service and its components, since it is important to understand the ADF UI before we begin. You will need an Azure subscription, and you will need to follow the steps to create a new data factory from the Azure portal. We’ll walk through those steps at the end of the book when we create a sample project. For now, let’s start by looking at the primary high-level concepts in ADF and become a bit more familiar with the pipeline designer, as shown in Figure 2-1.
Figure 2-1 The ADF web-based user interface

Let’s dig into each one of these high-level concepts in more detail. Figure 2-2 introduces each of the data factory resources, which are described next.

Figure 2-2 Azure Data Factory concepts

Factory Resources

The resource explorer on the far left of the ADF UI lists each of the top-level artifacts in an ADF factory. We’re going to skip over the Power Query and Templates high-level artifacts in this book. Instead, let’s talk a bit about each of the primary ADF artifacts that are important for building ETL jobs.

Pipelines

The primary unit of work in ADF is the pipeline. Pipelines drive all of the actions that your data integration and ETL jobs perform. A pipeline is essentially a collection of activities that you connect together in a meaningful pattern to create a workflow. All actions in ADF are scheduled through a pipeline execution, including the Mapping Data Flows that we’ll build in this book.

Activities

Pipelines are constructed from individual activities. Each activity defines an individual action that you wish to perform, and there are many activities that you can use to compose a pipeline. Examples include copying data (Copy activity), transforming data (Mapping Data Flows), and control flow activities such as “For Each” and “If Condition.” You can also call out to external compute activities like Databricks Notebook and Azure Functions to execute custom code. For this book, we’re going to focus on the data flow activity and building Mapping Data Flows for ETL jobs.

Triggers

Triggers set the conditions under which your pipeline executes. You can create schedule, tumbling window, storage event, and custom event triggers. The most common type is the schedule trigger, which lets you set the execution frequency and times for your pipeline. Tumbling window triggers fire on a series of fixed-size, non-overlapping time intervals: ADF establishes windows of time for the recurrence that you choose, starting on the date that you choose. Storage event triggers run your pipeline when a file arrives in, or is deleted from, a storage account. The final type is the custom event trigger: you can create custom topics in Azure Event Grid and then subscribe to those events. When a specific event is received by your custom event trigger, your pipeline is triggered automatically.

Mapping Data Flows

This is the code-free data transformation feature that we’ll focus on for the rest of this book. Mapping Data Flows has its own browser-based designer that opens when you create a new data flow. This is where we’ll design data transformation graphs. You execute a data flow from a pipeline by adding the data flow activity to your pipeline and then choosing which data flow to execute.

Linked Services

You will use linked services to store the credentials, location, and authentication mechanisms used to connect to your data. Datasets and activities in ADF pipelines use linked services to determine where and how to connect to your data, and you can share linked service definitions across objects in your factory.

Datasets

Datasets define the shape of your data. In ADF, datasets do not contain or hold any data. Instead, they point to the data and provide ADF with information about its schema. In ADF, your data does not require a schema: you can work with data in a schema-less manner. When you build ETL jobs using schema-less datasets, you are building data flows that use “late binding” and handle “schema drift.” This is a very powerful and flexible concept that we’ll talk about later; it means that your dataset is not required to hold a specific schema at all.

Azure Integration Runtime

Throughout the book, I’ll refer to the Azure Integration Runtime as the Azure IR or sometimes simply as the IR. This is a configuration object stored in the ADF metastore that defines the location and type of compute that you’ll use for the parts of your pipeline that require computation. This can mean VMs for copying data or executing SSIS (SQL Server Integration Services) packages, or the cluster size and type for Mapping Data Flows. We’re not going to talk about SSIS in this book, but it is a very powerful feature in ADF: you can take your existing SSIS packages from SQL Server and execute them in the cloud using an ADF pipeline, with the SSIS Integration Runtime providing SSIS compute on VMs in a fully managed environment. The Azure IR also has a Vnet option that allows you to execute your pipelines using compute resources inside protected networks. This is a very good option if you work in a highly regulated industry or if your corporate network policies require all services to run inside Vnets. ADF is a fully managed platform-as-a-service (PaaS) offering, so you do not manage any servers. Because the integration runtimes define the compute you wish to use for pipeline and data flow execution, this is where you specify that you need to execute in a protected network. Mapping Data Flows, which we will dig into throughout this book, execute on the Spark compute that you specify in the Azure IR. When we get to building our first pipeline with data flows, we’ll talk about optimizations and details of the IR. The Azure IR is a fully managed, serverless microservice inside ADF that runs in the cloud. However, you can also configure its networking to connect to your on-premises data sources by peering your network with the Vnet created for your Azure IR. When executing data flow activities in an ADF pipeline, you can use that technique if your data is not in Azure.
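The tumbling window behavior described under Triggers can be sketched as a small calculation: starting from a chosen date, the timeline is cut into contiguous, fixed-size, non-overlapping intervals, and a window becomes eligible to run once it has fully elapsed. The Python below is a hypothetical illustration of that arithmetic under those assumptions, not how ADF schedules runs internally; the function name and parameters are invented for the example.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def tumbling_windows(start: datetime, interval: timedelta, now: datetime) -> List[Window]:
    """Return the fixed-size, contiguous, non-overlapping windows [start, end)
    beginning at `start` that have fully elapsed by `now`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    windows: List[Window] = []
    window_start = start
    while window_start + interval <= now:
        windows.append((window_start, window_start + interval))
        window_start += interval  # next window starts exactly where this one ended
    return windows


windows = tumbling_windows(datetime(2022, 1, 1), timedelta(hours=6), datetime(2022, 1, 2, 3))
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

With a 6-hour interval starting at midnight on January 1 and a current time of 03:00 on January 2, four windows have closed; the window from 00:00 to 06:00 on January 2 has not finished yet, so it is not returned. In ADF, each window’s boundaries are exposed to the pipeline through the trigger outputs windowStartTime and windowEndTime, which is what makes this trigger type a natural fit for incremental loads.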
Self-Hosted Integration Runtime

Another approach to executing ADF pipeline activities in a private network, or to connecting to on-premises data, is the self-hosted integration runtime, or SHIR. This is a software download that you install on-premises or on a virtual machine that has visibility to your data. ADF communicates with the SHIR in order to access data in your data center. The self-hosted IR is not supported by Mapping Data Flows, so for data flows you’ll instead use the Vnet option in the Azure IR mentioned earlier. Management of all IRs is located in the Manage section of the left-hand navigation panel in the ADF UI (see Figure 2-3).

Figure 2-3 Management screen for Integration Runtimes

Elements of a Pipeline

The ADF pipeline is the most fundamentally important artifact in your factory, so let’s dig into a pipeline first. Figure 2-4 shows an example of a very simple pipeline. It is made up of five activities, interconnected by green directional edges. The Data Flow activity has both a green and a red connector emanating from it: ADF takes the red path if the activity execution fails, and the green path signifies success. The flow of execution in an ADF pipeline is left to right. If you add activities without connecting lines, those disconnected activities execute in parallel, at the same time as the first node in your connected graph.
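The green (success) and red (failure) paths, and the parallel start of disconnected activities, can be modeled as a small dependency-driven executor. The Python below is a simplified, hypothetical model written for illustration: each activity is a callable returning True on success plus a list of (upstream activity, required outcome) pairs, and an activity whose condition is not met is marked Skipped. ADF also supports other dependency conditions (such as Completed and Skipped) that this sketch does not model, and the activity names are invented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: an activity is a callable that returns True on success,
# plus a list of (upstream activity name, required outcome) dependencies.
Activity = Tuple[Callable[[], bool], List[Tuple[str, str]]]


def run_pipeline(activities: Dict[str, Activity]) -> Dict[str, str]:
    """Run activities in dependency order and return each outcome:
    'Succeeded', 'Failed', or 'Skipped' (its dependency condition was not met)."""
    outcome: Dict[str, str] = {}
    while len(outcome) < len(activities):
        # An activity is ready once all of its upstream activities have finished;
        # activities with no dependencies are ready in the first wave.
        ready = [
            name for name, (_, deps) in activities.items()
            if name not in outcome and all(up in outcome for up, _ in deps)
        ]
        if not ready:
            raise ValueError("cycle or unknown upstream activity")
        for name in ready:  # everything in one wave could run in parallel
            func, deps = activities[name]
            if all(outcome[up] == required for up, required in deps):
                outcome[name] = "Succeeded" if func() else "Failed"
            else:
                outcome[name] = "Skipped"
    return outcome


pipeline: Dict[str, Activity] = {
    "Get files": (lambda: True, []),
    "Data flow": (lambda: False, [("Get files", "Succeeded")]),  # simulated failure
    "Load": (lambda: True, [("Data flow", "Succeeded")]),        # green (success) path
    "Send alert": (lambda: True, [("Data flow", "Failed")]),     # red (failure) path
    "Audit log": (lambda: True, []),                             # disconnected: starts in the first wave
}
print(run_pipeline(pipeline))
```

Because the simulated data flow fails, the activity on its green path is skipped and the one on its red path runs, while the disconnected “Audit log” activity starts alongside the first node.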
Figure 2-4 Sample ADF pipeline

This sample pipeline calls a data flow to process type 2 slowly changing dimensions (SCDs) in a For Each loop. The Get Metadata activity at the start of the pipeline uses a dataset called “genericfolder” (Figure 2-5). This dataset points to a folder in my Azure Blob Storage account, and the pipeline loops through each of the files in that folder, processing a different file for each dimension in the target analytical model.

Figure 2-5 Dataset called “genericfolder”

The definition of the blob storage account is stored in the Linked Service property of the dataset. In the linked service settings, you set the authentication method and provide the credentials for the dataset to use when connecting to your data. In my example, “AzureBlobStorage1”, I’m using the account key authentication method in the linked service (see Figure 2-6).

Figure 2-6 Linked service

Note that in the dataset “genericfolder” in Figure 2-5, I am pointing to a folder in my blob store located at mycontainer/SampleData. This tells ADF to find all files in that folder and return the list to the Get Metadata activity. That metadata is then available to the next activity in the pipeline, which is a For Each activity. The For Each iterates over each item in a collection; in this case, the collection is the list of files found in the folder by the Get Metadata activity. To set the iterator inside the For Each, reference the name of the Get Metadata activity (here, “Get files”) and access the output.childItems array from that activity: @activity('Get files').output.childItems. That array contains the list of files from the dataset folder.

The formulas you write in the ADF pipeline expression editor are known as the pipeline expression language (https://docs.microsoft.com/azure/data-factory/control-flow-expression-language-functions). To enter expressions, click the “Add dynamic content” link next to the properties and fields in the ADF pipeline designer that allow custom expressions. The expression editor slides in (see Figure 2-7), and this is where you enter your expression. To enter the earlier expression for the For Each iterator, click Settings ➤ Items in the For Each activity settings panel.
Figure 2-7 Pipeline expression editor

Figure 2-8 Get metadata settings

Now that the For Each is connected to the Get Metadata activity (Figure 2-8), you can add activities inside the For Each. Those activities execute once for each item in the list. Inside the For Each is a data flow activity (Figure 2-9) that executes the first of two Mapping Data Flows. This first data flow cleans and preps the data from each of the files. You’ll learn how to build data cleaning, data quality, and data prep data flows later in this book. To add activities to the For Each, you can double-click the node in the pipeline graph.
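The Get Metadata and For Each pattern can be mimicked in a few lines of Python to show what @activity('Get files').output.childItems hands to the loop. This is a hypothetical sketch: the JSON below is an invented example shaped like a Get Metadata result with the Child items field requested (each entry has a "name" and a "type" of "File" or "Folder"), the file names are made up, and run_data_flow is a stand-in for the data flow activity. The filter on type is an added assumption, since childItems can also list subfolders.

```python
import json
from typing import Callable, Dict, List

# Hypothetical Get Metadata output for mycontainer/SampleData with "Child items" requested.
get_files_output = json.loads("""
{
  "childItems": [
    {"name": "DimCustomer.csv", "type": "File"},
    {"name": "DimProduct.csv", "type": "File"},
    {"name": "archive", "type": "Folder"}
  ]
}
""")


def for_each(items: List[Dict[str, str]], body: Callable[[Dict[str, str]], str]) -> List[str]:
    """Run the body once per item, the way activities inside a For Each run once per @item()."""
    return [body(item) for item in items]


def run_data_flow(item: Dict[str, str]) -> str:
    # Stand-in for the data flow activity; a real pipeline could pass
    # @item().name to the data flow as a parameter.
    return f"cleaned mycontainer/SampleData/{item['name']}"


# Rough equivalent of Items = @activity('Get files').output.childItems, keeping only files.
files = [i for i in get_files_output["childItems"] if i["type"] == "File"]
results = for_each(files, run_data_flow)
print(results)
```

Each file name becomes one iteration, so the data flow runs once per dimension file, which matches the once-per-item behavior described above.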
after returning home met an accidental death by a falling tree. The ancestral home of the Needhams is near Frankfort, Ky. The Gundy family is held in high esteem in their home county and the members of the family are well respected by their friends and acquaintances. Charles T. Gundy was educated in the rural schools and attended the Memphis Academy for one year. Circumstances were such that he found it necessary to do considerable studying at home and “burned the midnight oil” in the pursuit of an education. He fitted himself for teaching and taught for four years in the schools of his native county. In the meantime he read law and was successful in being admitted to the bar in 1902. For three years thereafter he practiced his profession in Memphis. He then secured a Government position in the postoffice department at Washington, D. C., and pursued his law studies in the National University at Washington. He graduated from that institution May 30, 1908. Having small desire to become a mere cog in a great machine, as seemed to be the lot of thousands of Government employes, he resigned his position in October of the same year and located in Keokuk, Iowa, and had charge of the farm loan department of the State Central Savings Bank. He resigned this position in March of 1910 and came to Atchison, opening an office in the Auld building on Commercial street. Since this time he has built up an excellent practice. He was appointed city judge in December of 1910 to fill a vacancy caused by the resignation of Judge J. P. Adams. He was elected to the office in 1912 and again elected in 1914. Judge Gundy was united in marriage with Eleanor M. McCormick on August 12, 1909. Mrs. Gundy was a resident of Washington, D. C., and is a daughter of John McCormick, who died in 1905. Judge Gundy is a member of the Baptist church and he and Mrs. Gundy have a wide circle of friends who esteem them for their many likable qualities.
The Republican party has always claimed the allegiance of Judge Gundy and he takes an active and influential interest in political affairs.
LOUIS R. KUEHNHOFF.

Louis R. Kuehnhoff, farmer and stockman, of Lancaster township, Atchison county, Kansas, was born January 1, 1880, on the farm where he now resides. He is a son of Charles and Caroline Kuehnhoff, and is one of nine children, six of whom are living. The father was born in Germany in 1841, and left there when a boy of sixteen years and sailed for New York. He remained there a short time when he went west, arriving at St. Joseph, Mo. He had not been there very long when the Civil war broke out and he enlisted at St. Joseph in Company B of the Volunteer infantry. After the war was over he was mustered out at Lexington, Mo., having won a praiseworthy military record in his country’s service. He then returned to civil life in St. Joseph, Mo., where he worked for a time as a laborer, receiving eight dollars a month. Shortly afterward he came to Atchison county, Kansas, and bought eighty acres of land in section 10, Lancaster township. Using oxen, he broke the ground on his newly acquired farm and began to improve it as rapidly as his resources would permit. In 1894 he retired and went to live at the National Soldiers’ Home at Leavenworth, Kan., where he died in 1903. The mother was born in Germany in 1845, and died in 1899. Louis R. Kuehnhoff grew up on his father’s farm, and attended Eden district school, and also District No. 3, Lancaster township. He remained at home until he was nineteen years of age, and the next five years worked as a farm hand, and then he bought the old home place of 200 acres. Louis Kuehnhoff is an industrious worker. He keeps graded stock of all kinds and takes a special interest in fine mules. He always attends the county fairs in Atchison county and occasionally makes entries. On April 26, 1905, he was married to Lena Werner, who was born in Germany November 2, 1881. Her parents were John and Marie (Earhart) Werner. The father was born in Germany in 1815. He belonged to the Masonic lodge in Germany.
In 1889, when he was quite an old man, he came to America and settled at Leavenworth, where he died in 1891. The mother was born in Germany January 17, 1843, and is now living with her children, of whom there are six, as follows: Adam, teamster, Leavenworth, Kan.; Martha Nolan, deceased; Lizzie Loman, Bowling, Kan.; Katherine Weimer, Wallula, Wyandotte county, Kansas; Lena, wife of Mr. Kuehnhoff, of this review. Mrs. Kuehnhoff attended the Pleasant Ridge school and the German school, north of Potter, Kan. She is a good, loyal, hard-working mother, and has three children: Marie, Edna and Edwin. The last two are twins and are three years old. In politics Mr. Kuehnhoff is independent. He is a member of the Independent Order of Odd Fellows. He is a progressive farmer and is constantly on the lookout for improvements in agricultural methods. He has a fine eight-room house and a large barn equipped with modern conveniences. He also has a stone milk-house which was built by his father years ago. He has a small but thriving orchard and has twelve head of fine cattle. Besides these, he has four horses and a span of excellent mules. Mr. Kuehnhoff takes a lively interest in his stock and in his farm generally.
BENJAMIN FRANKLIN SANDERS.

All honor to the pioneer settlers of Kansas. It was they who broke the way in the unpeopled wilderness and endured the hardships and privations on the frontier of advancing civilization in order that the path of empire might be pushed steadily westward, ever onward toward the setting sun. Their work is done; the halcyon pioneer days when this broad land was but a vast unbroken wilderness of waving prairie grass, dotted here and there with belts of timber along the streams, are no more; towns and cities have sprung up; the locomotive shrieks its way over the ribbon-like rails, hauling the products of the land to the millions in need of sustenance, where once the hardy freighters drove their mule teams and guarded the precious freight overland to the homes of the settlers in the West. Benjamin Franklin Sanders is one of the few remaining members of the “old guard,” who sixty years ago began the task of reclaiming a wilderness. He is one of the ranking old pioneer settlers of Atchison county and has lived a record which is thrilling and interesting to a high degree. He is the only living “ye old time fiddler” in Atchison county, who with his comrade was wont to play at the old-time dances and “hoe downs” in northeast Kansas fifty years and more ago. Benjamin Franklin Sanders is now living retired in Center township, Atchison county. He was born August 8, 1833, in Franklin county, Missouri, and is a son of George and Elizabeth (Graham) Sanders, who were the parents of the following children: Nancy married William McQuillan, and by her second marriage became Mrs. William Burns, and died in Benton county, Missouri; Robert, deceased; Oliver died in Jewell county, Kansas; Lydia married Fred Wilming, and died in Shannon township, Atchison county; William died in Franklin county, Missouri; and Benjamin, the subject of this sketch. Benjamin F. Sanders was sent to the country school in Franklin county, Missouri, but the school was poor and the roads were bad in the winter time, and, altogether, he had little opportunity to learn. His whole time in school, he estimates, did not amount to more than three months. His father was a Kentuckian and followed farming all of his life, and died in 1856, at the age of fifty-five years. The mother was a native of Missouri and of Scotch descent. She died in Kansas, in 1872, at the age of seventy-six years.

[Photos: B. F. Sanders; B. F. Sanders and his great-granddaughter, Gail Maxine Keirns, daughter of Mr. and Mrs. Art Keirns.]

At the age of twelve Benjamin F. Sanders was apprenticed to a carriage and wagon-maker in St. Louis, Mo. He remained there twelve years, coming to Kansas in 1856. He returned to Missouri for a short time and then came back to Kansas the following year. He opened a wagon-maker’s shop at Monrovia, Atchison county, which he operated for two years. He then engaged in farming, taking up a claim near where Effingham now stands. This was ten miles from any settlement then and Mr. Sanders, fearing that the district would not be settled, gave up his claim and preëmpted eighty acres one and one-half miles north of where he now lives, in Center township, and began his life as a real farmer. He hired a man from Iowa who had six yoke of oxen to break up his land. He lived in the most primitive way during the first years on this place. Coffee, for one thing, was very high in price at that time, and there also was very little money in the territory, so a substitute for coffee was used. They mixed wheat and rye, calling it essence of coffee, and used this as a beverage in place of the regular coffee. It was the same way with flour. When he needed flour he would take a quantity of wheat to the gristmill where it would be ground into coarse flour, the nearest mills being at Valley Falls and Kickapoo. His nearest postoffice was at Oceana, just north of Pardee, where the postoffice was located later. In 1860 Mr. Sanders bought more land. At one time he owned as high as 400 acres of land in Center township, Atchison county, Kansas. He went through the whole evolution of civilization, beginning in a little log house on his first eighty acres of land and passed through the wild days of the border war. In 1863 he was a member of Captain Whittaker’s company of Colonel McQuigg’s regiment of the Kansas State militia. He participated in several skirmishes and was honorably discharged at Ft. Leavenworth in 1864. In 1859 Mr. Sanders married Margaret Ramsey in Putnam county, Ohio, who was born in 1840. She was a daughter of John and Elizabeth (Dorothy) Ramsey, natives of Ohio. She died in 1868, leaving the following children: Ira, farmer, Whiting, Kan.; Bertha (Mrs. C. G. Moore), deceased; William and Little Joy, both deceased. Mr. Sanders was married a second time in 1870 to Mrs. Elizabeth (Ramsey) Keirns, a sister of his first wife. She died in May, 1904. She was the widow of Rufus Keirns, and by her last marriage three children were born: Henry R., farmer, Pardee, Kan.; Mrs. Etta C. Browne, Pardee, Kan.; Benjamin, Jr., died when seventeen years of age. Mr. Sanders is a Republican and a member of the Methodist Episcopal church. He is now living with Arthur Keirns, a son of his step-son. In these days his life is rather quiet compared with the early-day existence which he passed through.
Indians camped near his farm when he first came to Kansas. The trail to the Kickapoo reservation passed near his farm and the Indians were constantly traveling back and forth along it. He has a hobby of “fiddling.” He calls himself a “fiddler” in distinction from a violinist. He played at the first corn carnival held in Atchison and won a prize. He used to play with Samuel King, a well known “fiddler,” and they played for all the old “hoe down” or “break down” dances. Although he is eighty-three years old, he still plays his “fiddle” with as much vim as ever and his ear is just as ready as it was when he was a young man. In addition to being a farmer, Mr. Sanders has done a large amount of carpenter work in Kansas. He has built a number of barns and other buildings. Mr. Sanders was elected to the office of township trustee and held the office two terms, having been reëlected at the close of his first term.
KARL AUGUST KAMMER.

Karl August Kammer, farmer and stockman, Lancaster township, Atchison county, Kansas, was born on the farm where he now lives, October 12, 1869, and is a son of Karl and Joehanna (Hida) Kammer. He is one of six children: Joehanna (Gutzman), deceased; Emma (Fuhrman), Lancaster township; Karl, subject of this sketch; Julius, Lancaster township; Bertha H. (Buttron), Lancaster township; one child who died in infancy. The father was born in Germany in 1840. Leaving there in 1862, he came to Atchison county, Kansas, where he worked in a vineyard for two years. The following four years he was employed in a brewery at Atchison, and then farmed two years in Lancaster township. At that time he had a chance to buy 160 acres in section 16 of Lancaster township, and with the aid of a partner, the land was bought. He built a one-room shanty and a thatched barn, and broke prairie with the oxen and planted the first crop. Later a better house and barn were built, and gradually, other improvements were added and a fine orchard planted. At the time of his death, in October, 1910, Mr. Kammer owned 240 acres of land. The mother was born in Germany, February 20, 1840, and married in her native land just before coming to America. She died in 1904. Karl Kammer, the subject of this sketch, was reared on his father’s farm in Lancaster township. He attended High Prairie district school, No. 3, and remained on the home farm until he was twenty-six years old, when he rented some land from his father, and six years later he was able to buy the land he had been renting. He improved the farm considerably and stocked it with graded cattle, and now has an excellent farm, modern in every respect, consisting of 160 acres of land, and also has a fine orchard of two acres. Mr. Kammer was married October 23, 1895, to Emma Buttron, a native of Lancaster township, Atchison county, born August 14, 1870.
She is a daughter of Henry and Rosa (Scheu) Buttron, the father a native of Germany, born in 1833. When a young man he left his native land and came to America, locating in Pennsylvania where he worked as a blacksmith. From there he went to Elgin, Ill., and continued at his trade, and in 1857, he moved to Atchison, Kan., following blacksmithing for a short time. He then preëmpted 160 acres of land in Lancaster township, where he built a house. The first crop was destroyed by grasshoppers, and he was forced to return to his trade during the following winter. When spring came, he went back to his farm and that year was successful and his start was assured. Mr. Buttron bought more land and continued to make improvements, and after a long and prosperous career he died in 1914. Mr. and Mrs. Kammer are the parents of three children: Katherine, Rosa and Henrietta, all living at home with their parents. Mr. Kammer is a Republican, and is a member of the Independent Order of Odd Fellows. Mr. and Mrs. Kammer and family are members of the Evangelical Lutheran church of High Prairie neighborhood.
MARSHALL J. CLOYES.

The demise of Marshall J. Cloyes May 5, 1915, marked the passing of one of the sturdy figures who assisted in developing Atchison county, and was one of the grand old men of the city. At the time of his death he was probably the oldest living pioneer settler of Atchison county, in point of age and years of residence in the county. For over half a century he had been one of the well known and distinguished characters whom people trusted and respected. In the days when strong men were required to redeem a wilderness and make it habitable for men and their progeny, Marshall Cloyes was one of those who never gave up the fight. During the terrible drought of 1860, when scores of families deserted their homes and left the State, he and his family were among those who decided to remain and win out over the vagaries of nature. His faith in the future of Kansas was amply justified as the years rolled on and ever increasing prosperity came to him and his, as a just and equitable reward for a faith and confidence bestowed upon the new country during a time which tried men’s souls and caused weaker mortals to give up the fight. He was born at Salisbury, Vt., October 24, 1826, and descended from sturdy New England ancestry. His parents were Elijah and Mary (Beach) Cloyes. On his father’s side his ancestry can be traced back in the centuries to two brothers who settled in New England in the seventeenth century. His grandfather was William Cloyes, who fought for his country in the War of 1812. The boyhood days of Marshall were spent in the town of Salisbury, where he attended the public schools and later pursued his education in a private school. He learned the trade of shoemaker but did not follow it to any great extent. In 1847 he engaged in the lumber business at Ripton, Vt., and was there for twelve years prior to coming to Kansas. From the town in which he was born he came to Kansas, arriving here in Atchison June 2, 1859.
The following autumn his wife and sons followed him and during the ensuing winter the family lived in a two room hut, on the rear of the lots where Mrs. Jacob Leu’s residence now stands. On February 21, 1860, they loaded all their goods in a wagon, and with an ox team moved to a farm north of Lancaster. During the night an old-time Kansas blizzard gave them a cold reception in their new home. When Mr. Cloyes had agreed to pay $650 for his first quarter section of land he was still shy $2.50 of the necessary amount, and was forced to borrow this small sum from a kind neighbor. During the following summer he worked in Oliver Davis’ sawmill and got enough lumber to build a shanty on his farm. While this was building the family lived in two rooms in the home of John S. Rust. In the fall of the bad year of 1860, Mr. Cloyes decided to try to cash in on the reputation he had left behind him in Vermont, and applied to an uncle for a loan of $400. The uncle readily responded with the statement in his letter, “If you are ever able, I know you will pay it back; if you are never able to pay it back I can get along without it.” During the summer Mr. Cloyes put in his spare time cutting prairie hay and stacking it. When fall and winter came on, the returning freighters from Pike’s Peak were willing to sell their oxen and wagons for almost any price. Mr. Cloyes invested a part of his $400 capital in these outfits, wintered the cattle on the hay, and in the spring was able to dispose of the oxen for more than double the purchase prices. During the next two years he was enabled to pay off all of his debts, and prosperity attended his efforts from that time on. By the hard work and good management of himself and his two sons he increased his holdings to an entire section of land. He remained on the farm until 1872, then gave the farm to his sons and removed to a home at 417 North Seventh street in Atchison. On July 5, 1848, Mr. Cloyes was married to Miss Betsy Henderson, of Middlebury, Vt., who died in Atchison in 1893, leaving two sons, Frank E. and Mark S. On September 15, 1909, he took a second wife, the bride being this time Mrs. Matilda Franke, of Atchison.
She was born at Thuringen, Germany, November 16, 1855, a daughter of John and Christiana (Temme) Franke, who immigrated to America in 1858, making the long sea voyage in a sailing vessel that took six long weeks to make a trip which is now made in six days. From New York City the Frankes came directly to St. Louis and made their home there until their removal to Atchison. At the outbreak of the Civil war, John Franke volunteered his services in defense of the Union which had given him a home. He served in a Missouri regiment of volunteers for one year and was then discharged on account of serious disability caused by the hardships he had undergone. He was never the same man afterwards, and died in 1865 as a direct result of the disabilities incurred in behalf of his adopted country. The mother and family lived in St. Louis until 1883, when they removed to Atchison. Mrs. Franke died some years later at the home of her daughter, Mrs. Cloyes. In 1879 Matilda Franke was first married to Theo A. Franke, a native of Saxony, Germany, who came to America as a youth of eighteen and settled in Pittsburgh, Penn. Theo A. Franke was also a veteran of the Civil war, having enlisted in 1861 in Company D, Seventy-fourth regiment, Pennsylvania infantry. He served throughout the great conflict and was wounded several times while participating in the battles fought by the Army of the Potomac. After being discharged on account of a serious wound he enlisted again, and was a brave and valiant soldier who fought for sheer love of his adopted country. Mr. Franke’s first trip to Atchison was made in 1859, but he returned to Pittsburgh upon the outbreak of the Civil war and there proffered his services as stated above. He returned to Atchison after the close of the war, and here, in the course of years, he met Matilda, who was visiting friends in Atchison. Their acquaintance ripened into a warm friendship which gave place to love, and they were married March 10, 1879. A happy wedded life endured until Mr. Franke’s death in 1882. Children blessed this union as follows: Rose M., wife of Bert Gilmore, an electrician of Atchison; Elsa, wife of Fred Moore, a railway engineer of Falls City, Neb.; and Theo Franke, of Pierce, Ariz. During Mr. Franke’s first year of residence in Atchison he was a freighter across the plains. Upon his return in 1865 he entered the grocery business and prospered, accumulating considerable property interests. He was well known in Atchison and was considered one of the city’s most substantial men.
Mr. Cloyes was prominently identified with the political affairs of the county and was an influential leader of the Republican party for many years. Even before coming to Atchison from the farm he had taken an active interest in politics in his home township and county. He was elected to represent his district in the State legislature in 1867, leaving the impress of his individuality upon laws passed in the following session. For eight years he served in the Atchison city council, and in 1891 he was elected mayor. Two years later he was reëlected. Honorable and thoroughly upright in all his dealings, he gave the city administrations characterized by integrity, sound judgment and an unusual amount of good sense. He was a member of Washington Lodge, No. 5, Ancient Free and Accepted Masons, and all who knew him respected him for his sterling worth.
MARK D. SNYDER. Mark D. Snyder, retired farmer, living in Monrovia, Atchison county, Kansas, is a native son of Kansas, having been born in Atchison county November 2, 1858. He is a son of Hon. Solomon J. H. Snyder, one of the influential figures of the early pioneer days of Kansas and a stanch and uncompromising adherent of Free State principles. The father of Mark D. was born in Washington county, Maryland, February 7, 1812, and died at Monrovia, Atchison county, November 28, 1873. When eight years of age he accompanied his father to Tuscarawas county, Ohio, where he was educated in the district schools and a graded school at Canton, Ohio. Between 1830 and 1833 he cleared a farm of 160 acres of heavily timbered land. In 1838 he married Susan Winklepleck, and until 1848 he cleared and cultivated a tract of timber land which he had purchased. His wife died in that year, leaving him with three small children. He sold all of his holdings, placed his children with neighborhood families, and then traveled 4,000 miles in an endeavor to forget his great loss and overcome his grief. Later he married Eliza Fisher, and in 1852 removed to Indiana, then came west to Ft. Leavenworth in 1854. On the morning of May 4, 1854, he made the first legal homestead claim ever entered in the State of Kansas, comprising the land upon which the southern part of the city of Leavenworth now stands, and then returned to Indiana for his family. On his return to his homestead he found his claim “jumped” and the country in the hands of border ruffians. He was driven from the polls at the first election held in the Territory on account of his Free Soil principles. Two other claims which he bought were wrested from him by a pro-slavery “squatter court,” his life was threatened, and he sought refuge in an unsettled part of the State where Monrovia now stands.
Here he made his home and became prominently identified with the politics of the new State of Kansas. In 1862 Mr. Snyder was elected to the State legislature and served for two terms in the house of representatives, and one in the senate, where he did faithful and conscientious work in behalf of the people of Kansas.
Solomon J. H. Snyder was a devoted Christian and was one of the organizers of the first Lutheran church organization in the State, at Monrovia, of which he remained a member until his demise. He was a great Sunday school worker and wrote two very interesting and valuable Sunday school books, “The Lost Children” and “Scenes in the Far West,” and at the time of his death was engaged in the preparation of a work entitled “The Evidences of Christianity.” His influence was ever in behalf of the betterment of mankind, and his Christianity was of the practical kind which introduces helpfulness, kindness and forbearance into our daily lives. The children of S. J. H. and Eliza (Fisher) Snyder were as follows: Angeline (Conley), deceased; Mrs. Sarah Dunn, of Anadarko, Okla.; Mrs. Cora Shifflet, deceased; and Mark D. The three children by his first wife were: Mrs. Susan Reck, deceased; Mrs. Anna Berndt, of Mexico City; and J. H., of San Diego, Cal. Eliza (Fisher) Snyder, the mother of Mark D., was born in Ohio in 1838, and died at her home near Monrovia in 1896. Mark D. Snyder, with whom this review is directly concerned, was born and reared in Atchison county and reared his own family there. He is one of the real native-born citizens of the county. Upon the death of his father he took charge of the old home place, and when his mother died he purchased the family estate. By industry and economy, aided by good financial judgment, he has become the owner of 240 acres of excellent land, well improved and one of the most productive tracts of land in northeast Kansas. He cultivated his broad acres assiduously until 1909, when he turned over the management of his farm to his son and retired to Monrovia, where he now resides. Mr. Snyder was married November 30, 1881, to Helen M. Maxfield, and this union has been blessed with eight children, namely: Elsie and Minnie, deceased; John, who is farming the home place; Mark, living in Omaha, Neb.; Mildred, deceased; Margaret and Marguerette, twins, deceased; and James, a boy twelve years old, living with John on the home farm. The mother of these children was born in Henry county, Illinois, a daughter of David and Anna (Freeze) Maxfield, who first emigrated from Illinois to Sedgwick county, Kansas, and in 1873 came to Atchison county. Mrs. Snyder died in 1909. Mr. Snyder has always been a loyal supporter of the Republican party, is an attendant of the Lutheran church, and is a member of the Ancient Order of United Workmen, of Effingham, Kan.
EDWARD PERDUE. Edward Perdue, president of the First National Bank of Atchison and an extensive farmer of Huron, Kan., has been a resident of Atchison county for the past forty-five years. Like other successful men who were pioneers in Kansas, he arrived here from Canada, a young man of twenty, without money but possessed of strength, a willingness to work at honest labor and an ambition to succeed. How well he has succeeded is seen in the substantial fortune he has accumulated and the honors conferred upon him by his fellow citizens. Mr. Perdue was born on a farm in Peterboro county, Ontario, Canada, June 27, 1850, a son of Thomas and Catharine Perdue, natives of Ireland, who left the Emerald Isle in their youth and settled in Canada. Edward Perdue was reared to sturdy young manhood on the parental farm and attended the country school in the vicinity of his home as opportunity afforded. In March of 1870 he arrived in Atchison, and during his first year worked at any odd jobs that presented themselves, including labor on the streets and harvesting on nearby farms. During the following five years he was employed as a construction foreman on the grading and building of the Santa Fe railroad from Atchison to the Colorado-Kansas State line. He saved his money, and by strict economy, which meant denying himself all but the actual necessities of life, he accumulated sufficient funds to invest in a farm near the town of Huron, on which he resided for the next five years. He then sold this farm and bought another about one and one-half miles east of Huron, which remains his home to the present time. In his farming operations Mr. Perdue has given his attention mostly to the raising and feeding of live stock, and he has amassed a comfortable fortune during the forty years he has been an agriculturist.
He has increased his land holdings until at the present time he is the owner of 1,040 acres of splendid farm lands in Lancaster township. His home farm is one of the best improved tracts of farm land in the county and all of his farms show the results obtained from soil conservation and advanced methods of farming.
While Mr. Perdue has been primarily a farmer, he has given his attention to such other matters as betoken a man of influence and substance. In 1891 he assisted in the organization of the Huron State Bank and is president of this thriving concern. In 1906 he took part in the organization of the Commercial State Bank of Atchison, which was later succeeded by the First National Bank, of which banking institution he has served as president since 1900. He is also a stockholder in the State Savings Bank of Leavenworth, Kansas. Mr. Perdue was married in 1878 to Mary Viola Davey, of Brown county, Kansas, a daughter of Charles Davey, and this marriage has resulted in the birth of seven children, as follows: Mrs. Maria Walters, living on a farm near Huron; Edna, wife of J. M. Delaney, merchant, of Huron, Kan.; Mrs. Mabel Schmidt, wife of the assistant cashier of the Huron State Bank; Charles, who is cultivating the home farm; Thomas Hendricks, at home; George, a farmer in North Dakota; and Edward, Jr. Mr. Perdue has been a life-long Democrat and has always taken a more or less active part in the political affairs of the county. He was elected county commissioner in 1897 and served one term. In 1904 he served one term as a member of the State legislature, representing this district, and declined reëlection when his term of office expired.
While he was reared in the Catholic faith, Mr. Perdue is tolerant of all creeds and takes a broad-minded view of religious matters. He belongs to the Ancient Order of United Workmen and the Modern Woodmen.
DR. CHARLES L. HIXON. Dr. Charles L. Hixon, a leading dental practitioner of Atchison, is a native son of Kansas and comes of a pioneer family of the State. He was born on a farm in Jackson county, Kansas, January 14, 1872, and is a son of John S. and Alice (Clark) Hixon. His father, John S. Hixon, was born in Ohio in 1850, a son of Jacob and Cassandra (Stonebraker) Hixon, who resided in Ashland county, Ohio, until their removal to Putnam county, Indiana, in the early pioneer days when that part of the Hoosier State was being settled by large numbers of Ohio people. Alice Clark Hixon, mother of Dr. Hixon, was likewise born in 1850, in Putnam county, Indiana, a daughter of Andrew Jackson and Harriet (Mann) Clark, natives of New York State and also pioneer settlers of Putnam county, Indiana. While John S. Hixon and Alice Clark were attending the district school in the neighborhood of their respective homes, they became great friends, and the warm friendship ripened into love, which culminated in their marriage several years later in Jackson county, Kansas. The Hixons and Clarks were essentially pioneers, and the history of the family for generations shows that some member of the family, or several of them, have continually been pushing westward and settling in the newer countries. Jacob Hixon was one of the first men in his neighborhood to hearken to the call of the West, and after disposing of his land holdings in Putnam county, Indiana, he migrated with all of his family to Kansas, settling in Jackson county. They arrived in Atchison during the stormy days of the Civil war, at a time when the local vigilance committee was in control of community affairs and was naturally very suspicious of all strangers. There had been considerable lawlessness in Atchison and neighboring towns, and many outrages had been perpetrated by border ruffians and outlaws.
The vigilance committee had taken charge of affairs and had summarily lynched three men on the banks of White Clay creek just before the arrival of the Hixon family. Mr. Hixon was interrogated as to his loyalty to the Union and asked his intentions. His replies being satisfactory to the members of the committee, he was allowed to proceed on his way to Jackson county and arrived at Holton, Kansas, without further delay. Jacob Hixon settled on a fine farm near Holton, developed it and prospered as the years rolled on and the country became more and more settled. He died in 1905 at the advanced age of eighty-four years, his wife, Cassandra, having departed this life in 1885. The Clark family came to Kansas from Indiana in 1868, and Andrew Jackson Clark naturally settled in that part of Jackson county where his old friend and neighbor had chosen his place of residence. The intimacy which had existed between the two families in Putnam county, Indiana, was renewed, and as time went on, John S. Hixon and Alice Clark grew to maturity and were united in marriage. Their married life has been a happy and prosperous one, and five children have blessed this union: Dr. Charles L. Hixon, with whom this review is directly concerned; Mrs. J. C. Neeley, of Weiser, Idaho; Ernest H. Hixon, of Kansas City, Mo.; one child died in infancy. John S. Hixon became prominently identified with the civic life of Jackson county and is serving his county well and faithfully as treasurer, having been elected on the Republican ticket in 1912 and again in 1914. Mr. and Mrs. John S. Hixon reside in Holton, in Jackson county, and are prosperous and well respected in the neighborhood. Dr. C. L. Hixon spent his boyhood days on the farm and early learned to assist in the farm work. He received his elementary education in the district schools and was ambitious to secure a higher education. He is practically self-educated, and after learning all that was possible for him to learn in the country school, he attended Campbell College, at Holton, Kan., for two years. His ambition was to become a dentist, and with this end in view he matriculated in the University of Iowa in 1895.
After spending two profitable years in this institution in the study of dentistry he returned home, and a short time later opened an office in Atchison, where he has practiced continuously for the past eighteen years. After seven years of practice in his first location, he opened well-equipped offices at 519 Commercial street, and remained there until his removal to his present location at 613 Commercial street, where he has offices equipped with all the latest appliances for facilitating his work. Dr. Hixon is kept very busy attending to the calls made upon him in the practice of his profession, and during the many years he has been located in Atchison he has built up an extensive and lucrative practice. He finds time, however, to keep abreast of the latest developments in his profession, and is ever seeking to better his skill and knowledge of dentistry. He has been distinctly honored by the members of his profession, having served as president of the Northeast Kansas Dental Association, of which he is at present an active member. He is a leading member of the Atchison Dental Association and ranks high in his profession, not only as a successful practitioner, but as a citizen who has the best interests of his home city at heart. He is a member of the Ancient Free and Accepted Masons, Washington Lodge, No. 5, and is fraternally affiliated with the Odd Fellows, the Modern Woodmen of America, and the Rebekah and Eastern Star lodges. Dr. Hixon was united in marriage with Miss Inez B. Horn in 1902, and one child has been born to this union, Charles Horn Hixon, born May 25, 1907. Mrs. Inez B. Hixon was born in Atchison county, a daughter of J. H. and Catharine (Wallick) Horn, who reside at 1126 North Third street, Atchison. Mrs. Horn is a daughter of Benjamin Wallick, who served as sheriff of the county during the Civil war.
LOUIS KLOEPPER. Louis Kloepper, farmer and stockman of Lancaster township, Atchison county, was born January 18, 1888, on the farm where he now lives. He is a son of William and Fredericka (Von Derahe) Kloepper, who were the parents of four children as follows: Louis, subject of this sketch; Emma, deceased; William, deceased; and Pauline, living at home. The father was born in Germany, December 14, 1853. He left there in 1883 and came directly to Atchison county, Kansas, where he bought eighty acres of land in section 27, Lancaster township. He farmed this one year, and in 1885 returned to Germany to be married. In 1886 he returned to his farm and began to improve it, building a large eight-room house in 1899 in place of the little three-room house which stood on the place. In 1903 he built a fine granary, 32×40 feet, and in 1904 he erected a large barn, 40×48 feet. The following year he bought more land and put up additional buildings, and in 1908 he built another barn, 32×40 feet. At the time of his death, February 7, 1913, he owned 240 acres of well improved land under cultivation and thirteen acres of fine timber land. This achievement is the more remarkable in view of the fact that he landed with only $1,200. But he was industrious and worked faithfully to improve his farm. He was a member, trustee and steward of the German Lutheran church. His wife was born in Germany, February 15, 1858, and is a daughter of Henry and Fredericka (Von Behren) Von Derahe, natives of Germany. She is now living with her son, Louis. Louis Kloepper attended the old Huron school of Lancaster township and grew to manhood on the farm which he now operates. Since the death of his father he has had charge of the farm and has worked to the extent of his ability in installing modern improvements on his place. He owns 160 acres in section 27, Lancaster township, in addition to the home place, and has three acres of orchard and grove.
He also has a vineyard, the feature of the place which Louis, and his father before him, always loved most. Special attention has been given to the vineyard even when, perhaps, other things had to be neglected. It is the pride of Mr. Kloepper’s place. He keeps graded stock and is a practical farmer of the progressive type. He now operates 400 acres of land, 114 acres of which are in corn and ninety-three acres in clover, the latter having been unusually successful. He owns a threshing outfit, two clover hullers, a corn shredder, and three gas engines, which he uses in numerous ways, including pumping, threshing and plowing. Mr. Kloepper has a modern farm in every way, with all up-to-date labor- and time-saving improvements, as well as an automobile. He is a stockholder in the Farmers’ Mercantile Association of Effingham, Kan. In 1911 he married Marie Meier, a native of Germany, born July 3, 1888. She is a daughter of Henry and Fredericka (Finke) Meier, was educated in Germany, and left her native land at the age of seventeen. Mr. and Mrs. Kloepper have two children, Fredia, born November 13, 1911, and Emma, born April 21, 1913. Mr. Kloepper is an independent voter. He belongs to the German Lutheran church.
CHARLES W. FERGUSON. Charles W. Ferguson, vice-president of the Atchison Savings Bank, is one of the best known men in financial circles of northeastern Kansas, and he is equally well known over a large section of western Missouri. Mr. Ferguson was born at Plattsburg, Mo., December 29, 1862, and is a son of William L. and Fannie A. (Carpenter) Ferguson, both natives of Kentucky, whose parents were Virginians and very early settlers of the Blue Grass State. The Ferguson family removed from Kentucky to Missouri about 1851. They came up the Missouri river by boat as far as Liberty Landing, and later located in Clinton county, Missouri. The father was a merchant, also engaged in the grain business, and was an all-around progressive business man. He was a Republican, and in 1862 was elected sheriff of Clinton county, the first Republican elected to office in that county within a period of twenty-five years. During the Civil war he was captain of the Home Guards. He died in 1893, aged sixty-four years. Charles W. Ferguson is one of a family of six children, as follows: John L., assistant general passenger agent of the Chicago & Northwestern railroad, Chicago, Ill.; Mary F., widow of M. B. Riley, who resides in St. Joseph, Mo.; Adelia M., Plattsburg, Mo.; Katherine, Plattsburg, Mo.; Charles W., the subject of this sketch; and Louis, a conductor on the Chicago & Northwestern railroad, who resides at Highland Park, Ill. Charles W. Ferguson attended the public schools in Plattsburg until he was thirteen years old, and at that early age went to work in the express office at Plattsburg, where he remained about five years. He then entered the employ of Stonum Brothers, remaining with that company two years. He next accepted a position in the Plattsburg Bank as bookkeeper and assistant cashier, remaining with that institution for seven years. He then went with the Schuster-Hax National Bank, St. Joseph, Mo., as receiving teller, and served in that capacity for four years. He resigned that position in June, 1894, to become bookkeeper of the Exchange National Bank of Atchison. He served with that institution as paying teller, assistant cashier and cashier, resigning the latter position February 1, 1914. In November, 1914, he accepted a position with the Federal Reserve Bank of Kansas City, Mo., and was with that institution for eight months, and in July, 1915, he became vice-president of the Atchison Savings Bank. Mr. Ferguson has had vast experience in the field of banking, is well posted on the intricate problems of finance, and possesses the keen, discriminating qualities of the successful banker. Mr. Ferguson was married April 28, 1892, to Miss Sallie Clay, of Plattsburg, Mo. She is a daughter of James M. Clay, a member of the Kentucky branch of the Clay family. Mr. Ferguson is a member of the Masonic lodge, the Benevolent and Protective Order of Elks and the Modern Woodmen of America.
EARL V. JONES. Signal success in any one field of endeavor, whether professional, inventive, mercantile or industrial, is worthy of recognition by the public. Some men are naturally gifted with the ability to succeed in the industrial and manufacturing field, and are mentally equipped with a certain amount of mechanical genius, along with the decided business ability to take hold of a proposition and make it succeed despite difficulties. E. V. Jones, treasurer and manager of the Bailor Plow Company, of Atchison, is one of the latter type; he is fast climbing to a place of eminence in his chosen field of endeavor and holds a high place among the manufacturing and mercantile interests of Atchison and the Middle West. Mr. Jones was born in Livingston county, Missouri, January 21, 1878, a son of Charles Jones, a building contractor, who was a native of Kentucky and a son of William Jones, owner of a large plantation in Kentucky which was lost as one of the misfortunes that befell the family as a result of the Civil war’s ravages in Kentucky. Desirous of making a new start in a land farther removed from internecine strife, where opportunities for success seemed greater, William Jones removed to Missouri, and here Charles, the father of E. V., was reared and became successful in agricultural pursuits, the son, Earl V., being reared on the family estate in Livingston county, Missouri. The Jones family is originally of Scotch-Irish stock, the founder of the family having emigrated from the north of Ireland to this country several generations ago. Charles Jones married Miss Jennie Wills, a daughter of John Wills, a native of the east coast of England who immigrated to this country with his brother, George, and successfully followed his trade of wagon maker. John Wills owned and operated an extensive blacksmith and wagon maker’s shop at Chillicothe, Mo., which did a large business and made moderate wealth for its proprietor. Earl V.
Jones, with whom this review is directly concerned, was educated in the common and high schools of his native county, and