Data Mining Models
Second Edition
David L. Olson
Data Mining Models, Second Edition
Copyright © Business Expert Press, LLC, 2018.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any
other except for brief quotations, not to exceed 400 words, without the prior permission of the
publisher.
First published in 2016 by
Business Expert Press, LLC
222 East 46th Street, New York, NY 10017
www.businessexpertpress.com
ISBN-13: 978-1-94858-049-6 (paperback)
ISBN-13: 978-1-94858-050-2 (e-book)
Business Expert Press Big Data and Business Analytics Collection
Collection ISSN: 2333-6749 (print)
Collection ISSN: 2333-6757 (electronic)
Cover and interior design by Exeter Premedia Services Private Ltd., Chennai, India
Second edition: 2018
10 9 8 7 6 5 4 3 2 1
Printed in the United States of America.
Abstract
Data mining has become the fastest growing topic of interest in business programs in
the past decade. This book is intended to first describe the benefits of data mining in
business, describe the process and typical business applications, describe the workings
of basic data mining models, and demonstrate each with widely available free
software. This second edition updates Chapter 1, and adds more details on Rattle data
mining tools.
The book focuses on demonstrating common business data mining applications. It
provides exposure to the data mining process, to include problem identification, data
management, and available modeling tools. The book takes the approach of
demonstrating typical business data sets with open source software. KNIME is a very
easy-to-use tool, and is used as the primary means of demonstration. R is much more
powerful and is a commercially viable data mining tool. We will demonstrate use of R
through Rattle. We also demonstrate WEKA, which is a highly useful academic
software, although it is difficult to manipulate test sets and new cases, making it
problematic for commercial use. We will demonstrate methods with a small but
typical business dataset. We use a larger (but still small) realistic business dataset for
Chapter 9.
Keywords
big data, business analytics, clustering, data mining, decision trees, neural network
models, regression models
Contents
Acknowledgments
Chapter 1 Data Mining in Business
Chapter 2 Business Data Mining Tools
Chapter 3 Data Mining Processes and Knowledge Discovery
Chapter 4 Overview of Data Mining Techniques
Chapter 5 Data Mining Software
Chapter 6 Regression Algorithms in Data Mining
Chapter 7 Neural Networks in Data Mining
Chapter 8 Decision Tree Algorithms
Chapter 9 Scalability
Notes
References
Index
Acknowledgments
I wish to recognize some of the many colleagues I have worked and published with,
specifically Yong Shi, Dursun Delen, Desheng Wu, and Ozgur Araz. There are many
others I have learned from in joint efforts as well, both students and colleagues, all of
whom I wish to recognize with hearty thanks.
CHAPTER 1
Data Mining in Business
Introduction
Data mining refers to the analysis of large quantities of data that are stored in
computers. Bar coding has made checkout very convenient for us and provides retail
establishments with masses of data. Grocery stores and other retail stores are able to
quickly process our purchases and use computers to accurately determine the product
prices. These same computers can help the stores with their inventory management,
by instantaneously determining the quantity of items of each product on hand.
Computers allow the store’s accounting system to more accurately measure costs and
determine the profit that store stockholders are concerned about. All of this
information is available based on the bar coding information attached to each product.
Along with many other sources of information, information gathered through bar
coding can be used for data mining analysis.
The era of big data is here, with many sources pointing out that more data have been
created over the past year or two than were generated throughout all prior human
history. Big data involves datasets so large that traditional data analytic methods no
longer work due to data volume. Davenport1 gave the following features of big data:
• Data too big to fit on a single server
• Data too unstructured to fit in a row-and-column database
• Data flowing too continuously to fit into a static data warehouse
• Lack of structure is the most important aspect (even more than the size)
• The point is to analyze, converting data into insights, innovation, and
  business value
Big data has been said to be more about analytics than about the data itself. The era
of big data is expected to emphasize knowing what (based on correlation)
rather than the traditional obsession with causality. The emphasis will be on
discovering patterns offering novel and useful insights.2 Data will become a raw
material for business, a vital economic input and source of value. Cukier and
Mayer-Schönberger3 cite the following impacts of big data on the statistical body of
theory established in the 20th century: (1) There is so much data available that
sampling is usually not needed (n = all). (2) Precise accuracy of data is, thus, less
important as inevitable errors are compensated for by the mass of data (any one
observation is flooded by others). (3) Correlation is more important than causality—
most data mining applications involving big data are interested in what is going to
happen, and you don’t need to know why. Automatic trading programs need to detect
the trend changes, not figure out that the Greek economy collapsed or the Chinese
government will devalue the Renminbi (RMB). The programs in vehicles need to
detect that an axle bearing is getting hot and the vehicle is vibrating and the wheel
should be replaced, not whether this is due to a bearing failure or a housing rusting
out.
There are many sources of big data.4 Internal to the corporation, e-mails, blogs,
enterprise systems, and automation lead to structured, unstructured, and
semistructured information within the organization. External data is also widely
available, much of it free over the Internet, but much also available from the
commercial vendors. There also is data obtainable from social media.
Data mining is not limited to business. Both major parties in the U.S. elections
utilize data mining of potential voters.5 Data mining has been heavily used in the
medical field, from diagnosis based on patient records to identification of best practices.6
Business use of data mining is also impressive. Toyota used data mining of its data
warehouse to determine more efficient transportation routes, reducing the time to
deliver cars to their customers by an average 19 days. Data warehouses are very large
scale database systems capable of systematically storing all transactional data
generated by a business organization, such as Walmart. Toyota also was able to
identify the sales trends faster and to identify the best locations for new dealerships.
Data mining is widely used by banking firms in soliciting credit card customers, by
insurance and telecommunication companies in detecting fraud, by manufacturing
firms in quality control, and many other applications. Data mining is being applied to
improve food product safety, criminal detection, and tourism. Micromarketing targets
small groups of highly responsive customers. Consumer and lifestyle data is
widely available, enabling customized individual marketing campaigns. This is
enabled by customer profiling, identifying those subsets of customers most likely to
be profitable to the business, as well as targeting, determining the characteristics of
the most profitable customers.
Data mining involves statistical and artificial intelligence (AI) analysis, usually
applied to large-scale datasets. There are two general types of data mining studies.
Hypothesis testing involves expressing a theory about the relationship between actions
and outcomes. This approach is referred to as supervised. In a simple form, it can be
hypothesized that advertising will yield greater profit. This relationship has long been
studied by retailing firms in the context of their specific operations. Data mining is
applied to identifying relationships based on large quantities of data, which could
include testing the response rates to various types of advertising on the sales and
profitability of specific product lines. However, there is more to data mining than the
technical tools used. The second form of data mining study is knowledge discovery.
Data mining involves a spirit of knowledge discovery (learning new and useful
things). Knowledge discovery is referred to as unsupervised. In this form of analysis,
a preconceived notion may not be present, but rather relationships can be identified by
looking at the data. This may be supported by visualization tools, which display data,
or through fundamental statistical analysis, such as correlation analysis. Much of this
can be accomplished through automatic means, as we will see in decision tree
analysis, for example. But data mining is not limited to automated analysis.
Knowledge discovery by humans can be enhanced by graphical tools and
identification of unexpected patterns through a combination of human and computer
interaction.
Requirements for Data Mining
Data mining requires identification of a problem, along with the collection of data that
can lead to better understanding, and computer models to provide statistical or other
means of analysis. A variety of analytic computer models have been used in data
mining. In the later sections, we will discuss various types of these models. Also
required is access to data. Quite often, systems including data warehouses and data
marts are used to manage large quantities of data. Other data mining analyses are done
with smaller sets of data, such as can be organized in online analytic processing
systems.
Masses of data generated from cash registers, scanning, and topic-specific databases
throughout the company are explored, analyzed, reduced, and reused. Searches are
performed across different models proposed for predicting sales, marketing response,
and profit. The classical statistical approaches are fundamental to data mining.
Automated AI methods are also used. However, a systematic exploration through
classical statistical methods is still the basis of data mining. Some of the tools
developed by the field of statistical analysis are harnessed through automatic control
(with some key human guidance) in dealing with data.
Data mining tools need to be versatile, scalable, capable of accurately predicting the
responses between actions and results, and capable of automatic implementation.
Versatile refers to the ability of the tool to apply a wide variety of models. Scalable
implies that if a tool works on a small dataset, it should also work on a larger
dataset. Automation is useful, but its application is relative. Some analytic functions
are often automated, but human setup prior to implementing procedures is required. In
fact, analyst judgment is critical to successful implementation of data mining. Proper
selection of data to include in searches is critical. Data transformation also is often
required. Too many variables produce too much output, while too few can overlook
the key relationships in the data.
Data mining is expanding rapidly, with many benefits to business. Two of the most
profitable application areas have been the use of customer segmentation by marketing
organizations to identify those with marginally greater probabilities of responding to
different forms of marketing media, and banks using data mining to more accurately
predict the likelihood that people will respond to offers of different services.
Many companies are using this technology to identify their blue-chip customers, so
that they can provide them with the service needed to retain them.
The casino business has also adopted data warehousing and data mining.
Historically, casinos have wanted to know everything about their customers. A typical
application for a casino is to issue special cards, which are used whenever the
customer plays at the casino, or eats, or stays, or spends money in other ways. The
points accumulated can be used for complimentary meals and lodging. More points
are awarded for activities that provide Harrah’s more profit. The information obtained
is sent to the firm’s corporate database, where it is retained for several years. Instead
of advertising the loosest slots in town, Bellagio and Mandalay Bay have developed
the strategy of promoting luxury visits. Data mining is used to identify high rollers, so
that these valued customers can be cultivated. Data warehouses enable casinos to
estimate the lifetime value of the players. Incentive travel programs, in-house
promotions, corporate business, and customer follow-up are the tools used to maintain
the most profitable customers. Casino gaming is one of the richest datasets available.
Very specific individual profiles can be developed. Some customers are identified as
those who should be encouraged to play longer. Other customers are identified as
those who should be discouraged from playing.
Business Data Mining
Data mining has been very effective in many business venues. The key is to find
actionable information or information that can be utilized in a concrete way to
improve profitability. Some of the earliest applications were in retailing, especially in
the form of market basket analysis. Table 1.1 shows the general application areas we
will be discussing. Note that they are meant to be representative rather than
comprehensive.
Table 1.1 Data mining application areas
Application area          | Applications                                         | Specifics
Retailing                 | Affinity positioning                                 | Position products effectively
                          | Cross-selling; develop and maintain customer loyalty | Find more products for customers
Banking                   | Customer relationship management (CRM)               | Identify customer value; develop programs to maximize the revenue
Credit card management    | Lift                                                 | Identify effective market segments
                          | Churn (loyalty)                                      | Identify likely customer turnover
Insurance                 | Fraud detection                                      | Identify claims meriting investigation
Telecommunications        | Churn                                                | Identify likely customer turnover
Telemarketing             | Online information; recommender systems              | Aid telemarketers with easy data access
Human resource management | Churn (retention)                                    | Identify potential employee turnover
Retailing
Data mining offers retailers, in general, and grocery stores, specifically, valuable
predictive information from mountains of data. Affinity positioning is based on the
identification of products that the same customer is likely to want. For instance, if you
are interested in cold medicine, you probably are interested in tissues. Thus, it would
make marketing sense to locate both items within easy reach of the other. Cross-
selling is a related concept. The knowledge of products that go together can be used
by marketing the complementary product. Grocery stores do that through product
shelf placement. Retail stores relying on advertising can send ads for sales on
shirts and ties to those who have recently purchased suits. These strategies have long
been employed by wise retailers. Recommender systems are effectively used by
Amazon and other online retailers. Data mining provides the ability to identify less
expected product affinities and cross-selling opportunities. These actions develop and
maintain customer loyalty.
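As a minimal sketch of how product affinities might be surfaced from transaction data, the following Python counts how often pairs of products appear in the same basket. The baskets and product names are invented for illustration; a real analysis would work from cash register data and would normally go on to compute measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set holds the products in one purchase.
baskets = [
    {"cold medicine", "tissues", "orange juice"},
    {"cold medicine", "tissues"},
    {"bread", "milk"},
    {"cold medicine", "orange juice"},
    {"tissues", "milk"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
# Pairs that co-occur most often are candidates for nearby shelving or cross-selling.
top_pairs = counts.most_common(3)
```

Pairs with high counts, such as cold medicine and tissues in this toy data, are the kind of affinity the text describes.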
Grocery stores generate mountains of cash register data that require automated tools
for analysis. Software is marketed to service a spectrum of users. In the past, it was
assumed that cash register data was so massive that it couldn’t be quickly analyzed.
However, the current technology enables the grocers to look at customers who have
defected from a store, their purchase history, and characteristics of other potential
defectors.
Banking
The banking industry was one of the first users of data mining. Banks are turning to
technology to find out what motivates their customers and what will keep their
business (customer relationship management—CRM). CRM involves the application
of technology to monitor customer service, a function that is enhanced through data
mining support. Understanding the value a customer provides the firm makes it
possible to rationally evaluate if extra expenditure is appropriate in order to keep the
customer. There are many opportunities for data mining in banking. Data mining
applications in finance include predicting the prices of equities, which involves a dynamic
environment with surprise information, some of which might be inaccurate and some
of which might be too complex to comprehend and reconcile with intuition.
Data mining provides a way for banks to identify patterns. This is valuable in
assessing loan applications as well as in target marketing. Credit unions use data
mining to track member profitability and to monitor the effectiveness of
marketing programs and sales representatives. Data mining is also used in member
care, seeking to identify which services credit union members want.
Credit Card Management
The credit card industry has proven very profitable. It has attracted many card issuers,
and many customers carry four or five cards. Balance surfing is a common practice,
where the card user pays an old balance with a new card. These are not considered
attractive customers, and one of the uses of data warehousing and data mining is to
identify balance surfers. The profitability of the industry has also attracted those who
wish to push the edge of credit risk, both from the customer and the card issuer
perspective. Bank credit card marketing promotions typically generate 1,000
responses to mailed solicitations, a response rate of about 1 percent. This rate is
improved significantly through data mining analysis.
Data mining tools used by banks include credit scoring. Credit scoring is a
quantified analysis of credit applicants with respect to the prediction of on-time loan
repayment. A key is a consolidated data warehouse, covering all products, including
demand deposits, savings, loans, credit cards, insurance, annuities, retirement
programs, securities underwriting, and every other product banks provide. Credit
scoring provides a number for each applicant by multiplying a set of weights
determined by the data mining analysis by the ratings for that
applicant. These credit scores can be used to make accept or reject recommendations,
as well as to establish the size of a credit line. Credit scoring used to be conducted by
bank loan officers, who considered a few tested variables, such as employment,
income, age, assets, debt, and loan history. Data mining makes it possible to include
many more variables, with greater accuracy.
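The scoring idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights, the 0 to 100 rating scale, the summing of the weighted ratings into one number, and the cutoff of 60 are all hypothetical, not values taken from any bank or from this book.

```python
# Hypothetical weights, standing in for the output of a data mining analysis.
WEIGHTS = {
    "employment": 0.30,
    "income": 0.25,
    "assets": 0.20,
    "loan_history": 0.25,
}


def credit_score(ratings, weights=WEIGHTS):
    """Multiply each rating by its weight and add up the products."""
    return sum(weights[name] * ratings[name] for name in weights)


def recommend(score, cutoff=60.0):
    """Turn a score into an accept or reject recommendation (cutoff is assumed)."""
    return "accept" if score >= cutoff else "reject"


# Hypothetical applicant, rated 0 to 100 on each variable.
applicant = {"employment": 80, "income": 60, "assets": 40, "loan_history": 90}
score = credit_score(applicant)
decision = recommend(score)
```

Adding more variables only means adding more entries to the weights and ratings, which is one way to picture how data mining lets a scorecard grow beyond the handful of variables a loan officer would consider.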
The new wave of technology is broadening the application of database use and
targeted marketing strategies. In the early 1990s, nearly all credit card issuers were
mass-marketing to expand their card-holder bases. However, with so many cards
available, broad-based marketing campaigns have not been as effective as they
initially were. Card issuers are more carefully examining the expected net present
value of each customer. Data warehouses provide the information, giving the issuers
the ability to try to more accurately predict what the customer is interested in, as well
as their potential value to the issuer. Desktop campaign management software is used
by the more advanced credit card issuers, utilizing data mining tools, such as neural
networks, to recognize customer behavior patterns to predict their future relationship
with the bank.
Insurance
The insurance industry utilizes data mining for marketing, just as retailing and
banking organizations do. But, they also have specialty applications. Farmers
Insurance Group has developed a system for underwriting, which generates millions
of dollars in higher revenues and lower claims. The system allows the firm to better
understand narrow market niches and to predict losses for specific lines of insurance.
One discovery was that it could lower its rates on sports cars, which increased its
market share for this product line significantly.
Unfortunately, our complex society leads to some inappropriate business operations,
including insurance fraud. Specialists in this underground industry often use multiple
personas to bilk insurance companies, especially in the automobile insurance
environment. Fraud detection software uses a similarity search engine, analyzing
information in company claims for similarities. By linking names, telephone numbers,
streets, birthdays, and other information with slight variations, patterns can be
identified, indicating fraud. The similarity search engine has been found to be able
to identify up to seven times more fraud than the exact-match systems.
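To illustrate the difference between similarity matching and exact matching, here is a small Python sketch using the standard library's difflib. It is not the commercial engine described above; the claim records, the field name, and the 0.8 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a ratio between 0 and 1; 1.0 means identical, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def near_matches(claims, field, threshold=0.8):
    """Return id pairs whose field values are similar but not identical."""
    pairs = []
    for i in range(len(claims)):
        for j in range(i + 1, len(claims)):
            a, b = claims[i][field], claims[j][field]
            # An exact-match system would skip these slightly varied values.
            if a != b and similarity(a, b) >= threshold:
                pairs.append((claims[i]["id"], claims[j]["id"]))
    return pairs


# Hypothetical claim records.
claims = [
    {"id": 1, "name": "Jonathan Smith"},
    {"id": 2, "name": "Jonathon Smith"},
    {"id": 3, "name": "Maria Garcia"},
]
suspects = near_matches(claims, "name")
```

Claims 1 and 2 differ by one letter, so an exact-match comparison would treat them as unrelated, while the similarity comparison flags them for review.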
Telecommunications
Deregulation of the telephone industry has led to widespread competition. Telephone
service carriers fight hard for customers. The problem is that once a customer is
obtained, that customer is targeted by competitors, and retention of customers is very difficult.
The phenomenon of a customer switching carriers is referred to as churn, a
fundamental concept in telemarketing as well as in other fields.
A director of product marketing for a communications company considered that
one-third of churn is due to poor call quality and up to one-half is due to poor
equipment. That firm has a wireless telephone performance monitor tracking
telephones with poor performance. This system reduced churn by an estimated 61
percent, amounting to about 3 percent of the firm’s overall subscribers over the course
of a year. When a telephone begins to go bad, the telemarketing personnel are alerted
to contact the customer and suggest bringing in the equipment for service.
Another way to reduce churn is to protect customers from subscription and cloning
fraud. Cloning has been estimated to have cost the wireless industry millions. A
number of fraud prevention systems are marketed. These systems provide verification
that is transparent to the legitimate subscribers. Subscription fraud has been estimated
to have an economic impact of $1.1 billion. Deadbeat accounts and service shutoffs
are used to screen potentially fraudulent applicants.
Churn is a concept that is used by many retail marketing operations. Banks widely
use churn information to drive their promotions. Once data mining identifies
customers by characteristic, direct mailing and telemarketing are used to present the
bank’s promotional program. The mortgage market has seen massive refinancing in a
number of periods. Banks were quick to recognize that they needed to keep their
mortgage customers happy if they wanted to retain their business. This has led to
banks contacting the current customers if those customers hold a mortgage at a rate
significantly above the market rate. While they may cut their own lucrative financial
packages, banks realize that if they don’t offer a better service to borrowers, a
competitor will.
Human Resource Management
Business intelligence is a way to truly understand markets, competitors, and
processes. Software technology such as data warehouses, data marts, online analytical
processing (OLAP), and data mining make it possible to sift through data in order to
spot trends and patterns that can be used by the firm to improve profitability. In the
human resources field, this analysis can lead to the identification of individuals who
are liable to leave the company unless additional compensation or benefits are
provided.
Data mining can be used to expand upon things that are already known. A firm
might know that 20 percent of its employees use 80 percent of services offered, but
may not know which particular individuals are in that 20 percent. Business
intelligence provides a means of identifying segments, so that programs can be
devised to cut costs and increase productivity. Data mining can also be used to
examine the way in which an organization uses its people. The question might be
whether the most talented people are working for those business units with the highest
priority or where they will have the greatest impact on profit.
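The 20 percent and 80 percent example can be made concrete with a short Python sketch that ranks individuals by usage and picks out the smallest group accounting for a target share of the total. The employee identifiers and usage figures are invented for illustration.

```python
def heaviest_users(usage, share=0.80):
    """Return the top users whose combined usage first reaches `share` of the total."""
    total = sum(usage.values())
    selected = []
    running = 0
    # Walk through users from heaviest to lightest.
    for name, amount in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        selected.append(name)
        running += amount
    return selected


# Hypothetical service usage (for example, claims filed) for ten employees.
usage = {
    "emp_01": 50, "emp_02": 32, "emp_03": 4, "emp_04": 3, "emp_05": 3,
    "emp_06": 2, "emp_07": 2, "emp_08": 2, "emp_09": 1, "emp_10": 1,
}
top_group = heaviest_users(usage)
```

In this toy data, two of the ten employees account for 82 percent of usage, and the function names them rather than merely reporting the ratio.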
Companies are seeking to stay in business with fewer people. Sound human
resource management would identify the right people, so that organizations could treat
them well to retain them (reduce churn). This requires tracking key performance
indicators and gathering data on talents, company needs, and competitor requirements.
Summary
The era of big data is here, flooding businesses with numbers, text, and often more
complex data forms, such as videos or pictures. Some of this data is generated
internally, through enterprise systems or other software tools to manage a business’s
information. Data mining provides a tool to utilize this data. This chapter reviewed the
basic applications of data mining in business, to include customer profiling, fraud
detection, and churn analysis. These will all be explored in greater depth in Chapter 2.
But, here our intent is to provide an overview of what data mining is useful for in
business.
The process of data mining relies heavily on information technology, in the form of
data storage support (data warehouses, data marts, or OLAP tools) as well as software
to analyze the data (data mining software). However, the process of data mining is far
more than simply applying these data mining software tools to a firm’s data.
Intelligence is required on the part of the analyst in selection of model types, in
selection and transformation of the data relating to the specific problem, and in
interpreting results.
CHAPTER 2
Business Data Mining Tools
Have you ever wondered why your spouse gets all of these strange catalogs for
obscure products in the mail? Have you also wondered at his or her strong interest in
these things, and thought that the spouse was overly responsive to advertising of this
sort? For that matter, have you ever wondered why 90 percent of your telephone calls,
especially during meals, are opportunities to purchase products? (Or for that matter,
why calls assuming you are a certain type of customer occur over and over, even
though you continue to tell them that their database is wrong?)
One of the earliest and most effective business applications of data mining is in
support of customer segmentation. This insidious application utilizes massive
databases (obtained from a variety of sources) to segment the market into categories,
which are studied with data mining tools to predict the response to particular
advertising campaigns. It has proven highly effective. It also represents the
probabilistic nature of data mining, in that it is not perfect. The idea is to send catalogs
to (or call) a group of target customers with a 5 percent probability of purchase rather
than waste these expensive marketing resources on customers with a 0.05 percent
probability of purchase. The same principle has been used in election campaigns by
party organizations—give free rides to the voting booth to those in your party;
minimize giving free rides to voting booths to those likely to vote for your opponents.
Some call this bias. Others call it sound business.
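The economics behind this targeting can be sketched with simple arithmetic. The purchase probabilities of 5 percent and 0.05 percent come from the text; the unit cost of two dollars per catalog is a hypothetical figure chosen only to make the comparison concrete.

```python
def cost_per_expected_sale(unit_cost, purchase_probability):
    """Expected marketing spend needed to produce one purchase."""
    return unit_cost / purchase_probability


UNIT_COST = 2.00  # hypothetical cost of mailing one catalog, in dollars

targeted = cost_per_expected_sale(UNIT_COST, 0.05)      # 5 percent segment
untargeted = cost_per_expected_sale(UNIT_COST, 0.0005)  # 0.05 percent segment
ratio = untargeted / targeted
```

Whatever unit cost is assumed, the ratio between the two segments stays the same: reaching one buyer in the low-probability group costs about 100 times as much as in the targeted group.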
Data mining offers the opportunity to apply technology to improve many aspects of
business. Some standard applications are presented in this chapter. The value of
education is to present you with past applications, so that you can use your
imagination to extend these application ideas to new environments.
Data mining has proven valuable in almost every academic discipline.
Understanding business application of data mining is necessary to expose business
college students to current analytic information technology. Data mining has been
instrumental in customer relationship management,1 credit card management,2
banking,3 insurance,4 telecommunications,5 and many other areas of statistical support
to business. Business data mining is made possible by the generation of masses of
data from computer information systems. Understanding this information generation
system and the tools available for analysis is fundamental for business students in
the 21st century. There are many highly useful applications in practically every field
of scientific study. Data mining support is required to make sense of the masses of
business data generated by computer technology.
This chapter will describe some of the major applications of data mining. By doing
so, there will also be opportunities to demonstrate some of the different techniques
that have proven useful. Table 2.1 compares the aspects of these applications.
Table 2.1 Common business data mining applications
Application     | Function                                        | Statistical technique             | AI tool
Catalog sales   | Customer segmentation; mail stream optimization | Cluster analysis                  | K-means; neural network
CRM (telecom)   | Customer scoring; churn analysis                | Cluster analysis                  | Neural network
Credit scoring  | Loan applications                               | Cluster analysis; pattern search  | K-means
Banking (loans) | Bankruptcy prediction                           | Prediction; discriminant analysis | Decision tree
Investment risk | Risk prediction                                 | Prediction                        | Neural network
Insurance       | Customer retention (churn); pricing             | Prediction; logistic regression   | Decision tree; neural network
A wide variety of business functions are supported by data mining. Those
applications listed in Table 2.1 represent only some of these applications. The
underlying statistical techniques are relatively simple—to predict, to identify the case
closest to past instances, or to identify some pattern.
Customer Profiling
We begin with probably the most spectacular example of business data mining.
Fingerhut, Inc. was a pioneer in developing methods to improve business. In this case,
they sought to identify the small subset of the most likely purchasers of their specialty
catalogs. They were so successful that they were purchased by Federated Stores.
Ultimately, Fingerhut operations fell victim to the general malaise in the IT business in
2001 and 2002. But they still represent a pioneering development of data mining
application in business.
Lift
This section demonstrates the concept of lift used in customer segmentation models.
We can divide the data into groups as fine as we want (here, we divide them into 10
equal portions of the population, or groups of 10 percent each). These groups have
some identifiable features, such as zip code, income level, and so on (a profile). We
can then sample and identify the portion of sales for each group. The idea behind lift
is to send promotional material (which has a unit cost) to those groups that have the
greatest probability of positive response first. We can visualize lift by plotting the
responses against the proportion of the total population of potential customers, as
shown in Table 2.2. Note that the segments are listed in Table 2.2 sorted by expected
customer response.
Table 2.2 Lift calculation
Ordered segment | Expected customer response | Proportion (expected responses) | Cumulative response proportion | Random average proportion | Lift
Origin | 0 | 0 | 0 | 0 | 0
1 | 0.20 | 0.172 | 0.172 | 0.10 | 0.072
2 | 0.17 | 0.147 | 0.319 | 0.20 | 0.119
3 | 0.15 | 0.129 | 0.448 | 0.30 | 0.148
4 | 0.13 | 0.112 | 0.560 | 0.40 | 0.160
5 | 0.12 | 0.103 | 0.664 | 0.50 | 0.164
6 | 0.10 | 0.086 | 0.750 | 0.60 | 0.150
7 | 0.09 | 0.078 | 0.828 | 0.70 | 0.128
8 | 0.08 | 0.069 | 0.897 | 0.80 | 0.097
9 | 0.07 | 0.060 | 0.957 | 0.90 | 0.057
10 | 0.05 | 0.043 | 1.000 | 1.00 | 0.000
Both the cumulative responses and cumulative proportion of the population are
graphed to identify the lift. Lift is the difference between the two lines in Figure 2.1.
Figure 2.1 Lift identified by the mail optimization system
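The lift column of Table 2.2 can be reproduced with a few lines of code. The following is a minimal sketch, assuming only the expected response rates listed in the table; the function name lift_table and the variable names are illustrative choices, not part of any data mining package.

```python
# Expected response rates per segment, taken from Table 2.2
# (already sorted from most to least responsive).
expected_response = [0.20, 0.17, 0.15, 0.13, 0.12, 0.10, 0.09, 0.08, 0.07, 0.05]


def lift_table(rates):
    """Return (segment, cumulative_share, random_share, lift) for each segment."""
    total = sum(rates)
    n = len(rates)
    rows = []
    cumulative = 0.0
    for i, rate in enumerate(rates, start=1):
        cumulative += rate / total   # share of all expected responses reached so far
        random_share = i / n         # share reached by mailing to segments at random
        rows.append((i, cumulative, random_share, cumulative - random_share))
    return rows


rows = lift_table(expected_response)
for segment, cumulative, random_share, lift in rows:
    print(f"{segment:>2}  {cumulative:.3f}  {random_share:.2f}  {lift:.3f}")

# Segment at which the gap between the two curves is widest.
best_segment = max(rows, key=lambda row: row[3])[0]
print("Greatest lift at segment", best_segment)
```

Under these figures the gap between the cumulative response curve and the random line is widest at the fifth segment, which agrees with the table.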
The purpose of lift analysis is to identify the most responsive segments. Here, the
greatest lift is obtained from the first five segments. We are probably more interested
in profit, however. We can identify the most profitable policy. What needs to be done
is to identify the portion of the population to send promotional materials to. For
instance, if an average profit of $200 is expected for each positive response and a cost
of $25 is expected for each set of promotional material sent out, it obviously would be
profitable to send to the first segment, which has an expected 0.2 positive
responses per mailing ($200 times 0.2 equals an expected revenue of $40, covering the cost of $25
plus an extra $15 profit). But it still might be possible to improve the overall profit by
sending to other segments as well (always selecting the segment with the larger
response rates in order). The plot of cumulative profit is shown in Figure 2.2 for this
set of data. The second most responsive segment would also be profitable, collecting
$200 times 0.17 or $34 per $25 mailing for a net profit of $9. It turns out that the
fourth most responsive segment collects 0.13 times $200 ($26) for a net profit of $1,
while the fifth most responsive segment collects $200 times 0.12 ($24) for a net loss
of $1. Table 2.3 shows the calculation of the expected payoff.
Figure 2.2 Profit impact of lift
Table 2.3 Calculation of the expected payoff
Segment | Expected segment revenue ($200 × P) | Cumulative expected revenue | Random cumulative cost ($25 × i) | Expected payoff
0 | 0 | 0 | 0 | 0
1 | 40 | 40 | 25 | 15
2 | 34 | 74 | 50 | 24
3 | 30 | 104 | 75 | 29
4 | 26 | 130 | 100 | 30
5 | 24 | 154 | 125 | 29
6 | 20 | 174 | 150 | 24
7 | 18 | 192 | 175 | 17
8 | 16 | 208 | 200 | 8
9 | 14 | 222 | 225 | –3
10 | 10 | 232 | 250 | –18
The profit function in Figure 2.2 reaches its maximum with the fourth segment.
It is clear that the maximum profit is found by sending to the four most responsive
segments of the ten in the population. The implication is that in this case, the
promotional materials should be sent to the four segments expected to have the largest
response rates. If there was a promotional budget, it would be applied to as many
segments as the budget would support, in order of the expected response rate, up to
the fourth segment.
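The payoff calculation in Table 2.3 can be sketched the same way. This is an illustration only, assuming the $200 profit per response and $25 cost per mailing used in the text; the function and parameter names are hypothetical.

```python
# Expected response rates per segment from Table 2.2, most responsive first.
expected_response = [0.20, 0.17, 0.15, 0.13, 0.12, 0.10, 0.09, 0.08, 0.07, 0.05]


def cumulative_payoff(rates, profit_per_response=200, cost_per_mailing=25):
    """Cumulative expected payoff when mailing to the top 1, 2, ... segments."""
    payoffs = []
    running = 0.0
    for rate in rates:
        # Each additional segment adds its expected revenue and one unit of cost.
        running += profit_per_response * rate - cost_per_mailing
        payoffs.append(round(running, 2))
    return payoffs


payoffs = cumulative_payoff(expected_response)
segments_to_mail = payoffs.index(max(payoffs)) + 1

print(payoffs)
print("Mail to the top", segments_to_mail, "segments")
```

The rule behind the result is simple: keep adding segments while the expected revenue of the next segment exceeds the cost of mailing to it. With these figures that holds through the fourth segment ($26 against $25) and fails at the fifth ($24 against $25).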
It is possible to focus on the wrong measure. The basic objective of lift analysis in
marketing is to identify those customers whose decisions will be influenced by
marketing in a positive way. In short, the methodology described earlier identifies
those segments of the customer base that would be expected to purchase. This may or
may not have been due to the marketing campaign effort. The same methodology can
be applied, but more detailed data is needed to identify those whose decisions would
have been changed by the marketing campaign, rather than simply those who would
purchase.
Another method that considers multiple factors is Recency, Frequency, and
Monetary (RFM) analysis. As with lift analysis, the purpose of an RFM is to identify
customers who are more likely to respond to new offers. While lift looks at the static
measure of response to a particular campaign, RFM keeps track of customer
transactions by time, by frequency, and by amount. Time is important as some
customers may not have responded to the last campaign, but might now be ready to
purchase the product being marketed. Customers can also be sorted by the frequency
of responses and by the dollar amount of sales. The subjects are coded on each of the
three dimensions (one approach is to have five cells for each of the three measures,
yielding a total of 125 combinations, each of which can be associated with a positive
response to the marketing campaign). The RFM still has limitations, in that there are
usually more than three attributes important to a successful marketing program, such
as product variation, customer age, customer income, customer lifestyle, and so on.6
The approach is the basis for a continuing stream of techniques to improve customer
segmentation marketing.
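The five-cell coding described above can be illustrated with a small sketch. The customer records below are invented for illustration, and rank-based scoring into fifths is only one of several ways the cells could be defined; none of the names come from a specific RFM package.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 to 5 by rank, with 5 for the most desirable fifth."""
    n = len(values)
    # Order indices from least desirable to most desirable.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // n + 1
    return scores


# Hypothetical customers: (days since last purchase, number of purchases, total spend)
customers = {
    "A": (3, 20, 1500.0),
    "B": (40, 2, 80.0),
    "C": (10, 8, 400.0),
    "D": (200, 1, 20.0),
    "E": (25, 5, 950.0),
}

names = list(customers)
recency = quintile_scores([customers[c][0] for c in names], higher_is_better=False)
frequency = quintile_scores([customers[c][1] for c in names])
monetary = quintile_scores([customers[c][2] for c in names])

# Each customer lands in one of the 5 x 5 x 5 = 125 possible cells.
rfm = {c: (r, f, m) for c, r, f, m in zip(names, recency, frequency, monetary)}
for name, cell in rfm.items():
    print(name, cell)
```

Recency is scored with lower values as better, since a customer who purchased a few days ago is treated as more active than one who last purchased months ago. Response rates observed for each cell in past campaigns could then be attached to the cells.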
Understanding lift enables understanding the value of specific types of customers.
This enables more intelligent customer management, which is discussed in the next
section.
Comparisons of Data Mining Methods
Initial analyses focus on discovering patterns in the data. Classical statistical
methods, such as correlation analysis, are a good start, often supplemented with visual
tools to see the distributions and relationships among variables. Clustering and pattern
search are typically the first activities in data analysis, good examples of knowledge
discovery. Then, appropriate models are built. Data mining can then involve model
building (extension of the conventional statistical model building to very large
datasets) and pattern recognition. Pattern recognition aims to identify groups of
interesting observations. Often, experts are used to assist in pattern recognition.
There are two broad categories of models used for data mining. Continuous,
especially time series, data often calls for forecasting. Linear regression provides one
tool, but there are many others. Business data mining has widely been used for
classification or developing models to predict which category a new case will most
likely belong to (such as a customer profile relative to the expected purchases,
whether or not loans will be problematic, or whether insurance claims will turn out to
be fraudulent). The classification modeling tools include statistically based logistic
regression as well as artificial intelligence-based neural networks and decision trees.
Sung et al. compared a number of these methods with respect to their advantages
and disadvantages. Table 2.4 draws upon their analysis and expands it to include the
other techniques covered.
Table 2.4 Comparison of data mining method features7

Cluster analysis
Advantages: Can generate understandable formula. Can be applied automatically.
Disadvantages: Computation time increases with dataset size. Requires identification of parameters, with results sensitive to choices.
Assumptions: Need to make data numerical.

Discriminant analysis
Advantages: Ability to incorporate multiple financial ratios simultaneously. Coefficients for combining the independent variables. Ability to apply to new data.
Disadvantages: Violates normality and independence assumptions. Reduction of dimensionality issues. Varied interpretation of the relative importance of variables. Difficulty in specifying the classification algorithm. Difficulty in interpreting the time-series prediction tests.
Assumptions: Assume multivariate normality within groups. Assume equal group covariances across all groups. Groups are discrete, nonoverlapping, and identifiable.

Regression
Advantages: Can generate understandable formula. Widely understood. Strong body of theory.
Disadvantages: Computation time increases with dataset size. Not very good with nonlinear data.
Assumptions: Normality of errors. No error autocorrelation, heteroskedasticity, or multicollinearity.

Neural network models
Advantages: Can deal with a wide range of problems. Produce good results in complicated (nonlinear) domains. Can deal with both continuous and categorical variables. Have many software packages available.
Disadvantages: Require inputs in the range of 0 to 1. Do not explain results. May prematurely converge to an inferior solution.
Assumptions: Groups are discrete, nonoverlapping, and identifiable.

Decision trees
Advantages: Can generate understandable rules. Can classify with minimal computation. Use easy calculations. Can deal with continuous and categorical variables. Provide a clear indication of variable importance.
Disadvantages: Some algorithms can only deal with binary-valued target classes. Most algorithms only examine a single field at a time. Can be computationally expensive.
Assumptions: Groups are discrete, nonoverlapping, and identifiable.
Knowledge Discovery
Clustering: One unsupervised clustering technique is partitioning, the process of
examining a set of data to define a new categorical variable partitioning the space into
a fixed number of regions. This amounts to dividing the data into clusters. The most
widely known partitioning algorithm is k-means, where k center points are defined,
and each observation is classified to the closest of these center points. The k-means
algorithm attempts to position the centers to minimize the sum of squared distances from
each observation to its assigned center. Centroids
are used as centers, and the most commonly used distance metric is Euclidean. Instead
of k-means, k-median can be used, providing a partitioning method expected to be
more stable.
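The k-means procedure described above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance, centroids as centers, and starting centers supplied by the user; the sample points are invented, and a production analysis would normally rely on an established library instead.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each observation goes to its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the centroid (mean) of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
              for p in points]
    return centers, labels


# Hypothetical two-dimensional observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, centers=[(1.0, 1.0), (9.0, 8.5)])

print(centers)
print(labels)
```

The result depends on the starting centers and on the scale of each variable, which is one reason the text notes that clustering results are sensitive to the choices made by the analyst.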
Pattern search: Objects are often grouped to seek patterns. Clusters of customers
might be identified with particularly interesting average outcomes. On the positive
side, you might look for patterns in highly profitable customers. On the negative side,
you might seek patterns unique to those who fail to pay their bills to the firm.
Both clustering and pattern search seek to group the objects. Cluster analysis is
attractive, in that it can be applied automatically (although ample computational time
needs to be available). It can be applied to all types of data, as demonstrated in our
example. Cluster analysis is also easy to apply. However, its use requires selection
from among alternative distance measures, and weights may be needed to reflect
variable importance. The results are sensitive to these measures. Cluster analysis is
appropriate when dealing with large, complex datasets with many variables and
specifically identifiable outcomes. It is often used as an initial form of analysis. Once
different clusters are identified, pattern search methods are often used to discover the
rules and patterns. Discriminant analysis has been the most widely used data mining
technique in bankruptcy prediction. Clustering partitions the entire data sample,
assigning each observation to exactly one group. Pattern search seeks to identify local
clusterings, in that there are more objects with similar characteristics than one would
expect. Pattern search does not partition the entire dataset, but identifies a few groups
exhibiting unusual behavior. In the application on real data, clustering is useful for
describing broad behavioral classes of customers. Pattern search is useful for
identifying groups of people behaving in an anomalous way.
Predictive Models
Regression is probably the most widely used analytical tool historically. A main
benefit of regression is the broad understanding people have about regression models
and tests of their output. Logistic regression is highly appropriate in data mining, due
to the categorical nature of resultant variables that is usually present. While regression
is an excellent tool for statistical analysis, it does require assumptions about
parameters. Errors are assumed to be normally distributed, without autocorrelation
(errors are not related to the prior errors), without heteroskedasticity (errors don’t
grow with time, for instance), and without multicollinearity (independent variables
don’t contain high degrees of overlapping information content). Regression can deal
with nonlinear data, but only if the modeler understands the underlying nonlinearity
and develops appropriate variable transformations. There usually is a tradeoff—if the
data are fit well with a linear model, regression tends to be better than neural network
models. However, if there is nonlinearity or complexity in the data, neural networks
(and often, genetic algorithms) tend to do better than regression. A major
advantage of regression relative to neural networks is that regression provides an
easily understood formula, while neural network models are very complex.
Neural network algorithms can prove highly accurate, but involve difficulty in the
application to new data or interpretation of the model. Neural networks work well
unless there are many input features. The presence of many features makes it difficult
for the network to find patterns, resulting in long training phases, with lower
probabilities of convergence. Genetic algorithms have also been applied to data
mining, usually to bolster operations of other algorithms.
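The remark that regression yields an easily understood formula can be made concrete with a one-variable least squares fit. The sketch below uses invented data and the standard closed-form estimates for slope and intercept; the function name fit_line is an illustrative choice.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical observations that lie close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]

intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} x")

# The fitted formula can be read and applied directly to a new case.
prediction = intercept + slope * 6
```

The whole model is the pair of numbers in the printed formula, which is what makes it easy to explain. A neural network fitted to the same data would instead be described by a set of weights with no comparably direct reading.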
Decision tree analysis requires only the last assumption, that groups are discrete,
nonoverlapping, and identifiable. Decision trees can generate understandable
rules and can perform classification with minimal computation, using
easy calculations. Decision tree analysis can deal with both continuous and categorical variables,
and provides a clear indication of variable importance in prediction and classification.
Despite the disadvantages of the decision tree method, it is a good choice when the data
mining task is classification of records or prediction of outcomes.
Summary
Data mining applications are widespread. This chapter sought to give concrete
examples of some of the major business applications of data mining. We began with a
review of Fingerhut data mining to support catalog sales. That application was an
excellent demonstration of the concept of lift applied to retail business. We also
reviewed five other major business applications, intentionally trying to demonstrate a
variety of different functions, statistical techniques, and data mining methods. Most of
those studies applied multiple algorithms (data mining methods). Software such as
Enterprise Miner has a variety of algorithms available, encouraging data miners to
find the method that works best for a specific set of data.
The second portion of the book seeks to demonstrate these methods with small
demonstration examples. The small examples can be run on Excel or other simple
spreadsheet packages with statistical support. Businesses can often conduct data
mining without purchasing large-scale data mining software. Therefore, our
philosophy is that it is useful to understand what the methods are doing, which also
provides the users with better understanding of what they are doing when applying
data mining.
CHAPTER 3
Data Mining Processes and Knowledge
Discovery
In order to conduct data mining analysis, a general process is useful. This chapter
describes an industry standard process, which is often used, and a shorter vendor
process. While not every step is needed in every analysis, this process provides a good
coverage of the steps needed, starting with data exploration, data collection, data
processing, analysis, inferences drawn, and implementation.
There are two standard processes for data mining that have been presented. CRISP-
DM (cross-industry standard process for data mining) is an industry standard, and
SEMMA (sample, explore, modify, model, and assess) was developed by the SAS
Institute Inc., a leading vendor of data mining software (and a premier statistical
software vendor). Table 3.1 gives a brief description of the phases of each process.
You can see that they are basically similar, only with different emphases.
Table 3.1 CRISP-DM and SEMMA
CRISP-DM | SEMMA
Business understanding | Assumes well-defined questions
Data understanding | Sample
Data preparation | Explore
Modeling | Modify data
Evaluation | Model
Deployment | Assess
Industry surveys indicate that CRISP-DM is used by over 70 percent of the industry
professionals, while about half of these professionals use their own methodologies.
SEMMA has a lower reported usage, as per the KDNuggets.com survey.
CRISP-DM
CRISP-DM is widely used by the industry members. This model consists of six
phases intended as a cyclical process shown in Figure 3.1.
Figure 3.1 CRISP-DM process
This six-phase process is not a rigid, by-the-numbers procedure. There is usually a
great deal of backtracking. Additionally, experienced analysts may not need to apply
each phase for every study. But, CRISP-DM provides a useful framework for data
mining.
Business Understanding
The key element of a data mining study is understanding the purpose of the study.
This begins with the managerial need for new knowledge and the expression of the
business objective of the study to be undertaken. Goals need to be stated in specific terms, such as
which types of customers are interested in each of our products, what the typical
profiles of our customers are, and how much value each of them provides to us.
Then, a plan for finding such knowledge needs to be developed, in terms of
those responsible for collecting data, analyzing data, and reporting. At this stage, a
budget to support the study should be established, at least in preliminary terms.
Data Understanding
Once the business objectives and the project plan are established, data understanding
considers data requirements. This step can include initial data collection, data
The Project Gutenberg eBook of North
American Wild Flowers
This ebook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this ebook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.
Title: North American Wild Flowers
Illustrator: Agnes Fitzgibbon
Author: Catharine Parr Strickland Traill
Release date: January 2, 2018 [eBook #56288]
Most recently updated: October 23, 2024
Language: English
Credits: Produced by Marcia Brooks, Mardi Desjardins & the
online
Distributed Proofreaders Canada team at
http://www.pgdpcanada.net
*** START OF THE PROJECT GUTENBERG EBOOK NORTH
AMERICAN WILD FLOWERS ***
NORTH AMERICAN
Wild Flowers.
Painted and Lithographed
BY
Agnes Fitz Gibbon
WITH
BOTANICAL DESCRIPTIONS
BY
C. P. TRAILL.
AUTHORESS OF “THE BACKWOODS OF CANADA” “THE CANADIAN CRUSOES”
ET.C. ET.C.
CONTENTS.
PAGE
PLATE I.
Liver-Leaf—Wind-Flower.—(Sharp Lobed Hepatica.)—Hepatica Acutiloba 9
Bellwort—(Wood Daffodil.)—Uvularia perfoliata 11
Wood Anemone.—Anemone Nemorosa 13
Spring Beauty.—Claytonia Virginica 16
PLATE II.
Adders-Tongue.—Dog-Toothed Violet.—Erythronium Americanum 19
White Trillium.—Death-Flower.—Trillium Grandiflorum 21
Rock Columbine.—Aquilegia Canadensis 24
PLATE III.
Squirrel Corn.—Dicentra Canadensis 27
Purple Trillium.—Death-Flower.—Birth-Root.—Trillium erectum 29
Wood Geranium.—Cranes-Bill.—Geranium maculatum 31
Chickweed Wintergreen.—Trientalis Americana 34
PLATE IV.
Sweet Wintergreen.—Pyrola elliptica 35
One Flowered Pyrola.—Moneses uniflora 39
Flowering Raspberry.—Rubus Odoratus 41
Speedwell.—American Brooklime.—Veronica Americana 43
PLATE V.
Yellow Lady’s Slippers.—Cypripedium parviflorum and Cypripedium pubescens 45
Large Blue Flag.—Iris Versicolor.—Fleur-de-luce 47
Small Cranberry.—Vaccinium Oxycoccus 50
PLATE VI.
Wild Orange Lily.—Lilium Philadelphicum 53
Canadian Harebell.—Campanula Rotundifolia 56
Showy Lady’s Slipper.—Cypripedium Spectabile.—(Moccasin Flower) 59
PLATE VII.
Early Wild Rose.—Rosa Blanda 63
Pentstemon Beard-Tongue.—Pentstemon pubescens 66
PLATE VIII.
Sweet Scented Water Lily.—Nymphæa Odorata 67
Yellow Pond Lily.—Nuphar Advena.—(Spatter Dock) 71
PLATE IX.
Pitcher Plant.—(Soldier’s Drinking Cup.)—Sarracenia Purpurea 73
PLATE X.
Painted Cup, Scarlet Cup.—Castilleia Coccinea 77
Showy Orchis.—Orchis Spectabilis 81
Indian Turnip.—Arum triphyllum (Arum family) 83
Cone Flower.—Rudbeckia fulgida 87
PREFACE
TO THE
WILD FLOWERS OF NORTH
AMERICA.
The first and second edition of our Book of Wild Flowers was
published last year under the title of “CANADIAN WILD
FLOWERS;” but it has been suggested by some American
friends that we ought not to have limited the title to the Wild
Flowers of Canada, as nature has given them a much wider
geographical range, and, in fact, there are none of those that have
been portrayed and described in our volume but may be found
diffused over the whole of the Eastern and Northern States of the
Union, as well as to the North and West of the Great Lakes. We,
therefore, have rectified the error in our present issue, not wishing
to put asunder those whom the Great Creator has united in one
harmonious whole, each family and tribe finding its fitting place as
when it issued freshly forth from the bounteous hand of God who
formed it for the use of His creatures and to His own honor and
glory.
As our present volume embraces but a select few of the Native
Flowers of this Northern Range of the Continent, it is our intention to
follow it by succeeding series, which will present to our readers the
most attractive of our lovely Wild Flowers, and flowering shrubs. The
subject offers a wide field for our future labours.
What a garland of loveliness has nature woven for man’s
admiration, and yet, comparatively speaking, how few appreciate the
beauties thus lavishly bestowed upon them?
The inhabitants of the crowded cities know little of them even by
name, and those that dwell among them pass them by as though
they heeded them not, or regarded them as worthless weeds,
crying, “Cut them down, why cumber they the ground?” To such
careless ones they do indeed “waste their sweetness on the desert
air.” Yet the Wild Flowers have deeper meanings and graver
teachings than the learned books of classical lore so much prized by
the scholar, if he will but receive them.
They shew him the parental care of a beneficent God for the
winged creatures of the air, and for the sustenance of the beasts of
the field. They point to the better life, the resurrection from the
darkness of the grave. They are emblems of man’s beauty and of his
frailty. They lead us by flowery paths from earth to heaven, where
the flowers fade not away. Shall we then coldly disregard the flowers
that our God has made so wondrously fair, to beautify the earth we
live on?
Mothers of America teach your little ones to love the Wild
Flowers and they will love the soil on which they grew, and in all
their wanderings through the world their hearts will turn back with
loving reverence to the land of their birth, to that dear home
endeared to their hearts by the remembrance of the flowers that
they plucked and wove for their brows in their happy hours of
gladsome childhood.
How many a war-worn soldier would say with the German hero
of Schiller’s tragedy:
“Oh gladly would I give the blood stained victor’s wreath
For the first violet of the early spring,
Plucked in those quiet fields where I have journeyed.”
Schiller.
DESCRIPTION OF THE TITLE PAGE.
Our Artist has tastefully combined in the wreath that adorns her
title page several of our native Spring Flowers. The simple blossoms
of Claytonia Virginica, better known by its familiar name “Spring
Beauty,” may easily be recognized from the right hand figure in the
group of the first plate in the book. For a description of it see page
16.
The tall slender flower on the left side on the title page is
Potentilla Canadensis, (Var simplex). This slender trailing plant may
be found in open grassy thickets, by road side wastes, at the foot of
old stumps, and similar localities, with the common Cinquefoil or
Silver Leaf. This last species is much the most attractive plant to the
lover of wild flowers. It abounds in dry gravelly and sandy soil,
courting the open sunshine, rooting among stones, over which it
spreads its slender reddish stalk, enlivening the dry arid wastes with
its silvery silken leaves and gay golden rose-shaped blossoms.
The Potentilla family belongs to the same Natural Order, Rosaceæ,
as the Strawberry, Raspberry, Blackberry and the Rose—a goodly
fellowship of the useful and the beautiful among which our humble
Cinquefoil has been allowed to find a place.
The little plant occupying the lower portion of the plate is Viola
sagittata, “Arrow Leaved Violet.” The anthers of the stamens are flesh
coloured or pale orange; the slender pointed sepals of the calyx are
of a bright light green, which form a lively contrast to the deep
purple closely wrapped pointed buds that they enfold. The leaves are
of a dull green, somewhat hairy, narrow, blunt at the apex, not
heart-shaped as in many of the species but closed at the base and
bordering the short channelled foot-stalk. Among our numerous
species few are really more lovely than “the Arrow Leaved Violet.”
Viola ovata and Viola villosa closely resemble the above, and
probably are varieties of our pretty flower.
The violet, like the rose and lily, has ever been the poet’s flower.
This is not one of our earliest violets; it blossoms later than the early
white violet, V. rotundifolia or than the early Blue Violet, V. cucullata,
or that delicate species V. striata, the lilac striped violet, which
adorns the banks and hill sides on some of our plain lands, early in
the month of May. Later in this month and in the beginning of June
we find the azure blossoms of V. sagittata in warm sheltered valleys,
often among groups of small pines and among grasses on sandy
knolls and open thickets. The plant grows low, the leaves on very
short foot-stalks closely pressed to the ground; the bright full blue
flowers springing from the crown of the plant on long slender stems
stand above the leaves.
The petals are blunt, of a full azure blue, white at the base and
bearded. Among many allusions to this favourite flower, here are
lines somewhat after the style of the older poets, addressed to early
violets found on a wintry March day at Waltham Abbey.
TO EARLY VIOLETS.
Children of sweetest birth,
Why do ye bend to earth
Eyes in whose softened blue,
Lies hid the diamond dew?
Has not the early ray,
Yet kissed those tears away
That fell with closing day?
Say do ye fear to meet
The hail and driving sleet,
Which gloomy winter stern
Flings from his snow-wreathed urn?
Or do ye fear the breeze
So sadly sighing thro’ the trees,
Will chill your fragrant flowers,
Ere April’s genial showers
Have visited your bowers?
Why came ye till the cuckoo’s voice,
Bade hill and vale rejoice;
Till Philomel with tender tone,
Waking the echoes lone,
Bids woodland glades prolong
Her sweetly tuneful song;
Till sky-lark blithe and linnet grey,
From fallow brown and meadow gay,
Pour forth their jocund roundelay;
Till ‘cowslip, wan’ and ‘daisies pied’
’Broider the hillock’s side,
And opening hawthorn buds are seen,
Decking each hedge-row screen?
What, though the primrose drest
In her pure paly vest
Came rashly forth
To brave the biting North,
Did ye not see her fall
Straight ’neath his snowy pall;
And heard ye not the West wind sigh
Her requiem as he hurried by?
Go hide ye then till groves are green
And April’s clouded bow is seen;
Till suns are warm, and skies are clear
And every flower that does appear,
Proclaims the birthday of the year.
Though Canada does not boast among her violets the sweet
purple violet (Viola odorata) of Britain she has many elegant species
remarkable for beauty of form and colour; among these “The Yellow
Wood Violet,” the “Long Spurred Violet” and the “Milkwhite Wood
Violet,” (V. Canadensis) may be named. These are all branching
violets, some, as the yellow and the white, often attain, in rank
shaded soil, to a foot in height and may be found throwing out a
succession of flowers through the later summer months. They will
bloom freely if transplanted to a shady spot in the garden.
PLATE I.
1 HEPATICA ACUTILOBA (Sharp lobed Hepatica)
2 UVULARIA PERFOLIATA (Large flowered Bellwort)
3 ANEMONE NEMOROSA (Wood Anemone)
4 CLAYTONIA VIRGINICA (Spring Beauty)
Nat. Ord. Ranunculaceæ.
LIVER-LEAF.
(SHARP LOBED HEPATICA.)
Hepatica acutiloba.
“Lodged in sunny clefts,
Where the cold breeze comes not, blooms alone
The little Wind-flower, whose just opened eye
Is blue, as the spring heaven it gazes at.”
Bryant.
THE American poet, Bryant, has many happy allusions to the
Hepatica under the name of “Wind-Flower;” the more common
name among our Canadian settlers is “Snow-Flower,” it being
the first blossom that appears directly after the melting off of
the winter snows.
In the forest—in open grassy old woods, on banks and upturned
roots of trees, this sweet flower gladdens the eye with its cheerful
starry blossoms; every child knows it and fills its hands and bosom
with its flowers, pink, blue, deep azure and pure white. What the
daisy is to England, the Snow-flower or Liver-leaf is to Canada. It
lingers long within the forest shade, coyly retreating within its
sheltering glades from the open glare of the sun: though for a time
it will not refuse to bloom within the garden borders, when
transplanted early in spring, and doubtless if properly supplied with
black mould from the woods and partially sheltered by shrubs it
would continue to grow and flourish with us constantly.
We have two sorts, H. acutiloba, and H. triloba. A large variety
has been found on Long Island in Rice Lake; the leaves of which are
five lobed; the lobes much rounded, the leaf stalks stout, densely
silky, the flowers large, of a deep purple blue. This handsome plant
throve under careful cultivation and proved highly ornamental.
The small round closely folded buds of the Hepatica appear
before the white silky leaves unfold themselves, though many of the
old leaves of the former year remain persistent through the winter.
The buds rise from the centre of a silken bed of soft sheaths and
young leaves, as if nature kindly provided for the warmth and
protection of these early flowers with parental care.
Later in the season, the young leaves expand just before the
flowers drop off. The white flowered is the most common among our
Hepaticas, but varieties may be seen of many hues: waxen-pink,
pale blue and azure blue with intermediate shades and tints.
The Hepatica belongs to the Nat. Ord. Ranunculaceæ, the crow-
foot family, but possesses none of the acrid and poisonous qualities
of the Ranunculus proper, being used in medicine, as a mild tonic, by
the American herb doctors in fevers and disorders of the liver.
It is very probable that its healing virtues in complaints of the
liver gave rise to its common name in old times; some assign the
name to the form of the lobed leaf.
BELLWORT.
(WOOD DAFFODIL.)
Uvularia perfoliata.
“Fair Daffodils, we weep to see
Thee haste away so soon,
As yet the early rising sun
Has not attained his noon.
Stay, stay!—
Until the hasting day
Has run,
But to the evening song;
When having prayed together we
Will go with you along.”
Herrick.
THIS slender drooping flower of early spring is known by the
name of Bellwort, from its pendent lily-like bells; and by some
it is better known as the Wood Daffodil, to which its yellow
blossoms bear some remote resemblance.
The flowers of the Bellwort are of a pale greenish-yellow;
the divisions of the petal-like sepals are six, deeply divided, pointed
and slightly twisted or waved, drooping from slender thready
pedicels terminating the branches; the stem of the plant is divided
into two portions, one of which is barren of flowers. The leaves are
of a pale green, smooth, and in the largest species perfoliate,
clasping the stem.
The root (or rhizome) is white, fleshy and tuberous. The Bellwort
is common in rich shady woods and grassy thickets, and on moist
alluvial soil on the banks of streams, where it attains to the height of
18 or 20 inches. It is an elegant, but not very showy flower—
remarkable more for its graceful pendent straw-coloured or pale
yellow blossoms, than for its brilliancy. It belongs to a sub-order of
the Lily Tribe. There are three species in Canada: the large Bellwort,
Uvularia grandiflora, and U. perfoliata; we also possess the third,
enumerated by Dr. Gray, U. sessilifolia.
Nat. Ord. Ranunculaceæ.
WOOD ANEMONE.
Anemone nemorosa.
“Within the wood,
Whose young and half transparent leaves,
Scarce cast a shade; gay circles of anemones,
Danced on their stalks.”
Bryant.
THE classical name Anemone is derived from a Greek word, which
signifies the wind, because it was thought that the flower
opened out its blossoms only when the wind was blowing.
Whatever the habits of the Anemone of the Grecian Isles may
be, assuredly in their native haunts in this country, the
blossoms open alike in windy weather or in calm; in shade or in
sunshine. It is more likely that the wind acting upon the downy
seeds of some species and dispersing them abroad, has been the
origin of the idea, and has given birth to the popular name which
poets have made familiar to the ear with many sweet lines. Bryant,
who is the American poet of nature, for he seems to revel in all that
is fair among the flowers and streams and rocks and forest shades,
has also given the name of “wind flower” to the blue hepatica.
The subject of our plate, the little white pink-edged flower at the
left hand corner of the group, is Anemone nemorosa, the smaller
“Wood Anemone.”
This pretty delicate species loves the moderate shade of groves
and thickets, it is often found in open pinelands of second growth,
and evidently prefers a light and somewhat sandy soil to any other,
with glimpses of sunshine stealing down upon it.
The Wood Anemone is from 4 to 9 inches in height, but seldom
taller, the five rounded sepals which form the flower are white,
tinged with a purplish-red or dull pink on the outside. The leaves are
three parted, divided again in three, toothed and sharply cut and
somewhat coarse in texture; the three upper stem leaves form an
involucre about midway between the root and the flower-cup.
Our Wood Anemone is a cheerful little flower gladdening us with
its blossoms early in the month of May. It is very abundant in the
neighbourhood of Toronto, on the grassy banks and piny-dells at
Dover Court, and elsewhere.
“There thickly strewn in woodland bowers,
Anemones their stars unfold.”
A somewhat taller species, with very white starry flowers, is
found on gravelly banks under the shade of shrubs near the small
lakes formed by the Otonabee river, N. Douro, where also, we find
the downy seeded species known as “Thimble-weed,” Anemone
cylindrica, from the cylindrical heads of fruit. The “Thimble-weed” is
not very attractive for beauty of colour; the flower is greenish-white,
small, two of the sepals being shorter and less conspicuous than the
others; the plant is from 1 to 2 ft. high; the leaves of the cut and
pointed involucre are coarse, of a dull green, surrounding the several
long flower-stalks. The soft cottony seeds remain in close heads
through the winter, till the spring breezes disperse them.
The largest species of our native Anemones is A. Virginiana, “Tall
Anemone.” This handsome plant loves the shores of lakes and
streams; damp rich ground suits it well, as it grows freely in such
soil, and under moderate shade when transferred to the garden.
The foliage of the tall Anemone is coarse, growing in whorls
round the stem, divisions of the leaf three parted, sharply pointed
and toothed. In this, as in all the species, the coloured sepals, (or
calyx leaves) form the flower. The outer surface of the flower is
covered with minute silky hairs, the round flattened silky buds rise
singly on tall naked stems, the upper series are supplied with two
small leaflets embracing the stalk. The central and largest flowers
open first, the lateral or outer ones as these fade away; thus a
succession of blossoms is produced, which continue to bloom for
several weeks. The flowers of this sort, under cultivation, become
larger and handsomer than in their wild state, ivory white, tinged
with purple. The Anemone is always a favourite flower wherever it
may be seen, whether in British woods, on Alpine heights, or in
Canadian wilds; on banks of lonely lakes and forest streams; or in
the garden parterre, where it is rivalled by few other flowers in grace
of form or splendour of colour.
Nat. Ord. Portulacaceæ.
SPRING BEAUTY.
Claytonia Virginica.
Where the fire had smoked and smouldered
Saw the earliest flower of Spring time,
Saw the beauty of the Spring time,
Saw the Miskodeed[1] in blossom.
Hiawatha.
THIS simple delicate little plant is one of our earliest April
flowers. In warm springs it is almost exclusively an April flower,
but in cold and backward seasons, it often delays its
blossoming time till May.
Partially hidden beneath the shelter of old decaying timbers
and fallen boughs, its pretty pink buds peep shyly forth. It is often
found in partially cleared beech-woods, and in rich moist meadows.
In Canada, there are two species; one with few flowers, white,
both leaves and flowers larger than the more common form; the
blossoms of the latter are more numerous, smaller, and of a pale
pink colour, veined with lines of a deeper rose colour, forming a
slender raceme; sometimes the little pedicels or flower stalks are
bent or twisted to one side, so as to throw the flowers in one
direction.
The scape springs from a small deep tuber, bearing a single pair
of soft, oily, succulent leaves. In the white flowered species these
leaves are placed about midway up the stem, but in the pink (C.
Virginica) the leaves lie closer to the ground, and are smaller and of
a dark bluish green hue. Our Spring Beauty well deserves its pretty
poetical name. It comes in with the Robin, and the song sparrow,
the hepatica, and the first white violet; it lingers in shady spots, as if
unwilling to desert us till more sunny days have wakened up a
wealth of brighter blossoms to gladden the eye; yet the first, and
the last, are apt to be most prized by us, with flowers, as well as
other treasures.
How infinitely wise and merciful are the arrangements of the
Great Creator. Let us instance the connection between Bees and
Flowers. In cold climates the former lie torpid, or nearly so, during
the long months of Winter, until the genial rays of the sun and light
have quickened vegetation into activity, and buds and blossoms
open, containing the nutriment necessary for this busy insect tribe.
The Bees seem made for the Blossoms; the Blossoms for the Bees.
On a bright March morning what sound can be more in harmony
with the sunshine and blue skies, than the murmuring of the
honeybees, in a border of cloth of gold crocuses? what sight more
cheerful to the eye? But I forget. Canada has few of these sunny
flowers, and no March days like those that woo the hive bees from
their winter dormitories. And April is with us only a name. We have
no April month of rainbow suns and showers. We miss the deep blue
skies, and silver throne-like clouds that cast their fleeting shadows
over the tender springing grass and corn; we have no mossy lanes
odorous with blue violets. One of our old poets thus writes:
“Ye violets that first appear,
By your pure purple mantles known,
Like the proud virgins of the year,
As if the spring were all your own,
What are ye when the rose is blown.”[2]
We miss the turfy banks, studded with starry daisies, pale primroses
and azure blue-bells.
Our May is bright and sunny, more like to the English March; it is
indeed a month of promise—a month of many flowers. But too often
its fair buds and blossoms are nipped by frost, “and winter, lingering,
chills the lap of May.”
In the warmth and shelter of the forest, vegetation appears. The
black leaf mould, so light and rich, quickens the seedlings into rapid
growth, and green leaves and opening buds follow soon after the
melting of the snows of winter. The starry blossoms of the hepatica,
blood-root, bellwort, violets, white, yellow and blue, with the delicate
Coptis (gold-thread), come forth and are followed by many a lovely
flower, increasing with the more genial seasons of May and June.
But our April flowers are but few, comparatively speaking, and so
we prize our early Violets, Hepaticas and Spring Beauty.
[1] Miskodeed—Indian name for Spring Beauty.
[2] Sir Henry Wotton—written in 1651.
PLATE II.
1. ERYTHRONIUM AMERICANUM (Yellow adders tongue)
2. TRILLIUM GRANDIFLORUM (Large white Trillium)
3. AQUILEGIA CANADENSIS (Wild Columbine)
Nat. Ord. Liliaceæ.