1. Chapter I (PASW)

What is PASW?
PASW = Predictive Analytics SoftWare

What is Statistics?
• Statistics is a set of mathematical techniques used to:
– Summarize research data.
– Determine whether the data supports the researcher’s hypothesis.
Research Stages
1. Planning and Designing
2. Data Collecting
3. Data Analyzing
4. Data Reporting
5. Deployment

SPSS
• SPSS Inc. was acquired by IBM in 2009.
• It is also used by market researchers, health researchers,
survey companies, government, education researchers,
marketing organizations, data miners, and others.
• Companion products in the same family are used for survey
authoring and deployment (IBM SPSS Data Collection), data
mining (IBM SPSS Modeler), text analytics, and collaboration
and deployment (batch and automated scoring services).
• The software name stands for Statistical Package for the
Social Sciences (SPSS),[2] reflecting the original market,
although the software is now popular in other fields as
well, including the health sciences and marketing.

SPSS
• Run the tutorial: work through the built-in sample lessons
• Type in data: enter the data yourself
• Run an existing query: run a database query you have already saved
• Create new query using Database Wizard: import a table from a database
• Open an existing data source
• SPSS Data Editor
• To open a data file: File => Open => Data
• Column headers = variables
• Row headers = case numbers
• The Data Editor provides two views of your data:
– Data View. This view displays the actual data values or defined
value labels.
– Variable View. This view displays variable definition information,
including defined variable and value labels, data type (for
example, string, date, or numeric), measurement level (nominal,
ordinal, or scale), and user-defined missing values.

Data View
• Many features of Data View are similar to those found in spreadsheet applications.
• Rows are cases. Each row represents a case or an observation. For
example, each individual respondent to a questionnaire is a case.
• Columns are variables. Each column represents a variable or characteristic
that is being measured. For example, each item on a questionnaire is a
variable.
• Cells contain values. Each cell contains a single value of a variable for a
case. The cell is where the case and the variable intersect. Cells contain
only data values. Unlike spreadsheet programs, cells in the Data Editor
cannot contain formulas.
• The data file is rectangular. The dimensions of the data file are determined
by the number of cases and variables. You can enter data in any cell. If you
enter data in a cell outside the boundaries of the defined data file, the
data rectangle is extended to include any rows and/or columns between
that cell and the file boundaries. There are no "empty" cells within the
boundaries of the data file. For numeric variables, blank cells are
converted to the system-missing value. For string variables, a blank is
considered a valid value.
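The rectangular-file rule can be sketched in Python. This is a minimal model under stated assumptions, not how SPSS stores data: `None` stands in for the system-missing value, new columns are assumed to be numeric (the usual default type for a new SPSS variable), and `set_cell` and `blank_for` are hypothetical helpers.

```python
# Minimal model of the Data Editor's rectangular data file (illustration only).
SYSMIS = None  # hypothetical stand-in for SPSS's system-missing value


def blank_for(var_type: str):
    """Blank numeric cells become system-missing; a blank string is a valid value."""
    return SYSMIS if var_type == "numeric" else ""


def set_cell(grid: list, var_types: list, row: int, col: int, value) -> None:
    """Write value at (row, col), extending the rectangle so no cell is 'empty'."""
    # Add any missing columns (assumed numeric, like a new SPSS variable).
    while len(var_types) <= col:
        var_types.append("numeric")
    # Pad existing rows out to the new width.
    for r in grid:
        while len(r) < len(var_types):
            r.append(blank_for(var_types[len(r)]))
    # Add any missing rows between the old boundary and the target cell.
    while len(grid) <= row:
        grid.append([blank_for(t) for t in var_types])
    grid[row][col] = value


grid = [[1.0, "a"]]
var_types = ["numeric", "string"]
set_cell(grid, var_types, 2, 3, 5.0)
print(grid)
```

Writing to row 2, column 3 of a 1×2 file fills the gap: numeric cells become `None` (system-missing) and the string column gets `""`, so the result is `[[1.0, 'a', None, None], [None, '', None, None], [None, '', None, 5.0]]`.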

Variable View
• Name column: the name of the variable (for example, Name, Age, Sex, Edu, Income)
– Each variable name must be unique; duplication is not allowed.
– Variable names can be up to 64 bytes long, and the first character
must be a letter or one of the characters @, #, or $.
– Variable names cannot contain spaces.
– Reserved keywords cannot be used as variable names. Reserved
keywords are ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, and
WITH.
• Type column: the data type of the variable
– Numeric: A variable whose values are numbers. Values are displayed in
standard numeric format.
– Comma: A numeric variable whose values are displayed with commas
delimiting every three places and displayed with the period as a
decimal delimiter.
– Dot: A numeric variable whose values are displayed with periods
delimiting every three places and with the comma as a decimal
delimiter.
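The naming rules in the Name column list can be checked mechanically. The sketch below covers only the rules listed on this slide, not the full SPSS rule set. `name_problems` is a hypothetical helper. It assumes bytes are counted in UTF-8 and that keywords and duplicates are matched case-insensitively.

```python
# Check a proposed variable name against the rules listed above (sketch only).
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}


def name_problems(name: str, existing=()) -> list:
    """Return a list of rule violations; an empty list means the name passes."""
    if not name:
        return ["name is empty"]
    problems = []
    if len(name.encode("utf-8")) > 64:  # assumption: bytes counted as UTF-8
        problems.append("longer than 64 bytes")
    if not (name[0].isalpha() or name[0] in "@#$"):
        problems.append("must start with a letter, @, # or $")
    if any(ch.isspace() for ch in name):
        problems.append("contains a space")
    if name.upper() in RESERVED:  # assumption: keywords match case-insensitively
        problems.append("reserved keyword")
    if name.upper() in {e.upper() for e in existing}:  # assumption: case-insensitive
        problems.append("duplicate name")
    return problems


for candidate in ["Income", "to", "1st", "my var"]:
    print(candidate, name_problems(candidate))
```

Returning a list rather than a single True/False lets one call report every broken rule at once.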

Variable View
• Nominal
– A variable can be treated as nominal when its values represent
categories with no intrinsic ranking (for example, the department of
the company in which an employee works). Examples of nominal
variables include region, zip code, and religious affiliation.
• Ordinal
– A variable can be treated as ordinal when its values represent
categories with some intrinsic ranking (for example, levels of service
satisfaction from highly dissatisfied to highly satisfied). Examples of
ordinal variables include attitude scores representing degree of
satisfaction or confidence and preference rating scores.
• Scale
– A variable can be treated as scale (continuous) when its values
represent ordered categories with a meaningful metric, so that
distance comparisons between values are appropriate. Examples of
scale variables include age in years and income in thousands of
dollars.

Variable View
• Type column (continued): more data types
– Scientific notation: A numeric variable whose values are displayed
with an embedded E and a signed power-of-10 exponent. The
exponent can be preceded by E or D with an optional sign, or by the
sign alone (for example, 123, 1.23E2, 1.23D2, 1.23E+2, and 1.23+2).
– Date: A numeric variable whose values are displayed in one of several
calendar-date or clock-time formats.
– Dollar: A numeric variable displayed with a leading dollar sign ($),
commas delimiting every three places, and a period as the decimal
delimiter.
– Custom currency: A numeric variable whose values are displayed in
one of the custom currency formats that you have defined on the
Currency tab of the Options dialog box.
– String: A variable whose values are not numeric and therefore are not
used in calculations.
– Restricted numeric: A variable whose values are restricted to
non-negative integers.
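Several of these types change only how a number is shown, not the stored value. The sketch below imitates the Comma, Dot and Dollar displays. It also shows why all five Scientific notation inputs mean the same number. The function names and the regular expression are hypothetical, and the formatters assume two decimals and a non-negative amount.

```python
import re

# --- Display formats (display only; the stored number is unchanged) ---

def comma_format(x: float, decimals: int = 2) -> str:
    """Comma type: commas every three digits, period as the decimal mark."""
    return f"{x:,.{decimals}f}"


def dot_format(x: float, decimals: int = 2) -> str:
    """Dot type: periods every three digits, comma as the decimal mark."""
    # Swap the two separators using a temporary placeholder.
    return comma_format(x, decimals).replace(",", "_").replace(".", ",").replace("_", ".")


def dollar_format(x: float, decimals: int = 2) -> str:
    """Dollar type: leading $, commas every three digits (assumes x >= 0)."""
    return "$" + comma_format(x, decimals)


# --- Scientific notation input: E or D with optional sign, or the sign alone ---
_SCI = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(?:([EeDd])([+-]?\d+)|([+-]\d+))?$")


def parse_scientific(text: str) -> float:
    """Parse forms like 123, 1.23E2, 1.23D2, 1.23E+2 and 1.23+2."""
    m = _SCI.match(text.strip())
    if m is None:
        raise ValueError(f"not a scientific-notation value: {text!r}")
    mantissa, _marker, exp_after_letter, exp_sign_only = m.groups()
    exponent = exp_after_letter or exp_sign_only
    if exponent is None:
        return float(mantissa)
    return float(f"{mantissa}E{exponent}")  # rewrite with E so float() accepts it


print(comma_format(1234567.891))   # 1,234,567.89
print(dot_format(1234567.891))     # 1.234.567,89
print(dollar_format(1234.5))       # $1,234.50
print([parse_scientific(s) for s in ["123", "1.23E2", "1.23D2", "1.23E+2", "1.23+2"]])
```

The last line prints `123.0` five times, because every input form denotes the same value.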

Variable View
• Variable labels
– You can assign descriptive variable labels up to 256 characters (128
characters in double-byte languages). Variable labels can contain
spaces and reserved characters that are not allowed in variable
names.
• Value labels
– You can assign descriptive value labels for each value of a variable.
This process is particularly useful if your data file uses numeric codes
to represent non-numeric categories (for example, codes of 1 and 2 for
male and female).
– Value labels are saved with the data file. You do not need to redefine
value labels each time you open a data file. Value labels can be up
to 120 bytes.
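Value labels work like a lookup table from numeric codes to display text. The data itself still holds the codes. The sketch below assumes, as Data View appears to do, that a value without a label is shown as the raw value. `show` is a hypothetical helper, not part of SPSS.

```python
# Value labels as a code -> label mapping (hypothetical helper, not SPSS's API).
value_labels = {1: "Male", 2: "Female"}


def show(value, labels):
    """Display the label for a coded value; unlabeled values are shown as-is."""
    return labels.get(value, value)


data = [1, 2, 2, 1, 3]  # stored codes; 3 has no label
displayed = [show(v, value_labels) for v in data]
print(displayed)  # ['Male', 'Female', 'Female', 'Male', 3]
```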

Replace Missing Values
• Example expression: (age1+age2+age3+age4)/4

Import an Excel File
• File => Open => Data, then choose Excel as the file type

Run a New Query
• File => Open Database => New Query
– Select the table to import

Question #1
Data View
Variable View
1. Values (value labels): 1 = Male
2 = Female
2. Measure:
Gender = Nominal
Height = Scale
3. Analyze => Descriptive Statistics => Frequencies
Variable: Gender
4. Analyze => Descriptive Statistics => Frequencies
Variable: Height
Click Statistics
5. Analyze => Compare Means => Means

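Frequency: the number of cases that have each value of the variable
Percent: the percentage of all cases, both valid and missing
Valid Percent: the percentage of valid (non-missing) cases only
Cumulative Percent: the running sum of the valid percents
Ex: 14.3, 14.3+14.3=28.6, 14.3+14.3+42.9=71.4, 14.3+14.3+42.9+28.6=100.0
(SPSS accumulates the unrounded percentages, so the sums read 71.4 and 100.0 rather than 71.5 and 100.1.)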
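The columns of a frequency table can be computed directly from the counts. The sketch below uses hypothetical data with 7 valid cases (counts 1, 1, 3, 2, which reproduce the percentages in the example) plus one missing case. `None` marks the missing case, and `frequency_table` is an illustrative helper, not SPSS code.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, frequency, percent, valid percent, cumulative percent).

    None marks a missing case (an assumption for this sketch).
    """
    n_total = len(values)
    valid = [v for v in values if v is not None]
    n_valid = len(valid)
    counts = Counter(valid)
    rows, running = [], 0
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((
            value,
            freq,
            round(100 * freq / n_total, 1),     # Percent: all cases
            round(100 * freq / n_valid, 1),     # Valid Percent: valid cases only
            round(100 * running / n_valid, 1),  # Cumulative: from unrounded counts
        ))
    return rows


grades = [0, 1, 2, 2, 2, 3, 3, None]  # hypothetical data: 7 valid, 1 missing
for row in frequency_table(grades):
    print(row)
```

Percent divides by all 8 cases (12.5, 12.5, 37.5, 25.0). Valid Percent divides by the 7 valid cases (14.3, 14.3, 42.9, 28.6). The cumulative column comes out as 14.3, 28.6, 71.4, 100.0 because it is computed from the running count, not from the rounded percents.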

Standard deviation
• The standard deviation (SD, σ) measures the amount of variation
or dispersion from the average (mean).
• A low standard deviation indicates that the data points tend to be very close
to the mean (also called the expected value).
• A high standard deviation indicates that the data points are spread out over a
large range of values.
• For example, consider a population consisting of the following eight values:
2, 4, 4, 4, 5, 5, 7, 9
• These eight data points have a mean of 5:
(2+4+4+4+5+5+7+9)/8 = 40/8 = 5
• Squared difference of each data point from the mean:
(2-5)² = 9     (5-5)² = 0
(4-5)² = 1     (5-5)² = 0
(4-5)² = 1     (7-5)² = 4
(4-5)² = 1     (9-5)² = 16
• Variance = (9+1+1+1+0+0+4+16)/8 = 32/8 = 4 (population variance: divide by n = 8)
• Standard deviation = sqrt(4) = 2
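The hand calculation above is the population standard deviation, which divides by n. SPSS's Descriptives and Frequencies output reports the sample standard deviation, which divides by n - 1. Its "Std. Deviation" for these eight values would therefore be about 2.14, not 2. The sketch below reproduces both using Python's standard `statistics` module.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                          # 40 / 8 = 5.0
squared_diffs = [(x - mean) ** 2 for x in data]       # 9, 1, 1, 1, 0, 0, 4, 16
population_variance = sum(squared_diffs) / len(data)  # divide by n = 8 -> 4.0
population_sd = math.sqrt(population_variance)        # 2.0

# Sample version: divide by n - 1 = 7 instead of n.
sample_sd = statistics.stdev(data)                    # sqrt(32 / 7), about 2.14

print(population_sd, round(sample_sd, 3))
```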

Finding the Mode
• First, put the numbers in order.
• Then count how many times each number appears; the mode is the number that appears most often.
• EX1: 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
• In order: 3, 5, 7, 12, 13, 14, 20, 23, 23, 23, 23, 29, 39, 40, 56
• The number that appears most often (four times): 23
• EX2: {19, 8, 29, 35, 19, 28, 15}
• In order: {8, 15, 19, 19, 28, 29, 35}
• The number that appears most often (twice): 19
• Note: a data set can have more than one mode.
• EX3: {1, 3, 3, 3, 4, 4, 6, 6, 6, 9}
• 3 and 6 each appear three times, so there are two modes: 3 and 6.
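The "count how many of each number" step maps naturally onto a counter. The sketch below is a hypothetical `modes` helper that returns every most-frequent value, so it handles EX3's two modes. Python 3.8+ also provides `statistics.multimode` for the same job.

```python
import statistics
from collections import Counter


def modes(values):
    """Return every value that appears most often, in ascending order."""
    counts = Counter(values)    # how many of each number
    top = max(counts.values())  # highest count
    return sorted(v for v, c in counts.items() if c == top)


ex1 = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]
ex2 = [19, 8, 29, 35, 19, 28, 15]
ex3 = [1, 3, 3, 3, 4, 4, 6, 6, 6, 9]

print(modes(ex1), modes(ex2), modes(ex3))  # [23] [19] [3, 6]
print(statistics.multimode(ex3))           # [3, 6]
```

Counting replaces the sorting step. When several modes exist, SPSS Frequencies reports only the smallest, with a footnote, so it would show 3 for EX3.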

Question #2
1. Select Transform => Compute Variable
Target Variable: average
Numeric Expression: (v1+v2)/2
OK
2. Save the data file as *.sav (or open an existing *.sav data file).
3. Transform => Recode into Different Variables
• Double-click average to move it into the variable list
• Output Variable Name: grade
• Click Change
• Click Old and New Values
• Choose Range ... through ... for each old value:
Old value (range)      New value
0 through 14.9         0
15 through 19.9        1
20 through 24.9        2
25 through 30          3
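Question #2's two steps can be sketched as two small functions. `compute_average` and `recode_grade` are hypothetical names. `None` stands for a missing value. Averages that fall outside every range also become missing, which we take to be the default for Recode into Different Variables when no "All other values" rule is given.

```python
# Hypothetical sketch of Question #2: compute average, then recode it into grade.
RANGES = [  # (low, high, new value), matching the Old and New Values table
    (0, 14.9, 0),
    (15, 19.9, 1),
    (20, 24.9, 2),
    (25, 30, 3),
]


def compute_average(v1, v2):
    """(v1+v2)/2; None (missing) if either score is missing."""
    if v1 is None or v2 is None:
        return None
    return (v1 + v2) / 2


def recode_grade(average):
    """Map an average to a grade; values outside every range become missing."""
    if average is None:
        return None
    for low, high, new in RANGES:
        if low <= average <= high:  # "low through high" includes both ends
            return new
    return None


for v1, v2 in [(10, 12), (16, 18), (28, 30), (14, 15.9)]:
    avg = compute_average(v1, v2)
    print(v1, v2, avg, recode_grade(avg))
```

The last pair averages 14.95, which falls in the gap between 14.9 and 15, so it is not recoded. Ranges written as "0 through 14.9" only cover every case if averages never have more than one decimal place.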

Question #3
1. Analyze => Descriptive Statistics => Descriptives
Variables:
• Age
• Exam1
• Exam2
• Average
• Grade

Median value
• The Median is the "middle number" (in a sorted list of numbers).
• For example:
• Example 1: find the median of 12, 3 and 5
• Put them in order: 3, 5, 12
• The middle number is 5, so the median is 5.
• Example 2: 3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
• Put them in order: 3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56
• There are 15 numbers, so the middle one is the 8th: the median is 23.
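The "sort, then take the middle" rule translates directly into code. The sketch below is a hypothetical `median` helper. It also handles an even count by averaging the two middle values, a case the slides do not cover. Python's `statistics.median` follows the same rule.

```python
import statistics


def median(values):
    """Middle value of the sorted list; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


example1 = [12, 3, 5]
example2 = [3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]
print(median(example1), median(example2))  # 5 23
print(median([3, 5, 7, 12]))               # 6.0 (even count)
```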

Question #4
• Graphs => Legacy Dialogs => Histogram
• Variable: Grade

Variance
• The average of the squared differences from the mean.
• To calculate the variance, follow these steps:
• Work out the mean (the simple average of the numbers). Then, for each
number, subtract the mean and square the result (the squared
difference).
• Then work out the average of those squared differences. (Squaring keeps
negative and positive differences from cancelling each other out.)
• Example: Mean = (600 + 470 + 170 + 430 + 300)/5 = 1970/5 = 394
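The example stops after the mean. The sketch below carries the listed steps through for the same five values, which the slides give without units. It divides by n, as the definition above does. `statistics.variance` gives the n - 1 (sample) version for comparison.

```python
import math
import statistics

values = [600, 470, 170, 430, 300]

mean = sum(values) / len(values)             # 1970 / 5 = 394.0
differences = [v - mean for v in values]     # 206, 76, -224, 36, -94
squared = [d ** 2 for d in differences]      # 42436, 5776, 50176, 1296, 8836
variance = sum(squared) / len(values)        # 108520 / 5 = 21704.0
sd = math.sqrt(variance)                     # about 147.32

print(mean, variance, round(sd, 2))
```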

Frequency Table Terms
• Frequency: the count (number) of cases
• Valid: cases that do not have missing data
• Missing: user-missing values or the system-missing value (shown as a period, .)
• Percent: percentage of all cases, including those with missing data
• Valid percent: percentage of cases with non-missing values

Cumulative percentage
• Another way of expressing a frequency distribution.
• cumulative percentage = (cumulative frequency ÷ n) × 100

Question #5
• Data => Select Cases
• If condition is satisfied
• Click If...
• Enter the condition, e.g. Gender = 1 (if Gender is a string variable, quote the value: Gender = "1")

Question #6
- Transform => Compute Variable
- Target Variable: a name for the new variable
- Numeric Expression: MEAN(q1,q2,q3,q4)
- OK
• This computes a new variable from a mathematical expression.
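MEAN(q1,q2,q3,q4) and a hand-written (q1+q2+q3+q4)/4 differ when data are missing, as we understand SPSS's rules. The function averages whatever values are present. The arithmetic expression becomes system-missing if any operand is missing. The same applies to (age1+age2+age3+age4)/4. The two Python functions below are hypothetical imitations with `None` as "missing".

```python
def spss_like_mean(*values):
    """Like MEAN(q1,q2,q3,q4) as we understand it: average of the non-missing values."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def arithmetic_average(*values):
    """Like (q1+q2+q3+q4)/4: missing if any operand is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


print(spss_like_mean(4, 5, None, 3))      # 4.0
print(arithmetic_average(4, 5, None, 3))  # None
print(arithmetic_average(4, 5, 3, 4))     # 4.0
```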

Question #6 (continued)
Selecting Cases
• You can select cases either by filtering (which keeps all the cases but limits further analyses to
the selected cases) or by deleting the cases that do not meet your criteria.
- Data => Select Cases => If condition is satisfied => If... => sex = 1 => Continue => OK
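The difference between filtering and deleting can be shown on a list of case records. Filtering keeps every case and adds a 0/1 flag. This loosely mirrors the `filter_$` variable that SPSS creates, as we understand it. Deleting keeps only matching cases. The records and function names below are hypothetical.

```python
cases = [
    {"id": 1, "sex": 1, "age": 19},
    {"id": 2, "sex": 2, "age": 21},
    {"id": 3, "sex": 1, "age": 23},
]


def is_selected(case):
    """The If condition from the steps above: sex = 1."""
    return case["sex"] == 1


def filter_cases(cases, condition):
    """Filter: keep every case, flagging selected ones 1 and the rest 0."""
    return [{**case, "filter_$": int(condition(case))} for case in cases]


def delete_unselected(cases, condition):
    """Delete: keep only the cases that meet the condition."""
    return [case for case in cases if condition(case)]


filtered = filter_cases(cases, is_selected)
kept = delete_unselected(cases, is_selected)
print([c["filter_$"] for c in filtered])  # [1, 0, 1]
print([c["id"] for c in kept])            # [1, 3]
```

In Python, as with a string variable in SPSS, the text "1" is not equal to the number 1. That is why Question #5 quotes the value for string variables.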

Sorting Cases
We can sort on one or more variables. For example, we may want to sort the
records in our dataset by age and sex.
- Data => Sort Cases => choose the sort variable(s) and the sort order (ascending or descending)
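Sorting by several variables means comparing the first variable and using the next one only to break ties. The sketch below shows this in Python with a tuple key on hypothetical records. Python's `sorted` is stable even with `reverse=True`, so tied records keep their original order.

```python
records = [
    {"id": 1, "age": 21, "sex": 2},
    {"id": 2, "age": 19, "sex": 1},
    {"id": 3, "age": 21, "sex": 1},
    {"id": 4, "age": 19, "sex": 2},
]

# Sort by age, then by sex within each age (both ascending).
by_age_then_sex = sorted(records, key=lambda r: (r["age"], r["sex"]))

# Descending order on a single variable.
oldest_first = sorted(records, key=lambda r: r["age"], reverse=True)

print([r["id"] for r in by_age_then_sex])  # [2, 4, 3, 1]
print([r["id"] for r in oldest_first])     # [1, 3, 2, 4]
```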

Splitting a File
•splitting a file creates separate "layers" for the grouping variables.
•Data=> Split File=> Organize output by group(Sex of student)
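Split File runs each analysis once per group. The sketch below builds those layers with `itertools.groupby`, which only merges adjacent items, so the cases are sorted by the grouping variable first. SPSS likewise expects the file to be sorted by the grouping variable(s). The data and `split_file` are hypothetical.

```python
from itertools import groupby

cases = [
    {"id": 1, "sex": 2, "age": 21},
    {"id": 2, "sex": 1, "age": 19},
    {"id": 3, "sex": 2, "age": 23},
    {"id": 4, "sex": 1, "age": 20},
]


def split_file(cases, key):
    """Group cases into layers by key; groupby needs the cases sorted by key first."""
    ordered = sorted(cases, key=lambda case: case[key])
    return {k: list(group) for k, group in groupby(ordered, key=lambda case: case[key])}


layers = split_file(cases, "sex")
for sex, layer in layers.items():
    ages = [case["age"] for case in layer]
    print(sex, sum(ages) / len(ages))  # one result per layer: 1 19.5, then 2 22.0
```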

Descriptives
• To calculate the means and standard deviations for age, all quizzes, and the
average quiz score:
• Analyze => Descriptive Statistics => Descriptives

Exploring Means for Different Groups
• When you have two or more groups, you may want to examine the mean of each group as well as the
overall mean.
• Select Analyze => Compare Means => Means
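The Means procedure reports each group's mean next to the overall mean. The sketch below computes both, plus group sizes, for hypothetical data. Note that the overall mean (20) is not the average of the group means (22.5) when groups differ in size. `means_by_group` is an illustrative helper, not SPSS code.

```python
def means_by_group(cases, group_key, value_key):
    """Mean and N of value_key for each group, plus an overall Total row."""
    groups = {}
    for case in cases:
        groups.setdefault(case[group_key], []).append(case[value_key])
    table = {g: (sum(v) / len(v), len(v)) for g, v in sorted(groups.items())}
    everything = [case[value_key] for case in cases]
    table["Total"] = (sum(everything) / len(everything), len(everything))
    return table


cases = [
    {"sex": 1, "score": 10},
    {"sex": 1, "score": 20},
    {"sex": 2, "score": 30},
]
result = means_by_group(cases, "sex", "score")
print(result)  # {1: (15.0, 2), 2: (30.0, 1), 'Total': (20.0, 3)}
```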

Frequency Distributions and Histograms
• Select Analyze => Descriptive Statistics => Frequencies
• Click Charts => Histograms
