Probabilistic Stochastic Test Data
Bj Rollison
Test Architect, Microsoft
http://www.TestingMentor.com
http://blogs.msdn.com/imtesty
- Customer-provided data
  - Domain expertise
  - Generally very limited in scope
- Tester-generated data
  - Happy path, probabilistic data
  - Input population poorly defined; human bias
  - Random data not representative of the population
- Static data files
  - Library of historical failure indicators
  - Too restrictive
  - Ineffective with multiple iterations
- Large number of variables
- Variable sequences can result in a virtually infinite number of combinations
- Impractical to test all values and combinations of values in any reasonable testing cycle
- Example: NetBIOS name, 15 alphanumeric characters
  - Using ASCII-only characters, 82 are allowable (0x20 and * + = | : ; " ? < > , are invalid)
  - Total number of possible input tests equals 82^15 + 82^14 + 82^13 + … + 82^1 = 51,586,566,049,662,994,687,009,994,574
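The arithmetic above is easy to confirm; a quick Python check sums 82^n over every allowable name length from 1 through 15:

```python
# Count all possible NetBIOS names of length 1..15 drawn from the
# 82-character allowable alphabet: 82^15 + 82^14 + ... + 82^1.
total = sum(82 ** n for n in range(1, 16))
print(total)  # 51586566049662994687009994574
```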
It does not "look" like real-world test data.
  - Years ago developers would argue that a name textbox couldn't contain a number!
  - To a computer, what is the difference between the strings Margaret and ksjCu9ls?
Random data is not reproducible.
  - A seeded random generator will produce exactly the same result given the same seed value.
Random data violates constraints of real data.
  - Use representative data from the population.
  - Use deterministic algorithms.
- Sampling is commonly used in risk-based testing
  - Samples must be representative
  - Samples must be statistically unbiased
  - Sample sets must include variability for breadth
- Random data generation provides variability, but
  - simple random data may not be representative
  - simple random data is hard to reproduce
Goal – generate random data that is
  - representative of the input data set
  - statistically unbiased: a random sample of elements from a probability distribution
Value – input test data that
  - provides greater variability
  - includes expected and unexpected sequences
  - eliminates human bias
  - is better at evaluating robustness
  - is dynamic!
System.Security.Cryptography.RandomNumberGenerator class
  - Cryptographic output is indistinguishable from random
  - Cannot be seeded; no repeatability
System.Random class
  - Produces a sequence of numbers that meets certain statistical requirements for randomness
  - Can be seeded for repeatability
  - Not perfect, but reasonably random for practical purposes
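The repeatability point is not specific to .NET; a minimal Python sketch (random.Random plays the role of System.Random here, and the seed value is a made-up example) shows that two generators built from the same seed yield identical sequences:

```python
import random

SEED = 20070614  # hypothetical seed value recorded in a test log

# Two independently constructed generators with the same seed produce
# exactly the same sequence, so any failure found with this data can
# be reproduced later from the logged seed.
gen1 = random.Random(SEED)
gen2 = random.Random(SEED)

run1 = [gen1.randint(0, 9) for _ in range(10)]
run2 = [gen2.randint(0, 9) for _ in range(10)]
assert run1 == run2
print(run1)
```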
Comparison between the RandomNumberGenerator class and the Random class (chart: red = RNG, blue = Random). Both are pseudo-random; no obvious pattern in either sample. Based on a sample by Jeff Atwood, http://www.codinghorror.com
User-defined seed
  - Tester provides the seed value for repeatability
Dynamic seed
  - New seed value generated at runtime
  - The seed value must be preserved in the test log
public static int GetSeedValue(string seedValue)
{
    int seed;
    if (!string.IsNullOrEmpty(seedValue))
    {
        // Tester-supplied seed: parse it for a repeatable run.
        seed = int.Parse(seedValue);
    }
    else
    {
        // Dynamic seed: generate one at runtime and log it.
        Random r = new Random();
        seed = r.Next();
    }
    return seed;
}
Define the representative data set
Example – credit card numbers, e.g. 341846580149320
  - Card length (BIN + digits): between 14 and 19 digits, depending on card type
  - Bank Identification Number (BIN): between 1 and 4 digits, depending on card type
  - Checksum: Luhn (Mod 10) algorithm
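The Luhn (Mod 10) check is simple to implement; a Python sketch (the function name is my own) that validates the example number above:

```python
def is_luhn_valid(number: str) -> bool:
    """Luhn (Mod 10) checksum: starting from the rightmost digit,
    double every second digit, subtract 9 from any doubled result
    over 9, and require the total to be divisible by 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:        # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(is_luhn_valid("341846580149320"))  # True  -- the sample card above
print(is_luhn_valid("341846580149321"))  # False -- last digit corrupted
```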
Equivalence class partitioning decomposes data
into discrete valid and invalid class subsets
Card type          Valid class subsets            Invalid class subsets
American Express   BIN: 34, 37                    Unassigned BINs
                   Length: 15 digits              Length >= 16 digits
                   Checksum: Mod 10               Length <= 14 digits
                                                  Fails checksum
Maestro            BIN: 5020, 5038, 6034, 6759    Unassigned BINs
                   Length: 16 or 18 digits        Length <= 15 digits
                   Checksum: Mod 10               Length >= 19 digits
                                                  Length == 17 digits
                                                  Fails checksum
Diagram: a credit card test data generator. Inputs (card type and an optional seed) feed a seed generator and a random number generator; valid BIN numbers and card lengths by type constrain the generated digits; a Luhn-algorithm validity check gates the output (card #).
Assigned BINs ensure the data looks real; the Mod 10 check ensures the data feels real. The result is representative of real data!
Deterministic algorithm to generate a valid random credit card number:

GetCardNumber(int cardType, int seed)
    Get BIN(cardType, seed);
    Get CardLength(cardType, seed);
    Assign BIN to cardNumber;
    Generate a new random object from the seed;
    while (cardNumber length < CardLength)
        Generate a random digit 0–9;
        Append it to cardNumber;
    while (IsNotValidCardNumber(cardNumber))
        Increment the last digit by 1 (wrapping 9 to 0);
    return cardNumber;
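A runnable Python sketch of this algorithm (the type table below covers only the two card types from the earlier equivalence-class example, and all function names are my own):

```python
import random

# Valid BINs and lengths per card type, from the earlier table.
CARD_TYPES = {
    "amex":    {"bins": ["34", "37"],                     "lengths": [15]},
    "maestro": {"bins": ["5020", "5038", "6034", "6759"], "lengths": [16, 18]},
}

def is_luhn_valid(number):
    """Luhn (Mod 10) checksum over a digit string."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0

def get_card_number(card_type, seed):
    """Deterministically generate a Luhn-valid number for card_type."""
    rng = random.Random(seed)              # seeded for repeatability
    spec = CARD_TYPES[card_type]
    number = rng.choice(spec["bins"])      # assign a valid BIN
    length = rng.choice(spec["lengths"])   # pick a valid length by type
    while len(number) < length:
        number += str(rng.randint(0, 9))   # append random digits
    while not is_luhn_valid(number):       # repair the check digit
        number = number[:-1] + str((int(number[-1]) + 1) % 10)
    return number
```

Given the same card type and seed, the function returns the same number, so a failing test can be replayed from the logged seed; the repair loop terminates within nine increments because each step changes the checksum by one.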
Model test data → Generate test data → Apply test data → Verify results
  1. Model test data: decompose the data set for each parameter using equivalence class partitioning.
  2. Generate test data: generate valid and invalid test data adhering to parameter properties, business rules, and the test hypothesis.
  3. Apply test data: apply the test data to the application under test.
  4. Verify results: verify the actual results against the expected results (the oracle!).
JCB Type 1: BIN = 35, Len = 16
JCB Type 2: BIN = 1800, 2131, Len = 15
Robust testing
  - String length: fixed or variable
  - Seed value
  - Custom range for greater control
Multi-language input testing
  - Unicode language families
  - Assigned code points
  - Reserved characters
  - Unicode surrogate pairs
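Multi-language input generation can follow the same seeded pattern; a Python sketch (the code point ranges chosen here are illustrative, and lone surrogates never appear because no range includes U+D800–U+DFFF):

```python
import random

# Illustrative sample population: a few Unicode blocks with assigned
# code points (Basic Latin capitals, Cyrillic, CJK Unified Ideographs).
RANGES = [(0x0041, 0x005A), (0x0410, 0x044F), (0x4E00, 0x4FFF)]

def random_unicode_string(length, seed):
    """Seeded, repeatable test string drawn from the ranges above."""
    rng = random.Random(seed)
    chars = []
    for _ in range(length):
        lo, hi = rng.choice(RANGES)        # pick a language family
        chars.append(chr(rng.randint(lo, hi)))  # pick a code point in it
    return "".join(chars)

s = random_unicode_string(1000, seed=1234)
```

Feeding such strings through a round trip (input, save, reload) and comparing against the original is a direct way to detect the character corruption and data loss shown on the next slides.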
Screenshot: 1000 Unicode characters generated from the sample population.
Screenshot: character corruption and data loss; 135 characters (bytes) show obvious data loss.
- Static test data wears out!
- Random test data that is not repeatable or not representative may find defects, but…
- Probabilistic stochastic test data
  - is a modeled representation of the population
  - is statistically unbiased
  - is especially good at testing robustness
- Recommend using both static (real-world) test data and probabilistic stochastic test data for breadth
Helping Testers Unleash Their Potential!™
http://www.TestingMentor.com
Bj.Rollison@TestingMentor.com
References
  - Practice .NET Testing with IR Data, Bj Rollison: http://www.stpmag.com/issues/stp-2007-06.pdf
  - Automatic test data generation for path testing using a new stochastic algorithm, Bruno T. de Abreu, Eliane Martins, Fabiano L. de Sousa: http://www.sbbd-sbes2005.ufu.br/arquivos/16-%209523.pdf
  - Data Generation Techniques for Automated Software Robustness Testing, Matthew Schmid & Frank Hill: http://www.cigital.com/papers/download/ictcsfinal.pdf
  - Tools: http://www.TestingMentor.com
