Database Design
Girl Develop It is here to provide affordable and
accessible programs to learn software through
mentorship and hands-on instruction.
Presented for Girl Develop It
Sondra Willhite
software developer at scrubjay technology
● over ten years working on applications with SQL
Server or Microsoft Access backend
● also have experience working as a BI analyst and
a brief stint working help desk / systems admin
https://www.linkedin.com/in/sondrawillhite
Intros
● careers in database
● overview of some database models
● database management systems
● the relational database model
● structure
● keys
● referential integrity
● normalization
Agenda
Careers in Database
● database administrator (DBA)
● database developer
● business intelligence analyst
● database consultant
● data scientist
Database Careers
a systems administrator for databases
● backups and restores
● database availability
● partitioning
● security
● performance
Database Administrator
[Figure: network diagram in which the cylinder symbol represents a database]
Database Administrator
software developers who specialize in databases
● query database
● script schema changes
● stored procedures
● triggers and constraints
● database design
● performance and security
Database Developer
[Figure: application stack diagram showing database management software; application code; drivers, libraries, frameworks, etc.]
Database Developer
writes custom queries and reports, including data
visualizations
● aka report writer/business analyst
● SQL language experts
● platforms
○ SSRS (SQL Server Reporting Services)
○ Crystal Reports
○ Tableau / Power BI
○ Microsoft Access
● performance and security
BI Analyst
[Figure: the report writer / data analyst's view, an entity relationship diagram showing tables, keys, fields, and indexes]
BI Analyst
specialist DBAs/developers
● performance tuning
○ finding the bottlenecks
● security controls
○ groups and user controls
○ encryption (database, columns, rows)
● availability
○ mirroring and fail-over clusters
○ cloud systems (Azure, AWS)
Database Consultant
specialize in creating data models
● especially for predictive modeling
● mathematical background in statistics
● look beyond relational database models:
○ big data
○ graph data
○ warehouse data
Data Scientist
Some Database Models
an unordered structured set of data
Database Definition
key : value pairs
{ vendor : abc manufacturing }
Big Data
nodes and edges
Graph Data
[Figure: two nodes, LA and NYC, joined by an edge labeled "5 hours"]
data structured along dimensions
Warehouse Data
sales
tables and fields
Relational Data
Customer ID  First  Last
100011       Jane   Doe

Customer ID  Item            Cost
100011       Intro Database  $25
100011       Data Modeling   $30
Database Management Systems
software that provides ways to
store, modify and retrieve data
● Microsoft SQL Server
● Oracle
● IBM's DB2
● PostgreSQL
● MySQL
● Microsoft Access
Database Management Systems
DBMS responsibilities
● data integrity
● data consistency
● multi-user access
● performance tools
● security and auditing
● backup and recovery
● extraction, transformation and load (ETL)
● business intelligence tools
Database Management Systems
ensuring "valid" data
● data types ensure that, say, date fields only store
date values
● constraints and triggers allow for complex rules to
be applied (eg, you cannot delete a client who has an
upcoming appointment)
Data Integrity
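A minimal sketch of both ideas, assuming SQLite through Python's built-in sqlite3 module (the slides do not name a DBMS here). The table layout, the Price column, and the trigger name keep_booked_clients are hypothetical and exist only for illustration. A CHECK constraint rejects a negative price, and a trigger refuses to delete a client who has an upcoming appointment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients (
    Client_ID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL
);
CREATE TABLE Appointments (
    Appt_ID   INTEGER PRIMARY KEY,
    Client_ID INTEGER NOT NULL,
    Price     REAL CHECK (Price >= 0),   -- constraint: no negative prices
    Appt_Date TEXT NOT NULL              -- ISO date string (assumption)
);
-- trigger: abort the delete if the client still has a future appointment
CREATE TRIGGER keep_booked_clients
BEFORE DELETE ON Clients
BEGIN
    SELECT RAISE(ABORT, 'client has an upcoming appointment')
    WHERE EXISTS (SELECT 1 FROM Appointments
                  WHERE Client_ID = OLD.Client_ID
                    AND Appt_Date >= date('now'));
END;
INSERT INTO Clients VALUES (1, 'Anna');
INSERT INTO Appointments VALUES (1000, 1, 30, '9999-12-31');
""")

def try_sql(sql):
    """Run one statement; return the error message, or None on success."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.DatabaseError as exc:
        return str(exc)

# rejected by the CHECK constraint
bad_price = try_sql("INSERT INTO Appointments VALUES (1001, 1, -5, '9999-12-31')")
# rejected by the trigger
blocked_delete = try_sql("DELETE FROM Clients WHERE Client_ID = 1")
clients_left = conn.execute("SELECT COUNT(*) FROM Clients").fetchone()[0]
```

Both statements fail and the data is left untouched, which is the point: the rules live in the database, so every application that connects to it is held to them.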
ensuring that database is in a "valid" state
● record locks prevent "dirty reads/writes"
● commit and rollback mechanisms ensure
transactions are either fully completed or fully rolled
back
Data Consistency
• fine-grained record locking prevents queries
from blocking others
• indexes speed up lookups and joins by
orders of magnitude
• query optimizers find the fastest way to
execute your query
Performance
● tools to backup and restore to a point in time
○ log files make this possible
● database mirroring and fail-over clustering
Backup and Recovery
• multi-tiered security (server level, database
level, column level; role-level and user-level)
• logs can be queried for auditing (not directly)
• tied in to data integrity
Security and Auditing
another acronym to describe data integrity and data
consistency concepts in relational databases
Atomicity
Consistency
Isolation
Durability
ACID
a transaction must be all or nothing
Atomicity
invalid data causes transaction to roll back
Consistency
transactions are processed independently of other
transactions
Isolation
once committed a transaction is permanent, even in
the event of a system failure (eg, power outage)
Durability
example: consider the transaction of moving $100
from your checking to your savings account.
steps:
1. confirm accounts valid
2. confirm checking account has available funds
3. debit checking account by $100
4. credit savings by $100
ACID
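The four steps can be sketched as a single transaction. This is an illustration only, assuming SQLite through Python's sqlite3 module; the Accounts table, the transfer function, and the fail_after_debit flag are hypothetical, and a raised exception merely stands in for a real failure such as a power outage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (Name TEXT PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES ('checking', 150), ('savings', 0)")
conn.commit()

def balances():
    """Return the current balances as {account name: balance}."""
    return dict(conn.execute("SELECT Name, Balance FROM Accounts"))

def transfer(amount, fail_after_debit=False):
    """Move `amount` from checking to savings as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            rows = balances()
            # steps 1 and 2: accounts are valid and checking has the funds
            if "checking" not in rows or "savings" not in rows:
                raise ValueError("invalid account")
            if rows["checking"] < amount:
                raise ValueError("insufficient funds")
            # step 3: debit checking
            conn.execute(
                "UPDATE Accounts SET Balance = Balance - ? WHERE Name = 'checking'",
                (amount,),
            )
            if fail_after_debit:
                raise RuntimeError("simulated failure between steps 3 and 4")
            # step 4: credit savings
            conn.execute(
                "UPDATE Accounts SET Balance = Balance + ? WHERE Name = 'savings'",
                (amount,),
            )
        return True
    except (ValueError, RuntimeError):
        return False

failed = transfer(100, fail_after_debit=True)
after_failure = balances()   # the debit from step 3 has been rolled back
ok = transfer(100)
after_success = balances()
```

After the simulated failure the balances are unchanged, and after the successful call checking holds 50 and savings holds 100.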
Case 1: building loses power between step 3 and 4
because the transaction was not fully completed,
atomicity ensures that the transaction is rolled back.
ACID
Case 2: memory corruption causes step 4 to credit
checking by $100,000,000.
because the transaction leaves the system in an
inconsistent state, consistency will ensure that the
transaction is rolled back (sorry!)
ACID
Case 3: at the same time that you transfer funds, your
utility cashes a check that brings your checking
balance to $50
isolation ensures that either the utility clears before
your transaction (meaning your transaction will be
rejected) or your transaction finishes first (meaning
the utility check will bounce)
ACID
Case 4: building loses power right after your
transaction completes
durability guarantees that the transaction is
permanent
ACID
Relational Database Model
data model invented by Edgar Codd in 1970
● data is stored in tables and fields
● a set of rules, the normal forms, ensures
that the universe of data will be preserved
● data can be read and updated using SQL
○ Structured Query Language
○ all relational DBMSs recognize SQL, but some
small differences exist between their dialects
Relational Data Model
still has the lion's share of the database market
• ACID makes it reliable for critical
transactional systems
• is an all-purpose database
• older models - hierarchical, network - did not
always preserve the universe of data
• newer models are speciality models - they
typically solve one problem but come at a
high cost in other areas
Relational Data Model
two meanings of the "relation" in relational model
• data points “related” to each other are stored in
a table
• tables are “related” to each other by special
fields (keys)
Relational Data Model
storing the data as relations
• eliminates redundancy
• saves space
• reduces mistakes (ties in to consistency)
Relational Data Model
appointments.xlsx
Client  Phone         Addr           Service  Appt Date
Anna    215-123-4567  123 City Lane  Nails    5/1/2013
Nathan  267-333-4444  999 Oak Blvd   Hair     7/5/2013
Anna    215-123-4576  123 Mock Ln.   Hair     9/1/2013
Redundancy in Excel
duplicating data not only wastes space but is error prone
appointments.xlsx
Client  Phone         Addr           Service  Appt Date
Anna    215-123-4567  123 City Lane  Nails    5/1/2013
Nathan  267-333-4444  999 Oak Blvd   Hair     7/5/2013
Anna    215-123-4576  123 Mock Ln.   Hair     9/1/2013
clients (table)
Name    Phone         Address
Anna    215-123-4567  123 City Lane
Nathan  267-333-4444  999 Oak Blvd
appointments (table)
Client  Service  Date
Anna    Nails    5/1/2013
Nathan  Hair     7/5/2013
Anna    Hair     9/1/2013
Redundancy in Excel
storing the data as relations
• eliminates redundancy
• saves space
• reduces mistakes (ties in to consistency)
• guarantees data completeness (the universe of
data is preserved)
Relational Data Model
just means the set of data that we’re modeling
Client  Phone         Addr           Service  Appt Date  DOB
Anna    215-123-4567  123 City Lane  Nails    5/1/2013   8/14/1995
Nathan  267-333-4444  999 Oak Blvd   Hair     7/5/2013   6/1/1998
Anna    215-123-4576  123 Mock Ln.   Hair     9/1/2013   8/14/1995
in our example, the data points above are our “universe of data”
The Universe of Data
the process of splitting data into tables
Client Phone Addr Service Appt Date DOB
Anna 215-123-4567 123 City Lane Nails 5/1/2013 8/14/1995
Nathan 267-333-4444 999 Oak Blvd Hair 7/5/2013 6/1/1998
Anna 215-123-4576 123 Mock Ln. 8/14/1995
Anna Hair 9/1/2013
Decomposition
if we follow the rules set forth by Edgar Codd in the
normal forms when decomposing our data into tables,
then we are guaranteed that we'll be able to reconstruct
our universe of data using SQL
Decomposition
our spreadsheet decomposed into two tables
Name Service Date
Anna Nails 5/1/2013
Nathan Hair 7/5/2013
Anna Hair 9/1/2013
Name Phone Address DOB
Anna 215-123-4567 123 City Lane 8/14/1995
Nathan 267-333-4444 999 Oak Blvd 6/1/1998
Decomposition
decomposing data means we need a mechanism to
put it back together again
Name    Service  Date
Anna    Nails    5/1/2013
Nathan  Hair     7/5/2013
Anna    Hair     9/1/2013

Name    Phone         Address        DOB
Anna    215-123-4567  123 City Lane  8/14/1995
Nathan  267-333-4444  999 Oak Blvd   6/1/1998
Decomposition
in a database, we put our universe back together by joining tables
Clients (C)
Name    Phone         Address        DOB
Anna    215-123-4567  123 City Lane  8/14/1995
Nathan  267-333-4444  999 Oak Blvd   6/1/1998
Appointments (A)
Service  Date
Nails    5/1/2013
Hair     7/5/2013
Hair     9/1/2013
C ⋈ A (the bowtie symbol denotes a join)
Joins
every row in table C is matched to a row in table A on
some field - this special field is called a key
Clients (C)
Name    Phone         Address        DOB
Anna    215-123-4567  123 City Lane  8/14/1995
Nathan  267-333-4444  999 Oak Blvd   6/1/1998
Appointments (A)
Service  Date
Nails    5/1/2013
Hair     7/5/2013
Hair     9/1/2013
Joins
Clients (C)
Name    Phone         Address        DOB
Anna    215-123-4567  123 City Lane  8/14/1995
Nathan  267-333-4444  999 Oak Blvd   6/1/1998
Appointments (A)
Service  Appt Date
Nails    5/1/2013
Hair     7/5/2013
Hair     9/1/2013
what is C ⋈ A on DOB = Appt Date?
first iteration: find every appointment with an Appt Date of 8/14/1995
second iteration: find every appointment with an Appt Date of 6/1/1998
result: no rows, because no Appt Date equals any DOB
Service  Appt Date  Name  Phone  Addr  DOB
Joins
Appointments
Name    Service  Appt Date
Anna    Nails    5/1/2013
Nathan  Hair     7/5/2013
Anna    Hair     9/1/2013
Clients
Name    Phone         Address        DOB
Anna    215-123-4567  123 City Lane  8/14/1995
Nathan  267-333-4444  999 Oak Blvd   6/1/1998
keys are not arbitrary - during decomposition, we always
copy a special field to each table to serve as the key
Joins
C ⋈ A on Name = Name
Name    Phone         Addr           DOB        Name    Service  Appt Date
Anna    215-123-4567  123 City Lane  8/14/1995  Anna    Nails    5/1/2013
Nathan  267-333-4444  999 Oak Blvd   6/1/1998   Nathan  Hair     7/5/2013
Anna    215-123-4567  123 City Lane  8/14/1995  Anna    Hair     9/1/2013
the universe of data is reassembled by joining tables.
Joins
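The join on Name can be tried out directly. The sketch below assumes SQLite through Python's sqlite3 module and stores dates as plain text for simplicity; the variable names universe and mismatched are illustrative. It rebuilds the two tables from the slides, joins them on Name, and also shows that joining on DOB = Appt Date returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients (Name TEXT, Phone TEXT, Address TEXT, DOB TEXT);
CREATE TABLE Appointments (Name TEXT, Service TEXT, Appt_Date TEXT);
INSERT INTO Clients VALUES
    ('Anna',   '215-123-4567', '123 City Lane', '8/14/1995'),
    ('Nathan', '267-333-4444', '999 Oak Blvd',  '6/1/1998');
INSERT INTO Appointments VALUES
    ('Anna',   'Nails', '5/1/2013'),
    ('Nathan', 'Hair',  '7/5/2013'),
    ('Anna',   'Hair',  '9/1/2013');
""")

# C join A on Name = Name: one output row per appointment
universe = conn.execute("""
    SELECT C.Name, C.Phone, C.Address, C.DOB, A.Service, A.Appt_Date
    FROM Clients AS C
    JOIN Appointments AS A ON C.Name = A.Name
    ORDER BY A.rowid
""").fetchall()

# C join A on DOB = Appt Date: no value appears in both fields
mismatched = conn.execute("""
    SELECT * FROM Clients AS C
    JOIN Appointments AS A ON C.DOB = A.Appt_Date
""").fetchall()
```

Here universe holds three rows, one per appointment with the client's details attached, while mismatched is empty.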
Primary and Foreign Keys
● these fields are known as the Primary Key and
the Foreign Key
● picking the Name as the field that relates Clients
to Appointments was intuitive
● there are rules for how to identify them
tables are joined on special fields called keys
Keys
a field (or set of fields) that uniquely identifies a row
➢ the minimal set of fields that the row is
functionally dependent upon
Primary Key
the branch of mathematics providing the foundation
for the relational database model
Relational Algebra
let A and B be sets of fields in a table,
a functional dependency exists
A --> B
if any two rows with the same values for A always have the same values for B
Functional Dependencies
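A functional dependency can be checked mechanically against sample rows. The helper below is a hypothetical sketch (the name holds and the row layout are not from the slides): it remembers the B values seen for each combination of A values and reports a violation as soon as the same A values map to different B values.

```python
def holds(rows, a, b):
    """Return True if the functional dependency a --> b holds in `rows`.

    rows: list of dicts mapping field name to value
    a, b: tuples of field names
    """
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in a)
        value = tuple(row[f] for f in b)
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, value) != value:
            return False
    return True

# illustrative rows in the style of the salon example
appointments = [
    {"Name": "Anna", "Service": "Nails", "Date": "5/1/2013"},
    {"Name": "Nathan", "Service": "Hair", "Date": "7/5/2013"},
    {"Name": "Anna", "Service": "Nails", "Date": "9/1/2013"},
]

name_service_to_date = holds(appointments, ("Name", "Service"), ("Date",))
date_to_name = holds(appointments, ("Date",), ("Name",))
```

A check like this can only disprove a dependency. Sample data that happens to satisfy it does not prove the business rule, which still has to come from the people who know the business.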
let A = (SSN), B = (First Name, Last Name)
can we say A --> B?
SSN First Name Last Name
111-11-1111 Anna Jones
222-22-2222 Nathan Smith
111-11-1111 ? ?
Functional Dependencies
A = (Name, Service), B = (Appt Date)
Does (Name, Service) --> (Appt Date)?
Name Service Appt Date
Anna Nails 5/1/2013
Nathan Hair 7/5/2013
Anna Nails ?
if so, then it means our system will only let Anna make an
appointment for nails on 5/1/2013 and no other date.
Functional Dependencies
(SSN) → (First Name, Last Name) is a business rule set by
the US government
database designers work with business folks to define
functional dependencies.
the more you work with databases, the more you see the same
business rules over and over
functional dependencies reflect business rules
Functional Dependencies
the minimal set of fields that the row is functionally
dependent upon
Name Phone Address City
Anna Jones 215-123-4567 123 City Lane Philadelphia
Nathan Smith 267-333-4444 999 Oak Blvd Media
Clients
Primary Key Candidate
Name Phone Address City
Anna Jones 215-123-4567 123 City Lane Philadelphia
Nathan Smith 267-333-4444 999 Oak Blvd Media
Clients
(Name) → (Phone, Address, City)
Primary Key Candidate
(Name) → (Phone, Address, City)
● every time I see a particular Name, I expect to
get back the same address.
● does the reverse hold true?
no!! FDs are one-way functions
the above assertion allows more than one
person to live at the same address, but
prevents one person from living at multiple
(primary) addresses.
Primary Key Candidate
Name Phone Address City
Anna Jones 215-123-4567 123 City Lane Philadelphia
Nathan Smith 267-333-4444 999 Oak Blvd Media
Clients
so is Name a good primary key candidate?
Primary Key Candidate
(Name) → (Phone, Address, City) meets the above
Primary Key
a field (or set of fields) that uniquely identifies a row
➢ the minimal set of fields that the row is
functionally dependent upon
PK additional considerations
• values in PK must be unique for each
record in a table
• only one PK per table allowed
• it’s automatically indexed (for fast lookup)
Primary Key
Name Phone Address City
Anna Jones 215-123-4567 123 City Lane Philadelphia
Nathan Smith 267-333-4444 999 Oak Blvd Media
Clients
the values stored in a primary key must be
unique within the table - names make poor PKs
Primary Key Candidate
Client_ID Name Phone Address City
1 Anna 215-123-4567 123 City Lane Philadelphia
2 Nathan 267-333-4444 999 Oak Blvd Media
Clients
for this reason, you’ll usually see a unique ID
field used as a PK in most tables
Primary Key
primary keys are often “ID” fields in all tables
● this is done for convenience.
○ ID fields are usually autoincrement fields
● frameworks like CakePHP, Drupal, etc use this convention
● primary keys are automatically indexed, and numbers are
faster to index
● other tables may refer back to another table’s PK, and it’s
easier to bring in one field instead of multiple fields
Primary Key
Appt_ID Name Service Date
1000 Anna Nails 5/1/2013
1001 Nathan Hair 7/5/2013
1002 Anna Nails 9/1/2013
Primary Key
Appointments
(Appt_ID) → (Name, Service, Date)
a field (or set of fields) that is a PK in some other
table
● There can be multiple FKs in a table
● FK are how you designate that tables are
related
Foreign Reference Keys
Appt_ID Name Service Date
1000 Anna Nails 5/1/2013
1001 Nathan Hair 7/5/2013
Client_ID Name Phone
1 Anna 215-123-4567
2 Nathan 267-333-4444
are there any foreign keys?
Appointments
Clients
Foreign Keys
Appt_ID Client_ID Service Date
1000 1 Nails 5/1/2013
1001 2 Hair 7/5/2013
Client_ID Name Phone
1 Anna 215-123-4567
2 Nathan 267-333-4444
a field that is a PK in some other table
Appointments
Clients
Foreign Keys
Appt_ID Client_ID Service Date
1000 1 Nails 5/1/2013
1001 2 Hair 7/5/2013
Client_ID Name Phone
1 Anna 215-123-4567
2 Nathan 267-333-4444
Appointments
Clients
Foreign Keys
primary keys and foreign keys can be designated
using the SQL language:
CREATE TABLE Appointments (
    Appt_ID   INT PRIMARY KEY,
    Client_ID INT,
    Service   TEXT,
    Appt_Date DATE,
    -- Client_ID must match an existing row in Clients
    FOREIGN KEY (Client_ID) REFERENCES Clients(Client_ID)
        ON UPDATE CASCADE ON DELETE RESTRICT
);
Implementing Keys
primary keys and foreign keys can also be designated
via the DBMS user interface:
[Figure: snippet from phpMyAdmin (MySQL)]
Implementing Keys
Referential Integrity
a database has referential integrity if rules are in
place that ensure that a FK can never point to a row
that doesn’t exist
Referential Integrity
Referential Integrity
Appointments
Appt_ID  Client_ID  Service  Date
1000     1          Nails    5/1/201
1001     2          Hair     7/5/2017   <-- orphaned row: Client_ID 2 has no match in Clients
Clients
Client_ID  Name  Phone
1          Anna  215-123-4567
CASCADE
cascade changes in PK to all referencing FKs
RESTRICT (or NO ACTION)
don’t allow changes to PK if there is a referencing FK
Referential Constraints
what will happen if we change Anna’s Client_ID from 1
to 1001?
what will happen if we delete Anna’s record from the
Clients table?
Appointments Table
Referential Constraints
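Both questions can be answered by experiment. The sketch below assumes SQLite through Python's sqlite3 module, where foreign keys are enforced only after PRAGMA foreign_keys = ON; other DBMSs enforce them by default. It declares the foreign key with ON UPDATE CASCADE and ON DELETE RESTRICT, then changes Anna's Client_ID, tries to delete her, and tries to insert an appointment for a client that does not exist (Client_ID 99 is a made-up value).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE Clients (Client_ID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT);
CREATE TABLE Appointments (
    Appt_ID   INTEGER PRIMARY KEY,
    Client_ID INTEGER,
    Service   TEXT,
    Appt_Date TEXT,
    FOREIGN KEY (Client_ID) REFERENCES Clients(Client_ID)
        ON UPDATE CASCADE ON DELETE RESTRICT
);
INSERT INTO Clients VALUES (1, 'Anna', '215-123-4567'), (2, 'Nathan', '267-333-4444');
INSERT INTO Appointments VALUES (1000, 1, 'Nails', '5/1/2013'), (1001, 2, 'Hair', '7/5/2013');
""")

def try_sql(sql):
    """Run one statement; return the error message, or None on success."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

# CASCADE: the new PK value is copied into the referencing FK
update_error = try_sql("UPDATE Clients SET Client_ID = 1001 WHERE Name = 'Anna'")
cascaded = conn.execute(
    "SELECT Client_ID FROM Appointments WHERE Appt_ID = 1000"
).fetchone()[0]

# RESTRICT: the delete is refused while an appointment still points at Anna
delete_error = try_sql("DELETE FROM Clients WHERE Name = 'Anna'")

# an FK can never point to a row that does not exist
orphan_error = try_sql("INSERT INTO Appointments VALUES (1002, 99, 'Hair', '9/1/2013')")
```

With these settings the update succeeds and appointment 1000 now refers to client 1001, while the delete and the orphan insert are both rejected.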
Break Time
● the universe of data is broken out into tables
● the data points in tables are related to one
another
○ functional dependencies formalize how we
recognize related data
● tables are joined to put the universe of data back
together
● tables are joined on primary keys and foreign
keys
● primary key values are unique within a table
● declaring foreign keys ensures that our database
has referential integrity
Recap
Indexes
a special data structure that speeds up data retrieval
Indexes
consider the task of trying to find all characters
named Anna in War and Peace.
• without an index, you would pretty much have
to read or scan the entire novel
• a character index would require just a few
page turns
• an index on a field works in a similar
fashion - using a special data structure
called a B-tree
Indexes
Primary Key (PK)
it’s automatically indexed (for fast lookup) -
why?
joins are expensive!
every row from one table is matched to
every row in another table.
Indexes
Clients
Client_ID  Name
1          Anna
2          Nathan
...        ...
100000     Gia
Appointments
Appt_ID  Client_ID  Date
100001   1832       8/1/2008
100002   2432       7/5/2013
...      ...        ...
1000000  43901      2/1/2017
get all Clients who have had at least one Appointment
to look for any appointments for client 1, we have to scan up to
1 million rows of the Appointments table
Joins
then we do the same for client 2
Joins
by the time we reach client 100,000, we’ve scanned all 1
million rows of Appointments 100000 times!
Joins
the cost of joining two tables with M and N rows, without an index, is M x N comparisons
in our previous example, that means
100,000 x 1,000,000 = 100,000,000,000
Joins
● computers are fast, but a M x N operation is still
expensive!
● most queries will join multiple tables, not just two
● indexes reduce the cost of this query to roughly M index lookups
Joins
to prevent a full table scan in table A for each row in
table C, we use the index.
Clients
Client_ID  Name
1          Anna
2          Nathan
...        ...
question to the index: does Client 1 have an Appointment?
answer from the index: yes, found Client 1 at location ….
Joins using Indexes
the index has magical properties that allow it to find
any piece of data in just a few lookups
Joins using Indexes
its secret is that, unlike the rows in a table, the values in an
index are ordered, so it knows which branch to look in
[Figure: tree with nodes A, F, and M]
Joins using Indexes
by asking the index, it now only takes 100000 lookups
(okay, more like 300000). much better than
100,000,000,000!
Client_ID Name
1 Anna
2 Nathan
... ...
Clients
Joins using Indexes
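The gap between the two approaches can be counted on a small scale. This is a simplified sketch, not how a DBMS is implemented: a sorted Python list searched by binary search stands in for the B-tree, the table sizes (100 clients, 1,000 appointments) are scaled down, and the function names and generated IDs are made up for illustration.

```python
def nested_loop_comparisons(client_ids, appointment_client_ids):
    """Compare every client against every appointment and count the comparisons."""
    comparisons = 0
    matches = 0
    for client_id in client_ids:
        for appt_client_id in appointment_client_ids:
            comparisons += 1
            if appt_client_id == client_id:
                matches += 1
    return comparisons

def indexed_comparisons(client_ids, appointment_client_ids):
    """Look each client up by binary search in a sorted copy of the
    appointment client IDs and count the comparisons."""
    index = sorted(appointment_client_ids)  # ordered values, like an index
    comparisons = 0
    for client_id in client_ids:
        lo, hi = 0, len(index)
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if index[mid] < client_id:
                lo = mid + 1   # look in the right-hand branch
            else:
                hi = mid       # look in the left-hand branch
    return comparisons

client_ids = range(1, 101)                                       # M = 100
appointment_client_ids = [(i * 37) % 100 + 1 for i in range(1000)]  # N = 1000

slow = nested_loop_comparisons(client_ids, appointment_client_ids)
fast = indexed_comparisons(client_ids, appointment_client_ids)
```

The nested loop makes M x N = 100,000 comparisons, while the ordered lookup needs at most 10 per client, so no more than 1,000 in total.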
like all good things, there’s a trade-off.
● while indexes speed up reads, they slow
down writes.
● too many indexes can result in slower queries!
○ if the query analyzer can’t make sense of
your indexes, it won’t use them
indexes are critical for performance
so why not just index every field?
Indexes
● primary keys are automatically indexed by the DBMS
○ #1 reason why every table should always, always
have a PK
● foreign keys
○ especially if you enforce referential integrity
● fields that will be queried over and over
○ name fields
○ phone number, if you look up people by phone
○ state, if you produce mass mailings by state
what fields should be indexed?
Indexes
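Indexing a foreign key takes one statement, and most DBMSs can report whether a query uses the index. The sketch below assumes SQLite through Python's sqlite3 module; EXPLAIN QUERY PLAN is SQLite's own command and its wording varies between versions, and the index name idx_appointments_client and the generated rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments ("
    "Appt_ID INTEGER PRIMARY KEY, Client_ID INTEGER, Appt_Date TEXT)"
)
conn.executemany(
    "INSERT INTO Appointments (Client_ID, Appt_Date) VALUES (?, ?)",
    [(i % 100 + 1, "2013-07-05") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the text

query = "SELECT * FROM Appointments WHERE Client_ID = 1"
before = plan(query)   # no index on Client_ID yet: a full table scan
conn.execute("CREATE INDEX idx_appointments_client ON Appointments (Client_ID)")
after = plan(query)    # the plan now names the index
```

Before the index exists the plan describes a scan of Appointments; afterwards it describes a search that uses idx_appointments_client.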
Relationship Cardinality
there are three types of relations between tables
one-to-one
one-to-many
many-to-many
Relationship Cardinality
each row in table A has precisely one match in table B
One-to-One
One-to-One
Employee Public Data
Emp_ID  Dept
1       Marketing
2       Operations
...     ...
Employee Secret Data
Emp_ID  SSN        Salary
1       111111111  45000
2       222222222  35000
...     ...
One-to-One
Employee Public Data
Emp_ID  Dept
1       Marketing
2       Operations
...     ...
M       HR
Employee Secret Data
Emp_ID  SSN        Salary
1       111111111  45000
2       222222222  35000
...     ...
?       333333333  50000
answer: M
each row in A has 0 to many matches in B
One-to-Many
One-to-Many
Clients
Client_ID  Name
1          Anna
2          Nathan
...        ...
Appointments
Appt_ID  Client_ID  Service  Date
1000     1          Nails    5/1/2013
1001     2          Hair     7/5/2013
1002     1          Nails    9/1/2013
...
One-to-Many
Clients (M rows)
Client_ID  Name
1          Anna
2          Nathan
...        ...
Appointments (N rows)
Appt_ID  Client_ID  Service  Date
1000     1          Nails    5/1/2013
1001     2          Hair     7/5/2013
1002     1          Nails    9/1/2013
...
answer: N
Inner Join
returns all rows from both tables where there is a match
each row in A has 0 to many matches in B
each row in B has 0 to many matches in A
to represent this relationship, a third table C is
created (called the cross reference table)
Many-to-Many
Many-to-Many
Clients
Client_ID  Name
1          Anna
2          Nathan
Things
Thing_ID  Name
1000      Newspaper
2000      Baseball
Clients_Favorite_Things
Client_ID  Thing_ID
1          1000
2          2000
2          1000
Many-to-Many
Clients (M rows)
Client_ID  Name
1          Anna
2          Nathan
...        ...
Things (N rows)
Thing_ID  Name
1000      Newspaper
2000      Baseball
...       ...
Clients_Favorite_Things (L rows)
Client_ID  Thing_ID
1          1000
2          2000
2          1000
...        ...
answer: L
either table can store the PK or FK: the designer must
choose who gets what
Employee Public Data
Emp_ID  First  Last     Title
1       April  Smith    Technician
2       Jamie  Hawkins  Marketing Manager
Employee Secret Data
Emp_ID  DOH        SSN
1       5/1/2011   111-11-1111
2       8/17/2012  222-22-2222
One-to-One Implementation
the many side stores the FK pointing to the one side's PK
Appointments
Appt_ID  Client_ID  Service  Date
100001   1          Nails    5/1/2013
100002   2          Hair     7/5/2013
100003   1          Hair     6/1/2013
Clients
Client_ID  Name    Phone
1          Anna    215-123-4567
2          Nathan  267-333-4444
One-to-Many Implementation
neither table can store the FK of the other - a third table
represents the relationship, storing the PK of both tables
Clients
Client_ID  Name
1          Anna
2          Nathan
Things
Thing_ID  Name
1000      Newspaper
2000      Baseball
Clients_Favorite_Things
Client_ID  Thing_ID
1          1000
2          2000
2          1000
Many-to-Many Implementation
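The cross-reference table can be sketched as follows, assuming SQLite through Python's sqlite3 module. The composite primary key on (Client_ID, Thing_ID) is an assumption added here so that each pairing is stored only once; the slides show only the two columns. Listing favorites then takes two joins through the cross-reference table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE Clients (Client_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Things  (Thing_ID  INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Clients_Favorite_Things (
    Client_ID INTEGER REFERENCES Clients(Client_ID),
    Thing_ID  INTEGER REFERENCES Things(Thing_ID),
    PRIMARY KEY (Client_ID, Thing_ID)   -- assumption: one row per pairing
);
INSERT INTO Clients VALUES (1, 'Anna'), (2, 'Nathan');
INSERT INTO Things  VALUES (1000, 'Newspaper'), (2000, 'Baseball');
INSERT INTO Clients_Favorite_Things VALUES (1, 1000), (2, 2000), (2, 1000);
""")

# walk from the cross-reference table out to both sides
favorites = conn.execute("""
    SELECT C.Name, T.Name
    FROM Clients_Favorite_Things AS X
    JOIN Clients AS C ON C.Client_ID = X.Client_ID
    JOIN Things  AS T ON T.Thing_ID  = X.Thing_ID
    ORDER BY C.Name, T.Name
""").fetchall()
```

The result pairs Anna with Newspaper and Nathan with both Baseball and Newspaper, one row per entry in the cross-reference table.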
one of these relationships is unnecessary
one-to-one
one-to-many
many-to-many
Relationship Cardinality
Data Types
in a relational database, all fields must be assigned a
datatype that the values will be saved as.
Client  Service  Date      Time      Technician  Price
Anna    Nails    5/1/2013  10:00 am  100         $30
Nathan  Hair     7/5/2013  3:30 pm   200         $25
Text    Varchar  Date      Time      Integer     Currency   <-- datatypes
most of the time this is a straightforward process.
Data Types
SSN          Phone           ZIP    Comments       Position
111-11-1111  (215) 111-1111  19102  Blah blah bla  1.849308339
222-22-2222  (215) 222-2222  19147  This is anoth  1.890223333
other times will require some careful thought and a
judgement call
SSN, Phone: text or number?
ZIP: seems like a number, but what about Canada?
Comments: data could be truncated if text field is made too small.
Position: storing scientific measurements? beware that "floats" may
truncate your precision
Data Types
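Two of these judgement calls are easy to demonstrate. The sketch below uses plain Python rather than a database, and the ZIP value 02134 is an illustrative example that is not from the slides. It shows a binary float failing to hold a simple decimal sum exactly, and a ZIP code losing its leading zero when stored as a number.

```python
from decimal import Decimal

# a binary float cannot represent most decimal fractions exactly
float_sum = 0.1 + 0.2
float_is_exact = (float_sum == 0.3)

# a decimal type keeps the digits that were entered
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))

# a ZIP code stored as a number loses its leading zero
zip_as_text = "02134"
zip_as_number = int(zip_as_text)
round_trip = str(zip_as_number)
```

Here float_is_exact is False while decimal_is_exact is True, and round_trip comes back as "2134", which is why identifiers such as ZIP, SSN, and phone numbers are usually stored as text and exact quantities in a decimal type.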
The Normal Forms
the normal forms are the specifications for how to
split your fields into tables in a way that
● eliminates redundancy
● prevents data anomalies
● preserves functional dependencies, thereby enabling
loss-less joins (the tables can be put back
together to yield the precise universe of data)
Normal Forms
1st Normal Form (1NF)
2nd Normal Form (2NF)
3rd Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
4th Normal Form (4NF)
5th Normal Form (5NF)
Higher forms (academic)
Normal Forms
your database is considered to be normalized
if it is in at least 3NF
• the goal of normalizing a database is to
prevent data anomalies from occurring
during insert, update or delete operations,
and most 3NF tables are free of these
anomalies.
Normal Forms
data in each field is atomic
(cannot be decomposed into additional fields)
First Normal Form (1NF)
each field is atomic
● basically says a field cannot contain a table (or multiple
values)
EmpID Favorite Things
1 Mittens, Raindrops
2 Raindrops, Schnitzel
3 Doorbells, Mittens
Not in 1NF
First Normal Form (1NF)
EmpID Favorite Things
1 Mittens, Raindrops
2 Raindrops, Schnitzel
3 Doorbells, Mittens
how do we get this in 1NF?
First Normal Form (1NF)
decomposition into 1NF depends on relationship type
➔ one-to-one: make new fields
➔ one-to-many: make new table
➔ many-to-many: make two new tables
First Normal Form (1NF)
what is the relationship type between
employee and favorite things?
EmpID Favorite Things
1 Mittens, Raindrops
2 Raindrops, Schnitzel
3 Doorbells, Mittens
First Normal Form (1NF)
EmpID
1
2
ThingID Favorite Things
100 Mittens
200 Raindrops
300 Schnitzel
400 Doorbells
EmpID ThingID
1 100
1 200
2 200
2 300
➔ many-to-many: make two new tables
First Normal Form (1NF)
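The many-to-many decomposition can be sketched in a few lines. This is plain Python for illustration, not a migration script; the variable names are made up, and the ThingID values follow the 100, 200, ... pattern shown on the slide. Each comma-separated list is split, each distinct thing gets one ID, and every employee and thing pairing becomes a row in the cross-reference table.

```python
# the table that is not in 1NF: one field holds several values
not_1nf = {
    1: "Mittens, Raindrops",
    2: "Raindrops, Schnitzel",
    3: "Doorbells, Mittens",
}

things = {}       # thing name -> ThingID (the new Things table)
emp_things = []   # (EmpID, ThingID) rows (the new cross-reference table)

for emp_id, favorites in not_1nf.items():
    for name in (part.strip() for part in favorites.split(",")):
        if name not in things:
            # assign the next ID in the 100, 200, ... sequence
            things[name] = 100 * (len(things) + 1)
        emp_things.append((emp_id, things[name]))

employees = sorted(not_1nf)   # the EmpID table
```

Every field now holds a single value, and the original lists can be rebuilt by joining the three tables.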
In 1NF?
EmpID  Name
1      Mary Smith
2      Todd T Burke
depends...
○ do you want to search/sort by Last Name?
○ do you want to be compliant with an industry standard?
○ business rules determine what is considered
“normal”
Definitely 1NF
EmpID  First Name  Last Name
1      Mary        Smith
2      Todd        Burke
First Normal Form (1NF)
in 1NF, and every non-key field is functionally
dependent on the whole PK, not just a subset of it
Second Normal Form (2NF)
in 2NF, and every non-key field is functionally
dependent on the PK directly, not through another non-key field
Third Normal Form (3NF)
the table is already in 2NF
• every non-key field is functionally dependent on the PK directly,
not through another non-key field
Appt_ID  Appt Date  Client_ID  Service  Service Price
1001     2/23/13    1          Nails    20
1002     2/24/13    2          Hair     30
is this in 3NF?
Third Normal Form (3NF)
to determine if table is in 3NF
• identify the primary key
• determine if every field is dependent on PK
Appt_ID  Appt Date  Client_ID  Service  Service Price
1001     2/23/13    1          Nails    20
1002     2/24/13    2          Hair     30
what’s the primary key?
Third Normal Form (3NF)
is every field dependent on Appt_ID?
Appt_ID  Appt Date  Client_ID  Service  Service Price
1001     2/23/13    1          Nails    20
1002     2/24/13    2          Hair     30
(Appt_ID) -> (Date, Client, Service, Service Price)?
Do the fields contain all the information, and only the
information, needed to define an “appointment”?
Third Normal Form (3NF)
if we assert (Service) -> (Service Price), then the table is not in 3NF,
because Service Price depends on Service rather than directly on Appt_ID.
Appt_ID  Appt Date  Client_ID  Service  Service Price
1001     2/23/13    1          Nails    20
1002     2/24/13    2          Hair     30
Third Normal Form (3NF)
decomposition into 3NF: independent functional
dependencies become new relations (tables)
Third Normal Form (3NF)
decomposition into 3NF depends on relationship type
➔ one-to-one: make new table
➔ one-to-many: make new table
➔ many-to-many: make two new tables
Third Normal Form (3NF)
treat the independent
(Service) -> (Service Price)
functional dependency as a new relation (table)
Appointments
Appt_ID  Appt Date  Client  Service
1        2/23/13    Anna    Nails
2        2/24/13    Nathan  Hair
Services
Service  Service Price
Nails    20
Hair     30
Third Normal Form (3NF)
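A short sketch of the decomposed design, assuming SQLite through Python's sqlite3 module. Using Service as the primary key of the Services table, and the price change to 25, are assumptions made for illustration. Because each price is stored once, changing it is a single-row update, and a join still returns the original wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Services (
    Service       TEXT PRIMARY KEY,   -- assumption: the service name is the key
    Service_Price INTEGER
);
CREATE TABLE Appointments (
    Appt_ID   INTEGER PRIMARY KEY,
    Appt_Date TEXT,
    Client    TEXT,
    Service   TEXT REFERENCES Services(Service)
);
INSERT INTO Services VALUES ('Nails', 20), ('Hair', 30);
INSERT INTO Appointments VALUES
    (1, '2/23/13', 'Anna',   'Nails'),
    (2, '2/24/13', 'Nathan', 'Hair');
""")

# the price lives in one place, so a change touches one row
conn.execute("UPDATE Services SET Service_Price = 25 WHERE Service = 'Nails'")

# joining the two tables rebuilds the original wide rows
rows = conn.execute("""
    SELECT A.Appt_ID, A.Appt_Date, A.Client, A.Service, S.Service_Price
    FROM Appointments AS A
    JOIN Services AS S ON S.Service = A.Service
    ORDER BY A.Appt_ID
""").fetchall()
```

In the single-table version the same price change would have to be repeated on every Nails appointment, which is exactly the kind of update anomaly 3NF is meant to prevent.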
● data is stored in tables
● normal forms guide us in how to determine which fields
belong in which tables
● databases in 3NF are free from most anomalies
● every table should be assigned a primary key
● tables are related to each other on primary keys and foreign
keys
● three types of relationships - 1:1, 1:N, N:N
● a N:N relationship is represented by a cross-reference table
● declaring foreign keys ensures database has referential
integrity
● two types of referential integrity constraints: cascade, restrict
● all fields that will be frequently qualified in a query should be
indexed
Recap

apidays
 
PPTX
apidays Singapore 2025 - From Data to Insights: Building AI-Powered Data APIs...
apidays
 
PDF
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
PDF
Product Management in HealthTech (Case Studies from SnappDoctor)
Hamed Shams
 
PDF
Using AI/ML for Space Biology Research
VICTOR MAESTRE RAMIREZ
 
PPT
AI Future trends and opportunities_oct7v1.ppt
SHIKHAKMEHTA
 
PDF
OPPOTUS - Malaysias on Malaysia 1Q2025.pdf
Oppotus
 
PDF
apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability...
apidays
 
PDF
JavaScript - Good or Bad? Tips for Google Tag Manager
📊 Markus Baersch
 
PPTX
apidays Singapore 2025 - The Quest for the Greenest LLM , Jean Philippe Ehre...
apidays
 
PPTX
apidays Singapore 2025 - Generative AI Landscape Building a Modern Data Strat...
apidays
 
PDF
The European Business Wallet: Why It Matters and How It Powers the EUDI Ecosy...
Lal Chandran
 
PDF
What does good look like - CRAP Brighton 8 July 2025
Jan Kierzyk
 
PPTX
Listify-Intelligent-Voice-to-Catalog-Agent.pptx
nareshkottees
 
PDF
Avatar for apidays apidays PRO June 07, 2025 0 5 apidays Helsinki & North 2...
apidays
 
PDF
OOPs with Java_unit2.pdf. sarthak bookkk
Sarthak964187
 
PDF
Development and validation of the Japanese version of the Organizational Matt...
Yoga Tokuyoshi
 
PPTX
apidays Helsinki & North 2025 - API access control strategies beyond JWT bear...
apidays
 
Numbers of a nation: how we estimate population statistics | Accessible slides
Office for National Statistics
 
apidays Singapore 2025 - How APIs can make - or break - trust in your AI by S...
apidays
 
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
apidays Singapore 2025 - From Data to Insights: Building AI-Powered Data APIs...
apidays
 
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
Product Management in HealthTech (Case Studies from SnappDoctor)
Hamed Shams
 
Using AI/ML for Space Biology Research
VICTOR MAESTRE RAMIREZ
 
AI Future trends and opportunities_oct7v1.ppt
SHIKHAKMEHTA
 
OPPOTUS - Malaysias on Malaysia 1Q2025.pdf
Oppotus
 
apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability...
apidays
 
JavaScript - Good or Bad? Tips for Google Tag Manager
📊 Markus Baersch
 
apidays Singapore 2025 - The Quest for the Greenest LLM , Jean Philippe Ehre...
apidays
 
apidays Singapore 2025 - Generative AI Landscape Building a Modern Data Strat...
apidays
 
The European Business Wallet: Why It Matters and How It Powers the EUDI Ecosy...
Lal Chandran
 
What does good look like - CRAP Brighton 8 July 2025
Jan Kierzyk
 
Listify-Intelligent-Voice-to-Catalog-Agent.pptx
nareshkottees
 
Avatar for apidays apidays PRO June 07, 2025 0 5 apidays Helsinki & North 2...
apidays
 
OOPs with Java_unit2.pdf. sarthak bookkk
Sarthak964187
 
Development and validation of the Japanese version of the Organizational Matt...
Yoga Tokuyoshi
 
apidays Helsinki & North 2025 - API access control strategies beyond JWT bear...
apidays
 

Intro to Database Design

  • 2. Girl Develop It is here to provide affordable and accessible programs to learn software through mentorship and hands-on instruction. Presented for Girl Develop It
  • 3. Sondra Willhite software developer at scrubjay technology ● over ten years working on applications with SQL Server or Microsoft Access backend ● also have experience working as a BI analyst and a brief stint working help desk / systems admin https://www.linkedin.com/in/sondrawillhite Intros
  • 4. ● careers in database ● overview of some database models ● database management systems ● the relational database model ● structure ● keys ● referential integrity ● normalization Agenda
  • 6. ● database administrator (DBA) ● database developer ● business intelligence analyst ● database consultant ● data scientist Database Careers
  • 7. a systems administrator for database ● backups and restores ● database availability ● partitioning ● security ● performance Database Administrator
  • 9. software developers who specialize in database ● query database ● script schema changes ● stored procedures ● triggers and constraints ● database design ● performance and security Database Developer
  • 10. database management software application code drivers, libraries, frameworks, etc Application Stack Diagram Database Developer
  • 11. writes custom queries and reports, including data visualizations ● aka report writer/business analyst ● SQL language experts ● platforms ○ SSRS (SQL Server Reporting Services) ○ Crystal Reports ○ Tableau / Power BI ○ Microsoft Access ● performance and security BI Analyst
  • 12. Report Writer / Data Analyst’s View Entity Relationship Diagram Tables Keys Fields Indexes BI Analyst
  • 13. specialist DBAs/developers ● performance tuning ○ finding the bottlenecks ● security controls ○ groups and user controls ○ encryption (database, columns, rows) ● availability ○ mirroring and fail-over clusters ○ cloud systems (Azure, AWS) Database Consultant
  • 14. specialize in creating data models ● especially for predictive modeling ● mathematical background in statistics ● look beyond relational database models: ○ big data ○ graph data ○ warehouse data Data Scientist
  • 16. an unordered structured set of data Database Definition
  • 17. key : value pairs { vendor : abc manufacturing } Big Data
  • 18. nodes and edges Graph Data LA NYC 5 hours
  • 19. data structured along dimensions Warehouse Data sales
  • 20. tables and fields Relational Data Customer ID First Last 100011 Jane Doe Customer ID Item Cost 100011 Intro Database $25 100011 Data Modeling $30
  • 22. software that provides ways to store, modify and retrieve data ● Microsoft SQL Server ● Oracle ● IBM's DB2 ● PostgreSQL ● MySQL ● Microsoft Access Database Management Systems
  • 23. DBMS responsibilities ● data integrity ● data consistency ● multi-user access ● performance tools ● security and auditing ● backup and recovery ● extraction, transformation and load (ETL) ● business intelligence tools Database Management Systems
  • 24. ensuring "valid" data ● data types ensure that, say, date fields only store date values ● constraints and triggers allow for complex rules to be applied (eg, you cannot delete a client who has an upcoming appointment) Data Integrity
  • 25. ensuring that database is in a "valid" state ● record locks prevent "dirty reads/writes" ● commit and rollback mechanisms ensure transactions are either fully completed or fully rolled back Data Consistency
  • 26. • fine-grained record locking prevents queries from blocking others • indexes speed up lookups and joins by orders of magnitude • query optimizers find the fastest way to execute your query Performance
  • 27. ● tools to backup and restore to a point in time ○ log files make this possible ● database mirroring and fail-over clustering Backup and Recovery
  • 28. • multi-tiered security (server level, database level, column level; role-level and user-level) • logs can be queried for auditing (not directly) • tied in to data integrity Security and Auditing
  • 29. another acronym to describe data integrity and data consistency concepts in relational databases Atomicity Consistency Isolation Durability ACID
  • 30. a transaction must be all or nothing Atomicity
  • 31. invalid data causes transaction to roll back Consistency
  • 32. transactions are processed independently of other transactions Isolation
  • 33. once committed a transaction is permanent, even in the event of a system failure (eg, power outage) Durability
  • 34. example: consider the transaction of moving $100 from your checking to your savings account. steps: 1. confirm accounts valid 2. confirm checking account has available funds 3. debit checking account by $100 4. credit savings by $100 ACID
  • 35. Case 1: building loses power between steps 3 and 4. because the transaction was not fully completed, atomicity ensures that the transaction is rolled back. ACID
  • 36. Case 2: memory corruption causes step 4 to credit checking by $100,000,000. because the transaction leaves the system in an inconsistent state, consistency will ensure that the transaction is rolled back (sorry!) ACID
  • 37. Case 3: at the same time that you transfer funds, your utility cashes a check that brings your checking balance to $50. isolation ensures that either the utility clears before your transaction (meaning your transaction will be rejected) or your transaction finishes first (meaning the utility check will bounce) ACID
  • 38. Case 4: building loses power right after your transaction completes. durability guarantees that the transaction is permanent ACID
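The transfer example and its failure cases can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: the `accounts` table, the `transfer` function and the `crash_between_steps` flag are hypothetical names, the failure between steps 3 and 4 is simulated with an exception, and the CHECK constraint stands in for the "available funds" rule.

```python
import sqlite3


def transfer(conn, src, dst, amount, crash_between_steps=False):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if crash_between_steps:
                # Hypothetical stand-in for losing power between debit and credit.
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except (sqlite3.IntegrityError, RuntimeError):
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('checking', 150), ('savings', 0)")
conn.commit()

# Case 1: failure after the debit; the debit is rolled back.
crashed = transfer(conn, "checking", "savings", 100, crash_between_steps=True)
after_crash = balances(conn)

# Normal case: both steps are committed together.
completed = transfer(conn, "checking", "savings", 100)
after_success = balances(conn)

# Only $50 is left, so the CHECK constraint rejects a second $100 debit.
overdraft = transfer(conn, "checking", "savings", 100)
after_overdraft = balances(conn)
```

The design choice here is to let the database enforce the rule (the CHECK constraint) and let the transaction boundary undo partial work, instead of checking balances in application code.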
  • 40. data model invented by Edgar Codd in 1970 ● data is stored in tables and fields ● a set of rules, the normal forms, ensures that the universe of data will be preserved ● data can be read and updated using SQL ○ Structured Query Language ○ all DBMS’s recognize SQL, but some small differences exist between them Relational Data Model
  • 41. still has the lion's share of the database market • ACID makes it reliable for critical transactional systems • is an all-purpose database • older models - hierarchical, network - did not always preserve the universe of data • newer models are speciality models - they typically solve one problem but come at a high cost in other areas Relational Data Model
  • 42. two meanings of the "relation" in relational model • data points “related” to each other are stored in a table • tables are “related” to each other by special fields (keys) Relational Data Model
  • 43. storing the data as relations • eliminates redundancy • saves space • reduces mistakes (ties in to consistency) Relational Data Model
  • 44. appointments.xlsx Client Phone Addr Service Appt Date Anna 215-123-4567 123 City Lane Nails 5/1/2013 Nathan 267-333-4444 999 Oak Blvd Hair 7/5/2013 Anna 215-123-4576 123 Mock Ln. Hair 9/1/2013 Redundancy in Excel duplicating data not only wastes space but is error-prone
  • 45. appointments.xlsx Client Phone Addr Service Appt Date Anna 215-123-4567 123 City Lane Nails 5/1/2013 Nathan 267-333-4444 999 Oak Blvd Hair 7/5/2013 Anna 215-123-4576 123 Mock Ln. Hair 9/1/2013 Client Service Date Anna Nails 5/1/2013 Nathan Hair 7/5/2013 Anna Hair 9/1/2013 Name Phone Address Anna 215-123-4567 123 City Lane Nathan 267-333-4444 999 Oak Blvd appointments (table) clients (table) Redundancy in Excel
  • 46. storing the data as relations • eliminates redundancy • saves space • reduces mistakes (ties in to consistency) • guarantees data completeness (the universe of data is preserved) Relational Data Model
  • 47. just means the set of data that we’re modeling Client Phone Addr Service Appt Date DOB Anna 215-123-4567 123 City Lane Nails 5/1/2013 8/14/1995 Nathan 267-333-4444 999 Oak Blvd Hair 7/5/2013 6/1/1998 Anna 215-123-4576 123 Mock Ln. Hair 9/1/2013 8/14/1995 in our example, the data points above are our “universe of data” The Universe of Data
  • 48. the process of splitting data into tables Client Phone Addr Service Appt Date DOB Anna 215-123-4567 123 City Lane Nails 5/1/2013 8/14/1995 Nathan 267-333-4444 999 Oak Blvd Hair 7/5/2013 6/1/1998 Anna 215-123-4576 123 Mock Ln. 8/14/1995 Anna Hair 9/1/2013 Decomposition
  • 49. if we follow the rules set forth by Edgar Codd in the normal forms when decomposing our data into tables, then we are guaranteed that we'll be able to reconstruct our universe of data using SQL Decomposition
  • 50. our spreadsheet decomposed into two tables Name Service Date Anna Nails 5/1/2013 Nathan Hair 7/5/2013 Anna Hair 9/1/2013 Name Phone Address DOB Anna 215-123-4567 123 City Lane 8/14/1995 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 Decomposition
  • 51. decomposing data means we need a mechanism to put it back together again Name Service Date Anna Nails 5/1/2013 Nathan Hair 7/5/2013 Anna Hair 9/1/2013 Name Phone Address DOB Anna 215-123- 4567 123 City Lane 8/14/1995 Nathan 267-333- 4444 999 Oak Blvd 6/1/1998 Decomposition
  • 52. in a database, we put our universe back together by joining tables Service Date Nails 5/1/2013 Hair 7/5/2013 Hair 9/1/2013 Name Phone Address DOB Anna 215-123-4567 123 City Lane 8/14/1995 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 Bowtie (⋈) = Join Joins
  • 53. Appointments (A) Clients (C) every row in table C is matched to a row in table A on some field - this special field is called a key Service Date Nails 5/1/2013 Hair 7/5/2013 Hair 9/1/2013 Name Phone Address DOB Anna 215-123-4567 123 City Lane 8/14/1995 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 Joins
  • 54. Service Appt Date Nails 5/1/2013 Hair 7/5/2013 Hair 9/1/2013 Name Phone Address DOB Anna 215-123-4567 123 City Lane 8/14/1995 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 what is C ⋈ A on DOB = Appt Date? Joins Clients Appointments
  • 55. Service Appt Date Nails 5/1/2013 Hair 7/5/2013 Hair 9/1/2013 Name Phone Address DOB Anna 215-123-4567 123 City Lane 8/14/1995 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 what is C ⋈ A on DOB = Appt Date? Clients Appointments first iteration: find every appointment with an Appt Date of 8/14/1995 Joins
  • 56. Service Appt Date Nails 5/1/2013 Hair 7/5/2013 Hair 9/1/2013 Name Phone Address DOB Anna 215-123-4567 123 City Lane 8/14/1995 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 Clients Appointments second iteration: find every appointment with an Appt Date of 6/1/1998 what is C ⋈ A on DOB = Appt Date? Joins
  • 57. result: Service Appt Date Name Phone Addr DOB (no rows match) Joins Service Appt Date Nails 5/1/2013 Hair 7/5/2013 Hair 9/1/2013 Name Phone Address DOB Anna 215-123-4567 123 City Lane 8/14/1995 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 Clients Appointments what is C ⋈ A on DOB = Appt Date?
  • 58. Name Service Appt Date Anna Nails 5/1/2013 Nathan Hair 7/5/2013 Anna Hair 9/1/2013 Name Phone Address DOB Anna 215-123-4567 123 City Lane 8/14/1995 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 Appointments Clients keys are not arbitrary - during decomposition, we always copy a special field to each table to serve as the key Joins
  • 59. C ⋈ A on Name = Name Name Phone Addr DOB Name Service Appt Date Anna 215-123-4567 123 City Lane 8/14/1995 Anna Nails 5/1/2013 Nathan 267-333-4444 999 Oak Blvd 6/1/1998 Nathan Hair 7/5/2013 Anna 215-123-4567 123 City Lane 8/14/1995 Anna Hair 9/1/2013 the universe of data is reassembled by joining tables. Joins
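The two joins walked through above can be reproduced in a few lines. This is a sketch using Python's built-in `sqlite3`; the lowercase table and column names are assumptions chosen for the example, and dates are stored as plain text exactly as they appear on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (name TEXT, phone TEXT, address TEXT, dob TEXT);
CREATE TABLE appointments (name TEXT, service TEXT, appt_date TEXT);
INSERT INTO clients VALUES
    ('Anna', '215-123-4567', '123 City Lane', '8/14/1995'),
    ('Nathan', '267-333-4444', '999 Oak Blvd', '6/1/1998');
INSERT INTO appointments VALUES
    ('Anna', 'Nails', '5/1/2013'),
    ('Nathan', 'Hair', '7/5/2013'),
    ('Anna', 'Hair', '9/1/2013');
""")

# Joining on an arbitrary pair of fields (DOB = Appt Date) matches nothing.
on_dob = conn.execute("""
    SELECT * FROM clients AS c
    JOIN appointments AS a ON c.dob = a.appt_date
""").fetchall()

# Joining on the shared key field (Name) reassembles the universe of data.
on_name = conn.execute("""
    SELECT c.name, c.phone, c.address, c.dob, a.service, a.appt_date
    FROM clients AS c
    JOIN appointments AS a ON c.name = a.name
    ORDER BY a.rowid
""").fetchall()

for row in on_name:
    print(row)
```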
  • 61. ● these fields are known as the Primary Key and the Foreign Key ● picking the Name as the field that relates Clients to Appointments was intuitive ● there are rules on how to identify them tables are joined on special fields called keys Keys
  • 62. a field (or set of fields) that uniquely identifies a row ➢ the minimal set of fields that the row is functionally dependent upon Primary Key
  • 63. the branch of mathematics providing the foundation for the relational database model Relational Algebra
  • 64. let A and B be sets of fields in a table. a functional dependency A --> B exists if any two rows that have the same values for A also have the same values for B: look up row[A] and you always get back the same row[B] Functional Dependencies
  • 65. let A = (SSN), B = (First Name, Last Name) can we say A --> B? SSN First Name Last Name 111-11-1111 Anna Jones 222-22-2222 Nathan Smith 111-11-1111 ? ? Functional Dependencies
  • 66. A = (Name, Service), B = (Appt Date) Does (Name, Service) --> (Appt Date)? Name Service Appt Date Anna Nails 5/1/2013 Nathan Hair 7/5/2013 Anna Nails ? if so, then it means our system will only let Anna make an appointment for nails on 5/1/2013 and no other date. Functional Dependencies
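A functional dependency can be checked mechanically against sample rows. The helper below is a hypothetical sketch (the name `holds` and the dictionary row layout are assumptions): it reports whether any two rows agree on A but disagree on B. The third appointment's date, shown as "?" on the slide, is filled in with 9/1/2013 purely for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs --> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[field] for field in lhs)
        value = tuple(row[field] for field in rhs)
        # setdefault stores the first value seen for this key; a later,
        # different value for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


people = [
    {"SSN": "111-11-1111", "First Name": "Anna", "Last Name": "Jones"},
    {"SSN": "222-22-2222", "First Name": "Nathan", "Last Name": "Smith"},
    {"SSN": "111-11-1111", "First Name": "Anna", "Last Name": "Jones"},
]

appointments = [
    {"Name": "Anna", "Service": "Nails", "Appt Date": "5/1/2013"},
    {"Name": "Nathan", "Service": "Hair", "Appt Date": "7/5/2013"},
    {"Name": "Anna", "Service": "Nails", "Appt Date": "9/1/2013"},  # assumed date
]

ssn_determines_name = holds(people, ["SSN"], ["First Name", "Last Name"])
name_service_determines_date = holds(appointments, ["Name", "Service"], ["Appt Date"])
```

Sample data can only disprove a dependency. Whether it should hold is a business rule, so a passing check on a few rows is not proof.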
  • 67. (SSN) → (First Name, Last Name) is a business rule set by the US government. database designers work with business folks to define functional dependencies. the more you work with databases, the more you see the same business rules over and over functional dependencies reflect business rules Functional Dependencies
  • 68. the minimal set of fields that the row is functionally dependent upon Name Phone Address City Anna Jones 215-123-4567 123 City Lane Philadelphia Nathan Smith 267-333-4444 999 Oak Blvd Media Clients Primary Key Candidate
  • 69. Name Phone Address City Anna Jones 215-123-4567 123 City Lane Philadelphia Nathan Smith 267-333-4444 999 Oak Blvd Media Clients (Name) → (Phone, Address, City) Primary Key Candidate
  • 70. (Name) → (Phone, Address, City) ● every time I see a particular Name, I expect to get back the same address. ● does the reverse hold true? no!! FDs are one-way functions the above assertion allows more than one person to live at the same address, but prevents one person from living at multiple (primary) addresses. Primary Key Candidate
  • 71. Name Phone Address City Anna Jones 215-123-4567 123 City Lane Philadelphia Nathan Smith 267-333-4444 999 Oak Blvd Media Clients so is Name a good primary key candidate? Primary Key Candidate
  • 72. (Name) → (Phone, Address, City) meets the above Primary Key a field (or set of fields) that uniquely identifies a row ➢ the minimal set of fields that the row is functionally dependent upon
  • 73. PK additional considerations • values in PK must be unique for each record in a table • only one PK per table allowed • it’s automatically indexed (for fast lookup) Primary Key
  • 74. Name Phone Address City Anna Jones 215-123-4567 123 City Lane Philadelphia Nathan Smith 267-333-4444 999 Oak Blvd Media Clients the values stored in a primary key must be unique within the table - names make poor PKs Primary Key Candidate
  • 75. Client_ID Name Phone Address City 1 Anna 215-123-4567 123 City Lane Philadelphia 2 Nathan 267-333-4444 999 Oak Blvd Media Clients for this reason, you’ll usually see a unique ID field used as a PK in most tables Primary Key
  • 76. primary keys are often “ID” fields in all tables ● this is done for convenience. ○ ID fields are usually autoincrement fields ● frameworks like CakePHP, Drupal, etc use this convention ● primary keys are automatically indexed, and numbers are faster to index ● other tables may refer back to another table’s PK, and it’s easier to bring in one field instead of multiple fields Primary Key
  • 77. Appt_ID Name Service Date 1000 Anna Nails 5/1/2013 1001 Nathan Hair 7/5/2013 1002 Anna Nails 9/1/2013 Primary Key Appointments (Appt_ID) → (Name, Service, Date)
  • 78. a field (or set of fields) that is a PK in some other table ● There can be multiple FKs in a table ● FKs are how you designate that tables are related Foreign Reference Keys
  • 79. Appt_ID Name Service Date 1000 Anna Nails 5/1/2013 1001 Nathan Hair 7/5/2013 Client_ID Name Phone 1 Anna 215-123-4567 2 Nathan 267-333-4444 are there any foreign keys? Appointments Clients Foreign Keys
  • 80. Appt_ID Client_ID Service Date 1000 1 Nails 5/1/2013 1001 2 Hair 7/5/2013 Client_ID Name Phone 1 Anna 215-123-4567 2 Nathan 267-333-4444 a field that is a PK in some other table Appointments Clients Foreign Keys
  • 81. Appt_ID Client_ID Service Date 1000 1 Nails 5/1/2013 1001 2 Hair 7/5/2013 Client_ID Name Phone 1 Anna 215-123-4567 2 Nathan 267-333-4444 Appointments Clients Foreign Keys
  • 82. primary keys and foreign keys can be designated using the SQL language: CREATE TABLE Appointments ( Appt_ID INT PRIMARY KEY, Client_ID INT, Service TEXT, Appt_Date DATE, FOREIGN KEY (Client_ID) REFERENCES Clients(Client_ID) ON UPDATE CASCADE ON DELETE RESTRICT ) Implementing Keys
  • 83. primary keys and foreign keys can also be designated via the DBMS user interface: Snippet from PHPMyAdmin (MySQL) Implementing Keys
  • 85. a database has referential integrity if rules are in place that ensure that a FK can never point to a row that doesn’t exist Referential Integrity
  • 86. Referential Integrity Appt_ID Client_ID Service Date 1000 1 Nails 5/1/201 1001 2 Hair 7/5/2017 Client_ID Name Phone 1 Anna 215-123-4567 Appointments Clients ? orphaned row -->
  • 87. CASCADE cascade changes in PK to all referencing FKs RESTRICT (or NO ACTION) don’t allow changes to PK if there is a referencing FK Referential Constraints
  • 88. what will happen if we change Anna’s Client_ID from 1 to 1001? what will happen if we delete Anna’s record from the Clients table? Appointments Table Referential Constraints
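Both questions can be tried out directly. The sketch below assumes SQLite (through Python's `sqlite3`) and the ON UPDATE CASCADE / ON DELETE RESTRICT rules from the Implementing Keys slide; other DBMSs behave similarly, but details differ. In SQLite, foreign key enforcement must be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Clients (Client_ID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT);
CREATE TABLE Appointments (
    Appt_ID INTEGER PRIMARY KEY,
    Client_ID INTEGER,
    Service TEXT,
    Appt_Date TEXT,
    FOREIGN KEY (Client_ID) REFERENCES Clients(Client_ID)
        ON UPDATE CASCADE ON DELETE RESTRICT
);
INSERT INTO Clients VALUES
    (1, 'Anna', '215-123-4567'), (2, 'Nathan', '267-333-4444');
INSERT INTO Appointments VALUES
    (1000, 1, 'Nails', '5/1/2013'), (1001, 2, 'Hair', '7/5/2013');
""")

# Question 1: change Anna's Client_ID from 1 to 1001. CASCADE updates the FK too.
conn.execute("UPDATE Clients SET Client_ID = 1001 WHERE Client_ID = 1")
cascaded_id = conn.execute(
    "SELECT Client_ID FROM Appointments WHERE Appt_ID = 1000"
).fetchone()[0]

# Question 2: delete Anna. RESTRICT refuses because an appointment references her.
try:
    conn.execute("DELETE FROM Clients WHERE Client_ID = 1001")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# An appointment for a client that does not exist would be an orphaned row.
try:
    conn.execute("INSERT INTO Appointments VALUES (1002, 99, 'Hair', '9/1/2013')")
    orphan_blocked = False
except sqlite3.IntegrityError:
    orphan_blocked = True
```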
  • 90. ● the universe of data is broken out into tables ● the data points in tables are related to one another ○ functional dependencies formalize how we recognize related data ● tables are joined to put the universe of data back together ● tables are joined on primary keys and foreign keys ● primary key values are unique within a table ● declaring foreign keys ensures that our database has referential integrity Recap
  • 92. a special data structure that speeds up data retrieval Indexes
  • 93. consider the task of trying to find all characters named Anna in War and Peace. • without an index, you would pretty much have to read or scan the entire novel • a character index would require just a few page turns • an index on a field works in a similar fashion - using a special data structure called a b-tree Indexes
  • 94. Primary Key (PK) it’s automatically indexed (for fast lookup) - why? joins are expensive! every row from one table is matched to every row in another table. Indexes
  • 95. Appt_ID Client_ID Date 100001 1832 8/1/2008 100002 2432 7/5/2013 ... ... ... 1000000 43901 2/1/2017 Client_ID Name 1 Anna 2 Nathan ... ... 100000 Gia Appointments Clients get all Clients who have had at least one Appointment. to look for any appointments for client 1, we have to scan up to 1 million rows of the Appointments table Joins
  • 96. then we do the same for client 2 Appt_ID Client_ID Date 100001 1832 8/1/2008 100002 2432 7/5/2013 ... ... ... 1000000 43901 2/1/2017 Client_ID Name 1 Anna 2 Nathan ... ... 100000 Gia Appointments Clients Joins
  • 97. by the time we reach client 100,000, we’ve scanned all 1 million rows of Appointments 100,000 times! Appt_ID Client_ID Date 100001 1832 8/1/2008 100002 2432 7/5/2013 ... ... ... 1000000 43901 2/1/2017 Client_ID Name 1 Anna 2 Nathan ... ... 100000 Gia Appointments Clients Joins
  • 98. the cost of joining two tables with M and N rows, without an index, is M x N row comparisons. in our previous example, that means 100,000 x 1,000,000 = 100,000,000,000 Joins
  • 99. ● computers are fast, but an M x N operation is still expensive! ● most queries will join multiple tables, not just two ● indexes reduce the time of this query to roughly M index lookups Joins
  • 100. to prevent a full table scan in table A for each row in table C, we use the index. Client_ID Name 1 Anna 2 Nathan ... ... Clients Does Client 1 have Appointment? Yes, found Client 1 at location …. Joins using Indexes
  • 101. the index has magical properties that allow it to find any piece of data in just a few lookups Joins using Indexes
  • 102. its secret is that unlike the rows of a table, the values in an index are ordered, so it knows which branch to look in A M F Joins using Indexes
  • 103. by asking the index, it now only takes 100000 lookups (okay, more like 300000). much better than 100,000,000,000! Client_ID Name 1 Anna 2 Nathan ... ... Clients Joins using Indexes
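The difference between scanning and asking an ordered index can be counted. The sketch below is an analogy, not how a DBMS is implemented: binary search over a sorted Python list stands in for a b-tree descent, and both function names are hypothetical. A real b-tree has wide nodes, so it typically needs only a few page reads per lookup instead of the roughly 20 comparisons seen here.

```python
def scan_lookup(values, target):
    """Full scan: compare against each value until a match is found."""
    comparisons = 0
    for value in values:
        comparisons += 1
        if value == target:
            return True, comparisons
    return False, comparisons


def index_lookup(sorted_values, target):
    """Binary search over ordered values; each step discards half the range."""
    comparisons = 0
    lo, hi = 0, len(sorted_values)
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_values) and sorted_values[lo] == target
    return found, comparisons


# One million ordered Client_ID values (even numbers only, so odd IDs are absent).
appointment_client_ids = list(range(0, 2_000_000, 2))

scan_found, scan_cost = scan_lookup(appointment_client_ids, 1_999_999)
index_found, index_cost = index_lookup(appointment_client_ids, 1_999_999)
hit_found, hit_cost = index_lookup(appointment_client_ids, 43_902)

print(scan_cost, index_cost, hit_cost)
```

Repeating the cheap lookup once per client is what turns the M x N join into roughly M lookups.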
  • 104. like all good things, there’s a trade-off. ● while indexes speed up reads, they slow down writes. ● too many indexes can result in slower queries! ○ if the query analyzer can’t make sense of your indexes, it won’t use them indexes are critical for performance so why not just index every field? Indexes
  • 105. ● primary keys are automatically indexed by the DBMS ○ #1 reason why every table should always, always have a PK ● foreign keys ○ especially if you enforce referential integrity ● fields that will be queried over and over ○ name fields ○ phone number, if you look up people by phone ○ state, if you produce mass mailings by state what fields should be indexed? Indexes
  • 107. there are three types of relations between tables one-to-one one-to-many many-to-many Relationship Cardinality
  • 108. each row in table A has precisely one match in table B One-to-One
  • 109. One-to-One Emp_ID Dept 1 Marketing 2 Operations ... ... Emp_ID SSN Salary 1 111111111 45000 2 222222222 35000 ... ... Employee Public Data Employee Secret Data
  • 110. One-to-One Emp_ID Dept 1 Marketing 2 Operations ... ... M HR Emp_ID SSN Salary 1 111111111 45000 2 222222222 35000 ... ... ? 333333333 50000 answer: M
  • 111. each row in A has 0 to many matches in B One-to-Many
  • 112. One-to-Many Client_ID Name 1 Anna 2 Nathan ... ... Appointments Clients Appt_ID Client_ID Service Date 1000 1 Nails 5/1/2013 1001 2 Hair 7/5/2013 1002 1 Nails 9/1/2013 ...
  • 113. One-to-Many Client_ID Name 1 Anna 2 Nathan ... ... M Appointments Clients Appt_ID Client_ID Service Date 1000 1 Nails 5/1/2013 1001 2 Hair 7/5/2013 1002 1 Nails 9/1/2013 ... N answer: N
  • 114. Inner Join returns all rows from both tables where there is a match
  • 115. each row in A has 0 to many matches in B each row in B has 0 to many matches in A to represent this relationship, a third table C is created (called the cross reference table) Many-to-Many
  • 116. Many-to-Many Clients Client_ID Name 1 Anna 2 Nathan Thing_ID Name 1000 Newspaper 2000 Baseball Things Client_ID Thing_ID 1 1000 2 2000 2 1000 Clients_Favorite_Things
  • 117. Many-to-Many Clients Client_ID Name 1 Anna 2 Nathan ... ... M Thing_ID Name 1000 Newspaper 2000 Baseball ... ... N Things Client_ID Thing_ID 1 1000 2 2000 2 1000 … ... L Clients_Favorite_Things answer: L
  • 118. either table can store the PK or FK: the designer must choose who gets what Emp_ID DOH SSN 1 5/1/2011 111-11-1111 2 8/17/2012 222-22-2222 Emp_ID First Last Title 1 April Smith Technician 2 Jamie Hawkins Marketing Manager One-to-One Implementation Employee Public Data Employee Secret Data
  • 119. the many side stores the FK pointing to the one side's PK Appt_ID Client_ID Service Date 100001 1 Nails 5/1/2013 100002 2 Hair 7/5/2013 100003 1 Hair 6/1/2013 Client_ID Name Phone 1 Anna 215-123-4567 2 Nathan 267-333-4444 Appointments Clients One-to-Many Implementation
  • 120. neither table can store the FK of the other - a third table represents the relationship, storing the PK of both tables Clients Client_ID Name 1 Anna 2 Nathan Thing_ID Name 1000 Newspaper 2000 Baseball Things Client_ID Thing_ID 1 1000 2 2000 2 1000 Clients_Favorite_Things Many-to-Many Implementation
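A sketch of the cross-reference table in practice, using Python's `sqlite3` and the rows from the slide. The composite primary key on `Clients_Favorite_Things` is an assumption added here so the same pair cannot be stored twice; the slide does not specify it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients (Client_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Things (Thing_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Clients_Favorite_Things (
    Client_ID INTEGER REFERENCES Clients(Client_ID),
    Thing_ID INTEGER REFERENCES Things(Thing_ID),
    PRIMARY KEY (Client_ID, Thing_ID)  -- assumed: one row per pair
);
INSERT INTO Clients VALUES (1, 'Anna'), (2, 'Nathan');
INSERT INTO Things VALUES (1000, 'Newspaper'), (2000, 'Baseball');
INSERT INTO Clients_Favorite_Things VALUES (1, 1000), (2, 2000), (2, 1000);
""")

# Two joins walk from Clients through the cross-reference table to Things.
favorites = conn.execute("""
    SELECT c.Name, t.Name
    FROM Clients AS c
    JOIN Clients_Favorite_Things AS x ON x.Client_ID = c.Client_ID
    JOIN Things AS t ON t.Thing_ID = x.Thing_ID
    ORDER BY c.Name, t.Name
""").fetchall()

for client, thing in favorites:
    print(client, "likes", thing)
```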
  • 121. one of these relationships is unnecessary one-to-one one-to-many many-to-many Relationship Cardinality
  • 123. in a relational database, all fields must be assigned a datatype that the values will be saved as. Client Service Date Time Technician Price Anna Nails 5/1/2013 10:00 am 100 $30 Nathan Hair 7/5/2013 3:30 pm 200 $25 most of the time this is a straightforward process. Text Varchar Date Time Integer Currency Data Types
  • 124. SSN Phone ZIP Comments Position 111-11-1111 (215) 111-1111 19102 Blah blah bla 1.849308339 222-22-2222 (215) 222-2222 19147 This is anoth 1.890223333 other times it will require some careful thought and a judgement call ● SSN, Phone: text or number? ● ZIP: seems like a number, but what about Canada? ● Comments: data could be truncated if the text field is made too small ● Position: storing scientific measurements? beware that "floats" may truncate your precision Data Types
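The judgement calls above show up in any language, not only in a DBMS. The sketch below uses plain Python as an analogy: binary floats cannot represent some decimal values exactly, and a postal code stored as a number loses leading zeros or cannot be stored at all. The sample codes and the helper `fits_integer_column` are illustrative assumptions.

```python
from decimal import Decimal

# Binary floating point: 0.1 + 0.2 is not exactly 0.3.
float_total = 0.1 + 0.2
float_is_exact = float_total == 0.3

# An exact decimal type keeps the digits as written.
decimal_total = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = decimal_total == Decimal("0.3")

# A ZIP code with a leading zero loses it when stored as a number.
zip_as_text = "02134"
zip_as_number = int(zip_as_text)


def fits_integer_column(value):
    """Return True if the value could be stored in an integer field."""
    try:
        int(value)
        return True
    except ValueError:
        return False


# Canadian postal codes contain letters, so they need a text field.
canadian_fits = fits_integer_column("K1A 0B1")

print(float_total, decimal_total, zip_as_number, canadian_fits)
```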
  • 126. the normal forms are the specifications for how to split your fields into tables in such a way that ● eliminates redundancy ● prevents data anomalies ● functional dependencies are preserved, thereby enabling ● loss-less joins (the tables can be put back together to yield the precise universe of data) Normal Forms
  • 127. 1st Normal Form (1NF) 2nd Normal Form (2NF) 3rd Normal Form (3NF) Boyce-Codd Normal Form (BCNF) 4th Normal Form (4NF) 5th Normal Form (5NF) Higher forms (academic) Normal Forms
  • 128. your database is considered to be normalized if it is in at least 3NF • the goal of normalizing a database is to prevent data anomalies from occurring during insert, update or delete operations, and most 3NF tables are free of these anomalies. Normal Forms
  • 129. data in each field is atomic (cannot be decomposed into additional fields) First Normal Form (1NF)
  • 130. each field is atomic ● basically says a field cannot contain a table (or multiple values) EmpID Favorite Things 1 Mittens, Raindrops 2 Raindrops, Schnitzel 3 Doorbells, Mittens Not in 1NF First Normal Form (1NF)
  • 131. EmpID Favorite Things 1 Mittens, Raindrops 2 Raindrops, Schnitzel 3 Doorbells, Mittens how do we get this in 1NF? First Normal Form (1NF)
  • 132. decomposition into 1NF depends on relationship type ➔ one-to-one: make new fields ➔ one-to-many: make new table ➔ many-to-many: make two new tables First Normal Form (1NF)
  • 133. what is the relationship type between employee and favorite things? EmpID Favorite Things 1 Mittens, Raindrops 2 Raindrops, Schnitzel 3 Doorbells, Mittens First Normal Form (1NF)
  • 134. EmpID 1 2 ThingID Favorite Things 100 Mittens 200 Raindrops 300 Schnitzel 400 Doorbells EmpID ThingID 1 100 1 200 2 200 2 300 ➔ many-to-many: make two new tables First Normal Form (1NF)
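The many-to-many decomposition can be sketched as a small script that splits the multi-valued field into a things table and a cross-reference table. The variable names and the ID numbering scheme (100, 200, ...) are assumptions chosen to mirror the slide.

```python
# The table that is not in 1NF: one field holds several values.
not_in_1nf = {
    1: "Mittens, Raindrops",
    2: "Raindrops, Schnitzel",
    3: "Doorbells, Mittens",
}

things = {}           # new table: thing name -> ThingID
employee_things = []  # new cross-reference table: (EmpID, ThingID)

for emp_id, favorites in not_in_1nf.items():
    for name in (part.strip() for part in favorites.split(",")):
        # Reuse the existing ThingID, or assign the next one (100, 200, ...).
        thing_id = things.setdefault(name, 100 * (len(things) + 1))
        employee_things.append((emp_id, thing_id))

print(things)
print(employee_things)
```

Each value now sits in its own row, so a query such as "who likes Mittens?" becomes a simple match on ThingID instead of a text search inside a field.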
  • 135. EmpID Name 1 Mary Smith 2 Todd T Burke depends... ○ do you want to search/sort by Last Name? ○ do you want to be compliant with an industry standard? ○ business rules determine what is considered “normal” EmpID First Name Last Name 1 Mary Smith 2 Todd Burke In 1NF? Definitely 1NF First Normal Form (1NF)
  • 136. in 1NF, and every non-key field is functionally dependent on the whole PK, not on just a subset of it Second Normal Form (2NF)
  • 137. in 2NF, and every non-key field is functionally dependent only on the PK, not on another non-key field Third Normal Form (3NF)
  • 138. the database is already in 2NF • no non-key field is functionally dependent on another non-key field Appt_ID Appt Date Client_ID Service Service Price 1001 2/23/13 1 Nails 20 1002 2/24/13 2 Hair 30 is this in 3NF? Third Normal Form (3NF)
  • 139. to determine if a table is in 3NF • identify the primary key • determine if every field is dependent on the PK and on nothing but the PK Appt_ID Appt Date Client_ID Service Service Price 1001 2/23/13 1 Nails 20 1002 2/24/13 2 Hair 30 what’s the primary key? Third Normal Form (3NF)
  • 140. is every field dependent on Appt_ID? Appt_ID Appt Date Client_ID Service Service Price 1001 2/23/13 1 Nails 20 1002 2/24/13 2 Hair 30 (Appt_ID) -> (Date, Client, Service, Service Price)? Do the fields contain all the information, and only the information, needed to define an “appointment”? Third Normal Form (3NF)
  • 141. if we assert (Service) -> (Service Price), then not 3NF. Appt_ID Appt Date Client_ID Service Service Price 1001 2/23/13 1 Nails 20 1002 2/24/13 2 Hair 30 Third Normal Form (3NF)
  • 142. decomposition into 3NF: independent functional dependencies become new relations (tables) Third Normal Form (3NF)
  • 143. decomposition into 3NF depends on relationship type ➔ one-to-one: make new table ➔ one-to-many: make new table ➔ many-to-many: make two new tables Third Normal Form (3NF)
  • 144. treat the independent (Service) -> (Service Price) functional dependency as a new relation (table) Appt_ID Appt Date Client Service 1 2/23/13 Anna Nails 2 2/24/13 Nathan Hair Service Service Price Nails 20 Hair 30 Third Normal Form (3NF) Appointments Services
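The decomposed tables can be joined back together, and a price change then needs to be made in only one place. This is a sketch with Python's `sqlite3` using the rows from the slide; the underscore column names and the foreign key on `Service` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Services (Service TEXT PRIMARY KEY, Service_Price INTEGER);
CREATE TABLE Appointments (
    Appt_ID INTEGER PRIMARY KEY,
    Appt_Date TEXT,
    Client TEXT,
    Service TEXT REFERENCES Services(Service)
);
INSERT INTO Services VALUES ('Nails', 20), ('Hair', 30);
INSERT INTO Appointments VALUES
    (1, '2/23/13', 'Anna', 'Nails'),
    (2, '2/24/13', 'Nathan', 'Hair');
""")

rebuild = """
    SELECT a.Appt_ID, a.Appt_Date, a.Client, a.Service, s.Service_Price
    FROM Appointments AS a
    JOIN Services AS s ON s.Service = a.Service
    ORDER BY a.Appt_ID
"""

# The join reconstructs the original single-table view.
before = conn.execute(rebuild).fetchall()

# The price lives in exactly one row, so one UPDATE changes it everywhere.
conn.execute("UPDATE Services SET Service_Price = 25 WHERE Service = 'Nails'")
after = conn.execute(rebuild).fetchall()
```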
  • 145. ● data is stored in tables ● normal forms guide us in how to determine which fields belong in which tables ● databases in 3NF are free from most anomalies ● every table should be assigned a primary key ● tables are related to each other on primary keys and foreign keys ● three types of relationships - 1:1, 1:N, N:N ● an N:N relationship is represented by a cross-reference table ● declaring foreign keys ensures database has referential integrity ● two types of referential integrity constraints: cascade, restrict ● all fields that will be frequently qualified in a query should be indexed Recap