IN2243
Foundations of Computer Science
Instructor
Dr. Muhammad Waqar
Chapter 13
File Structure
INTRODUCTION
• Files are stored on auxiliary or secondary storage devices. The two most
common forms of secondary storage are disk and tape. Files in secondary
storage can be both read from and written to.
• For our purposes, a file is a collection of data records in which each record
consists of one or more fields.
• When we design a file, the important issue is how we will retrieve
information (a specific record) from the file. Sometimes we need to process
records one after another, whereas sometimes we need to access a specific
record quickly without retrieving the preceding records. The access method
determines how records can be retrieved: sequentially or randomly.
A taxonomy of file structures
Sequential access
• If we need to access a file sequentially (that is, one record after another,
from beginning to end), we use a sequential file structure.
Random access
• If we need to access a specific record without having to retrieve all records
before it, we use a file structure that allows random access. Two file
structures allow this: indexed files and hashed files.
SEQUENTIAL FILES
• A sequential file is one in which records can only be accessed one after
another from beginning to end. Records are stored one after another in
auxiliary storage, such as tape or disk, and there is an EOF (end-of-file)
marker after the last record.
• The operating system has no information about the record addresses; it only
knows where the whole file is stored. The only thing known to the operating
system is that the records are sequential.
Pseudocode for processing records in a sequential file
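The loop below is a minimal Python sketch of processing a sequential file: read one record, process it, and repeat until the end of the file. The record layout (one record per line, comma-separated fields), the sample data, and the name process_sequential are assumptions made for illustration only.

```python
import io


def process_sequential(stream, handle):
    """Read records one after another, from the first record to EOF."""
    count = 0
    for line in stream:                        # the loop ends at end-of-file
        record = line.rstrip("\n").split(",")  # split the record into fields
        handle(record)                         # process the current record
        count += 1
    return count


# Hypothetical file contents, held in memory so the sketch is self-contained.
sample = io.StringIO("1001,Ali,250\n1002,Sara,900\n1003,Omar,40\n")
seen = []
total = process_sequential(sample, seen.append)
print(total, seen[0])
```

Note that there is no way to jump to the third record directly: the first two must be read first.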
Updating sequential files
• Sequential files must be updated periodically to reflect changes in
information. The updating process is very involved because all the records
need to be checked and updated (if necessary) sequentially.
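One common way to organize such an update is to merge the old file with a sorted batch of changes and write a new file. The sketch below assumes this arrangement; the tuple layouts, the action codes "A", "C" and "D", the error list, and the rule of at most one transaction per key are all illustrative assumptions, not details given on the slide.

```python
def update_master(old_master, transactions):
    """Merge a sorted old master file with sorted transactions.

    old_master:   list of (key, data) tuples, sorted by key
    transactions: list of (key, action, data) tuples, sorted by key, with at
                  most one transaction per key; action is "A" (add),
                  "C" (change) or "D" (delete)
    Returns (new_master, errors).
    """
    new_master, errors = [], []
    i = 0
    for key, action, data in transactions:
        # Copy every old record that comes before this transaction's key.
        while i < len(old_master) and old_master[i][0] < key:
            new_master.append(old_master[i])
            i += 1
        found = i < len(old_master) and old_master[i][0] == key
        if action == "A" and not found:
            new_master.append((key, data))
        elif action == "C" and found:
            new_master.append((key, data))  # write the changed record
            i += 1
        elif action == "D" and found:
            i += 1                          # skip the record so it is dropped
        else:
            errors.append((key, action))    # transaction cannot be applied
    new_master.extend(old_master[i:])       # copy the remaining old records
    return new_master, errors


old = [(10, "a"), (20, "b"), (30, "c")]
changes = [(15, "A", "new"), (20, "C", "b2"), (30, "D", None), (40, "D", None)]
print(update_master(old, changes))
```

Every old record is visited once, in key order, which is why the update touches the whole file even when only a few records change.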
Processing file updates
INDEXED FILES
• To access a record in a file randomly, we need to know the address of the
record.
• For example, suppose a customer wants to check their bank account. Neither
the customer nor the teller knows the address of the customer’s record. The
customer can only give the teller their account number (key). Here, an
indexed file can relate the account number (key) to the record address.
INDEXED FILES
• An indexed file is made of a data file, which is a sequential file, and an index.
• The index itself is a very small file with only two fields: the key of the
sequential file and the address of the corresponding record on the disk.
• The index is sorted based on the key values of the data file.
Accessing a record in INDEXED FILES
• The entire index file is loaded into main memory (the file is small and uses
little memory).
• The index entries are searched, using an efficient search algorithm such as a
binary search, to find the desired key.
• The address of the record is retrieved.
• Using the address, the data record is retrieved and passed to the user.
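The four steps above can be sketched in Python as follows. The in-memory io.BytesIO object stands in for the data file on disk, and the fixed record size, the "key:name" record layout, and the function names are assumptions made for illustration.

```python
import bisect
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes


def build_files(records):
    """Write records to a data file and build a sorted (key, address) index."""
    data_file = io.BytesIO()
    index = []
    for key, name in records:
        address = data_file.tell()  # byte offset where this record starts
        data_file.write(f"{key}:{name}".encode("ascii").ljust(RECORD_SIZE))
        index.append((key, address))
    index.sort()                    # the index is ordered by key
    return data_file, index


def fetch(data_file, index, key):
    """Binary-search the in-memory index, then read the record at its address."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, key)  # binary search on the sorted keys
    if pos == len(keys) or keys[pos] != key:
        return None                      # key is not in the index
    data_file.seek(index[pos][1])        # jump straight to the record
    return data_file.read(RECORD_SIZE).decode("ascii").rstrip()


data_file, index = build_files([(306, "Sara"), (120, "Ali"), (245, "Omar")])
print(fetch(data_file, index, 245))
```

Only the small index is searched; the data file is touched once, at the address the index returns.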
Inverted files
• One of the advantages of indexed files is that we can have more than one
index, each with a different key.
• For example, an employee file can be retrieved based on either social
security number or last name.
• This type of indexed file is usually called an inverted file.
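As a rough illustration, the sketch below builds two indexes over the same employee data file, one per key. The records, the made-up SSN values, and the helper names are hypothetical; a list position stands in for the record address.

```python
# Hypothetical employee data file; the list position plays the role of the
# record address on disk.
employees = [
    {"ssn": "000-00-0001", "last_name": "Khan", "dept": "IT"},
    {"ssn": "000-00-0002", "last_name": "Ahmed", "dept": "HR"},
    {"ssn": "000-00-0003", "last_name": "Khan", "dept": "Sales"},
]


def build_index(records, field):
    """Return a sorted list of (key, address) pairs for one field."""
    return sorted((rec[field], address) for address, rec in enumerate(records))


def lookup(records, index, key):
    """Return every record whose indexed field equals key.

    A linear scan keeps the sketch short; a binary search would also work
    because the index is sorted.
    """
    return [records[address] for k, address in index if k == key]


by_ssn = build_index(employees, "ssn")
by_name = build_index(employees, "last_name")
print(lookup(employees, by_ssn, "000-00-0002"))
print(len(lookup(employees, by_name, "Khan")))
```

The data file is stored once; each additional index only adds another small (key, address) table.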
HASHED FILES
• In an indexed file, the index maps the key to the address.
• A hashed file uses a mathematical function to accomplish this mapping.
• The user gives the key, the function maps the key to the address and passes
it to the operating system, and the record is retrieved.
Direct hashing
• In direct hashing, the key is the data file address without any algorithmic
manipulation.
• The file must therefore contain a record for every possible key. Although
situations suitable for direct hashing are limited, it can be very powerful
because it guarantees that there are no synonyms or collisions.
Modulo division hashing
• Also known as division remainder hashing, the modulo division method
divides the key by the file size and uses the remainder plus 1 for the address.
• This gives the simple hashing algorithm that follows, where list_size is the
number of elements in the file. The reason for adding a 1 to the mod
operation result is that our list starts with 1 instead of 0:
address = key mod list_size + 1
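A minimal Python sketch of modulo division hashing is shown below. The function name modulo_hash and the sample keys are illustrative; list_size = 307 is simply an assumed prime file size.

```python
def modulo_hash(key, list_size):
    """Return an address in the range 1..list_size (remainder plus 1)."""
    return key % list_size + 1


# 121267 = 395 * 307 + 2, so the address is 2 + 1 = 3.
print(modulo_hash(121267, 307))
# 45128 = 146 * 307 + 306, so the address is 306 + 1 = 307 (the last slot).
print(modulo_hash(45128, 307))
```

Because the remainder always lies between 0 and list_size - 1, adding 1 keeps every address inside the file.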
Modulo division hashing
• Although this algorithm works with any list size, a list size that is a prime
number produces fewer collisions than other list sizes. Therefore, whenever
possible, try to make the file size a prime number.
Digit extraction hashing
• Using digit extraction hashing, selected digits are extracted from the key and
used as the address.
• For example, using our six-digit employee number to hash to a three-digit
address (000–999), we could select the first, third, and fourth digits (from
the left) and use them as the address.
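The example above can be sketched in Python as follows. The function name, its parameters, and the sample employee numbers are assumptions chosen for illustration.

```python
def digit_extraction_hash(key, positions=(1, 3, 4), width=6):
    """Build an address from selected digits of the key.

    positions are counted from the left, starting at 1; the key is padded
    with leading zeros to the assumed fixed width of six digits.
    """
    digits = str(key).zfill(width)
    return int("".join(digits[p - 1] for p in positions))


print(digit_extraction_hash(379452))  # digits 3, 9, 4
print(digit_extraction_hash(121267))  # digits 1, 1, 2
```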
Collision
• Generally, the population of keys for a hashed list is greater than the number
of records in the data file.
• For example, if we have a file of 50 students for a class in which the students
are identified by the last four digits of their social security number, then
there are 200 possible keys for each element in the file (10,000/50).
• Because there are many keys for each address in the file, there is a possibility
that more than one key will hash to the same address in the file. We call the
set of keys that hash to the same address in our list synonyms.
Collision
• If the actual data that we insert into our list contains two or more synonyms,
we will have collisions.
• A collision is the event that occurs when a hashing algorithm produces an
address for an insertion key, but that address is already occupied.
• The address produced by the hashing algorithm is known as the home
address.
• The part of the file that contains all the home addresses is known as the
prime area.
• When two keys collide at a home address, we must resolve the collision by
placing one of the keys and its data in another location, away from its home
address.
Collision resolution
Open addressing
• The first collision resolution method, open addressing resolution, resolves
collisions in the prime area.
• When a collision occurs, the prime area addresses are searched for an open
or unoccupied record where the new data can be placed.
• One simple strategy for data that cannot be stored in the home address is to
store it in the next address (home address + 1).
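The "next address" strategy can be sketched as below. This is only an illustration: it uses 0-based Python list positions (so the + 1 of the modulo division formula is dropped), wraps around at the end of the prime area, and the function name and table size are assumptions.

```python
def insert_open_addressing(prime_area, key, data):
    """Store (key, data) at the home address or the next free address."""
    size = len(prime_area)
    home = key % size                      # home address (0-based here)
    for step in range(size):
        address = (home + step) % size     # home, home + 1, ... with wrap
        if prime_area[address] is None:    # open (unoccupied) slot found
            prime_area[address] = (key, data)
            return address
    raise RuntimeError("prime area is full")


prime_area = [None] * 7
print(insert_open_addressing(prime_area, 10, "first"))   # home address 3
print(insert_open_addressing(prime_area, 17, "second"))  # synonym of 10
```

Keys 10 and 17 are synonyms (both have home address 3), so the second one is pushed to address 4, which in turn may collide with a later key whose home address is 4.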
Collision resolution
Linked list resolution
• A major disadvantage of open addressing is that each collision resolution
increases the probability of future collisions.
• This disadvantage is eliminated in another approach to collision resolution,
linked list resolution. In this method, the first record is stored in the home
address, but contains a pointer to the second record.
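A minimal sketch of linked list resolution follows. The Record class, the function names, and the 0-based addresses are illustrative assumptions; in a real file the pointer would be a record address rather than a Python reference.

```python
class Record:
    """A record that can point to the next synonym in its chain."""

    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.next = None  # pointer to the next record with the same home address


def insert_chained(prime_area, key, data):
    """Store the first record at the home address; link later synonyms to it."""
    home = key % len(prime_area)
    new_record = Record(key, data)
    if prime_area[home] is None:
        prime_area[home] = new_record
    else:
        node = prime_area[home]
        while node.next is not None:  # walk to the end of the chain
            node = node.next
        node.next = new_record        # synonym lives outside the prime area
    return home


def find_chained(prime_area, key):
    """Follow the chain from the home address until the key is found."""
    node = prime_area[key % len(prime_area)]
    while node is not None and node.key != key:
        node = node.next
    return node.data if node is not None else None


prime_area = [None] * 7
insert_chained(prime_area, 10, "first")
insert_chained(prime_area, 17, "second")  # synonym of 10, chained behind it
print(find_chained(prime_area, 17))
```

Unlike open addressing, the synonym does not take up another key's home address, so it cannot cause further collisions.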
Collision resolution
Bucket hashing
• Another approach to handling the problem of collisions is to hash to buckets.
• A bucket is a node that can accommodate more than one record.
• The disadvantage of this method is that there may be a lot of wasted
(unoccupied) locations.
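The sketch below illustrates bucket hashing and the wasted space it can cause. The bucket capacity of 3, the function names, and the decision to simply report a full bucket are assumptions; a real file would fall back on another resolution method when a bucket overflows.

```python
BUCKET_CAPACITY = 3  # assumed number of records each bucket can hold


def make_buckets(count):
    """Create an empty file made of `count` buckets."""
    return [[] for _ in range(count)]


def insert_bucket(buckets, key, data):
    """Add a record to its home bucket; return False if the bucket is full."""
    bucket = buckets[key % len(buckets)]
    if len(bucket) >= BUCKET_CAPACITY:
        return False  # overflow: another resolution method would be needed
    bucket.append((key, data))
    return True


def wasted_slots(buckets):
    """Count the unoccupied record locations across all buckets."""
    return sum(BUCKET_CAPACITY - len(bucket) for bucket in buckets)


buckets = make_buckets(5)
for key in (10, 15, 20):  # three synonyms share bucket 0
    insert_bucket(buckets, key, "data")
print(wasted_slots(buckets))
```

Here three synonyms fit without any collision handling, but the other four buckets reserve twelve locations that hold nothing.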
DIRECTORIES
• Directories are provided by most operating systems for organizing files.
• A directory performs the same function as a folder in a filing cabinet.
However, a directory in most operating systems is represented as a special
type of file that holds information about other files.
• A directory not only serves as a kind of index that tells the operating system
where files are located on an auxiliary storage device, but can also contain
other information about the files it contains, such as who has access to each
file, or the date when each file was created, accessed, or modified.
Directories in the UNIX operating system
Special directories
Root directory
• The root directory is the highest level in the file system hierarchy.
• It is the root of the whole file structure, and therefore does not have a
parent directory.
• In a UNIX environment, the root directory always has several levels of
subdirectories. The root directory belongs to the system administrator and
can be changed only by the system administrator.
Home directory
• We use our home directory when we first log into the system. This contains
any files we create while in it and may contain personal system files.
• Our home directory is also the beginning of our personal directory structure.
Each user has a home directory.
Special directories
Working directory
• The working directory (or current directory) is the directory we are ‘in’ at any
point in a user session.
• When we first log in, the working directory is our home directory. If we have
subdirectories, we will most likely move from our home directory to one or
more subdirectories as needed during a session.
• When we change directory, our working directory changes automatically.
Parent directory
• The parent directory is the directory immediately above the working
directory. When we are in our home directory, its parent is one of the system
directories.
Paths and pathnames
• Every directory and file in a file system must have a name. Files in one
directory may have the same names as files in other directories.
• We therefore need more than just the filename to identify a file. To
identify it uniquely, we specify the file’s path from the root directory to
the file.
• The file’s path is specified by its absolute pathname, a list of all directories
from the root to the file, separated by a slash character (/).
• This full or absolute pathname can get quite long. For that reason, UNIX also
provides a shorter pathname under certain circumstances, known as a
relative pathname, which is the path relative to the working directory.
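The relationship between the two kinds of pathname can be sketched with Python's posixpath module, which handles UNIX-style paths as plain strings. The directory and file names below are hypothetical, and the helper names are assumptions.

```python
import posixpath

# Hypothetical working directory and file; no real files are touched.
WORKING_DIRECTORY = "/usr/staff/joan"


def to_absolute(relative_path, working_directory=WORKING_DIRECTORY):
    """Resolve a relative pathname against the working directory."""
    return posixpath.normpath(posixpath.join(working_directory, relative_path))


def to_relative(absolute_path, working_directory=WORKING_DIRECTORY):
    """Shorten an absolute pathname to one relative to the working directory."""
    return posixpath.relpath(absolute_path, start=working_directory)


print(to_absolute("report/draft.txt"))
print(to_relative("/usr/staff/joan/report/draft.txt"))
```

The same relative pathname names a different file if the working directory changes, whereas the absolute pathname always identifies exactly one file.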
TEXT VERSUS BINARY
• Two terms are used to categorize files: text files and binary files.
• A file stored on a storage device is a sequence of bits that can be interpreted
by an application program as a text file or a binary file.
Text files
• A text file is a file of characters. It cannot contain integers, floating-point
numbers, or any other data structures in their internal memory format.
• To store these data types, they must be converted to their character
equivalent formats.
• Let’s look at an example. When data (a file stream) is sent to the printer, the
printer takes eight bits, interprets them as a byte, and decodes them according
to the encoding system of the printer (ASCII or EBCDIC).
• If the character belongs to the printable category, it is printed;
otherwise some other activity takes place, such as printing a space.
• The printer takes the next eight bits and repeats the process. This is done
until the file stream is exhausted.
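As a small illustration of converting a number to its character equivalent, the Python sketch below stores the integer 12345 as text. The value and the choice of ASCII are assumptions made for the example.

```python
value = 12345

# Convert the integer to its character form before storing it as text.
as_text = str(value)                # the five characters '1' '2' '3' '4' '5'
encoded = as_text.encode("ascii")   # one byte per character

print(len(encoded))                 # number of bytes stored
print(list(encoded))                # the ASCII code of each character

# Reading the text back requires the reverse conversion.
restored = int(encoded.decode("ascii"))
print(restored == value)
```

Each byte in the file is the code of one character, so the file can be read or printed one byte at a time, as in the printer example.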
Binary files
• A binary file is a collection of data stored in the internal format of the
computer.
• In this definition, data can be an integer (including other data types
represented as unsigned integers, such as image, audio, or video), a
floating-point number, or any other structured data (except a file).
• Unlike text files, binary files contain data that is meaningful only if it is
properly interpreted by a program.
• If the data is textual, one byte is used to represent one character. But if the
data is numeric, two or more bytes are considered a data item.
• For example, assume we are using a personal computer that uses two bytes
to store an integer. In this case, when we read or write an integer, two bytes
are interpreted as one integer.
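The two-byte integer example can be sketched with Python's struct module. The format "<h" (a two-byte signed integer, least significant byte first) and the value 12345 are assumptions chosen for illustration; real machines differ in integer size and byte order.

```python
import struct

value = 12345

# Store the integer in a two-byte internal format instead of as characters.
raw = struct.pack("<h", value)
print(len(raw))                        # two bytes form one data item

# Read correctly: the two bytes are interpreted together as one integer.
restored = struct.unpack("<h", raw)[0]
print(restored)

# Read incorrectly as text: the same two bytes decode to unrelated characters.
misread = raw.decode("ascii")
print(misread)
```

The bytes 0x39 and 0x30 mean 12345 to a program that expects a two-byte integer, but the characters "90" to one that expects text, which is why binary data is meaningful only when interpreted properly.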
Thank You
