Assembler

System Software

• components
– translator
• assembler
• compiler
• interpreter
– system manager
• operating system
– other utilities
• loader
• linker
• DBMS, editor, debugger, ...
• purpose of this course
– understand how to build system software
– understand how these components work

2

Issues in System Software

• not many in this area
– mature area
• advanced architectures complicates system software
– superscalar CPU
– memory model
– multiprocessor
• new applications
– embedded systems
– mobile/ubiquitous computing

3

Assembler Overview

• functions
– translate programs written in assembly language to machine code
• mnemonic code to machine code
• symbols to addresses
– handles
• constants
• literals
• addressing
• 32 bit constant or address
• 32 bit offset

4

Assembler Overview (cont’d)
• pass 1: loop until the end of the program
1. read in a line of assembly code
2. assign an address to this line
• increment N (word addressing or byte addressing)
3. save address values assigned to labels
• in symbol tables
4. process assembler directives
• constant declaration
• space reservation
• pass2: same loop
1. read in a line of code
2. translate op code
using op code table
3. change labels to address
using the symbol table
4. process assembler directives
5. produce object program

5

Data Structures for Assembler

add $t0, $t1, $t2 000000 01001 01010 01000 00000 100000

• op code table
– looked up for the translation of mnemonic code
• key: mnemonic code
• result: bits
– hashing is usually used
• once prepared, the table is not changed
• efficient lookup is desired
• since mnemonic code is predefined, the hashing function can
be tuned a priori
– the table may have the instruction format and length
• to decide where to put op code bits, operands bits, offset bits
• for variable instruction size
• used to calculate the address

6

Data Structures for Assembler (cont’d)
.text
.globl main
• symbol table main:
la $t0, array
– stored and looked up to assign lw $t1, count
address to labels lw $t2, ($t0)
loop:
• efficient insertion and retrieval lw $t3, 4($t0)
is needed ble $t3, $t2, loop2
• deletion does not occur move $t2, $t3

– difficulties in hashing loop2: add $t1, $t1, -1
add $t0, $t0, 4
• non random keys bnez $t1, loop
– problem
…
• the size varies widely ….
.data
array: .word 3, 5, 5, 1, 6, 7, …..
count: .word 15
string1: .asciiz “nmax = “

7

Symbol Table Construction

.text
.globl main symbol name value
main: main 0
la $t0, array
lw $t1, count loop 12
lw $t2, ($t0)
loop: loop2 24
lw $t3, 4($t0)
ble $t3, $t2, loop2 …
move $t2, $t3
array 408
loop2: add $t1, $t1, -1 count 468
add $t0, $t0, 4
bnez $t1, loop string1 472
… bad 478
….
.data
array: .word 3, 5, 5, 1, 6, 7, …..
count: .word 15
string1: .asciiz “nmax = “
bad: .word 7

8

Assembler Algorithm: pass1
begin
if starting address is given
LOCCTR = starting address;
else
LOCCTR = 0;
while OPCODE != END do ;; or EOF
begin
read a line from the code
if there is a label
if this label is in SYMTAB, then error
else insert (label, LOCCTR) into SYMTAB
search OPTAB for the op code
if found
LOCCTR += N ;; N is the length of this instruction (4 for MIPS)
else if this is an assembly directive
update LOCCTR as directed
else error
write line to intermediate file
end
program size = LOCCTR - starting address;
end

9

Assembler Algorithm: pass2
begin
read a line;
if op code = START then ;; .globl xxx for MIPS
write header record;
while op code != END do ;; or EOF
begin
search OPTAB for the op code;
if found
if the operand is a symbol then
replace it with an address using SYMTAB;
assemble the object code;
else if is a defined directive add $t0, $t1, $t2 =>
convert it to object code; 000000 01001 01010 01000 00000 100000
add object code to the text;
read next line;
end
write End record to the text;
output text;
end

10

Program Relocation
0 .
.
. jump to 1004 1004
. .
1076 5000 .
jump to 1004 .
. jump to 1004
.
6076

program is loaded at 0 program is loaded at 5000

• motivations for relocation
– a program may consists of several pieces of codes that are assembled
independently
– when a program is assembled, it is impossible to know the exact location
where the program starts

11

Program Relocation (cont’d)

• distances from the origin of a program do not change
– make the address relative to the origin
– provides loader with information about
• which address needs fixing
• length of address field
– the loader change those addresses as
• distance + start address of a program
– only absolute addresses need to be changed

12

Literals

• usage
– encoded as an operand (similar to the immediate in MIPS, but different)
• load $7, =X’0A7F’
– simple way to declare a constant
– assembler does
• declare a constant with a label
• use the label to use the value
• comparison with immediate
– literal is an assembler directive
• immediate is a machine recognizable data
– full word can be used for literals
• immediate: full word – (opcode, registers)
– values are obtained from data memory - slow
• immediate data is within the instruction itself

13

Literals (cont’d)

• literal pool
– assembler collects all the literals into one or more literal pools
– default location is at the end of the program
• for better code reading
– programmer can declare a place (LTORG)
• to use PC-relative addressing
• to keep data close to instruction
• optimization
– make one literal for the same value
• compare character string or value?
– x’454F46’ = c’EOF’
• value comparison needs evaluation
• literal table
– name(label), operand value, operand length, address in the table
– name and value are all used as a key

14

Literal Handling Algorithm

pass 1
at a recognition of a literal
search LITTAB by name
if found but different value, error
else if the same value, no action
else if not found insert a new literal (no address yet)
if the code is LTORG or END
allocate each literal assigning an address

pass 2
replace each literal with the address in the LITTAB
if these addresses are absolute,
prepare modification for relocation

15

Symbol Defining Statement

• MAXLEN EQU 4096
– makes program structure better
– easier to modify a single location
– easier to remember than numbers
– registers can be given meaningful names
– (maxlen = 4096) in MIPS
• assembler
– searches SYMTAB and replace the symbol with the value in the table
– resulting object code is the same as using the value instead of symbol
– remember that with 2 passes there is restriction
X EQU Y
Y EQU 100
• X cannot be defined in pass 1

16

Expressions

BUFFER: .space 4096 ; reserve 4096 bytes here
BUFEND: ; set current location to BUFFEND
(MAXLEN = BUFEND – BUFFER) ; calculate the size of the buffer

• allows simple arithmetic operations in symbol definition
• operands may have relative values for relocation
– relative values should be modified by the loader later
• we need to know which is relative
– symbol table needs a type field to discern absolute symbols from relative
symbols

17

Expression Rules

• basic
– constant is absolute
– address is relative
• using expressions
– expression with absolute arguments is absolute
– expression that has multiplication and division is absolute
– relative_1 - relative_2 is absolute
• dependencies on starting address are canceled out
– all the other expressions having relative terms are neither relative nor
absolute (error?)
• constant - relative
• relative_1 + relative_2
• 3 x relative_1

18

Program Blocks

source object code

block 0
block 0
block 1
assembled
block 2
block 0 block 1

block 1
block 2
block 2

19

Program Blocks (cont’d)

• motivation
– programmer’s view may be different from machine’s view
• affects only efficiency not functionality
– addressing can be simplified
• large data area can be moved to the end of code while source code places
it close to the instructions that use this data
• data structure and algorithm
– block table (name, block number, address, length)
– pass 1
• maintain separate LOCCTR for each block
– each label is assigned address relative to the start of the block that contains it
• SYMTAB stores block number for each symbol
• store starting address of each block in block table
– pass 2
• assign address to each symbol by adding the relative address to the block
starting address

20

Control Sections

• control section is a part of program that can be assembled independent of
other parts
– a large problem can be divided into many control sections
– each control section can be developed independently
– each control section can be modified independently
• symbols defined in other control sections
– called external
– assembler prepares those symbols
– loader & linker resolves the value of external symbols

21

Control Sections (cont’d)

• a table prepared by assembler
– define record
• name of symbol defined in this control section
• relative address of the symbol
– refer record
• name of external symbols
– modification record
• starting address of field to be modified
• length of this field
• name of external symbol
• loader
– for every external symbol
• find the relative address from the define record
• add the starting address of the control section where the symbol is defined
• modify the field

22

One-Pass Assembler

• problem
– forward reference: reference to symbols that are not defined yet
• why do we need one-pass assembler?
– fast
• useful for program development and testing
• university computing environment
• load-and-go assembler
– writes the object code on memory not on disk file
– since it is on memory it is easy to modify a part of object code

23

One-Pass Assembler (cont’d)

• one-pass assembler for load-and-go
– stores undefined symbols in the SYMTAB with the address of the field that
references this symbol
– when the symbol is defined later, look up the SYMTAB and modify the field
with correct address
• there may be many places to be modified
• what if object code is written on disk?
– bring back the text to memory
• efficiency of one-pass assembler cannot be justified
– make loader to modify the address at loading time
• modification record again
• optimization
– require all the data declaration be placed at the beginning of the program
• reduces reference resolution

24

Multi-Pass Assembler
• support forwarding reference even though it is bad for program readability

at 1, store in a table two tuples
(A, 1, B/2, 0)
1: one symbol is missing
1.(A = B/2) 0: no other symbol depends on A
2.(B = C-D) (B, *, , &LB)
.... *: don’t know how many symbols missing yet
8. C .....
9. D ..… LB: list of symbols that depend on B (now, there is only A in this list)
at 2,
insert (C,*, ,&LC), (D,*, ,&LD)
LC and LD contains only B
modify (B,*, ,&LB) as (B,2,C-D,&LB)
after 8
from LC, B is found
change 2 to 1 in the B tuple meaning one symbol remains to be defined
after 9
from LD, B is found
now evaluate B with defined C, D values
since B is done
from LB, A is found
now A can be evaluated
25

Assembler

More Related Content

What's hot (20)

Similar to Assembler (20)

More from Mohd Arif (20)

Recently uploaded (20)

Assembler