Let’s write a PDF file
A simple walk-through to learn
the basics of the PDF format
(at your rhythm)
PDF = Portable Document Format
r2
Ange Albertini
reverse engineering &
visual documentation
@angealbertini
ange@corkami.com
http://www.corkami.com
Goal:
write a “Hello World” in PDF
PDF is text-based,
with some binary in specific cases.
But not in this example,
so just open a text editor.
Statements are separated
by white space.
(any extra white space is ignored)
White space is any of these:
0x00 Null
0x09 Tab
0x0A Line Feed
0x0C Form Feed
0x0D Carriage Return
0x20 Space
(yes, you can mix EOL styles :( )
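To make the white-space rule concrete, here is a minimal Python sketch that splits raw bytes on the six white-space characters. The name `split_tokens` is made up for this example, and the sketch ignores delimiters, strings and comments.

```python
import re

# The six PDF white-space characters: NUL, Tab, LF, FF, CR, Space.
PDF_WHITESPACE = re.compile(rb"[\x00\t\n\x0c\r ]+")

def split_tokens(data: bytes) -> list[bytes]:
    """Split on runs of PDF white space; extra white space is ignored."""
    return [token for token in PDF_WHITESPACE.split(data) if token]

# Mixed EOL styles and extra spaces still give the same tokens.
print(split_tokens(b"1 0 obj\r\n3\n\n  endobj\r"))
```

Whatever mix of CR, LF and spaces is used, the token list stays the same.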
Delimiters don’t require
white space before.
( ) < > [ ] { } /
_
Let’s start!
%PDF-_
A PDF starts with a %PDF-? signature
followed by a version number.
1.0 <= version number <= 1.7
(it doesn’t really matter here)
%PDF-1.3
_
Ok, we have a valid signature ☺
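If you want to check the signature from code, a small Python sketch could look like this. The helper name `pdf_version` is hypothetical, and it only accepts the one-digit "major.minor" form used in this walk-through.

```python
import re

# Matches the signature and captures the version, e.g. b"1.3".
HEADER = re.compile(rb"%PDF-(\d\.\d)")

def pdf_version(data: bytes):
    """Return the version string if data starts with a PDF signature, else None."""
    match = HEADER.match(data)
    return match.group(1).decode("ascii") if match else None

print(pdf_version(b"%PDF-1.3\n"))
print(pdf_version(b"Hello"))
```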
%PDF-1.3
%_
A comment starts with %
until the end of the line.
%PDF-1.3
%file body
_
After the signature,
comes the file body.
(we’ll see about it later)
%PDF-1.3
%file body
xref
_
After the file body,
comes the cross-reference table.
It starts with the xref keyword, on a separate line.
%PDF-1.3
%file body
xref
%xref table here
_
After the xref keyword,
comes the actual table.
(we’ll see about it later)
%PDF-1.3
%file body
xref
%xref table here
trailer_
After the table,
comes the trailer...
It starts with a trailer keyword.
%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
_
...and its contents.
(we’ll see that later too…)
%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
startxref
_
Then, a pointer
to the xref table...
(with startxref)
%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
startxref
%xref pointer
_
(later, too...)
%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
startxref
%xref pointer
%%EOF_
Lastly, to mark
the end of the file...
...an %%EOF marker.
%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
startxref
%xref pointer
%%EOF
Easy ;)
That’s the overall layout
of a PDF document!
%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
startxref
%xref pointer
%%EOF
Now, we just need
to fill in the rest :)
Study time
Def: name objects
A.k.a. “strings starting with a slash”
/Name
A slash, then an alphanumeric string
(no whitespace)
Case sensitive
/Name != /name
Names with incorrect case are just ignored
(no error is triggered)
Def: dictionary object
Sequence of keys and values
(no delimiter in between)
enclosed in << and >>
sets each key to value
Syntax
<<
key value key value
[key value]*…
>>
Keys are always name objects
<< /Index 1>> sets /Index to 1
<< Index 1 >> is invalid
(the key is not a name)
Dictionaries can have any length
<< /Index 1
/Count /Whatever >>
sets /Index to 1
and /Count to /Whatever
Extra white space is ignored (as usual)
<< /Index 1
/Count
/Whatever >>
is equivalent to
<< /Index 1 /Count /Whatever >>
Dictionaries can be nested.
<< /MyDict << >> >>
sets /MyDict to << >> (empty dictionary)
White space before delimiters
is not required.
<< /Index 1 /MyDict << >> >>
equivalent to
<</Index 1/MyDict<<>>>>
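A rough Python sketch shows why both spellings are equivalent: once the text is cut into tokens, the white space is gone. The regular expression is a simplification written for this example (it does not handle strings, comments or `{ }`).

```python
import re

# Tokens: "<<", ">>", names (/Name), or runs of other non-delimiter characters.
TOKEN = re.compile(r"<<|>>|/[^\s<>/\[\]()]*|[^\s<>/\[\]()]+")

def tokenize(text: str) -> list[str]:
    """Cut dictionary text into tokens, dropping white space."""
    return TOKEN.findall(text)

spaced = "<< /Index 1 /MyDict << >> >>"
compact = "<</Index 1/MyDict<<>>>>"
print(tokenize(compact))
print(tokenize(spaced) == tokenize(compact))
```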
Def: indirect object
an object number (>0), a generation number (0*)
the obj keyword
the object content
the endobj keyword
* 99% of the time
Example
1 0 obj
3
endobj
is object #1, generation 0, containing “3”
Def: object reference
object number, object generation, R
number number R
ex: 1 0 R
Object reference
Refers to an indirect object as a value
ex: << /Root 1 0 R >> refers to
object number 1 generation 0
as the /Root
Used only as values
(in a dictionary or an array),
never as keys
<< /Root 1 0 R >> is OK.
<< 1 0 R /Catalog>> isn’t.
Be careful with the syntax!
“1 0 3” is a sequence of 3 numbers 1 0 3
“1 0 R” is a single reference to an object
number 1 generation 0
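The difference is easy to see with a pattern match. This is a small Python sketch; the helper name `find_references` is an assumption of this example, not something defined by the PDF format.

```python
import re

# An object reference: object number, generation number, then the letter R.
REFERENCE = re.compile(r"\b(\d+)\s+(\d+)\s+R\b")

def find_references(text: str) -> list[tuple[int, int]]:
    """Return (object number, generation) for each reference found."""
    return [(int(num), int(gen)) for num, gen in REFERENCE.findall(text)]

print(find_references("<< /Root 1 0 R >>"))  # one reference
print(find_references("1 0 3"))              # three plain numbers, no reference
```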
Def: file body
sequence of indirect objects
object order doesn’t matter
Example
1 0 obj 3 endobj
2 0 obj << /Index 1 >> endobj
defines 2 objects with different contents
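As a sketch of how a reader could collect the objects of a file body, the Python below indexes them by object and generation number. It assumes no object content contains the word `endobj` (a stream could), and the name `list_objects` is invented for this example.

```python
import re

# "N G obj", then the content (as short as possible), then "endobj".
OBJECT = re.compile(r"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)

def list_objects(body: str) -> dict:
    """Map (object number, generation) to the object's content."""
    return {(int(num), int(gen)): content.strip()
            for num, gen, content in OBJECT.findall(body)}

body = "1 0 obj 3 endobj\n2 0 obj << /Index 1 >> endobj"
swapped = "2 0 obj << /Index 1 >> endobj\n1 0 obj 3 endobj"
print(list_objects(body))
print(list_objects(body) == list_objects(swapped))
```

Swapping the two objects gives the same index, which is the sense in which object order doesn’t matter.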
%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
startxref
%xref pointer
%%EOF
Remember this?
A PDF document is defined
by a tree of objects.
%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
startxref
%xref pointer
%%EOF
Now, let’s start!
%PDF-1.3
%file body
xref
%xref table here
trailer
<< _ >>
startxref
%xref pointer
%%EOF
The trailer is a dictionary.
%PDF-1.3
%file body
xref
%xref table here
trailer
<< /Root_ >>
startxref
%xref pointer
%%EOF
It defines a /Root name...
%PDF-1.3
%file body
xref
%xref table here
trailer
<< /Root 1 0 R_>>
startxref
%xref pointer
%%EOF
...that refers to an object...
%PDF-1.3
%file body
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...that will be in
the file body
(like all the other objects).
Recap:
the trailer is a dictionary
that refers to a root object.
%PDF-1.3
_
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Let’s create our
first object...
%PDF-1.3
1 0 obj
_
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
…(with the standard
object declaration)...
%PDF-1.3
1 0 obj
<< _ >>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...that contains a
dictionary
(like most objects).
%PDF-1.3
1 0 obj
<< /Type_ >>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...and its /Type is...
%PDF-1.3
1 0 obj
<< /Type /Catalog_ >>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...defined as /Catalog...
%PDF-1.3
1 0 obj
<< /Type /Catalog _ >>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
the /Root object also
refers to the page tree...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages_ >>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...via a /Pages name...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R_>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...that refers to
another object...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
_
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...which we’ll create.
Recap:
object 1 is a catalog, and
refers to a Pages object.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
_
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Let’s create object 2.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
_
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
The usual declaration.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< _
>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
It’s a dictionary too.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages_
>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
The Pages object’s
/Type has to be
defined as … /Pages ☺
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids_
>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
This object defines
its children via /Kids...
Def: array
enclosed in [ ]
values separated by whitespace
ex: [1 2 3 4] is an array of 4 integers 1 2 3 4
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ _ ]
>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...which is an array...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R_]
>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
… of references
to each page object.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
_ >>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
One last step...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1_>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...the number of kids
has to be set in /Count...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...and now
object 2 is complete!
Recap:
object 2 is /Pages;
it defines Kids + Count
(pages of the document).
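A quick consistency check can be sketched in Python. It assumes a flat page tree where every kid is a page, as in this walk-through (in a deeper tree, /Count is the total number of pages below the node). The function name `kids_match_count` is made up for this example.

```python
import re

def kids_match_count(pages_dict: str) -> bool:
    """Check that /Count equals the number of references in /Kids."""
    kids = re.search(r"/Kids\s*\[(.*?)\]", pages_dict, re.DOTALL)
    count = re.search(r"/Count\s+(\d+)", pages_dict)
    if not kids or not count:
        return False
    references = re.findall(r"\d+\s+\d+\s+R\b", kids.group(1))
    return len(references) == int(count.group(1))

print(kids_match_count("<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>"))
```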
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
_
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
We can add our only Kid...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
_
endobj
…(a single page)...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< _ >>
endobj
… a dictionary...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type_ >>
endobj
… defining a /Type...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page_ >>
endobj
… as /Page.
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent_ >>
endobj
This grateful kid
properly recognizes
its own parent...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R_>>
endobj
… as you would
expect ☺
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
_
>>
endobj
Our page requires
resources.
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources_
>>
endobj
Let’s add them...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << _ >>
>>
endobj
...as a dictionary:
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font_ >>
>>
endobj
In this case, fonts...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << _ >> >>
>>
endobj
...as a dictionary.
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
_
>> >>
>>
endobj
We define one font...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
/F1_
>> >>
>>
endobj
...by giving it a name...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
/F1 << _ >>
>> >>
>>
endobj
...and setting its
parameters:
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
/F1 << /Type_ >>
>> >>
>>
endobj
its type is ...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
/F1 << /Type /Font_ >>
>> >>
>>
endobj
… font ☺
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
/F1 << /Type /Font /Subtype_ >>
>> >>
>>
endobj
Its font type is...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
/F1 << /Type /Font /Subtype /Type1_
>> >> >>
>>
endobj
…(Adobe) Type1...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
/F1 << /Type /Font /Subtype /Type1
/BaseFont_>> >> >>
>>
endobj
...and its name is...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font <<
/F1 << /Type /Font /Subtype /Type1
/BaseFont /Arial_>> >> >>
>>
endobj
.../Arial.
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 << /Type /Font
/Subtype /Type1 /BaseFont /Arial >> >> >>
_
>>
endobj
One thing is missing
in our page...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 << /Type /Font
/Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents_
>>
endobj
The actual page
contents...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 << /Type /Font
/Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R_
>>
endobj
… as a reference
to another object.
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 << /Type /Font
/Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
That’s all for
our page object.
Recap:
object 3 defines a /Page,
its /Parent, /Resources (fonts)
and its /Contents is
in another object.
(thank you Mario!)
Study time
Def: stream objects
So far, everything is text.
How do you store binary data (images,...) ?
Stream objects are objects.
They start and they end like any other object:
Ex:
1 0 obj
…
endobj
Stream objects contain a stream.
between stream and endstream keywords
1 0 obj
stream
<stream content>
endstream
endobj
Streams can contain anything
Yes, really!
Even binary, other file formats...
(except the endstream keyword)
Stream parameters
are stored before the stream,
in a dictionary
after obj, before stream.
required: stream length
optional: compression algorithm, etc…
Example
1 0 obj
<< /Length 10 >>
stream
0123456789
endstream
endobj
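From the reader’s side, the length tells how many bytes to take after the stream keyword. Here is a minimal Python sketch, assuming /Length is written directly as a number (it can also be a reference) and that the file is well formed. The name `read_stream` is hypothetical.

```python
import re

def read_stream(obj: bytes) -> bytes:
    """Return the stream data of one object, trusting its /Length value."""
    length = int(re.search(rb"/Length\s+(\d+)", obj).group(1))
    # The data starts right after the "stream" keyword and its end-of-line.
    start = re.search(rb"stream\r?\n", obj).end()
    return obj[start:start + length]

obj = b"1 0 obj\n<< /Length 10 >>\nstream\n0123456789\nendstream\nendobj\n"
print(read_stream(obj))
```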
_
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
_
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
We create
the /Contents object...
4 0 obj
stream
_
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
...that is a stream
object...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Study time
Page contents syntax
a sequence of parameters, then an operator
ex: param1 param2 operator
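This "parameters first, operator last" order can be read with a very small loop. The Python sketch below uses operators that appear later in this walk-through (BT, ET, Tf, Td, Tj). It splits on white space only, so it would break on a string containing spaces, and `parse_operations` is a name invented for this example.

```python
# Operators used in this walk-through; a real parser would know the full list.
OPERATORS = {"BT", "ET", "Tf", "Td", "Tj"}

def parse_operations(content: str) -> list[tuple[str, list[str]]]:
    """Group tokens into (operator, parameters) pairs; parameters come first."""
    operations, parameters = [], []
    for token in content.split():
        if token in OPERATORS:
            operations.append((token, parameters))
            parameters = []
        else:
            parameters.append(token)
    return operations

for operator, parameters in parse_operations("BT /F1 100 Tf 10 400 Td ET"):
    print(operator, parameters)
```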
4 0 obj
stream
_
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
Text objects are delimited
by BT and ET...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
4 0 obj
stream
BT
_
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
...(BeginText & EndText).
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
4 0 obj
stream
BT
Tf_
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
We need to set a font,
with Tf.
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
4 0 obj
stream
BT
_ Tf
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
It takes 2 parameters:
a font name...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
4 0 obj
stream
BT
/F1_ Tf
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
...(from the page’s
resources)...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100_Tf
ET
endstream
endobj
...and a font size.
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
_
ET
endstream
endobj
We move the cursor...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
Td_
ET
endstream
endobj
...with the Td operator...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
_ Td
ET
endstream
endobj
...that takes 2 parameters...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
10 400_Td
ET
endstream
endobj
...x and y coordinates.
(default page size: 612x792)
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Study time
Def: literal strings
enclosed in parentheses
Ex: (Hi Mum)
Can contain parentheses
(balanced ones as-is, unbalanced ones escaped)
(Hello() World\(\(\()
Can contain white space
( Hello
World !
)
Standard escaping is
supported
(Hello \nWorld \r\n)
Numeric escaping is in octal
(Hell\157 World)
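To see the escaping rules at work, here is a small Python decoder sketch for literal strings. It handles the single-letter escapes, escaped parentheses and octal codes such as `\157`, but not a backslash used as a line continuation. The name `decode_literal` is an assumption of this example.

```python
def decode_literal(src: str) -> str:
    """Decode a PDF literal string, outer parentheses included."""
    assert src[0] == "(" and src[-1] == ")"
    simple = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
              "(": "(", ")": ")", "\\": "\\"}
    body, out, i = src[1:-1], [], 0
    while i < len(body):
        char = body[i]
        if char != "\\":
            out.append(char)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break
        if body[i] in "01234567":
            # Up to three octal digits give a character code.
            j = i
            while j < len(body) and j < i + 3 and body[j] in "01234567":
                j += 1
            out.append(chr(int(body[i:j], 8)))
            i = j
        else:
            out.append(simple.get(body[i], body[i]))
            i += 1
    return "".join(out)

print(decode_literal(r"(Hell\157 World)"))
```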
4 0 obj
stream
BT
/F1 100 Tf
10 400 Td
_
ET
endstream
endobj
Showing a text string...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
10 400 Td
Tj_
ET
endstream
endobj
...is done with the Tj
operator...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
10 400 Td
_ Tj
ET
endstream
endobj
...that takes a single
parameter...
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
10 400 Td
(_) Tj
ET
endstream
endobj
...a literal string.
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
10 400 Td
(Hello World_) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Our contents stream
is complete...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
_
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
One last thing...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< _ >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...we need to set
its parameters...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length_ >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
… the stream length...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44_>>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
…including white space
(newline characters…).
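The value 44 can be checked with a few lines of Python. This sketch assumes 1-character newlines between the five lines and does not count the end-of-line right before endstream. The helper name `stream_length` is made up for this example.

```python
# The content stream from the walk-through, one entry per line.
LINES = [b"BT", b"/F1 100 Tf", b"10 400 Td", b"(Hello World!) Tj", b"ET"]

def stream_length(lines: list[bytes], eol: bytes = b"\n") -> int:
    """Length of the stream data when its lines are joined with the given EOL."""
    return len(eol.join(lines))

print(stream_length(LINES))           # 1-character newlines
print(stream_length(LINES, b"\r\n"))  # 2-character newlines
```

With 2-character newlines the same text is 4 bytes longer, so the /Length value has to follow the EOL style of the file.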
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Our stream parameters
are finished...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...so our page contents
object is finished.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
Recap:
obj 4 is a stream object with a set length,
defining the page’s contents:
declare text, set a font and size,
move cursor, display text.
The whole document is defined.
We need to polish the structure.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Our PDF defines 4 objects,
starting at index 1...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...but PDFs always have an
object 0, that is null...
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
xref
%xref table here
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...so 5 objects, starting at 0.
Warning: offsets & EOLs
We have to define offsets,
which are affected by the EOL convention:
1 character under Linux/Mac, 2 under Windows.
(I use 1-character newlines here)
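To see how the EOL convention shifts offsets, the Python sketch below finds where each object starts in the same data written with both styles. The helper name `object_offsets` is hypothetical, and the sample is only the beginning of our file.

```python
import re

# "N G obj" at the start of a line.
OBJECT_START = re.compile(rb"^(\d+) \d+ obj\b", re.MULTILINE)

def object_offsets(data: bytes) -> dict[int, int]:
    """Map each object number to the byte offset where its declaration starts."""
    return {int(m.group(1)): m.start() for m in OBJECT_START.finditer(data)}

lf = b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n2 0 obj\n"
crlf = lf.replace(b"\n", b"\r\n")
print(object_offsets(lf))
print(object_offsets(crlf))
```

Every newline before an object adds one byte to its offset in the 2-character version, so offsets computed for one style are wrong for the other.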
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
xref
_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Let’s edit the XREF table!
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
xref
0_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
The next line defines the
starting index...
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
xref
0 5_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...and the number of objects.
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
xref
0 5
_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Then, one line per object...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...following the
xxxxxxxxxx yyyyy a format
(10 digits, 5 digits, 1 letter).
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
The first parameter is the offset
(in decimal) of the object...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...(for the null object, it’s 0).
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 _
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Then, the generation number
(that is almost always 0)...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...but for object 0, it’s 65535.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Then, a letter, to tell if this entry
is free (f) or in use (n).
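The three fields described so far can be checked with a short parser. This is only a sketch in Python; the regular expression and the function name `parse_xref_entry` are illustrative assumptions, not something the slides define.

```python
import re

# 10-digit offset, space, 5-digit generation, space, 'f' (free) or 'n' (in use).
XREF_ENTRY = re.compile(r"^(\d{10}) (\d{5}) ([fn])\s*$")

def parse_xref_entry(line: str):
    """Return (offset, generation, in_use) for one xref entry, or raise ValueError."""
    match = XREF_ENTRY.match(line)
    if match is None:
        raise ValueError(f"not an xref entry: {line!r}")
    offset, generation, kind = match.groups()
    return int(offset), int(generation), kind == "n"

print(parse_xref_entry("0000000000 65535 f "))  # object 0: free
print(parse_xref_entry("0000000010 00000 n "))  # a regular, in-use object
```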
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Lastly, each line should take 20
bytes, including EOL...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f _
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...so add a trailing space.
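The 20-byte rule is easy to get wrong by hand, so here is a hedged Python sketch that builds one entry and checks its length. The function name `xref_entry` and its parameters are assumptions made for illustration.

```python
def xref_entry(offset: int, generation: int = 0, in_use: bool = True,
               eol: bytes = b" \n") -> bytes:
    """Build one xref entry; `eol` must be 2 bytes (space + LF here, or CR + LF)."""
    kind = b"n" if in_use else b"f"
    entry = b"%010d %05d %s" % (offset, generation, kind) + eol
    if len(entry) != 20:
        raise ValueError("an xref entry must be exactly 20 bytes")
    return entry

free_entry = xref_entry(0, 65535, in_use=False)
first_object = xref_entry(10)
print(free_entry, first_object)
```

With 1-byte newlines the trailing space supplies the 20th byte; with CR+LF newlines the two EOL bytes already fill the line.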
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Next line (the first real object)...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
…object offset, in decimal...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
…generation number...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
…and declare the object index
in use (n)...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n _
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
…and the trailing space
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
_
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
Do the same with the other
objects...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
00000 n
00000 n
00000 n _
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
…knowing that all lines
will end with “ 00000 n ”,...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n _
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
...set all offsets.
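Counting bytes by hand is error-prone, so the offsets can also be found by scanning the file for each `N G obj` line. The Python sketch below is a naive illustration (a hypothetical helper; it would be fooled by stream data that happens to contain such a line). The sample assumes one blank line before each object, which reproduces the offsets 10 and 60 used in the slides.

```python
import re

def object_offsets(pdf: bytes) -> dict:
    """Map object number -> byte offset of its 'N G obj' line (generation ignored)."""
    pattern = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)
    return {int(m.group(1)): m.start() for m in pattern.finditer(pdf)}

# Assumed sample: a blank line separates the header and each object.
sample = (b"%PDF-1.3\n\n"
          b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n\n"
          b"2 0 obj\n<< >>\nendobj\n")
offsets = object_offsets(sample)
table = b"".join(b"%010d 00000 n \n" % offsets[n] for n in sorted(offsets))
print(table.decode("ascii"))
```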
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R >>
startxref
%xref pointer
%%EOF
The cross-reference table
is finished.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R >>
startxref
_
%%EOF
We set the startxref
pointer...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R >>
startxref
364_
%%EOF
...as xref’s offset, in decimal
(no leading 0s).
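As a sketch of this step (Python, with a hypothetical helper name), the pointer is simply the byte offset of the `xref` keyword, written as a plain decimal number:

```python
def startxref_section(pdf_so_far: bytes) -> bytes:
    """Return the 'startxref' tail pointing at the last 'xref' keyword on its own line."""
    position = pdf_so_far.rfind(b"\nxref\n") + 1  # +1 skips the preceding newline
    if position == 0:
        raise ValueError("no xref keyword found")
    return b"startxref\n%d\n%%%%EOF\n" % position

# Tiny assumed sample, not the full Hello World file.
body = b"%PDF-1.3\n1 0 obj\n<< >>\nendobj\nxref\n0 2\n"
tail = startxref_section(body)
print(tail.decode("ascii"))
```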
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R >>
startxref
364
%%EOF
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R _ >>
startxref
364
%%EOF
We also need to update the
trailer dictionary...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R /Size_ >>
startxref
364
%%EOF
...with the number of
objects...
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R /Size 5_>>
startxref
364
%%EOF
… in the PDF
(including object 0).
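In other words, /Size is the highest object number plus one. A tiny Python sketch, with a hypothetical helper name:

```python
def trailer_dictionary(object_numbers, root: int = 1) -> str:
    """Build the trailer dictionary; /Size counts object 0 as well."""
    size = max(object_numbers) + 1  # highest object number, plus one for object 0
    return f"<< /Root {root} 0 R /Size {size} >>"

trailer = trailer_dictionary([1, 2, 3, 4])
print(trailer)
```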
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R /Size 5 >>
startxref
364
%%EOF
Our PDF is now complete.
%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages
/Kids [ 3 0 R ]
/Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R
/Resources << /Font << /F1 <<
/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>
/Contents 4 0 R
>>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000060 00000 n
0000000120 00000 n
0000000269 00000 n
trailer
<< /Root 1 0 R /Size 5 >>
startxref
364
%%EOF
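For readers who would rather not count bytes by hand, the whole file can be generated with offsets computed on the fly. This is a minimal Python sketch, not the author's tooling: `build_pdf` is a hypothetical helper, and because it writes no blank lines between objects, its offsets will differ from the hand-written 10, 60, 120, 269 and 364 above.

```python
def build_pdf(objects, root=1):
    """Serialize `objects` (body strings; index 0 is object 1) into PDF bytes."""
    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where 'N 0 obj' starts
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("ascii")
    xref_offset = len(out)
    size = len(objects) + 1  # includes object 0
    out += f"xref\n0 {size}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += f"trailer\n<< /Root {root} 0 R /Size {size} >>\n".encode("ascii")
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)

content = "BT\n/F1 100 Tf\n10 400 Td\n(Hello World!) Tj\nET"
objects = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages\n/Kids [ 3 0 R ]\n/Count 1 >>",
    "<< /Type /Page /Parent 2 0 R\n/Resources << /Font << /F1 <<\n"
    "/Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >>\n/Contents 4 0 R\n>>",
    f"<< /Length {len(content)} >>\nstream\n{content}\nendstream",
]
pdf = build_pdf(objects)
print(pdf.decode("ascii"))
```

Writing `pdf` to a file opened in binary mode (for example a hypothetical `hello.pdf`) keeps the newlines at one byte each, so the computed offsets stay valid.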
Disclaimer:
this is a minimal PDF.
Most PDF documents are much bigger,
and contain many more elements.
Our PDF:
528 bytes
4 objects
text only
A standard generated “Hello World”:
15 kilobytes
20 objects
text and binary (embedded fonts…)
No need to type them yourself!
Hint: use “mutool clean”
to fix offsets and lengths.
http://www.mupdf.com/
⇒ mutool version
Slightly different content,
but same rendering.
%PDF-1.3
%%μῦ
1 0 obj
<</Type/Catalog/Pages 2 0 R>>
endobj
2 0 obj
<</Type/Pages/Kids[3 0 R]/Count 1>>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources 5 0 R/Contents 4 0 R>>
endobj
4 0 obj
<</Length 49>>
stream
q
BT
/F1 100 Tf
10 400 Td
(Hello World!) Tj
ET
Q
endstream
endobj
5 0 obj
<</Font<</F1<</Type/Font/Subtype/Type1/BaseFont/Arial>>>>>>
endobj
xref
0 6
0000000000 65535 f
0000000018 00000 n
0000000064 00000 n
0000000116 00000 n
0000000191 00000 n
0000000288 00000 n
trailer
<</Size 6/Root 1 0 R>>
startxref
364
%%EOF
Hint: you can directly extract
the PDF sources.
use “pdftotext -layout” on the slide deck
http://www.foolabs.com/xpdf/home.html
One more thing...
This one is important for self study.
Def: stream filters
streams can be encoded and/or compressed
algorithms can be cascaded
ex: compression, then ASCII encoding
New stream parameter:
/Filter
ex: encode the stream in ASCII
1 0 obj
<< /Length 12 >>
stream
Hello World!
endstream
endobj
1 0 obj
<< /Length 24 /Filter /ASCIIHexDecode>>
stream
48656C6C6F20576F726C6421
endstream
endobj
⇔
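The hex-encoded stream above can be reproduced with Python's standard `binascii` module. This sketch only shows the encoding itself, not how a PDF reader implements the filter.

```python
import binascii

raw = b"Hello World!"
encoded = binascii.hexlify(raw).upper()   # what goes between stream and endstream
decoded = binascii.unhexlify(encoded)     # what an /ASCIIHexDecode filter recovers
print(encoded.decode("ascii"), len(encoded))
```

Each byte becomes two hex digits, which is why /Length goes from 12 to 24.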
Ex: compression
(deflate = ZIP compression)
1 0 obj
<< /Length 12 >>
stream
Hello World!
endstream
endobj
1 0 obj
<< /Length 20 /Filter /FlateDecode>>
stream
x£¾H═╔╔¤/╩IQ♦ ∟I♦>
endstream
endobj
⇔
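A comparable sketch for deflate, using Python's standard `zlib` module (an assumption about tooling; any zlib-compatible compressor would do):

```python
import zlib

raw = b"Hello World!"
compressed = zlib.compress(raw)          # binary bytes for the stream body
restored = zlib.decompress(compressed)   # what /FlateDecode recovers
print(len(raw), len(compressed), restored)
```

On an input this short the compressed form is longer than the original, since the zlib wrapper adds a 2-byte header and a 4-byte checksum; compression pays off on larger streams.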
Filters can be cascaded.
Ex: compressed, then encoded in ASCII
1 0 obj
<< /Length 12 >>
stream
Hello World!
endstream
endobj
1 0 obj
<< /Length 40 /Filter [/ASCIIHexDecode /FlateDecode] >>
stream
789CF348CDC9C95708CF2FCA495104001C49043E
endstream
endobj
⇔
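When decoding, the filters are applied in the order they appear in the /Filter array: first undo the hex encoding, then inflate. A minimal Python sketch, where the `DECODERS` table and `decode_stream` are hypothetical names covering only the two filters shown here:

```python
import binascii
import zlib

# Decoders keyed by filter name; only the two filters used in these slides.
DECODERS = {
    "/ASCIIHexDecode": binascii.unhexlify,
    "/FlateDecode": zlib.decompress,
}

def decode_stream(data: bytes, filters) -> bytes:
    """Apply each filter in the order listed in the /Filter array."""
    for name in filters:
        data = DECODERS[name](data)
    return data

stream = b"789CF348CDC9C95708CF2FCA495104001C49043E"
plain = decode_stream(stream, ["/ASCIIHexDecode", "/FlateDecode"])
print(plain)
```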
Hint: “mutool clean -d”
to remove any stream filter.
(if you want to explore PDFs by yourself)
http://www.mupdf.com/
Want more?
pdf101.corkami.com
Questions?
(you can download this poster at http://pics.corkami.com)
ACK
@Doegox @ChrisJohnRiley
@PDFKungFoo
To be
continued...?
https://leanpub.com/binaryisbeautiful
Let’s write
a PDF file
corkami.com
@angealbertini
Hail to the king, baby!
r2

Let's write a PDF file

  • 1. Let’s write a PDF file A simple walk-through to learn the basics of the PDF format (at your rhythm) PDF = Portable Document Format r2
  • 2. Ange Albertini reverse engineering & visual documentation @angealbertini [email protected] http://www.corkami.com
  • 3. Goal: write a “Hello World” in PDF
  • 4. PDF is text-based, with some binary in specific cases. But not in this example, so just open a text editor.
  • 5. Statements are separated by white space. (any extra white space is ignored) Any of these: 0x00 Null 0x0C Form Feed 0x09 Tab 0x0D Carriage Return 0x0A Line feed 0x20 Space (yes, you can mix EOL style :( )
  • 6. Delimiters don’t require white space before. ( ) < > [ ] { } /
  • 8. %PDF-_ A PDF starts with a %PDF-? signature followed by a version number. 1.0 <= version number <= 1.7 (it doesn’t really matter here)
  • 9. %PDF-1.3 _ Ok, we have a valid signature ☺
  • 10. %PDF-1.3 %_ A comment starts with % until the end of the line.
  • 11. %PDF-1.3 %file body _ After the signature, comes the file body. (we’ll see about it later)
  • 12. %PDF-1.3 %file body xref _ After the file body, comes the cross reference table. It starts with the xref keyword, on a separated line.
  • 13. %PDF-1.3 %file body xref %xref table here _ After the xref keyword, comes the actual table. (we’ll see about it later)
  • 14. %PDF-1.3 %file body xref %xref table here trailer_ After the table, comes the trailer... It starts with a trailer keyword.
  • 15. %PDF-1.3 %file body xref %xref table here trailer %trailer contents _ (we’ll see that later too…) ...and its contents.
  • 16. %PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref _ (with startxref) Then, a pointer to the xref table...
  • 17. %PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer _ (later, too...)
  • 18. %PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF_ ...an %%EOF marker. Lastly, to mark the end of the file...
  • 19. %PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF Easy ;) That’s the overall layout of a PDF document!
  • 20. %PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF Now, we just need to fill in the rest :)
  • 22. Def: name objects A.k.a. “strings starting with a slash”
  • 23. /Name A slash, then an alphanumeric string (no whitespace)
  • 24. Case sensitive /Name != /name Names with incorrect case are just ignored (no error is triggered)
  • 25. Def: dictionary object Sequence of keys and values (no delimiter in between) enclosed in << and >> sets each key to value
  • 26. Syntax << key value key value [key value]*… >>
  • 27. Keys are always name objects << /Index 1>> sets /Index to 1 << Index 1 >> is invalid (the key is not a name)
  • 28. Dictionaries can have any length << /Index 1 /Count /Whatever >> sets /Index to 1 and /Count to /Whatever
  • 29. Extra white space is ignored(as usual) << /Index 1 /Count /Whatever >> is equivalent to << /Index 1 /Count /Whatever >>
  • 30. Dictionaries can be nested. << /MyDict << >> >> sets /MyDict to << >> (empty dictionary)
  • 31. White space before delimiters is not required. << /Index 1 /MyDict << >> >> equivalent to <</Index 1/MyDict<<>>>>
  • 32. Def: indirect object an object number (>0), a generation number (0*) the obj keyword the object content the endobj keyword * 99% of the time
  • 33. Example 1 0 obj 3 endobj is object #1, generation 0, containing “3”
  • 34. Def: object reference object number, object generation, R number number R ex: 1 0 R
  • 35. Object reference Refers to an indirect object as a value ex: << /Root 1 0 R >> refers to object number 1 generation 0 as the /Root
  • 36. Used only as values in a dictionary << /Root 1 0 R >> is OK. << 1 0 R /Catalog>> isn’t.
  • 37. Be careful with the syntax! “1 0 3” is a sequence of 3 numbers 1 0 3 “1 0 R” is a single reference to an object number 1 generation 0
  • 38. Def: file body sequence of indirect objects object order doesn’t matter
  • 39. Example 1 0 obj 3 endobj 2 0 obj << /Index 1 >> endobj defines 2 objects with different contents
  • 40. %PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF Remember this?
  • 41. A PDF document is defined by a tree of objects.
  • 42. %PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF Now, let’s start!
  • 43. %PDF-1.3 %file body xref %xref table here trailer << _ >> startxref %xref pointer %%EOF The trailer is a dictionary.
  • 44. %PDF-1.3 %file body xref %xref table here trailer << /Root_ >> startxref %xref pointer %%EOF It defines a /Root name...
  • 45. %PDF-1.3 %file body xref %xref table here trailer << /Root 1 0 R_>> startxref %xref pointer %%EOF ...that refers to an object...
  • 46. %PDF-1.3 %file body xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF (like all the the other objects) ...that will be in the file body.
  • 47. Recap: the trailer is a dictionary that refers to a root object.
  • 48. %PDF-1.3 _ xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Let’s create our first object...
  • 49. %PDF-1.3 1 0 obj _ endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …(with the standard object declaration)...
  • 50. %PDF-1.3 1 0 obj << _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF (like most objects) ...that contains a dictionary.
  • 51. %PDF-1.3 1 0 obj << /Type_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and its /Type is...
  • 52. %PDF-1.3 1 0 obj << /Type /Catalog_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...defined as /Catalog...
  • 53. %PDF-1.3 1 0 obj << /Type /Catalog _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF the /Root object also refers to the page tree...
  • 54. %PDF-1.3 1 0 obj << /Type /Catalog /Pages_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...via a /Pages name...
  • 55. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R_>> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...that refers to another object...
  • 56. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj _ xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...which we’ll create.
  • 57. Recap: object 1 is a catalog, and refers to a Pages object.
  • 58. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj _ xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Let’s create object 2.
  • 59. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj _ endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The usual declaration.
  • 60. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF It’s a dictionary too.
  • 61. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The pages object’s /Type has to be defined as … /Pages ☺
  • 62. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF This object defines its children via /Kids...
  • 63. Def: array. Enclosed in [ ], with values separated by white space. Ex: [1 2 3 4] is an array of 4 integers: 1, 2, 3, 4.
  • 64. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ _ ] >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...which is an array...
  • 65. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R_] >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … of references to each page object.
  • 66. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF One last step...
  • 67. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1_>> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...the number of kids has to be set in /Count...
  • 68. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and now object 2 is complete!
  • 69. Recap: object 2 is /Pages; it defines Kids + Count (pages of the document).
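As a rough check of the Kids/Count relationship, here is a small Python sketch. It assumes a flat page tree like this one, where every kid is a page, so /Count equals the number of entries in /Kids. The helper name `kids_and_count` is made up for illustration.

```python
import re

def kids_and_count(pages_dict: str) -> tuple[list[tuple[int, int]], int]:
    """Extract the /Kids references and the /Count value from a /Pages dictionary."""
    kids_match = re.search(r"/Kids\s*\[(.*?)\]", pages_dict, re.S)
    count_match = re.search(r"/Count\s+(\d+)", pages_dict)
    if kids_match is None or count_match is None:
        raise ValueError("missing /Kids or /Count")
    # Each kid is an indirect reference such as "3 0 R".
    refs = re.findall(r"(\d+)\s+(\d+)\s+R", kids_match.group(1))
    return [(int(num), int(gen)) for num, gen in refs], int(count_match.group(1))

kids, count = kids_and_count("<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>")
print(kids, count)  # [(3, 0)] 1
```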
  • 70. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj _ We can add our only Kid...
  • 71. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj _ endobj …(a single page)...
  • 72. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << _ >> endobj … a dictionary...
  • 73. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type_ >> endobj … defining a /Type...
  • 74. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page_ >> endobj … as /Page.
  • 75. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent_ >> endobj This grateful kid properly recognizes its own parent...
  • 76. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R_>> endobj … as you would expect ☺
  • 77. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R _ >> endobj Our page requires resources.
  • 78. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources_ >> endobj Let’s add them...
  • 79. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << _ >> >> endobj ...as a dictionary:
  • 80. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font_ >> >> endobj In this case, fonts...
  • 81. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << _ >> >> >> endobj ...as a dictionary.
  • 82. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << _ >> >> >> endobj We define one font...
  • 83. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1_ >> >> >> endobj ...by giving it a name...
  • 84. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << _ >> >> >> >> endobj ...and setting its parameters:
  • 85. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type_ >> >> >> >> endobj its type is ...
  • 86. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font_ >> >> >> >> endobj … font ☺
  • 87. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype_ >> >> >> >> endobj Its font type is...
  • 88. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1_ >> >> >> >> endobj …(Adobe) Type1...
  • 89. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont_>> >> >> >> endobj ...and its name is...
  • 90. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial_>> >> >> >> endobj .../Arial.
  • 91. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> _ >> endobj One thing is missing in our page...
  • 92. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents_ >> endobj The actual page contents...
  • 93. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R_ >> endobj … as a reference to another object.
  • 94. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj That’s all for our page object.
  • 95. Recap: object 3 defines a /Page, its /Parent, its /Resources (fonts), and its /Contents, which are in another object. (Thank you, Mario!)
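The tree defined so far can be modelled with plain Python dictionaries. This is a sketch for intuition only: the `Ref` type and the `resolve` helper are hypothetical names, and names such as /Type are stored as plain strings without the leading slash.

```python
from typing import NamedTuple

class Ref(NamedTuple):
    """An indirect reference such as "2 0 R" (object number, generation)."""
    num: int
    gen: int

trailer = {"Root": Ref(1, 0)}
objects = {
    1: {"Type": "Catalog", "Pages": Ref(2, 0)},
    2: {"Type": "Pages", "Kids": [Ref(3, 0)], "Count": 1},
    3: {
        "Type": "Page",
        "Parent": Ref(2, 0),
        "Resources": {
            "Font": {"F1": {"Type": "Font", "Subtype": "Type1", "BaseFont": "Arial"}}
        },
        "Contents": Ref(4, 0),
    },
}

def resolve(ref: Ref) -> dict:
    """Follow a reference to the object it points to."""
    return objects[ref.num]

# Walk the tree: trailer -> catalog -> pages -> first page.
catalog = resolve(trailer["Root"])
pages = resolve(catalog["Pages"])
page = resolve(pages["Kids"][0])
print(page["Type"], page["Contents"])
```

Object 4 is deliberately absent from `objects`: at this point in the walk-through it is referenced but not yet written.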
  • 97. Def: stream objects. So far, everything is text. How do you store binary data (images, ...)?
  • 98. Stream objects are objects. They start and end like any other object. Ex: 1 0 obj … endobj
  • 99. Stream objects contain a stream, between the stream and endstream keywords: 1 0 obj stream <stream content> endstream endobj
  • 100. Streams can contain anything. Yes, really! Even binary, other file formats... (except the endstream keyword)
  • 101. Stream parameters are stored before the stream, in a dictionary after obj and before stream. Required: the stream length. Optional: compression algorithm, etc.
  • 102. Example: 1 0 obj << /Length 10 >> stream 0123456789 endstream endobj
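Computing /Length by hand is error-prone, so here is a minimal Python sketch that builds a stream object and fills in the length from the data. The function name `make_stream_object` is an assumption for illustration, and the layout (one keyword per line, single-byte line feeds) is one of several valid choices.

```python
def make_stream_object(number: int, data: bytes) -> bytes:
    """Wrap raw bytes in a stream object, with /Length set to len(data)."""
    header = b"%d 0 obj\n<< /Length %d >>\nstream\n" % (number, len(data))
    # The line feed placed before "endstream" is not counted in /Length.
    return header + data + b"\nendstream\nendobj\n"

example = make_stream_object(1, b"0123456789")
print(example.decode("ascii"))
```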
  • 103. _ %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 104. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj _ endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF We create a /Contents object...
  • 105. 4 0 obj stream _ endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj ...that is a stream object... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 107. Page contents syntax: a sequence of parameters, then the operator. Ex: param1 param2 operator
  • 108. 4 0 obj stream _ endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj Text objects are delimited by BT and ET... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 109. 4 0 obj stream BT _ ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj ...(BeginText & EndText). xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 110. 4 0 obj stream BT Tf_ ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj We need to set a font, with Tf. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 111. 4 0 obj stream BT _ Tf ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj It takes 2 parameters: a font name... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 112. 4 0 obj stream BT /F1_ Tf ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj ...(from the page’s resources)... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 113. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100_Tf ET endstream endobj ...and a font size. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 114. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf _ ET endstream endobj We move the cursor... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 115. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf Td_ ET endstream endobj ...with the Td operator... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 116. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf _ Td ET endstream endobj ...that takes 2 parameters... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
  • 117. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400_Td ET endstream endobj ...x and y coordinates. (default page size: 612x792) xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF
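The "parameters first, operator last" rule can be illustrated with a naive Python sketch that groups each operator with the parameters before it. It assumes tokens are separated by white space and that parameters are only names (starting with /) or numbers, which holds for the stream written so far but not for content streams in general. The function name `group_operations` is hypothetical.

```python
def group_operations(content: str) -> list[tuple[list[str], str]]:
    """Group each operator with the parameters that precede it."""
    operations = []
    operands: list[str] = []
    for token in content.split():
        is_number = token.lstrip("+-").replace(".", "", 1).isdigit()
        if token.startswith("/") or is_number:
            operands.append(token)  # a parameter: keep it for the next operator
        else:
            operations.append((operands, token))  # an operator consumes the operands
            operands = []
    return operations

ops = group_operations("BT /F1 100 Tf 10 400 Td ET")
for operands, operator in ops:
    print(operator, operands)
```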
  • 119. Def: literal strings. Enclosed in parentheses. Ex: (Hi Mum)
  • 121. Can contain white space: ( Hello World ! )
  • 123. Escaping is in octal: (Hell\157 World)
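In a PDF literal string, a backslash followed by up to three octal digits stands for one character code. The following Python sketch decodes only that kind of escape (other escapes such as \n or \( are ignored here), and the function name `decode_octal_escapes` is made up for this example.

```python
import re

def decode_octal_escapes(literal: str) -> str:
    """Replace each backslash + 1 to 3 octal digits with the character it encodes."""
    return re.sub(
        r"\\([0-7]{1,3})",
        lambda match: chr(int(match.group(1), 8)),
        literal,
    )

decoded = decode_octal_escapes(r"Hell\157 World")
print(decoded)  # Hello World
```

Octal 157 is decimal 111, the character code of a lowercase "o".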
  • 124. 4 0 obj stream BT /F1 100 Tf 10 400 Td _ ET endstream endobj Showing a text string... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 125. 4 0 obj stream BT /F1 100 Tf 10 400 Td Tj_ ET endstream endobj ...is done with the Tj operator... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 126. 4 0 obj stream BT /F1 100 Tf 10 400 Td _ Tj ET endstream endobj ...that takes a single parameter... xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 127. 4 0 obj stream BT /F1 100 Tf 10 400 Td (_) Tj ET endstream endobj ...a literal string. xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 128. 4 0 obj stream BT /F1 100 Tf 10 400 Td (Hello World_) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 129. 4 0 obj stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Our contents stream is complete... %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 130. 4 0 obj stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 131. 4 0 obj _ stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF One last thing... %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 132. 4 0 obj << _ >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...we need to set its parameters... %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 133. 4 0 obj << /Length_ >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … the stream length... %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 134. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44_>> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …including white space (newline characters…).
  • 135. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Our stream parameters are finished... %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 136. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...so our page contents object is finished. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj
  • 137. Recap: obj 4 is a stream object with a set length, defining the page’s contents: declare text, set a font and size, move cursor, display text.
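The stream length can be checked with a few lines of Python. The layout is an assumption, since the transcript flattens line breaks: one operator per line, single-byte line feeds, and the line feed before endstream not counted. Under that assumption the result matches the /Length 44 used in the slides.

```python
# Page contents, one operator per line (assumed layout).
content_lines = [
    "BT",
    "/F1 100 Tf",
    "10 400 Td",
    "(Hello World!) Tj",
    "ET",
]
# Join with 1-byte line feeds; a CRLF file would give a larger value.
stream_data = "\n".join(content_lines).encode("ascii")
stream_length = len(stream_data)
print(stream_length)  # 44
```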
  • 138. The whole document is defined. We need to polish the structure.
  • 139. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Our PDF defines 4 objects, starting at index 1...
  • 140. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...but PDFs always have an object 0, that is null...
  • 141. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...so 5 objects, starting at 0.
  • 142. Warning: offsets & EOLs. We have to define offsets, which are affected by the EOL convention: 1 char under Linux/Mac, 2 under Windows. (I use 1-char newlines here.)
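To see how the EOL convention shifts offsets, this short Python sketch encodes the same two lines with LF and with CRLF and locates the second line in each. The two-line sample is only an illustration, not the full file.

```python
lines = ["%PDF-1.3", "1 0 obj"]

lf_bytes = "\n".join(lines).encode("ascii")      # Linux/Mac style, 1 byte per EOL
crlf_bytes = "\r\n".join(lines).encode("ascii")  # Windows style, 2 bytes per EOL

lf_offset = lf_bytes.index(b"1 0 obj")
crlf_offset = crlf_bytes.index(b"1 0 obj")
print(lf_offset, crlf_offset)  # 9 10
```

Every additional line before an object widens the gap by one more byte, which is why offsets have to be recomputed if the EOL style changes.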
  • 143. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Let’s edit the XREF table!
  • 144. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The next line defines the starting index...
  • 145. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and the number of objects.
  • 146. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Then, one line per object...
  • 147. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...following the xxxxxxxxxx yyyyy a format (10 digits, 5 digits, 1 letter).
  • 148. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The first parameter is the offset (in decimal) of the object...
  • 149. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...(for the null object, it’s 0).
  • 150. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Then, the generation number (that is almost always 0)...
  • 151. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...but for object 0, it’s 65535.
  • 152. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Then, a letter, to tell if this entry is free (f) or in use (n).
  • 153. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Lastly, each line should take 20 bytes, including EOL...
  • 154. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...so add a trailing space.
  • 155. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Next line (the first real object)...
  • 156. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …object offset, in decimal...
  • 157. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …generation number...
  • 158. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …and mark the entry as in use (n)...
  • 159. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …and the trailing space
  • 160. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Do the same with the other objects...
  • 161. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 00000 n 00000 n 00000 n _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …knowing that all lines will end with “ 00000 n ”,...
  • 162. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...set all offsets.
  • 163. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The cross-reference table is finished.
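The entry layout described in the slides above (10-digit offset, 5-digit generation number, an `n` or `f` flag, and a trailing space so that each line takes 20 bytes including the EOL) can be sketched in Python. The helper names `xref_entry` and `xref_table` are made up for this illustration, and a single line feed is assumed as the end-of-line character.

```python
def xref_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    """Format one cross-reference entry (hypothetical helper)."""
    flag = b"n" if in_use else b"f"
    # 10-digit offset, space, 5-digit generation, space, flag,
    # then the trailing space and a line feed: 20 bytes in total.
    entry = b"%010d %05d %s \n" % (offset, generation, flag)
    assert len(entry) == 20
    return entry


def xref_table(offsets: list[int]) -> bytes:
    """Build the table for objects 1..N, plus the free entry for object 0."""
    parts = [b"xref\n", b"0 %d\n" % (len(offsets) + 1)]
    parts.append(xref_entry(0, 65535, in_use=False))
    parts.extend(xref_entry(offset) for offset in offsets)
    return b"".join(parts)


# Offsets taken from the slide; they depend on the exact file layout.
table = xref_table([10, 60, 120, 269])
print(table.decode("ascii"))
```

The length assertion makes a wrong field width fail loudly instead of silently producing a table that readers may reject.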
  • 166. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref _ %%EOF We set the startxref pointer...
  • 167. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref 364_ %%EOF ...as xref’s offset, in decimal (no leading zeros).
  • 168. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref 364 %%EOF
  • 169. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R _ >> startxref 364 %%EOF We also need to update the trailer dictionary...
  • 170. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R /Size_ >> startxref 364 %%EOF ...with the number of objects...
  • 171. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R /Size 5_>> startxref 364 %%EOF … in the PDF (including object 0).
  • 172. 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R /Size 5 >> startxref 364 %%EOF Our PDF is now complete.
  • 173. %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R /Size 5 >> startxref 364 %%EOF
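Rather than counting bytes by hand, the offsets, the `startxref` pointer and `/Size` can be computed while the file is assembled. The following is a minimal sketch, not the author's tooling: `build_pdf` is a hypothetical helper, and the line breaks it emits are an assumption, so the resulting offsets need not match the 10/60/120/269/364 values on the slides.

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Assemble objects 1..N, then the xref table, trailer and pointer."""
    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)  # where the xref keyword starts
    size = len(objects) + 1  # number of objects, including object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


stream = b"BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET"
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font"
    b" /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >>",
    b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
]
pdf = build_pdf(objects)

with open("hello.pdf", "wb") as handle:  # output file name is arbitrary
    handle.write(pdf)
```

Writing in binary mode matters here: any end-of-line translation would shift every offset after the first line.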
  • 174. Disclaimer: this is a minimal PDF. Most PDF documents are much bigger, and contain many more elements. Our PDF: 528 bytes, 4 objects, text only. A standard generated “Hello World”: 15 kilobytes, 20 objects, text and binary (embedded fonts…).
  • 175. No need to type them yourself! Hint: use “mutool clean” to fix offsets and lengths. http://www.mupdf.com/
  • 176. ⇒ mutool version Slightly different content, but same rendering. %PDF-1.3 %%μῦ 1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj 2 0 obj <</Type/Pages/Kids[3 0 R]/Count 1>> endobj 3 0 obj <</Type/Page/Parent 2 0 R/Resources 5 0 R/Contents 4 0 R>> endobj 4 0 obj <</Length 49>> stream q BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET Q endstream endobj 5 0 obj <</Font<</F1<</Type/Font/Subtype/Type1/BaseFont/Arial>>>>>> endobj xref 0 6 0000000000 65536 f 0000000018 00000 n 0000000064 00000 n 0000000116 00000 n 0000000191 00000 n 0000000288 00000 n trailer <</Size 6/Root 1 0 R>> startxref 364 %%EOF
  • 177. Hint: you can directly extract the PDF sources. Use “pdftotext -layout” on the slide deck. http://www.foolabs.com/xpdf/home.html
  • 178. One more thing... This one is important for self study.
  • 179. Def: stream filters. Streams can be encoded and/or compressed. Algorithms can be cascaded (ex: compression, then ASCII encoding).
  • 180. New stream parameter: /Filter ex: encode the stream in ASCII 1 0 obj << /Length 12 >> stream Hello World! endstream endobj 1 0 obj << /Length 24 /Filter /ASCIIHexDecode>> stream 48656C6C6F20576F726C6421 endstream endobj ⇔
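The hexadecimal form of the stream on the slide above can be reproduced with Python's standard `binascii` module. This is only a sketch of the encoding step; the variable names are illustrative.

```python
import binascii

raw = b"Hello World!"

# Each byte becomes two hexadecimal digits, so the data doubles in size.
encoded = binascii.hexlify(raw).upper()
print(encoded.decode("ascii"))  # the stream data to store
print(len(encoded))             # the value to put in /Length

# A reader applying /ASCIIHexDecode gets the original bytes back.
assert binascii.unhexlify(encoded) == raw
```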
  • 181. Ex: compression (deflate = ZIP compression) 1 0 obj << /Length 12 >> stream Hello World! endstream endobj 1 0 obj << /Length 20 /Filter /FlateDecode>> stream x£¾H═╔╔¤/╩IQ♦ ∟I♦> endstream endobj ⇔
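A `/FlateDecode` stream holds zlib-wrapped deflate data, which Python's standard `zlib` module can produce. The sketch below assumes default compression settings; the exact compressed bytes may differ between zlib builds, so only the round trip is checked.

```python
import zlib

raw = b"Hello World!"

compressed = zlib.compress(raw)
# For such a short input the "compressed" form is longer than the original.
print(len(raw), len(compressed))

# The stream dictionary must declare the compressed length, not the raw one.
header = b"<< /Length %d /Filter /FlateDecode >>" % len(compressed)
print(header.decode("ascii"))

assert zlib.decompress(compressed) == raw
```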
  • 182. Filters can be cascaded. Ex: compressed, then encoded in ASCII 1 0 obj << /Length 12 >> stream Hello World! endstream endobj 1 0 obj << /Length 40 /Filter [/ASCIIHexDecode /FlateDecode] >> stream 789CF348CDC9C95708CF2FCA495104001C49043E endstream endobj ⇔
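When decoding, the filters are applied in the order they appear in the `/Filter` array: here the hexadecimal layer is undone first, then the data is inflated. A minimal sketch, assuming a hypothetical `decode_stream` helper and a simplified ASCIIHex decoder that only handles whitespace and an optional trailing `>`:

```python
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    # Simplified: bytes.fromhex skips spaces; a trailing ">" is dropped.
    return bytes.fromhex(data.decode("ascii").strip().rstrip(">"))


# Only the two filters used on the slides are covered here.
DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": zlib.decompress,
}


def decode_stream(data: bytes, filters: list[str]) -> bytes:
    """Undo each filter in turn, in the order given by the /Filter array."""
    for name in filters:
        data = DECODERS[name](data)
    return data


stream_data = b"789CF348CDC9C95708CF2FCA495104001C49043E"
decoded = decode_stream(stream_data, ["ASCIIHexDecode", "FlateDecode"])
print(decoded)
```

Reversing the list would fail, since the hexadecimal text is not valid zlib data.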
  • 183. Hint: “mutool clean -d” to remove any stream filter. (if you want to explore PDFs by yourself) http://www.mupdf.com/
  • 185. Questions? (you can download this poster at http://pics.corkami.com)
  • 188. Let’s write a PDF file corkami.com @angealbertini Hail to the king, baby! r2