8-May-14
XML
eXtensible Markup Language
HTML and XML, I
XML stands for eXtensible Markup Language
HTML is used to mark up text so it can be displayed to users
XML is used to mark up data so it can be processed by computers
HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>)
XML describes only content, or “meaning”
HTML uses a fixed, unchangeable set of tags
In XML, you make up your own tags
HTML and XML, II
 HTML and XML look similar, because they are
both SGML languages (SGML = Standard
Generalized Markup Language)
 Both HTML and XML use elements enclosed in tags
(e.g. <body>This is an element</body>)
 Both use tag attributes (e.g.,
<font face="Verdana" size="+1" color="red">)
 Both use entities (&lt;, &gt;, &amp;, &quot;, &apos;)
 More precisely,
 HTML is defined in SGML
 XML is a (very small) subset of SGML
HTML and XML, III
 HTML is for humans
 HTML describes web pages
 You don’t want to see error messages about web pages you visit
 Browsers ignore and/or correct as many HTML errors as they can,
so HTML is often sloppy
 XML is for computers
 XML describes data
 The rules are strict and errors are not allowed
 In this way, XML is like a programming language
 Current versions of most browsers can display XML
 However, browser support of XML tends to be inconsistent
XML-related technologies
 DTD (Document Type Definition) and XML Schemas are used to
define legal XML tags and their attributes for particular purposes
 CSS (Cascading Style Sheets) describe how to display HTML or
XML in a browser
 XSLT (eXtensible Stylesheet Language Transformations) and
XPath are used to translate from one form of XML to another
 DOM (Document Object Model), SAX (Simple API for XML),
and JAXP (Java API for XML Processing) are all APIs for XML
parsing
Example XML document
<?xml version="1.0"?>
<weatherReport>
<date>7/14/97</date>
<city>North Place</city>
<state>NX</state>
<country>USA</country>
High Temp: <high scale="F">103</high>
Low Temp: <low scale="F">70</low>
Morning: <morning>Partly cloudy, Hazy</morning>
Afternoon: <afternoon>Sunny &amp; hot</afternoon>
Evening: <evening>Clear and Cooler</evening>
</weatherReport>
From: XML: A Primer, by Simon St. Laurent
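Because XML is meant to be processed by programs, it may help to see the weather report read by one. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The weather report from the slide, copied verbatim.
WEATHER_XML = """<?xml version="1.0"?>
<weatherReport>
<date>7/14/97</date>
<city>North Place</city>
<state>NX</state>
<country>USA</country>
High Temp: <high scale="F">103</high>
Low Temp: <low scale="F">70</low>
Morning: <morning>Partly cloudy, Hazy</morning>
Afternoon: <afternoon>Sunny &amp; hot</afternoon>
Evening: <evening>Clear and Cooler</evening>
</weatherReport>"""

root = ET.fromstring(WEATHER_XML)      # root is the <weatherReport> element

city = root.findtext("city")           # text of the first <city> child
high = root.find("high")               # the <high> element itself
high_value = int(high.text)            # element content is always text
high_scale = high.get("scale")         # attribute value
afternoon = root.findtext("afternoon") # the parser turns &amp; back into &

print(root.tag, city, high_value, high_scale, afternoon)
```

Note that the parser returns "Sunny & hot": the entity is only a way of writing the character in the file.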
Overall structure
 An XML document may start with an XML declaration and one or more
processing instructions (PIs), together called directives here:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="ss.css"?>
 Following the directives, there must be exactly one element,
called the root element, containing all the rest of the
XML:
<weatherReport>
...
</weatherReport>
XML building blocks
 Aside from the directives, an XML document is
built from:
 elements: the element named high is <high scale="F">103</high>
 tags, in pairs: <high scale="F"> and </high>
 attributes: scale="F" in <high scale="F">103</high>
 entities: &amp; in <afternoon>Sunny &amp; hot</afternoon>
 character data, which may be:
 parsed (processed as XML)--this is the default
 unparsed (all characters stand for themselves)
Elements and attributes
 Attributes and elements are somewhat interchangeable
 Example using just elements:
<name>
<first>David</first>
<last>Matuszek</last>
</name>
 Example using attributes:
<name first="David" last="Matuszek"></name>
 You will find that elements are easier to use in your programs--
this is a good reason to prefer them
 Attributes often contain metadata, such as unique IDs
 Generally speaking, browsers display only elements (values
enclosed by tags), not tags and attributes
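To make the comparison concrete, here is a small sketch (Python standard library, illustrative variable names) that reads the same first name from both forms of the name example.

```python
import xml.etree.ElementTree as ET

# The two equivalent encodings from the slide.
as_elements = ET.fromstring(
    "<name><first>David</first><last>Matuszek</last></name>"
)
as_attributes = ET.fromstring('<name first="David" last="Matuszek"></name>')

# Child elements are found by tag name and carry their value as text.
first_from_element = as_elements.findtext("first")

# Attributes live in a flat string-to-string mapping on the element.
first_from_attribute = as_attributes.get("first")

print(first_from_element, first_from_attribute, as_attributes.attrib)
```

Child elements can themselves have children and attributes, while an attribute value is always a single string; that is one practical reason to prefer elements for the data itself.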
Well-formed XML
 Every element must have both a start tag and an end tag, e.g.
<name> ... </name>
 But empty elements can be abbreviated: <break />.
 XML tags are case sensitive
 Names beginning with the letters xml, in any combination of
cases, are reserved and should not be used for your own tags
 Elements must be properly nested, e.g. not <b><i>bold and
italic</b></i>
 Every XML document must have one and only one root element
 The values of attributes must be enclosed in single or double
quotes, e.g. <time unit="days">
 Character data cannot contain < or &
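These rules can be checked mechanically, since a conforming parser rejects anything that is not well-formed. The sketch below wraps Python's standard parser in a hypothetical helper, is_well_formed (the name is an assumption made for this example), and tries it on the cases listed above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

checks = {
    "<name>David</name>": is_well_formed("<name>David</name>"),
    "<break />": is_well_formed("<break />"),
    # Improper nesting: </b> arrives while <i> is still open.
    "<b><i>bold and italic</b></i>": is_well_formed("<b><i>bold and italic</b></i>"),
    # Case sensitivity: <Name> is not closed by </name>.
    "<Name>David</name>": is_well_formed("<Name>David</name>"),
    # Two top-level elements, so there is no single root.
    "<a/><b/>": is_well_formed("<a/><b/>"),
    # Unquoted attribute value.
    "<time unit=days>3</time>": is_well_formed("<time unit=days>3</time>"),
    # A raw < inside character data.
    "<p>a < b</p>": is_well_formed("<p>a < b</p>"),
}

for text, ok in checks.items():
    print("well-formed" if ok else "rejected   ", text)
```

Only the first two inputs are accepted; each of the others breaks one of the rules above.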
Entities
 Five special characters must be written as entities:
&amp; for & (almost always necessary)
&lt; for < (almost always necessary)
&gt; for > (not usually necessary)
&quot; for " (necessary inside double quotes)
&apos; for ' (necessary inside single quotes)
 These entities can be used even in places where they
are not absolutely required
 These are the only predefined entities in XML
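When XML is generated by a program, the entities are usually produced by a library rather than typed by hand. A minimal sketch, assuming Python's standard xml.sax.saxutils helpers; the sample string is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "Sunny & hot, high < 110"   # illustrative text containing & and <

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)

# unescape() reverses that substitution.
restored = unescape(escaped)

# An XML parser performs the same reversal when it reads the document.
parsed_text = ET.fromstring("<afternoon>" + escaped + "</afternoon>").text

print(escaped)
print(restored == raw, parsed_text == raw)
```

By default escape() does not touch quotation marks, which is consistent with the notes above: &quot; and &apos; only matter inside attribute values.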
XML declaration
 The XML declaration looks like this:
<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>
 The XML declaration is optional in XML 1.0, so conforming processors
accept documents without it, but it is good practice to include it
 If present, the XML declaration must be first--not even whitespace
may precede it
 Note that the brackets are <? and ?>
 version is required whenever the declaration is present; "1.0" is by far
the most common value (XML 1.1 exists but is rarely used)
 encoding can be "UTF-8" or "UTF-16" (both are encodings of Unicode;
UTF-8 is ASCII-compatible), or something else, or it can be omitted
 standalone="yes" declares that the document does not depend on
declarations in an external DTD
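The placement rule is easy to observe with a parser. The sketch below (Python standard library; the helper name parses is an assumption for this example, and the xml_declaration argument needs Python 3.8 or later) writes a declaration and then shows that it is accepted only at the very start of the document.

```python
import xml.etree.ElementTree as ET

# Serialize an empty root element with an XML declaration in front of it.
root = ET.Element("weatherReport")
serialized = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
print(serialized)

def parses(text):
    """Return True if text is accepted as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

no_declaration = parses("<weatherReport/>")
declaration_first = parses('<?xml version="1.0"?><weatherReport/>')
# A single leading space before the declaration is enough to break it.
declaration_after_space = parses(' <?xml version="1.0"?><weatherReport/>')

print(no_declaration, declaration_first, declaration_after_space)
```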
Processing instructions
 PIs (Processing Instructions) may occur anywhere in the XML
document (but usually first)
 A PI is a command to the program processing the XML
document to handle it in a certain way
 XML documents are typically processed by more than one
program
 Programs that do not recognize a given PI should just ignore it
 General format of a PI: <?target instructions?>
 Example: <?xml-stylesheet type="text/css"
href="mySheet.css"?>
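A program that cares about a PI has to look for it explicitly. As a sketch, Python's standard xml.dom.minidom exposes each PI as a node with a target and a data string; the document here is a one-element stand-in built around the stylesheet example above.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="mySheet.css"?>'
    "<weatherReport/>"
)

# Collect the processing instructions that appear before the root element.
pis = [
    node for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

pi_target = pis[0].target   # the word right after <?
pi_data = pis[0].data       # everything else, as one uninterpreted string

print(len(pis), pi_target, pi_data)
```

The XML declaration is not reported as a PI, and the "attributes" in the PI's data are just text: it is up to the program that recognizes the target to interpret them.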
Comments
 <!-- This is a comment in both HTML and XML -->
 Comments can be put almost anywhere in an XML document, but not inside a tag and not before the XML declaration
 Comments are useful for:
 Explaining the structure of an XML document
 Commenting out parts of the XML during development and testing
 Comments are not elements and do not have an end tag
 The blanks after <!-- and before --> are optional
 The character sequence -- cannot occur in the comment
 The closing bracket must be -->
 Comments are not displayed by browsers, but can be seen by
anyone who looks at the source code
CDATA
 By default, all text inside an XML document is parsed
 You can force text to be treated as unparsed character data by
enclosing it in <![CDATA[ ... ]]>
 Any characters, even & and <, can occur inside a CDATA
 Whitespace inside a CDATA is (usually) preserved
 The only real restriction is that the character sequence ]]> cannot
occur inside a CDATA
 CDATA is useful when your text has a lot of illegal characters
(for example, if your XML document contains some HTML text)
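From a program's point of view, a CDATA section is just another way of writing the same text. A short sketch with Python's standard parser, using a made-up HTML fragment as the content:

```python
import xml.etree.ElementTree as ET

# HTML text stored inside a CDATA section: no escaping needed.
with_cdata = ET.fromstring("<example><![CDATA[<p>Tom & Jerry</p>]]></example>")

# The same text written with entities instead.
with_entities = ET.fromstring(
    "<example>&lt;p&gt;Tom &amp; Jerry&lt;/p&gt;</example>"
)

cdata_text = with_cdata.text
entity_text = with_entities.text

print(cdata_text)
print(cdata_text == entity_text, len(with_cdata))
```

Both elements contain the identical string and neither has a child element: the <p> inside the CDATA section was never parsed as markup.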
Names in XML
 Names (as used for tags and attributes) must begin with
a letter or underscore, and can consist of:
 Letters, both Roman (English) and from other writing systems
 Digits, both European (0-9) and from other writing systems
. (dot)
- (hyphen)
_ (underscore)
: (colon) should be used only for namespaces
 Combining characters and extenders (not used in English)
Namespaces
 Recall that you can make up your own tags, so different XML
vocabularies may use the same tag name for different things
 An XML document may combine tags from more than one vocabulary
 Namespaces are a way to specify which vocabulary a given tag
belongs to
 XML, like Java, uses qualified names
 This helps to avoid collisions between names
 Java: myObject.myVariable
 XML: myPrefix:myTag
 Note that XML uses a colon (:) rather than a dot (.)
Namespaces and URIs
 A namespace is defined as a unique string
 To guarantee uniqueness, typically a URI (Uniform
Resource Identifier) is used, because the author “owns”
the domain
 The URI doesn't have to point to a “real” resource; it just has to be a
unique string
 Example: http://www.matuszek.org/ns
 There are two ways to use namespaces:
 Declare a default namespace
 Associate a prefix with a namespace, then use the prefix
in the XML to refer to the namespace
Namespace syntax
 In any start tag you can use the reserved attribute name xmlns:
<book xmlns="http://www.matuszek.org/ns">
 This namespace will be used as the default for all elements up to the
corresponding end tag
 You can override it with a specific prefix
 You can use almost this same form to declare a prefix:
<book xmlns:dave="http://www.matuszek.org/ns">
 Use this prefix on every tag and attribute you want to use from this
namespace, including end tags--it is not a default prefix
<dave:chapter dave:number="1">To Begin</dave:chapter>
 You can use the prefix in the start tag in which it is defined:
<dave:book xmlns:dave="http://www.matuszek.org/ns">
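It can help to see what a namespace-aware parser does with these declarations. In the sketch below (Python standard library), the prefix disappears and every name is reported together with its namespace string; the document is assembled from the slide's fragments, and the constant NS holds the slide's example namespace http://www.matuszek.org/ns.

```python
import xml.etree.ElementTree as ET

NS = "http://www.matuszek.org/ns"   # the example namespace string

prefixed = ET.fromstring(
    f'<dave:book xmlns:dave="{NS}">'
    '<dave:chapter dave:number="1">To Begin</dave:chapter>'
    "</dave:book>"
)

# ElementTree reports qualified names as {namespace}localname.
book_tag = prefixed.tag

# Searches take a prefix-to-namespace mapping; the prefix used here
# only has to match the mapping, not the one used in the document.
chapter = prefixed.find("dave:chapter", {"dave": NS})
chapter_number = chapter.get(f"{{{NS}}}number")

# With a default namespace, the unprefixed element gets the same name.
defaulted = ET.fromstring(f'<book xmlns="{NS}"><chapter>To Begin</chapter></book>')

print(book_tag)
print(chapter.text, chapter_number, defaulted.tag == book_tag)
```

The prefix dave is only shorthand inside the file; what identifies the element is the pair of namespace string and local name.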
Review of XML rules
 Start with <?xml version="1.0"?>
 XML is case sensitive
 You must have exactly one root element that encloses
all the rest of the XML
 Every element must have a closing tag
 Elements must be properly nested
 Attribute values must be enclosed in double or single
quotation marks
 There are only five predeclared entities
Another well-formed example
<novel>
<foreword>
<paragraph>This is the great American novel.
</paragraph>
</foreword>
<chapter number="1">
<paragraph>It was a dark and stormy night.
</paragraph>
<paragraph>Suddenly, a shot rang out!
</paragraph>
</chapter>
</novel>
XML as a tree
 An XML document represents a hierarchy; a hierarchy is a tree
novel
    foreword
        paragraph: This is the great American novel.
    chapter (number="1")
        paragraph: It was a dark and stormy night.
        paragraph: Suddenly, a shot rang out!
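The tree view is also how most programs work with XML. As a sketch, the recursive function below (outline is a name chosen for this example) walks the parsed novel and returns one indented line per element, using Python's standard parser.

```python
import xml.etree.ElementTree as ET

NOVEL_XML = """<novel>
<foreword>
<paragraph>This is the great American novel.
</paragraph>
</foreword>
<chapter number="1">
<paragraph>It was a dark and stormy night.
</paragraph>
<paragraph>Suddenly, a shot rang out!
</paragraph>
</chapter>
</novel>"""

def outline(element, depth=0):
    """Return one line per element, indented two spaces per tree level."""
    attrs = "".join(f' {name}="{value}"' for name, value in element.attrib.items())
    lines = ["  " * depth + element.tag + attrs]
    for child in element:            # iterating an element yields its children
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(NOVEL_XML))
print("\n".join(tree_lines))
```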
Valid XML
 You can make up your own XML tags and attributes, but...
 ...any program that uses the XML must know what to expect!
 A DTD (Document Type Definition) defines what tags are legal
and where they can occur in the XML
 An XML document does not require a DTD
 XML is well-formed if it follows the rules given earlier
 In addition, XML is valid if it declares a DTD and conforms to
that DTD
 A DTD can be included in the XML, but is typically a separate
document
 A well-formedness error is fatal: a conforming XML parser must stop normal processing when it finds one
 Some alternatives to DTDs are XML Schemas and RELAX NG
Mixed content
 An element may contain other elements, plain text, or both
 An element containing only text:
<name>David Matuszek</name>
 An element (<name>) containing only elements:
<name><first>David</first><last>Matuszek</last></name>
 An element containing both:
<class>CIT597 <time>10:30-12:00 MW</time></class>
 An element that contains both text and other elements is said to
have mixed content
 Mixed content is legal, but bad
 Mixed content makes it much harder to define valid XML
 Mixed content is more complicated to use in a program
 Mixed content adds no power to XML--it is never needed for anything
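The claim that mixed content is more complicated to use in a program can be illustrated with Python's standard ElementTree, which splits loose text into an element's text (before its first child) and each child's tail (after that child). This is a sketch; the trailing "Room 101" in the second document is a hypothetical addition, not from the slides.

```python
import xml.etree.ElementTree as ET

# The mixed-content example from the slide.
mixed = ET.fromstring("<class>CIT597 <time>10:30-12:00 MW</time></class>")
leading_text = mixed.text               # text before the first child
time_text = mixed.find("time").text     # text inside <time>

# Hypothetical variant with text after the child element as well.
mixed_tail = ET.fromstring(
    "<class>CIT597 <time>10:30-12:00 MW</time> Room 101</class>"
)
trailing_text = mixed_tail.find("time").tail   # stored on the child, not the parent

# Reassembling the full text means visiting every piece in document order.
full_text = "".join(mixed_tail.itertext())

print(repr(leading_text), repr(time_text), repr(trailing_text))
print(full_text)
```

With element-only content, each value sits in exactly one place; with mixed content, the program has to know about text and tail and stitch the pieces together.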
Example XML document, revised
<?xml version="1.0"?>
<weatherReport>
<date>7/14/97</date>
<place><city>North Place</city>
<state>NX</state>
<country>USA</country>
</place>
<temperatures><high scale="F">103</high>
<low scale="F">70</low>
</temperatures>
<forecast><time>Morning</time>
<predict>Partly cloudy, Hazy</predict>
</forecast>
<forecast><time>Afternoon</time>
<predict>Sunny &amp; hot</predict>
</forecast>
<forecast><time>Evening</time>
<predict>Clear and Cooler</predict>
</forecast>
</weatherReport>
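One benefit of the revised structure is that a program can treat all forecasts uniformly. The sketch below uses Python's standard ElementTree on a copy of the revised document in which every forecast element is closed; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

REVISED_XML = """<?xml version="1.0"?>
<weatherReport>
<date>7/14/97</date>
<place><city>North Place</city>
<state>NX</state>
<country>USA</country>
</place>
<temperatures><high scale="F">103</high>
<low scale="F">70</low>
</temperatures>
<forecast><time>Morning</time>
<predict>Partly cloudy, Hazy</predict>
</forecast>
<forecast><time>Afternoon</time>
<predict>Sunny &amp; hot</predict>
</forecast>
<forecast><time>Evening</time>
<predict>Clear and Cooler</predict>
</forecast>
</weatherReport>"""

report = ET.fromstring(REVISED_XML)

# Every forecast has the same shape, so one loop handles all of them.
forecasts = {
    forecast.findtext("time"): forecast.findtext("predict")
    for forecast in report.findall("forecast")
}

# Paths with / step down through nested elements.
city = report.findtext("place/city")
high = int(report.findtext("temperatures/high"))

print(city, high)
for time_of_day, prediction in forecasts.items():
    print(time_of_day, "->", prediction)
```

Adding a fourth forecast (say, for the night) would require no change to this code, whereas the first version of the document would need a new tag name and new code to read it.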
Viewing XML
 XML is designed to be processed by computer
programs, not to be displayed to humans
 Nevertheless, almost all current browsers can display
XML documents
 They don’t all display it the same way
 They may not display it at all if it has errors
 For best results, update your browsers to the newest available
versions
 Remember:
HTML is designed to be viewed,
XML is designed to be used
Extended document standards
 You can define your own XML tag sets, but here are
some already available:
 XHTML: HTML redefined in XML
 XAML: eXtensible Application Markup Language
 declarative language for Microsoft Silverlight and DirectX graphics
programming in Windows Vista or later
 SOAP: Simple Object Access Protocol
 defines the forms of messages and RPCs (Remote Procedure Calls)
 MathML: Mathematical Markup Language
 describes math notation for integrating mathematical formulae into
HTML and other document types
 SVG: Scalable Vector Graphics
 describing two-dimensional vector graphics
Vocabulary
 SGML: Standard Generalized Markup Language
 XML: eXtensible Markup Language
 DTD: Document Type Definition
 element: a start and end tag, along with their contents
 attribute: a value given in the start tag of an element
 entity: a representation of a particular character or string
 PI: a Processing Instruction, to possibly be used by a
program that processes this XML
 namespace: a unique string (usually a URI) that identifies a vocabulary of element and attribute names
 well-formed XML: XML that follows the basic syntax rules
 valid XML: well-formed XML that conforms to a DTD
The End

Introduction to xml
