Parsing XML Data


Kewang
Sample XML
<?xml version="1.0" encoding="utf-8"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>
SAX
Simple API for XML


SAX sample (1/2)

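import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser sp = factory.newSAXParser();
XMLReader xr = sp.getXMLReader();
// xml holds the document text; handler is defined on the next slide
InputSource is = new InputSource(new StringReader(xml.toString()));

// DefaultHandler implements both ContentHandler and ErrorHandler
xr.setContentHandler(handler);
xr.setErrorHandler(handler);

xr.parse(is); // streams the document, firing callbacks on handler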




SAX sample (2/2)
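import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// txtResult is the view that displays the result
private DefaultHandler handler = new DefaultHandler() {
  private boolean hasTitle;
  // characters() may split one text node into several calls, so buffer it
  private final StringBuilder title = new StringBuilder();

  @Override
  public void startElement(String uri, String lName, String qName,
      Attributes attrs) {
    // The factory is not namespace-aware by default, so lName may be
    // empty; qName carries the element name
    hasTitle = qName.equals("TITLE");
    if (hasTitle) {
      title.setLength(0);
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (hasTitle) {
      title.append(ch, start, length);
    }
  }

  @Override
  public void endElement(String uri, String lName, String qName) {
    if (hasTitle) {
      txtResult.setText(title.toString());
    }
    hasTitle = false;
  }
};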
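The handler on this slide writes each title into txtResult, so only the last one stays visible. Below is a minimal sketch that runs on a plain JDK and collects every title instead. The class name SaxTitles and the trimmed SAMPLE string are illustrative, not part of the original deck. It also uses SAXParser.parse(InputSource, DefaultHandler), which registers the handler without going through a separate XMLReader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitles {
    // Trimmed copy of the sample catalog (illustrative)
    static final String SAMPLE = "<CATALOG>"
        + "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>"
        + "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>"
        + "</CATALOG>";

    /** Collects the text of every TITLE element, in document order, via SAX callbacks. */
    static List<String> titles(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a TITLE element
            private StringBuilder text;

            @Override
            public void startElement(String uri, String lName, String qName, Attributes attrs) {
                if (qName.equals("TITLE")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    // One text node may arrive in several chunks
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String lName, String qName) {
                if (qName.equals("TITLE")) {
                    result.add(text.toString());
                    text = null;
                }
            }
        };
        // SAXParser.parse(InputSource, DefaultHandler) registers the handler for us
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(SAMPLE)); // prints [Empire Burlesque, Hide your heart]
    }
}
```

Because SAX never keeps earlier events, anything the application needs later (here, the list of titles) has to be stored by the handler itself.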
XML Pull sample
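import java.io.StringReader;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserFactory;

XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
XmlPullParser xpp = factory.newPullParser();

xpp.setInput(new StringReader(xml.toString()));

int eventType = xpp.getEventType();

// The application pulls each event itself by calling next()
while (eventType != XmlPullParser.END_DOCUMENT) {
  switch (eventType) {
  case XmlPullParser.START_TAG:
    if (xpp.getName().equals("TITLE")) {
      // nextText() returns the element text and stops on its END_TAG
      txtResult.setText(xpp.nextText());
    }
    break;
  }

  eventType = xpp.next();
}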
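XmlPullParser (org.xmlpull.v1) is bundled with Android but not with the standard JDK. On a desktop JVM the closest built-in pull API is StAX (javax.xml.stream), a different API that follows the same pull model: the loop asks for the next event instead of receiving callbacks. A minimal sketch, with an illustrative class name and sample string:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxTitles {
    // Trimmed copy of the sample catalog (illustrative)
    static final String SAMPLE = "<CATALOG>"
        + "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>"
        + "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>"
        + "</CATALOG>";

    /** Pull-style parsing: the loop requests each event from the reader. */
    static List<String> titles(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("TITLE")) {
                    // getElementText() reads the text and leaves the reader on END_ELEMENT
                    result.add(reader.getElementText());
                }
            }
        } finally {
            // XMLStreamReader is not AutoCloseable, so close it explicitly
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(SAMPLE)); // prints [Empire Burlesque, Hide your heart]
    }
}
```

getElementText() plays the same role here as nextText() in the XmlPullParser version.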
DOM
Document Object Model


W3C DOM example
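import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// Encode as UTF-8 to match the XML declaration; the no-argument
// getBytes() would use the platform default charset
Document doc = builder.parse(new ByteArrayInputStream(
    xml.toString().getBytes(StandardCharsets.UTF_8)));
Element root = doc.getDocumentElement();
NodeList cds = root.getChildNodes();

// Child lists include whitespace text nodes, so keep only elements
for (int i = 0; i < cds.getLength(); i++) {
  Node cd = cds.item(i);

  if (cd.getNodeType() == Node.ELEMENT_NODE) {
    NodeList titles = cd.getChildNodes();

    for (int j = 0; j < titles.getLength(); j++) {
      Node title = titles.item(j);

      if (title.getNodeType() == Node.ELEMENT_NODE
          && title.getNodeName().equals("TITLE")) {
        // Unlike getFirstChild().getNodeValue(), this does not fail on <TITLE/>
        txtResult.setText(title.getTextContent());
      }
    }
  }
}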
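The nested loops above walk the children by hand. Document.getElementsByTagName("TITLE") returns the matching elements at any depth, in document order, which shortens the lookup. A minimal JDK-only sketch (class name and sample string are illustrative) that also parses from a StringReader, so no byte-encoding step is needed:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTitles {
    // Trimmed copy of the sample catalog (illustrative)
    static final String SAMPLE = "<CATALOG>"
        + "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>"
        + "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>"
        + "</CATALOG>";

    /** Builds the whole tree in memory, then queries it by tag name. */
    static List<String> titles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        // Searches all descendants, so whitespace text nodes never need filtering
        NodeList nodes = doc.getElementsByTagName("TITLE");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(SAMPLE)); // prints [Empire Burlesque, Hide your heart]
    }
}
```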
JDOM example


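// Assumes JDOM 2 (org.jdom2); in JDOM 1.x getChildren() returns a raw
// List, so this for-each over Element does not compile
import java.io.StringReader;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;

SAXBuilder sax = new SAXBuilder();
Document doc = sax.build(new StringReader(xml.toString()));
Element root = doc.getRootElement();

for (Element elem : root.getChildren("CD")) {
  txtResult.setText(elem.getChildText("TITLE"));
}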

                  Inside: a SAX parser reads the document
                  Outside: you work with a DOM-style tree
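The JDOM note above describes a layered design: SAX does the parsing and a tree is exposed on top. As an illustration of that layering only (this is not JDOM's actual code), here is a hypothetical MiniTreeBuilder that turns SAX start/end events into a small in-memory tree, using a stack of elements that are still open. TreeNode and its child() helper are invented for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MiniTreeBuilder {
    /** Hypothetical minimal tree node; not JDOM's Element class. */
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        TreeNode(String name) { this.name = name; }

        /** First child element with the given name, or null. */
        TreeNode child(String childName) {
            for (TreeNode c : children) {
                if (c.name.equals(childName)) return c;
            }
            return null;
        }
    }

    // Trimmed copy of the sample catalog (illustrative)
    static final String SAMPLE = "<CATALOG>"
        + "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>"
        + "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>"
        + "</CATALOG>";

    /** SAX does the parsing; the handler assembles its events into a tree. */
    static TreeNode build(String xml) throws Exception {
        Deque<TreeNode> open = new ArrayDeque<>(); // elements not yet closed
        TreeNode[] root = new TreeNode[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String lName, String qName, Attributes attrs) {
                TreeNode node = new TreeNode(qName);
                if (open.isEmpty()) {
                    root[0] = node;                   // first element is the root
                } else {
                    open.peek().children.add(node);   // attach to the enclosing element
                }
                open.push(node);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!open.isEmpty()) {
                    open.peek().text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String lName, String qName) {
                open.pop();
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        TreeNode catalog = build(SAMPLE);
        // Tree-style access, similar in spirit to root.getChildren("CD") in JDOM
        for (TreeNode cd : catalog.children) {
            System.out.println(cd.child("TITLE").text);
        }
    }
}
```

Once build() returns, the SAX events are gone but the tree remains, which is what makes the outer, DOM-style access possible.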
Jsoup example


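import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

// Parser.xmlParser() keeps jsoup from applying HTML tree-building rules
Document doc = Jsoup.parse(xml.toString(), "", Parser.xmlParser());
Elements titles = doc.select("TITLE");
for (Element elem : titles) {
  txtResult.setText(elem.text());
}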

          jsoup v1.6.2 (2012/3/27) added an XML parser
          Elements are found with CSS-style selector syntax
Charts

XML parsing speed
[Chart: parsing time in milliseconds (axis 0 to 700) per method: SAX, XMLPULL, W3C, JDOM, JSOUP]
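Parsing times depend on the device, the JVM and the document size. To compare SAX and DOM on your own setup, a rough wall-clock loop is enough for a first look. This is a hypothetical harness (class and method names are illustrative, and the synthetic catalog stands in for real data), not a rigorous benchmark; a tool such as JMH gives more reliable numbers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTiming {
    /** Builds a synthetic catalog with the given number of CD entries. */
    static String sampleCatalog(int cds) {
        StringBuilder sb = new StringBuilder("<CATALOG>");
        for (int i = 0; i < cds; i++) {
            sb.append("<CD><TITLE>Title ").append(i).append("</TITLE><YEAR>1985</YEAR></CD>");
        }
        return sb.append("</CATALOG>").toString();
    }

    /** Nanoseconds for `runs` SAX passes with a no-op handler (no tree is built). */
    static long timeSax(String xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        }
        return System.nanoTime() - start;
    }

    /** Nanoseconds for `runs` DOM parses, each building the full tree in memory. */
    static long timeDom(String xml, int runs) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        String xml = sampleCatalog(1000);
        // Warm-up passes so JIT compilation distorts the measured runs less
        timeSax(xml, 20);
        timeDom(xml, 20);
        System.out.printf("SAX: %d ms%n", timeSax(xml, 50) / 1_000_000);
        System.out.printf("DOM: %d ms%n", timeDom(xml, 50) / 1_000_000);
    }
}
```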
Lines of code
[Chart: lines of code per example (axis 0 to 60) per method: SAX, XMLPULL, W3C, JDOM, JSOUP]
Which one?
  SAX vs. DOM


Which one?

     Memory  Speed  Parser       Modify  Traversal
SAX  Small   Fast   Event-based  No      One-way
DOM  Large   Slow   Tree model   Yes     Any direction
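The Modify and Traversal columns are the practical difference: a DOM tree stays in memory, so code can move from a node up to its parent, edit values, and write the document back out, which a SAX handler cannot do once the events have passed. A minimal JDK-only sketch; the class name, the setPrice method and the 7.90 price are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomModify {
    // Trimmed copy of the sample catalog (illustrative)
    static final String SAMPLE = "<CATALOG>"
        + "<CD><TITLE>Empire Burlesque</TITLE><PRICE>10.90</PRICE></CD>"
        + "<CD><TITLE>Hide your heart</TITLE><PRICE>9.90</PRICE></CD>"
        + "</CATALOG>";

    /** Sets the PRICE of the CD with the given TITLE and returns the edited XML. */
    static String setPrice(String xml, String title, String newPrice) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList titles = doc.getElementsByTagName("TITLE");
        for (int i = 0; i < titles.getLength(); i++) {
            Element t = (Element) titles.item(i);
            if (t.getTextContent().equals(title)) {
                // Walk up to the enclosing CD, then back down to its PRICE
                Element cd = (Element) t.getParentNode();
                cd.getElementsByTagName("PRICE").item(0).setTextContent(newPrice);
            }
        }
        // Serialize the modified tree back to a string
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(setPrice(SAMPLE, "Hide your heart", "7.90"));
    }
}
```

The price of that flexibility is the Memory column: the whole document must fit in memory before the first edit.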
