Cluj Napoca, 28 August 2008
2008 IEEE International Conference on Intelligent Computer Communication and Processing
Digital Libraries Workshop
Towards a GRID-Based Digital Library Management System
Gheorghe Sebestyén-Pál (1), Doina Banciu (2), Tünde Bálint (1), Bogdan Moscaiuc (1), and Ágnes Sebestyén-Pál (1)
(1) Technical University of Cluj-Napoca
(2) ICI Bucharest
Debrecen, 3-5 September 2008, DAPSYS’08
7th INTERNATIONAL CONFERENCE ON DISTRIBUTED AND PARALLEL SYSTEMS
Content
• Classical vs. Digital Libraries
• Recent research on Digital Libraries (DL)
• Main issues and requirements for DLs
• An ontology-based DL model
• Grid-enabled DL
• Implementation considerations of a pilot DL
• Experiments
• Conclusions
Classical vs. Digital Libraries
• Classical library
  • A repository of knowledge organized mainly on paper
• Digital library
  • Not only a digitized version of a classical library
  • Adds a new set of functionalities and services (e.g. access control, resource management and allocation, complex search and processing services)
  • A data exchange and cooperation environment
  • DLs are becoming digital content management systems
  • Incorporates a wide variety of formats and data types (text, audio, video, multi-document complex digital objects)
  • Uses a variety of communication and data-exchange protocols and standards
IT and communication technologies involved in the implementation of digital libraries
[Figure; source: http://mapageweb.umontreal.ca/turner/meta/english/metamap.html]
Goals for modern DLs
• DELOS project’s vision:
  • “to enable any person to access all human knowledge anytime and anywhere, in a friendly, multi-modal, efficient, and effective way, by overcoming barriers of distance, language, and culture and by using multiple Internet-connected devices”
• DL: a knowledge repository and an information exchange infrastructure that allows:
  • data generation,
  • processing, and
  • seamless access to relevant information, regardless of the geographic distribution of hardware resources, databases or persons.
Research in digital libraries
• DELOS Network of Excellence
  • Goals: to define and implement digital libraries on new computing and communication technologies
  • Achievements: definition of functional and architectural requirements for DL implementation
• BRICKS project
  • Goals: to design a user- and service-oriented space for sharing knowledge and resources in a multicultural heritage context
  • Achievements: definition of a digital library architecture for a very broad and heterogeneous user community; automatic indexing and annotation functionalities
• OpenDLib project
  • Goal: development of a software toolkit for generating dedicated DLs
  • Achievements: tools for content harvesting from existing resources
• Fedora, DSpace: open-source software for DLs
• Lucene: open-source search engine library
Research in digital libraries (cont.)
• DILIGENT project (part of EGEE project)
  • Goal: the use of GRID infrastructure for DL implementation
  • Achievements: a new vision of the DL concept:
    • DL = a dynamic digital content repository and management system dedicated to a purpose (e.g. a project, an art collection, an academic course)
    • Definition of generic DL services mapped onto GRID services
    • DLs dedicated to different domains, with powerful processing capabilities
• SINRED project (National Excellence project)
  • Goal: development of a national framework for DLs specialized in technical sciences and research
  • Achievements: evaluation of requirements, evaluation of existing software, infrastructure development, DL model definition, implementation of a pilot DL
• SIPADOC project (national research program)
  • Goal: reevaluation of the national patrimony through DLs
  • Achievements: evaluation of digitizing tools
Key issues in DL implementation
• Architectural issues:
  • Distributed nature of storage, processing and access resources
  • Scalability, flexibility, interoperability
• Functional requirements:
  • Core functions: storage, indexing and annotation, data search, content retrieval, user management
  • Content organization should reflect semantic connections
• Processing facilities
  • Data processing services specialized for different fields
  • Pattern search and recognition
• QoS issues
  • Bounded time to obtain relevant information
  • Reasonable time for complex data processing
• User and access control management
  • Virtual organizations
  • Role-based access
DL = Essence & Metadata Management
[Diagram. Box labels: Text; Audio; Video; Digital content generation and harvesting; Management of essence; Automatic feature (metadata) extraction; Metadata management; Cataloging, indexing, annotation; Access and visualization; Cataloging information system]
An ontology-based Digital Library approach
• Ontology: concepts and relations together with a reasoning engine
• Ontology for technical and scientific domains
• Main concepts:
  • Digital objects
    • Association of content, metadata and procedures
    • Examples: articles, technical reports, prospects, PhD theses, patents
  • Digital collections
    • Set of digital objects structured for a given goal/purpose or based on a given criterion
    • Examples: articles of an author, documents of a domain
  • Events
    • Conferences, workshops, seminars
  • Processes
    • Projects
    • Courses
  • Virtual organizations
    • Roles
    • Users
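A minimal sketch of how the main concepts above might be modelled in code. All class, field and function names (DigitalObject, content_uri, collect_by, and so on) are illustrative assumptions and are not taken from the pilot system; the metadata keys simply reuse the Dublin Core element names "creator" and "type".

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Content plus metadata plus the procedures that can act on it."""
    identifier: str
    content_uri: str  # hypothetical location of the stored content
    metadata: dict[str, str] = field(default_factory=dict)
    procedures: list[str] = field(default_factory=list)  # e.g. "index", "render"


@dataclass
class DigitalCollection:
    """Digital objects grouped for a purpose or by a criterion."""
    name: str
    criterion: str
    objects: list[DigitalObject] = field(default_factory=list)

    def add(self, obj: DigitalObject) -> None:
        self.objects.append(obj)


@dataclass
class VirtualOrganization:
    """Users grouped under an organization, each holding one role."""
    name: str
    roles: dict[str, str] = field(default_factory=dict)  # user -> role


def collect_by(objects, key, value, name):
    """Build a collection from the objects whose metadata[key] equals value."""
    coll = DigitalCollection(name=name, criterion=f"{key}={value}")
    for obj in objects:
        if obj.metadata.get(key) == value:
            coll.add(obj)
    return coll


report = DigitalObject("obj-1", "store/report.pdf",
                       {"creator": "A. Author", "type": "technical report"})
article = DigitalObject("obj-2", "store/article.pdf",
                        {"creator": "A. Author", "type": "article"})
thesis = DigitalObject("obj-3", "store/thesis.pdf",
                       {"creator": "B. Writer", "type": "PhD thesis"})

# "Articles of an author" expressed as a criterion-based collection.
by_author = collect_by([report, article, thesis], "creator", "A. Author",
                       "Documents of A. Author")
print([o.identifier for o in by_author.objects])
```

Building the collection from a criterion rather than a fixed list mirrors the "based on a given criterion" wording: the same objects can belong to several collections without being copied.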
Grid-enabled digital library services
• Why DLs on GRID infrastructure?
  • Huge volume of documents/digital objects
  • Concurrent access and multiple search engines (see Google)
  • Multimedia streaming
  • Automatic indexing and annotation
  • Complex processing requires prohibitive time
  • User management through virtual organizations
  • Job distribution facilities offered by GRID
DL functions mapped on GRID services
[Layered diagram. Layers: Digital Library; GRID Services; Computing, storage and communication resources. Box labels: Collections management; Catalog and metadata management; Digital objects management; Users’ management; Data visualization; Virtual organizations management; Resource management; Task distribution; Processing; Data distribution and replication; Data processing]
Experiments
• Two approaches:
  • DL implementation on Alchemi GRID (Microsoft .NET-based)
    • Job distribution at thread level
    • Explicit GRID programming
    • Experiments with multimedia streaming (multimedia content distribution)
  • DL implementation on Condor GRID (open source)
    • Job distribution at task level
    • Job and data distribution is transparent to the DL application (distribution is done through separate scripts)
    • Experiments with keyword search over the whole DL content
    • Execution time decreased as the number of executor computers increased
    • For more than 5 executors, the scheduling and communication time is comparable with the execution time
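The task-level keyword search can be sketched as a partition, search and merge pattern. This is only a local stand-in under stated assumptions: a thread pool plays the role of the executor nodes, and the function names and the tiny corpus are hypothetical. It does not use Condor or Alchemi, and it does not reproduce the measured timings.

```python
from concurrent.futures import ThreadPoolExecutor


def search_partition(docs, keyword):
    """Return ids of the documents in one partition that contain the keyword."""
    kw = keyword.lower()
    return [doc_id for doc_id, text in docs if kw in text.lower()]


def partition(items, n):
    """Split items into n round-robin partitions (some may be empty)."""
    return [items[i::n] for i in range(n)]


def distributed_search(docs, keyword, executors):
    """Run one search task per partition and merge the partial results."""
    parts = partition(docs, executors)
    with ThreadPoolExecutor(max_workers=executors) as pool:
        partial = pool.map(search_partition, parts, [keyword] * len(parts))
        return sorted(doc_id for hits in partial for doc_id in hits)


# Hypothetical corpus of (id, text) pairs.
corpus = [
    (1, "Grid computing for digital libraries"),
    (2, "Metadata harvesting with Dublin Core"),
    (3, "Scheduling overhead on a grid of executors"),
    (4, "Semantic search over ontologies"),
]
print(distributed_search(corpus, "grid", executors=2))
```

Each partition is an independent task, which is why the search time can shrink as executors are added. The fixed per-task cost of dispatching work and merging results is the part that, on a real grid, grows into the scheduling and communication time noted above.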
A pilot implementation of a digital library framework developed with GRID support
• Goal: implementation of a digital content storage and retrieval system dedicated to educational and scientific activities (courses, projects, etc.)
• Main requirements:
  • A DL adaptable to a given purpose/goal
  • Access controlled and restricted through virtual organizations
  • Ontology-based approach (concepts, relations, semantic search)
  • Advanced search procedures
  • GRID-enabled full-text search services, for better response time
  • Access through Internet browsers
• The result:
  • A distributed digital library application, which allows:
    • Management of digital objects (upload, storage, indexing, metadata creation)
    • Management of collections
    • Management of users and virtual organizations
Pilot DL details (www.bib-dig.utcluj.ro)
• Management of digital objects
  • Upload of digital documents
  • Annotation and metadata generation according to Dublin Core
  • Distributed storage of data
• Management of collections
  • Define a new collection
  • Attach new documents to an existing collection
  • Associate access rights with a collection
• Management of users and virtual organizations
  • Define new users and new virtual organizations
  • Define roles
  • Associate roles with users and collections
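One way the association of roles with users and collections could work is sketched below. The role names, the permission table and the Collection class are assumptions made for illustration; the slides do not state which roles or actions the pilot system defines.

```python
from dataclasses import dataclass, field

# Hypothetical role -> permitted actions table (not taken from the pilot DL).
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "contributor": {"read", "upload"},
    "curator": {"read", "upload", "annotate", "manage"},
}


@dataclass
class Collection:
    name: str
    # (virtual organization, user) -> role granted on this collection
    grants: dict[tuple[str, str], str] = field(default_factory=dict)

    def grant(self, vo: str, user: str, role: str) -> None:
        """Associate a role with a user of a virtual organization."""
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self.grants[(vo, user)] = role

    def allows(self, vo: str, user: str, action: str) -> bool:
        """Check whether the user's role on this collection permits the action."""
        role = self.grants.get((vo, user))
        return role is not None and action in ROLE_PERMISSIONS[role]


course = Collection("Distributed Systems course")
course.grant("cs-dept", "alice", "curator")
course.grant("cs-dept", "bob", "reader")

print(course.allows("cs-dept", "alice", "annotate"))  # True
print(course.allows("cs-dept", "bob", "upload"))      # False
print(course.allows("other-vo", "bob", "read"))       # False
```

Keying grants on the pair (virtual organization, user) means the same person can hold different roles in different organizations, which is the usual reason for layering role-based access on top of virtual organizations.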
Snapshots of the DL application’s interface
bib-dig.utcluj.ro
Snapshots of the DL application’s interface
Search techniques in DLs
• Keyword or index search:
  • Database techniques
• Semantic information retrieval:
  • Semantic graph with documents and concepts
• Non-semantic information retrieval:
  • Naive Bayes algorithm
    • Probabilistic approach
    • Based on probabilistic similarity between documents
  • Topic-Based Vector Space Model algorithm
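To illustrate the probabilistic approach, here is a minimal multinomial Naive Bayes text classifier with add-one smoothing. It is a textbook sketch, not the implementation used in the pilot DL; the training texts, labels and whitespace tokenizer are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)         # documents per class
        self.word_counts = defaultdict(Counter)   # class -> word frequencies
        for text, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def log_score(self, text, label):
        """log P(label) + sum of log P(word | label) over known words."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.class_docs[label] / self.total_docs)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.class_docs, key=lambda label: self.log_score(text, label))


train_docs = [
    "grid scheduling of parallel jobs",
    "job distribution on grid nodes",
    "metadata cataloging and annotation",
    "dublin core metadata records",
]
train_labels = ["grid", "grid", "metadata", "metadata"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("annotation of metadata records"))
print(model.predict("parallel jobs on the grid"))
```

Scores are summed in log space to avoid underflow on long documents; the smoothing term keeps a word that is absent from one class from zeroing out that class entirely.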
Experimental results
[Chart: execution time vs. number of executor nodes. X axis: Nodes (1 to 5). Y axis: Time (s), 0 to 8000. Series: Search execution time; Scheduling and communication time (case 1); Scheduling and communication time (case 2); Total time (case 1); Total time (case 2)]
Experiments
Conclusions
• DLs are complex content management systems that extend the functionalities of classical libraries:
  • Semantic organization of a wide variety of information formats
  • Multiple search and data retrieval techniques (including full-text and semantic search):
    • Keyword full-text search
    • Semantic search
    • Statistical and probabilistic retrieval and classification
  • Access control to distributed and remote data
• DLs are data exchange and cooperation environments
  • Useful for remote and cooperative work
• DLs must include powerful search and data retrieval engines
• GRID infrastructures may be a feasible support for the implementation of DLs
  • For more efficient parallel search, classification or automatic annotation
Thank you for your attention
Questions?

