Research experience & Scientific publications
Felipe Oliveira Gutierrez
felipe.o.gutierrez@gmail.com
February 1st, 2018
Summary
1. uStorage architecture
Coupling of two storage models (blocks and objects)
2. FAIR RDM system
Findable, Accessible, Interoperable, and Reusable
Research Data Management system
3. Workflow to support Volunteer and Scalable computing
BOINC and Hadoop
4. Personal projects
1 - uStorage architecture. 2014/2016
Block Storage
● raw blocks
● fixed (same) size
● no metadata
● limited scalability
● IDs local to the system
● good performance as an OS mount point
● needs an HA (high availability) setup to tolerate failures
● high-performance hardware
● reduced availability
● random read/write loads
● structured data sets
Object Storage
● buckets
● variable size
● extensible metadata
● highly scalable
● globally unique ID on the DS
● poor performance as an OS mount point
● designed to tolerate failures
● hardware flexibility
● high availability
● mostly read workloads
● unstructured data sets
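uStorage couples the two models by offering block-level storage on top of object storage. The sketch below illustrates the general idea only, under stated assumptions: fixed-size blocks are grouped into buckets, and a block index is translated into a bucket ID plus a byte offset. The block size, the number of blocks per bucket, the bucket naming scheme and the dict standing in for the object store are all hypothetical, not taken from the uStorage paper.

```python
# Hypothetical sketch: block-level reads/writes on top of an object store.
# Assumptions: BLOCK_SIZE, BLOCKS_PER_BUCKET and the "bucket-NNNNNNNN" naming
# are illustrative; a plain dict stands in for the cloud object store.
BLOCK_SIZE = 4096          # bytes per block (assumed)
BLOCKS_PER_BUCKET = 256    # consecutive blocks stored in one bucket (assumed)


def locate(block_index: int) -> tuple[str, int]:
    """Map a block index to (bucket_id, byte offset inside that bucket)."""
    bucket_number, slot = divmod(block_index, BLOCKS_PER_BUCKET)
    return f"bucket-{bucket_number:08d}", slot * BLOCK_SIZE


class BlockDevice:
    """Fixed-size block interface backed by variable-size objects (buckets)."""

    def __init__(self, object_store: dict):
        self.store = object_store

    def write_block(self, block_index: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("block writes must be exactly BLOCK_SIZE bytes")
        bucket_id, offset = locate(block_index)
        # Create the bucket lazily, zero-filled, on the first write.
        bucket = self.store.setdefault(
            bucket_id, bytearray(BLOCK_SIZE * BLOCKS_PER_BUCKET))
        bucket[offset:offset + BLOCK_SIZE] = data

    def read_block(self, block_index: int) -> bytes:
        bucket_id, offset = locate(block_index)
        bucket = self.store.get(bucket_id)
        if bucket is None:
            # Never-written blocks read back as zeros, like a sparse disk.
            return bytes(BLOCK_SIZE)
        return bytes(bucket[offset:offset + BLOCK_SIZE])
```

Grouping several blocks per bucket keeps the number of objects small; the trade-off, in this sketch, is that one block update touches a whole bucket.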
Figure: Block Storage vs. Object Storage
LRU cache algorithm
uStorage cache and cache free space
● Eviction: clean or remove buckets from the cache to free space for new buckets.
● Cache miss (cache hit fails): restore or retrieve from the cloud the buckets that are not present in the cache.
● Cache hit (successful): the bucket is found in the cache.
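The eviction, miss and hit cases above can be sketched as an LRU cache of buckets bounded by a byte capacity. This is a minimal illustration, not the uStorage implementation: the `BucketCache` name, the byte-based capacity and the `fetch` callback standing in for retrieval from the cloud are assumptions.

```python
from collections import OrderedDict
from typing import Callable


class BucketCache:
    """Hypothetical LRU cache of buckets with a capacity in bytes."""

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.fetch = fetch                # stands in for retrieval from the cloud
        self.buckets = OrderedDict()      # bucket_id -> data, oldest first
        self.used = 0
        self.hits = self.misses = self.evictions = 0

    def get(self, bucket_id: str) -> bytes:
        if bucket_id in self.buckets:
            # Cache hit: mark the bucket as most recently used.
            self.buckets.move_to_end(bucket_id)
            self.hits += 1
            return self.buckets[bucket_id]
        # Cache miss: retrieve the bucket from the cloud, then cache it.
        self.misses += 1
        data = self.fetch(bucket_id)
        self._make_room(len(data))
        self.buckets[bucket_id] = data
        self.used += len(data)
        return data

    def _make_room(self, needed: int) -> None:
        # Eviction: drop least recently used buckets until the new one fits.
        while self.buckets and self.used + needed > self.capacity:
            _, old = self.buckets.popitem(last=False)
            self.used -= len(old)
            self.evictions += 1
```

An `OrderedDict` keeps buckets in recency order, so both the hit path (`move_to_end`) and eviction (`popitem(last=False)`) are constant time.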
Equal size => Cache: 20 GiB = Files: 20 GiB
4x size => Cache: 5 GiB < Files: 20 GiB
Figure: 4x size => Cache: 5 GiB < Files: 20 GiB (values shown in the chart: 63K and 23K)
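To illustrate why the ratio between cache size and file-set size matters for an LRU cache, the replay below uses a synthetic trace. The workload (20 equal-sized files read sequentially three times) is an assumption, not the workload measured in these experiments, and its output is not one of the results above. It shows a known LRU pathology: when a cyclic scan is larger than the cache, each item is evicted before it is reused.

```python
from collections import OrderedDict


def lru_hit_ratio(trace: list, capacity: int) -> float:
    """Replay an access trace through an LRU cache holding `capacity` items."""
    cache = OrderedDict()
    hits = 0
    for item in trace:
        if item in cache:
            cache.move_to_end(item)         # hit: refresh recency
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used item
            cache[item] = True
    return hits / len(trace)


# Synthetic workload (assumption): 20 one-GiB files, read sequentially 3 times.
files = list(range(20))
trace = files * 3
print(lru_hit_ratio(trace, capacity=20))  # equal size: 0.6666666666666666
print(lru_hit_ratio(trace, capacity=5))   # files are 4x the cache: 0.0
```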
2 - FAIR RDM system. 2016/2017
● Problem
○ Inadequate RDM solution for NGS data
○ Individual storage and backup
○ Files and metadata decoupled
○ No exchange of (meta)data
○ Not FAIR (Findable, Accessible, Interoperable, Reusable)
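One way to address "files and metadata decoupled" is to store each file together with its metadata and make it findable by querying that metadata. The sketch below is a hypothetical data model in plain Python; it borrows the attribute-value-unit (AVU) triple style that iRODS uses for metadata, but it does not use any iRODS API, and the paths and attribute names are invented for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A file stored together with its own metadata (hypothetical model)."""
    path: str
    content: bytes
    # Metadata as (attribute, value, unit) triples, in the style of iRODS AVUs.
    avus: list = field(default_factory=list)

    @property
    def checksum(self) -> str:
        # Integrity information travels with the file, not in a separate system.
        return hashlib.sha256(self.content).hexdigest()


def find(objects: list, attribute: str, value: str) -> list:
    """Return the paths of objects whose metadata contains attribute=value."""
    return [o.path for o in objects
            if any(a == attribute and v == value for a, v, _ in o.avus)]


# Usage with hypothetical NGS files and attributes.
repo = [
    DataObject("/zone/ngs/run1.fastq", b"ACGT",
               [("organism", "Homo sapiens", ""), ("read_length", "150", "bp")]),
    DataObject("/zone/ngs/run2.fastq", b"TTGA",
               [("organism", "Mus musculus", "")]),
]
print(find(repo, "organism", "Homo sapiens"))  # ['/zone/ngs/run1.fastq']
```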
FAIR RDM system
● FAIR - Findable, Accessible, Interoperable, and Reusable
● Distributed and Scalable
● Easy to use
● Manageable by the research organization
○ Security
○ Privacy of (meta)data (different access rights for different users and groups)
○ Variety of user roles (PI, data steward, researcher, administrator)
○ Variety of data types (e.g., Next Generation Sequencing, metabolomics)
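The requirements for user roles and (meta)data privacy can be pictured as a small role- and group-based access check. The role names come from the list above; the permission sets assigned to each role and the rule that access requires membership in the owning group are assumptions made for this sketch, not the policy of the actual system.

```python
from enum import Flag, auto


class Perm(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    MANAGE = auto()   # change access rights / curate metadata (assumed meaning)


# Hypothetical mapping of the listed roles to permissions.
ROLE_PERMS = {
    "researcher": Perm.READ | Perm.WRITE,
    "pi": Perm.READ | Perm.WRITE | Perm.MANAGE,
    "data_steward": Perm.READ | Perm.MANAGE,
    "administrator": Perm.READ | Perm.WRITE | Perm.MANAGE,
}


def allowed(role: str, user_groups: set, owner_group: str, needed: Perm) -> bool:
    """Grant access only to members of the owning group whose role has `needed`."""
    if owner_group not in user_groups:
        return False   # (meta)data stays private to the group that owns it
    return needed in ROLE_PERMS.get(role, Perm.NONE)
```

Keeping roles and group membership as separate checks lets the same role carry different rights in different research groups.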
Figure: FAIR RDM workflow combining STANDARD steps and SPECIFIC steps (STANDARD, SPECIFIC, STANDARD)
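The workflow figure separates STANDARD steps, shared by every dataset, from SPECIFIC steps that depend on the data type. The sketch below mirrors the STANDARD, SPECIFIC, STANDARD order shown in the figure; every step function and data-type key is hypothetical and only illustrates how type-specific steps could be plugged between shared ones.

```python
from typing import Callable

Step = Callable[[dict], dict]


def verify_checksum(rec: dict) -> dict:      # STANDARD step (hypothetical)
    return {**rec, "checksum_ok": True}


def register_metadata(rec: dict) -> dict:    # STANDARD step (hypothetical)
    return {**rec, "registered": True}


def ngs_qc(rec: dict) -> dict:               # SPECIFIC step for sequencing data (hypothetical)
    return {**rec, "qc": "fastq-checked"}


def metabolomics_qc(rec: dict) -> dict:      # SPECIFIC step for metabolomics (hypothetical)
    return {**rec, "qc": "spectra-checked"}


# Data-type-specific steps, keyed by a hypothetical "type" field.
SPECIFIC = {"ngs": [ngs_qc], "metabolomics": [metabolomics_qc]}


def run_pipeline(rec: dict) -> dict:
    """Run STANDARD steps, then the SPECIFIC steps for rec['type'], then STANDARD."""
    steps = [verify_checksum] + SPECIFIC.get(rec["type"], []) + [register_metadata]
    for step in steps:
        rec = step(rec)
    return rec


print(run_pipeline({"type": "ngs"}))
```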
3 - mc2 - My sCientific Cloud. 2012/2013
● Distributed and scalable environment for bioinformatics applications
○ CSGrid
● Availability through different eScience workflow environments
● Different processing models to deploy applications
○ MPI cluster
○ Volunteer processing
○ MapReduce
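As an illustration of the MapReduce processing model, the sketch below counts k-mers in sequencing reads with explicit map, shuffle and reduce phases. It runs in a single process; a framework such as Hadoop would distribute the map and reduce tasks across nodes and perform the shuffle itself. The k-mer counting workload is an example chosen for this sketch, not an application named in the slides.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read: str, k: int):
    """Map phase: emit (k-mer, 1) for every k-mer in one sequencing read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups) -> dict:
    """Reduce phase: sum the counts for each k-mer."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads: list, k: int) -> dict:
    mapped = chain.from_iterable(map_kmers(r, k) for r in reads)
    return reduce_counts(shuffle(mapped))


print(count_kmers(["ACGTAC", "GTACGT"], 3))
# {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

Because each read is mapped independently and each key is reduced independently, both phases parallelize without coordination beyond the shuffle.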
4 - Personal projects
https://felipeogutierrez.blogspot.com & https://github.com/felipegutierrez
#FileSystems #Storage
Scientific publications
● Gutierrez, F. O. et al. uStorage - A Storage Architecture to Provide Block-Level Storage Through Object-Based Storage. European Conference on Service-Oriented and Cloud Computing (ESOCC 2017). Lecture Notes in Computer Science, Springer. https://doi.org/10.1007/978-3-319-67262-5_16
● Gutierrez, F. O. et al. FAIR Sequencing Data Repository based on iRODS. 9th iRODS User Group Meeting (UGM 2017), Netherlands. https://goo.gl/wtBwGS
● Olabarriaga, S.; Gutierrez, F. O. et al. FAIR Next Generation Sequencing Data Repository based on iRODS. BioSB 2017: Dutch Bioinformatics & Systems Biology, Netherlands.
● Gutierrez, F. O. et al. Support for bioinformatics applications through volunteer and scalable computing frameworks. Second International Workshop on Parallelism in Bioinformatics, 2014, Madrid. https://doi.org/10.1109/CLUSTER.2014.6968780
● Gutierrez, F. O. et al. Providing volunteer computing at the infrastructure level to support e-science applications. VII Brazilian e-Science Workshop, 2013. https://goo.gl/Wam9Ui
● Gutierrez, F. O. et al. Provisão de computação intensiva de dados para suporte a aplicações científicas [Provision of data-intensive computing to support scientific applications]. II Escola Regional de Alto Desempenho (Regional School of High Performance Computing), 2013, Salvador.
