Efficient & effective data management for research projects
ILRI's Data Management Platform
Carlos Quiros
June 2015
Contents
• Back in 2011
• Current status
• How we did it
• Example of a process
• CKAN
• Key decisions made
• Technology and skills required
Back in 2011
Survey design
• Too many
• No common indicators
• Different variables
• Different calculations
Survey implementation
• Too many tools
• No protocols
• Poor field data cleaning
• No standard process
Storage
• In files
• Too many formats
• Too many versions
• Messy data cleaning
• No accountability
Availability & accessibility
• Nothing
Now
Survey design
• Too many
• Common indicators
• Same variables
• Same calculations
Survey implementation
• 2 tools (ODK, CSPro)
• Protocols
• Field data cleaning
• Standard process
• Standard tools
Storage
• Server database
• No formats
• One version
• Central cleaning
• Accountability
Availability & accessibility
• CKAN
• OData
How we went around it
Storage
• Server database
  • How to integrate ODK and CSPro?
  • How to make it easy for scientists?
  • How to manage user decentralization?
  • How to increase accountability?
Availability and accessibility
• What to use? CKAN, Dataverse, etc.
  → CKAN
  • How to extend it to serve our purpose?
  • How to integrate it with a server database?
  • How to manage our metadata and vocabularies?
  • How to do this?
• Data interoperability? RDF, OData, GData, etc.
  → OData
  • How to do it?
Survey implementation
• Support only two tools
• Wrote protocols
• Wrote field data cleaning applications
• Wrote policies and implementation plans
• Wrote standard processes and tools for processing the data
• Worked closely with teams
Survey design (ongoing)
• Created a central place for all the surveys
• Separated surveys into modules
• Worked on common indicators
• Management supports this process
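The slides do not describe how ILRI's field data cleaning applications work. As a rough illustration only, the sketch below shows the kind of rule-based check such an application might run on each submission before upload. The field names (`hh_id`, `hh_size`, `interview_date`) and the rules themselves are hypothetical.

```python
from datetime import date

# Hypothetical validation rules: each field maps to a check that returns
# an error message, or None when the value is acceptable.
RULES = {
    "hh_id": lambda v: None if isinstance(v, str) and v.strip() else "missing household ID",
    "hh_size": lambda v: None if isinstance(v, int) and 1 <= v <= 50 else "household size outside 1-50",
    "interview_date": lambda v: None if isinstance(v, date) and v <= date.today()
    else "interview date missing or in the future",
}


def validate(record: dict) -> list:
    """Return the problems found in one submission (empty list if none)."""
    problems = []
    for field, check in RULES.items():
        message = check(record.get(field))
        if message:
            problems.append(f"{field}: {message}")
    return problems


def split_clean(records):
    """Separate submissions that pass every rule from those needing follow-up."""
    clean, flagged = [], []
    for rec in records:
        issues = validate(rec)
        if issues:
            flagged.append((rec, issues))
        else:
            clean.append(rec)
    return clean, flagged


# Example: the second (hypothetical) submission has an empty ID and an implausible household size.
records = [
    {"hh_id": "KE-001", "hh_size": 6, "interview_date": date(2015, 3, 2)},
    {"hh_id": "", "hh_size": 120, "interview_date": date(2015, 3, 2)},
]
clean, flagged = split_clean(records)
print(len(clean), flagged[0][1])
```

The point of running checks like these on the device or in the field office is that flagged submissions can be corrected while the team can still revisit the respondent.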
Example of a process
Detailed breakdown of ILRI's RMD workflow with ODK
Legend (who performs each step): RMG staff; project team member. S = scientist input / usage.
• Start: draft tool (.doc) → consultation → final tool (.doc)
• Coding: .doc → .xls
• Testing & review (.xls)
• Upload to a Formhub test account
• Testing & review (ODK Collect) until the form is OK
• Field deployment: upload the form to the Formhub project account
• Data collection; upload data to Formhub
• End of data collection
• Create the MySQL schema with ODKToMySQL → MySQL schema in server
• Convert data to JSON with FormhubToJSON → data in JSON format
• Upload the JSON into the MySQL schema with JSONToMySQL
• Data cleaning from the server using MySQL for Excel
• Initialize META in the schema → metadata for the portal
• Sharing in the data portal
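In the workflow above, ODK Tools (ODKToMySQL, FormhubToJSON, JSONToMySQL) do the conversion; their internals are not shown in the slides. The sketch below illustrates only the general idea: flattening one JSON submission that contains a repeat group into a parent table and a child table linked by a key. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table and field names (`maintable`, `members`, `hh_id`, etc.) are hypothetical.

```python
import json
import sqlite3

# Hypothetical submission after export to JSON: top-level fields plus
# a repeat group ("members") with one entry per household member.
SUBMISSION = json.loads("""
{
  "hh_id": "KE-001",
  "village": "Village A",
  "members": [
    {"name": "Person 1", "age": 34},
    {"name": "Person 2", "age": 9}
  ]
}
""")


def create_schema(conn):
    # Parent table for the main questionnaire, child table for the repeat group.
    conn.execute("CREATE TABLE maintable (hh_id TEXT PRIMARY KEY, village TEXT)")
    conn.execute(
        "CREATE TABLE members ("
        " hh_id TEXT REFERENCES maintable(hh_id),"
        " member_no INTEGER,"
        " name TEXT, age INTEGER,"
        " PRIMARY KEY (hh_id, member_no))"
    )


def load_submission(conn, sub):
    # Insert the parent row, then one child row per repeat entry, keyed by the parent ID.
    conn.execute("INSERT INTO maintable VALUES (?, ?)", (sub["hh_id"], sub["village"]))
    for i, member in enumerate(sub.get("members", []), start=1):
        conn.execute(
            "INSERT INTO members VALUES (?, ?, ?, ?)",
            (sub["hh_id"], i, member["name"], member["age"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)
load_submission(conn, SUBMISSION)
rows = conn.execute(
    "SELECT m.village, c.name, c.age FROM maintable m "
    "JOIN members c USING (hh_id) ORDER BY c.member_no"
).fetchall()
print(rows)  # [('Village A', 'Person 1', 34), ('Village A', 'Person 2', 9)]
```

Keeping repeat groups in their own keyed tables, rather than as numbered columns, is what lets the same schema hold any number of members per household.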
ILRI's data portal (CKAN) – http://data.ilri.org/portal/
• CKAN?
  • Developed by the Open Knowledge Foundation
  • The most widely deployed data portal software, used by:
    • USA data portal
    • UK data portal
    • EU data portal
    • Open Africa
• What do you get out of the box?
  • Create datasets with minimal metadata (name, abstract, author, date)
  • Tags from a controlled vocabulary
  • Powerful search engine
  • Public / private access to datasets
  • Attach resources (files) to a dataset
  • Data interoperability through a powerful API and RDF
  • Arrange datasets into organizations and topics
• What can you do by creating extensions?
  • Add new vocabularies (e.g., language, countries)
  • Add new metadata fields
  • Visualize different kinds of data (e.g., maps)
  • Change the theme (colors, logos, fonts, etc.)
  • Create data hubs by harvesting other CKAN instances
  • Whatever else you want
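The "powerful API" mentioned above is CKAN's Action API. For example, `GET <site>/api/3/action/package_search?q=...&rows=...` returns JSON shaped like `{"success": true, "result": {"count": ..., "results": [...]}}`. Below is a minimal standard-library sketch, assuming the portal still lives at the base URL given on the slide (it may have moved since 2015). The parser can be exercised on a canned response without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed portal location, taken from the slide; it may have changed.
BASE_URL = "http://data.ilri.org/portal"


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API package_search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> list:
    """Extract dataset titles (falling back to names) from a package_search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API error: {payload.get('error')}")
    return [pkg.get("title") or pkg["name"] for pkg in payload["result"]["results"]]


def search(query: str, rows: int = 10) -> list:
    """Live call: requires network access and a reachable CKAN site."""
    with urlopen(package_search_url(BASE_URL, query, rows), timeout=30) as resp:
        return dataset_titles(resp.read().decode("utf-8"))
```

With network access, `search("livestock")` would return the titles of matching public datasets; private datasets need an API key sent in the `Authorization` header.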
Key decisions made
• Use open source for all RDM (research data management)
  Pros:
  • Bigger pool of tools
  • Flexible
  • Innovation
  Cons:
  • Complex skill set
  • Learning curve
• Relational Database Management System (RDBMS)
  Pros:
  • Central place
  • Auditing
  Cons:
  • DB management skill set
  • Scientists are not familiar with working in an RDBMS
• CKAN
  Pros:
  • There is nothing better out there
  • Flexible and extensible
  Cons:
  • Programming in several languages is required
  • Learning curve
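One common way an RDBMS delivers the "Auditing" advantage listed above is a trigger that records every change to a table. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL; MySQL also supports row-level triggers, but its syntax differs (no `WHEN` clause or `UPDATE OF` column list). The table and column names are hypothetical, and this is not necessarily how META implements auditing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE household (hh_id TEXT PRIMARY KEY, hh_size INTEGER);

-- Audit table: one row per change, with old and new values and a timestamp.
CREATE TABLE audit_log (
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
    hh_id TEXT,
    column_name TEXT,
    old_value TEXT,
    new_value TEXT
);

-- Log every real change to hh_size automatically, whoever makes it.
CREATE TRIGGER household_size_audit
AFTER UPDATE OF hh_size ON household
FOR EACH ROW
WHEN OLD.hh_size IS NOT NEW.hh_size
BEGIN
    INSERT INTO audit_log (hh_id, column_name, old_value, new_value)
    VALUES (OLD.hh_id, 'hh_size', OLD.hh_size, NEW.hh_size);
END;
""")

conn.execute("INSERT INTO household VALUES ('KE-001', 120)")
# A data-cleaning correction: 120 was a typing error for 12.
conn.execute("UPDATE household SET hh_size = 12 WHERE hh_id = 'KE-001'")
log = conn.execute("SELECT hh_id, column_name, old_value, new_value FROM audit_log").fetchall()
print(log)  # [('KE-001', 'hh_size', '120', '12')]
```

Because the trigger lives in the database rather than in any one cleaning tool, corrections made from Excel, scripts or the command line are all logged the same way.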
Technology and skills required
• Server
  • Linux (Ubuntu Server) [Linux administration]
    • http://www.ubuntu.com/download/server
• Database server
  • MySQL – An open source database system [DB administration, SQL]
    • http://www.mysql.com/
• Data processing software [Linux, C++, Python]
  • ODK – A toolset for collecting data on mobile devices.
    • https://opendatakit.org/
  • CSPro – Software for creating data entry applications.
    • https://www.census.gov/population/international/software/cspro/
  • Formhub – Server software that collects ODK data.
    • https://github.com/SEL-Columbia/formhub
  • ODK Tools – A toolbox for processing ODK survey data into MySQL databases.
    • https://github.com/ilri/odktools
  • META – A toolbox for managing research data in MySQL databases.
    • https://github.com/ilri/meta
  • CSProTools – A toolbox for processing CSPro survey data into MySQL databases.
    • https://github.com/ilri/csprotools
• Data sharing and interoperability
  • CKAN – The open source data portal software. [Linux, Python, WebDev]
    • http://ckan.org/
    • http://docs.ckan.org/en/latest/maintaining/installing/index.html
    • http://docs.ckan.org/en/latest/extensions/index.html
  • OData – Allows the creation and consumption of queryable and interoperable data resources in a simple, standard way. [Linux, Java, WebDev]
    • http://www.odata.org/
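OData resources are queried through standard system query options in the URL, such as `$select`, `$filter`, `$orderby` and `$top`. The sketch below only builds such a URL. The service root, entity set and field names are hypothetical, since the slides do not describe ILRI's OData endpoint.

```python
from urllib.parse import quote


def odata_query_url(service_root, entity_set, select=None, filter_expr=None, orderby=None, top=None):
    """Build an OData query URL from standard system query options."""
    options = []
    if select:
        options.append("$select=" + ",".join(select))
    if filter_expr:
        # Percent-encode spaces and other reserved characters, keeping quotes readable.
        options.append("$filter=" + quote(filter_expr, safe="'()"))
    if orderby:
        options.append("$orderby=" + quote(orderby, safe=""))
    if top is not None:
        options.append(f"$top={int(top)}")
    url = f"{service_root.rstrip('/')}/{entity_set}"
    return url + ("?" + "&".join(options) if options else "")


# Hypothetical service: households in Kenya with more than five members, largest first.
print(odata_query_url(
    "http://example.org/odata/survey.svc", "Households",
    select=["hh_id", "village"],
    filter_expr="country eq 'Kenya' and hh_size gt 5",
    orderby="hh_size desc",
    top=10,
))
```

Because the query language is part of the standard, any OData-aware client (a spreadsheet, a statistics package or a script like this one) can ask the server for exactly the slice of data it needs.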
Thank you
Visit us @ http://data.ilri.org/
