The Materials Project Ecosystem
A Complete Software and Data Platform for Materials Informatics
Shyue Ping Ong, University of California, San Diego
“Information wants to be free.”
– Stewart Brand, 1960s
“Information wants to be free and
code wants to be wrong.”
– RSA Conference 2008
“Materials information and code
wants to be free and right.”
The Materials Project is an open science
project to make the computed properties of
all known inorganic materials publicly
available to all researchers to accelerate
materials innovation.
June 2011: The Materials Genome Initiative is launched. It aims to “fund computational tools, software, new
methods for material characterization, and the
development of open standards and databases that
will make the process of discovery and development
of advanced materials faster, less expensive, and
more predictable”
https://www.materialsproject.org
As of Jun 5 2015
• Over 58,000 unique compounds, and growing
• Diverse set of properties
  ◦ Structural (lattice parameters, atomic positions, etc.)
  ◦ Energetic (formation energies, phase stability, etc.)
  ◦ Electronic structure (DOS, band structures)
  ◦ Elastic constants
• Suite of Web Apps for materials analysis
User-friendly Web Apps
• Materials Explorer: Search for materials by formula, elements or properties
• Battery Explorer: Search for battery materials by voltage, capacity and other properties
• Crystal Toolkit: Design new materials from existing materials
• Structure Predictor: Predict novel structures
• Phase Diagram App: Generate compositional and grand canonical phase diagrams
• Pourbaix Diagram App: Generate Pourbaix diagrams
• Reaction Calculator: Balance reactions and calculate their enthalpies
Materials Project data in User papers
M. Meinert, M.P. Geisler, Phase stability of chromium based
compensated ferrimagnets with inverse Heusler structure, J.
Magn. Magn. Mater. 341 (2013) 72–74.
J. Rustad, Density functional calculations of the enthalpies of
formation of rare-earth orthophosphates, Am. Mineral. 97
(2012) 791–799.
M. Fondell, T.J. Jacobsson, M. Boman, T. Edvinsson, Optical
quantum confinement in low dimensional hematite, J. Mater.
Chem. A. 2 (2014) 3352.
Web frontend is only the tip of the iceberg…
pymatgen
FireWorks
REST API
custodian
MPWorks
MPEnv
rubicon
Hierarchical design of codebases
keeps infrastructure nimble to changes
WORKFLOW CODE
CHEMISTRY CODE
Many types of use cases
Crystal workflows: FireWorks, pymatgen, custodian, MPWorks
Molecule workflows: FireWorks, pymatgen, custodian, rubicon (private)
pymatgen
FireWorks
external
MAST, MaterialsHub
external
Berlin ML, JGI, MoDeNa
Sustainable software development
• Open-source
  ◦ Managed via
  ◦ More eyes => robustness
  ◦ Contributions from all over the world
• Benevolent dictators
  ◦ Unified vision
  ◦ Quality control
• Clear documentation
  ◦ Prevent code rot
  ◦ More users
• Continuous integration and testing
  ◦ Ensure code is always working
Python Materials Genomics (pymatgen)
• Core materials analysis powering the Materials Project
• Defines core extensible Python objects for materials data representation.
• Provides a robust and well-documented set of structure and thermodynamic analysis tools relevant to many applications.
• Establishes an open platform for researchers to collaboratively develop sophisticated analyses of materials data.
Extensive Materials Analysis Capabilities
Input/
Output
objects
(Modular, Reusable, Extendable)
Defects and Transformations
Electronic Structure
XRD Patterns
Phase and Pourbaix Diagrams
Functional properties
Comprehensively documented
Continuously tested and integrated
Active dev/user community
www.pymatgen.org stats
• > 6000 views per month on average (~50% increase from previous year)
v2.9.12 → v3.0.13
• Python 2/3 compatible!
Other improvements
• ABINIT support
• Defects (Haranczyk/LBNL)
• Qchem (JCESR)
• Bug fixes & improvements
Very active user community!
• 81 forks (developers making changes and contributing)
• The rate of commits has slowed somewhat, as expected for a maturing and robust code base.
Pymatgen-db
• Database add-on for pymatgen. Enables the creation of Materials Project-style MongoDB (www.mongodb.org) databases for management of materials data. Key features:
  ◦ Query engine for easy translation of MongoDB docs to useful pymatgen objects for analysis purposes.
  ◦ Includes a clean and intuitive web UI (the Materials Genomics UI) for exploring Mongo collections.
  ◦ http://pythonhosted.org//pymatgen-db/
Custodian
• Simple, robust and flexible just-in-time (JIT) job management framework.
  ◦ Wrappers to perform error checking, job management and error recovery.
  ◦ Error recovery is an important aspect of high-throughput (HT) computation: O(100,000) jobs with a 1% error rate => O(1000) errored jobs.
  ◦ Existing sub-packages for error handling for VASP, NWChem and QChem calculations.
• Blue: Controlled by subclasses of Job
• Red: Defined by ErrorHandlers.
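The wrap, check and recover cycle described above can be sketched in plain Python. This is only an illustration of the pattern, not custodian's actual API: `FlakyJob`, `RestartHandler` and `run_with_recovery` are hypothetical names, and the "job" is a stand-in that fails a fixed number of times before succeeding.

```python
class FlakyJob:
    """Hypothetical job that fails a fixed number of times before succeeding."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.output = ""

    def run(self):
        # Simulate a calculation that leaves either an error or a success marker.
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            self.output = "ERROR: did not converge"
        else:
            self.output = "DONE"


class RestartHandler:
    """Hypothetical handler: detects an error in the output and proposes a fix."""

    def check(self, job):
        return job.output.startswith("ERROR")

    def correct(self, job):
        # A real handler would edit input files here; this sketch only reports.
        return "restart with adjusted settings"


def run_with_recovery(job, handlers, max_errors=5):
    """Run the job, re-running after each corrected error, up to max_errors."""
    corrections = []
    while True:
        job.run()
        triggered = [h for h in handlers if h.check(job)]
        if not triggered:
            return corrections  # clean run: nothing left to fix
        if len(corrections) >= max_errors:
            raise RuntimeError("too many errors; giving up")
        corrections.extend(h.correct(job) for h in triggered)
```

Separating the job (what to run) from the handlers (what can go wrong and how to fix it) is what lets the same recovery loop be reused across different codes.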
Concrete Example for VASP calculations
• An extensive set of rules has been codified for running VASP calculations
• Significantly reduces error rate of calculations (< 1%)
VaspJob class
• auto_npar: automatically sets NPAR in the INCAR to a relatively optimal number based on the detected number of processors. Enhances VASP calculation efficiency by ~10-30%!
• auto_gamma: if this is a gamma-only calculation and a gamma-compiled version of VASP exists, use it. Another 10-20% increase in efficiency!
• Even without error handling, custodian already significantly improves resource utilization of running VASP calculations!
VaspJob(vasp_cmd, output_file="vasp.out",
        auto_npar=True, auto_gamma=True,
        …<other options>...)
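To make the auto_npar idea concrete, here is one plausible way to pick an NPAR value from a processor count. The slides do not state the rule custodian applies, so the heuristic below (the smallest exact divisor of the core count at or above its square root) is an assumption for illustration, and `choose_npar` is a hypothetical helper rather than part of custodian.

```python
import math


def choose_npar(ncores):
    """Pick an NPAR that divides the core count evenly (assumed heuristic)."""
    if ncores < 1:
        raise ValueError("ncores must be positive")
    # Search upward from floor(sqrt(ncores)) for the first exact divisor.
    # ncores itself always divides, so the loop is guaranteed to return.
    for npar in range(max(1, math.isqrt(ncores)), ncores + 1):
        if ncores % npar == 0:
            return npar
```

Choosing a divisor keeps every group of cores the same size, which is the property the sketch is meant to show; the right value for a given machine still has to be benchmarked.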
FireWorks is the Workflow Manager
[Figure: a web form for submitting a custom material (name “A cool material !!”, description “Lots of information about cool material !!”, Submit button), followed by these stages:]
• Input generation (parameter choice)
• Workflow mapping
• Supercomputer submission / monitoring
• Error handling
• File transfer
• File parsing / DB insertion
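The stages above form a dependency chain, and a workflow manager's core task is to run them in an order that respects those dependencies. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+). It is not the FireWorks API: the strictly linear chain between stages is an assumption, and `execution_order` and `run_workflow` are hypothetical helpers.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages that must finish before it can start.
# Stage names come from the slide; the linear dependency chain is assumed.
DEPENDENCIES = {
    "input generation": set(),
    "workflow mapping": {"input generation"},
    "supercomputer submission": {"workflow mapping"},
    "error handling": {"supercomputer submission"},
    "file transfer": {"error handling"},
    "file parsing / DB insertion": {"file transfer"},
}


def execution_order(dependencies):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, actions):
    """Run each stage's action in dependency order and collect the results."""
    results = {}
    for stage in execution_order(dependencies):
        # Each action receives the results of the stages that ran before it.
        results[stage] = actions[stage](results)
    return results
```

Expressing the workflow as data (a dependency map) rather than as a hard-coded script is what allows new stages or branches to be added without touching the runner.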
FireWorks as a platform
Community can write any workflow in FireWorks → we can automate it over most supercomputing resources
Examples shown: structure, charge, band structure, DOS, optical, phonons, XAFS spectra, GW
Workflows in Development by Internal/External Collaborations
• Elastic constants (in production)
• Thermal properties (Phonon / GIBBS: in testing)
• Surfaces (in testing)
• GW / hybrid calculations
• ABINIT workflows (Geoffroy Hautier, UCL)
• Any code can be added and automated
Materials Project DB
How do I access MP data?
Option 1: Direct access
Most flexible and powerful, but
•  User needs to know db language
•  Security is an issue
•  Fragile – if db tech or schema
changes, user’s analysis breaks
Option 2: Web Apps
Pros
•  Intuitive and user-friendly
•  Secure
Cons
•  Significant loss in flexibility and power
Option 3: Web Apps built on RESTful API
Pros
•  Intuitive and user-friendly
•  Secure
•  Programmatic access for developers and researchers
The Materials API
An open platform for accessing Materials
Project data based on REpresentational State
Transfer (REST) principles.
Flexible and scalable to cater to a large
number of users with different access
privileges.
Simple to use and code agnostic.
A REST API maps a URL to a resource.
Example:
GET https://api.dropbox.com/1/account/info
Returns information about a user’s account.
Methods: GET, POST, PUT, DELETE, etc.
Response: Usually JSON or XML or both
Who implements REST APIs?
https://www.materialsproject.org/rest/v2/materials/Fe2O3/vasp/energy
• Preamble: https://www.materialsproject.org/rest/v2
• Request type: materials
• Identifier: Fe2O3. Typically a formula (Fe2O3), id (1234) or chemical system (Li-Fe-O)
• Data type (vasp, exp, etc.): vasp
• Property: energy
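The URL anatomy can be checked mechanically by splitting the path into its labelled parts. The sketch below assumes the segments map to the labels in order (preamble, request type, identifier, data type, property) and that the preamble path is `/rest/v2`, as in the example URL; `parse_materials_url` is a hypothetical helper, not part of any Materials Project client.

```python
from urllib.parse import urlsplit

PREAMBLE_PATH = "/rest/v2"  # assumed fixed "preamble" portion of the path


def parse_materials_url(url):
    """Split a Materials API URL into its labelled components."""
    parts = urlsplit(url)
    if not parts.path.startswith(PREAMBLE_PATH + "/"):
        raise ValueError("URL does not start with the expected preamble")
    # Everything after the preamble: request type / identifier / data type / property.
    segments = parts.path[len(PREAMBLE_PATH):].strip("/").split("/")
    if len(segments) != 4:
        raise ValueError("expected request type, identifier, data type, property")
    request_type, identifier, data_type, prop = segments
    return {
        "preamble": f"{parts.scheme}://{parts.netloc}{PREAMBLE_PATH}",
        "request_type": request_type,
        "identifier": identifier,
        "data_type": data_type,
        "property": prop,
    }
```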
Secure access
An individual API key provides secure access
with defined privileges.
All HTTPS requests must supply the API key as
either an “x-api-key” header or a GET/POST
“API_KEY” parameter.
API key available at
https://www.materialsproject.org/dashboard
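As a rough sketch of the header option, the snippet below builds (but does not send) a GET request that carries the key in an “x-api-key” header, using only the Python standard library. The endpoint layout follows the example URL from these slides and may not match the currently deployed service; `build_request` is a hypothetical helper and the key shown in the usage note is a placeholder.

```python
from urllib.request import Request

BASE_URL = "https://www.materialsproject.org/rest/v2"  # preamble from the slides


def build_request(identifier, data_type, prop, api_key):
    """Build (without sending) a GET request carrying the key in a header."""
    url = f"{BASE_URL}/materials/{identifier}/{data_type}/{prop}"
    return Request(url, headers={"x-api-key": api_key}, method="GET")
```

A request built this way, for example `build_request("Fe2O3", "vasp", "energy", "YOUR_API_KEY")`, could then be passed to `urllib.request.urlopen` and the body decoded as JSON. Sending the key in a header rather than as a URL parameter keeps it out of the URL itself.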
Sample output (JSON)
• Intuitive response format
• Machine-readable (JSON parsers available for most programming languages)
• Metadata provides provenance for tracking
{
  created_at: "2014-07-18T11:23:25.415382",
  valid_response: true,
  version: {
    pymatgen: "2.9.9",
    db: "2014.04.18",
    rest: "1.0"
  },
  response: [
    {
      energy: -67.16532048,
      material_id: "mp-24972"
    },
    {
      energy: -132.33035197,
      material_id: "mp-542309"
    },
    {…},
    {…},
    {…},
    {…},
    {…},
    {…},
    {…},
    {…}
  ],
  copyright: "Materials Project, 2012"
}
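A response of this shape can be consumed with any JSON parser. The sketch below uses Python's standard `json` module on a trimmed copy of the sample, containing only the two entries that are fully visible on the slide; `energies_by_material` is a hypothetical helper that checks the validity flag before reading the data.

```python
import json

# Trimmed copy of the sample response; the elided entries are left out.
SAMPLE = """
{
  "created_at": "2014-07-18T11:23:25.415382",
  "valid_response": true,
  "version": {"pymatgen": "2.9.9", "db": "2014.04.18", "rest": "1.0"},
  "response": [
    {"energy": -67.16532048, "material_id": "mp-24972"},
    {"energy": -132.33035197, "material_id": "mp-542309"}
  ],
  "copyright": "Materials Project, 2012"
}
"""


def energies_by_material(payload):
    """Return {material_id: energy} after checking the validity flag."""
    data = json.loads(payload)
    if not data.get("valid_response"):
        raise ValueError("server reported an invalid response")
    return {entry["material_id"]: entry["energy"] for entry in data["response"]}
```

The `version` block in the same payload is what makes results traceable: storing it alongside any derived numbers records which database and code versions produced them.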
Can I really access any piece of data
in the Materials Project?
GitHub-powered RESTful documentation
http://bit.ly/materialsapi
Via the shockingly powerful
https://www.materialsproject.org/rest/v2/query
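A request to the query endpoint might be built along the following lines. The slides do not show the payload format, so the form fields used here (`criteria` and `properties`, each JSON-encoded, with MongoDB-style criteria) are an assumption to be checked against the API documentation; `build_query` is a hypothetical helper and the request is only constructed, never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

QUERY_URL = "https://www.materialsproject.org/rest/v2/query"


def build_query(criteria, properties, api_key):
    """Build (without sending) a POST request for the query endpoint.

    The field names "criteria" and "properties" are assumed, not confirmed
    by the slides.
    """
    body = urlencode({
        "criteria": json.dumps(criteria),
        "properties": json.dumps(properties),
    }).encode("utf-8")
    return Request(QUERY_URL, data=body, headers={"x-api-key": api_key}, method="POST")
```

Under these assumptions, a call such as `build_query({"elements": {"$all": ["Li", "Fe", "O"]}}, ["material_id", "energy"], "YOUR_API_KEY")` would ask for the ids and energies of all entries containing Li, Fe and O.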
Demo
http://localhost:8888/notebooks
The Materials API + pymatgen in Education
– UCSD’s NANO 106
• Data mined over the Materials Project’s 49,000+ unique crystals
http://www.bit.ly/sg_stats
P2₁/c is the most common space group, comprising ~9.8% of all compounds
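The aggregation behind a statistic like this is a simple frequency count. The sketch below shows it on a handful of made-up records standing in for entries retrieved through the Materials API; the record layout and the `spacegroup` field name are assumptions, and the numbers it produces have nothing to do with the real ~9.8% figure.

```python
from collections import Counter


def space_group_shares(records):
    """Return (symbol, fraction) pairs sorted from most to least common."""
    counts = Counter(record["spacegroup"] for record in records)
    total = sum(counts.values())
    return [(symbol, n / total) for symbol, n in counts.most_common()]


# Made-up records for illustration only; not Materials Project data.
records = [
    {"material_id": "demo-1", "spacegroup": "P2_1/c"},
    {"material_id": "demo-2", "spacegroup": "Fm-3m"},
    {"material_id": "demo-3", "spacegroup": "P2_1/c"},
    {"material_id": "demo-4", "spacegroup": "Pnma"},
]
```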
The Materials Virtual Lab @ UCSD’s
One-click AIMD
Automated “one-click” MD workflow based on pymatgen, custodian and FireWorks
[Figure: screening workflow. Labels: starting candidates; topological screening (augmented by DFT); stability (phase & EW) screening; diffusivity; optimized candidates; AIMD, SDSC; multi-week AIMD simulation; statistical exclusionary screening; automated pathway extraction + NEB]
Y. Mo, S. P. Ong, G. Ceder, “Insights into Diffusion Mechanisms in P2
Layered Oxide Materials by First-Principles Calculations”, submitted
Coming soon (full launch in next few weeks)!!
Sounds good, where do I learn more?
• The Materials Project
  ◦ https://www.materialsproject.org/open
• The Materials API GitHub Doc
  ◦ http://bit.ly/materialsapi
• The Materials Virtual Lab (MAVRL) @ UCSD
  ◦ Slides from Workshop on MP infrastructure (http://mavrl.org/software)
Thank you.
