Data Science: An Emerging
Field for Future Jobs
Jian Qin
School of Information Studies
Syracuse University

A presentation for the Graduate School, Syracuse University
February 22, 2013

   Talk points
        ›  Data science (DS) and data scientists in the context of
           research data
        ›  Implications for, and expectations of, the future research workforce
        ›  Preparing for the challenges and opportunities




                                         GRADUATION SCHOOL PRESENTATION 2013-2-22   2
   Feeling the pressure of the data deluge in the digital information world …

        http://readwrite.com/2011/11/17/infographic-data-deluge---8-ze

   …in science research

        http://www.sciencemag.org/content/331/6018.cover-expansion

   …in our health care

        http://ars.els-cdn.com/content/image/1-s2.0-S1053811905002508-gr4.jpg

   …in our neighborhood

        http://www.redfin.com/homes-for-sale#!market=boston&region_id=112&region_type=1&v=8

   Shift in Science Paradigms

        ›  Thousand years ago: science was empirical, describing natural phenomena
        ›  A few hundred years ago: theoretical branch using models, generalizations
        ›  A few decades ago: a computational approach simulating complex phenomena
        ›  Today: data exploration (eScience) unifies theory, experiment, and simulation
           –  Data captured by instruments or generated by simulator
           –  Processed by software
           –  Information/Knowledge stored in computer
           –  Scientist analyzes database/files using data management and statistics

        Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method.
        http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt

   Research data collections

        Size                        Metadata standards         Management
        Larger, discipline-based    Multiple, comprehensive    Institutionalized, organized
        Smaller, team-based         None or random             Heroic individual inside the team

   Emerging concepts that are going to stay and matter to your career

   What is data science?

        “An emerging area of work concerned with the collection,
        presentation, analysis, visualization, management, and
        preservation of large collections of information.”

        Stanton, J. (2012). Introduction to Data Science.
        http://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf

   Data science and scientific research

        Management domain
        ›  Plan, design, consult for, implement, and evaluate data
           management projects and services

        Technical domain
        ›  Ingest, store, organize, merge, filter, and transform data
           and create analysis-ready data

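The technical-domain verbs (ingest, organize, merge, filter, transform) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two CSV extracts, their column names, the sample values, and every function name are hypothetical and do not come from the talk, and the "store" step is left out to keep the sketch short.

```python
import csv
import io

# Hypothetical raw inputs: two small CSV extracts (all values invented for illustration).
SAMPLES_CSV = """sample_id,site,collected
S1,Lake A,2012-06-01
S2,Lake B,2012-06-03
S3,Lake A,
"""

READINGS_CSV = """sample_id,ph
S1,7.2
S2,not recorded
S3,6.8
"""


def ingest(text):
    """Ingest: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def merge(samples, readings):
    """Organize and merge: index readings by sample_id, then join onto samples."""
    by_id = {row["sample_id"]: row for row in readings}
    return [{**s, **by_id.get(s["sample_id"], {})} for s in samples]


def is_complete(row):
    """Filter rule (an assumption): keep rows with a collection date and a numeric pH."""
    try:
        float(row.get("ph", ""))
    except ValueError:
        return False
    return bool(row["collected"])


def transform(row):
    """Transform: convert the pH field from text to a number."""
    return {**row, "ph": float(row["ph"])}


def analysis_ready(samples_text, readings_text):
    """Run ingest, merge, filter, and transform to produce analysis-ready rows."""
    merged = merge(ingest(samples_text), ingest(readings_text))
    return [transform(r) for r in merged if is_complete(r)]


if __name__ == "__main__":
    for record in analysis_ready(SAMPLES_CSV, READINGS_CSV):
        print(record)
```

With these made-up inputs only sample S1 survives: S2 has a non-numeric pH and S3 has no collection date. Real research data would need domain-specific rules for what counts as complete.
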
   Data management is essential

        Scientific Data Management Specialist
        •  Design, develop, implement, and manage high-throughput automatic
           data processing infrastructure for large databases in a mature system
        •  Develop and improve the infrastructure supporting this system
        •  Interface with multiple data providers to design, build, and
           maintain their customized databases
        •  Clarify requirements, feature requests and bug reports for software
           developers and assist in testing code.
        http://www.bioinformatics.org/forums/forum.php?forum_id=9670

        Laboratory Data Management Specialist
        •  Administer operational database
        •  Assure the quality of data database content
        •  Interact closely with researchers, lab managers, and platform
           coordinators
        •  Track deliverables against budget and prepare data reports
        •  Collaborate closely with IT and bioinformatics colleagues
        •  Assist IT in gathering workflow requirements
        •  Test changes and updates in IT systems
        •  Create and maintain app documentation

        Data Modeling/Management Specialist
        •  Work closely with the high performance computing and the IT manager
        •  Develop a data model for complex multi-scale rocks
        •  Design and organize a database and complex queries
        •  Integrate and manage multi-scale rocks subjected to large-scale
           scientific computing applications
        http://www.ingrainrocks.com/data-management-specialist/



         “We’re increasingly finding data in
        the wild, and data scientists are
        involved with gathering data,
        massaging it into a tractable form,
        making it tell its story, and presenting
        that story to others.”
          Loukides, M. (2011). What is data science? Sebastopol, CA: O’Reilly.




   Emerging job market: Data scientists
        ›  Data scientists are more likely to be involved across the
           data lifecycle:
           –  Acquiring new data sets: 33%
           –  Parsing data sets: 29%
           –  Filtering and organizing data: 40%
           –  Mining data for patterns: 30%
           –  Advanced algorithms to solve analytical problems: 29%
           –  Representing data visually: 38%
           –  Telling a story with data: 34%
           –  Interacting with data dynamically: 37%
           –  Making business decisions based on data: 40%
    http://mashable.com/2012/01/13/career-of-the-future-data-scientist-infographic/

   Are you ready for the data challenges and opportunities?

   What is expected of data scientists?

        ›  Ability to use a wide variety of tools for documentation,
           analysis, and reporting of data
        ›  Knowledge of a subject domain
        ›  Data modeling, database and query design
        ›  Collaboration, communication, and coordination
        ›  OS, programming languages
        ›  Content and repository systems
        ›  Encoding languages

   Analytical skills: domain modeling

        ›  Requirement analysis
        ›  Workflow analysis
        ›  Data modeling
        ›  Data transformation needs analysis
        ›  Data provenance needs analysis

        ›  Interview skills, analysis and generalization skills
        ›  Ability to capture components and sequences in workflows
        ›  Ability to translate domain analysis into data models
        ›  Ability to envision the data model within the larger system architecture

   Analytical skills: from data sources to patterns, relationships, and trends

        [Diagram labels: Analytical tools, “Hacking”, Knowledge, Data products]

   Data management skills: data lifecycle and infrastructural services

        ›  Metadata standards
        ›  Encoding language
        ›  Semantic control
        ›  Identity management

        Processed, transformed, derived, calculated, … data
        ›  Common data format
        ›  Image formats
        ›  Matrix formats
        ›  Microarray file formats
        ›  Communication protocols

        Infrastructural services
        •  Data source discovery
        •  Data curation
        •  Data preservation
        •  Data integration and mashup
        •  Data citation, publication, and distribution
        •  Data linking and interoperability
        •  …

   Technology skills with excellent communication skills

        TECHNOLOGY SKILLS
        ›  Operating systems
        ›  Repository systems
        ›  Database systems
        ›  Programming languages
        ›  Encoding languages
        ›  Specialized programming

        COMMUNICATION SKILLS
        ›  Interviews
        ›  “Ice breaking”
        ›  Community building
        ›  Institutionalization
        ›  Stakeholder buy-in

   Four tracks: choose what you are good at

        Data Science core course: Applied data science, Databases

        ›  Data analytics
        ›  Data storage and management
        ›  General system management
        ›  Data visualization

        http://ischool.syr.edu/future/cas/datascience.aspx

   The iSchool’s version of data science education

        Eventually the iSchool data science program will build the
        foundation for super data scientists…

        ›  Ability to use a wide variety of tools for documentation,
           analysis, and reporting of data
        ›  Knowledge of a subject domain
        ›  Data modeling, database and query design
        ›  Collaboration, communication, and coordination
        ›  OS, programming languages
        ›  Content and repository systems
        ›  Encoding languages

        Thank You!

        Questions?
