Marjorie M.K. Hlava
President
Access Innovations, Inc.
mhlava@accessinn.com
505-998-0800

MARCH 28TH
VIRTUAL LUNCH WEBINAR:
VISUALIZATION FOR DATA
ANALYSIS: A NEW WAY TO LOOK
AT CONTENT

A picture is worth… a thousand words
   As librarians we normally look at data in lists, citations, and other
   text-based presentations. Increasingly, however, this data can be analyzed,
   manipulated, and presented as visual displays. Maps of science, places, and
   spaces, along with increased amounts of storage and computing power, have
   made working with digital assets possible. Presenting the data in new and
   visual ways allows us to see trends, changes in research directions, coverage,
   demographic trends, data overlap, and the white spaces where data does
   not exist on a topic – knowledge gaps are exposed. This talk will cover
   how the data is prepared and the options for visual display of content…
   100 words x 10 = a thousand words
Why take a visual look?
• As librarians we normally look at data in lists, citations, and other
  text-based presentations.
• Increasingly, however, this data can be analyzed, manipulated, and
  presented as visual displays.
• Maps of science, places, and spaces, along with increased amounts of
  storage and computing power, have made working with digital assets
  possible.
• Presenting the data in new and visual ways allows us to see trends,
  changes in research directions, coverage, demographic trends, data
  overlap, and the white spaces where data does not exist on a topic
  – knowledge gaps are exposed.
Visualization of data
• Needs
  − Measurement
  − Metrics
  − Numbers
• Is richer with
  − Linking
  − Semantic enrichment
  − Classification
• Shows
  − Adjacency
  − Relationships
  − Trends
  − Co-occurrence
  − Conceptual distance
• Supports
  − Forecasting
  − Trend analysis
  − Segmentation
  − Distribution
Man’s attention to visual display to convey knowledge is ancient




Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony   5
The art in maps is a longstanding tradition




Superimposing data is now common
   A mash-up example: Traffic Injury Map
• Data sources
  − UK Data Archive
  − US National Highway Safety Administration
  − Google Maps base
• Accident categories include
  − children
  − automobile
  − bicycle
  − etc.
• Data
  − time
  − place
  − type

Source: JISC TechWatch: Data Mash-ups September 2010
The most popular APIs for mashups


   a) July 2009                                        b) October 2009




                          Source: JISC TechWatch: Data Mash-ups September 2010
                          Programmable web data
Crowdsourced visualization and mapping
   Credit Crunch Mood Map
   Radio4 website • Data source • MapTube • User website questionnaire
   [Figure panels: Early responses; Final Credit Crunch Mood Map]
   Source: JISC TechWatch: Data Mash-ups September 2010
Mash-up of bird flight migrations and weather patterns
http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be




http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded




The NoiseTube application uses geolocated, SMS-like messages (as on Twitter)
with GPS sensing on mobile devices

                                      Source: JISC TechWatch: Data Mash-ups September 2010
                                      Programmable web data
Changes in our life time!




It's only the beginning
                    Source: JISC TechWatch: Data Mash-ups September 2010
                    Programmable web data
Fine, so there are nice visual maps.

What about information from databases and libraries?


Start with data – like this XML file
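
The slide's XML file is not reproduced in this text, so the following is only a minimal sketch of the starting step, assuming a hypothetical layout in which each `<record>` holds `<date>`, `<category>`, and `<title>` elements. The element names and sample values are illustrative assumptions, not the presenter's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the XML file shown on the slide is not available.
SAMPLE_XML = """
<records>
  <record>
    <date>2009</date>
    <category>journal</category>
    <title>Low-noise amplifier design</title>
  </record>
  <record>
    <date>2010</date>
    <category>proceedings</category>
    <title>Antenna arrays for radar</title>
  </record>
</records>
"""


def load_records(xml_text):
    """Return one dict per <record>, keyed by child element name."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.findall("record")
    ]


records = load_records(SAMPLE_XML)
for rec in records:
    print(rec["date"], rec["category"], rec["title"])
```

Flattening each record into a plain dictionary keeps the later indexing and counting steps independent of the XML layout.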




Index or tag using subject terms from a thesaurus or taxonomy
   Output fields: date, category, taxonomy term, frequency
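
As a rough illustration of the four fields named on this slide, the sketch below tallies how often each taxonomy term is applied per date and category. The tagged records and the `term_frequencies` helper are hypothetical; the slide does not show how the counts were actually produced.

```python
from collections import Counter

# Hypothetical tagged records: each has a date, a category, and indexing terms.
tagged = [
    {"date": "2009", "category": "journal", "terms": ["Antennas", "Radar"]},
    {"date": "2009", "category": "journal", "terms": ["Antennas"]},
    {"date": "2010", "category": "proceedings", "terms": ["Radar"]},
]


def term_frequencies(records):
    """Count records per (date, category, taxonomy term) combination."""
    counts = Counter()
    for rec in records:
        # set() ensures a term is counted once per record.
        for term in set(rec["terms"]):
            counts[(rec["date"], rec["category"], term)] += 1
    return counts


freq = term_frequencies(tagged)
rows = sorted((d, c, t, n) for (d, c, t), n in freq.items())
for row in rows:
    print(*row, sep=", ")
```

Rows in this shape can be loaded directly into most charting or network tools.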




Many views of one set of data




Load into a visualization program like Prefuse




Or Pajek
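
Pajek reads plain-text network files (`.net`) made of a `*Vertices` section followed by an `*Edges` section. The sketch below writes weighted term pairs in that basic shape; the `to_pajek` helper and the sample term pairs are illustrative assumptions, and the slides do not say how the presenter's files were generated.

```python
def to_pajek(weighted_edges):
    """Serialize {(term_a, term_b): weight} as Pajek .net text."""
    labels = sorted({term for pair in weighted_edges for term in pair})
    # Pajek vertex numbers start at 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    for (a, b), weight in sorted(weighted_edges.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


# Hypothetical co-occurrence weights between thesaurus terms.
edges = {("Antennas", "Radar"): 3, ("Radar", "Signal processing"): 1}
net_text = to_pajek(edges)
print(net_text)
```

The returned text can be saved with a `.net` extension and opened in Pajek.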




National Information Center for Educational Media (NICEM)
Albuquerque’s own
»   Sandia-developed VxInsight
»   Access Innovations NICEM
Same data, three views
Primary and secondary education in the US
Shows the US "Valley of Science"
Little science is taught in the elementary years



Requirements for Visualization
• From a society / publisher perspective
    »   Which topical areas form our core? Our periphery?
    »   Where is the coverage dense? Thin?
    »   Which topical areas are most active? Least active?
    »   Which topical areas seem to be emerging? Declining?
    »   Which topical areas are interrelated? Isolated?
    »   What are the overlaps between journals / segments?
    »   Where are the potential expansion points?
• From a thesaurus perspective
    »   What terms are too broadly defined?
    »   How do actual topical relationships differ from the
        thesaurus structure?
Using visualization to show
• From a society / publisher perspective
    »   Identifies core, boundary, and cross-border areas
    »   Provides indicators
        −  Activity
        −  Growth
        −  Relatedness
        −  Centrality
    »   Locates journal domains

• From a thesaurus perspective
    »   Identifies terms that are too broadly defined
    »   Suggests potential improvements in thesaurus structure using topic
        structures
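
The slide names the indicators without defining them. As one possible reading, the sketch below treats activity as the number of postings in the latest year and growth as the relative change since an earlier year. Both definitions, the `indicators` helper, and the sample counts are assumptions made for illustration only.

```python
def indicators(counts_by_year, first_year, last_year):
    """Assumed definitions: activity = postings in last_year;
    growth = relative change from first_year to last_year."""
    out = {}
    for term, by_year in counts_by_year.items():
        start = by_year.get(first_year, 0)
        end = by_year.get(last_year, 0)
        # Growth is undefined when the term had no postings in first_year.
        growth = (end - start) / start if start else None
        out[term] = {"activity": end, "growth": growth}
    return out


# Hypothetical yearly posting counts per thesaurus term.
counts = {
    "Antennas": {2009: 40, 2010: 50},
    "Nanotechnology": {2010: 12},
}
result = indicators(counts, 2009, 2010)
for term, values in result.items():
    print(term, values)
```

A term with no postings in the first year gets `None` for growth, which flags it as newly appearing rather than assigning it an infinite rate.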

Case Study:
          Mapping IEEE thesaurus space
• We are interested in an expanded map that
    includes adjacencies to the IEEE data
    »   Expanded term set shows adjacent white space:
        opportunities for expansion
• Overlaps and edges of the science
    »   We need comparison data
• Learn the directions in the field
    »   Low occurrence rate in IEEE documents?
    »   Linkage to terms in IEEE documents?
• Where do we find these terms? How can we
    add them?


The process
• Built a rule base to auto-index IEEE content
    »   “90% accuracy out of the box on journal data”*
    »   “80% out of the box on proceedings data”*
• The overlapping data sets
    »   Auto-indexed 1.2 million Xplore records
    »   10 years of US patent data
    »   10 years of Medline
• Term sets used
    »   IEEE thesaurus terms rule base
    »   Medical Subject Headings (MeSH) (and simple rule base)
    »   Defense Technical Information Center (DTIC) Thesaurus
        (and simple rule base)
    »   Similar level of detail to current IEEE thesaurus terms
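
The rule base itself is not shown in the slides. As a stand-in, the sketch below uses simple pattern matching to assign preferred terms to text; the `RULES` table and `auto_index` function are hypothetical and far simpler than a production rule base, so this only illustrates the general idea of rule-driven indexing.

```python
import re

# Hypothetical rules: preferred thesaurus term -> text patterns that trigger it.
RULES = {
    "Antennas": [r"\bantennas?\b", r"\baerials?\b"],
    "Radar": [r"\bradar\b"],
}


def auto_index(text, rules=RULES):
    """Return the sorted preferred terms whose patterns occur in the text."""
    found = []
    for term, patterns in rules.items():
        if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
            found.append(term)
    return sorted(found)


terms = auto_index("A phased-array aerial for weather radar")
print(terms)
```

Mapping synonyms such as "aerial" to one preferred term is what lets different collections be compared on the same vocabulary.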

Defining expanded term space
         1. The data - Select related corpus
   [Diagram labels]
   IEEE: 1.2M documents; 2k terms
   Patents: 475k patents
   PubMed: 525k docs
   14k DTIC
   24k MeSH

Defining expanded term space
              2. Identify related terms
Use the IEEE Thesaurus to index the three collections
   [Diagram labels: 2k terms; IEEE, 1.2M documents]




Defining expanded term space
             2. Identify related terms
Use MeSH and DTIC to also index the three collections
   [Diagram labels: 2k terms; IEEE, 1.2M documents]




Defining expanded term space
             3. Resulting term set
The co-indexed items from the three collections
   [Diagram labels: 2k terms; IEEE, 1.2M documents]




Defining expanded term space
              4. Term:Term Matrix
Where do the articles and their indexing intersect?
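
One common way to build a term-by-term matrix is to count how many documents carry both terms of each pair. The sketch below does that for a toy set of indexed documents; the helper names and sample data are illustrative assumptions, since the slide does not state how the matrix was computed.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(doc_terms):
    """Count, for each unordered term pair, the documents indexed with both."""
    pairs = Counter()
    for terms in doc_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def as_matrix(pairs):
    """Expand pair counts into a symmetric matrix with a label list."""
    labels = sorted({term for pair in pairs for term in pair})
    pos = {term: i for i, term in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (a, b), n in pairs.items():
        matrix[pos[a]][pos[b]] = n
        matrix[pos[b]][pos[a]] = n
    return labels, matrix


# Hypothetical indexing terms for three documents.
docs = [["Antennas", "Radar"], ["Radar", "Antennas", "Lasers"], ["Lasers"]]
pairs = cooccurrence(docs)
labels, matrix = as_matrix(pairs)
print(labels)
for row in matrix:
    print(row)
```

High cell values mark term pairs that frequently intersect in the indexing, while zero cells mark pairs that never share a document.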




Visualization Strategies
   [Diagram labels: Matrix; Visualization Software]




All data up-posted to the top level
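
Up-posting generally means crediting a narrower term's postings to its broader terms. The sketch below rolls counts all the way up to the top-level term of each branch; the `BROADER` map and the counts are hypothetical, and the actual hierarchy used in the study is not shown in the slides.

```python
from collections import Counter

# Hypothetical hierarchy: term -> its broader term (None marks a top term).
BROADER = {
    "Engineering": None,
    "Antennas": "Engineering",
    "Antenna arrays": "Antennas",
    "Medicine": None,
    "Radiology": "Medicine",
}


def top_term(term, broader=BROADER):
    """Follow broader-term links until a top-level term is reached."""
    while broader.get(term) is not None:
        term = broader[term]
    return term


def up_post(counts, broader=BROADER):
    """Add every term's count to the top-level term of its branch."""
    rolled = Counter()
    for term, n in counts.items():
        rolled[top_term(term, broader)] += n
    return rolled


rolled = up_post({"Antenna arrays": 4, "Antennas": 2, "Radiology": 3})
print(dict(rolled))
```

Rolling counts up this way trades detail for a smaller, more readable map with one node per top-level category.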




Many map options
Previous Experience                                    IEEE Experience




IEEE Portfolio
   [Figure: map of the IEEE portfolio; the layout is not recoverable from the text. Node labels:]
   Electromag Compat Soc; Prof Commun Society; Reliability Society; Education Society;
   Sensors Council; Ultrason, Ferro …; Robot Autom Soc; Oceanic Engng Soc;
   Instr Measur Soc; Council Supercond; Dielectr El Insul Soc; Nucl Plasma Sci Soc;
   Sys Man Cyber Society; Computer Society; Prod Saf Engng Soc; Compon, Packag …;
   Systems Council; Photonics Soc; Magnetics Soc; Nanotech Council; Social Impl Techn;
   Computer Intelligence Society; Eng Med Biol Sci; Council Electr Design Auto;
   Industr Electr Soc; Industry Appl Soc; Geosci Rem Sens Soc; Antennas Propag Soc;
   Power Electron Soc; Power & Energy Soc; Microwave Theory Soc; Circuits & Systems;
   Signal Proc Soc; Consumer Electr Soc; Electron Dev Soc; Broadcast Techn Soc;
   Intell Transp Sys Soc; Solid St Circuits Soc; Aerosp Electr Sys Soc;
   Vehicular Techn Soc; Commun Soc; Info Theory Soc
Radial Visualization




Subsidiary radials
 Journal of Instrumentation
   [Figure: radial panels; the layout is not recoverable from the text. Panel labels:]
   Compon, Packag …; Dielectr El Insul Soc; Instr Measur Soc; Ultrason, Ferro …;
   Electromag Compat Soc; Prod Saf Engng Soc; Council Supercond; Magnetics Soc;
   Sensors Council; Antennas Propag Soc; Nanotech Council; Oceanic Engng Soc;
   Geosci Rem Sens Soc; Nucl Plasma Sci Soc
The research team
• Access Innovations / Data Harmony
    »   Founded in 1978
    »   Data enrichment and normalization
    »   Suite of semantic enrichment tools

• SciTechStrategies
    »   Understanding data through visualization

• IEEE Indexing & Abstracting Group




Use a Thesaurus to Label Maps
   [Figure: map labeled with thesaurus terms; the layout is not recoverable from the text. Labels in reading order:]
   Construction; Packaging; Consumer Products; Vehicles, Parts; Welding; Gearing;
   Automotive + Defense; Flow; Boats; Appliances; Food; Brakes; Hygiene; Aircraft;
   Dynamics; Sprayers; Cleaning; IC Engines; Turbines; Industrial Products; Pumps;
   Valves; Exhaust; Leisure; Fitness; Outerwear; Footwear; Control; Medical Devices;
   Pipes; Toys; Health Care; Clocks; Games; Blasting; Radiology; Cooling; Measurement;
   Energy; Med Instruments; Agriculture; Cables; Heating; Plants, Micro-orgs;
   Conveyers; Oilfield Services; Pharma; Lamps; Components; Printing; Telecom;
   Computer HW/SW; Motors; Acyclic Comp; Semiconductors; Lubricants; Metals; Optics;
   Lasers; Rubber; Molding; Paper; Displays; Electronics; Catalysis; Magn/Elect;
   Conductors; Layers; Circuits; Textiles; Electrochem; Magnets; Macromolecules;
   Disk; Amplifiers; Photochem; Chemicals; Coatings
Questions Answered
• Is there a way, using our own information, to forecast our
    direction?
• Where is the industry headed? What about by technology
    sector?
• Does our coverage match our mission and vision?
• Can we become smarter about our data and potential
    markets using our collection in new ways?
• Are the societies publishing and talking about what their
    charter indicates they cover?
• What are the trends? Are topics emerging or cooling?
• Can we use technology and our own data to explore these
    questions while enhancing our data?


Conference Strategy




Publication Strategy




JASIST reference



We looked at
              Visualization of data
• Finding the metrics
    »   Measurement
    »   Numbers
    »   Terms as indicators
• How to enrich with
    »   Linking
    »   Semantic enrichment
    »   Classification
• Ways to show
    »   Adjacency
    »   Relationships
    »   Trends
    »   Co-occurrence
    »   Conceptual distance
• Maps supporting
    »   Forecasting
    »   Trend analysis
    »   Segmentation
    »   Distribution
Effective maps require

• Contextual data
• Detailed data
• Classification methods
• At least two directions in the matrix
• A little art for fun




Changing the way we interact with reality

Acrossair’s Augmented reality application – just point your phone at it
                     Source: JISC TechWatch: Data Mash-ups September 2010
                     Programmable web data
It just takes a little imagination




                             Thank you


Marjorie M.K. Hlava
President, Access Innovations
505-998-0800
mhlava@accessinn.com

Visualization for Data Analysis: A New Way to Look at Content

  • 1.
    Marjorie M.K. Hlava. President Access Innovations, Inc. [email protected] 505-998-0800 MARCH 28TH VIRTUAL LUNCH WEBINAR: VISUALIZATION FOR DATA ANALYSIS: A NEW WAY TO LOOK AT CONTENT 1
  • 2.
    A picture isworth… thousand words As librarians we normally look at data in lists, citations, and other text based presentations. Increasingly however this data can be analyzed, manipulated and presented as visual displays. Maps of science, places and spaces, increased amounts storage and computing power have made working with digital assets possible. Presenting the data in new and visual ways allow us to see trends, changes in research directions, coverage, demographic trends, data overlap and the white spaces where data does not exist on a topic – knowledge gaps are exposed. This talk will cover how the data is prepared and options for visual display content… 100 words x10 = thousand words 2
  • 3.
    Why take avisual look? • As librarians we normally look at data in lists, citations, and other text based presentations. • Increasingly however this data can be analyzed, manipulated and presented as visual displays. • Maps of science, places and spaces, increased amounts storage and computing power have made working with digital assets possible. • Presenting the data in new and visual ways allow us to see trends, changes in research directions, coverage, demographic trends, data overlap and the white spaces where data does not exist on a topic – knowledge gaps are exposed. 3
  • 4.
    Visualization of data •Needs • Is richer with − Measurement − Linking − Metrics − Semantic enrichment − Numbers − Classification • Shows − Adjacency • Supports − Relationships − Forecasting − Trends − Trend analysis − Co – occurrence − Segmentation − Conceptual distance − Distribution 4
  • 5.
    Man’s attention to visual display to convey knowledge is ancient Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 5
  • 6.
    The art inmaps is a longstanding tradition Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 6
  • 7.
    Well Formed Data• Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 7
  • 8.
    Well Formed Data• Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 8
  • 9.
    Super imposing datais now common A mash up example Traffic Injury Map UK Data Archive US National Highway Safety Administration Google Maps Base Accident categories include children automobile bicycle etc. Data time place type Source: JISC TechWatch: Data Mash-ups September 2010 Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 9
  • 10.
    The most popularAPIs for mashups  a) July 2009 b) October 2009 Source: JISC TechWatch: Data Mash-ups September 2010 Programmable web data Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 10
  • 11.
    Radio4 website Data source MapTube Credit Crunch Mood Map User Website questionnaire Crowdsourced visualization and mapping Early responses Final Credit Crunch Mood Map Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Source: JISC TechWatch: Data Mash-ups September 2010 11
  • 12.
    Mash up ofbird flight migrations and weather patterns http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 12
  • 13.
    http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
  • 14.
    Noise Tube Applicationuses geo-locations of SMS like Twitter with GPS sensing on mobile devices Source: JISC TechWatch: Data Mash-ups September 2010 Programmable web data Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 14
  • 15.
    Changes in ourlife time! Its only the beginning Source: JISC TechWatch: Data Mash-ups September 2010 Programmable web data Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 15
  • 16.
    Fine, So thereare nice visual maps, What about information from databases and libraries?? Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 16
  • 17.
    Start with data, like this XML file
  • 18.
    Index or tag using subject terms from a thesaurus or taxonomy: date, category, taxonomy term, frequency
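A minimal sketch of the tally this slide describes, assuming each indexed record carries a date, a category, and a list of thesaurus terms. The record layout, field names, and sample values are hypothetical and are not taken from the presentation.

```python
from collections import Counter

# Hypothetical tagged records: a date, a category, and assigned thesaurus terms.
records = [
    {"date": "2009-07", "category": "journal", "terms": ["Radar", "Antennas"]},
    {"date": "2009-07", "category": "journal", "terms": ["Radar"]},
    {"date": "2009-10", "category": "proceedings", "terms": ["Antennas"]},
]

def term_frequencies(records):
    """Count how often each (date, category, term) combination occurs."""
    counts = Counter()
    for rec in records:
        for term in set(rec["terms"]):  # count a term at most once per record
            counts[(rec["date"], rec["category"], term)] += 1
    return counts

rows = term_frequencies(records)
for (date, category, term), freq in sorted(rows.items()):
    print(date, category, term, freq)
```

Each output row has the four fields named on the slide, which is the kind of table a visualization tool can take as input.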
  • 19.
    Many views of one set of data
  • 20.
    Load to a visualization program like Prefuse
  • 21.
    Or Pajek
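As an illustration of the loading step, the sketch below writes weighted term pairs in Pajek's plain-text `.net` layout (a `*Vertices` section followed by an `*Edges` section). The function name `to_pajek` and the sample pair are hypothetical, and the presentation does not say how its own files were prepared.

```python
def to_pajek(weights):
    """Render {(term_a, term_b): weight} as Pajek .net text."""
    labels = sorted({term for pair in weights for term in pair})
    # Pajek vertex ids are 1-based.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    for (a, b), weight in sorted(weights.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"

# Hypothetical example: two terms that co-occur in two documents.
net_text = to_pajek({("Radar", "Antennas"): 2})
print(net_text)
```

The returned text can be saved with a `.net` extension and opened in Pajek.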
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
    National Information Center for Educational Media (NICEM)
    Albuquerque’s own
      » Sandia-developed VxInsight
      » Access Innovations NICEM
    Same data, three views: primary and secondary education in the US
    Shows the US "Valley of Science": little science is taught in the elementary years
  • 27.
  • 28.
  • 29.
    Requirements for Visualization
    • From a society / publisher perspective
      » Which topical areas form our core? periphery?
      » Where is the coverage dense? thin?
      » Which topical areas are most active? least active?
      » Which topical areas seem to be emerging? declining?
      » Which topical areas are interrelated? isolated?
      » What are the overlaps between journals / segments?
      » Where are the potential expansion points?
    • From a thesaurus perspective
      » What terms are too broadly defined?
      » How do actual topical relationships differ from the thesaurus structure?
  • 30.
    Using visualization to show
    • From a society / publisher perspective
      » Identifies core, boundary and cross-border areas
      » Provides indicators: activity, growth, relatedness, centrality
      » Locates journal domains
    • From a thesaurus perspective
      » Identifies terms that are too broadly defined
      » Suggests potential improvements in thesaurus structure using topic structures
  • 31.
    Case Study: Mapping IEEE thesaurus space
    • We are interested in an expanded map that includes adjacencies to the IEEE data
      » Expanded term set shows adjacent white space; opportunities for expansion
    • Overlaps and edges of the science
      » We need comparison data
    • Learn the directions in the field
      » Low occurrence rate in IEEE documents?
      » Linkage to terms in IEEE documents?
    • Where do we find these terms? How can we add them?
  • 32.
    The process
    • Built a rule base to auto-index IEEE content
      » “90% accuracy out of the box on journal data”*
      » “80% out of the box on proceedings data”*
    • The overlapping data sets
      » Auto-indexed 1.2 million Xplore records
      » 10 years of US Patent data
      » 10 years of Medline
    • Term sets used
      » IEEE thesaurus terms rule base
      » Medical Subject Headings (MeSH) (and simple rule base)
      » Defense Technical Information Center (DTIC) Thesaurus (and simple rule base)
      » Similar level of detail to current IEEE thesaurus terms
  • 33.
    Defining expanded term space
    1. The data: select related corpus
    Diagram labels: IEEE (2k terms, 1.2M documents); DTIC (14k); MeSH (24k); patents (475k); PubMed (525k docs)
  • 34.
    Defining expanded term space
    2. Identify related terms: use the IEEE Thesaurus to index the three collections (IEEE: 2k terms, 1.2M documents)
  • 35.
    Defining expanded term space
    2. Identify related terms: use MeSH and DTIC to also index the three collections (IEEE: 2k terms, 1.2M documents)
  • 36.
    Defining expanded term space
    3. Resulting term set: the co-indexed items from the three collections (IEEE: 2k terms, 1.2M documents)
  • 37.
    Defining expanded term space
    4. Term:Term matrix: where do the articles and their indexing intersect?
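One common way to build such a term:term matrix is to count, for every pair of terms, the number of documents indexed with both. The sketch below assumes each document is reduced to its list of assigned terms. The vocabulary prefixes and sample terms are hypothetical, and the slide does not specify how the project computed its matrix.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(doc_terms):
    """Count documents in which each unordered pair of terms appears together."""
    counts = Counter()
    for terms in doc_terms:
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical documents indexed with terms from three vocabularies.
docs = [
    ["IEEE:Radar", "MeSH:Ultrasonography"],
    ["IEEE:Radar", "MeSH:Ultrasonography", "DTIC:Sonar"],
    ["IEEE:Antennas"],
]
matrix = cooccurrence(docs)
for pair, n in sorted(matrix.items()):
    print(pair, n)
```

Only non-zero cells are stored, which keeps the structure manageable when thousands of terms are spread over more than a million documents.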
  • 38.
    Visualization Strategies Visualization Matrix Software
  • 39.
    All data up-posted to the top level
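Up-posting credits a narrow term's postings to its broader terms. The sketch below rolls counts all the way up to the top-level term, assuming a simple child-to-parent map. The hierarchy and counts are hypothetical, not drawn from the IEEE thesaurus.

```python
from collections import Counter

# Hypothetical broader-term links (child -> parent); top-level terms have no entry.
broader = {
    "Phased arrays": "Antennas",
    "Antennas": "Electromagnetics",
    "Radar": "Electromagnetics",
}

def top_term(term, broader):
    """Follow broader-term links until a top-level term is reached."""
    seen = set()
    while term in broader and term not in seen:  # 'seen' guards against cycles
        seen.add(term)
        term = broader[term]
    return term

def up_post(term_counts, broader):
    """Roll each term's count up to its top-level ancestor."""
    rolled = Counter()
    for term, n in term_counts.items():
        rolled[top_term(term, broader)] += n
    return rolled

rolled = up_post({"Phased arrays": 3, "Radar": 2, "Education": 1}, broader)
print(rolled)
```

Rolling everything up to the top level gives a small number of categories, which suits an overview map.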
  • 40.
    Many map options: previous experience, IEEE experience
  • 41.
    IEEE Portfolio (figure: map of IEEE societies and councils)
  • 42.
    Radial Visualization
  • 43.
    Subsidiary radials: Journal of Instrumentation (figure: radial map of related IEEE societies and councils)
  • 44.
    The research team
    • Access Innovations / Data Harmony
      » Founded in 1978
      » Data enrichment and normalization
      » Suite of semantic enrichment tools
    • SciTechStrategies
      » Understanding data through visualization
    • IEEE Indexing & Abstracting Group
  • 45.
    Use a Thesaurus to Label Maps (figure: map labeled with thesaurus terms for industry and technology areas, such as Telecom, Semiconductors, Electronics and Chemicals)
  • 46.
    Questions Answered
    • Is there a way, using our own information, to forecast our direction?
    • Where is the industry headed? What about by technology sector?
    • Does our coverage match our mission and vision?
    • Can we become smarter about our data and potential markets using our collection in new ways? Are the societies publishing and talking about what their charter indicates they cover?
    • What are the trends: are topics emerging or cooling?
    • Can we use technology and our own data to explore these questions while enhancing our data?
  • 47.
    Conference Strategy
  • 48.
    Publication Strategy: JASIST reference
  • 49.
    We looked at Visualization of data
    • Finding the metrics
    • How to enrich with
      » Measurement
      » Linking
      » Numbers
      » Semantic enrichment
      » Terms as indicators
      » Classification
    • Ways to show
    • Maps supporting
      » Adjacency
      » Relationships
      » Forecasting
      » Trends
      » Trend analysis
      » Co-occurrence
      » Segmentation
      » Conceptual distance
      » Distribution
  • 50.
    Effective maps require
    • Contextual data
    • Detailed data
    • Classification methods
    • At least two directions in the matrix
    • A little art for fun
  • 51.
    Changing the way we interact with reality. Acrossair’s augmented reality application: just point your phone at it. Source: JISC TechWatch: Data Mash-ups, September 2010. Programmable web data
  • 52.
    It just takes a little imagination. Thank you. Marjorie M.K. Hlava, President, Access Innovations, 505-998-0800, mhlava@accessinn.com

Editor's Notes

  • #45 Access Innovations and its software brand Data Harmony are known for the high caliber of their data: clean, well formed and accurately semantically enriched. They updated the IEEE thesaurus in 2005, building a rule base for use in indexing at the same time. At the launch of the project, the application of the terms to the IEEE journal content was 90% accurate (that is, 90% of the terms suggested are what well-trained indexers would use from a controlled vocabulary) and 80% accurate on the more difficult proceedings data. The rule base has improved since then, and the IEEE production team only needs to spot check about 10% of the documents to ensure that a high standard of indexing is maintained. This has allowed IEEE to process many more documents with the same team and has made the process more fun. Because the Data Harmony software removes many of the clerical aspects of the indexing process and leverages the mental processing of the staff, the indexers have time to think about the content, the thesaurus terms, what should be added and what other information can be collected to continue to enrich the files. The accuracy is high enough that we simply indexed the entire contents of the Xplore database, back to the earliest records, in a single overnight process. Then, to explore the edges of science, we also indexed the 1.2 million records using the Medical Subject Headings and the Defense Technical Information Center thesauri, with similar accuracy results.
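The accuracy figure described in this note (the share of suggested terms that trained indexers would also have chosen) can be sketched as a simple ratio. The function name and sample terms below are hypothetical, and this is not the evaluation procedure the project actually used.

```python
def suggestion_accuracy(suggested, approved):
    """Fraction of suggested terms that the indexers also approved."""
    suggested = set(suggested)
    if not suggested:
        return 0.0
    return len(suggested & set(approved)) / len(suggested)

# Hypothetical spot check: ten suggested terms, nine accepted by the indexer.
suggested_terms = [f"term{i}" for i in range(10)]
approved_terms = suggested_terms[:9]
accuracy = suggestion_accuracy(suggested_terms, approved_terms)
print(f"{accuracy:.0%}")
```

This measures only whether suggested terms are acceptable. It says nothing about terms an indexer would have added that the rule base missed.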