Scheduled Scientific Data Releases Using .backup Volumes

May 23rd, 2008

Chris Kurtz
Zach Schimke
Mars Space Flight Facility
Arizona State University
Outline

Introduction: The Mars Space Flight Facility
Spacecraft Data and You
Image Processing
The Problem: Released and Unreleased Data
The Solution: AFS and .backups
Overview of MSFF use of AFS
Feature Requests
Questions




Introduction

A NASA/Jet Propulsion Laboratory-funded research institution
Scientists, Mission Planners, Developers, SysAdmins
Four instruments on Mars:
  TES (Thermal Emission Spectrometer)
     Mars Global Surveyor (1996-2006)
  THEMIS (THermal EMission Imaging System)
     Mars Odyssey (2001 to present)
  Mini-TES
     MER rovers Spirit and Opportunity (2004 to present)
Over 80 TB of collected mission data (including AFS)


Spacecraft Data and You

Instrument captures data on Mars
Spacecraft combines data from all instruments, adds spacecraft telemetry, and sends it to Earth via radio, where it is received by the DSN (Deep Space Network)
JPL correlates, decodes, and packages the data for each instrument
MSFF pulls the raw data for its instruments from JPL
MSFF processes the data through multiple steps




Spacecraft Data and You: THEMIS Data Types

[Figure: three THEMIS image strips, labeled IR DAY, IR NIGHT, and VISIBLE]

THEMIS Data Types:
  Infrared (IR)
     100 m per pixel
     Daytime and nighttime images
  Visible Light (VIS)
     18 m per pixel
Image Processing

[Diagram: processing pipeline Raw (EDR) → Calibrated (RDR) → Projected (GEO), with 2x and 4x data-size annotations between stages]

SFDU: Standard Formatted Data Unit
EDR: Experiment Data Record
RDR: Reduced Data Record
GEO: Geometrically Registered Record
Image Processing

Due to the volume of data, two 100-CPU Linux clusters are used for processing, and the resulting products are stored on a high-end NFS server from Network Appliance
These data products are made available to Science Team members immediately via authenticated services
The JPL contract requires data to be released to the public 6 months after being received (giving Operations time to validate, calibrate, process, perform scientific analysis, etc.) – this is the crux of the problem
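
Since everything downstream hinges on this 6-month rule, here is a minimal sketch of the eligibility test; the helper name and the 183-day approximation are ours, for illustration only:

    # Sketch: has a product cleared its ~6-month proprietary period?
    # The helper name and 183-day approximation are illustrative.
    from datetime import date, timedelta

    def publicly_releasable(received, today=None):
        today = today or date.today()
        return today >= received + timedelta(days=183)  # ~6 months

    # e.g. data received 2008-01-10 clears around 2008-07-11
    assert publicly_releasable(date(2008, 1, 10), today=date(2008, 7, 12))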

Image Processing




[Image: Snow and Ice in Udzha Crater (VIS – False Color)]

Image Credit: NASA/JPL/ASU
Image Processing




[Image: Hematite in Meridiani Planum (IR – False Color)]

Image Credit: NASA/JPL/ASU
The Problem: Released and Unreleased Data

There is a 6-month grace period between data collection and public release
The previous methodology was to copy over 25 TB of data via rsync from internal NFS to stand-alone web server(s)... this had issues (a sketch of the old process follows this list):
  It took forever just to build the file list
  The rsync itself took days
  Releases took longer and longer (we regularly re-process old data with updated calibration, so we have to re-release it)
  Webservers needed fast, expensive, redundant disk
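A rough reconstruction of that old process, as a hedged sketch; all paths, the host name, and the rsync flags are placeholders, not MSFF's actual script:

    #!/usr/bin/env python
    # Hypothetical sketch of the retired rsync-based release: push each
    # released data tree from the internal NFS mount to a stand-alone
    # web server. All paths and host names are placeholders.
    import subprocess

    RELEASED_TREES = ["/nfs/themis/RDR", "/nfs/themis/GEO"]  # assumed layout
    WEB_HOST = "web1.example.edu"                            # placeholder

    for src in RELEASED_TREES:
        # Just enumerating 25+ TB of files took hours; the copy took days.
        subprocess.check_call([
            "rsync", "-a", "--delete",
            src + "/",
            "%s:/var/www/data%s/" % (WEB_HOST, src[len("/nfs"):]),
        ])
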
The Solution: AFS and .backup


Data is moved from expensive NFS to cheap AFS

AFS excels at storing large amounts of read-only data redundantly and at reasonable cost

AFS snapshot backups allow us to keep public data public and private data private




The Solution: NFS vs AFS

NFS (Network Appliance)
  High Speed (Trunked GigE)
  High throughput (100,000 ops/sec)
  Redundant (modified RAID4, clustered servers)
  EXPENSIVE!!! ($5000 per TB)

vs.

AFS (CentOS Linux Servers)
  Fast RO
  Slower RW
  Redundant (RAID5)
  Cheap! (Less than $1000 per TB)

The Solution: .backup volumes

AFS .backup volumes are point-in-time copies that are independent of the original volume (a “reverse delta”) – since the original volume can be altered without affecting the .backup, this is useful!
New methodology (a sketch follows this list):
  All volumes of released data have a .backup volume created using standard tools (vos backup)
  Website references the .backup volume names
This new process takes an hour or two (depending on how many new .backup volumes are created)
Process moved from SysAdmins to Operations
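A minimal sketch of the release step under the new methodology; vos backup is the standard OpenAFS command named on the slide, while the volume-list file and the way credentials are obtained are illustrative assumptions:

    #!/usr/bin/env python
    # Sketch: refresh the .backup clone for every volume whose data has
    # passed its release date. "released_volumes.txt" is an assumed input.
    import subprocess

    with open("released_volumes.txt") as f:       # one volume name per line
        released = [line.strip() for line in f if line.strip()]

    for vol in released:
        # "vos backup" (re)creates <vol>.backup as a point-in-time clone,
        # so later reprocessing of the RW volume never changes what the
        # public website serves.
        subprocess.check_call(["vos", "backup", vol, "-cell", "mars.asu.edu"])

The public web tree then only ever references the frozen <vol>.backup names, so refreshing a release is just rerunning vos backup for the affected volumes.
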
MSFF and OpenAFS

Once processed, data is stored in AFS in 100-orbit “chunks” (AFS volumes) according to various data types, such as “themis.RDR.V284XXRDR” (THEMIS instrument container volume, RDR container volume, Visible Camera orbits 28400-28499 RDRs); a sketch of this naming scheme appears after this list
Co-Investigators at other universities access the data via authenticated AFS, FTP, and website, as it is proprietary... for a while
Public access via web, FTP, and AFS
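
The naming scheme as code; only the example pattern “themis.RDR.V284XXRDR” comes from the slide, the helper and its signature are hypothetical:

    # Sketch: derive the 100-orbit container volume name from an orbit
    # number. Only the pattern "themis.RDR.V284XXRDR" is from the slide;
    # the helper itself is illustrative.
    def chunk_volume(instrument, data_type, camera, orbit):
        hundreds = orbit // 100            # e.g. orbit 28437 -> 284
        return "%s.%s.%s%dXX%s" % (instrument.lower(), data_type,
                                   camera, hundreds, data_type)

    assert chunk_volume("THEMIS", "RDR", "V", 28437) == "themis.RDR.V284XXRDR"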




MSFF OpenAFS Specifics

Cell: mars.asu.edu
AFS DB servers are Xen virtual machines
Servers:
  8 AFS file servers
  CentOS 5.1 (formerly Fedora Core 4)
  15,000 volumes / 35 TB of AFS storage (RAID 5)
     4,000 read/write volumes (8,000 .readonly)
     3,500 .backup
Nagios monitoring of BOS, disk space, and rxdebug



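As an illustration of the rxdebug check, a minimal Nagios-style probe; the host, port, and message text are placeholders, and exit codes follow the usual Nagios convention:

    #!/usr/bin/env python
    # Sketch of a Nagios-style probe: ask a file server for its version
    # via rxdebug and exit 0 (OK) or 2 (CRITICAL). Host/port are
    # placeholders for illustration.
    import subprocess, sys

    HOST, PORT = "fs1.mars.asu.edu", "7000"   # assumed file server address

    try:
        out = subprocess.check_output(["rxdebug", HOST, PORT, "-version"])
        print("OK - %s answers rxdebug: %s" % (HOST, out.decode().strip()))
        sys.exit(0)
    except (subprocess.CalledProcessError, OSError):
        print("CRITICAL - no rxdebug response from %s" % HOST)
        sys.exit(2)
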
Feature Requests

Additional snapshot capability besides .backup
  At least one .snapshot, but more would be nicer
  File server implied ACLs for this .snapshot
Volume autorelease
  Built-in mechanism to automatically release volumes (a stand-in sketch follows this list)
Better VOS granularity
  Allow users to release specific volumes or volume sets rather than it being all or nothing
(Open)LDAP support for the PT Server
Better cron support (mostly solved by k5start)

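Until a built-in autorelease exists, the gap is typically filled with a script; a hedged sketch of a cron-driven pass under k5start, where the volume names and keytab path are placeholders:

    #!/usr/bin/env python
    # Sketch of a stand-in for "volume autorelease": push every listed RW
    # volume to its read-only sites. Volume names are placeholders.
    import subprocess

    for vol in ["themis.RDR.V284XXRDR", "themis.GEO.V284XXGEO"]:
        subprocess.check_call(["vos", "release", vol, "-cell", "mars.asu.edu"])

    # Illustrative crontab entry: k5start holds Kerberos credentials and
    # runs aklog (-t) so the job has an AFS token (keytab is a placeholder):
    # 0 3 * * * k5start -U -f /etc/afs-release.keytab -t -- python /usr/local/bin/autorelease.py
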
Questions




[Image: Gusev Crater (VIS – False Color)]
Image Credit: NASA/JPL/ASU; Mars Express HRSC Camera, ESA/DLR/FU Berlin (G. Neukum)
Final Remarks




[Image: Utopia Plains (IR/VIS – False Color)]

Image Credit: NASA/JPL/ASU
