University of Iowa | Mobile Sensing Laboratory
CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing Applications
IPSN 2014
Farley Lai, Syed Shabih Hasan, Austin Laugesen, Octav Chipara
Department of Computer Science, University of Iowa
Mobile Sensing Applications (MSAs)
[Figure: two example MSA pipelines]
• Speaker Identification: Speech Recording → VAD → Feature Extraction → HTTP Upload (with Speaker Models)
• Activity Recognition: Bluetooth Data Collection → Feature Extraction → Activity Classification (Sitting, Standing, Walking, Running, Climbing Stairs, …)
Challenges
• Mobile sensing applications are difficult to implement on Android devices
  – concurrency
  – high frame rates
  – robustness
• Resource limitations and the Java VM worsen these problems
  – additional cost of virtualization
  – significant overhead of garbage collection
Related Work
• Support for MSAs
  – SeeMon, Coordinator: constrained queries
  – JigSaw: customized pipelines
  ⇒ CSense provides a high-level stream programming abstraction that is general and suitable for a broad range of MSAs
• CSense builds on prior data flow models
  – synchronous data flows: static scheduling and optimizations (e.g., StreamIt, Lustre)
  – asynchronous data flows: more flexible but lower performance (e.g., Click, XStream/WaveScript)
CSense Toolkit
• Programming model
• Compiler
• Run-time environment
• Evaluation
Programming Model
• Applications are modeled as Stream Flow Graphs (SFGs)
  – builds on prior work on asynchronous data flow graphs
  – incorporates novel features to support MSAs

// create components
addComponent("audio", new AudioComponentC(rateInHz, 16));
addComponent("rmsClassifier", new RMSClassifierC(rms));
addComponent("mfcc", new MFCCFeaturesG(speechT, featureT));
...

// wire components
link("audio", "rmsClassifier");
toTap("rmsClassifier::below");
link("rmsClassifier::above", "mfcc::sin");
fromMemory("mfcc::fin");
...
Memory Management
• Goal: reduce memory overhead introduced by garbage collection and copy operations
• Pass-by-reference semantics
  – allows data to be shared between components
• Explicit inclusion of memory management in SFGs
  – focuses the programmer's attention on memory operations
  – enables static analysis by tracking data exchanges globally
  – allows for an efficient implementation
Memory Management
• Data flows from sources, through links, to taps
• Implementation:
  – sources implement memory pools that hold several frames
  – reference counters track how frames are shared
  – taps decrement reference counters and return unused frames to their pools (sketched below)
[Figure: Speaker Identification SFG annotated with memory pools for audio data, MFCCs, and filenames]
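The slide above describes sources that pre-allocate frames in memory pools and taps that decrement reference counters. A minimal self-contained Java sketch of that idea follows; the class and method names (Frame, FramePool, acquire, share, release) are illustrative and are not the CSense API.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a source-owned memory pool with reference-counted frames.
final class Frame {
    final short[] data;                                  // raw samples, sized at compile time
    final AtomicInteger refs = new AtomicInteger(0);
    private final FramePool owner;
    Frame(int size, FramePool owner) { this.data = new short[size]; this.owner = owner; }
    void share()   { refs.incrementAndGet(); }           // a link hands the frame to another component
    void release() {                                     // a tap drops its reference
        if (refs.decrementAndGet() == 0) owner.recycle(this);
    }
}

final class FramePool {
    private final ArrayBlockingQueue<Frame> free;
    FramePool(int frames, int frameSize) {
        free = new ArrayBlockingQueue<>(frames);
        for (int i = 0; i < frames; i++) free.add(new Frame(frameSize, this));
    }
    Frame acquire() throws InterruptedException {        // the source takes a pre-allocated frame
        Frame f = free.take();
        f.refs.set(1);
        return f;
    }
    void recycle(Frame f) { free.add(f); }                // returned to the pool, never garbage collected
}

Because every frame returns to its pool when its counter hits zero, steady-state operation allocates nothing, which is the property the producer-consumer benchmark later measures.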
Concurrency Model
• Goal: an expressive concurrency model that can be analyzed statically
• Components are partitioned into execution domains
  – components in the same domain execute on a single thread
  – frame exchanges between domains are mediated by shared queues
  – other data sharing between components goes through a tuple space
• Concurrency is specified as constraints
  – NEW_DOMAIN / SAME_DOMAIN
  – heuristic assignment of components to domains to minimize data exchanges between domains
• Static analysis may identify some data races
Concurrency Model

getComponent("audio").setThreading(Threading.NEW_DOMAIN);
getComponent("httpPost").setThreading(Threading.NEW_DOMAIN);
getComponent("mfcc").setThreading(Threading.SAME_DOMAIN);

[Figure: compiler transformation: the SFG is partitioned into two domains and a shared queue is inserted between them; a sketch of the resulting two-thread structure follows]
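A rough, self-contained illustration of what the compiler transformation amounts to at runtime: frames crossing a domain boundary go through a shared queue, so each domain keeps running on its own thread. This is a generic sketch, not generated CSense code; the component behaviors are reduced to placeholders.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: two execution domains connected by a compiler-inserted shared queue.
public class DomainBoundarySketch {
    public static void main(String[] args) {
        BlockingQueue<float[]> crossDomain = new ArrayBlockingQueue<>(8);

        // Domain 1: audio -> rmsClassifier -> mfcc, all on one thread
        Thread domain1 = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    float[] features = new float[11];    // stand-in for an MFCC feature frame
                    crossDomain.put(features);           // hand the frame to the next domain
                }
            } catch (InterruptedException ignored) { }
        });

        // Domain 2: httpPost, on its own thread
        Thread domain2 = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    float[] features = crossDomain.take();
                    // upload(features);                  // placeholder for the HTTP post component
                }
            } catch (InterruptedException ignored) { }
        });

        domain1.start();
        domain2.start();
    }
}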
Type System
• Goal: promote component reuse across MSAs
• A rich type system that extends Java's type system
  – most components use generic types
  – insight: frame sizes are essential for configuring components
  – detects configuration errors and optimization opportunities (a small constraint-checking sketch follows the code)

VectorC energyT = TypeC.newFloatVector();
energyT.addConstraint(Constraint.GT(8000));
energyT.addConstraint(Constraint.LT(24000));
VectorC speechT = TypeC.newFloatVector(128);
VectorC featureT = TypeC.newFloatVector(11);
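One way to picture what frame-size constraints buy: linked ports can be checked for compatibility before any code runs. The sketch below only illustrates that kind of check; it is not the CSense type system, and the class name FrameSizeConstraint is invented for this example.

// Illustrative frame-size constraint: either a fixed size or an open (min, max) range.
final class FrameSizeConstraint {
    final Integer fixed;      // non-null means exactly this many elements
    final int min, max;       // used when fixed == null (exclusive bounds, as on the slide)
    FrameSizeConstraint(int fixed)        { this.fixed = fixed; this.min = fixed; this.max = fixed; }
    FrameSizeConstraint(int min, int max) { this.fixed = null;  this.min = min;   this.max = max; }

    // A concrete size chosen by the compiler must satisfy the declared constraint.
    boolean admits(int size) {
        return fixed != null ? size == fixed : size > min && size < max;
    }
}

class TypeCheckSketch {
    public static void main(String[] args) {
        FrameSizeConstraint energyT = new FrameSizeConstraint(8000, 24000); // > 8000 and < 24000
        FrameSizeConstraint speechT = new FrameSizeConstraint(128);
        System.out.println(energyT.admits(10000));   // true, but not an exact multiple of 128
        System.out.println(energyT.admits(10240));   // true, and 10240 = 80 * 128
        System.out.println(speechT.admits(128));     // true
    }
}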
Flow Analysis
• Not all configurations that satisfy the constraints can be implemented efficiently

Constraints: energyT > 8000, energyT < 24000, speechT = 128, featureT = 11

              energyT             speechT
Inefficient   10,000              128
Efficient     10,240 (128 × 80)   128

• With multipliers Mrms = 1 and Mmfcc = 80, an efficient implementation exists when
  Mrms × energyT = Mmfcc × speechT   (see the arithmetic sketch below)
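The condition above can be made concrete with a few lines of arithmetic: pick an energyT in the allowed range that is an exact multiple of speechT. This is only a hand-worked illustration of the condition, not the compiler's actual search.

// Sketch: find the smallest energyT in (8000, 24000) that divides evenly into 128-sample speech frames.
public class FrameSizeSketch {
    public static void main(String[] args) {
        int speechT = 128;
        for (int energyT = 8001; energyT < 24000; energyT++) {
            if (energyT % speechT == 0) {
                int mMfcc = energyT / speechT;   // MFCC multiplier: executions per energy frame
                System.out.println("energyT = " + energyT + ", Mmfcc = " + mMfcc);
                // prints energyT = 8064, Mmfcc = 63; the slide's 10,240 = 80 * 128 also satisfies the condition
                break;
            }
        }
    }
}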
Flow Analysis
• Goal: determine configurations that have efficient frame conversions
• The problem may be formulated as an integer linear program (a schematic formulation follows)
  – constraints: generated from the type constraints
  – objective: minimize total memory usage
  – solution: specifies the frame sizes and multipliers for the application
• An efficient frame conversion may not exist
  – in that case the compiler relaxes the conversion rules
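A schematic version of the program for the example link above, written with f for frame sizes and M for multipliers. The product form of the conversion constraint is exactly the condition from the previous slide; how the CSense compiler linearizes it for the solver, and its full objective over every linked port pair, are not shown in the slides, so this is only a sketch.

\begin{aligned}
\text{minimize}\quad & f_{\mathrm{energy}} + f_{\mathrm{speech}} \\
\text{subject to}\quad & M_{\mathrm{rms}} \cdot f_{\mathrm{energy}} = M_{\mathrm{mfcc}} \cdot f_{\mathrm{speech}} \\
& 8000 < f_{\mathrm{energy}} < 24000, \qquad f_{\mathrm{speech}} = 128 \\
& f_{\mathrm{energy}},\, f_{\mathrm{speech}},\, M_{\mathrm{rms}},\, M_{\mathrm{mfcc}} \in \mathbb{Z}_{>0}
\end{aligned}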
CSense Compiler
• Static analysis:
  – composition errors, memory usage errors, race conditions
• Flow analysis:
  – whole-application configuration and optimization
• Stream Flow Graph transformations:
  – domain partitioning, type conversions, MATLAB component coalescing
• Code generation:
  – Android application/service, MATLAB (C code + JNI stubs)
CSense Runtime
• Components exchange data using push/pull semantics
• The runtime includes a scheduler for each domain (a minimal scheduler sketch follows)
  – task queue + event queue
  – wake lock for power management
[Figure: runtime with two domain schedulers, each holding a task queue and an event queue, plus a memory pool]
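A self-contained sketch of the scheduler idea on this slide: one thread per domain drains a task queue and a time-ordered event queue, and would release its wake lock when idle. Names and structure are illustrative, not the CSense runtime.

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Sketch of a per-domain scheduler: run ready tasks first, then due timer events, then idle.
class DomainScheduler implements Runnable {
    static final class TimerEvent implements Delayed {
        final long dueAtMillis; final Runnable action;
        TimerEvent(long delayMillis, Runnable action) {
            this.dueAtMillis = System.currentTimeMillis() + delayMillis; this.action = action;
        }
        public long getDelay(TimeUnit unit) {
            return unit.convert(dueAtMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
        }
        public int compareTo(Delayed o) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    final ConcurrentLinkedQueue<Runnable> taskQueue = new ConcurrentLinkedQueue<>();
    final DelayQueue<TimerEvent> eventQueue = new DelayQueue<>();

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            Runnable task = taskQueue.poll();             // components scheduled to run as soon as possible
            if (task != null) { task.run(); continue; }
            TimerEvent due = eventQueue.poll();           // delayed events whose time has come
            if (due != null) { due.action.run(); continue; }
            // Nothing to do: the real runtime would decide here whether to release its Android wake lock and sleep.
            try { Thread.sleep(1); } catch (InterruptedException e) { return; }
        }
    }
}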
Evaluation
• Microbenchmarks evaluate runtime performance
  – synchronization primitives + memory management
• Implemented three MSAs using CSense
  – speaker identification
  – activity recognition
  – audiology application
• Setup
  – Galaxy Nexus: TI OMAP 4460, ARM Cortex-A9 @ 1.2 GHz, 1 GB RAM
  – Android 4.2
  – MATLAB 2012b and MATLAB Coder 2.3
Producer-Consumer Benchmark
• Scheduler: memory management + synchronization primitives
• Memory management options
  – GC: garbage collection
  – MP: memory pool
• Concurrent access to queues and memory pools
  – L: Java reentrant lock
  – C: CSense atomic-variable-based synchronization primitives (a compare-and-swap sketch follows)
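The "C" configuration relies on atomic compare-and-swap rather than a ReentrantLock, so a thread that loses a race simply retries instead of being suspended. Below is a generic sketch of that pattern (a spin-acquired slot counter); it is not the actual CSense primitive.

import java.util.concurrent.atomic.AtomicInteger;

// Sketch: claiming a slot with compare-and-set; a losing thread retries instead of blocking.
final class CasSlotCounter {
    private final AtomicInteger available;
    CasSlotCounter(int slots) { available = new AtomicInteger(slots); }

    boolean tryAcquire() {
        while (true) {
            int n = available.get();
            if (n == 0) return false;                              // nothing free right now; caller may retry later
            if (available.compareAndSet(n, n - 1)) return true;    // won the race, no thread suspension
            // lost the race: loop and try again instead of parking the thread
        }
    }

    void release() { available.incrementAndGet(); }
}

Unlike a ReentrantLock, this path allocates no objects and never parks threads, which is why it avoids both garbage collection and context-switch overhead in the benchmark.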
Producer-Consumer Throughput
• Garbage collection overhead limits scalability
• Concurrency primitives have a significant impact on performance
• Replacing garbage collection with memory pools improves throughput by 13.8x; replacing the reentrant lock with the CSense synchronization primitives adds a further 30%, for roughly a 19x total improvement
[Figure: consumption rate vs. production rate for the four configurations]
Producer-Consumer GC Overhead
• Reentrant locks incur GC due to implicit allocations
• The CSense runtime has low garbage collection overhead: with memory pools and the CSense synchronization primitives, no garbage collection occurs in this benchmark
[Figure: time spent in garbage collection vs. production rate]
MFCC Benchmark
• The MFCC benchmark is the Speaker Identification application simplified by removing the httpPost component
• Evaluates the benefits of flow analysis and the runtime overhead
MFCC Benchmark CPU Usage
• Flow analysis eliminates unnecessary memory copies, reducing total CPU usage by up to 45% at the highest sampling rate
• Benefits of larger but efficient frame allocations
  – reduced number of component invocations and less disk I/O overhead
  – increased cache locality
[Figure: total and per-component CPU usage with and without flow analysis]
MFCC Runtime Overhead
• Runtime overhead is low for a wide range of data rates (between roughly 1.83% and 2.39% of total CPU time) and does not grow with the workload
[Figure: runtime overhead vs. sampling rate, decomposed into scheduler and sleep overhead]
Conclusions
• Programming model
  – efficient memory management
  – flexible concurrency model
  – rich type system
• Compiler
  – whole-application configuration & optimization
  – static and flow analyses
• Efficient runtime environment
• Evaluation
  – implemented three typical MSAs
  – benchmarks indicate significant performance improvements
    • 19x throughput improvement over a naïve Java baseline
    • 45% reduction in CPU time with flow analysis
    • low garbage collection overhead
Acknowledgements
• National Science Foundation (NeTS grant #1144664)
• Carver Foundation (grant #14-43555)
ActiSense Benchmark
• Runtime scheduler overhead of a complex six-domain application that accesses both phone sensors and remote Shimmer motes over Bluetooth
ActiSense CPU Usage
• Overall domain scheduler overhead is small despite a longer pipeline
[Figure: domain CPU time for the phone pipeline at 60 Hz and the Shimmer pipeline at 50 Hz]
AudioSense
[Figures: the AudioSense application and its reliability during week-long deployments]
Editor's Notes

  • #3: With the popularity of smart devices, there is increasing demand for mobile sensing applications that capture and analyze physical activities, social interactions, and ambient information from rich sensors. Here are two typical mobile sensing applications: the top one is Speaker Identification and the bottom one is Activity Recognition. Both work in a similar way. First, they collect sensor data, which can be local or remote. Next, features are extracted from the sensor data. Finally, the features may be used for real-time classification or uploaded to a remote server for offline recognition.
  • #4: Though these applications are conceptually straightforward, they are not trivial to implement efficiently because of the following challenges. The first challenge is concurrency: MSAs are inherently multi-threaded because sensor reading, network communication, and interaction with users and the environment may happen concurrently, and multi-threading is error-prone due to data races and even deadlocks. The next challenge is high frame rates. For example, audio and video sources tend to produce a large amount of data constantly, which stresses memory management. The third challenge is robustness: mobile sensing applications are usually expected to run long-term data collection in the background, and it would be unacceptable to bother users with crashes or restarts. Our main target so far is the Android platform, where the underlying Java virtual machine worsens these problems because of higher computational overhead and non-deterministic garbage collection. Therefore, we propose the CSense toolkit to address these challenges without sacrificing performance.
  • #5: Before introducing the design of CSense, I would like to go through the related work. First, in terms of support for MSAs, prior work like SeeMon, Coordinator, and JigSaw requires programmers to use special constructs to develop specific types of sensing applications. CSense, on the other hand, provides a high-level stream programming abstraction that is general and suitable for a broad range of MSAs. Second, CSense builds on data flow models, of which there are two categories. One is synchronous data flow, like StreamIt and Lustre, which enforces static scheduling and optimizations; however, if you need to process asynchronous events, you are on your own to adapt it. The other is asynchronous data flow, like Click and XStream, which provides asynchronous constructs but sacrifices some performance. The CSense toolkit adopts the asynchronous data flow model but improves performance with compile-time analysis.
  • #6: For the remainder of the talk, I will introduce the CSense programming model, compiler, and runtime environment. Evaluation results will be presented later.
  • #7: Here is the programming model. An MSA is represented as an SFG, a directed acyclic graph whose nodes are components connected through input and output ports. The Speaker Identification application is shown as an example. The following Java code segment shows how we create and wire the components.
  • #8: What differentiates CSense from previous work is its focus on memory management. The goal is to reduce the memory overhead introduced by garbage collection and copy operations. The CSense programming model not only adopts pass-by-reference semantics to facilitate data sharing between components but also makes memory management explicit in the SFG, which focuses the programmer's attention on memory operations, tracks data exchanges globally, and allows for an efficient implementation.
  • #9: Let's take a look at memory management in the Speaker Identification example. In an SFG, only two special kinds of components, sources and taps, are allowed to perform memory management. Sources implement memory pools and pre-allocate frames. During execution, frames are taken from the pools and flow from sources, through links, to taps. The tap puts the frame back into the memory pool to ensure there are no leaks. In this example there are three sources: the audio component, S1, and S2. The data flows follow the colored links and reach the corresponding taps. If a frame is shared between components, its associated reference counter is incremented; when the frame reaches a tap, the counter is decremented, and once it drops to zero the frame is returned to its memory pool.
  • #10: As for the concurrency challenge, the goal is an expressive concurrency model that can be analyzed statically. The idea is to partition the components of an SFG into execution domains. A domain is a connected subgraph of components executed on a single thread. Any frame exchange between domains is mediated by a shared queue; other data sharing between components goes through a tuple space. Currently, the CSense programming model provides concurrency constraints such as NEW_DOMAIN and SAME_DOMAIN. Based on the domain partitioning, compiler analysis can identify data races.
  • #11: Here is the same example. The audio and httpPost components declare new domains. Domain partitioning starts with these two components and expands by including other components in the downstream direction; the remaining components are added to the domains of adjacent components. After partitioning the SFG, the first four components, including S1, S2, and T1 to T3, are in one domain, while httpPost and T4 are in the other. With this information, the compiler transforms the graph by inserting a shared queue between the two domains for data exchange. Another concurrency option is SAME_DOMAIN. This annotation is used for a group component composed of several related subcomponents; it makes sense to partition those components into the same domain to avoid cross-domain data exchange overhead.
  • #12: Next, we introduce the type system, which extends Java's generic types. It is designed to ensure the correctness of component composition and to facilitate efficient component reuse across applications. In an SFG, all input/output ports are typed, and programmers can specify frame size constraints on them. The frame size is the amount of data a component produces or consumes through a port in a single execution. Here is the code segment showing how to specify the type constraints.
  • #13: There are many frame size configurations that satisfy the constraints, but not all of them can be implemented efficiently. In this example, let's focus on the output port typed energyT and the input port typed speechT. The energyT type constrains the output frame size to be greater than 8,000 and less than 24,000; the speechT type constrains the input frame size to be exactly 128. Now consider two configurations. The first sets the energyT frame size to 10,000, which is not a multiple of 128, so there is no efficient frame size conversion: the leftover remainder causes additional memory copies. In contrast, the second sets the energyT frame size to 10,240, which divides evenly into 80 speechT frames and therefore allows an efficient frame size conversion.
  • #14: To make this general, we introduce the concept of a multiplier: the number of executions a component needs to produce or consume an entire frame. What the flow analysis does is find constrained frame sizes and multipliers that result in a common multiple. The common multiple is the frame size that is actually allocated, and it is represented as an equality constraint that the compiler adds implicitly.
  • #15: To apply the flow analysis to the entire SFG, the compiler formulates an integer program by collecting the constraints for each pair of linked input/output ports. The compiler then calls an external solver to derive a solution for the frame sizes and multipliers; the objective is to minimize total memory usage. If no such solution exists, the compiler falls back to an inefficient configuration and shows a warning, and the programmer may relax the constraints to obtain an efficient solution.
  • #16: In summary, given the SFG of an MSA, the CSense compiler first performs static analysis to catch composition errors, memory usage errors, and race conditions. Second, it applies flow analysis to derive whole-application frame size configurations for the components. Third, it transforms the SFG by inserting shared queues between domains and type converters between pairs of incompatible input/output ports. In addition, connected MATLAB components may be coalesced. A MATLAB component is created by wrapping the C code that MATLAB Coder generates for a MATLAB function; coalescing combines the MATLAB functions first and then generates a single component, reducing data exchange overhead between the Java space and the native space. Finally, the compiler generates the target Android application code, which links against the native MATLAB functions.
  • #17: After the application is installed on the target device, it is executed by the CSense runtime, which drives the data flow from sources through components to taps. The runtime includes a scheduler for each domain. A scheduler maintains a task queue, an event queue, and an Android wake lock. The task queue lets a component schedule itself for execution as soon as possible; the event queue lets a component schedule a delayed event to be processed at a specified time. The wake lock ties into Android power management: whenever no application holds a wake lock, the device soon goes into deep sleep. In our schedulers, if the task queue is empty, the scheduler decides whether to release its wake lock and go to sleep.
  • #18: Next, I will present the CSense runtime performance evaluation based on several benchmarks. We have also implemented three MSAs to validate CSense: speaker identification, activity recognition, and a hearing-aid survey application for audiology that combines subjective questionnaires with objective data collection to capture the listening context. Here is the experimental setup: a Galaxy Nexus, Android 4.2, MATLAB, and MATLAB Coder.
  • #19: Our first producer-consumer benchmark evaluates the performance of data exchange between two domains via a shared queue. We are especially interested in the impact of different memory management options and synchronization primitives. For memory management there are two ways to allocate frames: GC stands for garbage collection, where frames are created when needed, and MP stands for memory pool, where frames are pre-allocated and reused. For concurrent access to the shared queue and memory pool, configuration L uses the Java reentrant lock and configuration C uses the CSense atomic-variable-based synchronization primitives, which rely on hardware compare-and-swap instructions so that a thread retries acquiring a shared resource instead of being suspended on failure.
  • #20: This figure shows the throughput. The x-axis is the production rate and the y-axis is the consumption rate; ideally, the two rates should be equal. As you can see, GC with L leads to the lowest throughput. Replacing GC with memory pools improves throughput by 13.8x, and replacing the Java reentrant lock with the CSense synchronization primitives improves it by a further 30%, so the total improvement is about 19x. This is mainly because GC and the Java reentrant lock cause frequent thread suspensions and context switches. In summary, garbage collection overhead limits scalability, and the choice of concurrency primitives has a significant impact on performance.
  • #21: Next, we look more closely at the garbage collection overhead. In this figure, the x-axis is the production rate and the y-axis is the time spent in garbage collection. With memory pools and the CSense synchronization primitives, it is possible to achieve zero garbage collection. With memory pools alone, the Java reentrant lock still incurs garbage collection because of implicit object creation. In summary, the CSense runtime incurs little garbage collection overhead.
  • #22: Next, we evaluate the benefits of flow analysis and the runtime overhead using the MFCC benchmark, which is the Speaker Identification application simplified by removing the httpPost component.
  • #23: Here we show the benefits of flow analysis as a reduction in CPU usage. In the left figure, the x-axis is the audio sampling rate and the y-axis is the total CPU usage of the benchmark. With flow analysis, total CPU usage is reduced by up to 45% at the highest sampling rate. To understand where this reduction comes from, the right figure breaks total CPU usage down per component: the x-axis lists the components and the y-axis shows each component's CPU usage. For the MFCC component, flow analysis eliminates unnecessary memory copies and increases cache locality. For the other components, flow analysis leads to larger but efficient frame allocations that reduce the number of component invocations and the disk I/O overhead, especially for components that write to storage.
  • #24: Finally, we want to understand the CSense runtime overhead. The overhead is computed by subtracting the sum of the component CPU times from the total application CPU time. In the figure, the x-axis is the sampling rate. The y-axis of the bottom figure shows the overhead as a percentage of total CPU time; this percentage is low and does not grow with the workload. The top figure further decomposes the runtime overhead into scheduler overhead and sleep overhead. The sleep overhead is incurred when the scheduler calls sleep() and should be small; the scheduler overhead is the time spent passing frames between components and accessing memory pools. The scheduler overhead is even smaller than the sleep overhead, so we conclude that the runtime overhead is low for a wide range of data rates.
  • #25: I have introduced the main design of the CSense toolkit. In conclusion, the CSense programming model provides efficient memory management, a flexible concurrency model, and a rich type system. The CSense compiler performs whole-application optimization based on static and flow analyses. The CSense runtime is efficient, with low overhead, and integrates with Android wake locks. We have implemented three typical MSAs to validate CSense, and the benchmarks indicate significant performance improvements from memory pools, the CSense synchronization primitives, and flow analysis.
  • #26: We especially thank and acknowledge our funding sources. Now, I think it's time to take your questions.
  • #29: Accelerometer pipelines involve intensive operations. Domain CPU usage grows with the sampling rate and the length of the pipeline. Shimmer pipelines involve more components and thus more overhead. Making predictions every second induces a smaller superframe size. The figure shows domain CPU time for the phone pipeline at 60 Hz and the Shimmer pipeline at 50 Hz.
  • #30: AudioSense collects electronic surveys plus ambient sound samples and GPS, and was deployed for six months as part of a clinical study. Reliability = uploaded / collected; a value of 0 corresponds to the server being offline due to power outages, and values below 100% to participants moving out of wireless coverage in the study area. The figure shows reliability during week-long deployments.
  • #31: CSense is mature enough to support long-term deployments.