


31st ICS 2017: Chicago, IL, USA
- William D. Gropp, Pete Beckman, Zhiyuan Li, Francisco J. Cazorla:
Proceedings of the International Conference on Supercomputing, ICS 2017, Chicago, IL, USA, June 14-16, 2017. ACM 2017, ISBN 978-1-4503-5020-4
Automata and tree-mining optimization
- Marziyeh Nourian, Xiang Wang, Xiaodong Yu, Wu-chun Feng, Michela Becchi:
Demystifying automata processing: GPUs, FPGAs or Micron's AP? 1:1-1:11
- Junqiao Qiu, Zhijia Zhao, Bo Wu, Abhinav Vishnu, Shuaiwen Leon Song:
Enabling scalability-sensitive speculative parallelization for FSM computations. 2:1-2:10
- Nikhil Hegde, Jianqiao Liu, Milind Kulkarni:
SPIRIT: a framework for creating distributed recursive tree applications. 3:1-3:11
- Elaheh Sadredini, Reza Rahimi, Ke Wang, Kevin Skadron:
Frequent subtree mining on the automata processor: challenges and opportunities. 4:1-4:11
GPUs - part 1
- Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs. 5:1-5:10
- Kyung Hoon Kim, Rahul Boyapati, Jiayi Huang, Yuho Jin, Ki Hwan Yum, Eun Jung Kim:
Packet coalescing exploiting data redundancy in GPGPU architectures. 6:1-6:10
- Andreas Derler, Rhaleb Zayer, Hans-Peter Seidel, Markus Steinberger:
Dynamic scheduling for efficient hierarchical sparse matrix operations on the GPU. 7:1-7:10
Compilation techniques
- Aleksandar Zlateski, H. Sebastian Seung:
Compile-time optimized and statically scheduled N-D convnet primitives for multi-core and many-core (Xeon Phi) CPUs. 8:1-8:10
- Ehsan Totoni, Todd A. Anderson, Tatiana Shpeisman:
HPAT: high performance analytics with scripting ease-of-use. 9:1-9:10
- Diogo Nunes Sampaio, Louis-Noël Pouchet, Fabrice Rastello:
Simplification and runtime resolution of data dependence constraints for loop transformations. 10:1-10:11
- Suyash Gupta, Rahul Shrivastava, V. Krishna Nandivada:
Optimizing recursive task parallel programs. 11:1-11:11
GPUs - part 2
- Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng:
Fast segmented sort on GPUs. 12:1-12:10
- Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel:
Globally homogeneous, locally adaptive sparse matrix-vector multiplication on the GPU. 13:1-13:11
- Rakshith Kunchum, Ankur Chaudhry, Aravind Sukumaran-Rajam, Qingpeng Niu, Israt Nisa, P. Sadayappan:
On improving performance of sparse matrix-matrix multiplication on GPUs. 14:1-14:11
- Keren Zhou, Guangming Tan, Xiuxia Zhang, Chaowei Wang, Ninghui Sun:
A performance analysis framework for exploiting GPU microarchitectural capability. 15:1-15:10
Application load imbalance, task and data mapping
- Jiawen Sun, Hans Vandierendonck, Dimitrios S. Nikolopoulos:
GraphGrind: addressing load imbalance of graph partitioning. 16:1-16:10
- Juan J. Galvez, Nikhil Jain, Laxmikant V. Kalé:
Automatic topology mapping of diverse large-scale parallel applications. 17:1-17:10
- Seongdae Yu, Seongbeom Park, Woongki Baek:
Design and implementation of bandwidth-aware memory placement and migration policies for heterogeneous memory systems. 18:1-18:10
Hardware design
- Xi-Yue Xiang, Wentao Shi, Saugata Ghose, Lu Peng, Onur Mutlu, Nian-Feng Tzeng:
Carpool: a bufferless on-chip network supporting adaptive multicast and hotspot alleviation. 19:1-19:11
- J. Rubén Titos Gil, Antonio Flores, Ricardo Fernández-Pascual, Alberto Ros, Manuel E. Acacio:
Way-combining directory: an adaptive and scalable low-cost coherence directory. 20:1-20:10
Runtimes and algorithms for parallel-application performance and reliability support
- Sicong Zhuang, Marc Casas:
Iteration-fusing conjugate gradient. 21:1-21:10
- Antonio J. Peña, Vicenç Beltran, Carsten Clauss, Thomas Moschny:
Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques. 22:1-22:10
- Aurangzeb, Rudolf Eigenmann:
HiPA: history-based piecewise approximation for functions. 23:1-23:10
Data aggregation and hardware/software co-design approaches
- Peng Jiang, Gagan Agrawal:
Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation. 24:1-24:11
- Joao P. L. de Carvalho, Guido Araujo, Alexandro Baldassin:
Revisiting phased transactional memory. 25:1-25:10
- Haikun Liu, Yujie Chen, Xiaofei Liao, Hai Jin, Bingsheng He, Long Zheng, Rentong Guo:
Hardware/software cooperative caching for hybrid DRAM/NVM memory architectures. 26:1-26:10
- Xuanhua Shi, Ming Li, Wei Liu, Hai Jin, Chen Yu, Yong Chen:
SSDUP: a traffic-aware ssd burst buffer for HPC systems. 27:1-27:10
- Cristobal Ortega, Miquel Moretó, Marc Casas, Ramon Bertran, Alper Buyuktosunoglu, Alexandre E. Eichenberger, Pradip Bose:
libPRISM: an intelligent adaptation of prefetch and SMT levels. 28:1-28:10















