


ICPP 2023: Salt Lake City, UT, USA
- Proceedings of the 52nd International Conference on Parallel Processing, ICPP 2023, Salt Lake City, UT, USA, August 7-10, 2023. ACM 2023

Numerics (In-Person)
- Sameer Deshmukh, Rio Yokota, George Bosilca, Qianxiang Ma:
O(N) distributed direct factorization of structured dense matrices using runtime systems. 1-10
Optimization of AI/ML (In Person)
- Georgia Channing, Ria Patel, Paula Olaya, Ariel Keller Rorabaugh, Osamu Miyashita, Silvina Caíno-Lores, Catherine D. Schuman, Florence Tama, Michela Taufer:
Composable Workflow for Accelerating Neural Architecture Search Using In Situ Analytics for Protein Classification. 1
Numerics (In-Person)
- M. Ridwan Apriansyah, Rio Yokota:
Computing the k-th Eigenvalue of Symmetric H2-Matrices. 11-20
- Junqing Lin, Honghe Zhang, Xiaolong Shi, Jingwei Sun, Xianzhi Yu, Jun Yao, Guangzhong Sun:
EC-SpMM: Efficient Compilation of SpMM Kernel on GPUs. 21-30
Compression and Encoding (In Person)
- Fangzheng Lin, Kasidis Arunruangsirilert, Heming Sun, Jiro Katto:
Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability. 31-40
- Mi Zhang, Qihan Kang, Patrick P. C. Lee:
Minimizing Network and Storage Costs for Consensus with Flexible Erasure Coding. 41-50
- Shui Jiang, Tsung-Wei Huang, Bei Yu, Tsung-Yi Ho:
SNICIT: Accelerating Sparse Neural Network Inference via Compression at Inference Time on GPU. 51-61
AI/ML Performance (Remote Session)
- Lixiao Cui, Kedi Yang, Yusen Li, Gang Wang, Xiaoguang Liu:
DiffLex: A High-Performance, Memory-Efficient and NUMA-Aware Learned Index using Differentiated Management. 62-71
- Hesheng Sun, Xinyi Chen, Zhuzhong Qian, Zengji Li, Ning Chen, Tuo Cao, Suwei Xu, Yitong Zhou:
BIRP: Batch-aware Inference Workload Redistribution and Parallel Scheme for Edge Collaboration. 72-81
- Yongwen Qiu, Yongmei Lei, Guozheng Wang:
PSRA-HGADMM: A Communication Efficient Distributed ADMM Algorithm. 82-91
- Zhenxing Li, Qiang Cao, Yajie Chen, Wenrui Yan:
CoTrain: Efficient Scheduling for Large-Model Training upon GPU and CPU in Parallel. 92-101
- Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu:
OSP: Boosting Distributed Model Training with 2-stage Synchronization. 102-111
- Yuning Zhang, Zao Zhang, Wei Bao, Dong Yuan:
ITIF: Integrated Transformers Inference Framework for Multiple Tenants on GPU. 112-121
Graph Algorithms (In Person)
- Bin Guo, Emil Sekerinski:
Parallel Order-Based Core Maintenance in Dynamic Graphs. 122-131
- Md Abdul Motaleb Faysal, Maximilian H. Bremer, Cy P. Chan, John Shalf, Shaikh Arifuzzaman:
Fast Parallel Index Construction for Efficient K-truss-based Local Community Detection in Large Graphs. 132-141
- Samiran Kawtikwar, Mohammad Almasri, Wen-Mei Hwu, Rakesh Nagi, Jinjun Xiong:
BEEP: Balanced Efficient subgraph Enumeration in Parallel. 142-152
Programming Models (In Person)
- Omri Mor, George Bosilca, Marc Snir:
Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine. 153-162
- Romain Pereira, Adrien Roussel, Patrick Carribault, Thierry Gautier:
Investigating Dependency Graph Discovery Impact on Task-based MPI+OpenMP Applications Performances. 163-172
- Eric Wright, Johannes Doerfert, Shilei Tian, Barbara M. Chapman, Sunita Chandrasekaran:
Implementing OpenMP's SIMD Directive in LLVM's GPU Runtime. 173-182
Applications (Remote Session)
- Peng Wang, Yu Liu, Zhelong Zhao, Ke Zhou, Zhihai Huang, Yanxiong Chen:
Smart Cache Insertion and Promotion Policy for Content Delivery Networks. 183-192
- Haowen Zhang, Jing Li, He Zhao, Tong Zhou, Nianzu Sheng, Hengyu Pan:
BlockPilot: A Proposer-Validator Parallel Execution Framework for Blockchain. 193-202
- Chenyang Jiao, Weihua Zhang, Li Shen:
Communication Optimizations for State-vector Quantum Simulator on CPU+GPU Clusters. 203-212
LSM-Tree Research (Remote Session)
- Zepeng Wang, Shu Yin:
RBC: A bandwidth controller to reduce write-stalls and tail latency. 213-222
- Ziyi Lu, Qiang Cao, Shucheng Wang, Jie Yao, Xiangrui Yang:
PMLDS: An LSM-Tree Direct Managed Storage for Key-Value Stores on Byte-Addressable Devices. 223-232
- Chen Ding, Jian Zhou, Jiguang Wan, Yiqin Xiong, Sicen Li, Shuning Chen, Hanyang Liu, Liu Tang, Ling Zhan, Kai Lu, Peng Xu:
DComp: Efficient Offload of LSM-tree Compaction with Data Processing Units. 233-243
Applications (Remote Session, Part II)
- Jiali Li, Xianzhang Chen, Duo Liu, Ao Ren, Zhaoyang Zeng, Yujuan Tan:
RadarSSD: A Computational Storage for Radar Signal Processing. 244-253
Training (In Person)
- Sixu Hu, Qinbin Li, Bingsheng He:
Communication-Efficient Generalized Neuron Matching for Federated Learning. 254-263
- Jiyao Liu, Xinliang Wei, Xuanzhang Liu, Hongchang Gao, Yu Wang:
Group-based Hierarchical Federated Learning: Convergence, Group Formation, and Sampling. 264-273
- Feiwen Zhu, Michal Futrega, Han Bao, Sukru Burc Eryilmaz, Fei Kong, Kefeng Duan, Xinnian Zheng, Nimrod Angel, Matthias Jouanneaux, Maximilian Stadler, Michal Marcinkiewicz, Fung Xie, June Yang, Michael Andersch:
FastDimeNet++: Training DimeNet++ in 22 minutes. 274-284
Communication (In Person)
- Thomas Gillis, Ken Raffenetti, Hui Zhou, Yanfei Guo, Rajeev Thakur:
Quantifying the Performance Benefits of Partitioned Communication in MPI. 285-294
- George Katevenis, Manolis Ploumidis, Manolis Marazakis:
Impact of Cache Coherence on the Performance of Shared-Memory based MPI Primitives: A Case Study for Broadcast on Intel Xeon Scalable Processors. 295-305
- Whit Schonbein, Scott Levy, Matthew G. F. Dosanjh, W. Pepper Marts, Elizabeth Reid, Ryan E. Grant:
Modeling and Benchmarking the Potential Benefit of Early-Bird Transmission in Fine-Grained Communication. 306-316
System Software (Remote Session)
- Tiannuo Yang, Ruobing Chen, Yusen Li, Xiaoguang Liu, Gang Wang:
CoTuner: A Hierarchical Learning Framework for Coordinately Optimizing Resource Partitioning and Parameter Tuning. 317-326
- Jingrun Zhang, Guangba Yu, Zilong He, Liang Ai, Pengfei Chen:
DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems. 327-336
- Yi Bian, Fangyu Zheng, Yuewu Wang, Lingguang Lei, Yuan Ma, Jiankuo Dong, Jiwu Jing:
AsyncGBP: Unleashing the Potential of Heterogeneous Computing for SSL/TLS with GPU-based Provider. 337-346
- Benran Wang, Hongyang Chen, Pengfei Chen, Zilong He, Guangba Yu:
MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry. 347-357
- Xianzhi Zhu, Yongkun Li, Lulu Yao, Zhihao Qi, Yinlong Xu, Pengcheng Wang, Weiguang Wang, Xia Zhu:
On Optimizing Traffic Scheduling for Multi-replica Containerized Microservices. 358-368
- Xinxin Qi, Juan Chen, Yong Dong, Yuan Yuan, Tao Xu, Rongyu Deng, Zekai Li, Kexing Zhou, Zheng Wang:
HighRPM: Combining Integrated Measurement and Software Power Modeling for High-Resolution Power Monitoring. 369-379
Applications (In Person)
- Suneth Dasantha Ekanayake, István Zoltan Reguly, Fabio Luporini, Gihan Ravideva Mudalige:
Communication-Avoiding Optimizations for Large-Scale Unstructured-Mesh Applications with OP2. 380-391
- Abbas Haghi, Lluc Alvarez, Jordi Fornt, Juan Miguel De Haro Ruiz, Roger Figueras, Max Doblas, Santiago Marco-Sola, Miquel Moretó:
WFAsic: A High-Performance ASIC Accelerator for DNA Sequence Alignment on a RISC-V SoC. 392-401
- Jiechao Gao, Wenpeng Wang, Fateme Nikseresht, Viswajith Govinda Rajan, Bradford Campbell:
PFDRL: Personalized Federated Deep Reinforcement Learning for Residential Energy Management. 402-411
Resource Scheduling and Adaptation (In Person)
- Hengwei Xu, Pengyuan Zhou, Haiyong Xie, Yong Liao:
Mercury: Fast and Optimal Device Placement for Large Deep Learning Models. 412-422
- Suraiya Tairin, Haiying Shen, Zeyu Zhang:
Embracing Uncertainty for Equity in Resource Allocation in ML Training. 423-432
- Ghazanfar Ali, Mert Side, Sridutt Bhalachandra, Nicholas J. Wright, Yong Chen:
Performance-Aware Energy-Efficient GPU Frequency Selection using DNN-based Models. 433-442
Federated Learning (Remote Session)
- Jieling Yu, Ruiting Zhou, Chen Chen, Bo Li, Fang Dong:
ASFL: Adaptive Semi-asynchronous Federated Learning for Balancing Model Accuracy and Total Latency in Mobile Edge Networks. 443-451
- Mengyao Du, Miao Zhang, Lin Liu, Kai Xu, Quanjun Yin:
Credit-based Differential Privacy Stochastic Model Aggregation Algorithm for Robust Federated Learning via Blockchain. 452-461
- Songli Zhang, Zhenzhe Zheng, Fan Wu, Bingshuai Li, Yunfeng Shao, Guihai Chen:
Learning From Your Neighbours: Mobility-Driven Device-Edge-Cloud Federated Learning. 462-471
- Qingyuan Wang, Bin Gao, Zhi Zhou, Fei Xu, Chenghao Ouyang:
DAG-Aware Optimization for Geo-Distributed Data Analytics. 472-481
- YuAng Chen, Yeh-Ching Chung:
Connectivity-Aware Link Analysis for Skewed Graphs. 482-491
- Haishuang Fan, Ming Li, Jingya Wu, Wenyan Lu, Xiaowei Li, Guihai Yan:
BitColor: Accelerating Large-Scale Graph Coloring on FPGA with Parallel Bit-Wise Engines. 492-502
Graph-Related Techniques (In Person)
- Andrey Prokopenko, Damien Lebrun-Grandié, Daniel Arndt:
Fast tree-based algorithms for DBSCAN for low-dimensional data on GPUs. 503-512
- Qinglin Lu, Xinyu Wang, Wenjing Ma, Yuwen Zhao, Daokun Chen, Fangfang Liu:
GFFT: a Task Graph Based Fast Fourier Transform Optimization Framework. 513-523
- Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran:
ADARNet: Deep Learning Predicts Adaptive Mesh Refinement. 524-534
Memory and Storage (In Person)
- Louis-Claude Canon, Anthony Dugois, Loris Marchal, Etienne Rivière:
Hector: A Framework to Design and Evaluate Scheduling Strategies in Persistent Key-Value Stores. 535-545
- Jong-Hyun Jeong, Myung Kuk Yoon, Yunho Oh, Gunjae Koo:
Warped-MC: An Efficient Memory Controller Scheme for Massively Parallel Processors. 546-555
Networks (Remote Session)
- Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang:
Wrht: Efficient All-reduce for Distributed DNN Training in Optical Interconnect Systems. 556-565
- Hao Zhang, Yawen Chen, Zhiyi Huang, Haibo Zhang, Fei Dai:
SEECHIP: A Scalable and Energy-Efficient Chiplet-based GPU Architecture Using Photonic Links. 566-575
- Jinbin Hu, Yi He, Jin Wang, Wangqing Luo, Jiawei Huang:
RLB: Reordering-Robust Load Balancing in Lossless Datacenter Networks. 576-584
Scheduling (Remote Session)
- Hehuan Shi, Lin Chen, Ming Lin, Raphael C.-W. Phan:
Scheduling Dependent Batching Tasks. 585-594
- Yicheng Feng, Shihao Shen, Mengwei Xu, Yuanming Ren, Xiaofei Wang, Victor C. M. Leung, Wenyu Wang:
Tango: Harmonious Management and Scheduling for Mixed Services Co-located among Distributed Edge-Clouds. 595-604
- Diaohan Luo, Tian Yu, Yuewen Wu, Heng Wu, Tao Wang, Wenbo Zhang:
SPLIT: QoS-Aware DNN Inference on Shared GPU via Evenly-Sized Model Splitting. 605-614
- Huadong Li, Hui Liu, Changyuan Liu, Aoqi Chen, Zhaocheng Niu, Junzhao Du:
NeiLatS: Neighbor-Aware Latency-Sensitive Application Scheduling in Heterogeneous Cloud-Edge Environment. 615-624
Inference (In Person)
- Xueyu Hou, Yongjie Guan, Tao Han:
Dystri: A Dynamic Inference based Distributed DNN Service Framework on Edge. 625-634
- Jianfeng Gu, Yichao Zhu, Puxuan Wang, Mohak Chadha, Michael Gerndt:
FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference. 635-644
- Beilei Jiang, Xianwei Cheng, Yuan Li, Jocelyn Zhang, Song Fu, Qing Yang, Mingxiong Liu, Alejandro Olvera:
Output-Directed Dynamic Quantization for DNN Acceleration. 645-654
Compilation and Checkpointing Techniques (In Person)
- Jan Hückelheim, Johannes Doerfert:
ORAQL - Optimistic Responses to Alias Queries in LLVM. 655-664
- Nigel Tan, Jakob Lüttgau, Jack Marquez, Keita Teranishi, Nicolas M. Morales, Sanjukta Bhowmick, Franck Cappello, Michela Taufer, Bogdan Nicolae:
Scalable Incremental Checkpointing using GPU-Accelerated De-Duplication. 665-674
- Masaki Nakata, Shigeyuki Sato, Tomoharu Ugawa:
General-purpose Asynchronous Periodic Checkpointing in Hybrid Memory. 675-684
Memory and Storage (Remote Session)
- Zhenlin Qi, Shengan Zheng, Yifeng Hui, Bowen Zhang, Linpeng Huang:
Conflux: Exploiting Persistent Memory and RDMA Bandwidth via Adaptive I/O Mode Selection. 685-694
- Hang An, Fang Wang, Dan Feng, Xiaomin Zou, Zefeng Liu, Jianshun Zhang:
Marlin: A Concurrent and Write-Optimized B+-tree Index on Disaggregated Memory. 695-704
- Weiming Huang, Yajuan Du, Mingyang Liu:
GPU Performance Acceleration via Intra-Group Sharing TLB. 705-714
- Baorong Ding, Mingcong Han, Rong Chen:
DArray: A High Performance RDMA-Based Distributed Array. 715-724
- Hao Zhao, Si Wu, Haifeng Liu, Zhixiang Tang, Xiaochun He, Yinlong Xu:
Toward Optimal Repair and Load Balance in Locally Repairable Codes. 725-735
- Zhigang Cai, Chengyong Tang, Minjun Li, François Trahay, Jun Li, Zhibing Sha, Jiaojiao Wu, Fan Yang, Jianwei Liao:
Re-aligning Across-page Requests for Flash-based Solid-state Drives. 736-745
Optimization of AI/ML (In Person)
- Daegun Yoon, Sangyoon Oh:
DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification. 746-755
- Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You:
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. 766-775
Numerics (Remote Session)
- Jie Yan, Zhang Yang, Aiqing Zhang, Zeyao Mo:
JSweep: A Patch-centric Data-driven Approach for Parallel Sweeps on Large-scale Meshes. 776-785
- Mingzhen Li, Hailong Yang, Shanjun Zhang, Fengwei Yu, Ruihao Gong, Yi Liu, Zhongzhi Luan, Depei Qian:
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs. 786-796
- Zhao Liu, Xuesen Chu, Xiaojing Lv, Hanyue Liu, Haohuan Fu, Guangwen Yang:
Accelerating Large-Scale CFD Simulations with Lattice Boltzmann Method on a 40-Million-Core Sunway Supercomputer. 797-806
- Helin Cheng, Wenxuan Li, Yuechen Lu, Weifeng Liu:
HASpGEMM: Heterogeneity-Aware Sparse General Matrix-Matrix Multiplication on Modern Asymmetric Multicore Processors. 807-817
- Ran Zhao, Chao Li, Xiaowei Guo, Yi Liu, Sifan Long, Sen Zhang, Yanlong Qiu, Canqun Yang:
An Improved Parallel Overset Grid Method for Fluid Simulation with Moving Boundary. 818-827
- Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs:
JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency. 828-838
