"Topology-Oblivious Optimization of MPI Broadcast Algorithms on Extreme-Scale Platforms",
Simulation Modelling Practice and Theory, vol. 58: Elsevier, pp. 30-39, 11/2015.
Download: simpat2015.pdf (1.63 MB)
"Asymmetric communication models for resource-constrained hierarchical Ethernet networks",
Concurrency and Computation: Practice and Experience, vol. 27, issue 6: Wiley, pp. 1575-1590, 25/04/2015.
Download: cpe3343.pdf (1.69 MB)
"Hierarchical Optimization of MPI Reduce Algorithms",
13th International Conference on Parallel Computing Technologies (PaCT-2015), Petrozavodsk, Russia, Lecture Notes in Computer Science 9251, Springer, pp. 21-34, 31 Aug - 4 Sept, 2015.
Download: pact2015reduce.pdf (812.37 KB)
"Towards Application Energy Measurement and Modelling Tool Support",
13th International Conference on Parallel Computing Technologies (PaCT-2015), Petrozavodsk, Russia, Lecture Notes in Computer Science 9251, Springer, pp. 91-101, 31 Aug - 4 Sept, 2015.
Download: pact2015energy.pdf (383.55 KB)
"Network-aware optimization of communications for parallel matrix multiplication on hierarchical HPC platforms",
Concurrency and Computation: Practice and Experience, vol. 28, issue 3: Wiley, pp. 802-821, 03/2016.
"Extending τ-Lop to model concurrent MPI communications in multicore clusters",
Future Generation Computer Systems, vol. 61: Elsevier, pp. 66-82, 08/2016.
Download: fgcs2016.pdf (985.73 KB)
"Topology-aware Optimization of Communication Cost of Parallel Applications in Heterogeneous HPC Systems",
School of Computer Science, Dublin, University College Dublin, pp. 106, 09/2016.
Download: thesis.pdf (1 MB)
"Network-Aware Optimization of MPDATA on Homogeneous Multi-core Clusters with Heterogeneous Network",
ICA3PP 2016 Workshops, Granada, Spain, Lecture Notes in Computer Science 10049, Springer, pp. 30-42, 14-16 Dec 2016.
Download: tapems2016.pdf (357.28 KB)
"Automatic tuning to performance modelling of matrix polynomials on multicore and multi-GPU systems",
The Journal of Supercomputing, vol. 73, issue 1, pp. 227-239, 01/2017.
Download: JoS-2016-Murilo.pdf (530.26 KB)
"Hierarchical redesign of classic MPI reduction algorithms",
The Journal of Supercomputing, vol. 73, issue 2: Springer, pp. 713-725, 02/2017.
Download: TJS-Hasanov-2016.pdf (593.41 KB)
"Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing",
IEEE Transactions on Parallel and Distributed Systems, vol. 28, issue 3: IEEE, pp. 787-797, 03/2017.
Download: TPDS_15.pdf (812.34 KB)
"New Model-based Methods and Algorithms for Performance and Energy Optimization of Data Parallel Applications on Homogeneous Multicore Clusters",
IEEE Transactions on Parallel and Distributed Systems, vol. 28, issue 4: IEEE, pp. 1119-1133, 04/2017.
Download: performance-energy-homo-multicore-clusters.pdf (1.27 MB)
"A Survey of Power and Energy Predictive Models in HPC Systems and Applications",
ACM Computing Surveys, vol. 50, issue 3: ACM, 10/2017.
Download: surveypowerenergymodelshpc.pdf (578.85 KB)
"Model-Based Estimation of the Communication Cost of Hybrid Data-Parallel Applications on Heterogeneous Clusters",
IEEE Transactions on Parallel and Distributed Systems, vol. 28, issue 11: IEEE, pp. 3215-3228, 11/2017.
Download: model-based-estimation-tpds-2017.pdf (1.65 MB); model-based-estimation-tpds-2017-supplement.pdf (871.33 KB)
"Additivity: A Selection Criterion for Performance Events for Reliable Energy Predictive Modeling",
Supercomputing Frontiers and Innovations, vol. 4, issue 4, pp. 50-65, 12/2017.
Download: 153-992-1-PB.pdf (666.73 KB)
"Out-of-core Implementation for Accelerator Kernels on Heterogeneous Clouds",
The Journal of Supercomputing, vol. 74, issue 2, pp. 551-568, 2018.
Download: paper.pdf (762.34 KB)
"Bi-Objective Optimization of Data-Parallel Applications on Homogeneous Multicore Clusters for Performance and Energy",
IEEE Transactions on Computers, vol. 67, issue 2: IEEE, pp. 160-177, 02/2018.
Download: paperfinal.pdf (1.16 MB)
"Hierarchical Multicore Thread Mapping via Estimation of Remote Communication",
The Journal of Supercomputing, vol. 74, issue 3: Springer, pp. 1321-1340, 03/2018.
"Application level energy measurements and models for hybrid platform with accelerators",
School of Computer Science, Dublin, University College Dublin, pp. 165, 05/2018.
Download: thesis.pdf (1.61 MB)
"A Novel Data-Partitioning Algorithm for Performance Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms",
IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 10: IEEE, pp. 2176-2190, 10/2018.
Download: paper_r2.pdf (1.49 MB); tpds2018hpoptasuppl.pdf (3.4 MB)
"Performance Optimization of Multithreaded 2D Fast Fourier Transform on Multicore Processors Using Load Imbalancing Parallel Computing Method",
IEEE Access, vol. 6: IEEE, pp. 64202-64224, 10/2018.
Download: ACCESS2878271.pdf (2.78 MB)
"Parallel Data Partitioning Algorithms for Optimization of Data-Parallel Applications on Modern Extreme-Scale Multicore Platforms for Performance and Energy",
IEEE Access, vol. 6: IEEE, pp. 69075-69106, 11/2018.
Download: IEEEAccess2018PDPA.pdf (3.04 MB)
"Performance Optimization of Multithreaded 2D FFT on Multicore Processors: Challenges and Solution Approaches",
IEEE 25th International Conference on High Performance Computing Workshops (HiPCW), Bengaluru, India, IEEE, pp. 8-17, 17-20 Dec, 2018.
Download: paper.pdf (1.33 MB)
"Recent Advances in Matrix Partitioning for Parallel Computing on Heterogeneous Platforms",
IEEE Transactions on Parallel and Distributed Systems, vol. 30, issue 1: IEEE, pp. 218-229, 01/2019.
Download: recent-advances-matrix.pdf (2.77 MB)
"A Survey of Communication Performance Models for High-Performance Computing",
ACM Computing Surveys, vol. 51, issue 6: ACM, 01/2019.