I am a research engineer working on high-performance computing for Argonne National Laboratory. I specialize in HPC I/O, parallel and distributed data storage, distributed algorithms, and networking. I design and develop software for HPC data management targeting various domains including high-energy physics, computational biology, climate modeling, and artificial intelligence.
I excel in software design and development, in particular in C, C++, and Python. I defined a methodology for developing HPC data services that is actively used in my team and by my collaborators. I am also particularly good at setting up and driving research collaborations.
Since March 2017, I have worked as a Software Development Specialist. I conduct research on I/O, storage, communications, and in situ analysis for HPC applications, and I develop data services for HPC. I am a core member of the Mochi project, which aims to develop building blocks for designing HPC data services.
From February 2015 to March 2017, I was a postdoctoral appointee at the Mathematics and Computer Science division of Argonne National Laboratory, under the supervision of Rob Ross. I worked on I/O and storage for HPC applications, as well as collective communication algorithms, and event-driven simulations.
From 2011 to 2014 I completed a PhD under the supervision of Gabriel Antoniu and Luc Bougé, in the KerData team of Inria Rennes and IRISA. My research focused on I/O and in situ analysis for HPC applications. I developed the Damaris middleware for data management using dedicated resources on HPC systems (see Software section). I collaborated with people from Argonne National Lab and the University of Illinois at Urbana-Champaign. During my PhD, I also taught several programming courses (Java, C, C++, OCaml) at the ENS and INSA engineering schools.
In 2011 I completed my 6-month Master's internship in the KerData team, supervised by Gabriel Antoniu. My research focused on I/O for HPC simulations.
In 2010 I completed a 3-month internship at the NCSA under the supervision of Marc Snir and Franck Cappello. The goal of this internship was to propose a scalable solution to the I/O performance issues posed by the CM1 atmospheric simulation.
In 2009 I completed a research internship in the field of storage for MapReduce applications on cloud platforms, under the supervision of Luc Bougé and Bogdan Nicolae. During this internship I designed a distributed file system for Hadoop based on the BlobSeer distributed data management service.
From 2008 to 2012 I was a normalien (civil servant with a scholarship) from the Brittany extension of ENS Cachan (now ENS de Rennes). I studied computer science and telecommunications.
Thesis titled "Addressing the Challenges of I/O Variability in Post-Petascale HPC Simulations", completed under the supervision of Gabriel Antoniu and Luc Bougé.
Obtained with honors and the rank of 1st/70 students.
French diploma corresponding to a Master's degree with additional professional experience (in my case, research experience).
Obtained with the rank of 4th/106 students.
workshop Performance Characterization and Provenance of Distributed Task-based Workflows on HPC Platforms, Amal Gueroudji, Chase Phelps, Tanzima Z. Islam, Philip Carns, Shane Snyder, Matthieu Dorier, Robert B. Ross, Line C. Pouchard, 19th Workshop on Workflows in Support of Large-Scale Science (WORKS 2024)
workshop Thallus: An RDMA-based Columnar Data Transport Protocol, Jayjeet Chakraborty, Matthieu Dorier, Philip Carns, Robert Ross, Carlos Maltzahn, Heiner Litz, 2nd Workshop on Hot Topics in System Infrastructure (HotInfra 2024)
conference Diaspora: Resilience-Enabling Services for Real-Time Distributed Workflows, Bogdan Nicolae, Justin M Wozniak, Tekin Bicer, Hai Nguyen, Parth Patel, Haochen Pan, Amal Gueroudji, Maxime Gonthier, Valerie Hayot-Sasson, Eliu Huerta, Kyle Chard, Ryan Chard, Matthieu Dorier, Nageswara SV Rao, Anees Al-Najjar, Alessandra Corsi, Ian Foster, 2024 IEEE 20th International Conference on e-Science (e-Science)
workshop Extending the Mochi Methodology to Enable Dynamic HPC Data Services, Matthieu Dorier, Philip Carns, Robert Ross, Shane Snyder, Rob Latham, Amal Gueroudji, George Amvrosiadis, Chuck Cranor, Jerome Soumagne, 5th International Workshop on Extreme-Scale Storage and Analysis (ESSA 2024)
journal Mochi: A Case Study in Translational Computer Science for High-Performance Computing Data Management, Philip Carns, Matthieu Dorier, Rob Latham, Robert Ross, Shane Snyder, Jerome Soumagne, Computing in Science and Engineering (IEEE CiSE)
workshop HEPnOS: a Specialized Data Service for High Energy Physics Analysis, Sajid Ali, Steven Calvez, Philip Carns, Matthieu Dorier, Pengfei Ding, James Kowalkowski, Robert Latham, Andrew Norman, Marc Paterno, Robert Ross, Saba Sehrish, Shane Snyder, Jerome Soumagne, 4th International Workshop on Extreme-Scale Storage and Analysis (ESSA 2023)
journal Towards Elastic In Situ Analysis for High-performance Computing Simulations, Matthieu Dorier, Zhe Wang, Srinivasan Ramesh, Utkarsh Ayachit, Shane Snyder, Rob Ross, Manish Parashar, Journal of Parallel and Distributed Computing (JPDC)
journal Adaptive Elasticity Policies for Staging-based In Situ Visualization, Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, Manish Parashar, Future Generation Computer Systems (FGCS)
workshop Research Perspectives Toward Autonomic Optimization of In Situ Analysis and Visualization, Zhe Wang, Matthieu Dorier, Manish Parashar, Proceedings of the ISAV workshop (SC22)
conference HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization, Matthieu Dorier, Romain Egele, Prasanna Balaprakash, Jaehoon Koo, Sandeep Madireddy, Srinivasan Ramesh, Allen D. Malony, Rob Ross, Proceedings of the 2022 IEEE International Conference on Cluster Computing (CLUSTER)
conference best paper finalist Colza: Enabling Elastic In Situ Visualization for High-performance Computing Simulations, Matthieu Dorier, Zhe Wang, Utkarsh Ayachit, Shane Snyder, Rob Ross, Manish Parashar, Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS)
conference Adaptive Placement of Data Analysis Tasks For Staging Based In-Situ Processing, Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)
conference SYMBIOMON: A High Performance, Composable Monitoring Service, Srinivasan Ramesh, Robert Ross, Matthieu Dorier, Allen Malony, Philip Carns, Kevin Huck, 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)
workshop An Adaptive Elasticity Policy For Staging Based In-Situ Processing, Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, Manish Parashar, 16th Workshop on Workflows in Support of Large-Scale Science (WORKS)
workshop Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows, Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, 2nd Workshop on High-Performance Storage (HPS)
conference SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services, Srinivasan Ramesh, Allen D. Malony, Philip Carns, Robert B. Ross, Matthieu Dorier, Jerome Soumagne, Shane Snyder, Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS)
journal A Terminology for In Situ Visualization and Analysis Systems, Hank Childs et al., International Journal of High Performance Computing Applications
conference Staging Based Task Execution for Data-driven, In-Situ Scientific Workflows, Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, Proceedings of the 2020 IEEE International Conference on Cluster Computing (CLUSTER)
conference DeepClone: Lightweight State Replication of Deep Learning Models for Data Parallel Training, Bogdan Nicolae, Justin M Wozniak, Matthieu Dorier, Franck Cappello, Proceedings of the 2020 IEEE International Conference on Cluster Computing (CLUSTER)
conference DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models, Bogdan Nicolae, Jiali Li, Justin Wozniak, George Bosilca, Matthieu Dorier, Franck Cappello, Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
conference Pufferscale: Rescaling HPC Data Services for High Energy Physics Applications, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Stefan M Wild, Sven Leyffer, Robert Ross, Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
journal How Fast Can One Resize a Distributed File System?, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Journal of Parallel and Distributed Computing (Elsevier JPDC)
journal Mochi: Composing Data Services for High-Performance Computing Environments, Robert B Ross, George Amvrosiadis, Philip Carns, Charles D Cranor, Matthieu Dorier, Kevin Harms, Greg Ganger, Garth Gibson, Samuel K Gutierrez, Robert Latham, Bob Robey, Dana Robinson, Bradley Settlemyer, Galen Shipman, Shane Snyder, Jerome Soumagne, Qing Zheng, Journal of Computer Science and Technology (Springer JCST)
report A Software Defined Storage Approach to Exascale Storage Services, Jerome Soumagne, Robert Ross, Galen Shipman, George Amvrosiadis, Neil Fortner, Dana Robinson, Philip Carns, Matthieu Dorier, Robert Latham, Shane Snyder, David Rich, Bradley Settlemyer, Chuck Cranor, Greg Ganger, Qing Zheng, Technical Report
workshop The Challenges of Elastic in Situ Analysis and Visualization, Matthieu Dorier, Orcun Yildiz, Tom Peterka, Robert Ross, Proceedings of the ISAV workshop (SC19)
journal MPI jobs within MPI jobs: A practical way of enabling task-level fault-tolerance in HPC workflows, Justin Wozniak, Matthieu Dorier, Robert Ross, Tong Shu, Tahsin Kurc, Li Tang, Norbert Podhorszki, Matthew Wolf, Future Generation Computer Systems (Elsevier FGCS)
conference Is it Worth Relaxing Fault Tolerance to Speed Up Decommission in Distributed Storage Systems?, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Proceedings of the International Symposium on Cluster, Cloud, and Grid Computing (CCGrid)
workshop Methodology for the Rapid Development of Scalable HPC Data Services, Matthieu Dorier, Philip Carns, Kevin Harms, Robert Latham, Robert Ross, Shane Snyder, Justin Wozniak, Samuel Gutierrez, Bob Robey, Brad Settlemyer, Galen Shipman, Jerome Soumagne, James Kowalkowski, Marc Paterno, Saba Sehrish, Proceedings of the PDSW-DISC 2018 workshop (SC18)
workshop Pufferbench: Evaluating and Optimizing Malleability of Distributed Storage, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Proceedings of the PDSW-DISC 2018 workshop (SC18)
report A Lower Bound for the Commission Times in Replication-Based Distributed Storage Systems, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, RR-9186
workshop CoSS: Proposing a Contract-Based Storage System for HPC, Matthieu Dorier, Matthieu Dreher, Tom Peterka, Robert Ross, Proceedings of the PDSW-DISC 2017 workshop (SC17)
workshop Supporting Task-level Fault-Tolerance in HPC Workflows by Launching MPI Jobs inside MPI Jobs, Matthieu Dorier, Justin Wozniak, Robert Ross, Proceedings of the WORKS 2017 workshop (SC17)
report Performance-Constrained In Situ Visualization of Atmospheric Simulations, Matthieu Dorier, Robert Sisneros, Leonardo Bautista-Gomez, Tom Peterka, Leigh G Orf, Rob Ross, Lokman Rahmani, Gabriel Antoniu, Luc Bougé, RR-8855
poster A Cross-Layer Solution in Scientific Workflow System for Tackling Data Movement Challenge, Dong Dai, Robert Ross, Dounia Khaldi, Yonghong Yan, Matthieu Dorier, Neda Tavakoli, Yong Chen, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)
poster Design and Evaluation of Topology-aware Scatter and AllGather Algorithms for Dragonfly Networks, Nathanael Cheriere, Matthieu Dorier, Rob Ross, Shadi Ibrahim, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - ACM Student Research Competition
journal Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations, Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Robert Sisneros, Orcun Yildiz, Shadi Ibrahim, Tom Peterka, Leigh Orf, ACM Transactions on Parallel Computing (ToPC)
conference Leveraging Burst Buffer Coordination to Prevent I/O Interference, Anthony Kougkas, Matthieu Dorier, Rob Latham, Rob Ross, Xian-He Sun, IEEE International Conference on eScience
conference Evaluation of Topology-Aware Broadcast Algorithms for Dragonfly Networks, Matthieu Dorier, Misbah Mubarak, Rob Ross, Jianping Kelvin Li, Christopher D. Carothers, Kwan-Liu Ma, IEEE International Conference on Cluster Computing (CLUSTER)
conference Adaptive Performance-Constrained In Situ Visualization of Atmospheric Simulations, Matthieu Dorier, Robert Sisneros, Leonardo Bautista Gomez, Tom Peterka, Leigh Orf, Lokman Rahmani, Gabriel Antoniu, Luc Bougé, IEEE International Conference on Cluster Computing (CLUSTER)
conference On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems, Orcun Yildiz, Matthieu Dorier, Shadi Ibrahim, Rob Ross, Gabriel Antoniu, IEEE International Parallel and Distributed Processing Symposium (IPDPS)
journal On the Energy Footprint of I/O Management in Exascale HPC Systems, Matthieu Dorier, Orcun Yildiz, Shadi Ibrahim, Anne-Cécile Orgerie, Gabriel Antoniu, Elsevier Future Generation Computer Systems (FGCS)
workshop Get Out of the Way! Applying Compression to Internal Data Structures, Robert Latham, Matthieu Dorier, Robert Ross, Proceedings of the PDSW-DISC 2016 workshop (SC16)
journal Using Formal Grammars to Predict I/O Behaviors in HPC: the Omnisc’IO Approach, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Rob Ross, IEEE Transactions on Parallel and Distributed Systems (TPDS)
workshop Lessons Learned from Building In Situ Coupling Frameworks, Matthieu Dorier, Matthieu Dreher, Tom Peterka, Gabriel Antoniu, Bruno Raffin, Justin M. Wozniak, First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV)
report On the Use of Formal Grammars to Predict HPC I/O Behaviors, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Rob Ross, RR-8725
thesis Addressing the Challenges of I/O Variability in Post-Petascale HPC Simulations, Matthieu Dorier, PhD thesis
conference Omnisc’IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Robert Ross, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC14)
conference CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination, Matthieu Dorier, Gabriel Antoniu, Robert Ross, Dries Kimpe, Shadi Ibrahim, IEEE International Parallel and Distributed Processing Symposium (IPDPS)
workshop A Performance and Energy Analysis of I/O Management Approaches for Exascale Systems, Orcun Yildiz, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Sixth International Workshop on Data Intensive Distributed Computing (DIDC)
conference Damaris/Viz: a Nonintrusive, Adaptable and User-Friendly In Situ Visualization Framework, Matthieu Dorier, Roberto R. Sisneros, Tom Peterka, Gabriel Antoniu, Dave B. Semeraro, IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)
poster Efficient I/O using Dedicated Cores in Large-Scale HPC Simulations, Matthieu Dorier, IEEE International Parallel and Distributed Processing Symposium, PhD forum
report A Nonintrusive, Adaptable and User-Friendly In Situ Visualization Framework, Matthieu Dorier, Roberto R. Sisneros, Tom Peterka, Gabriel Antoniu, Dave B. Semeraro, RR-8314
conference Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O, Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Leigh G. Orf, IEEE International Conference on Cluster Computing (CLUSTER)
report Damaris: Leveraging Multicore Parallelism to Mask I/O Jitter, Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Leigh Orf, RR-7706
thesis On the Benefit of Dedicating Cores to Mask I/O Jitter in HPC Simulations, Matthieu Dorier, Master thesis
poster Damaris - Using Dedicated I/O Cores for Scalable Post-petascale HPC Simulations, Matthieu Dorier, ACM International Conference on Supercomputing (ICS) - ACM Student Research Competition
conference BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map-Reduce Applications, Bogdan Nicolae, Diana Moise, Gabriel Antoniu, Luc Bougé, Matthieu Dorier, IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
DOE-funded project working on designing efficient building blocks for data services in HPC systems. In this context I am the lead on multiple libraries for storage, I/O, and networking.
New capabilities at ASCR computing facilities drive us to rethink what is possible within the HEP (High-Energy Physics) scientific workflow. Within this project I designed HEPnOS, a distributed storage system specifically optimized for event data produced by HEP workflows.
The Joint Laboratory for Extreme Scale Computing (JLESC) is an international, virtual organization whose goal is to help member organizations and investigators bridge the gap between petascale and extreme-scale computing.
Mochi is a set of libraries that my team and I are developing at Argonne National Laboratory. This project aims to provide libraries for building efficient HPC data services. These libraries cover threading, RPC, RDMA, key/value storage, document storage, blob storage, etc. One of my most important contributions to these libraries is thallium, a modern C++ library enabling threading, RPC, and RDMA on top of the Argobots threading library and the Mercury networking library. The Mochi libraries are currently used by a growing number of users, including people from LANL, LLNL, LBNL, JGU Mainz, Intel, IIT, The HDF Group, BNL, Rutgers University, and FermiLab.
HEPnOS is an object store that I designed and built using the Mochi libraries in the context of a collaboration with FermiLab. The goal of HEPnOS is to provide a simple interface to HEP event-processing workflows in modern C++, backed by an efficient object store, as a potential replacement for the traditional file-based storage approach based on the ROOT file format. HEPnOS is under active development and is being evaluated with our partners at FermiLab.
Damaris is a data-management middleware for high-performance computing simulations. It allows dedicating some of the cores in each node, or entire nodes, to running data management services, including asynchronous data transformation and storage, and in situ visualization and analysis. I developed Damaris during my Master's and PhD (2010 to 2014) and published a number of papers about it. Following my graduation, the KerData team of INRIA Rennes hired an engineer to continue developing and supporting it. To this day, Damaris is still actively maintained by the KerData team.
The Mochi project, on which I have been working since 2015 alongside my Argonne colleagues and collaborators from Carnegie Mellon University, Los Alamos National Laboratory, and The HDF Group, was awarded the prestigious R&D 100 Award.
The Gilles Kahn prize is awarded every year by the Société Informatique de France and the French Academy of Sciences to the three best PhD theses in computer science in France (one first prize and two honorary prizes). It is one of the most prestigious PhD awards in Computer Science in France. It values the originality of the research, the originality of the domain and methods employed, the importance and impact of the results on the community, and the quality of the manuscript.
This label is awarded by the CPU (Conférence des Présidents d’Universités), GENCI, and the Maison de la Simulation, to doctors who demonstrated skills related to high performance computing during their PhD. The C3I label is multidisciplinary, and covers all domains of science, from theory to applied research. The candidate must have shown evidence of skills related to the use and application of HPC (optimization of parallel codes, distributed and parallel algorithms, large scale data management…). The label is awarded as a means to increase the visibility of the work conducted by these young doctors, and provides them with an additional asset in their career, whether this career evolves in the academic sector or in industry.
The PhD award from the Fondation Rennes 1 is given every year to 8 outstanding new doctors from the 4 doctoral schools associated with the University of Rennes 1 (2 awards per doctoral school). The candidates are judged on the innovative aspects of their PhD thesis, “innovative” being understood in the sense of impact on socioeconomic development and technology transfers.
The ACM Student Research Competition is an internationally recognized venue enabling undergraduate and graduate students to experience the research world, share research results and exchange ideas with other students, judges, and conference attendees, rub shoulders with academic and industry luminaries, understand the practical applications of their research, perfect their communication skills, and receive prizes and gain recognition from ACM and the greater computing community.