Matthieu Dorier

PhD, Software Development Specialist · Argonne National Laboratory · mdorier@anl.gov

I am a research engineer working on high-performance computing (HPC) at Argonne National Laboratory. I specialize in HPC I/O, parallel and distributed data storage, distributed algorithms, and networking. I design and develop software for HPC data management targeting domains such as high-energy physics, computational biology, climate modeling, and artificial intelligence.

I excel in software design and development, in particular in C, C++, and Python. I defined a methodology for developing HPC data services that is actively used by my team and by my collaborators. I am also particularly good at setting up and driving research collaborations.



Experience

Software Development Specialist

Argonne National Laboratory, Lemont, IL

Since March 2017, I have worked as a Software Development Specialist. I conduct research on I/O, storage, communication, and in situ analysis for HPC applications, and I develop data services for HPC systems. I am a core member of the Mochi project, which aims to develop building blocks for designing HPC data services.

2017 - Present

Postdoctoral Researcher

Argonne National Laboratory, Lemont, IL

From February 2015 to March 2017, I was a postdoctoral appointee in the Mathematics and Computer Science Division of Argonne National Laboratory, under the supervision of Rob Ross. I worked on I/O and storage for HPC applications, as well as on collective communication algorithms and event-driven simulations.

2015 - 2017

PhD Student, Teaching Assistant

Ecole Normale Supérieure de Rennes, Rennes, France

From 2011 to 2014, I completed a PhD under the supervision of Gabriel Antoniu and Luc Bougé, in the KerData team of Inria Rennes and IRISA. My research focused on I/O and in situ analysis for HPC applications. I developed the Damaris middleware for data management using dedicated resources on HPC systems (see the Software section). I collaborated with researchers from Argonne National Laboratory and the University of Illinois at Urbana-Champaign. During my PhD, I also taught several programming courses (Java, C, C++, OCaml) at the ENS and INSA engineering schools.

2011 - 2014

Internship

KerData team, IRISA / Inria Rennes Bretagne Atlantique, Rennes, France

In 2011, I completed my 6-month Master's internship in the KerData team, supervised by Gabriel Antoniu. My research focused on I/O for HPC simulations.

2011

Internship

National Center for Supercomputing Applications, Urbana-Champaign, IL

In 2010 I completed a 3-month internship at the NCSA under the supervision of Marc Snir and Franck Cappello. The goal of this internship was to propose a scalable solution to the I/O performance issues posed by the CM1 atmospheric simulation.

2010

Internship

PARIS team, IRISA / Inria Rennes Bretagne Atlantique, Rennes, France

In 2009 I completed a research internship in the field of storage for MapReduce applications on cloud platforms, under the supervision of Luc Bougé and Bogdan Nicolae. During this internship I designed a distributed file system for Hadoop based on the BlobSeer distributed data management service.

2009

Normalien

Ecole Normale Supérieure de Rennes, France

From 2008 to 2012, I was a normalien (civil servant with a scholarship) at the Brittany campus of ENS Cachan (now ENS de Rennes). I studied computer science and telecommunications.

2008 - 2012


Education

Ecole Normale Supérieure de Rennes, IRISA, France

PhD - High performance computing

Thesis titled "Addressing the Challenges of I/O Variability in Post-Petascale HPC Simulations", completed under the supervision of Gabriel Antoniu and Luc Bougé.

2011 - 2014

Ecole Normale Supérieure de Cachan / Rennes and University of Rennes I

Master's degree - Research in computer science

Obtained with honors, ranked 1st out of 70 students.

2009 - 2011

Ecole Normale Supérieure de Cachan

Magistère - Computer science and telecommunications

French diploma corresponding to a Master's degree with additional professional experience (in my case, research experience).

2008 - 2011

Ecole Normale Supérieure de Cachan / Rennes and University of Rennes I

Bachelor's degree - Computer science

Ranked 4th out of 106 students.

2008 - 2009


Publications

  • 2023

    journal Mochi: A Case Study in Translational Computer Science for High-Performance Computing Data Management, Philip Carns, Matthieu Dorier, Rob Latham, Robert Ross, Shane Snyder, Jerome Soumagne, Computing in Science and Engineering (IEEE CiSE)

    workshop HEPnOS: a Specialized Data Service for High Energy Physics Analysis, Sajid Ali, Steven Calvez, Philip Carns, Matthieu Dorier, Pengfei Ding, James Kowalkowski, Robert Latham, Andrew Norman, Marc Paterno, Robert Ross, Saba Sehrish, Shane Snyder, Jerome Soumagne, 4th International Workshop on Extreme-Scale Storage and Analysis (ESSA 2023)

    journal Towards Elastic In Situ Analysis for High-performance Computing Simulations, Matthieu Dorier, Zhe Wang, Srinivasan Ramesh, Utkarsh Ayachit, Shane Snyder, Rob Ross, Manish Parashar, Journal of Parallel and Distributed Computing (JPDC)

    journal Adaptive Elasticity Policies for Staging-based In Situ Visualization, Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, Manish Parashar, Future Generation Computer Systems (FGCS)

  • 2022

    workshop Research Perspectives Toward Autonomic Optimization of In Situ Analysis and Visualization, Zhe Wang, Matthieu Dorier, Manish Parashar, Proceedings of the ISAV workshop (SC22)

    conference HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization, Matthieu Dorier, Romain Egele, Prasanna Balaprakash, Jaehoon Koo, Sandeep Madireddy, Srinivasan Ramesh, Allen D. Malony, Rob Ross, Proceedings of the 2022 IEEE International Conference on Cluster Computing (CLUSTER)

    conference best paper finalist Colza: Enabling Elastic In Situ Visualization for High-performance Computing Simulations, Matthieu Dorier, Zhe Wang, Utkarsh Ayachit, Shane Snyder, Rob Ross, Manish Parashar, Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS)

  • 2021

    conference Adaptive Placement of Data Analysis Tasks For Staging Based In-Situ Processing, Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

    conference SYMBIOMON: A High Performance, Composable Monitoring Service, Srinivasan Ramesh, Robert Ross, Matthieu Dorier, Allen Malony, Philip Carns, Kevin Huck, 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

    workshop An Adaptive Elasticity Policy For Staging Based In-Situ Processing, Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, Manish Parashar, 16th Workshop on Workflows in Support of Large-Scale Science (WORKS)

    workshop Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows, Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, 2nd Workshop on High-Performance Storage (HPS)

    conference SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services, Srinivasan Ramesh, Allen D. Malony, Philip Carns, Robert B. Ross, Matthieu Dorier, Jerome Soumagne, Shane Snyder, Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS)

  • 2020

    journal A Terminology for In Situ Visualization and Analysis Systems, Hank Childs et al., International Journal of High Performance Computing Applications

    conference Staging Based Task Execution for Data-driven, In-Situ Scientific Workflows, Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, Proceedings of the 2020 IEEE International Conference on Cluster Computing (CLUSTER)

    conference DeepClone: Lightweight State Replication of Deep Learning Models for Data Parallel Training, Bogdan Nicolae, Justin M Wozniak, Matthieu Dorier, Franck Cappello, Proceedings of the 2020 IEEE International Conference on Cluster Computing (CLUSTER)

    conference DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models, Bogdan Nicolae, Jiali Li, Justin Wozniak, George Bosilca, Matthieu Dorier, Franck Cappello, Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

    conference Pufferscale: Rescaling HPC Data Services for High Energy Physics Applications, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Stefan M Wild, Sven Leyffer, Robert Ross, Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

    journal How Fast Can One Resize a Distributed File System?, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Journal of Parallel and Distributed Computing (Elsevier JPDC)

    journal Mochi: Composing Data Services for High-Performance Computing Environments, Robert B Ross, George Amvrosiadis, Philip Carns, Charles D Cranor, Matthieu Dorier, Kevin Harms, Greg Ganger, Garth Gibson, Samuel K Gutierrez, Robert Latham, Bob Robey, Dana Robinson, Bradley Settlemyer, Galen Shipman, Shane Snyder, Jerome Soumagne, Qing Zheng, Journal of Computer Science and Technology (Springer JCST)

  • 2019

    report A Software Defined Storage Approach to Exascale Storage Services, Jerome Soumagne, Robert Ross, Galen Shipman, George Amvrosiadis, Neil Fortner, Dana Robinson, Philip Carns, Matthieu Dorier, Robert Latham, Shane Snyder, David Rich, Bradley Settlemyer, Chuck Cranor, Greg Ganger, Qing Zheng, Technical Report

    workshop The Challenges of Elastic in Situ Analysis and Visualization, Matthieu Dorier, Orcun Yildiz, Tom Peterka, Robert Ross, Proceedings of the ISAV workshop (SC19)

    journal MPI jobs within MPI jobs: A practical way of enabling task-level fault-tolerance in HPC workflows, Justin Wozniak, Matthieu Dorier, Robert Ross, Tong Shu, Tahsin Kurc, Li Tang, Norbert Podhorszki, Matthew Wolf, Future Generation Computer Systems (Elsevier FGCS)

    conference Is it Worth Relaxing Fault Tolerance to Speed Up Decommission in Distributed Storage Systems?, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Proceedings of the International Symposium on Cluster, Cloud, and Grid Computing (CCGrid)

  • 2018

    workshop Methodology for the Rapid Development of Scalable HPC Data Services, Matthieu Dorier, Philip Carns, Kevin Harms, Robert Latham, Robert Ross, Shane Snyder, Justin Wozniak, Samuel Gutierrez, Bob Robey, Brad Settlemyer, Galen Shipman, Jerome Soumagne, James Kowalkowski, Marc Paterno, Saba Sehrish, Proceedings of the PDSW-DISC 2018 workshop (SC18)

    workshop Pufferbench: Evaluating and Optimizing Malleability of Distributed Storage, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Proceedings of the PDSW-DISC 2018 workshop (SC18)

    report A Lower Bound for the Commission Times in Replication-Based Distributed Storage Systems, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, RR-9186

  • 2017

    workshop CoSS: Proposing a Contract-Based Storage System for HPC, Matthieu Dorier, Matthieu Dreher, Tom Peterka, Robert Ross, Proceedings of the PDSW-DISC 2017 workshop (SC17)

    workshop Supporting Task-level Fault-Tolerance in HPC Workflows by Launching MPI Jobs inside MPI Jobs, Matthieu Dorier, Justin Wozniak, Robert Ross, Proceedings of the WORKS 2017 workshop (SC17)

  • 2016

    report Performance-Constrained In Situ Visualization of Atmospheric Simulations, Matthieu Dorier, Robert Sisneros, Leonardo Bautista-Gomez, Tom Peterka, Leigh G Orf, Rob Ross, Lokman Rahmani, Gabriel Antoniu, Luc Bougé, RR-8855

    poster A Cross-Layer Solution in Scientific Workflow System for Tackling Data Movement Challenge, Dong Dai, Robert Ross, Dounia Khaldi, Yonghong Yan, Matthieu Dorier, Neda Tavakoli, Yong Chen, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)

    poster Design and Evaluation of Topology-aware Scatter and AllGather Algorithms for Dragonfly Networks, Nathanael Cheriere, Matthieu Dorier, Rob Ross, Shadi Ibrahim, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - ACM Student Research Competition

    journal Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations, Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Robert Sisneros, Orcun Yildiz, Shadi Ibrahim, Tom Peterka, Leigh Orf, ACM Transactions on Parallel Computing (ToPC)

    conference Leveraging Burst Buffer Coordination to Prevent I/O Interference, Anthony Kougkas, Matthieu Dorier, Rob Latham, Rob Ross, Xian-He Sun, IEEE International Conference on eScience

    conference Evaluation of Topology-Aware Broadcast Algorithms for Dragonfly Networks, Matthieu Dorier, Misbah Mubarak, Rob Ross, Jianping Kelvin Li, Christopher D. Carothers, Kwan-Liu Ma, IEEE International Conference on Cluster Computing (CLUSTER)

    conference Adaptive Performance-Constrained In Situ Visualization of Atmospheric Simulations, Matthieu Dorier, Robert Sisneros, Leonardo Bautista Gomez, Tom Peterka, Leigh Orf, Lokman Rahmani, Gabriel Antoniu, Luc Bougé, IEEE International Conference on Cluster Computing (CLUSTER)

    conference On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems, Orcun Yildiz, Matthieu Dorier, Shadi Ibrahim, Rob Ross, Gabriel Antoniu, IEEE International Parallel and Distributed Processing Symposium (IPDPS)

    journal On the Energy Footprint of I/O Management in Exascale HPC Systems, Matthieu Dorier, Orcun Yildiz, Shadi Ibrahim, Anne-Cécile Orgerie, Gabriel Antoniu, Elsevier Future Generation Computer Systems (FGCS)

    workshop Get Out of the Way! Applying Compression to Internal Data Structures, Robert Latham, Matthieu Dorier, Robert Ross, Proceedings of the PDSW-DISC 2016 workshop (SC16)

  • 2015

    journal Using Formal Grammars to Predict I/O Behaviors in HPC: the Omnisc’IO Approach, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Rob Ross, IEEE Transactions on Parallel and Distributed Systems (TPDS)

    workshop Lessons Learned from Building In Situ Coupling Frameworks, Matthieu Dorier, Matthieu Dreher, Tom Peterka, Gabriel Antoniu, Bruno Raffin, Justin M. Wozniak, First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV)

    report On the Use of Formal Grammars to Predict HPC I/O Behaviors, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Rob Ross, RR-8725

  • 2014

    thesis Addressing the Challenges of I/O Variability in Post-Petascale HPC Simulations, Matthieu Dorier, PhD thesis

    conference Omnisc’IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Robert Ross, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC14)

    conference CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination, Matthieu Dorier, Gabriel Antoniu, Robert Ross, Dries Kimpe, Shadi Ibrahim, IEEE International Parallel and Distributed Processing Symposium (IPDPS)

    workshop A Performance and Energy Analysis of I/O Management Approaches for Exascale Systems, Orcun Yildiz, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Sixth International Workshop on Data Intensive Distributed Computing (DIDC)

  • 2013

    conference Damaris/Viz: a Nonintrusive, Adaptable and User-Friendly In Situ Visualization Framework, Matthieu Dorier, Roberto R. Sisneros, Tom Peterka, Gabriel Antoniu, Dave B. Semeraro, IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

    poster Efficient I/O using Dedicated Cores in Large-Scale HPC Simulations, Matthieu Dorier, IEEE International Parallel and Distributed Processing Symposium, PhD forum

    report A Nonintrusive, Adaptable and User-Friendly In Situ Visualization Framework, Matthieu Dorier, Roberto R. Sisneros, Tom Peterka, Gabriel Antoniu, Dave B. Semeraro, RR-8314

  • 2012

    conference Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O, Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Leigh G. Orf, IEEE International Conference on Cluster Computing (CLUSTER)

    report Damaris: Leveraging Multicore Parallelism to Mask I/O Jitter, Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Leigh Orf, RR-7706

  • 2011

    thesis On the Benefit of Dedicating Cores to Mask I/O Jitter in HPC Simulations, Matthieu Dorier, Master thesis

    poster Damaris - Using Dedicated I/O Cores for Scalable Post-petascale HPC Simulations, Matthieu Dorier, ACM International Conference on Supercomputing (ICS) - ACM Student Research Competition

  • 2010

    conference BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map-Reduce Applications, Bogdan Nicolae, Diana Moise, Gabriel Antoniu, Luc Bougé, Matthieu Dorier, IEEE International Symposium on Parallel & Distributed Processing (IPDPS)



Projects

Only some of my current projects are listed below.

Mochi: a Software Defined Storage Approach to Exascale Storage Services

ANL, LANL, CMU, HDF Group

DOE-funded project working on designing efficient building blocks for data services in HPC systems. In this context I am the lead on multiple libraries for storage, I/O, and networking.

2015 - Present

SciDAC-4: HEP Data Analytics on HPC

ANL, FermiLab, LBNL, Colorado State University, University of Cincinnati

New capabilities at ASCR computing facilities drive us to rethink what is possible within the HEP (High-Energy Physics) scientific workflow. Within this project I designed HEPnOS, a distributed storage system specifically optimized for event data produced by HEP workflows.

2018 - Present

Joint Laboratory for Extreme-Scale Computing (JLESC)

Inria, ANL, UIUC, JSC, BSC, RIKEN-AICS

The Joint Laboratory for Extreme Scale Computing (JLESC) is an international, virtual organization whose goal is to enhance the ability of member organizations and investigators to bridge the gap between petascale and extreme-scale computing.

2009 - Present


Software

I have developed and contributed to many pieces of software during my career. The most important ones are listed below.

Mochi

Mochi is a set of libraries that my team and I are developing at Argonne National Laboratory. The project aims to provide libraries for building efficient HPC data services, covering threading, RPC, RDMA, key/value storage, document storage, blob storage, and more. One of my most important contributions to these libraries is thallium, a modern C++ library enabling threading, RPC, and RDMA on top of the Argobots threading library and the Mercury networking library. The Mochi libraries are used by a growing number of users, including people from LANL, LLNL, LBNL, JGU Mainz, Intel, IIT, The HDF Group, BNL, Rutgers University, and FermiLab.

HEPnOS

HEPnOS is an object store that I designed and built using the Mochi libraries in the context of a collaboration with FermiLab. The goal of HEPnOS is to provide a simple, modern C++ interface to HEP event-processing workflows, backed by an efficient object store, as a potential replacement for the traditional file-based storage approach built on the ROOT file format. HEPnOS is under active development and is being evaluated with our partners at FermiLab.

Damaris

Damaris is a data-management middleware for high-performance computing simulations. It makes it possible to dedicate some of the cores in each node, or entire nodes, to data management services, including asynchronous data transformation and storage, and in situ visualization and analysis. I developed Damaris during my Master's and PhD (2010 to 2014) and published a number of papers about it. Following my graduation, the KerData team of Inria Rennes hired an engineer to continue developing and supporting it. Damaris is still actively maintained by the KerData team.




Awards

R&D 100 Award

The Mochi project, on which I have been working since 2015 alongside my Argonne colleagues and collaborators from Carnegie Mellon University, Los Alamos National Laboratory, and The HDF Group, was awarded the prestigious R&D 100 Award.

2021

Gilles Kahn Honorary award (accessit)

The Gilles Kahn prize is awarded every year by the Société Informatique de France and the French Academy of Sciences to the three best PhD theses in computer science in France (one first prize and two honorary prizes). It is one of the most prestigious PhD awards in computer science in France. It recognizes the originality of the research, of the domain, and of the methods employed, the importance and impact of the results on the community, and the quality of the manuscript.

2015

C3I Certification (Certificat de Compétences en Calcul Intensif)

This label is awarded by the CPU (Conférence des Présidents d’Universités), GENCI, and the Maison de la Simulation to doctors who demonstrated skills related to high-performance computing during their PhD. The C3I label is multidisciplinary and covers all domains of science, from theory to applied research. The candidate must have shown evidence of skills related to the use and application of HPC (optimization of parallel codes, distributed and parallel algorithms, large-scale data management…). The label is awarded as a means to increase the visibility of the work conducted by these young doctors, and provides them with an additional asset in their career, whether in academia or in industry.

2015

2nd best PhD thesis award from Fondation Rennes 1

The PhD award from the Fondation Rennes 1 is given every year to 8 outstanding new doctors from the 4 doctoral schools associated with the University of Rennes 1 (2 awards per doctoral school). The candidates are judged on the innovative aspects of their PhD thesis, “innovative” being understood in the sense of impact on socioeconomic development and technology transfers.

2014

2nd place, ACM Student Research Competition at ICS 2011

The ACM Student Research Competition is an internationally recognized venue enabling undergraduate and graduate students to experience the research world, share research results and exchange ideas with other students, judges, and conference attendees, rub shoulders with academic and industry luminaries, understand the practical applications of their research, perfect their communication skills, and receive prizes and gain recognition from ACM and the greater computing community.

2011