I am a research engineer working on high-performance computing (HPC) at Argonne National Laboratory. I specialize in HPC I/O, parallel and distributed data storage, distributed algorithms, and networking. I design and develop software for HPC data management targeting various domains, including high-energy physics, computational biology, climate modeling, and artificial intelligence.
I excel at software design and development, in particular in C and C++. I defined a methodology for developing HPC data services that is actively used by my team and by my collaborators. I am also particularly good at setting up and driving research collaborations.
Since March 2017, I have been working as a Software Development Specialist. I conduct research on I/O, storage, communications, and in situ analysis for HPC applications, and I develop data services for HPC. I am a core member of the Mochi project, which aims to develop building blocks for designing HPC data services.
From February 2015 to March 2017, I was a postdoctoral appointee at the Mathematics and Computer Science division of Argonne National Laboratory, under the supervision of Rob Ross. I worked on I/O and storage for HPC applications, as well as on collective communication algorithms and event-driven simulations.
From 2011 to 2014, I completed a PhD under the supervision of Gabriel Antoniu and Luc Bougé, in the KerData team of Inria Rennes and IRISA. My research focused on I/O and in situ analysis for HPC applications. I developed the Damaris middleware for data management using dedicated resources on HPC systems (see Software section). I collaborated with researchers from Argonne National Lab and the University of Illinois at Urbana-Champaign. During my PhD, I also taught several programming courses (Java, C, C++, OCaml) at the ENS and INSA engineering schools.
In 2011, I completed my 6-month Master's internship in the KerData team, supervised by Gabriel Antoniu. My research focused on I/O for HPC simulations.
In 2010 I completed a 3-month internship at the NCSA under the supervision of Marc Snir and Franck Cappello. The goal of this internship was to propose a scalable solution to the I/O performance issues posed by the CM1 atmospheric simulation.
In 2009 I completed a research internship in the field of storage for MapReduce applications on cloud platforms, under the supervision of Luc Bougé and Bogdan Nicolae. During this internship I designed a distributed file system for Hadoop based on the BlobSeer distributed data management service.
From 2008 to 2012, I was a normalien (civil servant with a scholarship) at the Brittany extension of ENS Cachan (now ENS de Rennes). I studied computer science and telecommunications.
Thesis titled "Addressing the Challenges of I/O Variability in Post-Petascale HPC Simulations", completed under the supervision of Gabriel Antoniu and Luc Bougé.
Obtained with honors and the rank of 1st/70 students.
French diploma corresponding to a Master's degree with additional professional experience (in my case, research experience).
Obtained with the rank of 4th/106 students.
workshop Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows, Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, 2nd Workshop on High-Performance Storage (HPS)
conference SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services, Srinivasan Ramesh, Allen D. Malony, Philip Carns, Robert B. Ross, Matthieu Dorier, Jerome Soumagne, Shane Snyder, Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS)
conference Staging Based Task Execution for Data-driven, In-Situ Scientific Workflows, Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, Proceedings of the 2020 IEEE International Conference on Cluster Computing (CLUSTER)
conference DeepClone: Lightweight State Replication of Deep Learning Models for Data Parallel Training, Bogdan Nicolae, Justin M Wozniak, Matthieu Dorier, Franck Cappello, Proceedings of the 2020 IEEE International Conference on Cluster Computing (CLUSTER)
conference DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models, Bogdan Nicolae, Jiali Li, Justin Wozniak, George Bosilca, Matthieu Dorier, Franck Cappello, Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
conference Pufferscale: Rescaling HPC Data Services for High Energy Physics Applications, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Stefan M Wild, Sven Leyffer, Robert Ross, Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
journal Mochi: Composing Data Services for High-Performance Computing Environments, Robert B Ross, George Amvrosiadis, Philip Carns, Charles D Cranor, Matthieu Dorier, Kevin Harms, Greg Ganger, Garth Gibson, Samuel K Gutierrez, Robert Latham, Bob Robey, Dana Robinson, Bradley Settlemyer, Galen Shipman, Shane Snyder, Jerome Soumagne, Qing Zheng, Journal of Computer Science and Technology (Springer JCST)
report A Software Defined Storage Approach to Exascale Storage Services, Jerome Soumagne, Robert Ross, Galen Shipman, George Amvrosiadis, Neil Fortner, Dana Robinson, Philip Carns, Matthieu Dorier, Robert Latham, Shane Snyder, David Rich, Bradley Settlemyer, Chuck Cranor, Greg Ganger, Qing Zheng, Technical Report
journal MPI jobs within MPI jobs: A practical way of enabling task-level fault-tolerance in HPC workflows, Justin Wozniak, Matthieu Dorier, Robert Ross, Tong Shu, Tahsin Kurc, Li Tang, Norbert Podhorszki, Matthew Wolf, Future Generation Computer Systems (Elsevier FGCS)
conference Is it Worth Relaxing Fault Tolerance to Speed Up Decommission in Distributed Storage Systems?, Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu, Proceedings of the International Symposium on Cluster, Cloud, and Grid Computing (CCGrid)
workshop Methodology for the Rapid Development of Scalable HPC Data Services, Matthieu Dorier, Philip Carns, Kevin Harms, Robert Latham, Robert Ross, Shane Snyder, Justin Wozniak, Samuel Gutierrez, Bob Robey, Brad Settlemyer, Galen Shipman, Jerome Soumagne, James Kowalkowski, Marc Paterno, Saba Sehrish, Proceedings of the PDSW-DISC 2018 workshop (SC18)
report Performance-Constrained In Situ Visualization of Atmospheric Simulations, Matthieu Dorier, Robert Sisneros, Leonardo Bautista-Gomez, Tom Peterka, Leigh G Orf, Rob Ross, Lokman Rahmani, Gabriel Antoniu, Luc Bougé, RR-8855
poster A Cross-Layer Solution in Scientific Workflow System for Tackling Data Movement Challenge, Dong Dai, Robert Ross, Dounia Khaldi, Yonghong Yan, Matthieu Dorier, Neda Tavakoli, Yong Chen, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)
poster Design and Evaluation of Topology-aware Scatter and AllGather Algorithms for Dragonfly Networks, Nathanael Cheriere, Matthieu Dorier, Rob Ross, Shadi Ibrahim, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - ACM Student Research Competition
journal Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations, Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Robert Sisneros, Orcun Yildiz, Shadi Ibrahim, Tom Peterka, Leigh Orf, ACM Transactions on Parallel Computing (ToPC)
conference Evaluation of Topology-Aware Broadcast Algorithms for Dragonfly Networks, Matthieu Dorier, Misbah Mubarak, Rob Ross, Jianping Kelvin Li, Christopher D. Carothers, Kwan-Liu Ma, IEEE International Conference on Cluster Computing (CLUSTER)
conference Adaptive Performance-Constrained In Situ Visualization of Atmospheric Simulations, Matthieu Dorier, Robert Sisneros, Leonardo Bautista Gomez, Tom Peterka, Leigh Orf, Lokman Rahmani, Gabriel Antoniu, Luc Bougé, IEEE International Conference on Cluster Computing (CLUSTER)
conference On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems, Orcun Yildiz, Matthieu Dorier, Shadi Ibrahim, Rob Ross, Gabriel Antoniu, IEEE International Parallel and Distributed Processing Symposium (IPDPS)
journal On the Energy Footprint of I/O Management in Exascale HPC Systems, Matthieu Dorier, Orcun Yildiz, Shadi Ibrahim, Anne-Cécile Orgerie, Gabriel Antoniu, Elsevier Future Generation Computer Systems (FGCS)
journal Using Formal Grammars to Predict I/O Behaviors in HPC: the Omnisc’IO Approach, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Rob Ross, IEEE Transactions on Parallel and Distributed Systems (TPDS)
workshop Lessons Learned from Building In Situ Coupling Frameworks, Matthieu Dorier, Matthieu Dreher, Tom Peterka, Gabriel Antoniu, Bruno Raffin, Justin M. Wozniak, First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV)
conference Omnisc’IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Robert Ross, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC14)
conference CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination, Matthieu Dorier, Gabriel Antoniu, Robert Ross, Dries Kimpe, Shadi Ibrahim, IEEE International Parallel and Distributed Processing Symposium (IPDPS)
workshop A Performance and Energy Analysis of I/O Management Approaches for Exascale Systems, Orcun Yildiz, Matthieu Dorier, Shadi Ibrahim, Gabriel Antoniu, Sixth International Workshop on Data Intensive Distributed Computing (DIDC)
conference Damaris/Viz: a Nonintrusive, Adaptable and User-Friendly In Situ Visualization Framework, Matthieu Dorier, Roberto R. Sisneros, Tom Peterka, Gabriel Antoniu, Dave B. Semeraro, IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)
conference Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O, Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Leigh G. Orf, IEEE International Conference on Cluster Computing (CLUSTER)
conference BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map-Reduce Applications, Bogdan Nicolae, Diana Moise, Gabriel Antoniu, Luc Bougé, Matthieu Dorier, IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
DOE-funded project working on designing efficient building blocks for data services in HPC systems. In this context I am the lead on multiple libraries for storage, I/O, and networking.
New capabilities at ASCR computing facilities drive us to rethink what is possible within the HEP (High-Energy Physics) scientific workflow. Within this project I designed HEPnOS, a distributed storage system specifically optimized for event data produced by HEP workflows.
The Joint Laboratory for Extreme Scale Computing (JLESC) is an international, virtual organization whose goal is to enhance the ability of member organizations and investigators to bridge the gap between petascale and extreme-scale computing.
Mochi is a set of libraries that my team and I are developing at Argonne National Laboratory. The project aims to provide libraries for building efficient HPC data services, covering threading, RPC, RDMA, key/value storage, document storage, blob storage, and more. One of my most important contributions to these libraries is thallium, a modern C++ library enabling threading, RPC, and RDMA on top of the Argobots threading library and the Mercury networking library. The Mochi libraries are currently used by a growing number of users, including people from LANL, LLNL, LBNL, JGU Mainz, Intel, IIT, The HDF Group, BNL, Rutgers University, and FermiLab.
HEPnOS is an object store that I designed and built using the Mochi libraries in the context of a collaboration with FermiLab. The goal of HEPnOS is to provide HEP event-processing workflows with a simple, modern C++ interface backed by an efficient object store, as a potential replacement for the traditional file-based storage approach built on the ROOT file format. HEPnOS is under active development and is being evaluated with our partners at FermiLab.
Damaris is a data-management middleware for high-performance computing simulations. It makes it possible to dedicate some of the cores in each node, or entire nodes, to data management services, including asynchronous data transformation and storage, and in situ visualization and analysis. I developed Damaris during my Master's and PhD (2010 to 2014) and published a number of papers about it. Following my graduation, the KerData team of Inria Rennes hired an engineer to continue developing and supporting it. To this day, Damaris is actively maintained by the KerData team.
The Gilles Kahn prize is awarded every year by the Société Informatique de France and the French Academy of Sciences to the three best PhD theses in computer science in France (one first prize and two honorary prizes). It is one of the most prestigious PhD awards in computer science in France. It rewards the originality of the research, the originality of the domain and methods employed, the importance and impact of the results on the community, and the quality of the manuscript.
This label is awarded by the CPU (Conférence des Présidents d’Universités), GENCI, and the Maison de la Simulation to doctors who demonstrated skills related to high-performance computing during their PhD. The C3I label is multidisciplinary and covers all domains of science, from theory to applied research. The candidate must have shown evidence of skills related to the use and application of HPC (optimization of parallel codes, distributed and parallel algorithms, large-scale data management…). The label is awarded as a means to increase the visibility of the work conducted by these young doctors, and provides them with an additional asset in their career, whether in academia or in industry.
The PhD award from the Fondation Rennes 1 is given every year to 8 outstanding new doctors from the 4 doctoral schools associated with the University of Rennes 1 (2 awards per doctoral school). The candidates are judged on the innovative aspects of their PhD thesis, “innovative” being understood in the sense of impact on socioeconomic development and technology transfers.
The ACM Student Research Competition is an internationally recognized venue enabling undergraduate and graduate students to experience the research world, share research results and exchange ideas with other students, judges, and conference attendees, rub shoulders with academic and industry luminaries, understand the practical applications of their research, perfect their communication skills, and receive prizes and gain recognition from ACM and the greater computing community.