Adam Weingram, Duo Zhang, Zhonghao Chen, Hao Qi, Xiaoyi Lu
Large Reasoning Models (LRMs) are becoming increasingly popular because they offer advanced capabilities in logical inference, mathematical reasoning, and knowledge synthesis that extend beyond those of standard language models. However, their complex training workflows present significant challenges for reproducibility, efficiency, and system-level optimization. This paper introduces HPC-R1, a comprehensive characterization of LRM training on the NERSC Perlmutter supercomputer, capturing workload behavior on a Top500-ranked system. We analyze all major training stages, including supervised fine-tuning (SFT), Group Relative Policy Optimization (GRPO)-based reinforcement learning (RL), autoregressive generation, and distillation, using customized state-of-the-art frameworks. Our detailed performance analysis reveals key system inefficiencies and scaling behaviors. Through this in-depth analysis, we present 19 key observations across all stages: 4 for SFT, 7 for GRPO-based RL, 6 for generation, and 2 for distillation. Based on these findings, we offer several key recommendations to guide future HPC-AI system design.