About

I am Hao Liang (梁昊), a Ph.D. candidate at the Center for Data Science, Peking University, jointly affiliated with Zhongguancun Academy. I am fortunate to be supervised by Prof. Wentao Zhang and Prof. Bin Dong, and to work closely with Prof. Bin Cui and Prof. Weinan E. I am also a research intern at Tencent HY under the Qingyun Program (青云计划), focusing on pretraining data preparation.

Prior to this, I received my bachelor’s degree from Beijing Institute of Technology, where I was awarded the Xu Teli Scholarship (the highest honor at BIT) and the National Scholarship. I also visited the University of Oxford, where I worked with Prof. Ismail Ilkan Ceylan and Prof. Michael Bronstein.

Research Interests

My research focuses on Data-Centric AI, spanning three key directions:

  1. Data Infrastructure — Building scalable systems and pipelines for data preparation and data-model iterative training in large-scale AI.
  2. Data Understanding — Investigating how data quality, attribution, and scaling influence model performance.
  3. Data Agents — Designing autonomous agents that intelligently curate, transform, and manage data.

I have published nine first-author or co-first-author papers at CCF-A venues and received the Sa Shixuan Best Student Paper Award at NDBC.

Open-Source Contributions

  • DataFlow: Lead designer of this open-source data processing framework, which has received 3,000+ GitHub Stars. DataFlow achieved 1st place in the ICML SeePhy Challenge and 1st place in the Zhiyuan LIC Challenge.
  • DataFlex: Lead designer of this open-source data-model iterative training framework.
  • LLaMA-Factory: Contributed to the data module design (65k+ Stars).
  • CAMEL: Integrated DataFlow into CAMEL’s data pipeline (16k+ Stars).

Honors & Awards

  • President’s Scholarship, Peking University
  • Industrial Bank Scholarship (兴业奖学金), Peking University
  • Xu Teli Scholarship (徐特立奖学金), Beijing Institute of Technology (Highest Honor)
  • National Scholarship, Beijing Institute of Technology
  • Sa Shixuan Best Student Paper Award (萨师煊优秀学生论文奖), NDBC