🐨 About Me

I am Fan Zhou, currently a first-year PhD student at Shanghai Jiao Tong University, advised by Prof. Pengfei Liu. My research focuses on scalable methods (e.g., scaling data pipelines, stabilizing agentic scaffolds) for building performant models, with the aim of contributing to powerful general-purpose AI (or AGI, if you prefer). Recently, I've been particularly interested in the following areas:

Developing Data-Centric Recipes for Foundation Models
(GURU, OctoThinker, MegaMath, ProX, Sailor2)
Building Agentic AI for Real-World Scenarios
(Qwen3-Coder, Qwen Code, OpenAgents, Lemur)

🔥 News

2025.09: 🔥 Qwen3-Coder, an agentic coding model for the world, is released.
2025.09: 📄 GURU paper is accepted by NeurIPS'25.
2025.07: 📄 MegaMath paper is accepted by COLM'25.
2025.06: 🙋 We release GURU, a large-scale RL study of general-purpose reasoning models across 6 domains.
2025.05: 📄 ProX and M-STaR papers are accepted by ICML'25.
2025.04: 🔥 Say hi to OctoThinker, a mid-training ablation study in the era of RL scaling.
2025.04: 🔥 We release MegaMath, the largest math pre-training dataset to date, containing 370B tokens.
2024.12: 🔥 Enjoy Sailor2, a state-of-the-art language model family for Southeast Asia.
2024.11: 🔥 We have released M-STaR, a self-evolving training recipe for multimodal reasoning.
2024.09: 🔥 We have released ProX, a small-LM-based pre-training data refining framework!
2024.09: 📄 OlympicArena paper is accepted by NeurIPS'24.
2024.07: 📄 OpenAgents paper is accepted by COLM'24.
2024.05: 📄 Preference Dissection paper is accepted by ACL'24.
2024.01: 📄 Our Lemur paper (Agent Model) is accepted by ICLR'24 (Spotlight, 5%).
2023.10: 🔥 We've built OpenAgents, an open platform for language agents in the wild!
2023.10: 🙋 We have released Lemur-70B, an agentic language model based on Llama-2!
2023.04: 🔥 New preprint on applying symbolic tasks to instruction tuning.
2022.10: 📄 Our TaCube paper (Table QA) is accepted by EMNLP'22 (Oral Presentation).

📖 Selected Projects

Qwen3-Coder: Agentic Coding in the World
Qwen Team
Blog / Code / Models /
Focused on improving agentic coding capabilities.

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Zhoujun Cheng*, Shibo Hao*, Tianyang Liu*, Fan Zhou, Feng Yao, Yuexin Bian, Yutao Xie, … , Zhengzhong Liu, Zhiting Hu (*=equal contribution) [more authors]
NeurIPS 2025.
PDF / Code / Dataset /
GURU: An open RL suite for developing general-purpose reasoning models.

OctoThinker: Mid-Training Incentivizes RL Scaling
Zengzhi Wang*, Fan Zhou*, Xuefeng Li, Pengfei Liu
ICML 2025, AI4Math Workshop.
PDF / Blog / Code / Resources /
A mid-training ablation study in the era of RL scaling, with a 70B+ token mid-training dataset.

Generative AI Act II: Test Time Scaling Drives Cognition Engineering
Shijie Xia, Yiwei Qin, Xuefeng Li, Yan Ma, Run-Ze Fan, Steffi Chern, Haoyang Zou, Fan Zhou, Xiangkun Hu, Jiahe Jin, Yanheng He, Yixin Ye, Yixiu Liu, Pengfei Liu
2025, Preprint.
PDF / Code /
A survey on test-time scaling.

MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou*, Zengzhi Wang*, Nikhil Ranjan, Zhoujun Cheng, Liping Tang, Guowei He, Zhengzhong Liu, Eric P. Xing
COLM 2025.
PDF / Code / Dataset (>70K Downloads, >350B Tokens) /
The largest open math pre-training dataset with 370B tokens.

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
Longxu Dou*, Qian Liu*, Fan Zhou*, Changyu Chen*, … , Tianyu Pang, Chao Du, Wei Lu, Min Lin (*=equal contribution) [more authors]
2025, Tech Report.
PDF / Blog / Code / Resources /
An open state-of-the-art language model family for Southeast Asian languages, continually pre-trained on Qwen2.5.

Diving into Self-Evolving Training for Multimodal Reasoning
Wei Liu*, Junlong Li*, Xiwen Zhang, Fan Zhou, Yu Cheng, Junxian He (*=equal contribution)
ICML 2025
PDF / Code / Resources / Project Page /
A self-evolving training recipe for multimodal reasoning, M-STaR.

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Fan Zhou*, Zengzhi Wang*, Qian Liu, Junlong Li, Pengfei Liu (*=equal contribution)
ICML 2025
PDF / Code / Dataset (>10K Downloads, >500B Tokens) / Project Page /
A small-LLM-based pre-training data refining framework via seamless program generation.

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu
NeurIPS 2024 (DB track)
PDF / Code / Datasets / Project Page /
A challenging multimodal Olympic-level competition benchmark for LLMs and LVMs.

Dissecting Human and LLM Preferences
Junlong Li, Fan Zhou, Shichao Sun, Yikai Zhang, Hai Zhao, Pengfei Liu
ACL 2024
PDF / Code / Datasets /
Disentangling preferred and dispreferred features of LLM responses.

OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie*, Fan Zhou*, Zhoujun Cheng*, Peng Shi*, Luoxuan Weng*, Yitao Liu*, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu (*=equal contribution)
COLM 2024
PDF / Code / Blog (7.5K Users) /
An open platform for using, hosting, and building language agents.

Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu*, Hongjin Su*, Chen Xing*, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu (*=equal contribution)
ICLR 2024, Spotlight
PDF / Code / Models / Blog
A 70B agent model pre-trained with balanced code-text corpora, comparable to GPT-3.5.

From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning
Qian Liu*, Fan Zhou*, Zhengbao Jiang, Longxu Dou, Min Lin (*=equal contribution)
Tech Report 2023
PDF / Code / Datasets & Models /
A symbolic and synthetic method for improving LM instruction tuning.

Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems
Fan Zhou*, Haoyu Dong*, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang (*=equal contribution)
NeurIPS 2022, 2nd MATH-AI Workshop
PDF
Inference-time calibration for LLM-based numerical reasoning.

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data
Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Fan Cheng, Shi Han, Dongmei Zhang
EMNLP 2022, Oral
PDF
Pre-computing aggregation/arithmetic results to assist table numerical reasoning.

Table Pre-training: A Survey on Model Architectures, Pretraining Objectives, and Downstream Tasks
Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang
IJCAI 2022 (survey track)
PDF
A survey of tabular models, with a focus on pre-trained transformers.

Exploring Image Regions Not Well Encoded by an INN
Zenan Ling, Fan Zhou, Meng Wei, Quanshi Zhang
AISTATS 2022
PDF
An analysis of the generation flaws of normalizing flows.

Quantification and Analysis of Layer-wise and Pixel-wise Information Discarding
Haotian Ma, Hao Zhang, Fan Zhou, Quanshi Zhang
ICML 2022
PDF / Code
A quantitative analysis of CNNs.

Experiences

2021.09 - 2024.03, M.S.@SJTU, CS.
2017.09 - 2021.06, B.S.@SJTU, CS, IEEE honor class.

Service and Awards

Reviewer: ICLR, NeurIPS, COLM, ACL, IJCAI, COLING, …
MSRA: Award of Excellent Intern, 2022