Me:
I am Fan Zhou, currently doing AI research at GAIR Lab (2024 - Now), mentored by Prof. Pengfei Liu.
Previously, I obtained my Master's degree in computer science at Shanghai Jiao Tong University (SJTU), under the supervision of Prof. Fan Cheng and Prof. Hongfei Fu. I completed my B.S. degree in the IEEE honor class at SJTU. I've interned or worked at the Interpretable ML Lab (2019), Microsoft Research Asia (2021-2022), and XLang Lab@HKUNLP (2023).
I also work closely with Qian Liu at Sea AI Lab.
Research Interests
I am generally interested in Natural Language Processing and Machine Learning. Recently, I have been specifically interested in:
- Data-Centric Methods: [ProX]. Please check out my Data-Centric Reading List.
- Code Generation, Understanding, and Reasoning
- Agentic Language Models and Applications [OpenAgents, Lemur]
News
- 2024.09: We have released ProX, an SLM-based pre-training data refining framework.
- 2024.09: Our OlympicArena paper was accepted to NeurIPS'24.
- 2024.07: Our OpenAgents paper was accepted to COLM'24.
- 2024.05: Our Preference Dissection paper was accepted to ACL'24.
- 2024.01: Our Lemur paper (Agent Model) was accepted to ICLR'24 (Spotlight, 5%).
- 2023.10: We've built OpenAgents, an open platform for language agents in the wild!
- 2023.10: We have released Lemur-70B, an agentic language model based on Llama-2!
- 2023.04: New preprint applying symbolic tasks in instruction tuning.
- 2022.10: Our TaCube paper (Table QA) was accepted to EMNLP'22 (Oral Presentation).
Publications
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Fan Zhou*, Zengzhi Wang*, Qian Liu, Junlong Li, Pengfei Liu (*=equal contribution)
2024, Preprint | [PDF] | [Code] | [hf repo] | [Page]
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu
NeurIPS 2024 (DB track) | [PDF] | [Code] | [hf datasets] | [Page]
Dissecting Human and LLM Preferences
Junlong Li, Fan Zhou, Shichao Sun, Yikai Zhang, Hai Zhao, Pengfei Liu
ACL 2024 | [PDF] | [Code] | [Page]
OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie*, Fan Zhou*, Zhoujun Cheng*, Peng Shi*, Luoxuan Weng*, Yitao Liu*, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu (*=equal contribution)
COLM 2024 | [PDF] | [Code] | [Blog]
Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu*, Hongjin Su*, Chen Xing*, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu (*=equal contribution)
ICLR 2024, Spotlight | [PDF] | [Code] | [hf models] | [Blog]
From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning
Qian Liu*, Fan Zhou*, Zhengbao Jiang, Longxu Dou, Min Lin (*=equal contribution)
2023, Preprint | [PDF] | [Code] | [hf datasets & models]
Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems
Fan Zhou*, Haoyu Dong*, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang (*=equal contribution)
NeurIPS 2022, 2nd MATH-AI Workshop | [PDF]
TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data
Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Fan Cheng, Shi Han, Dongmei Zhang
EMNLP 2022, Oral | [PDF]
Table Pre-training: A Survey on Model Architectures, Pretraining Objectives, and Downstream Tasks
Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang
IJCAI 2022 (survey track) | [PDF]
Exploring Image Regions Not Well Encoded by an INN
Zenan Ling, Fan Zhou, Meng Wei, Quanshi Zhang
AISTATS 2022 | [PDF]
Quantification and Analysis of Layer-wise and Pixel-wise Information Discarding
Haotian Ma, Hao Zhang, Fan Zhou, Quanshi Zhang
ICML 2022 | [PDF] | [Code]
Projects
Host your own ChatGPT Plus locally!
- Data Agent: code interpreter augmented with data tools
- Plugins Agent: 200+ plugins for daily life
- Web Agent: autonomous web browsing
Service
- Workshop Reviewer: Instruction Workshop @ NeurIPS 2023, MATH-AI Workshop @ NeurIPS 2024.
- Conference Reviewer: COLING 2024~2025, ICLR 2025
Experiences
Academia
- 2021.09 - 2024.03, M.S.@SJTU, Computer Science & Engineering
- 2017.09 - 2021.06, B.S.@SJTU, IEEE honor class, Computer Science.
Others
- 2023.04 - 2023.12, Research Assistant@XLang Lab, HKU, supervised by Prof. Tao Yu.
- 2021.10 - 2022.10, Research Intern@Microsoft Research Asia, supervised by Haoyu Dong.
- 2019.05 - 2019.10, Research Intern@Interpretable ML Lab, SJTU, supervised by Prof. Quanshi Zhang.
Honors and Awards
- MSRA Stars of Tomorrow (Award of Excellent Intern), 2022
- Outstanding Graduates of SJTU, 2021
- SJTU Academic Scholarship, 2017~2020
- Shanghai City Scholarship (≈top 5%), 2018