Yiqing Xie's Personal Webpage
Language Technologies Institute, CMU. yiqingxi@andrew.cmu.edu
Rm 6413, Gates and Hillman Centers
4902 Forbes Ave
Pittsburgh, PA 15213, USA
I am a fourth-year Ph.D. student at the Language Technologies Institute of Carnegie Mellon University, working with Carolyn Rosé and Daniel Fried. Previously, I obtained my Master's degree in the data mining group at the University of Illinois Urbana-Champaign, supervised by Jiawei Han, and my Bachelor's degree at the Hong Kong University of Science and Technology, where I received the Academic Achievement Medal.
My research mainly focuses on synthetic training data construction and automatic evaluation, especially for coding agents. The topics include: (i) constructing scalable synthetic training data; (ii) training models to generalize across diverse tasks; (iii) improving model performance with auxiliary models and evaluation benchmarks.
Scalable Synthetic Training Data
- Pretraining & continuous pretraining (Anchor-DR, METRO-T0)
- Training environment (RepoST, Hybrid-Gym)
- Model-generated Reward Signals (FenCE)
- Data augmentation (Eider, Anchor-DR, CMTrans, FenCE)
- Guidance under heuristic metrics or prior knowledge (RL-MMR, AlaGCN, KoMen)
- Unsupervised or Semi-supervised methods (Set-CoExpan, CoRel)
Training Models for Task Generalization
- Zero-shot retriever training (Anchor-DR, METRO-T0)
- General-purpose coding agent post-training (Hybrid-Gym)
- Few-shot graph-based interaction recommendation (KoMen)
Auxiliary Model Training and Benchmark Construction
- Evaluation Benchmarks (CodeBenchGen, CodeRAG-Bench, TheAgentCompany, RepoST)
- Evaluation frameworks (DocLens)
- Auxiliary models in training and inference (FenCE, Strong-Weak-Colab)
Code Generation and Coding Agents
- Code generation training (CMTrans, RepoST)
- Code generation inference (SACL, Strong-Weak-Colab)
- Code generation evaluation (CodeBenchGen, CodeRAG-Bench, TheAgentCompany, RepoST)
News
| Feb 19, 2026 | New preprint on coding agent training for task generalization! (Hybrid-Gym) |
|---|---|
| Nov 11, 2025 | Gave a guest lecture in CMU 11-891 Neural code generation! (slides) |
| Aug 26, 2025 | Started TA-ing for 11-891 Neural code generation!! |
| Aug 20, 2025 | Two papers on repo-level code generation got accepted to EMNLP 2025!! (SACL, Strong-Weak-Colab) |
| Jul 7, 2025 | Our paper on synthetic coding environment construction for repo-level code generation got accepted to COLM 2025! (RepoST) |
| Jun 25, 2025 | Really excited about our two new preprints on analysis for code generation! (SACL, Strong-Weak-Colab) |
| May 15, 2025 | One paper on factuality evaluator training got accepted to ACL 2025! (FenCE 🚧) |
| Apr 9, 2025 | Gave a talk about repo-level coding environment construction at the EFML Reading Group (Stanford / UW)! |
| Feb 19, 2025 | Really excited about our new preprint: RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing! (RepoST) |
| Jan 22, 2025 | One paper on RAG-for-code benchmark got accepted to NAACL-Findings 2025! (CodeRAG-Bench) |
Education
Work experience
- Meta AI
2024.05 - 2024.10 - Research Intern, GenAI
- Worked on training a fine-grained critic-based evaluator model and using it to improve generators' factuality [FenCE]
- Manager: Hejia Zhang; Peers: Di Jin, Sinong Wang
- Microsoft Research Redmond
2023.06 - 2023.08 - Research Intern, Health Futures
- Worked on a multi-aspect fine-grained evaluation framework for medical text generation [DocLens]
- Managers: Sheng Zhang, Hao Cheng, Hoifung Poon
- Microsoft Research Redmond
2022.05 - 2022.08 - Research Intern, Productivity and Intelligence group
- Worked on continuously pre-trained models for zero-shot dense retrieval [Anchor-DR]
- Manager: Chenyan Xiong
- Alibaba DAMO Academy
2020.07 - 2021.02 - Research Intern, Data Analytics and Intelligence Lab
- Worked on few-shot interaction recommendation under multiple scenarios [KoMen]
- Managers: Yaliang Li, Bolin Ding
Honors and Awards
Additional Information
Selected publications
- Hybrid-Gym: Training Coding Agents to Generalize Across Tasks (preprint, 2026)
- RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing. In Proceedings of the 2nd Conference on Language Modeling (COLM, 2025)
- Improving Model Factuality with Fine-grained Critique-based Evaluator. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL, 2025)
- DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL, 2024)
- Data Augmentation for Code Translation with Comparable Corpora and Multiple References. In Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings, 2023)
- Unsupervised Dense Retrieval Training with Web Anchors. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR, 2023)
- Eider: Evidence-enhanced Document-level Relation Extraction. In Findings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL Findings, 2022)
- KoMen: Domain Knowledge-Guided Few-Shot Interaction Recommendation on Multiplex Networks. In Proceedings of the Web Conference (WWW, 2022)
- When Do GNNs Work: Understanding and Improving Neighborhood Aggregation. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI, 2020)
- Guiding Corpus-Based Set Expansion by Auxiliary Sets Generation and Co-Expansion. In Proceedings of The Web Conference (WWW, 2020)