Yiqing Xie's Personal Webpage
Language Technologies Institute, CMU. yiqingxi@andrew.cmu.edu
Rm 6607, Gates and Hillman Centers
4902 Forbes Ave
Pittsburgh, PA 15213, USA
I am a third-year Ph.D. student at the Language Technologies Institute of Carnegie Mellon University, working with Daniel Fried and Carolyn Rosé. Previously, I obtained my Master's degree in the data mining group at the University of Illinois Urbana-Champaign, supervised by Jiawei Han, and my Bachelor's degree at the Hong Kong University of Science and Technology, where I received the Academic Achievement Medal.
My research goal is to build generalizable, scalable, and annotation-efficient systems that assist humans with practical tasks. This includes:
NLP for Code & Code for NLP
- Code generation systems (CMTrans)
- Code generation evaluation (CodeBenchGen, CodeRAG-Bench)
- Code generation to assist other tasks (TBD)
Annotation-efficient Machine Learning
- Pretraining & continuous pretraining (Anchor-DR, METRO-T0)
- Construction of silver labels (Anchor-DR, CMTrans, Eider)
- Model-generated signals (METRO-T0)
- Guidance under heuristic metrics or prior knowledge (AlaGCN, RL-MMR, KoMen)
Scalable Evaluation Methods
- Synthetic benchmark generation (CodeBenchGen)
- Evaluation framework (DocLens)
- Evaluator model training (FenCE)
News
Jun 20, 2024 | Check out our new preprint on a comprehensive RAG-for-code benchmark (CodeRAG-Bench) |
---|---|
May 16, 2024 | Really excited about our newly-accepted ACL 2024 paper on medical evaluation! (DocLens 🔍) |
Apr 5, 2024 | Gave a talk on our recent work on evaluation of medical text at Microsoft! |
Mar 30, 2024 | Check out our new preprint on scalable code benchmark creation (CodeBenchGen 🤖💻) |
Dec 10, 2023 | Presented our work on code translation at EMNLP! |
Oct 9, 2023 | One paper on code translation got accepted to EMNLP 2023 (CMTrans 🔄💻)! |
Apr 4, 2023 | One paper on unsupervised dense retrieval got accepted to SIGIR 2023 (Anchor-DR ⚓)! |
Education
Work experience
- Meta AI
2024.05 - 2024.10 - Research Intern, GenAI
- Work on training a fine-grained critic-based evaluator model and using it to improve generators' factuality [FenCE]
- Managers: Hejia Zhang, Di Jin, Sinong Wang
- Microsoft Research Redmond
2023.06 - 2023.08 - Research Intern, Health Futures
- Work on a multi-aspect fine-grained evaluation framework of medical text generation [DocLens]
- Managers: Sheng Zhang, Hao Cheng, Hoifung Poon
- Microsoft Research Redmond
2022.05 - 2022.08 - Research Intern, Productivity and Intelligence group
- Work on continuously pre-trained models for zero-shot dense retrieval [Anchor-DR]
- Manager: Chenyan Xiong
- Alibaba DAMO Academy
2020.07 - 2021.02 - Research Intern, Data Analytics and Intelligence Lab
- Work on few-shot interaction recommendation under multiple scenarios [KoMen]
- Managers: Yaliang Li, Bolin Ding
Honors and Awards
Additional Information
Selected publications
- [preprint] Improving Model Factuality with Fine-grained Critique-based Evaluator (preprint, 2024)
- preprint
- [ACL] DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL, 2024)
- [EMNLP Findings] Data Augmentation for Code Translation with Comparable Corpora and Multiple References. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings, 2023)
- [SIGIR] Unsupervised Dense Retrieval Training with Web Anchors. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR, 2023)
- [ACL Findings] Eider: Evidence-enhanced Document-level Relation Extraction. In Findings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL Findings, 2022)
- [WWW] KoMen: Domain Knowledge-Guided Few-Shot Interaction Recommendation on Multiplex Networks. In Proceedings of the Web Conference (WWW, 2022)
- [IJCAI] When Do GNNs Work: Understanding and Improving Neighborhood Aggregation. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI, 2020)
- [WWW] Guiding Corpus-Based Set Expansion by Auxiliary Sets Generation and Co-Expansion. In Proceedings of The Web Conference (WWW, 2020)