Yiqing Xie's Personal Webpage

Language Technologies Institute, CMU. yiqingxi@andrew.cmu.edu

Rm 6607, Gates and Hillman Centers

4902 Forbes Ave

Pittsburgh, PA 15213, USA

I am a third-year Ph.D. student at the Language Technologies Institute of Carnegie Mellon University, working with Daniel Fried and Carolyn Rosé. Previously, I obtained my Master's degree from the University of Illinois Urbana-Champaign, where I was in the data mining group supervised by Jiawei Han, and my Bachelor's degree from the Hong Kong University of Science and Technology, where I received the Academic Achievement Medal.

My research goal is to build generalizable, scalable, and annotation-efficient systems that assist humans with practical tasks. This includes:

NLP for Code & Code for NLP

Annotation-efficient Machine Learning

Scalable Evaluation Methods


News

Jun 20, 2024 Check out our new preprint on a comprehensive RAG-for-code benchmark (CodeRAG-Bench)!
May 16, 2024 Really excited about our newly accepted ACL 2024 paper on medical evaluation! (DocLens 🔍)
Apr 5, 2024 Gave a talk on our recent work on evaluation of medical text at Microsoft!
Mar 30, 2024 Check out our new preprint on scalable code benchmark creation (CodeBenchGen 🤖💻)!
Dec 10, 2023 Presented our work on code translation at EMNLP!
Oct 9, 2023 One paper on code translation got accepted to EMNLP 2023 (CMTrans 🔄💻)!
Apr 4, 2023 One paper on unsupervised dense retrieval got accepted to SIGIR 2023 (Anchor-DR ⚓)!

Education

Carnegie Mellon University 2022 - Present
Ph.D. in Language and Information Technology
Research focus: code generation, tools for LLMs, evaluation
Advisors: Daniel Fried and Carolyn Rosé
University of Illinois at Urbana-Champaign 2020 - 2022
Master of Science in Computer Science (GPA: 4.0/4.0)
Research focus: information extraction, graph-based machine learning
Advisor: Jiawei Han
Hong Kong University of Science and Technology 2016 - 2020
B.Sc. in Computer Science with a double major in Mathematics (GPA: 3.9/4.3)
Research focus: graph-based machine learning, text mining
Advisor: Raymond Chi-Wing Wong

Work experience

Meta AI 2024.05 - 2024.10
Research Intern, GenAI
Work on training a fine-grained critique-based evaluator model and using it to improve generators' factuality [FenCE]
Managers: Hejia Zhang, Di Jin, Sinong Wang
Microsoft Research Redmond 2023.06 - 2023.08
Research Intern, Health Futures
Work on a multi-aspect, fine-grained evaluation framework for medical text generation [DocLens]
Managers: Sheng Zhang, Hao Cheng, Hoifung Poon
Microsoft Research Redmond 2022.05 - 2022.08
Research Intern, Productivity and Intelligence group
Work on continuously pre-trained models for zero-shot dense retrieval [Anchor-DR]
Manager: Chenyan Xiong
Alibaba DAMO Academy 2020.07 - 2021.02
Research Intern, Data Analytics and Intelligence Lab
Work on few-shot interaction recommendation under multiple scenarios [KoMen]
Managers: Yaliang Li, Bolin Ding

Honors and Awards

CMU Presidential Fellowship in LTI 2024-2025
Siebel Scholar, class of 2022 2021-2022
Hong Kong University of Science and Technology Academic Achievement Medal (top 1%) 2020
Hong Kong Special Administrative Region Government Scholarship Fund - Reaching Out Award 2018
Hong Kong University of Science and Technology's Scholarship for Continuing Undergraduate Students 2017-2019
Dean’s List, Hong Kong University of Science and Technology Three times, 2017-2019
Silver Medal, China Girls Mathematical Olympiad 2015

Additional Information

Reviewing: ICLR 2025, ARR (June 2024, Apr 2024, Feb 2024, Dec 2023), EMNLP 2023, ACL 2023, TKDE 2023, COLING 2022, AACL 2022
Teaching Assistant: 11-711: Advanced NLP, CMU, Fall 2024
Teaching Assistant: CS412: Introduction to Data Mining, UIUC, Spring 2022
Teaching Assistant: COMP 2012: Object-Oriented Programming and Data Structures, HKUST, Fall 2018
Teaching Assistant: COMP 1022P: Introduction to Java Programming, HKUST, Fall 2018

Selected publications

For the complete list of publications, check here
  1. preprint
    Improving Model Factuality with Fine-grained Critique-based Evaluator
    Yiqing Xie, Wenxuan Zhou, Pradyot Prakash, Di Jin, Yuning Mao, Quintin Fettes, Arya Talebzadeh, Sinong Wang, Han Fang, Carolyn Rose, Daniel Fried, and Hejia Zhang
    (preprint, 2024)
  2. preprint
    CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
    Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, and Carolyn Rose
    (preprint, 2024)
  3. ACL
    DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation
    Yiqing Xie, Sheng Zhang, Hao Cheng, Pengfei Liu, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon, and Carolyn Rose
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL, 2024)
  4. EMNLP Findings
    Data Augmentation for Code Translation with Comparable Corpora and Multiple References
    Yiqing Xie, Atharva Naik, Daniel Fried, and Carolyn Rose
    In Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings, 2023)
  5. SIGIR
    Unsupervised Dense Retrieval Training with Web Anchors
    Yiqing Xie, Xiao Liu, and Chenyan Xiong
    In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR, 2023)
  6. ACL Findings
    Eider: Evidence-enhanced Document-level Relation Extraction
    Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, and Jiawei Han
    In Findings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL Findings, 2022)
  7. WWW
    KoMen: Domain Knowledge-Guided Few-Shot Interaction Recommendation on Multiplex Networks
    Yiqing Xie, Zhen Wang, Carl Yang, Yaliang Li, Hongbo Deng, Bolin Ding, and Jiawei Han
    In Proceedings of The Web Conference (WWW, 2022)
  8. IJCAI
    When Do GNNs Work: Understanding and Improving Neighborhood Aggregation
    Yiqing Xie*, Sha Li*, Carl Yang, Raymond Chi-Wing Wong, and Jiawei Han
    In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI, 2020)
  9. WWW
    Guiding Corpus-Based Set Expansion by Auxiliary Sets Generation and Co-Expansion
    Jiaxin Huang*, Yiqing Xie*, Yu Meng, Jiaming Shen, Yunyi Zhang, and Jiawei Han
    In Proceedings of The Web Conference (WWW, 2020)