Yiqing Xie's Personal Webpage

Language Technologies Institute, CMU. yiqingxi@andrew.cmu.edu

Rm 6607, Gates and Hillman Centers

4902 Forbes Ave

Pittsburgh, PA 15213, USA

I am a second-year Ph.D. student at the Language Technologies Institute of Carnegie Mellon University, working with Prof. Carolyn Rosé and Prof. Daniel Fried. Previously, I obtained my Master's degree at the University of Illinois Urbana-Champaign, where I was part of the data mining group supervised by Prof. Jiawei Han, and my Bachelor's degree at the Hong Kong University of Science and Technology, where I received the Academic Achievement Medal.

My research goal is to build generalizable, scalable, and annotation-efficient systems that reduce human labor. This includes:

NLP for Code & Code for NLP

  • Code generation systems (CMTrans)
  • Code generation evaluation (CodeBenchGen)
  • Code generation to assist other tasks (TBD)

Annotation-efficient Model Training

Scalable Evaluation Methods

  • Evaluation examples generation (CodeBenchGen)
  • Automatic evaluation framework (DocLens)
  • Evaluator model training (TBD)


May 16, 2024 Really excited about our newly accepted ACL 2024 paper on medical evaluation! (DocLens 🔍)
Apr 5, 2024 Gave a talk on our recent work on evaluation of medical text at Microsoft!
Mar 30, 2024 Check out our new preprint on scalable code benchmark creation (CodeBenchGen 🤖💻)
Dec 10, 2023 Presented our work on code translation at EMNLP!
Oct 9, 2023 One paper on code translation got accepted to EMNLP 2023 (CMTrans 🔄💻)!
Apr 4, 2023 One paper on unsupervised dense retrieval got accepted to SIGIR 2023 (Anchor-DR ⚓)!


Carnegie Mellon University 2022 - Present
Ph.D. in Language and Information Technologies
Research focus: NLP for code and code for NLP
Working with Carolyn Rosé and Daniel Fried
University of Illinois at Urbana-Champaign 2020 - 2022
Master of Science in Computer Science (GPA: 4.0/4.0)
Research focus: information extraction, graph-based machine learning
Advisor: Jiawei Han
Hong Kong University of Science and Technology 2016 - 2020
B.Sc. in Computer Science with a double major in Mathematics (GPA: 3.9/4.3)
Research focus: text mining, graph-based machine learning
Advisor: Raymond Chi-Wing Wong

Work experience

Meta AI (Expected) 2024.05 - 2024.08
Research Intern, GenAI
Manager: Hejia Zhang
Microsoft Research Redmond 2023.06 - 2023.08
Research Intern, Health Futures
Worked on evaluation of medical text generation
Managers: Sheng Zhang, Hao Cheng
Microsoft Research Redmond 2022.05 - 2022.08
Research Intern, Productivity and Intelligence group
Worked on pre-trained language models for better text sequence embeddings
Manager: Chenyan Xiong
Alibaba DAMO Academy 2020.07 - 2021.01
Research Intern, Data Analytics and Intelligence Lab
Worked on few-shot interaction recommendation under multiple scenarios in Taobao
Managers: Yaliang Li, Bolin Ding

Honors and Awards

Siebel Scholar, class of 2022 2021-2022
Hong Kong University of Science and Technology Academic Achievement Medal (top 1%) 2020
Hong Kong Special Administrative Region Government Scholarship Fund - Reaching Out Award 2018
Hong Kong University of Science and Technology's Scholarship for Continuing Undergraduate Students 2017-2019
Dean’s List, Hong Kong University of Science and Technology Three times, 2017-2019
Silver medal, China Girls Math Olympiad 2015
Second prize, National Olympiad in Mathematics, Guangdong Province, China 2015

Additional Information

Conference Reviews: ARR (Apr 2024, Feb 2024, Dec 2023), EMNLP 2023, ACL 2023, TKDE 2023, COLING 2022, AACL 2022
Teaching Assistant: CS412: Introduction to Data Mining UIUC, Spring 2022
Teaching Assistant: COMP 2012: Object-Oriented Programming and Data Structures HKUST, Fall 2018
Teaching Assistant: COMP 1022P: Introduction to Java Programming HKUST, Fall 2018

Selected publications

For the complete list of publications, check here
  1. preprint
    CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
    Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, and Carolyn Rose
    (preprint, 2024)
  2. ACL
    DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation
    Yiqing Xie, Sheng Zhang, Hao Cheng, Pengfei Liu, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon, and Carolyn Rose
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL, 2024)
  3. EMNLP Findings
    Data Augmentation for Code Translation with Comparable Corpora and Multiple References
    Yiqing Xie, Atharva Naik, Daniel Fried, and Carolyn Rose
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings, 2023)
  4. SIGIR
    Unsupervised Dense Retrieval Training with Web Anchors
    Yiqing Xie, Xiao Liu, and Chenyan Xiong
    In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR, 2023)
  5. ACL Findings
    Eider: Evidence-enhanced Document-level Relation Extraction
    Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, and Jiawei Han
    In Findings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL Findings, 2022)
  6. WWW
    KoMen: Domain Knowledge-Guided Few-Shot Interaction Recommendation on Multiplex Networks
    Yiqing Xie, Zhen Wang, Carl Yang, Yaliang Li, Hongbo Deng, Bolin Ding, and Jiawei Han
    In Proceedings of the Web Conference (WWW, 2022)
  7. IJCAI
    When Do GNNs Work: Understanding and Improving Neighborhood Aggregation
    Yiqing Xie*, Sha Li*, Carl Yang, Raymond Chi-Wing Wong, and Jiawei Han
    In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI, 2020)
  8. WWW
    Guiding Corpus-Based Set Expansion by Auxiliary Sets Generation and Co-Expansion
    Jiaxin Huang*, Yiqing Xie*, Yu Meng, Jiaming Shen, Yunyi Zhang, and Jiawei Han
    In Proceedings of The Web Conference (WWW, 2020)