Publications
* and ^ represent equal-contribution groups.
Selected Publications
Alignment
- The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi
ICLR 2024
[Website] [Github] [Demo (BaseChat)] [URIAL-Bench] [Tweet 1] [Tweet 2]
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin
arXiv
[Website] [HF] [Demo (by @davanstrien)] [Github] [Tweet]
- LLM-Blender: Ensembling Large Language Models with Pairwise Comparison and Generative Fusion
Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
to appear in Proc. of ACL 2023
[Website] [Github] [Tweet]
Media coverage: MarkTechPost
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Chengsong Huang*, Qian Liu*, Bill Yuchen Lin*, Tianyu Pang, Chao Du, Min Lin
COLM 2024
[Demo] [Github] [Tweet]
Evaluation
- WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi
arXiv
[Leaderboard] [Github] [Tweet]
- WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin
NeurIPS 2024 (D&B track)
[Leaderboard] [Hugging Face] [Github] [Tweet]
- RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
arXiv
Agents
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, Xiang Ren
NeurIPS 2023 (spotlight)
[Website] [Github] [Tweet] [Blog]
- Agent Lumos: Unified and Modular Training for Open-Source Language Agents
Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin
ACL 2024 Main Conference
[Website] [Github] [Tweet]
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
Yifan Song, Da Yin, Xiang Yue, Jie Huang, Sujian Li, Bill Yuchen Lin
ACL 2024 Main Conference
[Github]
Preprints
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin
arXiv
[Website] [HF] [Demo (by @davanstrien)] [Github] [Tweet] [Magpie LM Collection]
- WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi
arXiv
[Leaderboard] [Github] [Tweet]
- ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models
Bill Yuchen Lin, Ronan Le Bras, Peter Clark, Yejin Choi
Blog post
[Website]
- Latent Action Pretraining from Videos
Seonghyeon Ye, Joel Jang, Byeongguk Jeon, Sejune Joo, Jianwei Yang, Baolin Peng, Ajay Mandlekar, Reuben Tan, Yu-Wei Chao, Bill Yuchen Lin, Lars Liden, Kimin Lee, Jianfeng Gao, Luke Zettlemoyer, Dieter Fox, Minjoon Seo
arXiv
[Project Page] [Tweet]
- On Memorization of Large Language Models in Logical Reasoning
Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar
arXiv
[Project Page] [Tweet]
- SimulBench: Evaluating Language Models with Creative Simulation Tasks
Qi Jia, Xiang Yue, Tianyu Zheng, Jie Huang, Bill Yuchen Lin
arXiv
[Website]
- OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation
Zilong Wang, Yuedong Cui, Li Zhong, Zimin Zhang, Da Yin, Bill Yuchen Lin, Jingbo Shang
arXiv
[Github]
- RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
arXiv
- L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Yutaro Yamada, Khyathi Chandu, Bill Yuchen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi
arXiv
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu
arXiv
[Github]
2024
- The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi
ICLR 2024
[Website] [Github] [Demo (BaseChat)] [URIAL-Bench] [Tweet 1] [Tweet 2]
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Chengsong Huang*, Qian Liu*, Bill Yuchen Lin*, Tianyu Pang, Chao Du, Min Lin
COLM 2024
[Demo] [Github] [Tweet]
- WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin
NeurIPS 2024 (D&B track)
[Leaderboard] [Hugging Face] [Github] [Tweet]
- Agent Lumos: Unified and Modular Training for Open-Source Language Agents
Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin
ACL 2024 Main Conference
[Website] [Github] [Tweet]
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
Yifan Song, Da Yin, Xiang Yue, Jie Huang, Sujian Li, Bill Yuchen Lin
ACL 2024 Main Conference
[Github]
- SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, Radha Poovendran
ACL 2024 Main Conference
[Github]
- WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri
NeurIPS 2024 (D&B track)
[Github]
- Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu
ACL 2024 Findings
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue
ACL 2024 Findings
[Website] [Demo] [Code] [Models]
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim, Juyoung Suk, Shayne Longpre, Bill Yuchen Lin, Jamin Shin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo
EMNLP 2024
- VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, …, Bill Yuchen Lin, Wenhu Chen
EMNLP 2024
- Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT4
Jiaxian Guo*, Bo Yang*, Paul Yoo, Bill Yuchen Lin, Yusuke Iwasawa, Yutaka Matsuo
COLM 2024
[Github] [Tweet]
- VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue
COLM 2024
- TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Dongfu Jiang*, Yishan Li*, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen
TMLR
[Github] [Website] [Tweet]
2023
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, Xiang Ren
NeurIPS 2023 (spotlight)
[Website] [Github] [Tweet] [Blog]
- Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri*, Ximing Lu*, Melanie Sclar*, Xiang Lorraine Li^, Liwei Jiang^, Bill Yuchen Lin^, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi
NeurIPS 2023 (spotlight)
[Tweet]
- LLM-Blender: Ensembling Large Language Models with Pairwise Comparison and Generative Fusion
Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
to appear in Proc. of ACL 2023
[Website] [Github] [Tweet]
Media coverage: MarkTechPost
- Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Ximing Lu, Faeze Brahman, Peter West, Jaehun Jang, Khyathi Chandu, Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi
EMNLP 2023 (Main)
- NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei Jiang, Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra Bhagavatula, Yejin Choi
EMNLP 2023 (Findings)
- On Grounded Planning for Embodied Tasks with Language Models
Bill Yuchen Lin*, Chengsong Huang*, Qian Liu, Wenda Gu, Sam Sommerer, Xiang Ren
in Proc. of AAAI 2023
[Website] [Github] [Data]
Media coverage: USC Viterbi News
- AutoTriggER: Named Entity Recognition with Auxiliary Trigger Extraction
Dong-Ho Lee, Ravi Kiran Selvam, Sheikh Muhammad Sarwar, Bill Yuchen Lin, Mahak Agarwal, Fred Morstatter, Jay Pujara, Elizabeth Boschee, James Allan, Xiang Ren
in Proc. of EACL 2023, also presented at TrustNLP @ NAACL 2021 (best paper award)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
442 authors including Bill Yuchen Lin
in TMLR
[Github]
2022
- Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality
Pei Zhou, Hyundong J. Cho, Pegah Jandaghi, Dong-Ho Lee, Bill Yuchen Lin, Jay Pujara, Xiang Ren
in Proc. of EMNLP 2022
[Tweet]
- Unsupervised Cross-Task Generalization via Retrieval Augmentation
Bill Yuchen Lin, Kangmin Tan, Chris Miller, Beiwen Tian, Xiang Ren
in Proc. of NeurIPS 2022
[Website] [Github] [Slides] [Video] [Tweet]
- On Continual Model Refinement in Out-of-Distribution Data Streams
Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao, Xiang Ren, Scott Yih
in Proc. of ACL 2022
[Website] [Github] [Slides] [Video] [Tweet]
- FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks
Bill Yuchen Lin*, Chaoyang He*, Zihang Zeng, Hulin Wang, Yufen Huang, Mahdi Soltanolkotabi, Xiang Ren^, Salman Avestimehr^
in Proc. of NAACL 2022 Findings
[Github] [Tweet]
- On the Robustness of Reading Comprehension Models to Entity Renaming
Jun Yan, Yang Xiao, Sagnik Mukherjee, Bill Yuchen Lin, Robin Jia, Xiang Ren
in Proc. of NAACL 2022
2021
- CrossFit: A Few-shot Learning Challenge for Cross-Task Generalization in NLP
Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
in Proc. of EMNLP 2021
[Github] [Tweet]
- Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning
Xisen Jin, Bill Yuchen Lin, Mohammad Rostami, Xiang Ren
in Proc. of EMNLP 2021 Findings
[Github]
- RockNER: A Simple Method to Create Adversarial Examples for Evaluating the Robustness of NER Models
Bill Yuchen Lin, Wenyang Gao, Jun Yan, Ryan Moreno, Xiang Ren
in Proc. of EMNLP 2021 (short)
[Website]
- RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms
Pei Zhou, Rahul Khanna, Seyeon Lee, Bill Yuchen Lin, Daniel Ho, Jay Pujara, Xiang Ren
in Proc. of EMNLP 2021
[Website]
- Probing Commonsense Explanation in Dialogue Response Generation
Pei Zhou, Pegah Jandaghi, Hyundong Cho, Bill Yuchen Lin, Jay Pujara, Xiang Ren
in Proc. of EMNLP 2021 Findings
- Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning
Bill Yuchen Lin, Seyeon Lee, Xiaoyang Qiao, Xiang Ren
in Proc. of ACL 2021
[Github] [Website]
- RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge
Bill Yuchen Lin, Ziyi Wu, Yichi Yang, Dong-Ho Lee, Xiang Ren
in Proc. of ACL 2021 Findings
[Github] [Website]
- Differentiable Open-Ended Commonsense Reasoning
Bill Yuchen Lin, Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Xiang Ren, William W. Cohen
in Proc. of NAACL 2021
[Slides] [Video] [Github] [Website]
- Pre-training Text-to-Text Transformers for Concept-Centric Common Sense
Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill Yuchen Lin, Xiang Ren
in Proc. of ICLR 2021
[Github]
- IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren
in Proc. of AAAI 2021
2020
- CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi, Xiang Ren
in Proc. of EMNLP 2020 Findings (presented at AKBC 2020 as a non-archival paper)
[Website]
Media coverage: The Register, Tech Xplore, Techzine, Radio.com, ScienceDaily, USC Viterbi
- Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models
Bill Yuchen Lin, Seyeon Lee, Rahul Khanna, Xiang Ren
in Proc. of EMNLP 2020 (short)
[Website]
- Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering
Yanlin Feng*, Xinyue Chen*, Bill Yuchen Lin, Peifeng Wang, Jun Yan, Xiang Ren
in Proc. of EMNLP 2020
[Github]
- FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents
Bill Yuchen Lin, Ying Sheng, Nguyen Vo, Sandeep Tata
in Proc. of KDD 2020 (Research Track)
[Slides] [Video]
- TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition
Bill Yuchen Lin*, Dongho Lee*, Ming Shen, Ryan Moreno, Xiao Huang, Prashant Shiralkar, Xiang Ren
in Proc. of ACL 2020 (short)
[Slides] [Video] [Github] [Website]
- Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling
Ouyu Lan, Xiao Huang, Bill Yuchen Lin, He Jiang, Liyuan Liu, Xiang Ren
in Proc. of ACL 2020
[Github]
- LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation
Dong-Ho Lee, Rahul Khanna, Bill Yuchen Lin, Jamin Chen, Seyeon Lee, Qinyuan Ye, Elizabeth Boschee, Leonardo Neves, Xiang Ren
in Proc. of ACL 2020 (Demo Track)
[Website]
- NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction
Wenxuan Zhou, Hongtao Lin, Bill Yuchen Lin, Ziqi Wang, Junyi Du, Leonardo Neves, Xiang Ren
in Proc. of TheWebConf (WWW) 2020
Best Paper Runner-up (2/1500+)
[Github]
2019
- KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
Bill Yuchen Lin, Xinyue Chen, Jamin Chen, Xiang Ren
in Proc. of EMNLP-IJCNLP 2019
[Github]
- AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging
Bill Yuchen Lin*, Dongho Lee*, Frank F. Xu, Ouyu Lan, Xiang Ren
in Proc. of ACL 2019 (Demo Track)
[Website]
2018
- Neural Adaptation Layers for Cross-domain Named Entity Recognition
Bill Yuchen Lin, Wei Lu
in Proc. of EMNLP 2018
[Github]
- ExtRA: Extracting Prominent Review Aspects from Customer Feedback
Zhiyi Luo, Shanshan Huang, Frank F. Xu, Bill Yuchen Lin, Hanyuan Shi, Kenny Q. Zhu
in Proc. of EMNLP 2018
[Github]
- Mining Cross-Cultural Differences and Similarities in Social Media
Bill Yuchen Lin*, Frank F. Xu*, Kenny Q. Zhu, Seung-won Hwang
in Proc. of ACL 2018
[Github]
- Automatic Extraction of Commonsense LocatedNear Knowledge
Frank F. Xu*, Bill Yuchen Lin*, Kenny Q. Zhu
in Proc. of ACL 2018 (short)
[Github]
2017
- Multi-channel BiLSTM-CRF Model for Emerging Named Entity Recognition in Social Media
Bill Y. Lin*, Frank F. Xu*, Zhiyi Luo, Kenny Q. Zhu
in Proc. of EMNLP 2017, Workshop on Noisy User-generated Text
[Github]