24-09-11 Released ZeroEval, a leaderboard of LLMs for reasoning. [X Post] |
24-08-27 Released the WildVision datasets: WV-Chat, WV-Battle, and WV-Bench. [X Post] |
24-08-19 Will serve as an Area Chair for ICLR 2025. |
24-06-29 Will serve as a Senior Area Chair for ACL 2025. |
24-05-08 Three ACL 2024 Main Conference papers: Agent Lumos, ETO, and SafeDecoding! |
24-05-01 Will serve as an Area Chair for EMNLP 2024. |
24-03-08 Introducing AI2 WildBench! A dynamic LLM benchmark for challenging tasks from real users. [Leaderboard] | [Tweet] |
24-03-06 2 new preprints: ETO (Continual DPO for Agent Training) and OpenCI (open code interpreter). |
24-02-16 2 new preprints: L3GO (with AI2 intern Yutaro Yamada from Yale); SafeDecoding (led by Zhangchen Xu at UW). |
24-02-09 Check out our Vision Arena demo on HuggingFace! You can test many Vision LMs side by side here! |
24-01-30 Invited talk at UT Austin (Host: Prof. Jessy Li at LIN 393). |
24-01-16 Accepted by ICLR'24: The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. |
23-12-01 We released PairRM-0.4B, which is based on LLM-Blender. It achieves great performance on the AlpacaEval Leaderboard: [picture] [tweet 1] [tweet 2]. Kudos to Dongfu Jiang for the great work! |
23-11-15 New preprint: Lumos Agent (with AI2 intern Da Yin from UCLA). |
23-11-01 New preprint: Personalized RLHF (with AI2 intern Joel Jang from UW). |
23-10-15 New preprints: TIGER-Score (reference-free NLG evaluation) and Suspicion-Agent (playing imperfect-information games). |
23-09-21 Our SwiftSage and FnF papers were accepted to NeurIPS 2023 as spotlights! |
23-07-29 Check out our new work (with Chengsong and Qian): LoraHub for efficient cross-task generalization. |
23-07-09 Co-presented a tutorial at ACL 2023 on Complex Reasoning in Natural Language. |
23-06-18 Will serve as an Area Chair at EMNLP 2023. |
23-01-01 Will serve as an Area Chair for ACL 2023! |