Introduction

  • We introduce LLM-Blender, a novel ensembling framework that attains consistently superior performance by leveraging the diverse strengths of multiple open-source large language models (LLMs). LLM-Blender prunes weaknesses through ranking and integrates strengths through generative fusion to enhance the overall capability of LLMs.
  • Our framework consists of two complementary modules, PairRanker and GenFuser, motivated by the observation that the optimal LLM can vary significantly from example to example. PairRanker employs a specialized pairwise comparison method to distinguish subtle differences between candidate outputs. GenFuser then merges the top-ranked candidates from the aggregation of PairRanker's pairwise comparisons into an improved output, capitalizing on their strengths and mitigating their weaknesses (see the sketch after this list).
  • To facilitate large-scale evaluation, we introduce a benchmark dataset, MixInstruct, a mixture of multiple instruction datasets with oracle pairwise comparisons for testing. LLM-Blender significantly surpasses both the best individual LLMs and baseline ensembling methods across various metrics on MixInstruct, by a substantial margin.
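
To make the two-stage pipeline concrete, below is a minimal Python sketch. The pair_ranker and gen_fuser callables are hypothetical stand-ins for the trained PairRanker and GenFuser models (they are not the released API), and counting pairwise wins is just one of several possible schemes for aggregating the comparisons into a ranking.

from itertools import combinations

def blend(instruction, candidates, pair_ranker, gen_fuser, top_k=3):
    # Stage 1 (PairRanker): compare every unordered pair of candidate
    # outputs and tally how often each candidate wins.
    # pair_ranker(x, a, b) -> True if candidate a is preferred over b.
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if pair_ranker(instruction, candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1

    # Aggregate the pairwise outcomes into a ranking and keep the top-K.
    ranked = sorted(range(len(candidates)), key=wins.__getitem__, reverse=True)
    top_candidates = [candidates[i] for i in ranked[:top_k]]

    # Stage 2 (GenFuser): condition a generator on the instruction plus
    # the top-ranked candidates to produce a single fused output.
    return gen_fuser(instruction, top_candidates)

In practice, each candidate is the output of a different LLM on the same instruction, and the fused generation is returned as the final response.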


Background


LLM-Blender Framework



Evaluation



Analysis



Misc.

Citation

 
@inproceedings{llm-blender-2023,
  title = "LLM-Blender: Ensembling Large Language Models with Pairwise Comparison and Generative Fusion",
  author = "Jiang, Dongfu and Ren, Xiang and Lin, Bill Yuchen",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)",
  year = "2023"
}