arXiv:2401.07013 (cs)

[Submitted on 13 Jan 2024 (v1), last revised 9 Nov 2024 (this version, v2)]

Title: Knowledge Distillation of Black-Box Large Language Models

Authors: Hongzhan Chen, Ruijun Chen, Yuqi Yi, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang

Abstract: Given the exceptional performance of proprietary large language models (LLMs) like GPT-4, recent research has increasingly focused on boosting the capabilities of smaller models through knowledge distillation (KD) from these powerful yet black-box teachers. While leveraging the high-quality outputs of these teachers is advantageous, the inaccessibility of their internal states often limits effective knowledge transfer. To overcome this limitation, we introduce Proxy-KD, a novel method that uses a proxy model to facilitate the efficient transfer of knowledge from black-box LLMs to smaller models. Our experiments show that Proxy-KD not only enhances the performance of KD from black-box teacher models but also surpasses traditional white-box KD techniques. This approach presents a compelling new avenue for distilling knowledge from advanced LLMs.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2401.07013 [cs.CL]
(or arXiv:2401.07013v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2401.07013

Submission history

From: Hongzhan Chen
[v1] Sat, 13 Jan 2024 08:43:32 UTC (359 KB)
[v2] Sat, 9 Nov 2024 01:35:32 UTC (8,288 KB)

Source: hackernews