Quickstart
https://colab.research.google.com/drive/1an4RTFb7_1ZT6EeDch1jEV5dGr0n3NoC?usp=sharing
With this Jupyter Notebook, you can launch the user interface of CharPoet and try our system yourself!
Introduction
CharPoet is a Chinese classical poetry generation system based on a token-free LLM, providing effective control over both format and content.
- Format: Our token-free architecture generates character by character, enabling precise control over the number of characters.
- Content: Our system inherits instruction-following abilities from token-based LLMs and can generate high-quality poetry from complex user instructions such as "Introduce New York City." or "Write me a poem for my mother's birthday."
Description
The core of CharPoet is a token-free LLM. We designed a procedure that prunes a typical token-based LLM into a token-free model; by pruning an existing token-based LLM, our model inherits the knowledge and abilities it acquired during pretraining. The pruning procedure alters the input and output components of the standard token-based LLM while leaving the Transformer component unchanged. For more details about CharPoet, please refer to our paper.
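To make the pruning idea concrete, here is a minimal illustrative sketch, not CharPoet's actual code: the vocabulary is restricted to single characters, and each surviving character keeps its original embedding row so that pretrained knowledge is carried over. The names `prune_to_char_vocab`, `old_vocab`, and `old_embeddings` are hypothetical; the Transformer blocks themselves would be left untouched.

```python
# Illustrative sketch (not CharPoet's API): prune a token-based
# vocabulary down to single characters, reusing each character's
# existing embedding row so pretrained knowledge is inherited.

def prune_to_char_vocab(old_vocab, old_embeddings):
    """Keep only single-character tokens.

    old_vocab: dict mapping token string -> embedding row index
    old_embeddings: list of embedding vectors, one per token
    Returns (char_vocab, char_embeddings) for the pruned model.
    """
    char_vocab, char_embeddings = {}, []
    for token, idx in old_vocab.items():
        if len(token) == 1:  # single-character tokens survive pruning
            char_vocab[token] = len(char_embeddings)
            char_embeddings.append(old_embeddings[idx])
    return char_vocab, char_embeddings

# Toy vocabulary: multi-character tokens are dropped, characters kept.
vocab = {"山": 0, "山水": 1, "水": 2, "明月": 3, "月": 4}
emb = [[0.1], [0.2], [0.3], [0.4], [0.5]]
cv, ce = prune_to_char_vocab(vocab, emb)
print(cv)  # only 山, 水, 月 remain, with their original embedding rows
```

In a real model the same replacement would apply to both the input embedding table and the output projection, which is what "altering the input and output components" refers to above.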
Evaluation and Results
We evaluate performance on two aspects: format accuracy and content quality. Tests are conducted under two user input settings:
- the conventional keyword setting, where the user input is a single keyword
- the instruction setting, where the user input is a natural-language instruction, such as "Write me a poem for my mother's birthday."
Format Accuracy
Format Type | #Chars | GPT-4 (keyword) | GPT-4 (instruction) | Jiuge-GPT-2 (keyword) | Jiuge-GPT-2 (instruction) | Qwen (Finetuned, keyword) | Qwen (Finetuned, instruction) | CharPoet (Ours, keyword) | CharPoet (Ours, instruction)
---|---|---|---|---|---|---|---|---|---
WuyanJueju (SHI) | 20 | 0.49 | 0.73 | 1.00 | - | 0.94 | 1.00 | 0.98 | 0.99
WuyanLvshi (SHI) | 40 | 0.29 | 0.36 | 1.00 | - | 0.97 | 0.98 | 0.97 | 0.99
QiyanJueju (SHI) | 28 | 0.88 | 0.78 | 1.00 | - | 0.99 | 1.00 | 1.00 | 1.00
QiyanLvshi (SHI) | 56 | 0.81 | 0.68 | 1.00 | - | 0.98 | 0.96 | 0.97 | 0.98
Rumengling (CI) | 33 | 0.13 | 0.09 | 0.90 | - | 0.95 | 0.97 | 1.00 | 0.99
Jianzimulanhua (CI) | 44 | 0.81 | 0.79 | 0.96 | - | 0.99 | 0.97 | 1.00 | 0.99
Qingpingyue (CI) | 46 | 0.13 | 0.18 | 0.96 | - | 0.98 | 0.97 | 0.95 | 0.99
Dielianhua (CI) | 60 | 0.21 | 0.12 | 0.91 | - | 0.94 | 0.98 | 0.99 | 0.98
Manjianghong (CI) | 93 | 0.07 | 0.04 | 0.83 | - | 0.88 | 0.90 | 0.95 | 0.95
Qinyuanchun (CI) | 114 | 0.00 | 0.01 | 0.55 | - | 0.64 | 0.75 | 0.82 | 0.86
Avg | 53.4 | 0.382 | 0.378 | 0.911 | - | 0.926 | 0.948 | 0.963 | 0.972
A poem is counted as accurate only if the number of characters in every line matches exactly. CharPoet outperforms all competing models in format accuracy: without any postprocessing, it achieves an overall accuracy above 0.96 under both settings, surpassing Jiuge-GPT-2 (0.91) and GPT-4 (0.38). Our ablation study comparing CharPoet with Qwen (Finetuned) confirms the effectiveness of the token-free architecture, which brings roughly a 3% gain in format accuracy.
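The per-line character-count criterion can be checked mechanically. Below is a small illustrative checker (the function name and template representation are our own, not from the paper); the Wuyan Jueju template of four 5-character lines matches the 20-character entry in the table.

```python
# Illustrative format checker: a poem counts as accurate only if
# every line's character count matches the template exactly.

def format_accurate(poem_lines, template_lengths):
    """True iff each line has exactly the required number of characters."""
    if len(poem_lines) != len(template_lengths):
        return False  # wrong number of lines
    return all(len(line) == n for line, n in zip(poem_lines, template_lengths))

# Wuyan Jueju: four lines of five characters each (20 characters total).
wuyan_jueju = [5, 5, 5, 5]
poem = ["床前明月光", "疑是地上霜", "举头望明月", "低头思故乡"]
print(format_accurate(poem, wuyan_jueju))  # True
```

Under this strict criterion a single missing or extra character anywhere fails the whole poem, which is what makes the near-1.00 scores in the table a demanding target.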
Content Quality
We evaluate content quality with five criteria, each scored on a 5-point scale. In terms of content quality, CharPoet significantly surpasses traditional poetry-generation systems, including Jiuge, especially on the Relevance criterion, and is comparable to other LLMs. The gain in content relevance indicates that pretrained LLMs provide significantly better control over content than traditional models.