LongAlign-7B-64k

Open-source repository: https://modelscope.cn/models/ZhipuAI/LongAlign-7B-64k
License: Apache License 2.0


[LongAlign Dataset] • [Github Repo] • [LongAlign Paper]

LongAlign is the first full recipe for LLM alignment on long contexts. We propose the LongAlign-10k dataset, containing 10,000 long instruction-following examples of 8k-64k tokens in length. We investigate training strategies, namely packing (with loss weighting) and sorted batching, which are all implemented in our code. For real-world long-context evaluation, we introduce LongBench-Chat, which evaluates instruction-following capability on queries of 10k-100k tokens in length.
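To make the two strategies concrete, here is a minimal, illustrative sketch of sorted batching and of the loss-weighting idea used with packing. The function names and exact weighting below are assumptions for illustration, not code from the LongAlign repository:

```python
# Illustrative sketch only; names and details are assumptions,
# not taken from the LongAlign codebase.

def sorted_batches(examples, batch_size):
    """Sorted batching: order samples by length so each batch holds
    similarly sized sequences, minimizing wasted padding."""
    ordered = sorted(examples, key=lambda ex: len(ex["input_ids"]))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

def packing_loss_weights(target_token_counts):
    """Loss weighting for packing: when several sequences are packed into
    one training example, scale each sequence's token losses so that every
    sequence (rather than every token) contributes equally, preventing
    long sequences from dominating the loss."""
    k = len(target_token_counts)  # number of sequences packed together
    return [1.0 / (k * n) for n in target_token_counts]
```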

All Models

We open-sourced the following list of models:

| Model | Huggingface Repo | Description |
|---|---|---|
| LongAlign-6B-64k-base | Huggingface Repo | ChatGLM3-6B with an extended 64k context window |
| LongAlign-6B-64k | Huggingface Repo | Chat model trained with LongAlign on LongAlign-6B-64k-base |
| LongAlign-7B-64k-base | Huggingface Repo | Llama-2-7B with an extended 64k context window |
| LongAlign-7B-64k | Huggingface Repo | Chat model trained with LongAlign on LongAlign-7B-64k-base |
| LongAlign-13B-64k-base | Huggingface Repo | Llama-2-13B with an extended 64k context window |
| LongAlign-13B-64k | Huggingface Repo | Chat model trained with LongAlign on LongAlign-13B-64k-base |
| ChatGLM3-6B-128k | Huggingface Repo | ChatGLM3-6B with a 128k context window |

Model usage

Chat prompt template for LongAlign-6B-64k:

```
[Round 1]

问:Hi!

答:Hello! What can I assist you with today?

[Round 2]

问:What should I do if I can't sleep at night?

答:
```
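For clarity, a small helper that assembles this template from a conversation history might look like the following; the helper itself is hypothetical, not part of the released code:

```python
def build_chatglm_prompt(history, query):
    """Build the [Round N] 问/答 prompt shown above from (question, answer)
    pairs plus a new user query. Hypothetical helper for illustration."""
    prompt = ""
    for i, (q, a) in enumerate(history, start=1):
        prompt += f"[Round {i}]\n\n问:{q}\n\n答:{a}\n\n"
    prompt += f"[Round {len(history) + 1}]\n\n问:{query}\n\n答:"
    return prompt
```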

Chat prompt template for LongAlign-7B-64k and LongAlign-13B-64k:

```
[INST]Hi![/INST]Hello! What can I assist you with today?

[INST]What should I do if I can't sleep at night?[/INST]
```
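An equivalent helper for the Llama-based models, again hypothetical and for illustration only:

```python
def build_llama_prompt(history, query):
    """Build the [INST]...[/INST] prompt shown above from (question, answer)
    pairs plus a new user query. Hypothetical helper for illustration."""
    prompt = ""
    for q, a in history:
        prompt += f"[INST]{q}[/INST]{a}\n\n"
    prompt += f"[INST]{query}[/INST]"
    return prompt
```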

ChatGLM3-6B-128k uses the same prompt template as ChatGLM3-6B.

A simple demo for deployment of the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("THUDM/LongAlign-6B-64k", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("THUDM/LongAlign-6B-64k",
                                             torch_dtype=torch.bfloat16,
                                             trust_remote_code=True,
                                             device_map="auto")
model = model.eval()

# Feed a long document followed by an instruction; the `chat` method comes
# from the model's remote code and applies the chat template automatically.
query = open("assets/paper.txt").read() + "\n\nPlease summarize the paper."
response, history = model.chat(tokenizer, query, history=[], max_new_tokens=512, temperature=1)
print(response)
```
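The demo above uses the ChatGLM-based LongAlign-6B-64k, whose remote code exposes a `chat` method. For the Llama-based LongAlign-7B-64k and LongAlign-13B-64k, a plausible equivalent is to format the [INST] template by hand and call the standard `generate` API. This is a sketch under that assumption, with decoding settings chosen for illustration:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Sketch for the Llama-based checkpoints, assuming standard HF generation;
# decoding settings here are illustrative, not prescribed by this card.
tokenizer = AutoTokenizer.from_pretrained("THUDM/LongAlign-7B-64k", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("THUDM/LongAlign-7B-64k",
                                             torch_dtype=torch.bfloat16,
                                             trust_remote_code=True,
                                             device_map="auto").eval()

prompt = "[INST]What should I do if I can't sleep at night?[/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=1.0, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```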

Citation

If you find our work useful, please consider citing LongAlign:

