PolyLM Multilingual Text Generation Model (PolyLM Text Generation, Multilingual, 13B, Demo)
Model Introduction
PolyLM is a large-scale polyglot language model covering 18 languages, including Chinese, English, Spanish, French, German, Russian, Portuguese, Italian, Arabic, Japanese, Korean, Thai, Vietnamese, and Indonesian. It can be applied to dialogue and question answering, text generation, machine translation, sentiment analysis, and other tasks, automatically producing high-quality multilingual text and thereby facilitating cross-lingual and cross-cultural communication.
Abstract
Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and generate text following natural language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English.
Our models, along with the multilingual instruction data, are available at GitHub and Hugging Face.
Model Versions
This project provides a series of models of different sizes and purposes, with parameter scales of 1.7B and 13B (the current model is the 13B version), covering both the pre-trained base models and the instruction-tuned chat versions (the MultiAlpaca series). All versions are listed in the table below:
| Model | Precision | Layers | Heads | Hidden | Max_length | LR | Batch | Type |
|---|---|---|---|---|---|---|---|---|
| PolyLM-1.7B | bfloat16 | 24 | 16 | 2048 | 2048 | 1.0e-4 | 4M | Pretrain Model |
| PolyLM-13B | bfloat16 | 40 | 40 | 5120 | 2048 | 6.0e-5 | 4M | Pretrain Model |
| PolyLM-MultiAlpaca-13B | bfloat16 | 40 | 40 | 5120 | 2048 | 6.0e-5 | 4M | Chat Model |
| PolyLM-Assistant-13B | bfloat16 | 40 | 40 | 5120 | 2048 | 6.0e-5 | 4M | Chat Model |
Experimental Results
Model Download
git lfs install
git clone https://www.modelscope.cn/damo/nlp_polylm_13b_text_generation.git
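Alternatively, the checkpoint can be fetched with the ModelScope Python SDK. This is a minimal sketch using the same model ID and revision as the usage example below:
from modelscope import snapshot_download
# Download the 13B checkpoint into the local ModelScope cache and print its path.
model_dir = snapshot_download('damo/nlp_polylm_13b_text_generation', revision='v1.0.3')
print(model_dir)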
Model Usage
# git clone https://github.com/modelscope/modelscope
# cd modelscope
# pip install .
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope import snapshot_download
polylm_13b_model_id = 'damo/nlp_polylm_13b_text_generation'
revision = 'v1.0.3'
# Download the checkpoint from ModelScope and return the local directory.
model_dir = snapshot_download(polylm_13b_model_id, revision=revision)
# Prompt: the sentence to translate, followed by the instruction.
input_text = "Beijing is the capital of China.\nTranslate this sentence from English to Chinese."
# Deterministic beam-search decoding, generating up to 128 new tokens.
kwargs = {"do_sample": False, "num_beams": 4, "max_new_tokens": 128, "early_stopping": True, "eos_token_id": 2}
pipeline_ins = pipeline(Tasks.text_generation, model=model_dir)
result = pipeline_ins(input_text, **kwargs)
print(result['text'])
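The same pipeline can also be used for sampled, open-ended generation. The snippet below is a minimal sketch reusing pipeline_ins from the example above; the prompt and the sampling parameters (top_p, temperature) are illustrative assumptions, not values prescribed by this model card:
# Illustrative sampled decoding with the same pipeline object as above.
sample_kwargs = {"do_sample": True, "top_p": 0.8, "temperature": 0.7, "max_new_tokens": 128, "eos_token_id": 2}
prompt = "Beijing is the capital of China.\nTranslate this sentence from English to Spanish."
print(pipeline_ins(prompt, **sample_kwargs)['text'])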
Fine-tuning (SFT)
Code link: https://github.com/modelscope/swift/tree/main/examples/pytorch/llm
- Supported SFT methods: lora, qlora, full-parameter fine-tuning, …
- Supported models: qwen series, qwen-vl series, baichuan series, chatglm2 series, llama series, openbuddy-llama series, internlm series, xverse series, …
- Supported features: model quantization, DDP, model parallelism, gradient checkpointing, gradient accumulation, pushing to the ModelScope Hub, custom datasets, multimodal and Agent SFT, multi-turn dialogue, …
Script for fine-tuning polylm-13b with qlora + ddp + deepspeed (requires 2 × 13GB GPU memory); a launch sketch follows the script.
# https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/polylm_13b/qlora_ddp_ds/sft.sh
# Experimental environment: 2 * A10
# 2 * 13GB GPU memory
nproc_per_node=2
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
src/llm_sft.py \
--model_type polylm-13b \
--sft_type lora \
--template_type default-generation \
--dtype bf16 \
--output_dir output \
--ddp_backend nccl \
--dataset advertise-gen-zh \
--train_dataset_sample 20000 \
--num_train_epochs 1 \
--max_length 2048 \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--lora_rank 8 \
--lora_alpha 32 \
--lora_dropout_p 0. \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--weight_decay 0. \
--learning_rate 1e-4 \
--gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 100 \
--save_steps 100 \
--save_total_limit 2 \
--logging_steps 10 \
--push_to_hub false \
--hub_model_id polylm-13b-qlora \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
--deepspeed_config_path 'ds_config/zero2.json' \
    --only_save_model true
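A typical way to launch the script is sketched below; the clone target and working directory are assumptions inferred from the script URL above (the script resolves src/llm_sft.py and PYTHONPATH=../../.. relative to examples/pytorch/llm):
# Minimal launch sketch; paths follow the URLs in this section.
git clone https://github.com/modelscope/swift.git
cd swift/examples/pytorch/llm
bash scripts/polylm_13b/qlora_ddp_ds/sft.sh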
Citation
If you find this model helpful, please consider citing the related paper below:
@misc{wei2023polylm,
title={PolyLM: An Open Source Polyglot Large Language Model},
author={Xiangpeng Wei and Haoran Wei and Huan Lin and Tianhao Li and Pei Zhang and Xingzhang Ren and Mei Li and Yu Wan and Zhiwei Cao and Binbin Xie and Tianxiang Hu and Shangjie Li and Binyuan Hui and Bowen Yu and Dayiheng Liu and Baosong Yang and Fei Huang and Jun Xie},
year={2023},
eprint={2307.06018},
archivePrefix={arXiv},
primaryClass={cs.CL}
}