MuPT-v0-4096-550M

Anonymous user, July 31, 2024
Categories: ai, llama, pytorch, art, music
Source: https://modelscope.cn/models/m-a-p/MuPT-v0-4096-550M
License: apache-2.0

Project details

SMuPT: Symbolic Music Generative Pre-trained Transformer

SMuPT is a series of models pre-trained for symbolic music generation. They were trained on a large-scale symbolic music dataset containing millions of monophonic and polyphonic pieces across different genres and styles. The models use the Llama 2 architecture and can be further adapted to downstream music generation tasks such as melody generation, accompaniment generation, and multi-track music generation.

  • 09/01/2024: a series of pre-trained SMuPT models was released, with parameter counts ranging from 110M to 1.3B.

Model architecture

The architecture details of the SMuPT-v0 models are listed below:

Name                 Parameters  Training Data (Music Pieces)  Seq Length  Hidden Size  Layers  Heads
SMuPT-v0-8192-110M   110M        7M x 5.8 epochs               8192        768          12      12
SMuPT-v0-8192-345M   345M        7M x 4 epochs                 8192        1024         24      16
SMuPT-v0-8192-770M   770M        7M x 3 epochs                 8192        1280         36      20
SMuPT-v0-8192-1.3B   1.3B        7M x 2.2 epochs               8192        1536         48      24
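Two ratios hold across all four sizes. The per-head dimension (hidden size h divided by the number of heads) is the same for every model, and the FFN_HIDDEN_SIZE values set in the serving script further down are exactly four times the hidden size:

```latex
d_{\text{head}} = \frac{h}{n_{\text{heads}}}:\quad
\frac{768}{12} = \frac{1024}{16} = \frac{1280}{20} = \frac{1536}{24} = 64,
\qquad
d_{\text{ffn}} = 4h:\quad 3072,\ 4096,\ 5120,\ 6144
```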

Model Usage

There are several ways to use the pre-trained SMuPT models; the instructions below are based on Megatron-LM. Support for the Hugging Face format is coming soon.

Before starting, make sure you have set up the relevant environment and codebase.

# pull Megatron-LM codebase
mkdir -p /path/to/workspace && cd /path/to/workspace
git clone https://github.com/NVIDIA/Megatron-LM.git

# download the pre-trained SMuPT model checkpoint and vocab files from the Hugging Face page
mkdir -p /models/SMuPT_v0_8192_1.3B && cd /models/SMuPT_v0_8192_1.3B
# quote the URLs so the shell does not treat "?" as a glob character
wget -O model_optim_rng.pt "https://huggingface.co/m-a-p/SMuPT_v0_8192_1.3B/resolve/main/model_optim_rng.pt?download=true"
wget -O newline.vocab "https://huggingface.co/m-a-p/SMuPT_v0_8192_1.3B/resolve/main/newline.vocab?download=true"
wget -O newline.txt "https://huggingface.co/m-a-p/SMuPT_v0_8192_1.3B/resolve/main/newline.txt?download=true"
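Before starting the server, it can be worth confirming that the downloads actually succeeded: `wget -O` creates the output file even when the request fails, which can leave an empty file behind. The check below is a minimal sketch; `check_downloads` is a hypothetical helper, and the file names are the three downloaded above.

```bash
#!/bin/bash
# Hypothetical sanity check (not part of the SMuPT release): verify that each
# downloaded file exists and is non-empty.
check_downloads() {
    local dir="$1" missing=0 f
    for f in model_optim_rng.pt newline.vocab newline.txt; do
        if [[ ! -s "$dir/$f" ]]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check the directory created in the download step.
if check_downloads /models/SMuPT_v0_8192_1.3B; then
    echo "all files present"
fi
```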

We recommend running SMuPT inference in a recent NGC PyTorch container (the command below uses the 23.11 release). See the Megatron-LM repository for more details.

# pull the NGC PyTorch container, mount the workspace and model directories, and enter the container
docker run --gpus all -it --name megatron --shm-size=16g -v "$PWD":/workspace -v /models:/models -p 5000:5000 nvcr.io/nvidia/pytorch:23.11-py3 /bin/bash

Once you enter the container, you can start a REST server for inference.


#!/bin/bash
# This example will start serving the 1.3B model.
export CUDA_DEVICE_MAX_CONNECTIONS=1

DISTRIBUTED_ARGS="--nproc_per_node 1 \
                --nnodes 1 \
                --node_rank 0 \
                --master_addr localhost \
                --master_port 6000"

CHECKPOINT=/path/to/model/checkpoint/folder
VOCAB_FILE=/path/to/vocab/file
MERGE_FILE=/path/to/merge/file

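# NUM_QUERY_GROUP equals NUM_HEAD for every size below, so the grouped-query
# attention setting effectively yields standard multi-head attention.
MODEL_SIZE="1.3B"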
if   [[ ${MODEL_SIZE} == "110M" ]];   then HIDDEN_SIZE=768;  NUM_HEAD=12; NUM_QUERY_GROUP=12; NUM_LAYERS=12; FFN_HIDDEN_SIZE=3072; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "345M" ]];   then HIDDEN_SIZE=1024;  NUM_HEAD=16; NUM_QUERY_GROUP=16; NUM_LAYERS=24; FFN_HIDDEN_SIZE=4096; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "770M" ]];   then HIDDEN_SIZE=1280;  NUM_HEAD=20; NUM_QUERY_GROUP=20; NUM_LAYERS=36; FFN_HIDDEN_SIZE=5120; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "1.3B" ]];   then HIDDEN_SIZE=1536;  NUM_HEAD=24; NUM_QUERY_GROUP=24; NUM_LAYERS=48; FFN_HIDDEN_SIZE=6144; NORM_EPS=1e-5;
else echo "invalid MODEL_SIZE: ${MODEL_SIZE}"; exit 1
fi
MAX_SEQ_LEN=8192
MAX_POSITION_EMBEDDINGS=8192

pip install flask-restful

# run from the root of the Megatron-LM checkout: the script path below is relative
torchrun $DISTRIBUTED_ARGS tools/run_text_generation_server.py   \
    --tensor-model-parallel-size 1  \
    --pipeline-model-parallel-size 1  \
    --num-layers ${NUM_LAYERS}  \
    --hidden-size ${HIDDEN_SIZE}  \
    --ffn-hidden-size ${FFN_HIDDEN_SIZE} \
    --load ${CHECKPOINT}  \
    --group-query-attention \
    --num-query-groups ${NUM_QUERY_GROUP} \
    --position-embedding-type rope \
    --num-attention-heads ${NUM_HEAD}  \
    --max-position-embeddings ${MAX_POSITION_EMBEDDINGS}  \
    --tokenizer-type GPT2BPETokenizer  \
    --normalization RMSNorm \
    --norm-epsilon ${NORM_EPS} \
    --make-vocab-size-divisible-by 1 \
    --swiglu \
    --use-flash-attn \
    --bf16  \
    --micro-batch-size 1  \
    --disable-bias-linear \
    --no-bias-gelu-fusion \
    --untie-embeddings-and-output-weights \
    --seq-length ${MAX_SEQ_LEN}  \
    --vocab-file $VOCAB_FILE  \
    --merge-file $MERGE_FILE  \
    --attention-dropout 0.0 \
    --hidden-dropout 0.0 \
    --weight-decay 1e-1 \
    --clip-grad 1.0 \
    --adam-beta1 0.9 \
    --adam-beta2 0.95 \
    --adam-eps 1e-8 \
    --seed 42

Use curl to query the server directly. Note that the newline character \n is represented by the <n> token in the vocabulary, so newlines must be replaced with <n> in the prompt, and <n> must be converted back to newlines in the generated output.

curl 'http://localhost:5000/api' -X 'PUT' -H 'Content-Type: application/json; charset=UTF-8'  -d '{"prompts":["X:1<n>L:1/8<n>Q:1/8=200<n>M:4/4<n>K:Gmin<n>|:\"Gm\" BGdB"], "tokens_to_generate":4096}'
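Hand-escaping the prompt inside the curl command is error-prone. The following is a minimal sketch, assuming bash and a plain multi-line ABC prompt held in a variable; `abc_to_prompt`, `json_escape`, and `build_payload` are hypothetical helpers, not part of Megatron-LM or SMuPT. The helpers replace newlines with <n>, escape backslashes and double quotes for JSON, and print the same request body used in the curl command above.

```bash
#!/bin/bash
# Hypothetical helpers (not part of Megatron-LM or SMuPT).

# Replace every newline in an ABC string with the <n> vocabulary token.
abc_to_prompt() {
    local nl=$'\n' out
    out=${1//$nl/<n>}
    printf '%s' "$out"
}

# Escape backslashes, then double quotes, so the text is a valid JSON string.
# Other control characters (e.g. tabs) are not handled by this sketch.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the request body: {"prompts":[...], "tokens_to_generate":N}
build_payload() {
    local prompt
    prompt=$(json_escape "$(abc_to_prompt "$1")")
    printf '{"prompts":["%s"], "tokens_to_generate":%d}\n' "$prompt" "${2:-4096}"
}

# ABC header fields: X = reference number, L = default note length,
# Q = tempo, M = meter, K = key; the last line is the opening bar.
abc_header='X:1
L:1/8
Q:1/8=200
M:4/4
K:Gmin
|:"Gm" BGdB'

build_payload "$abc_header" 4096
# To send it (assuming the server listens on port 5000, the port published by
# the docker run command above):
# curl 'http://localhost:5000/api' -X PUT -H 'Content-Type: application/json; charset=UTF-8' \
#      -d "$(build_payload "$abc_header" 4096)"
```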

Processed Output:

X:1
L:1/8
Q:1/8=200
M:4/4
K:Gmin
|:"Gm" BGdB fdBG |"F" AFcF dFcF |"Gm" BGdG gFBF |"F" AFAG AF F2 |"Gm" BGBd fffd |"F" cdcB cdeg |
"Gm" fdcB"Eb" AFcA |1 BGFG"F" AFGc :|2 BGFG"F" AF F2 ||<eos>
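The processed output above comes from mapping each <n> token back to a newline. A minimal sketch of that step follows. It assumes the generated string has already been pulled out of the server's JSON response, for instance with `jq -r '.text[0]'` if the response stores generations under a `text` key (an assumption about your Megatron-LM version). `prompt_to_abc` is a hypothetical helper, and it also drops the trailing <eos> marker so ABC tools receive plain notation.

```bash
#!/bin/bash
# Hypothetical helper (not part of Megatron-LM or SMuPT): convert a generated
# string back into multi-line ABC notation.
prompt_to_abc() {
    local nl=$'\n' out
    out=${1//<n>/$nl}   # every <n> token becomes a real newline
    out=${out%<eos>}    # drop a trailing <eos> marker, if present
    printf '%s\n' "$out"
}

# Abridged input: the header plus the final line of the output above,
# already decoded from JSON so the chord-symbol quotes are plain.
raw='X:1<n>L:1/8<n>Q:1/8=200<n>M:4/4<n>K:Gmin<n>"Gm" fdcB"Eb" AFcA |1 BGFG"F" AFGc :|2 BGFG"F" AF F2 ||<eos>'
prompt_to_abc "$raw"
```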

Once the generated tokens have been converted back into ABC notation, they can be rendered as audio so you can listen to the generated music.
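The step above leaves the rendering tool open. One option, and purely an assumption here since the SMuPT authors do not name a renderer, is `abc2midi` from the abcMIDI package, which converts an ABC file into a MIDI file that any synthesizer can play. The wrapper `render_abc_to_midi` and the file name `generated.abc` are hypothetical.

```bash
#!/bin/bash
# Hypothetical wrapper: render an ABC file to MIDI with abc2midi (abcMIDI package).
# Using abc2midi is an assumption; SMuPT itself does not prescribe a renderer.
render_abc_to_midi() {
    local abc_file="$1" midi_file="${2:-${1%.abc}.mid}"
    if ! command -v abc2midi >/dev/null 2>&1; then
        echo "abc2midi not found; install the abcMIDI package first" >&2
        return 127
    fi
    abc2midi "$abc_file" -o "$midi_file"
}

# Usage (assumes generated.abc holds the processed ABC text, e.g. saved from the
# output above):
# render_abc_to_midi generated.abc generated.mid
```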
