# BERT with Flash-Attention

## Installing dependencies

To run the model on GPU, you need to install Flash Attention. You may either install it from pypi (which may not work with fused-dense; see the note at the end of this section) or from source. To install from source, clone the GitHub repository:

```console
git clone git@github.com:Dao-AILab/flash-attention.git
```

The code provided here should work with commit 43950dd.

Change to the cloned repo and install:

```console
cd flash-attention && python setup.py install
```

This will compile the flash-attention kernel, which will take some time.

If you would like to use fused MLPs (e.g. to use activation checkpointing), you may also install fused-dense from source:

```console
cd csrc/fused_dense_lib && python setup.py install
```
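If you prefer the pypi route mentioned at the top of this section, a minimal sketch (the `--no-build-isolation` flag follows the flash-attention project's own install notes; the fused-dense caveat above still applies):

```console
pip install flash-attn --no-build-isolation
```

When building from source instead, running `git checkout 43950dd` inside the cloned repo before the install step pins the commit the code was tested against.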
## Configuration

The config adds some new parameters:

- `use_flash_attn`: If `True`, always use flash attention. If `None`, use flash attention when a GPU is available. If `False`, never use flash attention (works on CPU).
- `window_size`: Size (left and right) of the local attention window. If `(-1, -1)`, use global attention.
- `dense_seq_output`: If true, we only need to pass the hidden states for the masked-out tokens (around 15%) to the classifier heads. I set this to true for pretraining.
- `fused_mlp`: Whether to use fused-dense. Useful to reduce VRAM in combination with activation checkpointing.
- `mlp_checkpoint_lvl`: One of `{0, 1, 2}`. Increasing this increases the amount of activation checkpointing within the MLP. Keep this at 0 for pretraining and use gradient accumulation instead. For embedding training, increase it as much as needed.
- `last_layer_subset`: If true, we only need to compute the last layer for a subset of tokens. I left this at false.
- `use_qk_norm`: Whether or not to use QK-normalization.
- `num_loras`: Number of LoRAs to use when initializing a `BertLoRA` model. Has no effect on other models.
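As a usage sketch, these parameters can be overridden on the config before loading the model. The repository ID below is a placeholder, and `trust_remote_code` assumes the model code ships with the repo; the parameter names come from the list above:

```python
from transformers import AutoConfig, AutoModel

repo_id = "your-org/bert-with-flash-attention"  # placeholder, not a real repo ID

# Load the remote config and override the new parameters described above.
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
config.use_flash_attn = None      # use flash attention only when a GPU is available
config.window_size = (-1, -1)     # global attention, no local window
config.fused_mlp = False          # True requires the fused-dense build above
config.mlp_checkpoint_lvl = 0     # 0 for pretraining; raise for embedding training

model = AutoModel.from_pretrained(repo_id, config=config, trust_remote_code=True)
```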