BERT with Flash-Attention

Installing dependencies

To run the model on GPU, you need to install Flash Attention. You may either install it from PyPI (which may not work with fused-dense), or from source. To install from source, clone the GitHub repository:

git clone git@github.com:Dao-AILab/flash-attention.git

The code provided here should work with commit 43950dd. Change to the cloned repo and install:

cd flash-attention && python setup.py install

This will compile the flash-attention kernel, which will take some time. If you would like to use fused MLPs (e.g. to use activation checkpointing), you may install fused-dense also from source:

cd csrc/fused_dense_lib && python setup.py install
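After installation, a quick sanity check can confirm that the compiled extension imports and that a CUDA device is visible (the flash attention kernels only run on GPU). This snippet is a suggested sketch, not part of the original instructions:

```python
# Optional sanity check (suggested sketch, not from the original instructions).
# Verifies that the compiled flash-attn extension imports and that a CUDA GPU
# is visible, since the flash attention kernels only run on GPU.
import torch
import flash_attn

print("flash-attn version:", flash_attn.__version__)
print("CUDA available:", torch.cuda.is_available())
```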
Configuration
The config adds some new parameters:

- use_flash_attn: If True, always use flash attention. If None, use flash attention when a GPU is available. If False, never use flash attention (works on CPU).
- window_size: Size (left and right) of the local attention window. If (-1, -1), use global attention.
- dense_seq_output: If true, we only need to pass the hidden states for the masked-out tokens (around 15%) to the classifier heads. I set this to true for pretraining.
- fused_mlp: Whether to use fused-dense. Useful to reduce VRAM in combination with activation checkpointing.
- mlp_checkpoint_lvl: One of {0, 1, 2}. Increasing this increases the amount of activation checkpointing within the MLP. Keep this at 0 for pretraining and use gradient accumulation instead. For embedding training, increase this as much as needed.
- last_layer_subset: If true, we only need to compute the last layer for a subset of tokens. I left this at false.
- use_qk_norm: Whether or not to use QK-normalization.
- num_loras: Number of LoRAs to use when initializing a BertLoRA model. Has no effect on other models.
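These parameters can be set on the model config before instantiating the model. The sketch below assumes the model is loaded through Hugging Face transformers with trust_remote_code (an assumption, not stated above), and the repository name is a placeholder:

```python
# Hedged sketch: assumes this BERT variant is loaded via transformers'
# remote-code path; "org/bert-flash-model" is a placeholder repository name.
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("org/bert-flash-model", trust_remote_code=True)
config.use_flash_attn = None       # None: use flash attention only when a GPU is available
config.window_size = (-1, -1)      # (-1, -1): global attention
config.dense_seq_output = True     # pass only masked-out tokens to the classifier heads
config.fused_mlp = False           # True requires the fused_dense_lib extension built above
config.mlp_checkpoint_lvl = 0      # keep at 0 for pretraining; raise for embedding training
config.last_layer_subset = False
config.use_qk_norm = False
config.num_loras = 0               # only relevant for a BertLoRA model

model = AutoModel.from_pretrained(
    "org/bert-flash-model", config=config, trust_remote_code=True
)
```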