jina-bert-flash-implementation

Anonymous user, 2024-07-31
Categories: ai, bert, pytorch
Source: https://modelscope.cn/models/jinaai/jina-bert-flash-implementation

BERT with Flash-Attention

Installing dependencies

To run the model on GPU, you need to install Flash Attention. You can either install it from PyPI (which may not work with fused-dense) or build it from source. To install from source, clone the GitHub repository:

git clone git@github.com:Dao-AILab/flash-attention.git

The code provided here should work with commit 43950dd. Change to the cloned repo and install:

cd flash-attention && python setup.py install

This will compile the flash-attention kernel, which will take some time.
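After the build finishes, it can be worth confirming that the compiled package is actually importable before loading the model. This is a small sketch of my own, not part of the repository:

```python
# Sanity check (a sketch, not part of the repo): verify the compiled
# flash_attn package is importable before loading the model.
import importlib.util

have_flash = importlib.util.find_spec("flash_attn") is not None
print("flash_attn importable:", have_flash)
```

If this prints `False` on a GPU machine, the build or install step likely failed and the model will not be able to use flash attention.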

If you would like to use fused MLPs (e.g. to use activation checkpointing), you may also install fused-dense from source:

cd csrc/fused_dense_lib && python setup.py install

Configuration

The config adds some new parameters:

  • use_flash_attn: If True, always use flash attention. If None, use flash attention when a GPU is available. If False, never use flash attention (works on CPU).
  • window_size: Size (left and right) of the local attention window. If (-1, -1), global attention is used.
  • dense_seq_output: If true, only the hidden states of the masked-out tokens (around 15%) are passed to the classifier heads. I set this to true for pretraining.
  • fused_mlp: Whether to use fused-dense. Useful for reducing VRAM in combination with activation checkpointing.
  • mlp_checkpoint_lvl: One of {0, 1, 2}. Increasing this increases the amount of activation checkpointing within the MLP. Keep this at 0 for pretraining and use gradient accumulation instead. For embedding training, increase this as much as needed.
  • last_layer_subset: If true, only the last layer is computed for a subset of tokens. I left this set to false.
  • use_qk_norm: Whether or not to use QK normalization.
  • num_loras: Number of LoRAs to use when initializing a BertLoRA model. Has no effect on other models.
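To make the parameter list above concrete, here is an illustrative set of override values as a plain dict. The key names come from the list; the values follow its pretraining advice where given and are otherwise assumptions, not the model's actual defaults:

```python
# Illustrative config overrides. Key names come from the parameter list above;
# values follow its pretraining advice where stated, otherwise they are assumptions.
pretraining_overrides = {
    "use_flash_attn": None,      # None: use flash attention whenever a GPU is available
    "window_size": (-1, -1),     # (-1, -1): global attention
    "dense_seq_output": True,    # pass only the ~15% masked-out tokens to the heads
    "fused_mlp": False,          # requires the fused-dense build if enabled
    "mlp_checkpoint_lvl": 0,     # keep at 0 for pretraining; raise for embedding training
    "last_layer_subset": False,  # author left this at false
    "use_qk_norm": False,        # assumption; enable to apply QK normalization
    "num_loras": 0,              # only relevant when initializing a BertLoRA model
}
print(pretraining_overrides)
```

In practice these keys would be set on the model config before instantiation; the exact loading path depends on how you load the model.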