plant-dnamamba-open_chromatin

Posted by an anonymous user on 2024-07-31
Technologies: Mamba, PyTorch
Categories: ai, genomics, biology, DNA
Open-source address: https://modelscope.cn/models/zhangtaolab/plant-dnamamba-open_chromatin
License: CC-BY-NC-SA-4.0

Project details

植物基础DNA大语言模型 (Plant foundation DNA large language models)

The plant DNA large language models (LLMs) are a series of foundation models with different architectures, pre-trained on various plant reference genomes.
All of the models are of comparable size (90–150 MB); a BPE tokenizer with a vocabulary of 8,000 tokens is used for tokenization.
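As a rough illustration of how BPE applies to DNA (this is not the model's actual tokenizer or vocabulary), byte-pair encoding starts from single bases and repeatedly merges the most frequent adjacent pair into a new token. A minimal sketch in pure Python:

```python
from collections import Counter

def bpe_train(sequence, num_merges):
    """Learn BPE merges on a DNA string, starting from single bases."""
    tokens = list(sequence)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged, i = [], 0
        while i < len(tokens):
            # Greedily replace every occurrence of the chosen pair
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

# Two merges turn the repeated "ATG" motif into a single vocabulary token
tokens, merges = bpe_train("ATGATGATGCCT", num_merges=2)
# tokens -> ['ATG', 'ATG', 'ATG', 'C', 'C', 'T']
```

A real DNA BPE tokenizer is trained the same way, just over whole genomes and until the vocabulary reaches the target size (8,000 tokens here).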

Developer: zhangtaolab

Model Sources

  • Repository: Plant DNA LLMs
  • Manuscript: [Versatile applications of foundation DNA large language models in plant genomes]()

Architecture

The model is trained on top of the state-space Mamba-130m model, with a tokenizer modified specifically for DNA sequences.

This model is fine-tuned for predicting open chromatin.

How to use

Install the runtime library first:

pip install transformers
pip install 'causal-conv1d<=1.2.0'
pip install 'mamba-ssm<2.0.0'

Since the transformers library (version < 4.43.0) does not provide a MambaForSequenceClassification class, we wrote a custom script to train the Mamba model for sequence classification.
Inference code can be found in our GitHub repository.
Note that the Plant DNAMamba model requires an NVIDIA GPU to run.
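A minimal inference sketch under stated assumptions: the class names and the `trust_remote_code=True` flag below are assumptions (the stock transformers Mamba classes do not ship a sequence-classification head, so loading is presumed to go through the authors' custom code), and the positive-class index may differ in the real checkpoint. This is not the authors' inference script:

```python
def load_model(model_id="zhangtaolab/plant-dnamamba-open_chromatin"):
    """Load tokenizer and fine-tuned classifier (network and NVIDIA GPU required)."""
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, trust_remote_code=True
    ).eval().cuda()
    return tokenizer, model

def predict_open_chromatin(tokenizer, model, sequence):
    """Return class probabilities for one DNA sequence."""
    import torch
    inputs = tokenizer(sequence, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0].tolist()

# Usage (requires GPU, network access, and the pinned mamba-ssm/causal-conv1d):
# tok, mdl = load_model()
# probs = predict_open_chromatin(tok, mdl, "ACGT" * 100)
```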

Training data

We use a custom MambaForSequenceClassification script to fine-tune the model.
The detailed training procedure can be found in our manuscript.
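Conceptually, such a custom sequence-classification wrapper pools the backbone's per-token hidden states and applies a linear head on top. A framework-free sketch of just that head (illustrative only; not the authors' script, which operates on real Mamba hidden states):

```python
def classification_head(hidden_states, weights, biases):
    """Mean-pool per-token hidden states, then apply a linear layer per class.

    hidden_states: list of per-token vectors produced by the backbone
    weights: one weight vector per output class
    biases: one bias per output class
    """
    dim = len(hidden_states[0])
    # Mean pooling over the token dimension
    pooled = [sum(v[i] for v in hidden_states) / len(hidden_states)
              for i in range(dim)]
    # One logit per class: dot(weights, pooled) + bias
    return [sum(w[i] * pooled[i] for i in range(dim)) + b
            for w, b in zip(weights, biases)]

# Two tokens with 2-d hidden states, two classes (closed/open chromatin)
logits = classification_head(
    [[1.0, 3.0], [3.0, 1.0]],   # pooled -> [2.0, 2.0]
    [[1.0, 0.0], [0.0, 1.0]],   # identity-like class weights
    [0.0, 0.5],
)
# logits -> [2.0, 2.5]
```

During fine-tuning, the head's weights are trained jointly with (or on top of) the pre-trained Mamba backbone against the open-chromatin labels.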

Hardware

The model was trained on an NVIDIA RTX 4090 GPU (24 GB).
