OFASys Multimodal Multitask Pretrained Model - English - General Domain - base

Anonymous user, July 31, 2024

Technical Information

Source Repository
https://modelscope.cn/models/iic/ofasys_multimodal_multitask_pretrain_base_en
License
Apache License 2.0

Project Details




Documentation | Paper | Blog | GitHub



What Is OFASys?

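OFASys is an open-source AI library for unified multimodal, multitask learning, developed by the M6 team at DAMO Academy. Using this system, we trained a model called OFA+. It is the first model to support unified training and inference across 7 modalities, including image-text, speech, video, and motion, and over more than 20 multimodal tasks built on them.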

Getting Started with OFASys

Basic Setup

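Note: OFASys is still evolving rapidly. It currently implements the ModelScope interface in a relatively self-contained way, so it needs its own separate environment.
  • First, install ModelScope and OFASys: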
# No need to install modelscope inside a ModelScope notebook
# !pip install modelscope -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
!pip install setuptools==69.5.1
!pip install https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ofasys-0.1.0-py3-none-any.whl
  • Then load the model:
from ofasys import ms_wrapper  # OFASys's ModelScope wrapper, imported before building the pipeline
from modelscope.pipelines import pipeline
pipe = pipeline('my-ofasys-task', model="damo/ofasys_multimodal_multitask_pretrain_base_en", model_revision='v1.0.0')

Exploring the Tasks

Image Captioning

instruction = '[IMAGE:img] <BOS> what does the image describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ic.jpeg"}  # fills the [IMAGE:img] slot
output = pipe(data, instruction=instruction)
print(output.text)  # "a man and woman sitting in front of a laptop computer"


Visual Grounding

instruction = '[IMAGE:img] <BOS> which region does the text " [TEXT:cap] " describe? <EOS> -> [BOX:patch_boxes,add_bos,add_eos]'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/vg.jpg", "cap": "hand"}
output = pipe(data, instruction=instruction)
output.save_box("output.jpg")  # save the image with the predicted region drawn on it
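Every example in this section uses the same instruction pattern. The part before `->` describes the model input and the part after it describes the output. Bracketed slots such as `[IMAGE:img]` or `[BOX:patch_boxes,add_bos,add_eos]` name a modality, an optional field name, and comma-separated options. The sketch below is a hypothetical helper and not part of the OFASys API. It parses that visible syntax with a regular expression, so you can check which `data` keys an instruction expects. It assumes that slot names contain no commas or closing brackets and that `->` appears exactly once.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical parser for the slot syntax seen in the examples, e.g.
# "[IMAGE:img]", "[TEXT:text,preprocess=text_phone,add_bos,add_eos]",
# or an unnamed slot such as "[IMAGE,preprocess=image_vqgan,adaptor=image_vqgan]".
SLOT_RE = re.compile(r"\[([A-Z]+)(?::([^,\]]+))?((?:,[^,\]]+)*)\]")


@dataclass
class Slot:
    modality: str                                           # e.g. "IMAGE", "TEXT", "BOX"
    name: Optional[str]                                     # e.g. "img"; None if omitted
    flags: List[str] = field(default_factory=list)          # bare options, e.g. "add_bos"
    options: Dict[str, str] = field(default_factory=dict)   # key=value options


def parse_slots(text: str) -> List[Slot]:
    """Return every [MODALITY:name,opt,...] slot in the order it appears."""
    slots = []
    for match in SLOT_RE.finditer(text):
        modality, name, rest = match.groups()
        slot = Slot(modality, name)
        for item in filter(None, rest.split(",")):
            if "=" in item:
                key, value = item.split("=", 1)
                slot.options[key] = value
            else:
                slot.flags.append(item)
        slots.append(slot)
    return slots


def parse_instruction(instruction: str) -> Tuple[List[Slot], List[Slot]]:
    """Split on '->' into (input slots, output slots)."""
    source, target = instruction.split("->", 1)
    return parse_slots(source), parse_slots(target)


def required_inputs(instruction: str) -> List[str]:
    """Names of the input slots, i.e. the keys the `data` dict must provide."""
    source, _ = parse_instruction(instruction)
    return [slot.name for slot in source if slot.name is not None]


if __name__ == "__main__":
    grounding = ('[IMAGE:img] <BOS> which region does the text " [TEXT:cap] " '
                 'describe? <EOS> -> [BOX:patch_boxes,add_bos,add_eos]')
    print(required_inputs(grounding))        # ['img', 'cap']
    print(parse_instruction(grounding)[1])   # the single BOX output slot
```

This explains why the captioning example passes only `img` while the grounding example passes both `img` and `cap`. In captioning, `[TEXT:cap]` sits on the output side of `->`.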

Text Summarization

instruction = '<BOS> what is the summary of article " [TEXT:src] "? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {'src': "poland 's main opposition party tuesday endorsed president lech walesa in an upcoming "
        "presidential run-off election after a reformed communist won the first round of voting ."}
output = pipe(data, instruction=instruction)
print(output.text)  # "polish opposition endorses walesa in presidential run-off"
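The article in this example is lower-cased and pre-tokenized: the possessive `'s` and the final period are split off as separate tokens. This resembles the style of headline-summarization corpora such as Gigaword. If you want to feed raw text in the same form, the following is a minimal sketch. The helper name `to_example_style` is hypothetical. The model card does not state that matching this style improves results, so treat that as an assumption.

```python
import re


def to_example_style(text: str) -> str:
    """Hypothetical helper: lower-case text and split off 's and punctuation,
    mimicking the pre-tokenized article used in the summarization example."""
    text = text.lower()
    # "poland's" -> "poland 's"
    text = re.sub(r"(\w)'s\b", r"\1 's", text)
    # "voting." -> "voting ." (only punctuation followed by whitespace or end of
    # string is split, so decimals such as "3.5" are left alone)
    text = re.sub(r"([.,!?;:])(?=\s|$)", r" \1", text)
    # collapse any repeated whitespace introduced above
    return " ".join(text.split())


raw = ("Poland's main opposition party Tuesday endorsed President Lech Walesa in an "
       "upcoming presidential run-off election after a reformed communist won the "
       "first round of voting.")
print(to_example_style(raw))
```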

Table-to-Text Generation

instruction = '<BOS> structured knowledge: " [STRUCT:database,uncased] "  . how to describe the tripleset ? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
# The structured input is a list of [subject, relation, object] triples
data = {
    'database': [['Atlanta', 'OFFICIAL_POPULATION', '5,457,831'],
                 ['[TABLECONTEXT]', 'METROPOLITAN_AREA', 'Atlanta'],
                 ['5,457,831', 'YEAR', '2012'],
                 ['[TABLECONTEXT]', '[TITLE]', 'List of metropolitan areas by population'],
                 ['Atlanta', 'COUNTRY', 'United States'],
    ]
}
output = pipe(data, instruction=instruction, beam_size=1)
print(output.text)  # "atlanta is the metropolitan area in the united states in 2012."
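The `database` field here is a list of `[subject, relation, object]` triples. `[TABLECONTEXT]` stands in for the table itself and `[TITLE]` carries its title. If your data starts out as a flat record, a hypothetical helper like the one below can produce that layout. The upper-case, underscore-separated relation names simply mirror the example and are an assumption, not a documented requirement.

```python
from typing import Dict, List, Optional


def record_to_triples(subject: str, attributes: Dict[str, object],
                      title: Optional[str] = None) -> List[List[str]]:
    """Hypothetical helper: flatten one entity's attributes into
    [subject, RELATION, object] triples like those in the table-to-text example."""
    triples = [
        # "official population" -> "OFFICIAL_POPULATION"; values coerced to strings
        [subject, relation.upper().replace(" ", "_"), str(value)]
        for relation, value in attributes.items()
    ]
    if title is not None:
        # the example marks the table title with [TABLECONTEXT] / [TITLE]
        triples.append(["[TABLECONTEXT]", "[TITLE]", title])
    return triples


data = {
    "database": record_to_triples(
        "Atlanta",
        {"official population": "5,457,831", "country": "United States"},
        title="List of metropolitan areas by population",
    )
}
print(data["database"])
```

Triples whose subject is itself a value, such as `['5,457,831', 'YEAR', '2012']`, would still need to be added separately.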

Text-to-SQL Generation

instruction = '<BOS> " [TEXT:src] " ; structured knowledge: " [STRUCT:database,max_length=876] " . generating sql code. <EOS> -> <BOS> [TEXT:tgt] <EOS>'
# Schema: the database name, then one [table, 'col , col , ...'] pair per table
database = [
    ['concert_singer'],
    ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],
    ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],
    ['concert', 'concert_id , concert_name , theme , stadium_id , year'],
    ['singer_in_concert', 'concert_id , singer_id'],
]
# Passing a list of inputs returns one output per item
data = [
    {'src': 'What are the names, countries, and ages for every singer in descending order of age?', 'database': database},
    {'src': 'What are all distinct countries where singers above age 20 are from?', 'database': database},
    {'src': 'What are the locations and names of all stations with capacity between 5000 and 10000?', 'database': database},
]
output = pipe(data, instruction=instruction)
print('\n'.join([o.text for o in output]))
# "select name, country, age from singer order by age desc"
# "select distinct country from singer where age > 20"
# "select location, name from stadium where capacity between 5000 and 10000"
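In the Text-to-SQL example, the `database` value starts with a one-element list holding the database name. It is followed by one `[table, columns]` pair per table, with column names joined by `' , '`. The same schema is then repeated in each dict of the batched `data` list. Below is a minimal sketch that builds both from a plain dict. `build_schema` and `make_batch` are hypothetical helpers, not OFASys functions.

```python
from typing import Dict, List


def build_schema(db_name: str, tables: Dict[str, List[str]]) -> List[List[str]]:
    """Hypothetical helper: {table: [columns]} -> the nested-list schema layout above."""
    schema = [[db_name]]                      # first entry: the database name alone
    for table, columns in tables.items():
        schema.append([table, " , ".join(columns)])
    return schema


def make_batch(questions: List[str], schema: List[List[str]]) -> List[dict]:
    """Hypothetical helper: one {'src', 'database'} dict per question, sharing one schema."""
    return [{"src": question, "database": schema} for question in questions]


schema = build_schema("concert_singer", {
    "concert": ["concert_id", "concert_name", "theme", "stadium_id", "year"],
    "singer_in_concert": ["concert_id", "singer_id"],
})
batch = make_batch([
    "What are all distinct countries where singers above age 20 are from?",
], schema)
print(schema)
```

The resulting `batch` has the same shape as the list literal in the example, so it could be passed to `pipe(batch, instruction=instruction)` in the same way.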

Video Captioning



instruction = '[VIDEO:video] <BOS> what does the video describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'video': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/video7021.mp4'}
output = pipe(data, instruction=instruction)
print(output.text)  # "a baseball player is hitting a ball"

Speech-to-Text Generation

instruction = '[AUDIO:wav] <BOS> what is the text corresponding to the voice? <EOS> -> [TEXT:text,preprocess=text_phone,add_bos,add_eos]'
data = {'wav': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/1272-128104-0001.flac'}
output = pipe(data, instruction=instruction)
print(output.text)  # "or is mister klohs manner less interesting than his manner"

Text-to-Image Generation

instruction = 'what is the complete image? caption: [TEXT:text]"? -> [IMAGE,preprocess=image_vqgan,adaptor=image_vqgan]'
data = {'text': "a city with tall buildings and a large green park."}
output = pipe(data, instruction=instruction)
output[0].save_image('0.png')  # save the first generated image

Model Limitations and Possible Biases

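The training datasets have their own limitations, so the model may exhibit biases. Please evaluate it yourself before deciding how to use it.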

Related Paper and Citation

If you find OFASys useful and like our work, please cite:

@article{bai2022ofasys,
  author    = {
      Jinze Bai and 
      Rui Men and 
      Hao Yang and 
      Xuancheng Ren and 
      Kai Dang and 
      Yichang Zhang and 
      Xiaohuan Zhou and 
      Peng Wang and 
      Sinan Tan and 
      An Yang and 
      Zeyu Cui and 
      Yu Han and 
      Shuai Bai and 
      Wenbin Ge and 
      Jianxin Ma and 
      Junyang Lin and 
      Jingren Zhou and 
      Chang Zhou},
  title     = {OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models},
  journal   = {CoRR},
  volume    = {abs/2212.04408},
  year      = {2022}
}

