Documentation | Paper | Blog | GitHub
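OFASys is an open-source AI library for unified multi-modal, multi-task learning, developed by the M6 team at DAMO Academy. With this system we trained a model, OFA+, which is the first to support unified training and inference across 7 modalities (including image-text, speech, video, and motion) and more than 20 multi-modal tasks.
The training datasets have inherent limitations and may introduce some bias; please evaluate the model yourself before deciding how to use it. If you find OFASys useful and like our work, you are welcome to cite it. Integration in progress.
What is OFASys
How to Get Started with OFASys
Basic Setup
Note: OFASys is still iterating rapidly and currently implements the ModelScope interface in a relatively standalone way, so it needs its own separately installed environment.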
# In a ModelScope notebook, modelscope does not need to be installed
# !pip install modelscope -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
!pip install setuptools==69.5.1
!pip install https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ofasys-0.1.0-py3-none-any.whl
from ofasys import ms_wrapper
from modelscope.pipelines import pipeline
pipe = pipeline('my-ofasys-task', model="damo/ofasys_multimodal_multitask_pretrain_base_en", model_revision='v1.0.0')
Exploring the Tasks
Image Captioning
instruction = '[IMAGE:img] <BOS> what does the image describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ic.jpeg"}
output = pipe(data, instruction=instruction)
print(output.text) # "a man and woman sitting in front of a laptop computer"
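Each bracketed slot in an instruction names a modality and a key: `[IMAGE:img]` is filled from `data['img']`, and the slot after `->` describes what the model should produce. As a rough illustration of that convention, the sketch below lists the slots in an instruction and the side of `->` they appear on. It is not OFASys's own parser; the regular expression `SLOT_RE` and the helper `list_slots` are assumptions made for this example only.

```python
import re

# Matches slots such as [IMAGE:img] or [TEXT:text,preprocess=text_phone,add_bos].
# Assumed pattern for illustration only; OFASys may parse instructions differently.
SLOT_RE = re.compile(r"\[([A-Z]+)(?::([A-Za-z_]\w*))?((?:,[^\]]*)?)\]")


def list_slots(instruction):
    """Return (side, modality, name, options) for every slot in the instruction."""
    source, _, target = instruction.partition("->")
    slots = []
    for side, text in (("input", source), ("output", target)):
        for modality, name, opts in SLOT_RE.findall(text):
            # Options are the comma-separated items after the slot name, if any.
            options = [o for o in opts.split(",") if o]
            slots.append((side, modality, name or None, options))
    return slots


instruction = '[IMAGE:img] <BOS> what does the image describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
for slot in list_slots(instruction):
    print(slot)
```

A check like this can be handy before calling the pipeline, for example to confirm that every input slot name has a matching key in `data`.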
Visual Grounding
instruction = '[IMAGE:img] <BOS> which region does the text " [TEXT:cap] " describe? <EOS> -> [BOX:patch_boxes,add_bos,add_eos]'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/vg.jpg", "cap": "hand"}
output = pipe(data, instruction=instruction)
output.save_box("output.jpg")
Text Summarization
instruction = '<BOS> what is the summary of article " [TEXT:src] "? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {'src': "poland 's main opposition party tuesday endorsed president lech walesa in an upcoming "
               "presidential run-off election after a reformed communist won the first round of voting ."}
output = pipe(data, instruction=instruction)
print(output.text) # "polish opposition endorses walesa in presidential run-off"
Table-to-Text Generation
instruction = '<BOS> structured knowledge: " [STRUCT:database,uncased] " . how to describe the tripleset ? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {
    'database': [['Atlanta', 'OFFICIAL_POPULATION', '5,457,831'],
                 ['[TABLECONTEXT]', 'METROPOLITAN_AREA', 'Atlanta'],
                 ['5,457,831', 'YEAR', '2012'],
                 ['[TABLECONTEXT]', '[TITLE]', 'List of metropolitan areas by population'],
                 ['Atlanta', 'COUNTRY', 'United States'],
                 ]
}
output = pipe(data, instruction=instruction, beam_size=1)
print(output.text) # "atlanta is the metropolitan area in the united states in 2012."
Text-to-SQL Generation
instruction = '<BOS> " [TEXT:src] " ; structured knowledge: " [STRUCT:database,max_length=876] " . generating sql code. <EOS> -> <BOS> [TEXT:tgt] <EOS>'
database = [
    ['concert_singer'],
    ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],
    ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],
    ['concert', 'concert_id , concert_name , theme , stadium_id , year'],
    ['singer_in_concert', 'concert_id , singer_id']
]
data = [
    {'src': 'What are the names, countries, and ages for every singer in descending order of age?', 'database': database},
    {'src': 'What are all distinct countries where singers above age 20 are from?', 'database': database},
    {'src': 'What are the locations and names of all stations with capacity between 5000 and 10000?', 'database': database}
]
output = pipe(data, instruction=instruction)
print('\n'.join([o.text for o in output]))
# "select name, country, age from singer order by age desc"
# "select distinct country from singer where age > 20"
# "select location, name from stadium where capacity between 5000 and 10000"
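Generated SQL is easier to trust once it has actually been run. As a minimal sketch (not part of OFASys), the three queries above can be executed against an in-memory SQLite database built from the same `database` schema list. Only the table and column names come from the example; the helper `build_db`, the untyped columns, and the sample rows are invented purely for illustration.

```python
import sqlite3

database = [
    ['concert_singer'],
    ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],
    ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],
    ['concert', 'concert_id , concert_name , theme , stadium_id , year'],
    ['singer_in_concert', 'concert_id , singer_id'],
]


def build_db(schema):
    """Create untyped SQLite tables from [table, 'col , col , ...'] entries."""
    conn = sqlite3.connect(':memory:')
    for entry in schema:
        if len(entry) != 2:  # the first entry only names the database
            continue
        table, columns = entry
        cols = ', '.join(c.strip() for c in columns.split(','))
        conn.execute(f'CREATE TABLE {table} ({cols})')
    return conn


conn = build_db(database)
# Hypothetical rows, made up for this sketch.
conn.executemany(
    'INSERT INTO singer (singer_id, name, country, age) VALUES (?, ?, ?, ?)',
    [(1, 'Alice', 'France', 43), (2, 'Bob', 'Japan', 19)],
)
conn.execute(
    "INSERT INTO stadium (stadium_id, location, name, capacity) "
    "VALUES (1, 'Lyon', 'Arena', 8000)"
)

queries = [
    'select name, country, age from singer order by age desc',
    'select distinct country from singer where age > 20',
    'select location, name from stadium where capacity between 5000 and 10000',
]
results = [conn.execute(q).fetchall() for q in queries]
for query, rows in zip(queries, results):
    print(query, '->', rows)
```

A query that references a missing table or column raises `sqlite3.OperationalError`, which gives a cheap syntax and schema check even without real data.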
Video Captioning
instruction = '[VIDEO:video] <BOS> what does the video describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'video': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/video7021.mp4'}
output = pipe(data, instruction=instruction)
print(output.text) # "a baseball player is hitting a ball"
Speech-to-Text Generation
instruction = '[AUDIO:wav] <BOS> what is the text corresponding to the voice? <EOS> -> [TEXT:text,preprocess=text_phone,add_bos,add_eos]'
data = {'wav': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/1272-128104-0001.flac'}
output = pipe(data, instruction=instruction)
print(output.text) # "nor is mister klohs manner less interesting than his manner"
Text-to-Image Generation
instruction = 'what is the complete image? caption: [TEXT:text]"? -> [IMAGE,preprocess=image_vqgan,adaptor=image_vqgan]'
data = {'text': "a city with tall buildings and a large green park."}
output = pipe(data, instruction=instruction)
output[0].save_image('0.png')
Model Limitations and Possible Bias
Related Paper and Citation
@article{bai2022ofasys,
author = {
Jinze Bai and
Rui Men and
Hao Yang and
Xuancheng Ren and
Kai Dang and
Yichang Zhang and
Xiaohuan Zhou and
Peng Wang and
Sinan Tan and
An Yang and
Zeyu Cui and
Yu Han and
Shuai Bai and
Wenbin Ge and
Jianxin Ma and
Junyang Lin and
Jingren Zhou and
Chang Zhou},
title = {OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models},
journal = {CoRR},
volume = {abs/2212.04408},
year = {2022}
}