Documentation | Paper | Blog | GitHub
What is OFASys
OFASys is an open-source AI library for unified multi-modal, multi-task learning, developed by the M6 team at DAMO Academy. On top of this system we trained a single model, OFA+, the first to support unified training and inference across 7 modalities — including image, text, speech, video, and motion — and more than 20 multi-modal tasks.
How to use OFASys
Basic setup
Note: OFASys is still iterating quickly. It currently implements the ModelScope interface in a relatively standalone way, so its environment needs to be installed separately.
- First, install ModelScope and OFASys:

```shell
# ModelScope notebooks ship with modelscope preinstalled; skip this line there
# !pip install modelscope -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
!pip install setuptools==69.5.1
!pip install https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ofasys-0.1.0-py3-none-any.whl
```
- Then load the model:

```python
from ofasys import ms_wrapper
from modelscope.pipelines import pipeline

pipe = pipeline('my-ofasys-task', model="damo/ofasys_multimodal_multitask_pretrain_base_en", model_revision='v1.0.0')
```
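Every task below is driven by a single plain-text instruction: the part before `->` describes the model's input, the part after describes its output, and each `[MODALITY:slot]` placeholder is bound to a key of the `data` dict. As a rough illustration of the template shape (this is a sketch, not the actual OFASys parser):

```python
import re

def inspect_instruction(instruction: str):
    """Split an OFASys-style template into source/target halves
    and list the data slots each side expects."""
    source, target = (part.strip() for part in instruction.split('->'))
    # Placeholders look like [IMAGE:img], [TEXT:cap], ...
    slot = re.compile(r'\[([A-Z]+):([a-z_]+)')
    return {
        'source': source,
        'target': target,
        'input_slots': [(m.group(1), m.group(2)) for m in slot.finditer(source)],
        'output_slots': [(m.group(1), m.group(2)) for m in slot.finditer(target)],
    }

info = inspect_instruction(
    '[IMAGE:img] <BOS> what does the image describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
)
print(info['input_slots'])   # [('IMAGE', 'img')]
print(info['output_slots'])  # [('TEXT', 'cap')]
```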
Exploring the tasks
Image Captioning

```python
instruction = '[IMAGE:img] <BOS> what does the image describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ic.jpeg"}
output = pipe(data, instruction=instruction)
print(output.text)  # "a man and woman sitting in front of a laptop computer"
```
Visual Grounding

```python
instruction = '[IMAGE:img] <BOS> which region does the text " [TEXT:cap] " describe? <EOS> -> [BOX:patch_boxes,add_bos,add_eos]'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/vg.jpg", "cap": "hand"}
output = pipe(data, instruction=instruction)
output.save_box("output.jpg")
```
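`save_box` renders the predicted region onto the image for you. If you want to draw a box yourself, the idea is simply to paint the predicted `(x0, y0, x1, y1)` rectangle onto the source image, e.g. with Pillow (the image and coordinates below are synthetic stand-ins for illustration):

```python
from PIL import Image, ImageDraw

def draw_box(img: Image.Image, box, color="red", width=3):
    """Draw an (x0, y0, x1, y1) pixel-coordinate box on a copy of img."""
    out = img.copy()
    ImageDraw.Draw(out).rectangle(box, outline=color, width=width)
    return out

# Synthetic stand-in for the downloaded image
img = Image.new("RGB", (200, 150), "white")
boxed = draw_box(img, (40, 30, 160, 120))
boxed.save("output.jpg")
```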
Text Summarization

```python
instruction = '<BOS> what is the summary of article " [TEXT:src] "? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {'src': "poland 's main opposition party tuesday endorsed president lech walesa in an upcoming "
               "presidential run-off election after a reformed communist won the first round of voting ."}
output = pipe(data, instruction=instruction)
print(output.text)  # "polish opposition endorses walesa in presidential run-off"
```
Table-to-Text Generation

```python
instruction = '<BOS> structured knowledge: " [STRUCT:database,uncased] " . how to describe the tripleset ? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {
    'database': [
        ['Atlanta', 'OFFICIAL_POPULATION', '5,457,831'],
        ['[TABLECONTEXT]', 'METROPOLITAN_AREA', 'Atlanta'],
        ['5,457,831', 'YEAR', '2012'],
        ['[TABLECONTEXT]', '[TITLE]', 'List of metropolitan areas by population'],
        ['Atlanta', 'COUNTRY', 'United States'],
    ]
}
output = pipe(data, instruction=instruction, beam_size=1)
print(output.text)  # "atlanta is the metropolitan area in the united states in 2012."
```
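The `[STRUCT:database]` slot takes the triple set as a nested list of `[subject, predicate, object]` entries; conceptually each triple is linearized into the prompt text. A rough sketch of what such a flattening could look like (an illustration of the idea, not the exact OFASys preprocessing):

```python
def linearize_triples(triples, uncased=True):
    """Flatten [subject, predicate, object] triples into one string,
    roughly mirroring what a STRUCT slot feeds the text encoder."""
    parts = [' '.join(str(field) for field in triple) for triple in triples]
    text = ' | '.join(parts)
    return text.lower() if uncased else text

database = [
    ['Atlanta', 'OFFICIAL_POPULATION', '5,457,831'],
    ['[TABLECONTEXT]', 'METROPOLITAN_AREA', 'Atlanta'],
]
print(linearize_triples(database))
# "atlanta official_population 5,457,831 | [tablecontext] metropolitan_area atlanta"
```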
Text-to-SQL Generation

```python
instruction = '<BOS> " [TEXT:src] " ; structured knowledge: " [STRUCT:database,max_length=876] " . generating sql code. <EOS> -> <BOS> [TEXT:tgt] <EOS>'
database = [
    ['concert_singer'],
    ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],
    ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],
    ['concert', 'concert_id , concert_name , theme , stadium_id , year'],
    ['singer_in_concert', 'concert_id , singer_id']
]
data = [
    {'src': 'What are the names, countries, and ages for every singer in descending order of age?', 'database': database},
    {'src': 'What are all distinct countries where singers above age 20 are from?', 'database': database},
    {'src': 'What are the locations and names of all stations with capacity between 5000 and 10000?', 'database': database}
]
output = pipe(data, instruction=instruction)
print('\n'.join([o.text for o in output]))
# "select name, country, age from singer order by age desc"
# "select distinct country from singer where age > 20"
# "select location, name from stadium where capacity between 5000 and 10000"
```
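The schema above is passed as a nested list: the first entry holds the database name, and each following entry pairs a table name with its comma-separated column list. If your schema lives in a dict, converting it to this layout is straightforward (a sketch, assuming the list format shown in the example):

```python
def schema_to_struct(db_name, tables):
    """Convert {table: [columns]} into the nested-list schema layout
    used in the Text-to-SQL example above."""
    struct = [[db_name]]
    for table, columns in tables.items():
        struct.append([table, ' , '.join(columns)])
    return struct

struct = schema_to_struct('concert_singer', {
    'stadium': ['stadium_id', 'location', 'name', 'capacity'],
    'singer': ['singer_id', 'name', 'country'],
})
print(struct[0])  # ['concert_singer']
print(struct[1])  # ['stadium', 'stadium_id , location , name , capacity']
```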
Video Captioning

```python
instruction = '[VIDEO:video] <BOS> what does the video describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'video': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/video7021.mp4'}
output = pipe(data, instruction=instruction)
print(output.text)  # "a baseball player is hitting a ball"
```
Speech-to-Text Generation

```python
instruction = '[AUDIO:wav] <BOS> what is the text corresponding to the voice? <EOS> -> [TEXT:text,preprocess=text_phone,add_bos,add_eos]'
data = {'wav': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/1272-128104-0001.flac'}
output = pipe(data, instruction=instruction)
print(output.text)  # "nor is mister klohs manner less interesting than his manner"
```
Text-to-Image Generation

```python
instruction = 'what is the complete image? caption: [TEXT:text]"? -> [IMAGE,preprocess=image_vqgan,adaptor=image_vqgan]'
data = {'text': "a city with tall buildings and a large green park."}
output = pipe(data, instruction=instruction)
output[0].save_image('0.png')
```
Model limitations and potential bias
The training data has inherent limitations and may introduce bias. Please evaluate the model for your own use case before deciding how to use it.
Related papers and citation
If you find OFASys useful and like our work, please cite:
```bibtex
@article{bai2022ofasys,
  author  = {Jinze Bai and
             Rui Men and
             Hao Yang and
             Xuancheng Ren and
             Kai Dang and
             Yichang Zhang and
             Xiaohuan Zhou and
             Peng Wang and
             Sinan Tan and
             An Yang and
             Zeyu Cui and
             Yu Han and
             Shuai Bai and
             Wenbin Ge and
             Jianxin Ma and
             Junyang Lin and
             Jingren Zhou and
             Chang Zhou},
  title   = {OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models},
  journal = {CoRR},
  volume  = {abs/2212.04408},
  year    = {2022}
}
```