RaNER-chunking-English-large

Posted by an anonymous user on July 31, 2024
Categories: ai, xlm-roberta, pytorch, chunking, ACL 2021, Alibaba, F1, nlp
Source: https://modelscope.cn/models/iic/nlp_raner_chunking_english-large
License: Apache License 2.0


Introduction to RaNER

Model description

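This method uses a Transformer-CRF model with XLM-RoBERTa as the pretrained backbone. Related sentences retrieved by an external tool are added as extra context, and the model is trained with Multi-view Training. The model architecture is shown in the figure below: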

[Figure: model architecture]

Reference paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning
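In a Transformer-CRF tagger, the CRF layer scores transitions between labels in addition to the per-token label scores, and decoding selects the best-scoring label sequence. The following is a minimal sketch of Viterbi decoding in plain Python. The label set, the scores, and the function name `viterbi_decode` are invented for illustration and are not taken from the RaNER implementation.

```python
def viterbi_decode(emissions, transitions):
    """Return the best label-index path and its score.

    emissions[t][j]   : score of label j at token t
    transitions[i][j] : score of moving from label i to label j
    """
    num_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_labels):
            # Best previous label for ending in label j at this token.
            best_prev = max(range(num_labels),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final label.
    best_last = max(range(num_labels), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Hypothetical three-label example: O -> I-NP is penalised as an invalid move.
LABELS = ["O", "B-NP", "I-NP"]
FORBIDDEN = -1e4
transitions = [
    [0.0, 0.0, FORBIDDEN],  # from O
    [0.0, 0.0, 0.0],        # from B-NP
    [0.0, 0.0, 0.0],        # from I-NP
]
emissions = [
    [2.0, 1.8, 0.0],  # token 1: per-token argmax would pick O
    [0.0, 0.5, 2.0],  # token 2: per-token argmax would pick I-NP
]

path, score = viterbi_decode(emissions, transitions)
print([LABELS[i] for i in path])
```

Picking the highest-scoring label for each token independently would yield the invalid sequence O, I-NP here; decoding over whole sequences avoids that.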

Intended use and scope

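This model produces chunking results for English input sentences, returned in the span format of the named entity recognition pipeline. Users can try their own English sentences. See the code example for how to call the model.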

How to use

Once ModelScope is installed, the chunking capability is available. By default, a single sentence may not exceed a length of 512.

Code example

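from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# The chunking model is loaded through the named entity recognition task.
ner_pipeline = pipeline(Tasks.named_entity_recognition, 'damo/nlp_raner_chunking_english-large')
# Run chunking on a single English sentence.
result = ner_pipeline('Confidence in the pound is widely expected to take another sharp dive if trade figures for September.')

# Each output item gives the chunk type, character offsets, and the matched span.
print(result)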
# {'output': [{'type': 'NP', 'start': 0, 'end': 10, 'span': 'Confidence'}, {'type': 'PP', 'start': 11, 'end': 13, 'span': 'in'}, {'type': 'NP', 'start': 14, 'end': 23, 'span': 'the pound'}, {'type': 'VP', 'start': 24, 'end': 50, 'span': 'is widely expected to take'}, {'type': 'NP', 'start': 51, 'end': 69, 'span': 'another sharp dive'}, {'type': 'SBAR', 'start': 70, 'end': 72, 'span': 'if'}, {'type': 'NP', 'start': 73, 'end': 78, 'span': 'trade'}, {'type': 'NP', 'start': 79, 'end': 86, 'span': 'figures'}, {'type': 'PP', 'start': 87, 'end': 90, 'span': 'for'}, {'type': 'NP', 'start': 91, 'end': 101, 'span': 'September.'}]}
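The spans in the output above use character offsets, with `end` exclusive. As a rough sketch of how such output could be turned into per-token BIO tags, the helper below (`spans_to_bio`, a hypothetical name that is not part of ModelScope) tokenises on whitespace and tags each token by the span that contains it. The sample spans are copied from the first three items of the output above.

```python
def spans_to_bio(text, spans):
    """Convert character-offset chunk spans to (token, BIO tag) pairs.

    Assumes whitespace tokenisation and spans shaped like the pipeline
    output shown above: {'type', 'start', 'end', 'span'}.
    """
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        tag = "O"
        for span in spans:
            if span["start"] <= start and end <= span["end"]:
                # First token of a chunk gets B-, later tokens get I-.
                prefix = "B" if start == span["start"] else "I"
                tag = f"{prefix}-{span['type']}"
                break
        tagged.append((token, tag))
    return tagged


text = "Confidence in the pound"
spans = [
    {"type": "NP", "start": 0, "end": 10, "span": "Confidence"},
    {"type": "PP", "start": 11, "end": 13, "span": "in"},
    {"type": "NP", "start": 14, "end": 23, "span": "the pound"},
]

tagged = spans_to_bio(text, spans)
for token, tag in tagged:
    print(token, tag)
```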

Model limitations and possible bias

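This model was trained on the CoNLL-2000 dataset. Chunking quality may drop on domain-specific English text, so please evaluate the model on your own data before deciding how to use it.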
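One way to run such an evaluation on in-domain data is to compare predicted spans against gold spans by exact match on (start, end, type). The sketch below assumes both sides use the span dictionaries shown in the code example; the function name `chunk_prf` and the sample data are hypothetical, and this is not the evaluation script used for the reported results.

```python
def chunk_prf(gold_sentences, pred_sentences):
    """Corpus-level precision, recall and F1 over exactly matching spans."""
    tp = num_gold = num_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_set = {(s["start"], s["end"], s["type"]) for s in gold}
        pred_set = {(s["start"], s["end"], s["type"]) for s in pred}
        tp += len(gold_set & pred_set)
        num_gold += len(gold_set)
        num_pred += len(pred_set)
    precision = tp / num_pred if num_pred else 0.0
    recall = tp / num_gold if num_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: the prediction splits "the pound" into two NPs.
gold = [[
    {"type": "NP", "start": 0, "end": 10},
    {"type": "PP", "start": 11, "end": 13},
    {"type": "NP", "start": 14, "end": 23},
]]
pred = [[
    {"type": "NP", "start": 0, "end": 10},
    {"type": "PP", "start": 11, "end": 13},
    {"type": "NP", "start": 14, "end": 17},
    {"type": "NP", "start": 18, "end": 23},
]]

precision, recall, f1 = chunk_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Two of the four predicted spans match two of the three gold spans, so precision is 2/4 and recall is 2/3.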

Training data

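Chunk type                  Tag
Noun phrase                 NP
Verb phrase                 VP
Prepositional phrase        PP
Adverb phrase               ADVP
Subordinate clause          SBAR
Adjective phrase            ADJP
Particle                    PRT
Conjunction phrase          CONJP
Interjection                INTJ
List marker                 LST
Unlike coordinated phrase   UCP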
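For sequence labelling, chunk types like these are typically expanded into per-token tags. The sketch below builds a BIO label inventory from the eleven types in the table. The BIO scheme is an assumption here, since the source does not state which tagging scheme the model uses internally, and `build_bio_labels` is a hypothetical helper.

```python
CHUNK_TYPES = ["NP", "VP", "PP", "ADVP", "SBAR", "ADJP",
               "PRT", "CONJP", "INTJ", "LST", "UCP"]


def build_bio_labels(chunk_types):
    """Return ['O', 'B-X', 'I-X', ...] for each chunk type X (assumed BIO scheme)."""
    labels = ["O"]
    for chunk_type in chunk_types:
        labels.append(f"B-{chunk_type}")
        labels.append(f"I-{chunk_type}")
    return labels


labels = build_bio_labels(CHUNK_TYPES)
label2id = {label: index for index, label in enumerate(labels)}
print(len(labels))
```

Under this assumption, eleven chunk types give 2 x 11 + 1 = 23 labels.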

Evaluation and results

Evaluation results on the CoNLL-2000 test set:

Dataset     Precision  Recall  F1
CoNLL-2000  97.15      97.21   97.18
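As a quick consistency check, F1 is the harmonic mean of precision and recall, and the reported figures agree with that definition. The snippet below only redoes the arithmetic on the numbers in the table.

```python
precision, recall = 97.15, 97.21

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))
```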

Related papers and citation

If you find this model helpful, please consider citing the following papers:

@inproceedings{wang-etal-2021-improving,
    title = "Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning",
    author = "Wang, Xinyu  and
      Jiang, Yong  and
      Bach, Nguyen  and
      Wang, Tao  and
      Huang, Zhongqiang  and
      Huang, Fei  and
      Tu, Kewei",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.142",
    pages = "1800--1812",
}

@inproceedings{wang-etal-2022-damo,
    title = "{DAMO}-{NLP} at {S}em{E}val-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition",
    author = "Wang, Xinyu  and
      Shen, Yongliang  and
      Cai, Jiong  and
      Wang, Tao  and
      Wang, Xiaobin  and
      Xie, Pengjun  and
      Huang, Fei  and
      Lu, Weiming  and
      Zhuang, Yueting  and
      Tu, Kewei  and
      Lu, Wei  and
      Jiang, Yong",
    booktitle = "Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.semeval-1.200",
    pages = "1457--1468",
}