tasks:
- text-classification
widgets:
- task: text-classification
  inputs:
  - type: text
    validator:
      max_words: 256
  - type: text
    validator:
      max_words: 256
  examples:
  - name: 1
    inputs:
    - data: 老垅坡高家组省建五公司
    - data: 高家巷省建五公司岳阳分公司
  - name: 2
    inputs:
    - data: 北京航空航天大学逸夫楼
    - data: 北京航空航天大学图书馆
  - name: 3
    inputs:
    - data: 浙江省杭州市湖滨银泰
    - data: 重庆市沙坪坝银泰百货
domain:
- nlp
frameworks:
- PyTorch
model-type:
- text-classification
backbone:
- bert
metrics:
- Acc
language:
- cn
license: Apache License 2.0
tags:
- Alibaba
- transformer
- sentence-similarity
- text-similarity
- text-semantic-matching
- 文本相似度
- 文本对匹配
- 句子对匹配
- 句子相似度
- 语义匹配
datasets:
  train:
  - ccks2021-addrsim
  test:
  - ccks2021-addrsim
  evaluation:
  - ccks2021-addrsim
indexing:
  results:
  - task:
      name: Address Similarity
    dataset:
      name: ccks2021-addrsim
    metrics:
    - type: Acc
      value: 83.86
      description: Acc
      args: default

Address Similarity Matching: Introduction

Model description

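This model uses StructBERT as its pretrained backbone and is trained as a sentence-pair classifier with a cross-entropy loss. The model architecture is shown in the figure below: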

[Figure: Model architecture]
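The training objective described above can be illustrated in plain Python. This is a minimal sketch, not the model's actual training code: the StructBERT encoder and classification head are abstracted away into three logits per address pair, and the label order `exact_match`, `not_match`, `partial_match` is borrowed from the example output purely for readability.

```python
import math

# Assumed label order, taken from the example pipeline output for illustration
LABELS = ["exact_match", "not_match", "partial_match"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the gold label under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def batch_loss(batch_logits, targets):
    """Mean cross-entropy over a batch of address pairs."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical logits for one pair whose gold label is partial_match
example_loss = cross_entropy([0.1, 1.2, 2.9], LABELS.index("partial_match"))
```

Subtracting the maximum logit before exponentiating keeps the computation stable for large logits without changing the result.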

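Intended use and scope

Address text entered in everyday use can take the following forms:

Standard address text containing the full four-level administrative division (province, city, district, township) plus road name, road number, and point of interest (POI);

Standard address text with some elements missing, e.g. only a road name and number, or only a POI;

Non-standard address text and colloquial location descriptions, e.g. 阿里西溪园区东门旁亲橙里 ("Qinchengli, beside the east gate of Alibaba's Xixi campus").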

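Given two input addresses, the address similarity model estimates how closely they match and outputs a probability for each of three labels: exact match (exact_match), partial match (partial_match), and no match (not_match).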
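Because the model returns a probability for every label, callers usually reduce the result to a single decision. The sketch below assumes the output is a dict with parallel `scores` and `labels` lists, as in the example output under How to use; `top_label` and `label_probs` are hypothetical helper names, not part of the ModelScope API.

```python
def top_label(result):
    """Return (label, score) for the highest-scoring label.

    Assumes result["scores"] and result["labels"] are parallel, non-empty lists.
    """
    scores, labels = result["scores"], result["labels"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


def label_probs(result):
    """Map each label name to its probability."""
    return dict(zip(result["labels"], result["scores"]))


# Example result copied from this card's code example
example = {
    "scores": [0.005042273085564375, 0.14676348865032196, 0.8481942415237427],
    "labels": ["exact_match", "not_match", "partial_match"],
}
label, score = top_label(example)  # ("partial_match", 0.848...)
```

Looking labels up by index rather than assuming a fixed order keeps the helper correct even if the pipeline returns labels in a different order.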

You can try the model with your own Chinese address text. See the code example for how to call it.

How to use

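Once ModelScope is installed, you can use nlp_structbert_address-matching_chinese_base (address similarity matching). By default, the combined length of the two input sentences must not exceed 512.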
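One common way to stay within a combined length budget is BERT-style pair packing with longest-first truncation. This sketch rests on assumptions: it treats each Chinese character as one token, uses `[CLS]`/`[SEP]` special tokens, and counts them against the 512 budget. Whether the ModelScope preprocessor for this model truncates in exactly this way is not stated in this card, and `pack_pair` is a hypothetical helper.

```python
MAX_LEN = 512  # combined budget for both sentences, per this card


def pack_pair(tokens_a, tokens_b, max_len=MAX_LEN):
    """Pack two token lists as [CLS] a [SEP] b [SEP].

    Truncates the longer sequence one token at a time until the pair plus
    three special tokens fits in max_len. Returns (tokens, token_type_ids),
    where segment id 0 marks the first address and 1 the second.
    """
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP] tokens")
    a, b = list(tokens_a), list(tokens_b)
    budget = max_len - 3
    while len(a) + len(b) > budget:
        # Longest-first: trim whichever sequence is currently longer
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, token_type_ids
```

For example, `pack_pair(list('北京航空航天大学逸夫楼'), list('北京航空航天大学图书馆'))` yields 25 tokens (11 + 11 characters plus 3 special tokens), well under the limit. Trimming the longer side first preserves as much of the shorter address as possible.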

Code example

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build a text-classification pipeline backed by the address-matching model
pipeline_ins = pipeline(
    task=Tasks.text_classification, model='damo/nlp_structbert_address-matching_chinese_base')

# Pass the two addresses to compare as a tuple (a sentence pair)
print(pipeline_ins(input=('北京航空航天大学逸夫楼', '北京航空航天大学图书馆')))
# Output (scores align index-by-index with labels):
# {'scores': [0.005042273085564375, 0.14676348865032196, 0.8481942415237427], 'labels': ['exact_match', 'not_match', 'partial_match']}
```

Model limitations and possible biases

This model was trained on the ccks2021-addrsim dataset. Please evaluate it on your own data before deciding how to use it.
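Since the card recommends evaluating on your own data, a small harness that reports accuracy (the metric in the results table) together with confusion counts can help. `evaluate`, `predict`, and `my_labelled_pairs` are hypothetical names; the commented lines show one assumed way to wire the harness to `pipeline_ins` from the code example by taking the highest-scoring label.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# An address pair to compare: (address_a, address_b)
Pair = Tuple[str, str]


def evaluate(predict: Callable[[Pair], str],
             data: Iterable[Tuple[Pair, str]]) -> Tuple[float, Counter]:
    """Return accuracy and a Counter of (gold, predicted) label pairs."""
    confusion: Counter = Counter()
    correct = total = 0
    for pair, gold in data:
        pred = predict(pair)
        confusion[(gold, pred)] += 1
        if pred == gold:
            correct += 1
        total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, confusion


# Hypothetical wiring to `pipeline_ins` from the code example:
# def predict(pair):
#     out = pipeline_ins(input=pair)
#     return out["labels"][out["scores"].index(max(out["scores"]))]
#
# accuracy, confusion = evaluate(predict, my_labelled_pairs)
```

The confusion counts show which labels are mixed up (for example, partial_match predicted as exact_match), which a single accuracy number hides.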

Training data

ccks2021-addrsim: a Chinese address similarity matching dataset.

Evaluation results

Results on the ccks2021-addrsim test set:

Dataset	Accuracy (%)
ccks2021-addrsim	83.86

StructBERT Address Similarity Matching - Chinese - Address Domain - base
