Introduction
TestGPT-7B, developed by Ant Group, is a large language model for the software testing domain. Built on the CodeLlama-7B base, it has been fine-tuned for downstream testing tasks, including multi-language test case generation and test case assert completion.
Requirements
- python>=3.8
- pytorch>=2.0.0
- CUDA 11.4
- transformers==4.33.2
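To sanity-check your environment against the pins above, a minimal sketch (nothing here is specific to TestGPT-7B):

import torch
import transformers

# Compare against the pinned versions: python>=3.8, pytorch>=2.0.0, transformers==4.33.2
print(f"torch: {torch.__version__}")
print(f"transformers: {transformers.__version__}")
# CUDA 11.4 is the documented toolkit; torch.version.cuda reports what torch was built against
print(f"CUDA available: {torch.cuda.is_available()}, built for CUDA: {torch.version.cuda}")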
Web Service
- GitHub: https://github.com/codefuse-ai/Test-Agent
- cd Test-Agent
- pip install -r requirements.txt
- Start the controller: python -m chat.server.controller
- Start the model worker (pass --device cuda for NVIDIA GPUs; mps below is for Apple Silicon): python -m chat.server.model_worker --model-path ../models/huggingface/llm/codefuse-ai/TestGPT-7B --device mps
- Start the web service: python -m chat.server.gradio_testgpt
- Open the page: http://127.0.0.1:7860
- I gave this a quick try locally and it was not especially usable; for a code model like this, an IDE plugin is probably the more direct way to consume it. A quick scripted liveness check is sketched below.
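If you want to verify from a script that the web service came up, one option (a sketch; assumes the requests package is installed, which is not part of the repo's stated requirements):

import requests

# The Gradio front end from the steps above listens on 127.0.0.1:7860 by default
resp = requests.get("http://127.0.0.1:7860", timeout=5)
print(resp.status_code)  # 200 means the page is being served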
QuickStart
Below is an example of using the TestGPT-7B model for test case generation and test case assert completion:
from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download, AutoConfig
import torch
HUMAN_ROLE_START_TAG = "<s>human\n"
BOT_ROLE_START_TAG = "<s>bot\n"
if __name__ == '__main__':
    # Model path; can be replaced with a local model path
    model_dir = snapshot_download('codefuse/TestGPT-7B', revision='v1.0.1')
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True, use_fast=False, legacy=False)
    eos_token = '</s>'
    pad_token = '<unk>'
    try:
        tokenizer.eos_token = eos_token
        tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids(eos_token)
    except Exception:
        print(tokenizer.eos_token, tokenizer.eos_token_id)
    try:
        tokenizer.pad_token = pad_token
        tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(pad_token)
    except Exception:
        print(tokenizer.pad_token, tokenizer.pad_token_id)
    # Left padding so that, in batched generation, new tokens are appended at the right end
    tokenizer.padding_side = "left"
    print(f"tokenizer's eos_token: {tokenizer.eos_token}, pad_token: {tokenizer.pad_token}")
    print(f"tokenizer's eos_token_id: {tokenizer.eos_token_id}, pad_token_id: {tokenizer.pad_token_id}")
    # Model config
    config, unused_kwargs = AutoConfig.from_pretrained(
        model_dir,
        use_flash_attn=True,
        use_xformers=True,
        trust_remote_code=True,
        return_unused_kwargs=True)
    # Load the model
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        config=config,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        use_safetensors=False,
    ).eval()
    # Build the prompt for the code under test; two formats are supported:
    # test case generation and assert completion.
    # Test case generation format:
    prompt = '为以下Python代码生成单元测试\n' \
             '```Python\ndef add(lst):\n return sum([lst[i] for i in range(1, len(lst), 2) if lst[i]%2 == 0])\n```\n'
    # Assert completion format (currently Java only):
    # prompt = '下面是被测代码\n' \
    #          '```java\n' \
    #          'public class BooleanUtils {\n ' \
    #          'public static boolean and(final boolean... array) {\n ' \
    #          'ObjectUtils.requireNonEmpty(array, "array");\n ' \
    #          'for (final boolean element : array) {\n ' \
    #          'if (!element) {\n return false;\n }\n }\n ' \
    #          'return true;\n }\n}\n```\n' \
    #          '下面代码是针对上面被测代码生成的用例,请补全用例,生成assert校验\n' \
    #          '```java\n' \
    #          '@Test\npublic void testAnd_withAllTrueInputs() {\n ' \
    #          'boolean[] input = new boolean[] {true, true, true};\n ' \
    #          'boolean result = BooleanUtils.and(input);\n}\n\n@Test\npublic void testAnd_withOneFalseInput() {\n ' \
    #          'boolean[] input = new boolean[] {true, false, true};\n ' \
    #          'boolean result = BooleanUtils.and(input);\n}\n' \
    #          '```\n'
    # Wrap the prompt in the single-turn chat format the model expects
    prompt = f"{HUMAN_ROLE_START_TAG}{prompt}{BOT_ROLE_START_TAG}"
    inputs = tokenizer(prompt, return_tensors='pt', padding=True, add_special_tokens=False).to("cuda")
    # Inference
    outputs = model.generate(
        inputs=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=2048,
        top_p=0.95,
        temperature=0.2,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        num_return_sequences=1,
    )
    # Decode and print each returned sequence
    outputs_len = len(outputs)
    print(f"output len is: {outputs_len}")
    for index in range(outputs_len):
        print(f"generate index: {index}")
        gen_text = tokenizer.decode(outputs[index], skip_special_tokens=True)
        print(gen_text)
        print("===================")