Anonymous user · July 31, 2024
Open-source address: https://modelscope.cn/models/maple77/TestGPT-7B

Details

Introduction

TestGPT-7B, developed by Ant Group, is a large language model for the software-testing domain. Built on the CodeLlama-7B base model, it has been fine-tuned for downstream testing tasks, including multi-language test case generation and test case assertion completion.

Requirements

  • python>=3.8
  • pytorch>=2.0.0
  • CUDA 11.4
  • transformers==4.33.2
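Before launching anything, it can save time to confirm the environment actually satisfies these pins. Below is a minimal sketch; the `meets_minimum` and `installed_version` helpers are hypothetical illustrations, not part of the project:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, e.g. '4.33.2' >= '4.33'."""
    def parts(v: str) -> list:
        # keep only the numeric components so suffixes like 'rc1' are ignored
        return [int(p) for p in v.split(".") if p.isdigit()]
    return parts(installed) >= parts(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

For example, `meets_minimum("3.9.0", "3.8")` is True, while `meets_minimum("1.13.1", "2.0.0")` is False, so a torch 1.x install would be flagged against the `pytorch>=2.0.0` requirement.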

Web Service

  • GitHub: https://github.com/codefuse-ai/Test-Agent
  • cd Test-Agent
  • pip install -r requirements.txt
  • Start the controller: python -m chat.server.controller
  • Start the model worker (--device mps is for Apple Silicon; on NVIDIA GPUs use --device cuda): python -m chat.server.model_worker --model-path ../models/huggingface/llm/codefuse-ai/TestGPT-7B --device mps
  • Start the web service: python -m chat.server.gradio_testgpt
  • Open the page: http://127.0.0.1:7860
  • Note: in my local test the web UI was not particularly usable; a code model like this is probably more practical as an IDE plugin.

QuickStart

Below are examples of test case generation and test case assertion completion using the TestGPT-7B model:

from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download, AutoConfig
import torch

HUMAN_ROLE_START_TAG = "<s>human\n"
BOT_ROLE_START_TAG = "<s>bot\n"

if __name__ == '__main__':
    # Model path; can be replaced with a local model directory
    model_dir = snapshot_download('codefuse/TestGPT-7B', revision='v1.0.1')

    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True, use_fast=False, legacy=False)

    eos_token = '</s>'
    pad_token = '<unk>'

    try:
        tokenizer.eos_token = eos_token
        tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids(eos_token)
    except Exception:
        print(tokenizer.eos_token, tokenizer.eos_token_id)

    try:
        tokenizer.pad_token = pad_token
        tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(pad_token)
    except Exception:
        print(tokenizer.pad_token, tokenizer.pad_token_id)

    tokenizer.padding_side = "left"
    print(f"tokenizer's eos_token: {tokenizer.eos_token}, pad_token: {tokenizer.pad_token}")
    print(f"tokenizer's eos_token_id: {tokenizer.eos_token_id}, pad_token_id: {tokenizer.pad_token_id}")

    # Model configuration
    config, unused_kwargs = AutoConfig.from_pretrained(
        model_dir,
        use_flash_attn=True,
        use_xformers=True,
        trust_remote_code=True,
        return_unused_kwargs=True)

    # Load the model
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        config=config,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        use_safetensors=False,
    ).eval()

    # Generate test cases with the model
    # Prompts for the code under test come in two formats: test case generation and assert completion
    # Test case generation format
    prompt = '为以下Python代码生成单元测试\n' \
             '```Python\ndef add(lst):\n    return sum([lst[i] for i in range(1, len(lst), 2) if lst[i]%2 == 0])\n```\n'

    # Assert-completion format; currently only Java is supported
    # prompt = '下面是被测代码\n' \
    #          '```java\n' \
    #          'public class BooleanUtils {\n    ' \
    #          'public static boolean and(final boolean... array) {\n        ' \
    #          'ObjectUtils.requireNonEmpty(array, "array");\n        ' \
    #          'for (final boolean element : array) {\n            ' \
    #          'if (!element) {\n                return false;\n            }\n        }\n        ' \
    #          'return true;\n    }\n}\n```\n' \
    #          '下面代码是针对上面被测代码生成的用例,请补全用例,生成assert校验\n' \
    #          '```java\n' \
    #          '@Test\npublic void testAnd_withAllTrueInputs() {\n    ' \
    #          'boolean[] input = new boolean[] {true, true, true};\n    ' \
    #          'boolean result = BooleanUtils.and(input);\n}\n\n@Test\npublic void testAnd_withOneFalseInput() {\n    ' \
    #          'boolean[] input = new boolean[] {true, false, true};\n    ' \
    #          'boolean result = BooleanUtils.and(input);\n}\n' \
    #          '```\n'

    # Format the input
    prompt = f"{HUMAN_ROLE_START_TAG}{prompt}{BOT_ROLE_START_TAG}"
    # use the model's device rather than hard-coding "cuda"
    inputs = tokenizer(prompt, return_tensors='pt', padding=True, add_special_tokens=False).to(model.device)

    # Inference
    outputs = model.generate(
        inputs=inputs["input_ids"],
        max_new_tokens=2048,
        top_p=0.95,
        temperature=0.2,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        num_return_sequences=1,
    )

    # Process the results
    outputs_len = len(outputs)
    print(f"output len is: {outputs_len}")
    for index in range(outputs_len):
        print(f"generate index: {index}")
        gen_text = tokenizer.decode(outputs[index], skip_special_tokens=True)
        print(gen_text)
        print("===================")
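Because `model.generate` returns the prompt tokens followed by the generation, the decoded text above still begins with the formatted prompt. A small helper can keep only the model's reply; `extract_response` below is a hypothetical sketch, and it assumes the `bot` role tag survives decoding either with or without its `<s>` prefix (decoding with `skip_special_tokens=True` may strip the `<s>`):

```python
BOT_ROLE_START_TAG = "<s>bot\n"


def extract_response(decoded: str, bot_tag: str = BOT_ROLE_START_TAG) -> str:
    """Return the text after the last bot tag; fall back to the whole string."""
    # try the full tag first, then the tag without the special '<s>' prefix,
    # since skip_special_tokens=True may have removed it during decoding
    for tag in (bot_tag, bot_tag.replace("<s>", "")):
        idx = decoded.rfind(tag)
        if idx != -1:
            return decoded[idx + len(tag):].strip()
    return decoded.strip()
```

In the loop above, `extract_response(gen_text)` would then print just the generated test code instead of the prompt plus the reply.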