
Technical Information

Open Source Address
https://modelscope.cn/models/AI-ModelScope/MistralLite
License
apache-2.0

Details

MistralLite Model

MistralLite is a fine-tuned Mistral-7B-v0.1 language model with enhanced capabilities for processing long context (up to 32K tokens). By utilizing an adapted Rotary Embedding and a sliding window during fine-tuning, MistralLite performs significantly better on several long context retrieval and answering tasks, while keeping the simple structure of the original model. MistralLite is useful for applications such as long context line and topic retrieval, summarization, and question answering. MistralLite can be deployed on a single AWS g5.2x instance with a SageMaker Hugging Face Text Generation Inference (TGI) endpoint, making it suitable for applications that require high performance in resource-constrained environments. You can also serve the MistralLite model directly using TGI Docker containers. In addition, MistralLite supports other serving options such as vLLM, and you can use MistralLite in Python via the Hugging Face transformers and FlashAttention-2 libraries.

MistralLite is similar to Mistral-7B-Instruct-v0.1; their similarities and differences are summarized below:

| Model | Fine-tuned on long contexts | Max context length | RotaryEmbedding adaptation | Sliding Window Size |
|---|---|---|---|---|
| Mistral-7B-Instruct-v0.1 | up to 8K tokens | 32K | rope_theta = 10000 | 4096 |
| MistralLite | up to 16K tokens | 32K | rope_theta = 1000000 | 16384 |
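
These long-context adaptations are visible in the model's configuration. The snippet below is a minimal sketch, assuming the amazon/MistralLite checkpoint exposes the standard Mistral config fields; it loads the configuration with Hugging Face transformers and prints the rope_theta and sliding_window values from the table above.

    from transformers import AutoConfig

    # Load only the configuration; no model weights are downloaded.
    config = AutoConfig.from_pretrained("amazon/MistralLite")

    # Per the table above, these are expected to be 1000000.0 and 16384 respectively.
    print("rope_theta:", config.rope_theta)
    print("sliding_window:", config.sliding_window)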

Motivation for Developing MistralLite

Since the release of Mistral-7B-Instruct-v0.1, the model has become increasingly popular because of its strong performance on a wide range of benchmarks. But most of those benchmarks are evaluated on short context, and not much has been investigated about its performance on long context tasks. We evaluated Mistral-7B-Instruct-v0.1 against benchmarks that are specifically designed to assess the capabilities of LLMs in handling longer context. Although the model's performance was fairly competitive on contexts of fewer than 4096 tokens, there were some limitations on longer contexts. Motivated by improving its performance on longer context, we fine-tuned the Mistral 7B model and produced MistralLite. The model managed to significantly boost long context handling over Mistral-7B-Instruct-v0.1. The detailed long context evaluation results are below:

  1. Topic Retrieval

     | Model Name | Input length 2851 | Input length 5568 | Input length 8313 | Input length 11044 | Input length 13780 |
     |---|---|---|---|---|---|
     | Mistral-7B-Instruct-v0.1 | 100% | 50% | 2% | 0% | 0% |
     | MistralLite | 100% | 100% | 100% | 100% | 98% |

  2. Line Retrieval

     | Model Name | Input length 3818 | Input length 5661 | Input length 7505 | Input length 9354 | Input length 11188 | Input length 12657 |
     |---|---|---|---|---|---|---|
     | Mistral-7B-Instruct-v0.1 | 98% | 62% | 42% | 42% | 32% | 30% |
     | MistralLite | 98% | 92% | 88% | 76% | 70% | 60% |

  3. Pass Key Retrieval

     | Model Name | Input length 3264 | Input length 5396 | Input length 8329 | Input length 10197 |
     |---|---|---|---|---|
     | Mistral-7B-Instruct-v0.1 | 100% | 50% | 20% | 30% |
     | MistralLite | 100% | 100% | 100% | 100% |

  4. Question Answering with Long Input Texts

     | Model Name | Test set Accuracy | Hard subset Accuracy |
     |---|---|---|
     | Mistral-7B-Instruct-v0.1 | 44.3% | 39.7% |
     | MistralLite | 64.4% | 56.2% |

    Example test on long context of 13400 tokens

    • Context is from Amazon Aurora FAQs
    • Question: please tell me how does pgvector help with Generative AI and give me some examples.
    • Answer from MistralLite:
      pgvector is an open-source extension for PostgreSQL supported by Amazon Aurora PostgreSQL-Compatible Edition.

      You can use pgvector to store, search, index, and query billions of embeddings that are generated from machine learning (ML) and artificial intelligence (AI) models in your database, such as those from Amazon Bedrock (limited preview) or Amazon SageMaker. A vector embedding is a numerical representation that represents the semantic meaning of content such as text, images, and video.

      With pgvector, you can query embeddings in your Aurora PostgreSQL database to perform efficient semantic similarity searches of these data types, represented as vectors, combined with other tabular data in Aurora. This enables the use of generative AI and other AI/ML systems for new types of applications such as personalized recommendations based on similar text descriptions or images, candidate match based on interview notes, customer service next best action recommendations based on successful transcripts or chat session dialogs, and more.

    Model Details

    How to Use MistralLite from Python Code (HuggingFace transformers)

    Important - For an end-to-end example Jupyter notebook, please refer to this link.

    Install the necessary packages

    Requires: transformers 4.34.0 or later, flash-attn 2.3.1.post1 or later, and accelerate 0.23.0 or later.

    pip install transformers==4.34.0
    pip install flash-attn==2.3.1.post1 --no-build-isolation
    pip install accelerate==0.23.0
    

    You can then try the following example code

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import transformers
    import torch

    model_id = "amazon/MistralLite"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id,
                                                 torch_dtype=torch.bfloat16,
                                                 use_flash_attention_2=True,
                                                 device_map="auto",)
    pipeline = transformers.pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
    )
    prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"

    sequences = pipeline(
        prompt,
        max_new_tokens=400,
        do_sample=False,
        return_full_text=False,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
    )
    for seq in sequences:
        print(f"{seq['generated_text']}")
    

    Important - Use the prompt template below for MistralLite:

    <|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>
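
    If you construct prompts programmatically, a small helper can apply this template consistently. The build_prompt function below is a hypothetical, illustrative helper (not part of the original model card); the invoke_tgi example later on this page does the same wrapping inline.

    def build_prompt(user_message: str) -> str:
        # Wrap a user message in MistralLite's prompt template shown above.
        return f"<|prompter|>{user_message}</s><|assistant|>"

    print(build_prompt("What are the main challenges to support a long context for LLM?"))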
    

    How to Serve MistralLite on TGI

    Important:

    • For an end-to-end example Jupyter notebook using the native TGI container, please refer to this link.
    • If the input context length is greater than 12K tokens, it is recommended to use a custom TGI container; please refer to this link.

    Start TGI server

    Use TGI version 1.1.0 or later. The official Docker container is: ghcr.io/huggingface/text-generation-inference:1.1.0

    Example Docker parameters:

    docker run -d --gpus all --shm-size 1g -p 443:80 -v $(pwd)/models:/data ghcr.io/huggingface/text-generation-inference:1.1.0 \
          --model-id amazon/MistralLite \
          --max-input-length 16000 \
          --max-total-tokens 16384 \
          --max-batch-prefill-tokens 16384 \
          --trust-remote-code
    

    Perform Inference

    Example Python code for inference with TGI (requires text_generation 0.6.1 or later):

    pip install text_generation==0.6.1
    
    from text_generation import Client

    SERVER_PORT = 443
    SERVER_HOST = "localhost"
    SERVER_URL = f"{SERVER_HOST}:{SERVER_PORT}"
    tgi_client = Client(f"http://{SERVER_URL}", timeout=60)

    def invoke_tgi(prompt,
                   random_seed=1,
                   max_new_tokens=400,
                   print_stream=True,
                   assist_role=True):
        if (assist_role):
            prompt = f"<|prompter|>{prompt}</s><|assistant|>"
        output = ""
        for response in tgi_client.generate_stream(
            prompt,
            do_sample=False,
            max_new_tokens=max_new_tokens,
            return_full_text=False,
            #temperature=None,
            #truncate=None,
            #seed=random_seed,
            #typical_p=0.2,
        ):
            if hasattr(response, "token"):
                if not response.token.special:
                    snippet = response.token.text
                    output += snippet
                    if (print_stream):
                        print(snippet, end='', flush=True)
        return output

    prompt = "What are the main challenges to support a long context for LLM?"
    result = invoke_tgi(prompt)
    

    Important - When using MistralLite for inference for the first time, it may require a brief 'warm-up' period that can take tens of seconds. However, subsequent inferences should be faster and return results in a more timely manner. This warm-up period is normal and should not affect the overall performance of the system once the initialisation period has been completed.

    How to Deploy MistralLite on Amazon SageMaker

    Important:

    • For an end-to-end example Jupyter notebook using the SageMaker built-in container, please refer to this link.
    • If the input context length is greater than 12K tokens, it is recommended to use a custom Docker container; please refer to this link.

    Install the necessary packages

    Requires: sagemaker 2.192.1 or later.

    pip install sagemaker==2.192.1
    

    Deploy the Model as a SageMaker Endpoint

    To deploy MistralLite on a SageMaker endpoint, please follow the example code below.

    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
    import time

    sagemaker_session = sagemaker.Session()
    region = sagemaker_session.boto_region_name
    role = sagemaker.get_execution_role()

    image_uri = get_huggingface_llm_image_uri(
      backend="huggingface", # or lmi
      region=region,
      version="1.1.0"
    )

    model_name = "MistralLite-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

    hub = {
        'HF_MODEL_ID':'amazon/MistralLite',
        'HF_TASK':'text-generation',
        'SM_NUM_GPUS':'1',
        "MAX_INPUT_LENGTH": '16000',
        "MAX_TOTAL_TOKENS": '16384',
        "MAX_BATCH_PREFILL_TOKENS": '16384',
        "MAX_BATCH_TOTAL_TOKENS": '16384',
    }

    model = HuggingFaceModel(
        name=model_name,
        env=hub,
        role=role,
        image_uri=image_uri
    )
    predictor = model.deploy(
      initial_instance_count=1,
      instance_type="ml.g5.2xlarge",
      endpoint_name=model_name,
    )
    

    Perform Inference

    To call the endpoint, please follow the example code below:

    input_data = {
      "inputs": "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>",
      "parameters": {
        "do_sample": False,
        "max_new_tokens": 400,
        "return_full_text": False,
        #"typical_p": 0.2,
        #"temperature":None,
        #"truncate":None,
        #"seed": 1,
      }
    }
    result = predictor.predict(input_data)[0]["generated_text"]
    print(result)
    

    or via boto3, as shown in the example code below:

    import boto3
    import json

    def call_endpoint(client, prompt, endpoint_name, parameters):
        payload = {"inputs": prompt,
                   "parameters": parameters}
        response = client.invoke_endpoint(EndpointName=endpoint_name,
                                          Body=json.dumps(payload),
                                          ContentType="application/json")
        output = json.loads(response["Body"].read().decode())
        result = output[0]["generated_text"]
        return result

    client = boto3.client("sagemaker-runtime")
    parameters = {
        "do_sample": False,
        "max_new_tokens": 400,
        "return_full_text": False,
        #"typical_p": 0.2,
        #"temperature":None,
        #"truncate":None,
        #"seed": 1,
    }
    endpoint_name = predictor.endpoint_name
    prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"
    result = call_endpoint(client, prompt, endpoint_name, parameters)
    print(result)
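
    When you are finished experimenting, you may want to remove the endpoint so the ml.g5.2xlarge instance stops incurring charges. The snippet below is a minimal sketch using the predictor object created above; it is not part of the original example.

    # Optional cleanup (sketch): delete the SageMaker model and endpoint created above.
    predictor.delete_model()
    predictor.delete_endpoint()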
    

    How to Serve MistralLite on vLLM

    Documentation on installing and using vLLM can be found here.

    Important - For an end-to-end example Jupyter notebook, please refer to this link.

    Using vLLM as a server

    When using vLLM as a server, pass the --model amazon/MistralLite parameter, for example:

    python3 -m vllm.entrypoints.api_server --model amazon/MistralLite
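
    Once the server is up, you can send it requests over HTTP. The snippet below is a minimal sketch against vLLM's demo API server; the /generate route, its JSON fields, and the default port 8000 are assumptions based on vllm.entrypoints.api_server at the time of writing, so check your vLLM version's documentation for the exact schema.

    import json

    import requests

    # MistralLite prompt template wrapped around the user question.
    prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"

    # vLLM's demo api_server listens on port 8000 by default (assumption).
    response = requests.post(
        "http://localhost:8000/generate",
        json={"prompt": prompt, "max_tokens": 100, "temperature": 0},
        timeout=120,
    )
    print(json.loads(response.text))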
    

    Using vLLM in Python Code

    When using vLLM from Python code, please see the example code below:

    from vllm import LLM, SamplingParams

    prompts = [
       "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>",
    ]
    sampling_params = SamplingParams(temperature=0, max_tokens=100)

    llm = LLM(model="amazon/MistralLite",)

    outputs = llm.generate(prompts, sampling_params)

    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    

    Limitations

    Before using the MistralLite model, it is important to perform your own independent assessment, and take measures to ensure that your use complies with your own specific quality control practices and standards, and with the local rules, laws, regulations, licenses and terms that apply to you and your content.
