MistralLite Model

MistralLite is a fie-tued Mistral-7B-v0.1 laguage model, with ehaced capabilities of processig log cotext (up to 32K tokes). By utilizig a adapted Rotary Embeddig ad slidig widow durig fie-tuig, MistralLite is able to perform sigificatly better o several log cotext retrieve ad aswerig tasks, while keepig the simple model structure of the origial model. MistralLite is useful for applicatios such as log cotext lie ad topic retrieval, summarizatio, questio-aswerig, ad etc. MistralLite ca be deployed o a sigle AWS g5.2x istace with Sagemaker Huggigface Text Geeratio Iferece (TGI) edpoit, makig it suitable for applicatios that require high performace i resource-costraied eviromets. You ca also serve the MistralLite model directly usig TGI docker cotaiers. Also, MistralLite supports other ways of servig like vLLM, ad you ca use MistralLite i Pytho by usig the HuggigFace trasformers ad FlashAttetio-2 library.

MistralLite is similar to Mistral-7B-Istruct-v0.1, ad their similarities ad differeces are summarized below:

Model	Fie-tued o log cotexts	Max cotext legth	RotaryEmbeddig adaptatio	Slidig Widow Size
Mistral-7B-Istruct-v0.1	up to 8K tokes	32K	rope_theta = 10000	4096
MistralLite	up to 16K tokes	32K	rope_theta = 1000000	16384

Motivatio of Developig MistralLite

Sice the release of Mistral-7B-Istruct-v0.1, the model became icreasigly popular because its strog performace o a wide rage of bechmarks. But most of the bechmarks are evaluated o short cotext, ad ot much has bee ivestigated o its performace o log cotext tasks. The We evaluated Mistral-7B-Istruct-v0.1 agaist bechmarks that are specifically desiged to assess the capabilities of LLMs i hadlig loger cotext. Although the performace of the models o log cotext was fairly competitive o log cotext less tha 4096 tokes, there were some limitatios o its performace o loger cotext. Motivated by improvig its performace o loger cotext, we fietued the Mistral 7B model, ad produced Mistrallite. The model maaged to sigificatly boost the performace of log cotext hadlig over Mistral-7B-Istruct-v0.1. The detailed log cotext evalutaio results are as below:

Topic Retrieval

Model Name Iput legth Iput legth Iput legth Iput legth Iput legth

2851 5568 8313 11044 13780

Mistral-7B-Istruct-v0.1 100% 50% 2% 0% 0%

MistralLite 100% 100% 100% 100% 98%
Lie Retrieval

Model Name	Iput legth	Iput legth	Iput legth	Iput legth	Iput legth
	2851	5568	8313	11044	13780
Mistral-7B-Istruct-v0.1	100%	50%	2%	0%	0%
MistralLite	100%	100%	100%	100%	98%

Model Name	Iput legth	Iput legth	Iput legth	Iput legth	Iput legth	Iput legth
	3818	5661	7505	9354	11188	12657
Mistral-7B-Istruct-v0.1	98%	62%	42%	42%	32%	30%
MistralLite	98%	92%	88%	76%	70%	60%

Pass key Retrieval

Model Name	Iput legth	Iput legth	Iput legth	Iput legth
	3264	5396	8329	10197
Mistral-7B-Istruct-v0.1	100%	50%	20%	30%
MistralLite	100%	100%	100%	100%

Questio Aswerig with Log Iput Texts

Model Name	Test set Accuracy	Hard subset Accuracy
Mistral-7B-Istruct-v0.1	44.3%	39.7%
MistralLite	64.4%	56.2%

Example test o log cotext of 13400 tokes

Cotext is from Amazo Aurora FAQs
Questio: please tell me how does pgvector help with Geerative AI ad give me some examples.
Aswer from MistralLite:

  pgvector is a ope-source extesio for PostgreSQL supported by Amazo Aurora PostgreSQL-Compatible Editio.

  You ca use pgvector to store, search, idex, ad query billios of embeddigs that are geerated from machie learig (ML) ad artificial itelligece (AI) models i your database, such as those from Amazo Bedrock (limited preview) or Amazo SageMaker. A vector embeddig is a umerical represetatio that represets the sematic meaig of cotet such as text, images, ad video.

  With pgvector, you ca query embeddigs i your Aurora PostgreSQL database to perform efficiet sematic similarity searches of these data types, represeted as vectors, combied with other tabular data i Aurora. This eables the use of geerative AI ad other AI/ML systems for ew types of applicatios such as persoalized recommedatios based o similar text descriptios or images, cadidate match based o iterview otes, customer service ext best actio recommedatios based o successful trascripts or chat sessio dialogs, ad more.

Model Details

Developed by: AWS Cotributors
Model type: Mistral-7B-v0.1
Laguage: Eglish
Fietued from weights: Mistral-7B-v0.1
Fietued o data:
SLidigEcoder ad Decoder (SLED)
(Log) Natural Questios (NQ)
OpeAssistat Coversatios Dataset (OASST1)
Supported Servig Framework:
Text-Geeratio-Iferece 1.1.0
vLLM
HuggigFace trasformers
HuggigFace Text Geeratio Iferece (TGI) cotaier o SageMaker
Model Licese: Apache 2.0
Cotact: GitHub issues
Iferece Code Github Repo

How to Use MistralLite from Pytho Code (HuggigFace trasformers)

Importat - For a ed-to-ed example Jupyter otebook, please refer to this lik.

Istall the ecessary packages

Requires: trasformers 4.34.0 or later, flash-att 2.3.1.post1 or later, ad accelerate 0.23.0 or later.

pip istall trasformers==4.34.0
pip istall flash-att==2.3.1.post1 --o-build-isolatio
pip istall accelerate==0.23.0

You ca the try the followig example code

from trasformers import AutoModelForCausalLM, AutoTokeizer
import trasformers
import torch

model_id = "amazo/MistralLite"

tokeizer = AutoTokeizer.from_pretraied(model_id)
model = AutoModelForCausalLM.from_pretraied(model_id,
                                             torch_dtype=torch.bfloat16,
                                             use_flash_attetio_2=True,
                                             device_map="auto",)
pipelie = trasformers.pipelie(
    "text-geeratio",
    model=model,
    tokeizer=tokeizer,
)
prompt = "<|prompter|>What are the mai challeges to support a log cotext for LLM?</s><|assistat|>"

sequeces = pipelie(
    prompt,
    max_ew_tokes=400,
    do_sample=False,
    retur_full_text=False,
    um_retur_sequeces=1,
    eos_toke_id=tokeizer.eos_toke_id,
)
for seq i sequeces:
    prit(f"{seq['geerated_text']}")

Importat - Use the prompt template below for MistralLite:

<|prompter|>What are the mai challeges to support a log cotext for LLM?</s><|assistat|>

How to Serve MistralLite o TGI

Importat:

For a ed-to-ed example Jupyter otebook usig the ative TGI cotaier, please refer to this lik.
If the iput cotext legth is greater tha 12K tokes, it is recommeded usig a custom TGI cotaier, please refer to this lik.

Start TGI server

Use TGI versio 1.1.0 or later. The official Docker cotaier is: ghcr.io/huggigface/text-geeratio-iferece:1.1.0

Example Docker parameters:

docker ru -d --gpus all --shm-size 1g -p 443:80 -v $(pwd)/models:/data ghcr.io/huggigface/text-geeratio-iferece:1.1.0 \
      --model-id amazo/MistralLite \
      --max-iput-legth 16000 \
      --max-total-tokes 16384 \
      --max-batch-prefill-tokes 16384 \
      --trust-remote-code

Perform Iferece

Example Pytho code for iferece with TGI (requires text_geeratio 0.6.1 or later):

pip istall text_geeratio==0.6.1

from text_geeratio import Cliet

SERVER_PORT = 443
SERVER_HOST = "localhost"
SERVER_URL = f"{SERVER_HOST}:{SERVER_PORT}"
tgi_cliet = Cliet(f"http://{SERVER_URL}", timeout=60)

def ivoke_tgi(prompt, 
                      radom_seed=1, 
                      max_ew_tokes=400, 
                      prit_stream=True,
                      assist_role=True):
    if (assist_role):
        prompt = f"<|prompter|>{prompt}</s><|assistat|>"
    output = ""
    for respose i tgi_cliet.geerate_stream(
        prompt,
        do_sample=False,
        max_ew_tokes=max_ew_tokes,
        retur_full_text=False,
        #temperature=Noe,
        #trucate=Noe,
        #seed=radom_seed,
        #typical_p=0.2,
    ):
        if hasattr(respose, "toke"):
            if ot respose.toke.special:
                sippet = respose.toke.text
                output += sippet
                if (prit_stream):
                    prit(sippet, ed='', flush=True)
    retur output

prompt = "What are the mai challeges to support a log cotext for LLM?"
result = ivoke_tgi(prompt)

Importat - Whe usig MistralLite for iferece for the first time, it may require a brief 'warm-up' period that ca take 10s of secods. However, subsequet ifereces should be faster ad retur results i a more timely maer. This warm-up period is ormal ad should ot affect the overall performace of the system oce the iitialisatio period has bee completed.

How to Deploy MistralLite o Amazo SageMaker

Importat:

For a ed-to-ed example Jupyter otebook usig the SageMaker built-i cotaier, please refer to this lik.
If the iput cotext legth is greater tha 12K tokes, it is recommeded usig a custom docker cotaier, please refer to this lik.

Istall the ecessary packages

Requires: sagemaker 2.192.1 or later.

pip istall sagemaker==2.192.1

Deploy the Model as A SageMaker Edpoit

To deploy MistralLite o a SageMaker edpoit, please follow the example code as below.

import sagemaker
from sagemaker.huggigface import HuggigFaceModel, get_huggigface_llm_image_uri
import time

sagemaker_sessio = sagemaker.Sessio()
regio = sagemaker_sessio.boto_regio_ame
role = sagemaker.get_executio_role()

image_uri = get_huggigface_llm_image_uri(
  backed="huggigface", # or lmi
  regio=regio,
 versio="1.1.0"
)

model_ame = "MistralLite-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

hub = {
    'HF_MODEL_ID':'amazo/MistralLite',
    'HF_TASK':'text-geeratio',
    'SM_NUM_GPUS':'1',
    "MAX_INPUT_LENGTH": '16000',
    "MAX_TOTAL_TOKENS": '16384',
    "MAX_BATCH_PREFILL_TOKENS": '16384',
    "MAX_BATCH_TOTAL_TOKENS":  '16384',
}

model = HuggigFaceModel(
    ame=model_ame,
    ev=hub,
    role=role,
    image_uri=image_uri
)
predictor = model.deploy(
  iitial_istace_cout=1,
  istace_type="ml.g5.2xlarge",
  edpoit_ame=model_ame,

)

Perform Iferece

To call the edpoit, please follow the example code as below:

iput_data = {
  "iputs": "<|prompter|>What are the mai challeges to support a log cotext for LLM?</s><|assistat|>",
  "parameters": {
    "do_sample": False,
    "max_ew_tokes": 400,
    "retur_full_text": False,
    #"typical_p": 0.2,
    #"temperature":Noe,
    #"trucate":Noe,
    #"seed": 1,
  }
}
result = predictor.predict(iput_data)[0]["geerated_text"]
prit(result)

or via boto3, ad the example code is show as below:

import boto3
import jso
def call_edpoit(cliet, prompt, edpoit_ame, paramters):
    cliet = boto3.cliet("sagemaker-rutime")
    payload = {"iputs": prompt,
               "parameters": parameters}
    respose = cliet.ivoke_edpoit(EdpoitName=edpoit_ame,
                                      Body=jso.dumps(payload), 
                                      CotetType="applicatio/jso")
    output = jso.loads(respose["Body"].read().decode())
    result = output[0]["geerated_text"]
    retur result

cliet = boto3.cliet("sagemaker-rutime")
parameters = {
    "do_sample": False,
    "max_ew_tokes": 400,
    "retur_full_text": False,
    #"typical_p": 0.2,
    #"temperature":Noe,
    #"trucate":Noe,
    #"seed": 1,
}
edpoit_ame = predictor.edpoit_ame
prompt = "<|prompter|>What are the mai challeges to support a log cotext for LLM?</s><|assistat|>"
result = call_edpoit(cliet, prompt, edpoit_ame, parameters)
prit(result)

How to Serve MistralLite o vLLM

Documetatio o istallig ad usig vLLM ca be foud here.

Importat - For a ed-to-ed example Jupyter otebook, please refer to this lik.

Usig vLLM as a server

Whe usig vLLM as a server, pass the --model amazo/MistralLite parameter, for example:

pytho3 -m vllm.etrypoits.api_server --model amazo/MistralLite

Usig vLLM i Pytho Code

Whe usig vLLM from Pytho code, Please see the example code as below:

from vllm import LLM, SampligParams

prompts = [
   "<|prompter|>What are the mai challeges to support a log cotext for LLM?</s><|assistat|>",
]
samplig_params = SampligParams(temperature=0, max_tokes=100)

llm = LLM(model="amazo/MistralLite",)

outputs = llm.geerate(prompts, samplig_params)

# Prit the outputs.
for output i outputs:
    prompt = output.prompt
    geerated_text = output.outputs[0].text
    prit(f"Prompt: {prompt!r}, Geerated text: {geerated_text!r}")

Limitatios

Before usig the MistralLite model, it is importat to perform your ow idepedet assessmet, ad take measures to esure that your use would comply with your ow specific quality cotrol practices ad stadards, ad that your use would comply with the local rules, laws, regulatios, liceses ad terms that apply to you, ad your cotet.

MistralLite

技术信息

作品详情

MistralLite Model

Motivatio of Developig MistralLite

Example test o log cotext of 13400 tokes

Model Details

How to Use MistralLite from Pytho Code (HuggigFace trasformers)

Istall the ecessary packages

You ca the try the followig example code

How to Serve MistralLite o TGI

Start TGI server

Perform Iferece

How to Deploy MistralLite o Amazo SageMaker

Istall the ecessary packages

Deploy the Model as A SageMaker Edpoit

Perform Iferece

How to Serve MistralLite o vLLM

Usig vLLM as a server

Usig vLLM i Pytho Code

Limitatios

功能介绍

重点城市程序员兼职推荐

重点岗位程序员兼职推荐