Question Answering with Long Input Texts
| Model Name | Test set Accuracy | Hard subset Accuracy |
|------------|-------------------|----------------------|
| Mistral-7B-Instruct-v0.1 | 44.3% | 39.7% |
| MistralLite | 64.4% | 56.2% |
Example test on long context of 13400 tokens
- Context is from Amazon Aurora FAQs
- Question: please tell me how does pgvector help with Generative AI and give me some examples.
- Answer from MistralLite:
pgvector is an open-source extension for PostgreSQL supported by Amazon Aurora PostgreSQL-Compatible Edition.
You can use pgvector to store, search, index, and query billions of embeddings that are generated from machine learning (ML) and artificial intelligence (AI) models in your database, such as those from Amazon Bedrock (limited preview) or Amazon SageMaker. A vector embedding is a numerical representation that represents the semantic meaning of content such as text, images, and video.
With pgvector, you can query embeddings in your Aurora PostgreSQL database to perform efficient semantic similarity searches of these data types, represented as vectors, combined with other tabular data in Aurora. This enables the use of generative AI and other AI/ML systems for new types of applications such as personalized recommendations based on similar text descriptions or images, candidate matching based on interview notes, customer service next-best-action recommendations based on successful transcripts or chat session dialogs, and more.
Model Details
How to Use MistralLite from Python Code (HuggingFace transformers)
Important - For an end-to-end example Jupyter notebook, please refer to this link.
Install the necessary packages
Requires: transformers 4.34.0 or later, flash-attn 2.3.1.post1 or later, and accelerate 0.23.0 or later.
pip install transformers==4.34.0
pip install flash-attn==2.3.1.post1 --no-build-isolation
pip install accelerate==0.23.0
You can then try the following example code:
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers
import torch

model_id = "amazon/MistralLite"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             torch_dtype=torch.bfloat16,
                                             use_flash_attention_2=True,
                                             device_map="auto",)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"

sequences = pipeline(
    prompt,
    max_new_tokens=400,
    do_sample=False,
    return_full_text=False,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"{seq['generated_text']}")
Important - Use the prompt template below for MistralLite:
<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>
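For convenience, a small helper (the name format_prompt is only illustrative) can wrap a plain question in this template before passing it to the pipeline above:

def format_prompt(question):
    # Wrap a plain question in the MistralLite prompt template.
    return f"<|prompter|>{question}</s><|assistant|>"

prompt = format_prompt("What are the main challenges to support a long context for LLM?")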
How to Serve MistralLite on TGI
Important:
- For an end-to-end example Jupyter notebook using the native TGI container, please refer to this link.
- If the input context length is greater than 12K tokens, it is recommended to use a custom TGI container; please refer to this link.
Start TGI server
Use TGI version 1.1.0 or later. The official Docker container is: ghcr.io/huggingface/text-generation-inference:1.1.0
Example Docker parameters:
docker run -d --gpus all --shm-size 1g -p 443:80 -v $(pwd)/models:/data ghcr.io/huggingface/text-generation-inference:1.1.0 \
      --model-id amazon/MistralLite \
      --max-input-length 16000 \
      --max-total-tokens 16384 \
      --max-batch-prefill-tokens 16384 \
      --trust-remote-code
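The container needs some time to download the weights and load the model before it starts answering requests. As a minimal readiness check (a sketch assuming the standard TGI /health route and the port mapping from the docker command above), you can poll the server from Python before sending any prompts:

import time
import requests  # assumes the requests package is available

TGI_URL = "http://localhost:443"  # port mapped in the docker run command above

# Poll the /health route until the server responds, or give up after ~10 minutes.
for _ in range(60):
    try:
        if requests.get(f"{TGI_URL}/health", timeout=5).status_code == 200:
            print("TGI server is ready")
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(10)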
Perform Inference
Example Python code for inference with TGI (requires text_generation 0.6.1 or later):
pip install text_generation==0.6.1
from text_generation import Client

SERVER_PORT = 443
SERVER_HOST = "localhost"
SERVER_URL = f"{SERVER_HOST}:{SERVER_PORT}"
tgi_client = Client(f"http://{SERVER_URL}", timeout=60)

def invoke_tgi(prompt,
               random_seed=1,
               max_new_tokens=400,
               print_stream=True,
               assist_role=True):
    if (assist_role):
        prompt = f"<|prompter|>{prompt}</s><|assistant|>"
    output = ""
    for response in tgi_client.generate_stream(
        prompt,
        do_sample=False,
        max_new_tokens=max_new_tokens,
        return_full_text=False,
        #temperature=None,
        #truncate=None,
        #seed=random_seed,
        #typical_p=0.2,
    ):
        if hasattr(response, "token"):
            if not response.token.special:
                snippet = response.token.text
                output += snippet
                if (print_stream):
                    print(snippet, end='', flush=True)
    return output

prompt = "What are the main challenges to support a long context for LLM?"
result = invoke_tgi(prompt)
Important - When using MistralLite for inference for the first time, it may require a brief 'warm-up' period that can take tens of seconds. However, subsequent inferences should be faster and return results in a more timely manner. This warm-up period is normal and should not affect the overall performance of the system once the initialisation period has been completed.
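To see this effect, you can time the first call against a later one with the invoke_tgi helper above; this is only a sketch to illustrate the warm-up behaviour, not a benchmark:

import time

question = "What are the main challenges to support a long context for LLM?"
for attempt in range(2):
    start = time.perf_counter()
    invoke_tgi(question, print_stream=False)
    elapsed = time.perf_counter() - start
    # The first call typically includes the warm-up cost; later calls are faster.
    print(f"Call {attempt + 1} took {elapsed:.1f}s")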
How to Deploy MistralLite on Amazon SageMaker
Important:
- For an end-to-end example Jupyter notebook using the SageMaker built-in container, please refer to this link.
- If the input context length is greater than 12K tokens, it is recommended to use a custom Docker container; please refer to this link.
Install the necessary packages
Requires: sagemaker 2.192.1 or later.
pip install sagemaker==2.192.1
Deploy the Model as a SageMaker Endpoint
To deploy MistralLite on a SageMaker endpoint, please follow the example code below.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
import time

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
role = sagemaker.get_execution_role()

image_uri = get_huggingface_llm_image_uri(
    backend="huggingface",  # or lmi
    region=region,
    version="1.1.0"
)

model_name = "MistralLite-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

hub = {
    'HF_MODEL_ID': 'amazon/MistralLite',
    'HF_TASK': 'text-generation',
    'SM_NUM_GPUS': '1',
    "MAX_INPUT_LENGTH": '16000',
    "MAX_TOTAL_TOKENS": '16384',
    "MAX_BATCH_PREFILL_TOKENS": '16384',
    "MAX_BATCH_TOTAL_TOKENS": '16384',
}

model = HuggingFaceModel(
    name=model_name,
    env=hub,
    role=role,
    image_uri=image_uri
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name=model_name,
)
Perform Inference
To call the endpoint, please follow the example code below:
input_data = {
    "inputs": "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>",
    "parameters": {
        "do_sample": False,
        "max_new_tokens": 400,
        "return_full_text": False,
        #"typical_p": 0.2,
        #"temperature": None,
        #"truncate": None,
        #"seed": 1,
    }
}
result = predictor.predict(input_data)[0]["generated_text"]
print(result)
or via boto3, as in the example code below:
import boto3
import json

def call_endpoint(client, prompt, endpoint_name, parameters):
    # Build the TGI-style payload and invoke the SageMaker endpoint.
    payload = {"inputs": prompt,
               "parameters": parameters}
    response = client.invoke_endpoint(EndpointName=endpoint_name,
                                      Body=json.dumps(payload),
                                      ContentType="application/json")
    output = json.loads(response["Body"].read().decode())
    result = output[0]["generated_text"]
    return result

client = boto3.client("sagemaker-runtime")
parameters = {
    "do_sample": False,
    "max_new_tokens": 400,
    "return_full_text": False,
    #"typical_p": 0.2,
    #"temperature": None,
    #"truncate": None,
    #"seed": 1,
}
endpoint_name = predictor.endpoint_name
prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"
result = call_endpoint(client, prompt, endpoint_name, parameters)
print(result)
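When you are done experimenting, delete the endpoint so you are not billed for an idle instance, for example using the predictor object created above:

# Clean up the SageMaker model and endpoint created above.
predictor.delete_model()
predictor.delete_endpoint()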
How to Serve MistralLite on vLLM
Documentation on installing and using vLLM can be found here.
Important - For an end-to-end example Jupyter notebook, please refer to this link.
Using vLLM as a server
When using vLLM as a server, pass the --model amazon/MistralLite parameter, for example:
python3 -m vllm.entrypoints.api_server --model amazon/MistralLite
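The demo API server exposes a simple HTTP interface. A minimal client sketch, assuming the server's default /generate route and port 8000 (check the vLLM documentation for the exact request schema of your version), could look like this:

import requests  # assumes the requests package is available

prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"

# POST the prompt to the demo API server started above (default port 8000).
response = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": prompt, "max_tokens": 100, "temperature": 0},
    timeout=60,
)
print(response.json()["text"][0])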
Using vLLM in Python Code
When using vLLM from Python code, please see the example code below:
from vllm import LLM, SamplingParams

prompts = [
    "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>",
]
sampling_params = SamplingParams(temperature=0, max_tokens=100)

llm = LLM(model="amazon/MistralLite",)

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
Limitations
Before using the MistralLite model, it is important to perform your own independent assessment, and take measures to ensure that your use would comply with your own specific quality control practices and standards, and that your use would comply with the local rules, laws, regulations, licenses and terms that apply to you, and your content.