This is a locally hosted version of GitHub Copilot. It uses the SalesForce CodeGen models running on NVIDIA's Triton Inference Server with the FasterTransformer backend.
## Prerequisites

- Docker
- docker-compose >= 1.28
- An NVIDIA GPU with Compute Capability >= 6.0, and enough VRAM to run the model you want (see the check sketched after this list)
- nvidia-docker
- curl and zstd, for downloading and unpacking the models
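Before running setup, it can save time to confirm that containers can actually see your GPU. The command below is only a rough sketch: the CUDA image tag is an arbitrary example, and any CUDA base image compatible with your installed driver will do.

```bash
# Sanity check: a container should be able to see the GPU.
# The image tag is just an example; use any CUDA base image that
# matches your installed NVIDIA driver.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

If this prints your GPU's name and available VRAM, the nvidia-docker setup is working.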
"github.copilot.advanced":{"debug.overrideEngine":"codegen","debug.testOverrideProxyUrl":"https://localhost:5000","debug.overrideProxyUrl":"https://localhost:5000"}设置运行设置脚本以选择要使用的模型。这将从Huggingface下载模型,然后将其转换为与FasterTransformer一起使用。
```
$ ./setup.sh
Models available:
[1] codegen-350M-mono (2GB total VRAM required; Python-only)
[2] codegen-350M-multi (2GB total VRAM required; multi-language)
[3] codegen-2B-mono (7GB total VRAM required; Python-only)
[4] codegen-2B-multi (7GB total VRAM required; multi-language)
[5] codegen-6B-mono (13GB total VRAM required; Python-only)
[6] codegen-6B-multi (13GB total VRAM required; multi-language)
[7] codegen-16B-mono (32GB total VRAM required; Python-only)
[8] codegen-16B-multi (32GB total VRAM required; multi-language)
Enter your choice [6]: 2
Enter number of GPUs [1]: 1
Where do you want to save the model [/home/moyix/git/fauxpilot/models]? /fastdata/mymodels
Downloading and converting the model, this will take a while...
Converting model codegen-350M-multi with 1 GPUs
Loading CodeGen model
Downloading config.json: 100%|██████████| 996/996 [00:00<00:00, 1.25MB/s]
Downloading pytorch_model.bin: 100%|██████████| 760M/760M [00:11<00:00, 68.3MB/s]
Creating empty GPTJ model
Converting...
Conversion complete.
Saving model to codegen-350M-multi-hf...

=============== Argument ===============
saved_dir: /models/codegen-350M-multi-1gpu/fastertransformer/1
in_file: codegen-350M-multi-hf
trained_gpu_num: 1
infer_gpu_num: 1
processes: 4
weight_data_type: fp32
========================================
transformer.wte.weight
transformer.h.0.ln_1.weight
[... more conversion output trimmed ...]
transformer.ln_f.weight
transformer.ln_f.bias
lm_head.weight
lm_head.bias
Done! Now run ./launch.sh to start the FauxPilot server.
```
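Once `./launch.sh` has started the server, you can smoke-test it from the command line before pointing the Copilot plugin at it. The request below is a sketch, not a documented API: it assumes the server exposes an OpenAI-style completions endpoint for the `codegen` engine at the proxy URL configured in settings.json above, so adjust the scheme, port, path, and parameters to match your actual setup.

```bash
# Hypothetical smoke test, assuming an OpenAI-style completions endpoint
# is served for the "codegen" engine at the proxy URL configured above.
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"prompt": "def hello_world():\n", "max_tokens": 16, "temperature": 0.1}' \
  http://localhost:5000/v1/engines/codegen/completions
```

If the server is working end to end, the response should be a JSON object containing the generated completion.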