GPT-J 6B

Model Description

GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters. The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3.

Intended Use and Limitations

GPT-J learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for, however, which is generating text from a prompt.

Out-of-scope use

GPT-J-6B was trained on an English-language-only dataset, and has not been fine-tuned for downstream contexts in which language models are commonly deployed, such as writing genre prose or commercial chatbots.

Limitations and Biases

The core functionality of GPT-J is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. When prompting GPT-J, it is important to remember that the statistically most likely next token is often not the token that produces the most "accurate" text. Never depend upon GPT-J to produce factually accurate output.

GPT-J was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending upon use case, GPT-J may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile.

As with all language models, it is hard to predict in advance how GPT-J will respond to particular prompts, and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.
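As a sanity check on the Model Description figures, a back-of-the-envelope count shows the stated hyperparameters do account for roughly 6B trainable parameters (a sketch only: biases, LayerNorms, and the output head are omitted, so the true total is somewhat higher):

```python
# Rough parameter count from the hyperparameters in the Model Description.
n_layers, d_model, d_ff, n_vocab = 28, 4096, 16384, 50257

embed = n_vocab * d_model                # token embedding matrix
attn_per_layer = 4 * d_model * d_model   # Q, K, V and output projections
ff_per_layer = 2 * d_model * d_ff        # feedforward up- and down-projections

total = embed + n_layers * (attn_per_layer + ff_per_layer)
print(f"{total / 1e9:.2f}B")  # → 5.84B; biases, norms, and the LM head bring it to ~6B
```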
Example Code
from modelscope.utils.constant import Tasks
from modelscope.pipelines import pipeline

pipe = pipeline(task=Tasks.text_generation, model='AI-ModelScope/gpt-j-6b', model_revision='v1.0.1', device='cuda')
inputs = 'Once upon a time,'
result = pipe(inputs)
print(result)
# {'text': ["Once upon a time, there was a girl who loved to sing. One day, she walked into a studio, and the studio's owner had a request for her.\n“I'm looking for the next Adele,” they said.\nAt the time, there just weren't that many female singers out there, and the only one that the studio owner knew was pretty well accomplished as far as being a vocalist. At the time, she was kind of into acoustic"]}
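Following the Limitations and Biases guidance to curate or filter outputs before release, here is a minimal post-processing sketch over the `{'text': [...]}` result shape shown above. The `BLOCKLIST` terms and `filter_outputs` helper are purely illustrative assumptions, not part of ModelScope:

```python
# Minimal sketch of post-generation filtering, as recommended in
# Limitations and Biases. The blocklist and helper are illustrative
# placeholders -- real deployments need proper human/automated review.
BLOCKLIST = {"profanity", "slur"}  # hypothetical placeholder terms

def filter_outputs(result):
    """Drop any generation containing a blocklisted term."""
    kept = []
    for text in result["text"]:
        lowered = text.lower()
        if not any(term in lowered for term in BLOCKLIST):
            kept.append(text)
    return kept

sample = {"text": ["Once upon a time, there was a girl who loved to sing."]}
print(filter_outputs(sample))  # the clean generation passes through
```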
Training data

GPT-J 6B was trained on the Pile, a large-scale curated dataset created by EleutherAI.

Training procedure

This model was trained for 402 billion tokens over 383,500 steps on a TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.

Evaluation results

Models are roughly sorted by performance, or by FLOPs if not available. Evaluations were run with lm-evaluation-harness, either with released weights or with API access. Due to subtle implementation differences as well as different zero-shot task framing, these might not be directly comparable. See this blog post for more details.
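To make the training objective described above concrete: cross-entropy on the next token is the negative log-probability the model assigns to the observed continuation. A toy numeric illustration (the four-token vocabulary and its probabilities are invented for the example):

```python
import math

# Toy next-token distribution over an invented 4-token vocabulary.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
true_next = "cat"  # the token that actually followed in the training text

# Cross-entropy loss at this position: -log p(true next token).
loss = -math.log(probs[true_next])
print(round(loss, 4))  # -ln(0.15) ≈ 1.8971
# Minimizing this loss maximizes the likelihood of the correct next token.
```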
Citation and Related Information

BibTeX entry

To cite this model:
@misc{gpt-j,
author = {Wang, Ben and Komatsuzaki, Aran},
title = {{GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model}},
howpublished = {\url{https://github.com/kingoflolz/mesh-transformer-jax}},
year = 2021,
month = May
}
To cite the codebase that trained this model:

@misc{mesh-transformer-jax,
author = {Wang, Ben},
title = {{Mesh-Transformer-JAX: Model-Parallel Implementation of Transformer Language Model with JAX}},
howpublished = {\url{https://github.com/kingoflolz/mesh-transformer-jax}},
year = 2021,
month = May
}

If you use this model, we would love to hear about it! Reach out on GitHub, Discord, or shoot Ben an email.
Acknowledgements

This project would not have been possible without compute generously provided by Google through the TPU Research Cloud, as well as the Cloud TPU team for providing early access to the Cloud TPU VM Alpha.

Thanks to everyone who has helped out one way or another (listed alphabetically), including those who helped make the model compatible with the transformers package.