GPT-2, short for Generative Pretrained Transformer 2, is a sophisticated artificial intelligence[1] model designed for natural language processing tasks. Developed and introduced by OpenAI[2] in February 2019, it is notable for its ability to generate diverse types of text, with capabilities extending to answering questions and autocompleting code. GPT-2 was trained on a large corpus of online text, known as WebText, and has 1.5 billion parameters. While its deployment can be resource-intensive, it has been used in various unique applications, including text-based adventure games and subreddit simulations. Despite initial fears of misuse, the full GPT-2 model was released in November 2019 after those concerns did not materialize. A smaller distilled model, DistilGPT2, was later created to reduce resource requirements. The breakthroughs with GPT-2 paved the way for future advancements in AI text generation.
Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.
Original author(s) | OpenAI |
---|---|
Initial release | 14 February 2019 |
Repository | https://github.com/openai/gpt-2 |
Predecessor | GPT-1 |
Successor | GPT-3 |
Type | Large language model |
License | MIT |
Website | openai |
GPT-2 was created as a "direct scale-up" of GPT-1, with a ten-fold increase in both its parameter count and the size of its training dataset. It is a general-purpose learner: its ability to perform various tasks was a consequence of its general ability to accurately predict the next item in a sequence, which enabled it to translate texts, answer questions about a topic from a text, summarize passages from a longer text, and generate output on a level sometimes indistinguishable from that of humans, although it could become repetitive or nonsensical when generating long passages. It was superseded by the GPT-3 and GPT-4 models, which are no longer open source.
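The idea that all of these tasks reduce to "predict the next item in a sequence" can be illustrated with a deliberately tiny stand-in: a bigram count model that greedily emits the most likely next token. This is a toy sketch, not GPT-2's actual architecture; GPT-2 replaces the count table with a 1.5-billion-parameter transformer, but the generation loop (predict next token, append, repeat) is the same.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the corpus."""
    counts = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n):
    """Greedily append the most frequent successor n times."""
    out = [start]
    for _ in range(n):
        successors = counts.get(out[-1])
        if not successors:
            break  # no continuation seen in training data
        out.append(successors.most_common(1)[0][0])
    return out

# Toy corpus: "the" is followed by "cat" twice and "mat" once,
# so the model predicts "cat" after "the".
tokens = "the cat sat on the mat the cat".split()
model = train_bigram(tokens)
print(generate(model, "the", 3))  # → ['the', 'cat', 'sat', 'on']
```

A language model of GPT-2's scale learns a far richer conditional distribution than these counts, which is what lets the same next-token loop perform translation or question answering when suitably prompted.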
GPT-2 has, like its predecessor GPT-1 and its successors GPT-3 and GPT-4, a generative pre-trained transformer architecture: a deep neural network, specifically a transformer model, which uses attention in place of older recurrence- and convolution-based architectures. Attention mechanisms allow the model to focus selectively on the segments of input text it predicts to be most relevant. This architecture allows for greatly increased parallelization, and it outperformed previous benchmarks set by RNN-, CNN-, and LSTM-based models.
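The attention mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product attention with the causal mask that decoder-only models like GPT-2 use (each position may attend only to itself and earlier positions); the real model adds multiple heads, learned query/key/value projections, and many stacked layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(Q, K, V):
    """Scaled dot-product attention with a causal (lower-triangular) mask.

    Q, K, V: arrays of shape (sequence_length, head_dim).
    Returns an array of shape (sequence_length, head_dim).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # similarity of each query to each key
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)  # block attention to future positions
    weights = softmax(scores)              # rows sum to 1: a distribution over keys
    return weights @ V                     # weighted average of value vectors

# Self-attention over a toy sequence of 4 tokens with 8-dimensional vectors.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
out = causal_attention(X, X, X)
# The first token can only attend to itself, so its output equals its value vector.
assert np.allclose(out[0], X[0])
```

Because every row of the score matrix is computed independently, the whole sequence can be processed in parallel, which is the parallelization advantage over step-by-step recurrent models noted above.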