GPT-1

GPT-1 (Generative Pre-trained Transformer 1) is a machine learning[1] model for understanding and generating human language. Developed by OpenAI[4], it uses a 12-layer decoder-only transformer. Each layer has twelve masked self-attention heads with 64-dimensional states, giving a total model width of 768 dimensions. Training uses the Adam optimization algorithm[2] with a learning rate that is increased linearly from zero over the first 2,000 updates and then annealed to zero on a cosine schedule. The model has roughly 117 million parameters. Its architecture needs only minimal changes when it is fine-tuned for different tasks. It is particularly noted for its results on natural language inference[3], question answering, commonsense reasoning, and semantic similarity tasks. It was pre-trained on the BookCorpus dataset, chosen because its long passages of continuous text help the model learn to handle long-range information.
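
The figures above can be checked with a little arithmetic. The following is a minimal sketch, not OpenAI's code: the layer count, head count, and per-head width come from the description above, while the feed-forward width (3072), vocabulary size (40478), context length (512), warm-up length (2,000 updates), peak learning rate (2.5e-4), and cosine decay are assumptions taken from the GPT-1 paper and its public checkpoint. The function names and the `total_steps` argument are hypothetical.

```python
import math

# From the description above.
N_LAYERS = 12
N_HEADS = 12
HEAD_DIM = 64
D_MODEL = N_HEADS * HEAD_DIM  # 768

# Assumptions (GPT-1 paper / public checkpoint), not stated in the text above.
D_FF = 3072          # feed-forward inner width
VOCAB_SIZE = 40478   # BPE vocabulary including special tokens
N_POSITIONS = 512    # context length (learned position embeddings)
MAX_LR = 2.5e-4      # peak learning rate
WARMUP_STEPS = 2000  # linear warm-up length


def gpt1_param_count() -> int:
    """Count weights and biases, assuming the output layer reuses the token embeddings."""
    embeddings = VOCAB_SIZE * D_MODEL + N_POSITIONS * D_MODEL
    # Fused query/key/value projection plus the attention output projection.
    attention = (D_MODEL * 3 * D_MODEL + 3 * D_MODEL) + (D_MODEL * D_MODEL + D_MODEL)
    # Two linear layers: D_MODEL -> D_FF -> D_MODEL.
    feed_forward = (D_MODEL * D_FF + D_FF) + (D_FF * D_MODEL + D_MODEL)
    # Two layer norms per block, each with a gain and a bias vector.
    layer_norms = 2 * (2 * D_MODEL)
    return embeddings + N_LAYERS * (attention + feed_forward + layer_norms)


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warm-up from zero, then cosine annealing to zero."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return 0.5 * MAX_LR * (1.0 + math.cos(math.pi * min(1.0, progress)))


if __name__ == "__main__":
    print(f"parameters: {gpt1_param_count():,}")
    for step in (0, 1000, 2000, 50000, 100000):
        print(step, learning_rate(step, total_steps=100000))
```

Under these assumptions the count comes to about 116.5 million, which is consistent with the commonly quoted figure of 117 million.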

Term definitions
1. machine learning. Machine learning, a term coined by Arthur Samuel in 1959, is a field of study that grew out of the pursuit of artificial intelligence. It uses techniques that let computers improve their performance over time through experience, a process sometimes likened to human learning. It is applied in areas such as natural language processing, computer vision, and speech recognition, and in practical sectors such as agriculture, medicine, and business for predictive analytics. Its foundations include theoretical frameworks such as Probably Approximately Correct (PAC) learning, along with data mining and mathematical optimization. Common approaches include supervised learning, unsupervised learning, reinforcement learning, and dimensionality reduction.
2. algorithm. An algorithm is a well-defined sequence of instructions or rules that provides a solution to a specific problem or task. Originating from ancient civilizations, algorithms have evolved through centuries and are now integral to modern computing. They are designed using techniques such as divide-and-conquer and are evaluated for efficiency using measures like big O notation. Algorithms can be represented in various forms like pseudocode, flowcharts, or programming languages. They are executed by translating them into a language that computers can understand, with the speed of execution dependent on the instruction set used. Algorithms can be classified based on their implementation or design paradigm, and their efficiency can significantly impact processing time. Understanding and using algorithms effectively is crucial in fields like computer science and artificial intelligence.
GPT-1 (Wikipedia)

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced that initial model along with the general concept of a generative pre-trained transformer.

Generative Pre-trained Transformer 1 (GPT-1)
Original author(s): OpenAI
Initial release: June 2018
Successor: GPT-2
License: MIT
Website: openai.com/blog/language-unsupervised/
Figure: Original GPT architecture

Up to that point, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use of datasets that were not well-annotated, in addition to making it prohibitively expensive and time-consuming to train extremely large models; many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret using such models due to a lack of available text for corpus-building. In contrast, a GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage in which a language modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage in which these parameters were adapted to a target task.
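
The two stages can be written as two log-likelihood objectives. The sketch below is illustrative only: it takes probabilities as plain numbers instead of running a model, the function names are hypothetical, and the auxiliary language-modeling term during fine-tuning (weighted by 0.5) is an assumption taken from the GPT-1 paper, not from the text above.

```python
import math

AUX_LM_WEIGHT = 0.5  # assumption: weight of the auxiliary LM term, per the GPT-1 paper


def lm_log_likelihood(token_probs):
    """Pre-training objective: sum of log P(token | preceding tokens)."""
    return sum(math.log(p) for p in token_probs)


def task_log_likelihood(label_prob):
    """Fine-tuning objective: log P(correct label | input tokens)."""
    return math.log(label_prob)


def fine_tuning_objective(label_prob, token_probs, weight=AUX_LM_WEIGHT):
    """Supervised objective plus a weighted language-modeling term (both are maximized)."""
    return task_log_likelihood(label_prob) + weight * lm_log_likelihood(token_probs)


if __name__ == "__main__":
    # Hypothetical probabilities a model might assign to each observed token.
    token_probs = [0.5, 0.25, 0.8]
    print(lm_log_likelihood(token_probs))
    print(fine_tuning_objective(label_prob=0.9, token_probs=token_probs))
```

Pre-training maximizes the first quantity over unlabeled text; fine-tuning starts from those parameters and maximizes the combined quantity over labeled examples.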

The use of a transformer architecture, as opposed to previous techniques involving attention-augmented RNNs, provided GPT models with a more structured memory than could be achieved through recurrent mechanisms; this resulted in "robust transfer performance across diverse tasks".
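
One way to picture the contrast with recurrent models: in masked self-attention, each position reads every earlier position directly, so no information has to be carried through a single recurrent state. The following is a minimal single-head sketch in plain Python under simplifying assumptions (no learned projections, no multiple heads, no batching); the function name is hypothetical and this is not the GPT-1 implementation.

```python
import math


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of equal-length float vectors, one per position.
    """
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [
            sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in causal_self_attention(x, x, x):
        print(row)
```

Because of the mask, changing a later token never alters the output at an earlier position, which is what lets the model be trained to predict each next token from its left context.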
