Model Description

GPT was released together with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. at OpenAI. It combines two ideas: the Transformer architecture and large-scale unsupervised pre-training.

Three models based on OpenAI’s pre-trained weights are available, along with the associated tokenizer:

  • openAIGPTModel: raw OpenAI GPT Transformer model (fully pre-trained)
  • openAIGPTLMHeadModel: OpenAI GPT Transformer with the tied language modeling head on top (fully pre-trained)
  • openAIGPTDoubleHeadsModel: OpenAI GPT Transformer with the tied language modeling head and a multiple choice classification head on top (OpenAI GPT Transformer is pre-trained, the multiple choice classification head is only initialized and has to be trained)
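
The "tied" language modeling head mentioned above reuses the token embedding matrix as the output projection. The following is only a toy sketch of that idea in plain Python, not the library's implementation: the matrix values, the `embed` and `lm_logits` helpers, and the hidden state are all made up for illustration.

```python
# Toy embedding matrix: 3 vocabulary entries, hidden size 2 (made-up values).
embedding = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

def embed(token_id):
    """Input side: look up the embedding row for a token id."""
    return embedding[token_id]

def lm_logits(hidden):
    """Output side: score every vocabulary entry with the SAME matrix."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding]

hidden = [0.5, 2.0]  # stand-in for a final hidden state (hypothetical)
logits = lm_logits(hidden)
# Index of the highest-scoring vocabulary entry
predicted_id = max(range(len(logits)), key=logits.__getitem__)
print(logits, predicted_id)
```

Because both functions read the same `embedding` list, there is a single set of parameters for input lookup and output scoring, which is what tying means here.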

Requirements

Unlike most other PyTorch Hub models, GPT requires a few additional Python packages to be installed.

pip install tqdm boto3 requests regex ftfy spacy
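
To confirm that the packages are importable before loading the model, a small check along these lines may help. It is a sketch using only the standard library; the `missing_packages` helper is hypothetical, and it assumes each package's import name matches its pip name, which holds for the six listed above.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["tqdm", "boto3", "requests", "regex", "ftfy", "spacy"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```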

Example

Here is an example of how to tokenize text with openAIGPTTokenizer, and then either get the hidden states computed by openAIGPTModel or predict the next token using openAIGPTLMHeadModel. Finally, we show how to use openAIGPTDoubleHeadsModel to combine the language modeling head with a multiple choice classification head.

### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTTokenizer', 'openai-gpt')

#  Prepare tokenized input
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])


### Get the hidden states computed by `openAIGPTModel`
######################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTModel', 'openai-gpt')
model.eval()

# Compute hidden states features for each layer
with torch.no_grad():
    hidden_states = model(tokens_tensor)


### Predict the next token using `openAIGPTLMHeadModel`
#######################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTLMHeadModel', 'openai-gpt')
lm_model.eval()

# Predict all tokens
with torch.no_grad():
    predictions = lm_model(tokens_tensor)

# Get the last predicted token
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == '.</w>'


### Language modeling and multiple choice classification `openAIGPTDoubleHeadsModel`
####################################################################################
double_head_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTDoubleHeadsModel', 'openai-gpt')
double_head_model.eval() # Set the model to train mode if used for training

text_bis = "Who was Jim Henson ? Jim Henson was a mysterious young man"
tokenized_text_bis = tokenizer.tokenize(text_bis)
indexed_tokens_bis = tokenizer.convert_tokens_to_ids(tokenized_text_bis)
tokens_tensor = torch.tensor([[indexed_tokens, indexed_tokens_bis]])
mc_token_ids = torch.LongTensor([[len(tokenized_text)-1, len(tokenized_text_bis)-1]])

with torch.no_grad():
    lm_logits, multiple_choice_logits = double_head_model(tokens_tensor, mc_token_ids)
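
The double-heads example above stacks two choices into one tensor, which only works when both token lists have the same length. When they differ, one option is to pad them first. The sketch below shows the layout in plain Python; the `build_choice_inputs` helper and the `pad_id=0` default are assumptions made for illustration, not part of the library, and the token ids are made up.

```python
def build_choice_inputs(choices, pad_id=0):
    """Pad token-id lists to a common length and record each choice's last real token."""
    max_len = max(len(ids) for ids in choices)
    padded = [ids + [pad_id] * (max_len - len(ids)) for ids in choices]
    last_positions = [len(ids) - 1 for ids in choices]
    # Add the leading batch dimension:
    # input ids -> (1, num_choices, seq_len), positions -> (1, num_choices)
    return [padded], [last_positions]

# Two choices of different lengths (made-up token ids)
input_ids, mc_token_ids = build_choice_inputs([[5, 6, 7, 8], [5, 6, 9]])
print(input_ids)     # [[[5, 6, 7, 8], [5, 6, 9, 0]]]
print(mc_token_ids)  # [[3, 2]]
```

The nested lists can then be passed to `torch.tensor(...)` in the same way as `tokens_tensor` and `mc_token_ids` in the example. Note that the positions point at the last real token of each choice, not at the padding.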

Requirement

The model only supports Python 3.

Resources