{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# For tips on running notebooks in Google Colab, see\n",
"# https://pytorch.org/tutorials/beginner/colab\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NLP From Scratch: Generating Names with a Character-Level RNN\n",
"=============================================================\n",
"\n",
"**Author**: [Sean Robertson](https://github.com/spro)\n",
"\n",
"This is our second of three tutorials on \"NLP From Scratch\". In the\n",
"[first\n",
"tutorial](/tutorials/intermediate/char_rnn_classification_tutorial) we\n",
"used an RNN to classify names into their language of origin. This time\n",
"we'll turn around and generate names from languages.\n",
"\n",
"``` {.sourceCode .sh}\n",
"> python sample.py Russian RUS\n",
"Rovakov\n",
"Uantov\n",
"Shavakov\n",
"\n",
"> python sample.py German GER\n",
"Gerren\n",
"Ereng\n",
"Rosher\n",
"\n",
"> python sample.py Spanish SPA\n",
"Salla\n",
"Parer\n",
"Allan\n",
"\n",
"> python sample.py Chinese CHI\n",
"Chan\n",
"Hang\n",
"Iun\n",
"```\n",
"\n",
"We are still hand-crafting a small RNN with a few linear layers. The big\n",
"difference is that instead of predicting a category after reading in all\n",
"the letters of a name, we input a category and output one letter at a time.\n",
"Recurrently predicting characters to form language (this could also be\n",
"done with words or other higher-order constructs) is often referred to\n",
"as a \"language model\".\n",
"\n",
"**Recommended Reading:**\n",
"\n",
"I assume you have at least installed PyTorch, know Python, and\n",
"understand Tensors:\n",
"\n",
"-
Download the data from here and extract it to the current directory.
\n", "Rather than having to give it a starting letter, another strategy would have been to include a \"start of string\" token in training and have the network choose its own starting letter.
\n", "