January 31, 2023

Can Artificial Intelligence Write Poetry?

In the past few years, researchers and companies have built automated poets whose imitations of human creativity range from bizarre and ridiculous to disturbingly accurate. For example, Microsoft’s “empathetic” AI system Xiaoice created millions of poems in response to pictures submitted by users. Deep-speare, developed by a team of academic and industry researchers, made headlines when it learned to write Shakespearean sonnets. And OpenAI’s GPT-3, one of the most capable language models released so far, has composed a poem or two.

But no matter how sophisticated, can AI write poetry that resonates with human readers? We set out to find out by fine-tuning GPT-2 on a corpus of Polish poetry. But before we share the details and results, let’s quickly recap why AI is writing poetry in the first place.

What is NLP, and how does it work?

Natural Language Processing (NLP) is a branch of computer science and artificial intelligence that aims to give computers the ability to process human language as text or voice data and to understand its full meaning, including the writer’s intent and sentiment.

NLP combines the field of computational linguistics (i.e., rule-based modeling of human language) with statistical, machine learning, and deep learning technologies. This mix of approaches enables computers to translate text from one language to another, respond to spoken or written commands, and summarize large amounts of text rapidly, even in real time.

There’s a good chance you’ve already interacted with NLP through voice-operated GPS systems, voice assistants, chatbots, and other consumer products. NLP has also proven its worth in enterprise solutions that streamline business operations, boost employee productivity, and accelerate mission-critical business processes.

Why has NLP become so popular?

The most significant advantage of Natural Language Processing is that it enables organizations to automate tasks where customers, users, or employees need to ask questions. For example, a customer can put a question to an NLP model through a chat or voice interface. Having been trained on relevant data, the NLP system should be able to answer such questions on behalf of the team.

By understanding human language, NLP systems can give immediate answers to routine, lower-level questions. This allows customer service representatives to provide more value to customers by dedicating their time to more complex inquiries or issues.

Key NLP use cases

NLP in voice assistants and chatbots

Voice assistants such as Apple’s Siri and Amazon’s Alexa use voice recognition and natural language generation to respond appropriately to voice commands and typed text entries. Chatbots work similarly but respond to text-based requests. The best of these also learn from contextual data about human requests and use it to provide even better answers or options over time.

NLP in language translation

You’ve probably used Google Translate. Did you know it’s an NLP technology? Efficient machine translation involves more than simply replacing words in one language with those of another. The tool needs to capture the meaning and tone of the input language and translate it to a similar text in the output language. Machine translation tools are making significant progress in terms of accuracy.

Spam detection with NLP

One of the most common uses of NLP is to detect spam and phishing. Many companies filter out unwanted emails with NLP tools that scan the text of an email for specific patterns, such as overuse of financial terms or characteristic grammar errors. These patterns can also include threatening language, inappropriate urgency, and misspelled company names.
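The pattern scanning described above can be sketched in a few lines of Python. This is only an illustration: the pattern lists, the `spam_signals` and `looks_like_spam` names, and the threshold are all assumptions made up for this example, and production filters typically learn such signals from labeled data rather than from hand-written rules.

```python
import re

# Hypothetical patterns chosen for illustration; real filters learn these from data.
SUSPICIOUS_PATTERNS = {
    "financial_terms": re.compile(r"\b(wire transfer|bank account|invoice|payment|prize)\b", re.I),
    "urgency": re.compile(r"\b(urgent|immediately|act now|within 24 hours)\b", re.I),
    "threat": re.compile(r"\b(suspended|terminated|legal action)\b", re.I),
}


def spam_signals(text: str) -> dict[str, int]:
    """Count how many times each suspicious pattern occurs in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in SUSPICIOUS_PATTERNS.items()}


def looks_like_spam(text: str, threshold: int = 3) -> bool:
    """Flag the text when the total number of pattern hits reaches the threshold."""
    return sum(spam_signals(text).values()) >= threshold


email = "URGENT: your bank account will be suspended. Confirm payment immediately."
print(spam_signals(email))
print(looks_like_spam(email))
```

A fixed threshold keeps the sketch readable; it would likely produce false positives on legitimate billing emails, which is one reason real systems rely on trained classifiers.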

Sentiment analysis with NLP

Also called opinion mining, this type of tool analyzes language in social media posts, responses, and reviews to identify attitudes and emotions in response to services, products, promotions, and events. Organizations can use this information when designing their products and marketing campaigns.


Text summarization with NLP

Text summarization uses Natural Language Processing techniques to digest huge volumes of digital text, creating summaries for indexes, research databases, or readers who don’t have time to read the full text. The best summarization applications take advantage of semantic reasoning and Natural Language Generation (NLG) to provide helpful context and conclusions with summaries.

What is Generative Pre-trained Transformer 3 (GPT-3)?

Before GPT-3, there was GPT-2 – so let’s start our journey here.

In February 2019, OpenAI introduced GPT-2, a large transformer-based language model with 1.5 billion parameters, trained on roughly 40GB of text from 8 million web pages. GPT-2 was trained with a simple objective: predict the next word given all the previous words in a text. The diversity of the dataset causes this simple objective to contain naturally occurring demonstrations of tasks across various domains. GPT-2 outperformed strong baselines and previous state-of-the-art results on several challenging tasks.

GPT-2 could generate conditional synthetic text samples of unprecedented quality: given a prompt, it continued the text in a matching style. Instead of being trained on data from a specific domain (like Wikipedia or news articles), it learned from a broad sample of web text.

Then came GPT-3. Released in 2020, GPT-3 is an autoregressive language model based on a neural network with 175 billion parameters. It’s capable of composing sentences by itself from minimal specifications. The generated texts are often so well crafted that readers struggle to distinguish them from ones written by humans.
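“Autoregressive” means the model produces text one token at a time, each choice conditioned on what came before. The toy sketch below shows that loop with simple bigram counts standing in for the neural network. It is not how GPT-2 or GPT-3 is implemented; the function names and the tiny corpus are assumptions made for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that followed it in the training text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def generate(model: dict[str, list[str]], prompt: str, max_new_words: int = 10, seed: int = 0) -> str:
    """Extend a non-empty prompt one word at a time, each conditioned on the previous word."""
    rng = random.Random(seed)
    output = prompt.split()
    for _ in range(max_new_words):
        candidates = model.get(output[-1])
        if not candidates:  # no known continuation: stop early
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


# Toy corpus invented for this example.
corpus = "the moon is bright and the night is long and the road is dark"
model = train_bigram_model(corpus)
print(generate(model, "the night", max_new_words=6))
```

A transformer replaces the bigram lookup with a network that conditions on the whole preceding sequence, but the generate-append-repeat loop is the same idea.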

Since its launch in 2020, GPT-3 has been used by more than 300 apps, which together generated an average of 4.5 billion words per day as of March 2021.

NLP case study: Writing poetry with AI

To check the performance of the GPT-2 model, we decided to train it on a corpus consisting of Polish poems as a side project.


Our goal was to create a GPT-2-based project that could generate poems in the Polish language.


We used Google Colab for training, and our base material was poems downloaded using web scraping methods.

To generate poems in different styles, it was necessary to train a separate model for each style, i.e., one model for Young Poland and another for the interwar period.

We selected Young Poland and the interwar period because of the nature of the texts written in these eras, such as their clear rhymes. Models for other eras were also trained, for example for Romantic poetry. The pre-trained model had not seen texts from the 16th century, so once it started training on texts from the Renaissance, it was exposed to new words. When asked to create poems in the Romantic style, it generated words that don’t exist at all, for example invented archaisms based on existing words.

The poems were divided into stanzas. An alternative was to divide them into lines, but a model trained that way could not generate semantically coherent text.
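A minimal sketch of the stanza split, assuming stanzas in the scraped text are separated by one or more blank lines (the source does not say how the split was actually done, and `split_into_stanzas` is a name made up for this example):

```python
import re


def split_into_stanzas(poem: str) -> list[str]:
    """Split a poem on blank lines and drop empty chunks."""
    chunks = re.split(r"\n\s*\n", poem.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Placeholder text, not taken from the training corpus.
poem = """First line of stanza one
second line of stanza one

First line of stanza two
second line of stanza two
"""
stanzas = split_into_stanzas(poem)
print(len(stanzas))
```

Keeping whole stanzas as training examples preserves the line breaks inside each one, so the model sees several related lines together instead of isolated verses.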

Each stanza was tokenized (converted to a numerical form) using a tokenizer previously trained to represent Polish characters. The sequences created from the stanzas were at most 512 tokens long, a limit that follows from the model’s architecture and the trade-offs it involves. Auxiliary beginning-of-sentence and end-of-sentence tokens were added to each sequence to signal its start and end to the model.
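The preprocessing step can be illustrated with a toy character-level encoder. This is a sketch under stated assumptions, not the tokenizer used in the project: the special-token ids, the `build_vocab` and `encode` helpers, and the choice to count the two special tokens toward the 512-token limit are all hypothetical.

```python
MAX_LENGTH = 512  # maximum sequence length stated in the text

# Hypothetical special-token ids; a real tokenizer defines its own.
BOS_ID, EOS_ID, UNK_ID = 0, 1, 2


def build_vocab(stanzas: list[str]) -> dict[str, int]:
    """Assign an integer id to every distinct character, after the special ids."""
    chars = sorted({ch for stanza in stanzas for ch in stanza})
    return {ch: i + 3 for i, ch in enumerate(chars)}


def encode(stanza: str, vocab: dict[str, int]) -> list[int]:
    """Convert a stanza to ids, truncate it, and wrap it in BOS/EOS markers."""
    ids = [vocab.get(ch, UNK_ID) for ch in stanza]
    ids = ids[: MAX_LENGTH - 2]  # assumption: leave room for the two special tokens
    return [BOS_ID] + ids + [EOS_ID]


# Placeholder stanzas with Polish diacritics, invented for this example.
stanzas = ["Cicha noc,\nświeci księżyc", "Łąka śpi"]
vocab = build_vocab(stanzas)
encoded = [encode(s, vocab) for s in stanzas]
print([len(seq) for seq in encoded])
```

Real GPT-2 tokenizers work on subword units rather than single characters, which keeps sequences much shorter, but the truncate-and-wrap logic is the same.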

The input data prepared this way was used to train the GPT-2 model. Note that the model had already been pre-trained, i.e., it had previously learned language representations from a large body of Polish text. You can find the pre-trained model under this link.


We created a service for poem generation using trained models; you can see how it works here:

Can ChatGPT write poetry?

We did some testing and saw that ChatGPT tends to generate similar poems every time we prompt it.

Using a prompt that asked for a poem in the style of Young Poland or the interwar period, we generated several poems. Here are samples:

Compare the above ChatGPT results with ours:

Finding 1:

As you can see, the ChatGPT poems differ very little from each other – whereas our model generates something different almost every time.

This likely results from the fact that ChatGPT was trained for conversation rather than specifically for generating poems. So the model converses well, but wasn’t necessarily trained on a large base of verses, hence the repetition.
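One rough way to put a number on “differ very little” is to measure word overlap between generated poems. The sketch below uses Jaccard similarity over word sets; it is not a measurement we ran, and the sample strings are placeholders rather than actual model outputs.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Share of distinct words two texts have in common (0 = disjoint, 1 = identical sets)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def mean_pairwise_similarity(texts: list[str]) -> float:
    """Average Jaccard similarity over every pair of texts (needs at least two texts)."""
    pairs = list(combinations(texts, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Placeholder strings, not actual model outputs.
samples = ["the night is quiet", "the night is long", "a river runs far"]
print(round(mean_pairwise_similarity(samples), 3))
```

A higher average across a batch of poems from the same prompt would indicate more repetition. Word overlap ignores word order and meaning, so it is only a coarse proxy.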

Finding 2:

It’s clear that the poems generated by ChatGPT are more coherent, while the ones generated by our tool tend to be pretty chaotic. Often, one sentence has no connection to the next.

ChatGPT is a conversational model, so it is designed to maintain the context of the conversation and stay logically coherent. Our model, on the other hand, was trained on a small set of Polish poems, so it’s more “limited” in this respect.

In addition, ChatGPT is built on a model from the GPT-3 family (the GPT-3.5 series), which is considerably larger than GPT-2, the basis of our model: GPT-3 has 175 billion parameters, while the largest GPT-2 has 1.5 billion. The amount of data on which GPT-3 was trained is also much larger than the volume used to train the Polish GPT-2 model that we used.

Wrap up

With many startups focusing on natural language processing (NLP) and machine translation, it’s safe to say that we’re on the brink of an explosion in AI-based language software. GPT-3 already powers hundreds of bots, apps, corporate blogs, social media feeds, and content farms. AI creates billions of words of functional and grammatically correct copy per day.

OpenAI is reportedly working on a new version of the model called GPT-4, which, according to unconfirmed reports, could be up to 500 times more powerful and able to sort through even vaster amounts of data. As systems get bigger, they tend to get better at avoiding mistakes that lead to incorrect sentences and at repeating strategies that lead to good ones. GPT-4 might shock us with an even greater ability to find just the right words and arrange them into verses that sound human-made.
