Generating text using Markov Chains and Recurrent Neural Networks
Training my computer to generate meaningful text for me? Sounds miraculous. In fact, it is just a probability game. With a Markov chain, each new character or word is chosen according to how often it followed the characters or words right before it in the source. With a recurrent neural network (RNN), new text is generated from a hidden state that carries information from the whole preceding sequence, not just the last few characters or words. The source data is crucial in both methods.
For this project, I again chose Shakespeare’s Hamlet as my source text, training an RNN to write in Shakespeare’s grammar and style. Since this is a rather large amount of data, I trained it for 10 epochs, which took my MacBook Air about 30 minutes.
Markov Chains model exercise:
I used the transcripts of the three 2016 presidential debates between Trump and Clinton as my source text.
Character level/order 9:
Word level/order 2:
From the results, the word level/order 2 model generates text that makes more sense.
Some sample results this model generated:
['The participants tonight are on there, we’ll help them get off. But I take responsibility for that.', 'But increasingly, we are looking for every sanction against Iran when I go around, despite the tax rate. They don’t give you the answers to that, because she’s got no business ability. We need comprehensive background checks, and we think of NATO? And you can check it out.', 'CLINTON: That is your business, then I think cyber security, cyber warfare will be off and the bureaucratic red tape, because they can’t bring their money back into their country.', 'But what did we learn about what has been the policy of the laws of the election?', 'I will invite you to defend that. And, Mr. Trump, you have stiffed over the world, especially China. They’re the best, the best plants. With the United States has much greater capacity. And we have to get a casino license, and they cost two and three and four times what they’re doing.']
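To make the difference between the two settings concrete, here is a minimal sketch of a Markov chain text generator. It is not the code I ran for the results above; the function names build_chain and generate and the tiny sample sentence are made up for illustration. The same two functions cover both cases: pass a list of words for the word-level model, or a list of characters for the character-level one, and set the order to the number of preceding tokens to look at.

```python
import random
from collections import defaultdict

def build_chain(tokens, order):
    """Map each run of `order` consecutive tokens to the tokens seen right after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain

def generate(chain, order, length, seed=None):
    """Random-walk the chain, producing at most `length` tokens."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))  # start from a random known state
    output = list(state)
    for _ in range(length - order):
        followers = chain.get(state)
        if not followers:  # dead end: this state only occurs at the very end of the source
            break
        # Followers are stored with repeats, so frequent ones are picked more often.
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return output

# Hypothetical stand-in for the real source text.
text = "the cat sat on the mat and the cat ran to the mat"

# Word level, order 2: the next word depends on the previous two words.
words = text.split()
word_chain = build_chain(words, order=2)
print(" ".join(generate(word_chain, order=2, length=12, seed=0)))

# Character level, order 3: the next character depends on the previous three.
char_chain = build_chain(list(text), order=3)
print("".join(generate(char_chain, order=3, length=40, seed=0)))
```

Storing every follower in a list, duplicates included, is a simple way to get the probabilities for free: a word that followed a given pair twice is twice as likely to be drawn. A higher order copies longer stretches of the source verbatim, while a lower order produces looser, less coherent text.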
RNN model exercise:
I used the text I transcribed from a video for my first assignment as the source data to train my model. Training on a txt file of *** characters took less than 3 minutes.
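This post does not show how the text was fed to the network, so the following is only a sketch of one common way to prepare character-level training data for an RNN, not necessarily what my training script did. It assumes the text is cut into fixed-length windows of characters, each paired with the character that comes next; the function name make_training_pairs, the window length of 5, and the sample string are all hypothetical.

```python
def make_training_pairs(text, seq_len):
    """Slide a window over the text: each window is an input, the next character its target."""
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}  # character -> integer id
    inputs, targets = [], []
    for i in range(len(text) - seq_len):
        window = text[i:i + seq_len]
        inputs.append([char_to_idx[c] for c in window])
        targets.append(char_to_idx[text[i + seq_len]])
    return inputs, targets, char_to_idx

# Hypothetical stand-in for the transcribed source text.
sample = "to be or not to be"
inputs, targets, vocab = make_training_pairs(sample, seq_len=5)
print(len(vocab), "distinct characters,", len(inputs), "training pairs")
```

The number of training pairs grows with the length of the file, which is one reason a short transcript trains in minutes while a full play takes much longer.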
GitHub link to code: