Monday, March 30, 2015

Two weeks ago, to commemorate the anniversary of the relaunch of FiveThirtyEight, I looked at what the site had published and made Markov bots to replicate several of our writers.
Here’s how I defined a Markov bot:
A basic Markov bot is a program that can take a random walk through how people write. If I make a dictionary of how I write, for instance — taking everything I wrote in a year, breaking it down into coherent fragments, and then breaking those fragments down to individual three-word phrases used to construct a sentence — I can use that dictionary to build a bot to replicate my speech.
The result of this random walk is typically a mildly coherent, often funny sentence or phrase that imitates the bones of how I speak. I downloaded every word that several colleagues and I wrote so we can replace all this expensive talent with simple Python scripts.
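To make that definition concrete, here is a minimal sketch of the idea in Python. It is not the code behind the bot: the function names `build_chain` and `generate` are made up for illustration, and it skips the step of breaking the text into coherent fragments. It records which word follows each two-word pair (the three-word phrases described above), then takes a random walk through that dictionary.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each pair of consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    words = text.split()
    # Every run of three words becomes one entry: (word1, word2) -> word3.
    for first, second, third in zip(words, words[1:], words[2:]):
        chain[(first, second)].append(third)
    return chain


def generate(chain, max_words=25, rng=random):
    """Random walk: start at a random pair, then keep picking a recorded follower."""
    if not chain:
        return ""
    pair = rng.choice(sorted(chain))
    output = list(pair)
    while len(output) < max_words:
        followers = chain.get(pair)
        if not followers:
            break  # dead end: this pair only appeared at the end of the text
        next_word = rng.choice(followers)
        output.append(next_word)
        pair = (pair[1], next_word)
    return " ".join(output)
```

Calling `generate(build_chain(everything_i_wrote))`, where `everything_i_wrote` is a hypothetical string holding the scraped text, returns one sentence-like string. Words that follow a pair more often appear more times in its list, so they get picked more often.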
After generating a few sample paragraphs for that article, I decided to make a Twitter bot that tweets faux-FiveThirtyEight sentences for the foreseeable future. That Twitter bot can be found at @538bot. Here’s a sampling:
It’s still a work in progress, but it’s functioning. Not bad for a weekend of work.
I’m going to get into a more detailed step-by-step process about how I made the bot later, probably on some combination of Tumblr and GitHub, but here’s the short of it: I scraped the content of the site using the Chrome browser add-on Kimono.
Then I followed the outline from this Stack Overflow post to make the Markov bot output in Python. (Special thanks to Olivia Walch for being the “no really, it’s totally supposed to do that” voice during that process.) Then I got the Twitter bot running on my Raspberry Pi using Twython.
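For the last step, a rough sketch of posting a generated sentence with Twython might look like the following. This is an assumption about how such a bot could be wired up, not the actual script: the helper names `make_client`, `fit_to_tweet` and `post_sentence` are hypothetical, and the four credentials would come from your own Twitter app. `Twython(...)` and `update_status(status=...)` are the library's calls for authenticating and tweeting.

```python
TWEET_LIMIT = 140  # Twitter's per-tweet character limit in 2015


def make_client(app_key, app_secret, oauth_token, oauth_token_secret):
    """Build an authenticated Twython client (needs `pip install twython`)."""
    from twython import Twython  # imported here so the helpers below work without it
    return Twython(app_key, app_secret, oauth_token, oauth_token_secret)


def fit_to_tweet(sentence, limit=TWEET_LIMIT):
    """Collapse whitespace and, if the text is too long, cut it back to whole words."""
    sentence = " ".join(sentence.split())
    if len(sentence) <= limit:
        return sentence
    # Drop the last (possibly partial) word so the tweet never ends mid-word.
    return sentence[:limit].rsplit(" ", 1)[0]


def post_sentence(client, sentence):
    """Tweet one sentence through any client that offers update_status()."""
    status = fit_to_tweet(sentence)
    if status:
        client.update_status(status=status)
    return status
```

On the Raspberry Pi, something like cron could run a short script on a schedule that calls `make_client(...)` once and then `post_sentence(client, sentence)` with a freshly generated sentence.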
This was a fun little experiment. So go ahead and follow @538bot to see what exactly the fox said. Mostly, I’m just happy that this bot will have a role model when “Avengers: Age of Ultron” comes out.
