I was writing a Markov text generator yesterday, and happened to have a bunch of corpora made up of Less Wrong comments lying around from a previous toy project. This quickly resulted in the Automated Gwern Comment Generator, and then the Automated sixes_and_sevens Comment Generator.
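The generator itself isn't shown, so here is a minimal sketch of the kind of word-level Markov chain this describes (the function names and the order-2 default are my own assumptions, not the original code):

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=50, seed=None):
    """Walk the chain from a random state, emitting one word per step."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: restart from a random state
            state = rng.choice(list(chain))
            followers = chain[state]
        word = rng.choice(followers)
        out.append(word)
        state = state[1:] + (word,)
    return " ".join(out)
```

Fed a big enough comment corpus, `generate(build_chain(corpus))` produces exactly the sort of locally-plausible, globally-incoherent text quoted below.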
Anyone who’s ever messed around with text generated from simple Markov processes (or taken the time to read the content of some of their spam messages) will be familiar with the hilarious, and sometimes strangely lucid, garbage they come out with. Here is AutoGwern:
Why is psychopathy not strongly confident. In a slippery slope down to affirm or something. Several common misunderstandings are superior, but I copied from StackExchange running in a new trilemma on the police, there is a very large effect sizes. The DNA studies at the clandestine level.
Plausible yet pseudonymously provided a measure of the role of SATs is pretty darn awful advice, and how they plan to use data from markets that shut down in self-experimentation, then that. The repeating logic. It wouldn’t surprise when two women sent me unsolicited PM’s asking if we drop the <1m bitcoins Satoshi Nakamoto or La Griffe du Lion. That just means that it.
Thanks.
(I should point out that I’m picking on Gwern because his contribution to LW means I have an especially massive text file with his name on it.)
Here’s some AutoMe:
I’ll be reading this. Then the restless spirit of Paul Graham sat on my body in some obscure location, Nigel Hawthorne was the hypothetical was that sword-privilege is a response, though in the idea in my life so cognitively exhausting relative to your calibrated sitters. Which should they pick? This seems credible, but I’d be a TV and film from a welfare system.
It’s your lucky day. I can start hearing a bicycle. There’s even in a position on all users’ self-esteem, adjusting it for various other OKCupid users to encourage them, There are a small box of which routinely deals with his Dangerously Ambitious Startup Ideas essay, about how well-trained in deconstructivism.
I’ve actually a lecture by default?
This is perhaps an unfair question, which you’ll only a sixth of the QS-types take in a group to help setting off to the music, OK? A little less fat. It isn’t obvious, but I’m actually kind of these selections of the above, Yvain could also happen when I feel like my brain compels me to be the money pump.
I have become weirdly fascinated by these. Although they ditch any comprehensible meaning, they preserve distinctive tics in writing style and vocabulary, and in doing so, they preserve elements of tone and sentiment. Without any meaningful content to try and parse, it’s a lot easier to observe what the written style feels like.
On a less introspective note, it’s also interesting how dumping out ~300 words preserves the surface characteristics of Less Wrong posts, like edits at the end, or spontaneous patches of Markov-generated rot13.
Also Yvain could happen when I feel like my brain compels me to be the money pump.
I did this a while back using cobe: dumped in gwern.net, all my IRC logs, LW comments, etc. It seems that the key is to set the depth fairly high to get more meaningful language out, but then it works eerily well: with some curation, it really did sound like me! (As long as you were reading quickly and weren’t trying to extract real meaning out of it. I wonder if a recurrent neural network could do even better?) You can see some at http://pastebin.com/tPGL300J. I particularly like:
gwern> haha oh wow, for a moment i thought it wasn’t working. and then I actually read the entire line slowly going ’wait a minute...‘ gwern> = 15:35 <@gwern> (for all that I read fiction the right way to think of asking for a different purpose… the company did last year; but in fact was designed to protect MO5(G) from the “Alternate Reality” scene to Shinji saying “I don’t think reader is central. I did avoid the Internet, in the Japanese media industries. One doesn’t expect the sequences of marcello, but surely there were real phenomenon going on? Well, either you have a thesis? take that and multiply by the likelihood of various events in a set of stubby barriers (baffles) sticking up from the old-school method of people self reporting how much they owe each other. The first line is solid: the rhythm of travel still in his hand as the very poor reverting to low-tech, low productivity craft production of goods the wealthy can manufacture efficiently. One way in which a missing item has to be punctually at a certain time period (typically thirty seconds) will be agricultural. Drones can be used only once for any such study, using rhesus monkeys, was [“Effects of caffeine on hippocampal neurogenesis and function’, Han et al, 2004), very little progress has been made to alter the design of Mark.06 is a bit secluded, but is at least 1 was a complete waste of time’ A> that is wonderfully coherent B> That’s more like it
gwern> ‘<@gwern> kurzweil is a flake...’ <-- it speaks the truth!
gwern> ‘= 15:32 < gwern> turns out I’m not a loaf of white bread’
gwern> ‘= 06:05 < gwern> crime pays’
gwern> ‘The atheist church fits in a single session on the Millennium Falcon.’ C> what. gwern> well, it’s just Han. everyone else believes in the Force or whatever Wookiees believe in
gwern> ’17:08 < gwern> ’A lot of people are working through SICP all the time (FLOSS programmers may lose interest, another 17 years will see a sequence of output to get the impression you know it’s up to something,” “It just seems horribly inappropriate and wrong, but… ‘Medication’ here is ‘kusuri.’”] In order to ensure the birth of Reitaisai SP, an additional staff of 94, “over” 1,500 active archers in South Korea alone do not indicate the underlying conditions that will extend lifespan because it is contrary to the general corruption they are also unceremoniously written out of the picture.” The remaster boasts sharper, jitter-free visuals with intensified colors; the enhanced audio is palpable as soon as unusually smart people. The best covered seems to be dead.″ gwern> damn, Reitaisai SP is going to be awesome—over 1500 archers are coming from south korea? B> From there alone! gwern> ‘their arrows will blot out the sun!’ gwern> ‘good. then we will cosplay in the shade.’ A> SMARTAAAA gwern> LOLIS! HOLLLDDDDD!!!!!!
gwern> ‘the video is killing the ref, I’m getting 1993/1994 addresses for that one episode and then went through her budget with them, if I set ‘div#content { line-height: 100%; }‘, I still see the white snakes’ <-- CSS would be so much easier to program if we could just get rid of the damn white snakes in the browsers!
gwern> ‘gwern> puritan: supposedly my .htaccess was put in charge of directing an episode for the first big step: a switch between mother and friend is an easy to read it a lot as they run a huge trade surplus, and arises from unmeasured benefits, such as transfer effects that scaled with training task gains. Again, such an experiment is just a rose falling apart. The spacing between the items, there is Pentuple N-back (PNB) which was successfully read and converted to CSV by mdb-tools. The entries look like this (as before), the alternative system, but he must have misplaced it for a lifetime by homosexual acts.’ <-- I realize I sound like a homophobe here, but I swear, the markov chains are quoting me out of context! C> pentuple n-back converted to CSV. how does that work exactly? gwern> C: oh, you just treat the black spaces as the delimiter and parse as usual D> “such an experiment is just a rose falling apart” :)
gwern> ‘= 00:35 <@gwern> second episode of EVANGELION: he keeps undermining himself! there must not be very practical without them. but even if, say, NERV base. (Faculty of Medicine, Topol remembers being in the company to fund a significant bandwidth burden. It had become apparent that the producers who count-the marginal ones-are not especially likely to dislike strongly, as explained above. Small variations in observed market prices are thus less likely to find a hafu character, and, my ultimate goal, use technology to alleviate global hunger, malnutrition, and improve the soil.” “Can we order them on the index, and we would say: Well, one would reach the right answer :) it’s not even goign to be no less high. Over 2012, the result is that the long run they would get the right to ride on me and choke me to meet his gaze. “Can’t do the study to try and reach out with their thumbs. That’s very rare. Rather, it publishes a program to ready the other chapters required much more X-ray energy absorbed in the ’90s in part because legionary soldiers who could usually read, write and swim underwater!’ <-- that’s why Rome conquered the known world: fucking underwater legions
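The “depth” mentioned above corresponds to the n-gram order of the chain: at higher orders most states have exactly one possible follower, so the walk reproduces longer verbatim runs of the corpus and reads more coherently. A toy way to see this (my own illustration, not cobe’s actual API) is to measure how often a state has any choice at all:

```python
from collections import defaultdict

def branching(text, order):
    """Fraction of chain states with more than one distinct follower word."""
    words = text.split()
    chain = defaultdict(set)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].add(words[i + order])
    return sum(1 for f in chain.values() if len(f) > 1) / len(chain)
```

As `order` grows, `branching` drops toward zero and the chain degenerates into quoting the corpus verbatim, which is why the curated high-depth output sounds eerily authentic.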
Well that’s a fanfiction I haven’t read before.
This should be generated for every user on their “overview” page. :D