I think more helpful people than unhelpful people come here. I remember a friend in grad school who had someone halfway across the world publish an algorithm he had discovered, one journal issue before his own publication. I think it’s kind of like an avalanche: there is some sense in being quiet until you know enough to have a reasonable estimate of the impact of your action. As a rule though, I’d rather see ideas traded here than behind DARPA firewalls.
keefe
It was a simple bug. The fix is committed and a pull request is in; I’ll send an email out now to get this into production.
I would start with something like the Reuters API, WordNet (http://wordnet.princeton.edu/), and some research on SCIgen (http://pdos.csail.mit.edu/scigen/). This is a fairly well studied problem among spammers, so I’d also look there.
I’m pretty familiar with the codebase, though I transitioned to eBay before getting too much done on it. Send me an email if you want some feedback; I have more free time these days and am looking to contribute to open source for long-term strategic reasons.
play poker?
For you, posting about the fact that you decided not to post it reinforces the idea that you are important enough to have an impact with a blog post, which is more likely to reinforce the most common bias of intelligent people.
I think that you are on a solid research path here. I think you have reached the bounds of business-oriented software and it’s time to look into something like Apache Mahout or RDF. Decision tree implementations are available all over; just find a data structure, share them, and run inference engines like OWLIM or Pellet and see what you can see.
RDF is a good interim solution because you can start encoding things as structured data. I have some JSON->RDF stuff for inference if you get to that point.
Here is one way to represent these graphs as RDF.
Each edge becomes an edge to a blank node; that blank node has the label and arrival probability, and could link to supporting evidence. Representing weighted graphs in RDF is fairly well studied.
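A minimal sketch of that blank-node encoding, written out as Turtle by hand so it needs no RDF library. The `ex:` namespace, the property names (`ex:hasEdge`, `ex:target`, `ex:label`, `ex:arrivalProbability`, `ex:evidence`) and the input dictionary layout are all assumptions made up for illustration, not an established vocabulary.

```python
import json

# Hypothetical namespace; swap in whatever vocabulary you settle on.
PREFIXES = (
    "@prefix ex: <http://example.org/graph#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def edges_to_turtle(edges):
    """Emit one blank node per edge, carrying target, label, probability and optional evidence."""
    lines = [PREFIXES]
    for i, edge in enumerate(edges):
        node = f"_:edge{i}"
        # json.dumps gives a quoted, escaped string that is close enough
        # to a Turtle literal for simple labels.
        label = json.dumps(edge["label"])
        prob = edge["probability"]
        parts = [
            f"{node} ex:target ex:{edge['target']}",
            f"    ex:label {label}",
            f'    ex:arrivalProbability "{prob}"^^xsd:decimal',
        ]
        if edge.get("evidence"):
            parts.append(f"    ex:evidence <{edge['evidence']}>")
        # The source node points at the blank node, which holds the edge data.
        lines.append(f"ex:{edge['source']} ex:hasEdge {node} .")
        lines.append(" ;\n".join(parts) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"source": "A", "target": "B", "label": "leads to", "probability": 0.7},
    ]
    print(edges_to_turtle(sample))
```

The blank node is what lets the weight and evidence hang off the edge itself rather than off either endpoint, which a plain subject-predicate-object triple cannot express.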
The question is, what is your net goal of this from a computational artifact point of view?
the bullet points...
neuroscience research, particularly related to neuroeconomics
rewriting codebase from 10 projects by breaking down into about 30 smaller more easily tested components
reviewing and automating infrastructure selections (Java, Eclipse, jQuery, CouchDB, Postgres, bash, Ubuntu LTS, Maven, git, custom code for lots of stuff, Apache for various little things, notably Hadoop and Mahout); deciding on feature subsets for internal use, trusted group use, and launch trajectory in 2012
some work on money and such annoying tasks
adjusting to being engaged to my cofounder and out of cali permanently
increasing online presence
...these are all changes that benefit many different worldlines; need to select how to go public on various things
What is it about your infrastructure that required downtime due to laptop failure? A VPS, Dropbox, a Gmail file system, etc. can meet security and uptime requirements. One suggestion is that you could install Ubuntu to boot off of a memory stick, store your files in a TrueCrypt volume, and auto-backup that encrypted file to various places.
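The auto-backup step could look something like the sketch below. It assumes the backup targets are plain directories (a synced Dropbox folder, a mounted drive, and so on); the function names and paths are hypothetical, and it treats the encrypted volume as an opaque file without touching TrueCrypt itself.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so a large volume does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_volume(volume_path, destinations):
    """Copy the encrypted file into each destination directory and verify every copy."""
    volume_path = Path(volume_path)
    expected = sha256_of(volume_path)
    copies = []
    for destination in destinations:
        target_dir = Path(destination)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / volume_path.name
        shutil.copy2(volume_path, target)
        # Fail loudly rather than trust a silently corrupted backup.
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Run it from cron or a similar scheduler with the volume unmounted, so the file is not copied mid-write.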
hmmm I can confirm this both here and on a local copy of the codebase, I’ll have a look and make sure Wes knows
I spent a fair amount of time in martial arts and have a similar attitude toward generalization of kata/form. This idea stands behind my consistent emphasis on the benefits of coding (particularly TDD) for this community. It builds thought patterns that are useful for tasks that computers typically perform better.
http://www.kdnuggets.com/ practical, well curated machine learning from jobs to datasets to articles
torrentz.eu—indexes various torrent sites
http://mvnrepository.com/ - search for includes on maven sites
http://www.crunchbase.com/ evaluating startups etc
slickdeals is dope, that is for sure.
I think high level generalizations are found in aphorisms and teaching stories from all around the world. They can sometimes be shorthand for a whole story, for example I often remind myself not to eat my money referencing this story:
Mulla Nasrudin, as everyone knows, comes from a country where fruit is fruit and meat is meat, and curry is never eaten. One week he was plodding along a dusty Indian road, having newly descended from the high mountains of Kafiristan, when a great thirst overtook him. “Soon,” he said to himself, “I must come across somewhere that good fruit is to be had.” No sooner were the words formed in his brain than he rounded a corner and saw sitting in the shade of a tree a benevolent-looking man with a basket of fruit in front of him. Piled high in the basket were huge, shiny red fruits. “This is what I need,” said Nasrudin. Taking two tiny coppers from the knot at the end of his turban, he handed them to the fruit-seller. Without a word, the man handed him the whole basket, for this kind of fruit is cheap in India, and people usually buy it in smaller amounts. Nasrudin sat down in the place vacated by the fruiterer and started to munch the fruits. Within a few seconds, his mouth was burning. Tears streamed down his cheeks; fire was in his throat. The Mulla went on eating. An hour or two passed, and then an Afghan hillman came past. Nasrudin hailed him, “Brother, these infidel fruits must come from the very mouth of Sheitan!” “Fool!” said the hillman. “Hast thou never heard of the chillis of Hindustan? Stop eating them at once, or death will surely claim a victim before the sun is down.” “I cannot move from here,” gasped the Mulla, “until I have finished the whole basketful.” “Madman! those fruits belong in curry! Throw them away at once.” “I am not eating fruit any more,” croaked Nasrudin, “I am eating my money.”
--Idries Shah’s “The Pleasantries of the Incredible Mulla Nasrudin”
I think it’s appropriate to separate work ethic and akrasia mastery from rationality. Saying that work ethic is a choice is, imho, a relatively simplistic view. People often get fired for something trivial (smoking when a drug test is coming up, repeated absence, etc.) that they know full well is a suboptimal decision, because the short-term benefits of getting high (or whatever) override their concern for the possible long-term consequences. I think it makes sense to draw the distinction that rationality is the ability to select the right path to walk and self-discipline is the wherewithal to walk it.
I wonder how well defined “my goals” are here or how much to trust expectations. I think a rough approximation could involve these various systems generating some impulse map and then OPFC and some other structures get involved in selecting an action. I don’t think a closed form expression of a goal is required in order to say that the goal exists.
USGS has good info.
http://www.usgs.gov/ http://cegis.usgs.gov/ontology.html
http://dbpedia.org/About Also, there is no need to scrape Wikipedia; the work has been done for you. You can do SPARQL queries to get most of your statements, and the CEGIS site supposedly has a working SPARQL endpoint, but I haven’t used that in years.
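A rough sketch of querying DBpedia over the standard SPARQL HTTP protocol, using only the standard library. The endpoint is DBpedia's public one at https://dbpedia.org/sparql; the example query, its use of `dbo:River` and `dbo:length`, and the function names are illustrative assumptions, so check the ontology for the classes and properties you actually need.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query only: a handful of rivers with their lengths.
EXAMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?river ?length WHERE {
  ?river a dbo:River ;
         dbo:length ?length .
} LIMIT 5
"""


def build_request_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET request."""
    return f"{endpoint}?{urllib.parse.urlencode({'query': query})}"


def run_query(query, endpoint=DBPEDIA_ENDPOINT):
    """Send the query and return the list of result bindings (needs network access)."""
    request = urllib.request.Request(
        build_request_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    for row in run_query(EXAMPLE_QUERY):
        print(row["river"]["value"], row["length"]["value"])
```

Pointing `endpoint` at another SPARQL service should work the same way, provided it speaks the standard protocol and returns JSON results.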
I was a civ junkie for a long time… one interesting thing is that the manual had structured data representations of everything in the game. It was also deadly exploitable, you would usually just not use certain strategies because they’re boring.
Overall the concept of embodied cognition makes a lot of sense. Yoga and particularly martial arts give a lot of tools for embodied cognition. Particularly martial arts: these are all telegraphed gestures, and in an adversarial mode it’s not useful to advertise weaknesses. Changing weight balance and the subtle foot motions involved in shifting into a martial arts stance are less obvious and communicate strength to anyone able to notice.
It’s probably worthwhile asking people to put the logo with an alt text of “sponsored by” leading to LessWrong or SIAI. People who stumble onto such an article and don’t know about LW or SIAI are likely to be interested, and it should also help PageRank.
in one sentence… the vote processing mechanism required a reference to the global configuration for pylons and the pylons configuration import was missing.
not super interesting unfortunately :]
it was probably something like a munged automerge or something
statements that are ~50% true… this is actually pretty hard, mine some dataset for statistical info?
generally, I would look into RDF (Protege and TopBraid Composer Free Edition will let you poke around without knowing the data format)
US 2000 Census in RDF
Freebase has all manner of data in RDF
http://aws.amazon.com/publicdatasets/ public data sets, not all in RDF but “it’s more important that the data have structure” and all that
cancer stats
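One way to read "mine some dataset for statistical info": take records with yes/no fields, compute how often each field is true, and keep the fields that land near 50%. The sketch below assumes the data has already been loaded as a list of dictionaries with boolean-like values; the function name, the field names and the tolerance are made up for illustration.

```python
def near_coin_flip_statements(records, tolerance=0.1):
    """Return (field, fraction_true) pairs whose fraction is within tolerance of 0.5."""
    counts = {}
    for record in records:
        for field, value in record.items():
            true_count, total = counts.get(field, (0, 0))
            counts[field] = (true_count + bool(value), total + 1)

    results = []
    for field, (true_count, total) in counts.items():
        fraction = true_count / total
        if abs(fraction - 0.5) <= tolerance:
            results.append((field, fraction))
    # Closest to an even split first.
    return sorted(results, key=lambda item: abs(item[1] - 0.5))


if __name__ == "__main__":
    # Toy records, not real data.
    toy = [
        {"owns_car": True, "has_passport": True},
        {"owns_car": False, "has_passport": True},
        {"owns_car": True, "has_passport": True},
        {"owns_car": False, "has_passport": True},
    ]
    print(near_coin_flip_statements(toy))
```

Each surviving field then becomes a statement of the form "a randomly chosen record has this property", which is true about half the time by construction.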