A while ago I mentioned how I’d set up some regexes in my browser to alert me to certain suspicious words that might be indicative of weak points in arguments.
I still have this running. It didn’t have the intended effect, but it is still slightly more useful than it is annoying. I keep on meaning to write a more sophisticated regex that can somehow distinguish the intended context of “rather” from unintended contexts. Natural language is annoying and irregular, etc., etc.
Just lately, I’ve been wondering if I could do this with more elaborate patterns of language. It’s recently come to my attention that expressions of the form “in saying [X] (s)he is [Y]” are often indicative of sketchy value-judgement attribution. They’re also very easy to capture with a regex. It’s gone on the list.
So, my question: what patterns of language are (a) indicative of sloppy thinking, weak arguments, etc., and (b) reliably captured by a regex?
(In the back of my mind, I am imagining some sort of sanity-equivalent of a spelling and grammar check that you can apply to something you’ve just written, or something you’re about to read. This is probably one of those projects I will start and then abandon, but for the time being it’s fun to think about.)
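The kind of watch-list described above can be sketched in a few lines. This is a minimal illustration using Python’s re module; the patterns, note strings, and sample sentence are my own illustrative guesses, not the actual browser regexes:

```python
import re

# Hypothetical watch-list: each entry pairs a compiled pattern with a note
# about the weakness it may signal. These are illustrative, not exhaustive.
WATCH_LIST = [
    (re.compile(r"\brather\b", re.IGNORECASE),
     "'rather' sometimes smuggles in a false dichotomy"),
    # "(s)he" is approximated with the alternation he|she|they.
    (re.compile(r"\bin saying\b.{1,80}?\b(?:he|she|they)\s+(?:is|are)\b",
                re.IGNORECASE),
     "possible sketchy value-judgement attribution"),
]

def flag_suspicious(text):
    """Return (matched_text, note) pairs for every watch-list hit."""
    hits = []
    for pattern, note in WATCH_LIST:
        for match in pattern.finditer(text):
            hits.append((match.group(0), note))
    return hits

sample = "In saying that we must act now, he is claiming moral authority."
for matched, note in flag_suspicious(sample):
    print(matched, "->", note)
```

A browser userscript would do the same thing in JavaScript, but the idea — a flat list of (pattern, warning) pairs scanned over the page text — carries over directly.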
The pair “tend to always” or “always tend to”. Sometimes they come off to me as a way to exploit the rhetorical force of “always” while committing only to a hedged “tend to”, in which case they can condense a two-step of terrific triviality into three words. There are likely other phrases that can provide plausibly deniable pseudo-certainty but I can’t think of any.
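Both orderings of the hedge-plus-certainty pair fit in a single alternation. A sketch, with one extra candidate phrase (“almost invariably”) added as my own guess at a similar construction:

```python
import re

# One alternation covers both orderings of the hedge/certainty pair;
# the phrase list is illustrative and easy to extend.
PSEUDO_CERTAINTY = re.compile(
    r"\b(?:tend to always|always tend to|almost invariably)\b",
    re.IGNORECASE,
)

def has_pseudo_certainty(sentence):
    """True if the sentence uses a plausibly-deniable certainty phrase."""
    return bool(PSEUDO_CERTAINTY.search(sentence))
```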
More generally, the Unix utility diction tries to pick out “frequently misused, bad or wordy diction”, which is a kinda related precedent.
When they come in the form of portentous pronouncements, Daniel Dennett calls these “deepities”: ambiguous expressions with one meaning that is trivially true but unimportant, and another that is obviously false but would be earth-shatteringly significant if it were true.
Also related in cold reading is the Rainbow Ruse.
“[...]may be the case[...]”

Sometimes this phrase is harmless, but sometimes it is part of an important enumeration of possible outcomes/counterarguments/whatever. If “the case” comes with neither a solid plan or argument nor an explanation of why it is unlikely or unimportant, then it is often there just to make the author and/or the audience feel that all the bases have been covered. E.g.,
We should implement plan X. It may be the case that [important weak point of X], but [unrelated benefit of X].
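Since whether the concession is actually addressed is a judgement call, a checker can only surface the hedged sentences for a human to review. A sketch, with an illustrative sentence splitter:

```python
import re

# Flag sentences that concede a point with "may be the case"; deciding
# whether the concession is then addressed is left to the human reader.
MAY_BE_THE_CASE = re.compile(r"\bmay be the case\b", re.IGNORECASE)

def concessions(text):
    """Return each sentence containing the hedge, for manual review."""
    # Naive split on sentence-final punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if MAY_BE_THE_CASE.search(s)]
```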
I had the notion a while ago to try to write a linter that goes beyond code correctness, automatically detecting desired features in all sorts of objects. Kudos on actually doing it, and in a not hare-brained fashion.
As a former Natural Language Processing researcher, I can say the technology definitely exists. A general vocabulary combined with many (semi-manually generated) regexes should be able to pick out argumentative or weaselly sentences with decent accuracy. It could improve over time as you feed it exemplar sentences you come across.
Do you have a recommendation for a good language-agnostic text / reference resource on NLP?
ETA: my own background is a professional programmer with a reasonable (undergrad) background in statistics. I’ve dabbled with machine learning (I’m in the process of developing this as a skill set) and messed around with python’s nltk. I’d like a broader conceptual overview of NLP.
I’d recommend this book for a general overview: http://nlp.stanford.edu/fsnlp/ (Manning & Schütze, Foundations of Statistical Natural Language Processing)
However, full parsing is unnecessary for many tasks. A simple classifier on a sparse vector of word counts can be quite an effective starting point for classifying sentence or document content.
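The word-count approach can be sketched without any NLP library at all. Below is a minimal multinomial Naive Bayes over bag-of-words Counters; the training sentences and labels are invented for illustration, and real work would reach for something like scikit-learn instead:

```python
import math
from collections import Counter

def tokenize(text):
    """Crude tokenizer: lowercase whitespace split."""
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes on sparse word-count vectors (Counters)."""

    def __init__(self):
        self.class_word_counts = {}     # label -> Counter of word counts
        self.class_doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        counts = self.class_word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            self.vocab.add(word)
        self.class_doc_counts[label] += 1

    def classify(self, text):
        total_docs = sum(self.class_doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.class_word_counts.items():
            # log prior + Laplace-smoothed log likelihood of each word
            score = math.log(self.class_doc_counts[label] / total_docs)
            total = sum(counts.values())
            for word in tokenize(text):
                score += math.log(
                    (counts[word] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train("this always tends to be obviously true", "weasel")
nb.train("it is always the case trust me", "weasel")
nb.train("the measured error was 0.03 plus or minus 0.01", "ok")
nb.train("we ran the experiment three times", "ok")
```

With a handful of labeled exemplar sentences like these, the classifier already separates the two styles, which is the "improve over time as you feed it exemplars" loop mentioned above.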