Meta: What tool turns rich text into clean HTML?

If you write an article in Word, Writer, Scrivener, Google Docs, or another rich text editor, and then copy+paste that rich text into an online WYSIWYG editor like the one on Less Wrong or WordPress, the HTML generated by LW or WordPress is incredibly messy and does tons of weird stuff to your text.

Because of this, I’ve taken to composing all my posts in Markdown, which is plain text (like HTML) but easier to read, and can be easily converted to clean HTML.

Ideally, though, authors would be able to compose articles in whatever editor they want, and then paste their rich text into a simple web tool that strips all formatting from the HTML except the formatting they want to keep.

HTML Purifier, TIDY, and HTML Tidy aren’t quite what we need. Word2CleanHTML, Word HTML Cleaner and WordOff, along with CKEditor’s and TinyMCE’s ‘Paste from Word’ features, only partly work: they still make mistakes fairly often when I try them.

What I was hoping to find was something like Word2CleanHTML but with these changes:

  1. Does a good job when pasting from just about any rich text editor, not just Word.

  2. Allows the user to choose which formatting to keep, using a list of checkboxes for bold, italic, strikethrough, headings, text coloring, blockquotes, etc.

Does this already exist, and I just couldn’t find it?
If not, is it relatively easy for a coder to create?
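
To make the request concrete, here is a rough sketch of the core filtering step in Python, using only the standard library's `html.parser`. It is not how any of the tools above work, just one way the idea could be approached. The names `FORMAT_TAGS`, `FormatFilter`, and `clean_html` are made up for illustration, as is the assumption that paragraph tags are always kept and that every attribute (class, style, etc.) is thrown away.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from checkbox labels to the tags each one preserves.
FORMAT_TAGS = {
    "bold": {"b", "strong"},
    "italic": {"i", "em"},
    "strikethrough": {"s", "strike", "del"},
    "headings": {"h1", "h2", "h3", "h4", "h5", "h6"},
    "blockquotes": {"blockquote"},
}
ALWAYS_KEEP = {"p"}                   # assumption: paragraph breaks always survive
DROP_WITH_CONTENT = {"style", "script"}  # remove these tags and their contents


class FormatFilter(HTMLParser):
    """Re-emits only whitelisted tags, with all attributes removed."""

    def __init__(self, keep):
        super().__init__(convert_charrefs=True)
        self.allowed = set(ALWAYS_KEEP)
        for name in keep:
            self.allowed |= FORMAT_TAGS[name]
        self.out = []
        self.skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif tag in self.allowed and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.allowed and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text was decoded by the parser, so re-escape it for output.
            self.out.append(escape(data, quote=False))


def clean_html(html, keep):
    """Return html with every tag removed except those selected in keep."""
    parser = FormatFilter(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    pasted = '<p class="MsoNormal"><b>Bold</b> and <span style="color:red"><i>italic</i></span></p>'
    print(clean_html(pasted, keep=["bold"]))
```

A web tool would pass in whichever checkbox labels the user ticked as `keep`. This sketch does not cover text coloring, which lives in attributes rather than tags, and it does not try to repair badly nested markup, so a real tool would need more than this.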