Music video maker and self-professed “Fashion Victim” hoping to apply Rationality to problems and decisions in my life and career, probably by reevaluating, and likely rebuilding, the set of beliefs that underpins them.
CstineSublime
I, in search of idiohobbies, will ask “what have you done by which I may know you?”.
How do people normally respond to that? Are there people who, perhaps, feel ashamed of what they have done/made/what comes off the tip of their tongue, and don’t wish for it to define how you view them?
Can you give an example of what you mean by aggressive discourse? I think I’m bringing in the baggage of assuming it refers to tone and the inclusion of sarcasm, mocking the interlocutor, name-calling, ad hominem arguments, etc. etc.
Can you help me: how do you get LLMs to restrict their results or avoid certain topics?
I often find that using LLMs and search engines feels like an Abbott and Costello routine whenever I try to use a negative. If a search engine doesn’t afford you a negative operator, writing something like “Categories but not Kantian” will ensure you get a whole lot of search results about Kantian Categories.
Likewise, I find that my attempts to prompt ChatGPT or Claude with some kind of embargo or negative (“avoid mentioning...”, “try not to...”) will almost always ensure the inclusion of the very thing I explicitly told it not to do. Most annoying is when it uses a word in a sense I just don’t understand: told to avoid it, it will substitute a synonym. I.e. if it says it “relates” a value over here to a value over there, when explicitly told not to use “relate” or any synonym, it will use “connection”, “attaches”, or any number of synonyms.
Unfortunately, all parts of the prompt are attended to equally, so the LLM ends up just as confused as poor Lou Costello; there is no way to attend negatively, no prompt that will mask out tokens close to the things you want to exclude. (One hack in diffusion image modelling is to hijack the Classifier-Free Guidance technique, which can push the conditional embedding of the prompt slightly further away from the unconditional prompt; this is more popularly known as the “Negative Prompt”.)
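For what it’s worth, here’s a minimal sketch of that CFG hack, assuming a generic denoiser; the function and argument names are mine, not any particular library’s API:

```python
def cfg_step(denoiser, x_t, t, cond_emb, neg_emb, guidance_scale=7.5):
    """One Classifier-Free Guidance denoising step (illustrative).

    Standard CFG extrapolates away from the embedding of an *empty* prompt;
    the "negative prompt" trick substitutes the embedding of the text you
    want to avoid, so the update is pushed away from that concept instead.
    `denoiser` and the tensors here are stand-ins for a real diffusion model.
    """
    eps_cond = denoiser(x_t, t, cond_emb)  # noise prediction given the prompt
    eps_neg = denoiser(x_t, t, neg_emb)    # noise prediction given the negative prompt
    # Start from the negative-prompt prediction and overshoot toward the
    # conditional one, so the sample drifts away from the unwanted concept.
    return eps_neg + guidance_scale * (eps_cond - eps_neg)
```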
How do others get around this? The simplest solution I can think of is “don’t mention the war”: if you don’t want Kantian categories, well… don’t mention the words Kant, Idealism, or anything of the sort. This gets harder if the LLM’s first reply offers those things anyway. The only other strategy I have is to find idiomatic words which point more in the direction of the subject you’d like it limited to—am I looking for Aristotelian categories, categories of Pokémon, heavy metal sub-genres, corporate categories for tax purposes, etc.?
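One more surgical workaround, if you’re calling the model through an API that exposes it, is to ban the offending tokens outright instead of mentioning them in the prompt. OpenAI’s chat completions endpoint has a logit_bias parameter for exactly this; the sketch below is illustrative (the word list and prompt are mine), and note the limitation: it blocks exact token IDs, not meanings, so every synonym still has to be enumerated by hand—the same whack-a-mole as before.

```python
from openai import OpenAI
import tiktoken

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4o")

# Words I never want to see, plus the leading-space variants BPE produces.
banned = ["relate", "relates", "connection", "attaches"]
logit_bias = {}
for word in banned:
    for tok in enc.encode(word) + enc.encode(" " + word):
        # -100 effectively forbids the token. Crude: banning sub-tokens can
        # also block unrelated words that happen to share them.
        logit_bias[str(tok)] = -100

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain how these two values interact."}],
    logit_bias=logit_bias,
)
print(resp.choices[0].message.content)
```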
I’m sure there is already a word for this (potentially “to pull a Homer”?), but Claude suggested the name “Paradoxical Heuristic Effectiveness” for situations where a non-causal rule or heuristic outperforms a complicated causal model.
I first became aware of this idea when I learned about the research of psychologist John Gottman, who claims to have identified cues which will determine with 94% accuracy whether a married couple will divorce. Well, according to this very pro-Gottman webpage, 67% of all couples will divorce within 40 years. (According to Forbes, it’s closer to 43% of American marriages that will end in divorce, but that rockets up to 70% for third marriages.)
A slight variation, where a heuristic performs almost as well as a complicated model at drastically less computational cost, is what I’ll call Paradoxical Heuristic Effectiveness: I may not be able to predict with 94% accuracy whether a couple will divorce, but I can with 57% accuracy. It’s simple: I say, uniformly, “they won’t get divorced.” I’ll be wrong 43% of the time. But unlike Gottman’s technique, which requires hours of detailed analysis of microexpressions and playing back video tapes of couples… I don’t need to do anything. It is ‘cheap’ computationally, both in terms of human computation and in terms of building spreadsheets, or even the MPEG-4 encoding and decoding of videos of couples.
My accuracy, however, rockets up to 70% if I can confirm they have been married twice before—although this becomes slightly more causal.
Now, I don’t want to debate the relative effectiveness of Gottman’s technique, only observe that his 94% success rate seems much less impressive than just assuming a couple will stay together. I could probably achieve a similar rate of accuracy by ascertaining a few facts:
1. How many times, if ever, has either party been divorced before?
2. Have they sought counseling for this particular marriage?
3. Why have they sought counseling?
Now, these are all causally relevant facts. What is startling about my original prediction mechanism—just assuming that all couples will stay together—is that it is arbitrary. It doesn’t rely on any actual modelling or prediction, which is what makes it so computationally cheap.
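To make the “computationally cheap” claim concrete, the whole predictor is just a majority-class baseline—a few lines of arithmetic using the rates quoted above (the function name is mine):

```python
def majority_baseline(divorce_rate: float) -> float:
    """Accuracy of always predicting the more common outcome,
    with zero per-couple analysis."""
    return max(divorce_rate, 1.0 - divorce_rate)

print(majority_baseline(0.43))  # 0.57: predict "stays together" (Forbes rate)
print(majority_baseline(0.70))  # 0.70: predict "divorce" for third marriages
```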
I’ve been thinking about this recently because of a report of someone merging two text encoder models, T5-XXL and T5-Pile: the author claims to have seen an improvement in prompt adherence for Flux (an image generation model), while another redditor opines that the improvement is within the same range one would expect from merging random noise into the model.
The exploits of Timothy Dexter appear to be a real-world example of Paradoxical Heuristic Effectiveness: as the story goes, he was trolled into “selling coal to Newcastle”, proverbially a pointless transaction since Newcastle was a coal-mining town—yet he made a fortune because of a serendipitous coal shortage at the time.
“To pull a Homer” is a fictional idiom coined in an early episode of The Simpsons, where Homer Simpson twice averts a meltdown by blindly reciting “Eeny, meeny, miny, moe” and happening to land on the right button on both occasions.
However, Dexter and Simpson appear to be examples of unknowingly finding a paradoxically effective heuristic with no causal relationship to their success—Dexter had no means of knowing there was a coal shortage (nor, apparently, did he understand Newcastle’s reputation as a coal-mining city), nor did Simpson know the function of the button he pushed.
Compare this to my original divorce-prediction heuristic with its 43% failure rate: I am fully aware that there will be some wrong predictions, but on the balance of probabilities it is still more effective than the opposite—saying all marriages will end in divorce.
Nassim Nicholas Taleb gives an alternative interpretation of the story of Thales as the first “option trader”. Thales is known for making a fantastic fortune when he bought the rights to all the olive presses in his region before the season; there was a bumper crop, which put them in high demand. Taleb says this was not foresight or studious study of the olive groves—it was a gamble that Thales, as an already wealthy man, was well positioned to take and exploit. After all, even a small crop would still earn him some money from the presses.
But is this the same concept as knowingly but blindly adopting a heuristic which you, as the agent, know has no causal reason for being true, but which is unreasonably effective relative to the cost of computation?
Relatedly, there was a period maybe two years ago when, online, any and all ills related to self-improvement or productivity were prescribed Atomic Habits by James Clear. “I’m having trouble studying, any recommendations?” would get a two-word response: Atomic Habits. “I’m trying to learn a new skill but can’t keep it together.” “You should read Atomic Habits.” People weren’t forthcoming about why it was effective or what lessons they gleaned from it, but they were effusive in their praise and insistent that it should be read.
This also applies to television shows, everyone told me to watch Game of Thrones[1], and I know there’s an XKCD comic about the mathematics of television timesinks.
My theory: “recommendations” for media are never about you, the potential reader, but are the result of the availability heuristic and whatever is top-of-mind for the person doing the recommending.
The flip side is that it creates this terrible imperative to consume content you have no personal interest in just to stay socially relevant—when cultural touchstones should, ideally, be about shared values, not about having enough information to remain relevant at the proverbial watercooler.
[1] To paraphrase actual conversations: “Why?” “Well, they get you really invested in these characters… and then they kill them.” “And why would I want to put myself through that?” “Well… it’s just good, okay!”
Can they give you specific examples of the clients they gained as a result of publishing a video on a social network?
To be honest, I haven’t asked for specific examples (and I guess I’ll need to find a way to ask that won’t be misconstrued as confrontational), but no one has been forthcoming.
Agreed, they don’t. Maybe shares make it more likely for the video to reach a potential client.
Yup, “Hey look at this, you should get them to do your next music video for you” or within a band: “hey look at this video they did for this band, we could use something like that”.
I suspect that a good video needs to be “actionable”: it should give a specific example of a problem that you can solve, and it should explicitly encourage them to contact you if they need to have a problem like that solved.
That rings true. The only person I know personally who has gotten such high social media engagement that they are now getting spots on traditional media is an “expert”, so they can provide actionable advice. They have both the credentials and the industry experience to back it up. It also (unfortunately) helps that they intentionally solicit controversy and use clickbaity statements. And it’s a topic which is always in demand. At their behest I’ve tried putting out didactic videos on what bands and artists should do for music videos, explaining different tropes and conventions which are cool. But after two months I ran out of ways to make it “actionable”. Maybe if I’d continued the grind for 6+ months the algorithm would have started pushing my content onto the Instagram feeds of people outside my network?
Or maybe I need to pay for ads?
I’m not sure how I (me, specifically—this may generalize to others?) can apply any of those unless I’m already receiving feedback, reward.
In the interest of being specific and concrete, I’ll use one example—my personal bugbear: the refrain from people who tell me that as a creative freelancer I need to “get your stuff out there”. “Stuff” here nebulously refers to the kinds of videos I can make; “there” is an impossibly vague assertion that the internet and social media are vectors for finding clients.
Yet I have low belief that “getting stuff out there” is an effective measure for improving my standing as a creative freelancer. Let’s go through your suggested influences one by one:
Peer Pressure: well, it evidently doesn’t work, since I’ve been repetitively told that “you need to put your stuff out there” is true—but I don’t believe it. These people are often peers, stating it as fact, yet it doesn’t shift my belief. The caveat I would add is that I have not had luck finding clients through previous forays online, and most of my clients appear to come from offline networks and relationships.

Getting Quick Feedback: this does seem like the most effective means of shifting belief—however, it is not applicable in this example, as the feedback is non-existent, let alone quick. Likes and comments don’t translate into commissions and clients.
Getting the information from a trustworthy source: yes, generally true—call it “appeal to authority”, call it Aristotle’s theory of ethos in rhetoric. Yet it’s not applicable in this example; in fact, people who repeat this refrain appear less trustworthy to me.
Getting other reward in Parallel: likes and comments are rewards in a sense, yet they do not influence my belief, because they do not directly affect the core metric, which is getting more clients or commissions.
However, there are some caveats: the advice is impossibly vague and therefore impossible to action. Which raises the question—what, exactly, do I lack faith or belief in? If I had to pin it down, it would be this: “spamming the internet with my videos is not sufficient to generate meaningful relationships with clients and collaborators”. The truth is that most of my clients come from word of mouth in offline networks.
It might be worth applying this framework to another activity or theory I have “low belief” in and comparing the two? Hmmm…
I’m afraid I can’t read probabilistic notation, but on first blush what you’ve described does sound like I’m simply reinventing the wheel—and poorly, compared to Jeffrey’s theory there. So yes, it is related to expected value. And I like how Jeffrey’s theory breaks the degree of belief and the desire into two separate values.
I don’t like the word “motivation”, because I find one of the chief factors in whether I’m motivated to do something is belief. Most discussions of motivation seem to treat it as the pain or “cost” of doing something versus the reward. However, just because you do something, be it painful or easy, doesn’t mean you’ll get the reward.
Perhaps some fragmentary dialogue will illustrate my thinking:
“Digging for gold is hard work and you’re not even sure if you’ll find anything”—low motivation. High cost (hard work), no certainty of reward.
“I’d basically be sitting on my phone all afternoon, and they have to pay me $500”—high motivation. Low cost (easy), guaranteed reward.
Now let’s compare it to this:
“buying a lottery ticket is easy but you’re not even sure if you’ll find anything”
“You should put it on the internet, you might go viral”
Personally, this is why I don’t buy lottery tickets. And hopefully this illustrates why I don’t like the implication that motivation is simply how easy a task is versus the magnitude of the reward. Because the certainty matters.
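To put rough numbers on it (the $500 is from the example above; the lottery odds, jackpot, and ticket price are hypothetical round figures):

$$\text{EV} = p(\text{reward}) \times \text{value} - \text{cost}$$

$$\text{afternoon on the phone: } 1.0 \times \$500 - \text{(boredom)} \approx \$500$$

$$\text{lottery ticket: } 10^{-7} \times \$1{,}000{,}000 - \$5 \approx -\$4.90$$

Same “easy” action in both cases; the probability term is what flips the sign.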
The problem is that if you’re a pessimist like me, then EVERYTHING has low certainty, and therefore you don’t do much of anything. Becoming more “motivated” isn’t simply a matter of wanting it more—it’s a matter of having belief.
No doubt sycophancy and the fear of expressing potentially friendship-damaging truths allow negative patterns of behavior to continue unimpeded, but I think you’ve missed the two most necessary factors in determining whether advice—solicited or unsolicited—is a net benefit to the recipient:
1. you sufficiently understand and have the expertise to comment on their situation
&
2. you can offer new understanding they aren’t already privy to.
Perhaps the situations where I envision advice being given are different from yours?
The problem I notice with most unsolicited advice is that it’s something the recipient is already aware of. (The classic sitcom example: someone touches a hot dish and after the fact is told “careful, that pan is hot”. Is it good advice? In the sense that it is truthful, maybe. But the burn having already happened, it is no longer useful.) This is why it annoys people; this is why it is taken as an insult to their intelligence.
A lot of people have already heard the generic or obvious advice, and there may be many reasons why they aren’t following it.[1] Most of the time, hearing this generic advice repeated will not be of benefit even if they have all the qualities you enumerate: that you’re willing to accept the cost of giving advice, that they are rational enough not to take offense, that they are good at taking advice and criticism, and that they value honest feedback even when they disagree.
Take this example exchange:
A: “Why are you using the grill to make melted cheese? We have a toaster oven.”
B: “The toaster is shorted out, it’s broken.”
You must sufficiently understand the recipient’s situation if you are to have any hope of improving it. If you don’t know what they know about the toaster oven, then unsolicited advice can’t help.
Another major problem I’ve found with unsolicited advice is that it lacks fine-grained execution detail. My least favourite advice as a freelancing creative is “you need to get your name out there”—where is “there”? On that big nebulous internet? How does that help me, exactly? Beyond reinforcing the fact that whatever material I am putting online isn’t reaching my current interlocutor, it doesn’t give me any clues about how to go about remedying that.
Advice, for it to be useful, needs more than sympathy and care for the person’s well-being—it needs understanding of the situation which is the cause of their behavior.
My personal metric for the “quality” of advice is how actionable it is. This means that it can’t be post-facto (like the sitcom hot pan); it needs to understand causes and context, such as why they aren’t using the toaster oven; and most importantly, it needs to suggest explicit actions that can be taken to change the situation (and which actions the recipient can or can’t take can only be determined by properly understanding their situation and the causes of their behavior).
[1] Caveat: I’m sure there’s a genre of fringe cases where repetition becomes “the medium is the message”—that is, they do need to hear it again. But there’s a certain point where doing the same thing again and again and expecting a different result is quite stark raving mad.
Not for my purposes. For starters, I use a lot of image and video generation, and even there you have U-Nets and DiTs, so I need something more generalized. Also, if I’m not mistaken, what you’ve described is only applicable to autoregressive transformers like ChatGPT. Compare to, say, T5, which is not autoregressive.
What are Transformers? Like, what is a concrete but accurate-enough conversational way of describing them that doesn’t force me to stop the conversation dead in its tracks to explain jargon like “Convolutional Neural Network” or “Multi-Head Attention”?
It’s weird that I can tell you roughly how the Transformers in a text encoder-decoder like T5 differ from the autoregressive Transformers that generate the text in ChatGPT (T5 processes in parallel, ChatGPT sequentially), or how I can even talk about ViT and DiT transformers in image synthesis (ViT-era models like Stable Diffusion down- and upsample the image, performing operations on the entire latent; DiTs work on patches). But I don’t actually have a clear definition of what a transformer is.
And if I were in a conversation with someone who doesn’t know much about compsci (i.e. me—especially five months ago), how would I explain it?
“Well, for text models it is a mechanism where, after a stream of words has been tokenized (e.g. blue might be “bl” and “ue”, which each have a special ID number) and the embeddings retrieved based on those token ID numbers, which are then used to compute the Query, Key, and Value vectors, which often use a similarity measure like Cosine Similarity to compare the embedding of this key vector to the Qu—HEY, WHERE ARE YOU GOING! I DIDN’T EVEN GET TO THE NORMALIZATION!
Obviously this isn’t a definition; it’s a “how it works” explanation, and what I’ve just written as an example is heavily biased towards the decoder. But if someone asks me “what is a transformer?”, what is a simple way of saying it in conversation?
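For my own notes, here’s the mechanism I was garbling above as a minimal sketch of a single self-attention head in NumPy. All names and shapes are mine and purely illustrative, and it uses the standard scaled dot-product rather than the cosine similarity I hand-waved at:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One attention head: every token embedding in X (seq_len x d_model)
    queries every other token and takes a weighted average of their values."""
    Q = X @ Wq  # queries
    K = X @ Wk  # keys
    V = X @ Wv  # values
    d_k = Q.shape[-1]
    # Scaled dot-product similarity between each query and every key
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns similarities into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output token is a mix of all the value vectors

# Toy usage: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

The conversational gloss I’d attempt from this: “every word looks at every other word and decides how much of it to mix into its own meaning.”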
Off the top of my head, it’s because people are wary of Chesterton’s Fence questions turning out to be sealioning (feigning ‘just asking questions’ when the asker actually has an agenda, masked with the plausible deniability of naive curiosity), and, as you say, the topic being sensitive so that it generates an ‘ugh field’. Those are two pillars of what makes certain topics difficult to discuss.
I’ve noticed this pattern on a lot of topics—usually political, but it could also be some kind of interpersonal drama or gossip: someone asks you a question which appears to be an invitation to give your opinion on something.
“Hey, what do you think about Blork?”
You give a non-committal answer, but that neutrality is enough, and they are off and away with their soliloquy on why Blork is either the greatest thing to happen to Western Civilization or the very end of it. Very rarely is it followed up with “What do you like about Blork?” or “How do your friends feel about Blork?” or any other question rooted in a genuine desire to learn about Blork rather than a pretense to soapbox on it.
I’ve lost count of the number of times someone has monologued at me—“you know, everyone gets it wrong about [thing which has a bad reputation]”—despite (or perhaps because) I haven’t shown any judgement, and despite the fact that I have shown no interest or curiosity in discussing the topic further. I think this has taught people to be very on-guard about any ‘sensitive’ topic. After all, if someone now asks me a seemingly innocent question about Blork, I’m going to shut down the conversation lest I risk another monologue.
This naturally makes it very hard for people who genuinely want to understand why Chesterton’s Fence is there, like your situation with lead poisoning as a cause of sexism: curiosity is mistaken for a veil of plausible deniability over a ready-formed position.
What I’m forgetting is that there’s plausible deniability on the other side too: overcompensating and exaggerating one’s disgust, or even projecting one’s own feelings.
“Why are you justifying sexism? I wouldn’t do that, because I’m not sexist. Do you see how not-sexist I am by accusing you of being sexist? Methinks I am not protesting too much. Do you see how progressive I am?”
Take, for example, a controversy on Australian television involving Harry Connick Jnr, where an amateur talent contest segment of a variety show featured an imitation of the Jackson 5, with the backup dancers in blackface and the singer in exaggerated white-face. Connick Jnr was one of the judges on the panel and was furious, even demanding an on-air apology. Others pointed out that Connick may have been burned by his own past doing blackface on SNL.
Now, the Connick Jnr example isn’t a discussion, but it does add another possible pillar to why people make assumptions about intentions when sensitive topics are broached.
That Nixon one really wowed me—the fact that it exaggerated his jowls—but after a bit of Google searching it seems like other models also appear to have been trained on the Nixon caricature rather than the man himself.
I’m also a big fan of that Fleischer-style distracted-boyfriend remix.
Nevertheless, the ease of ‘prompting’—if that’s what you can even call it now—is phenomenal.
I’m looking at this not from a CompSci point of view but from a rhetoric point of view: isn’t it much easier to make tenuous or even flat-out wrong links between Climate Change and highly publicized natural disaster events that have lots of dramatic, visceral footage than it is to ascribe danger to a machine that hasn’t been invented yet, whose nature and inclinations we don’t know?
I don’t know about nowadays, but for me the two main pop-culture touchstones for “evil AI” are Skynet in Terminator and HAL 9000 in 2001: A Space Odyssey (and, by inversion, the Butlerian Jihad in Dune). Wouldn’t it be more expedient to leverage those? (Expedient—I didn’t say accurate.)
I want to let you know I’ve been reflecting on reactive versus proactive news consumption all week. It has really brought into focus a lot of my futile habits, not just around news consumption but around idle reading and social media scrolling in general.[1] Why do I do it? I’m always hoping for that one piece of information, that “one simple trick”, which will improve my decision-making models, solve my productivity problems, give me the tools to accomplish goals X, Y & Z. Which of course raises the question of why I am always operating at this abstracted, distanced meta-level from goals X, Y & Z, and the simple answer is: if I knew how to solve them directly, I’d be actively working on the steps to solve them.
That’s a lot of TMI, but I just wanted to give you a sense of the effect this had on me.

That’s not how proactive thinking works. Imagine if a company handed you a coupon and your immediate thought was “how can I use this coupon to save money?” That’s not you saving money. That’s the company tricking you into buying their product.
Or those little “specials” at the gas station—buy one chocolate bar, get another free. The customer didn’t save 100% of the price of the second chocolate bar; they lost 100%, because they had no intention of buying a chocolate bar until they saw that impulse-hacking “offer”.
[1] On the flip side is the wasteful consumption that I don’t read—my collection of books that I probably won’t ever read. Why buy them? It seems as pointless as reading ephemeral news slop.
I think you’re right. Although I’m having a hard time expressing where to draw the line between a simile and an analogy, even after glancing at this article: https://www.grammarpalette.com/analogy-vs-simile-dont-be-confused/
Thank you for sharing that, it is interesting to see how others have arrived at similar ideas. Do you find yourself in a rhythm or momentum when sprinting and shooting?
Bad information can inform a decision that detracts from the received value. I suppose if it is perceived to be valuable it still is a useful term—do you think that would get the point across better?
I make music videos, and the decision I often make poorly is:
How would you dimensionalize this decision?
Off the top of my head, some of the low-leverage dimensions would be the standard metrics: views, likes, even reshares. Others might be the “timeliness” of the actual content (i.e. what current trends it is very mindful and demure of), and “ethos”, which might also be called how “on brand” it is—and which breaks into its own series of dimensions.
However, in my case the real leverage is: “does this solicit me more music video commissions?”
I’ve tried in the past to break this down into broad categories of dimensions like:
But, like… none seem to have any leverage. What am I doing wrong in my approach?