Software engineer and small-time DS/ML practitioner.
Templarrr
llms don’t work on unseen data
Unfortunately I hear this quite often, sometimes even from people who should know better.
A lot of them confuse this with the actual thing that exists: “supervised ML models (of which an LLM is just a particular type) tend to work much worse on data outside the training distribution”. If you train your model to determine the volume of apples, oranges, melons and other round-y shapes, it will work quite well on any round-y shape, including all kinds of unseen ones. But it will suck at predicting the volume of a box.
You don’t need the model to see every single game of chess; you just need the new situations to be within the distribution built from massive training data, and they most often are.
A real out-of-distribution example in this case would be to train it only on chess and then ask for the next best move in checkers (relatively easy OOD: same board, same type of game) or in Minecraft.
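To make the apples-versus-box point concrete, here is a minimal sketch. It assumes a toy "model" that learns volume as `k * width**3` from spheres only; the function names and the training diameters are made up for illustration and are not from any real library.

```python
import math

def sphere_volume(diameter):
    # Ground truth for round shapes: V = pi/6 * d^3
    return math.pi / 6 * diameter ** 3

def fit_cubic_coefficient(widths, volumes):
    # Least-squares fit of V = k * width^3 (no intercept), hypothetical toy model
    xs = [w ** 3 for w in widths]
    return sum(x * v for x, v in zip(xs, volumes)) / sum(x * x for x in xs)

# Training data: only round shapes (hypothetical diameters)
train_widths = [1.0, 2.0, 3.0, 4.0, 5.0]
k = fit_cubic_coefficient(train_widths, [sphere_volume(w) for w in train_widths])

def predict(width):
    return k * width ** 3

# In-distribution: a sphere with a diameter never seen in training
unseen_sphere_error = abs(predict(3.5) - sphere_volume(3.5)) / sphere_volume(3.5)

# Out-of-distribution: a cube with the same width (true volume is width^3)
cube_error = abs(predict(3.5) - 3.5 ** 3) / 3.5 ** 3

print(f"unseen sphere relative error: {unseen_sphere_error:.2e}")
print(f"cube relative error: {cube_error:.2%}")
```

The unseen sphere is predicted almost exactly, while the cube is off by roughly half, because the learned coefficient bakes in the shape of the training data.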
the *real* problem is the huge number of prompts clearly designed to create CSAM images
So people whose tastes are harmful and deviate from the social norm try to isolate themselves in digital fantasies instead of causing problems in the real world, and that is a problem... exactly how?
I mean, obviously, it’s a coping mechanism, not an attempt to fix the problem, but our society also isn’t known for being very understanding toward people who come out with these kinds of deviations when they want to fix them.
India getting remarkably better in at least one way, as the percentage of the bottom 20% who own a vehicle went from 6% to 40% in only ten years.
Is it better though? These stats only show “who owns a vehicle”, not “who is happy about the fact”. They don’t show how many people were forced to take out a mortgage because owning a vehicle was the only way to live. In an ideal world nobody should need a personal vehicle to survive, leaving it only as a luxury, not a lifeline.
The inclusion of ‘natural disaster’ shows that this simply is not a thing people are thinking about at all.
The Chicxulub and Popigai impactors were both pretty natural. Actually, among the 5 things listed, “natural disasters” is the only category that had actual extinction events in the past. So I’m a bit confused by this comment.
Peter Thiel on his struggle to leave California
Honestly, at this point someone with some self-awareness would start to suspect that the problem may not be on the cities’ side. There is nothing wrong with searching for a better place for oneself, everyone is entitled to it, but when literally nothing fits...
If the answer is yes to all of the above
Point 2 needs rephrasing.
“Does it sound exciting or boring?” “Yes”
Most Importantly Missing
Where’s my “Babylon 5”? Honestly, at the risk of angering Trekkies here, it’s “DS9 but better”.
Does the Nobel Prize sabotage future work?
My first thought was “regression to the mean” and judging from a lot of comments in the original post I’m not the only one. If you’re on the top of the world, the only way to go is down.
Except there should also be an understanding of what constitutes constructive “questioning the science”. There can be no debate between a quantum physicist and a cobbler about quantum physics. Questioning the science isn’t “I decided I know better” and isn’t “I don’t want to believe in your results” (by itself). You question the science by checking, double-checking, and finding weaknesses in the previous science. And by making new, better, more rigorous science.
People tend to forget this part even more often than the part about questioning being an integral part of science.
Compared to how much carbon a human coder would have used? Huge improvement.
JSON formatting? That’s literally a millisecond in a dedicated tool. And unlike an LLM, the tool will not make mistakes you need to control for. Someone using an LLM for this is just someone too lazy to turn on their brain.
That said, it’s not like people not using their brain isn’t a frequent occurrence, but still… not something to praise.
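For reference, a minimal sketch of what a “dedicated tool” amounts to, using only Python’s standard `json` module. The helper name `pretty_json` and the sample input are made up for illustration.

```python
import json

def pretty_json(text, indent=2):
    # Parse, then re-serialize; invalid input raises json.JSONDecodeError
    # instead of silently producing something that merely looks right.
    return json.dumps(json.loads(text), indent=indent, ensure_ascii=False)

# Hypothetical sample input
raw = '{"name":"example","tags":["a","b"],"nested":{"ok":true}}'
formatted = pretty_json(raw)
print(formatted)
```

The round trip through the parser is the point: the output is either valid JSON with the same content, or an explicit error.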
I’m not implying, I’m saying it outright. Depending on how you measure and on the data source, police solve only between 5% and 50% of crime. And that only takes into account reported crime, so the actual fraction, even measured in the most police-friendly way, is lower. At the very least, as many criminals are walking free as are being caught.
Criminals are found in places police check for criminals. And those become stats, sociological profiles and training data for AI to pick up patterns from.
On the topic of the “why?” reaction: that is just how supervised machine learning works. The model learns the patterns in the training data (and interpolates between data points using the patterns it found). And the training data only contains information about prosecutions, not actual crime. If (purely hypothetically) people called Adam were found guilty 100% of the time in the training data, the model will pick up this pattern, even though the name has nothing to do with the crime.
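The “Adam” example can be sketched in a few lines. This is a toy frequency model on entirely made-up records, not a real classifier or real data; `fit_guilt_rates` is a hypothetical helper.

```python
from collections import defaultdict

def fit_guilt_rates(records):
    # "Training": estimate P(guilty | name) purely from prosecution records
    counts = defaultdict(lambda: [0, 0])  # name -> [guilty count, total count]
    for name, guilty in records:
        counts[name][0] += int(guilty)
        counts[name][1] += 1
    return {name: g / n for name, (g, n) in counts.items()}

# Made-up prosecution records: (name, found guilty)
records = [
    ("Adam", True), ("Adam", True), ("Adam", True),
    ("Beth", True), ("Beth", False),
    ("Carl", False), ("Carl", False),
]

rates = fit_guilt_rates(records)
print(rates)  # {'Adam': 1.0, 'Beth': 0.5, 'Carl': 0.0}
```

Nothing in the fitting step knows or cares that the name is irrelevant; it only sees that the pattern holds in the records it was given.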
It’s really difficult to get truly unbiased training data. There are bias mitigation algorithms that can be applied after the fact to a model trained on biased data, but they have their own problems. First of all, their effectiveness at bias mitigation itself usually ranges from “bad” to “meh” at best. More importantly, most of them work by introducing a counter-bias, which can infuriate the people the model is now biased against, and that counter-bias has its own detrimental secondary effects. And this correction usually makes the model less accurate in general.
To give a physical analogy for attempts to fix the model “after the fact”: if one of the blades of a helicopter gets chipped and becomes 10cm shorter, you don’t want to fly on this unbalanced, heavy, rotating murder shuriken. You can “balance” it by chipping the opposite blade the same way, or all the blades the same way, but while that solves the balance, you now have less thrust and a lighter component, and you need to update everything else in the helicopter to accommodate that, etc. So in reality you just throw away the chipped blade and get a new one.
Unfortunately sometimes you can’t get unbiased data because it doesn’t exist.
virtually all the violent crime in the city was caused by a few hundred people
virtually all the violent crime prosecutions were caused by a few hundred people. Which is very much not the same. That’s the real reason why the EU “pretend that we do not know such things”. If the goal is to continue prosecuting those we have always prosecuted, we can use AI all the way. If we want to do better… we can’t.
The situation is that there is a new drug that is helping people without hurting anyone, so they write an article about how it is increasing ‘health disparities.’
Isn’t “solving for the equilibrium” a big thing in this community? That’s what articles like this do—count not only first order effects, but also what those lead to.
Specifically: people with money and resources gobbling up all the available “miracle” drug, leaving people with fewer resources unable to get it even for the established medical use. So yeah, I really don’t see a problem with the article title (specifically the title, I haven’t read the content!), it’s stating the facts. Finding a new use for a limited resource makes poor people’s access to it even worse than before.
Of course, “let’s make fewer miracle drugs” isn’t a solution; the solution is to make more of them, so that everyone who needs one can get one. Finding new cures isn’t the problem, terrible distribution pipelines are.
only to find out it is censored enough I could have used DALL-E and MidJourney.
The last “censoring” of Stable Diffusion was done in code and could’ve been turned off with a 2-line code change. Was it done another way this time?
Probably some people would have, if asked in advance, claimed that it was impossible for arbitrarily advanced superintelligences to decently compress real images into 320 bits
And it still is.
This is really pushing the definition of what can be considered “image compression”. Look, I can write the sentence “black cat on the chessboard” and most of you (except the people with aphantasia) will see an image in your mind’s eye. And that phrase is just 27 bytes! I have better “image compression” than the whitepaper! Of course everyone sees a different image, but that’s just “high frequency details”, not the core meaning.
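The byte count is easy to check. A quick sketch, assuming UTF-8 encoding (the phrase is plain ASCII, so one byte per character); the variable names are just for illustration:

```python
phrase = "black cat on the chessboard"

size_bytes = len(phrase.encode("utf-8"))
size_bits = size_bytes * 8

print(size_bytes, "bytes =", size_bits, "bits")  # 27 bytes = 216 bits
print("fits in 320 bits:", size_bits <= 320)     # True
```

So the phrase even comes in under the 320-bit budget from the quote, which is 40 bytes.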
First it was hands. Then it was text, and multi-element composition. What can we still not do with image generation?
Text generation is considerably better, but still limited to a few words, maybe a few sentences. Ask it to generate a monitor with Python code on it and you’ll see the current limitations. It is an improvement for sure, but in no way a “solved” task.
POSIWID. The metric being optimized is not “having the most money”. It is debatable whether it should be; as one of the “poor Europeans”, my personal opinion is that we’re doing just fine.
There are 2 topics mixed here.
1. Existence of the contrarians.
2. Side effects of their existence.
My own opinion on 1 is that they are necessary in moderation. They are doing the “exploration” part of the “exploration-exploitation dilemma”. By the very fact of their existence they allow society in general to check alternatives and find better solutions to problems than the already known “best practices”. It’s important to remember that almost everything we know now started with some contrarian: it was once a well-established truth that monarchy is the best way to rule the people, and democrats were dangerous radicals.
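For readers unfamiliar with the exploration-exploitation dilemma, here is a minimal sketch using an epsilon-greedy multi-armed bandit, one common textbook illustration. The reward values, the `run_bandit` helper, and the choice of epsilon are all assumptions made up for this example.

```python
import random

def run_bandit(rewards, steps, epsilon, seed=0):
    # rewards: fixed payoff of each option (hypothetical, deterministic for clarity)
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # current belief about each option
    pulls = [0] * len(rewards)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: the "contrarian" move, try something at random
            arm = rng.randrange(len(rewards))
        else:
            # Exploitation: stick with the known "best practice"
            arm = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[arm]
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]  # running mean
        total += reward
    return total, pulls

rewards = [0.3, 0.5, 0.9]  # the best option is the last one

# No contrarians: locks onto the first option that ever paid off
greedy_total, greedy_pulls = run_bandit(rewards, steps=1000, epsilon=0.0)

# Contrarians in moderation: 10% of the time, try something else
explore_total, explore_pulls = run_bandit(rewards, steps=1000, epsilon=0.1)

print("greedy:", greedy_pulls, round(greedy_total, 1))
print("with exploration:", explore_pulls, round(explore_total, 1))
```

The purely greedy run never leaves the first option, while a small amount of exploration is enough to find the better one. Push epsilon too high, though, and most of the steps get wasted on random choices, which is the “in moderation” part.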
On 2: it is indeed a problem that contrarian opinions are more interesting on average, but the solution lies not in somehow making them less attractive, but in making conformist materials more interesting and attractive. That’s why it is paramount to have highly professional science educators and communicators, not just academics. My own favorites are vlogbrothers (John and Hank Green) in particular and their team at Complexly in general.
Two things to note.
First, I feel like putting every occupation in the same pile and deciding whether you are for or against licensing isn’t helpful. I personally don’t need a licensed lawn mower, but I would very much prefer a licensed doctor. The cost of a mistake in the two occupations differs a lot and can be used as a threshold for which jobs should require a license.
Second, there should be a difference between doing a thing to yourself (an argument can even be made that we shouldn’t have any limits here), doing things for free for your friends/relatives with their full knowledge of your skill level and experience (most non-life-threatening things can probably be allowed here), and selling your craft for money.