Nitpick, is there a reason why the margins are so large?
FactorialCode
[Question] How well can the GPT architecture solve the parity task?
The content on the front page is noticeably off center to the right on 1440x900 monitors.
Edit: The content is noticeably off center to the right in general.
On the standardization and interoperability side of things, there’s been an effort to develop decentralized social media platforms and protocols, most notably the various platforms of the Fediverse. Together with open-source software, this lets people build large networks that keep the value of network effects while removing monopoly power. I really like the idea of these platforms, but given the network monopoly of existing social media platforms, I think they’ll have great difficulty gaining traction.
Yeah, that’s pretty pricey. Google is telling me that they can do 1 million characters/month for free using a WaveNet voice. That might be good enough.
What’s the going rate for audio recordings on Fiverr?
AvE: Assistance via Empowerment
With the drama currently taking place, I’m worried that the rationalist community will find itself inadvertently caught up in the culture war. This might cause a large influx of new users who are more interested in debating politics than in anything else on LW.
It might be a good idea to put a temporary moratorium/barriers on new signups to the site in the event that things become particularly heated.
Organizations, and entire nations for that matter, can absolutely be made to “feel fear”. The retaliation just needs to be sufficiently expensive for the organization. Afterwards, it’ll factor in the costs of that retaliation when deciding how to act. If the cost is large enough, it won’t do things that will trigger retaliation.
There is no guarantee that it is learning particularly useful representations just because it predicts pixel-by-pixel well which may be distributed throughout the GPT,
Personally, I felt that that wasn’t really surprising either. Remember that this whole deep learning thing started with exactly what OpenAI just did: train a generative model of the data, and then fine-tune it on the relevant task.
However, I’ll admit that the fact that there’s an optimal layer to tap into, and that they showed this trick works specifically with autoregressive transformer models, is novel to my knowledge.
This isn’t news: we’ve known that sequence predictors could model images for almost a decade now, and OpenAI did the same thing last year with less compute, but no one noticed.
Many of the users on LW have their real names and reputations attached to this website. If LW were to come under this kind of loosely coordinated memetic attack, many people would find themselves harassed and their reputations and careers could easily be put in danger. I don’t want to sound overly dramatic, but the entire truth seeking and AI safety project could be hampered by association.
That’s why, even though I remain anonymous, I think it’s best if I refrain from discussing these topics at anything except the meta level on LW. Even having this discussion strikes me as risky. That doesn’t mean we shouldn’t discuss these topics at all, but it needs to be in a place like r/TheMotte where there is no attack vector. This includes using different usernames so we can’t be traced back here. Even then, Reddit’s AEO and the admins are technically weak points.
I’m going to second the request for a title change and propose:
Simulacra levels and their interactions, with applications to COVID-19
The Economic Consequences of Noise Traders
I’m getting 404 on that link. I think you need to get rid of the period.
Allow me to present an alternative/additional hypothesis:
The market is only as smart as the people who participate in it. In the long run, the smarter agents in the system will tend to accrue more wealth than the dumber agents. With this wealth, they will be able to move markets and close arbitrage opportunities. However, if an army of barely literate idiots is given access to complex leveraged financial instruments and free money, and they all decide to “buy the dip”, it doesn’t matter what the underlying value of the stock is. It’s going up.
That’s not to say that what you’re saying doesn’t apply. It probably exacerbates the problem, and is the main mechanism behind market bubbles. But there are multiple examples of a very public stock going up or getting a large amount of attention, and then completely unrelated companies with plausible-sounding tickers also shooting up in tandem.
This only makes any sense in a world where the market is driven by fools eager to lose all their money or more.
Alright, I’ve only played with this a bit, but I’m already finding interesting papers from years past that I’ve missed. I’m just taking old papers I’ve found notable and throwing them in and finding new reading material.
My only complaint is that there’s actually too little “entropy” in the set of papers that get generated; they’re almost too similar, and I end up having to make several hops through the graph to find something truly eye-catching. It might also just be that papers I consider notable are few and far between.
I think virtue ethics and the “policy consequentialism” I’m gesturing at are different moral frameworks that will, under the right circumstances, make the same prescriptions. As I understand it, one assigns moral worth to outcomes, and the actions it prescribes are determined updatelessly, whereas the other assigns moral worth to specific policies/policy classes implemented by agents, without looking at the consequences of those policies.
Epistemic status: Ramblings
I don’t know how much you can really generalise these lessons. For instance, when you say:
How much slower is e-coli optimization compared to gradient descent? What’s the cost of experimenting with random directions, rather than going in the “best” direction? Well, imagine an inclined plane in n dimensions. There’s exactly one “downhill” direction (the gradient). The n-1 directions perpendicular to the gradient don’t go downhill at all; they’re all just flat. If we take a one-unit step along each of these directions, one after another, then we’ll take n steps, but only 1 step will be downhill. In other words, only ~O(1/n) of our travel-effort is useful; the rest is wasted.
In a two-dimensional space, that means ~50% of effort is wasted. In three dimensions, 70%. In a thousand-dimensional space, ~99.9% of effort is wasted.
This is true, but if I go in a spherically random direction, then if my step size is small enough, ~50% of my efforts will be rewarded, regardless of the dimensionality of the space.
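A quick numerical check of this claim (my own toy simulation; the “inclined plane” is modeled by treating the first coordinate as the sole downhill direction): the sketch below estimates how often a uniformly random unit vector has a downhill component.

```python
import math
import random

def random_unit_vector(n):
    """Uniform direction on the n-sphere via the Gaussian trick."""
    v = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def fraction_downhill(n, trials=10_000):
    """On an inclined plane where only the first coordinate slopes,
    a random step makes progress iff that component is negative.
    By symmetry this is ~50%, independent of n."""
    hits = sum(1 for _ in range(trials) if random_unit_vector(n)[0] < 0.0)
    return hits / trials

random.seed(0)
for n in (2, 3, 1000):
    print(n, fraction_downhill(n))  # all hover around 0.5
```

The catch is step quality rather than step direction: the expected magnitude of the downhill component of a random unit vector shrinks like ~1/√n, so while half of all random steps help in any dimension, each helpful step helps less as n grows.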
How best to go about optimisation depends on the cost of carrying out optimisation, the structure of the landscape, and the relationship between the utility and the quality of the final solution.
Blind guess-and-check is sometimes a perfectly valid method when you don’t need to find a very good solution and you can’t make useful assumptions about the structure of the set, even if the cardinality of the possible solution set is massive.
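For what it’s worth, here’s a toy sketch of that point (the `satisfice` helper, the divisibility task, and the budget are all invented for illustration): blind guess-and-check only needs a way to sample candidates and a “good enough” test, with no assumptions about the set’s structure.

```python
import random

def satisfice(sample, good_enough, budget=1000):
    """Blind guess-and-check: draw candidates until one clears the bar
    or the budget runs out. Assumes nothing about the set's structure."""
    for _ in range(budget):
        c = sample()
        if good_enough(c):
            return c
    return None  # no acceptable solution found within budget

# Hypothetical task: find any multiple of 97 in a huge range.
# Each draw succeeds with probability ~1/97, so 1000 draws almost
# always suffice despite the billion-element candidate set.
random.seed(0)
result = satisfice(lambda: random.randrange(10**9), lambda x: x % 97 == 0)
```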
I often don’t even think “optimisation” and “dimensionality” are really natural ways of thinking about solving many real-world engineering problems. There’s definitely an optimisation component to the engineering process, but it’s often not central. Depending on circumstances, it can make more sense to think of engineering as “satisficing” rather than “optimising”. Essentially, you’re trying to find a solution instead of the best solution, and the process used to solve the problem is going to look vastly different in one case vs the other. This is similar to the distinction between “goal-directed agency” and “utility maximisation”.
In many cases when engineering, you’re taking a problem and coming up with possible high-level breakdowns of it. In the example of bridges, this could be deciding whether to use a cantilever bridge, a suspension bridge, or something else entirely. From there, you solve the sub-problems created by the breakdown, until you’ve sufficiently fleshed out a solution that looks actionable.
The way you go about this depends on your optimisation budget. In increasing order of costs:
-You might go with the first solution that looks like it will work.
-You’ll recursively do a sort of heuristic optimisation at each level, decide on a solution, and move to the next level.
-You’ll flesh out multiple different high level solutions and compare them.
. . .
-You search the entire space of possible solutions.
This is where the whole “slack” thing and getting stuck in local optima come back, even in high-dimensional spaces. In many cases, you’re “committed” to a subset of the solution space. This could be because you’ve decided to design a cantilever bridge instead of a suspension bridge. It could also be because you need to follow a design you know will work, and X is the only design your predecessors have implemented IRL that has been sufficiently vetted. (This is especially common in aircraft design, as the margins of error are very tight.) It could even be because your comrades have all opted to go with a certain component, so that component benefits from economies of scale and becomes the best choice, even if another component would be objectively better were it to be mass-produced. (I’ll leave it as an exercise to the reader to think of analogous problems in software engineering.)
In all cases, you are forced to optimise within a subset of the search space. If you have the necessary slack, you can afford to explore the other parts of the search space to find better optima.
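A toy sketch of the committed-vs-slack distinction (the landscape and all parameters here are invented for illustration): a greedy hill climber that is “committed” to its starting basin never crosses the valley between peaks, while one with the slack to try several starting points reaches the better optimum.

```python
import math
import random

def hill_climb(f, x0, step=0.1, iters=300):
    """Greedy local search: propose a nearby point, accept only if it improves."""
    x = x0
    for _ in range(iters):
        cand = x + random.uniform(-step, step)
        if f(cand) > f(x):
            x = cand
    return x

# A 1-D landscape with a local peak near x=0 and a better peak near x=3.
def f(x):
    return math.exp(-x ** 2) + 2.0 * math.exp(-(x - 3.0) ** 2)

random.seed(0)
committed = hill_climb(f, x0=0.0)  # greedy moves never cross the valley
with_slack = max(
    (hill_climb(f, random.uniform(-1.0, 5.0)) for _ in range(10)), key=f
)  # random restarts let the search reach the higher peak near x=3
```

The restart loop is the “slack”: budget spent exploring outside the committed basin, which only pays off because a better basin actually exists.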
Huh.
I did not believe you, so I went and checked the Internet Archive. Sure enough, all the old posts with a ToC are off center. I hadn’t noticed until now.