As an example, I think the game “both players win if they choose the same option, and lose if they pick different options” has “the two players pick different options, and lose” as one of its feasible outcomes, and that outcome is not on the Pareto frontier: if they picked the same thing, they would both win, which would be a Pareto improvement.
drocta
There are a few places where I believe you mean to write a but instead have instead. For example, in the line above the “Applicability” heading.
I like this.
I am trying to check that I am understanding this correctly by applying it, though probably not in a very meaningful way:
Am I right in reasoning that, for , that iff ( (C can ensure S), and (every element of S is a result of a combination of a possible configuration of the environment of C with a possible configuration of the agent for C, such that the agent configuration is one that ensures S regardless of the environment configuration)) ?
So, if S = {a,b,c,d} , then
would have , but, say
would have , because , while S can be ensured, there isn’t, for every outcome in S, an option which ensures S and which is compatible with that outcome ?
Thanks! (The way you phrased the conclusion is also much clearer/cleaner than how I phrased it)
This reminds me of the “Converse Lawvere Problem” at https://www.alignmentforum.org/posts/5bd75cc58225bf06703753b9/the-ubiquitous-converse-lawvere-problem a little bit, except that the different functions in the codomain have domain which also has other parts to it aside from the main space .
As in, it looks like here, we have a space of values , which includes things such as “likes to eat meat” or “values industriousness” or whatever, where this part can just be handled as some generic nice space , as one part of a product, and as the other part of the product has functions from to .
That is, it seems like this would be like, . Which isn’t quite the same thing as what is described in the converse Lawvere problem posts, but it seems similar to me? (For one thing, the converse Lawvere problem wasn’t looking for homeomorphisms from X to the space of functions from X to [0,1], just a surjective continuous function.)
Of course, it is only like that if we suppose that the space we are considering, , has to have all combinations of “other parts of values” with “opinions on the relative merit of different possible values”. If we just want some space of possible values, where each value has an opinion of each value, then that’s just a continuous function from the product of the space with itself, which poses no problem.
I guess this is maybe more what you meant? Or at least, something that you determined was sufficient to begin with when looking at the topic? (and I guess most more complicated versions would be a special case of it?)
Oh, if you require that the “opinion on other values” decomposes nicely in ways that make sense (like, if it depends separately on the desirability of the base-level values, the values about values, the values about values about values, etc., and just has a score for each which is then combined in some way, rather than evaluating specifically the combinations of those), then maybe that would make the space nicer than the first thing I described (which I don’t know exists), in a way that might make it more likely to exist.
Actually, yeah, I’m confident that it would exist that way.
Let
And let
And then let ,
and for define which seems like it would be well defined to me. Though whether it can capture all that you want to capture about how values can be is another question, and quite possibly it can’t.
I am unsure as to what the judge’s incentive is to select the result that was more useful, given that they still have access to both answers? Is it just because the judge will want to be such that the debaters would expect them to select the useful answer so that the debaters will provide useful answers, and therefore will choose the useful answers?
If that’s the reason, I don’t think you would need a committed deontologist to get them to choose a correct answer over a useful answer, you could instead just pick someone who doesn’t think very hard about certain things / that doesn’t see their choice of actions as being a choice of what kind of agent to be / someone who doesn’t realize why one-boxing makes sense.
(Actually, this seems to me kind of similar to a variant of transparent Newcomb’s problem, with the difference being that the million dollar box isn’t even present if it is expected that they would two-box if it were present, and the thousand dollar box has only a trivial reward in it instead of a thousand dollars. One-boxing in this would be choosing the very-useful-but-not-an-answer answer, while two-boxing would be picking the answer that seems correct, and also using whatever useful info is in both answers.)
I suspect I’m just misunderstanding something.
Ah, thank you, I see where I misunderstood now. And upon re-reading, I see that it was because I was much too careless in reading the post, to the point that I should apologize. Sorry.
I was thinking that the agents were no longer being trained, already being optimal players, and so I didn’t think the judge would need to take into account how their choice would influence future answers. This reading clearly doesn’t match what you wrote, at least past the very first part.
If the debaters are still being trained, or the judge can be convinced that the debaters are still being trained, then I can definitely see the case for a debater arguing “This information is more useful, and because we are still being trained, it is to your benefit to choose the more useful information, so that we will provide the more useful information in the future”.
I guess that suggests that the environments in which the judge confidently believes (and can’t be convinced otherwise) that the debaters are or aren’t still being trained are substantially different. So if training produces the optimal policy for the environment in which it is trained, then after training is done, the debater would likely still do the “ignoring the question” thing, even though that is no longer optimal when not being trained (when the judge knows that the debaters aren’t being trained).
That something can be modeled using some Turing machine, doesn’t imply that it can be any Turing machine.
If I have some simple physical system, such that I can predict how it will behave, well, it can be modeled by a Turing machine, but me being able to predict it doesn’t imply that I’ve solved the halting problem.
A realistic conception of agents in an environment doesn’t involve all agents having unlimited compute at every time-step. An agent cannot prevent the universe from continuing simply by getting stuck in a loop and never producing its output for its next action.
The agent/thinker are limited in the time or computational resources available to them, while the predictor is unlimited.
My understanding is that this is generally the situation which is meant. Well, not necessarily unlimited, just with enough resources to predict the behavior of the agent.
I don’t see why you call this situation uninteresting.
The link in the rss feed entry for this at https://agentfoundations.org/rss goes to https://www.alignmentforum.org/events/vvPYYTscRXFBvdkXe/ai-safety-beginners-meetup which is a broken link (though, easily fixed by replacing “events” with “posts” in the url) .
[edit: it appears that it is no longer in the rss feed? It showed up in my rss feed reader.]
I think this has also happened with other “event” type posts in the rss feed before, but I may be remembering wrong.
I suspect this is some bug in how the rss feed is generated, but possibly it is a known bug which just hasn’t been deemed important enough to fix yet.
I assume that when the event is updated that the additional information will include how to join the meetup?
I am interested in attending.
This comment I’m writing is mostly because this prompted me to attempt to see how feasible it would be to computationally enumerate the conditions for the weights of small networks like the 2 input 2 hidden layer 1 output in order to implement each of the possible functions. So, I looked at the second smallest case by hand, and enumerated conditions on the weights for a 2 input 1 output no hidden layer perceptron to implement each of the 2 input gates, and wanted to talk about it. This did not result in any insights, so if that doesn’t sound interesting, maybe skip reading the rest of this comment. I am willing to delete this comment if anyone would prefer I do that.
Of the 16 2-input-1-output gates, 2 of them, xor and xnor, can’t be done with the perceptrons with no hidden layer (as is well known), for 8 of them, the conditions on the 2 weights and the bias for the function to be implemented can be expressed as an intersection of 3 half spaces, and the remaining 6 can of course be expressed with an intersection of 4 (the maximum number that could be required, as for each specific input and output, the condition on the weights and bias in order to have that input give that output is specified by a half space, so specifying the half space for each input is always enough).
The ones that require 4 are: the constant 0 function, the constant 1 function, return the first input, return the second input, return the negation of the first input, and return the negation of the second input.
These seem, surprisingly, to be among the simplest possible behaviors. They are the ones which disregard at least one input. It seems a little surprising to me that these would be the ones that require an intersection of 4 half spaces.
I haven’t computed the proportions of the space taken up by each region, so maybe the ones that require 4 planes aren’t particularly smaller. And I suppose with this few inputs, it may be hard to say that any of these functions are really substantially more simple than any of the rest of them. Or it may be that the tendency for simpler functions to occupy more space only shows up when we actually have hidden layers and/or have many more nodes.
Here is a table (x and y are the weights from a and b to the output, and z is the bias on the output):
Outputs on the inputs (a,b) = (0,0), (0,1), (1,0), (1,1), then the conditions on the weights for the perceptron (output 1 iff x·a + y·b + z > 0) to compute that function:
0000 (i.e. the constant 0): z<0, x+y+z<0, x+z<0, y+z<0
0001 (i.e. the and gate): x+y+z>0, x+z<0, y+z<0
0010 (i.e. a and not b): z<0, x+y+z<0, x+z>0
0011 (i.e. if input a): z<0, x+y+z>0, x+z>0, y+z<0
0100 (i.e. b and not a): z<0, x+y+z<0, y+z>0
0101 (i.e. if input b): z<0, x+y+z>0, x+z<0, y+z>0
0110 (i.e. xor): impossible
0111 (i.e. or): z<0, x+z>0, y+z>0
1000 (i.e. nor): z>0, x+z<0, y+z<0
1001 (i.e. xnor): impossible
1010 (i.e. not b): z>0, x+y+z<0, x+z>0, y+z<0
1011 (i.e. b->a): z>0, x+y+z>0, y+z<0
1100 (i.e. not a): z>0, x+y+z<0, x+z<0, y+z>0
1101 (i.e. a->b): z>0, x+y+z>0, x+z<0
1110 (i.e. nand): x+y+z<0, x+z>0, y+z>0
1111 (i.e. constant 1): z>0, x+z>0, y+z>0, x+y+z>0
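The counts in the table (8 functions needing 3 half spaces, 6 needing 4, and xor/xnor impossible) can be checked mechanically. Below is a minimal Python sketch, assuming the output is 1 exactly when x·a + y·b + z > 0 and that inputs are ordered (0,0), (0,1), (1,0), (1,1) as in the table. Realizability is checked by searching small integer weights, which for xor and xnor is only evidence (the well-known argument is what rules them out). A condition counts as redundant when the three corner rays of the cone cut out by the other three conditions all satisfy it; this relies on any three of the four normal vectors being linearly independent, which holds here. All function names are made up for this illustration.

```python
import itertools

# Inputs (a, b) in the table's order; input (a, b) gives pre-activation a*x + b*y + z,
# i.e. the dot product of the weights (x, y, z) with the normal (a, b, 1).
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
NORMALS = [(a, b, 1) for a, b in INPUTS]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def signed_normals(gate):
    """Each condition becomes n . (x, y, z) > 0, flipping n where the output bit is 0."""
    return [tuple(c if bit == "1" else -c for c in n) for n, bit in zip(NORMALS, gate)]


def realizable(gate, bound=3):
    """Look for an integer witness in [-bound, bound]^3 (evidence, not a proof of impossibility)."""
    normals = signed_normals(gate)
    grid = range(-bound, bound + 1)
    return any(all(dot(n, w) > 0 for n in normals) for w in itertools.product(grid, repeat=3))


def is_redundant(normals, k):
    """True if the other three strict conditions already imply condition k."""
    others = [n for i, n in enumerate(normals) if i != k]
    rays = []
    for i, j in itertools.combinations(range(3), 2):
        (m,) = [others[t] for t in range(3) if t not in (i, j)]
        r = cross(others[i], others[j])  # lies on both boundary planes i and j
        rays.append(r if dot(m, r) > 0 else tuple(-c for c in r))
    return all(dot(normals[k], r) >= 0 for r in rays)


def conditions_needed(gate):
    # At most one of the four conditions can be redundant for a nonempty cone here.
    normals = signed_normals(gate)
    return 4 - sum(is_redundant(normals, k) for k in range(4))


results = {}
for bits in itertools.product("01", repeat=4):
    gate = "".join(bits)
    results[gate] = conditions_needed(gate) if realizable(gate) else None
    print(gate, "impossible" if results[gate] is None else results[gate])
```

Running it lists each of the 16 truth tables with either the number of conditions needed or "impossible".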
nitpick : the appendix says possible configurations of the whole grid, while it should say possible configurations. (Similarly for what it says about the number of possible configurations in the region that can be specified.)
For the volumes, I suppose that because scaling all of these parameters by the same positive constant doesn’t change the function computed, it would make sense to compute the areas of the corresponding regions projected onto the unit sphere, and this would handle the issue of these regions having unbounded size.
(this would still work with more parameters, it would just be a higher dimensional sphere)
Er, would that give the same thing as the limit if we took the parameters within a cube?
Anyway, at least in this case, if we use the “projected onto the sphere” case, we could evaluate the areas by splitting the regions (which would be polygons of some kind, with edges being arcs of great circles) into triangles, and then using the formulas for the areas of triangles on a sphere. Actually, they might already be triangles, I’m not sure.
Would this work in higher dimensions? I don’t know of formulas for computing the measure of an n-simplex (with flat facets or whatever the right terminology is) within an n-sphere, but I suspect that they shouldn’t be too bad?
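For a function needing only 3 conditions, the cone cut out by the three half spaces meets the unit sphere in a spherical triangle whose corners are the pairwise intersections of the boundary planes. Below is a minimal Python sketch for the and gate, assuming the Van Oosterom-Strackee solid-angle formula tan(Ω/2) = |a·(b×c)| / (1 + a·b + b·c + c·a) for unit vectors a, b, c; the function names are illustrative. Dividing the solid angle by 4π gives the fraction of the sphere. The 4-condition regions would be spherical quadrilaterals, which could be split into two such triangles.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def cone_rays(normals):
    """Corner rays of the cone {w : n . w > 0 for the three given normals}.
    Each ray lies on two boundary planes and is oriented to satisfy the third condition."""
    rays = []
    for i, j, k in [(0, 1, 2), (0, 2, 1), (1, 2, 0)]:
        r = cross(normals[i], normals[j])
        rays.append(r if dot(normals[k], r) > 0 else tuple(-c for c in r))
    return rays


def triangle_solid_angle(a, b, c):
    """Area on the unit sphere of the spherical triangle with corner directions a, b, c."""
    a, b, c = normalize(a), normalize(b), normalize(c)
    numerator = abs(dot(a, cross(b, c)))
    denominator = 1 + dot(a, b) + dot(b, c) + dot(c, a)
    return 2 * math.atan2(numerator, denominator)  # atan2 also handles angles above pi


# "and": x+y+z > 0, x+z < 0, y+z < 0, each written as n . (x, y, z) > 0.
and_normals = [(1, 1, 1), (-1, 0, -1), (0, -1, -1)]
and_fraction = triangle_solid_angle(*cone_rays(and_normals)) / (4 * math.pi)
print(f"fraction of the sphere computing 'and': {and_fraction:.5f}")
```

For the and gate this gives a fraction of about 0.0146. As a sanity check of the formula, the triangle with corners (1,0,0), (0,1,0), (0,0,1) gives π/2, one eighth of the sphere.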
I’m not sure which is the more sensible thing to measure, the volumes of the intersection of the half spaces (intersected with a large cube centered at the origin and aligned with the coordinate axes), or the volume (one dimension lower) of that intersected-with/projected-onto the unit sphere.
Well, I guess if we assume that the coefficients are independent and identically distributed Gaussians, then that would be a fairly natural choice, and it would make the distribution symmetric under rotations about the origin, which would seem to point to the choice of projecting it all onto the (hyper-)sphere.
Well, I suppose in either case (whether on the sphere or in a cube), even before trying to apply some formulas about the area of a triangle on a sphere, there’s always the “just take the integral” option.
(In the cube option, I think this would be more straightforward. I would just have to do a triple integral (more in higher dimensions) of 1 with linear inequalities for the bounds. No real issues should show up.)
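As a worked sketch of that triple integral for the and gate, take the cube to be [-a,a]^3. The conditions x+y+z>0, x+z<0, y+z<0 force z<0 and x,y>0, so writing z=-t with 0<t≤a, the slice at that height is the triangle 0<x<t, 0<y<t, x+y>t, which has area t^2/2:

```latex
\begin{aligned}
\operatorname{vol}\bigl(\{x+y+z>0,\ x+z<0,\ y+z<0\} \cap [-a,a]^3\bigr)
  &= \int_0^a \operatorname{area}\{(x,y) : 0<x<t,\ 0<y<t,\ x+y>t\}\, dt \\
  &= \int_0^a \frac{t^2}{2}\, dt = \frac{a^3}{6},
\qquad
\frac{a^3/6}{(2a)^3} = \frac{1}{48}.
\end{aligned}
```

The ratio does not depend on a, as expected from the scale invariance of the conditions.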
I’ll attempt it with the conditions for “and” for the “on the sphere” case, to check the feasibility.
If we have x+y+z>0, x+z<0, y+z<0, then we necessarily also have z<0, x>0, y>0, and in particular x<-z, y<-z. If x,y,z are on the unit sphere, then x^2+y^2+z^2=1. So, for each value of z (which must be strictly between −1 and 0) we have x^2 + y^2 = 1 - z^2, and because x>0 and y>0, for a given z, each value of x determines exactly one value of y, and vice versa.
So, y = sqrt(1 - z^2 - x^2) , and so we have x + sqrt(1 - z^2 - x^2) > -z , …
this is somewhat more difficult to calculate than I had hoped.
Still confident that it can be done, but I shouldn’t finish this calculation right now due to responsibilities.
It looks like, at least in this case with 3 parameters, it would probably be easier to use the formulas for the area of triangles on a sphere, but I wouldn’t be surprised if doing it that way becomes harder when generalizing to higher dimensions.
It looks like Chris Mingard’s reply has nice results which say much of what I think one would want from this direction? Well, it is less “enumerate them specifically”, and more “for functions which have a given proportion of outputs being 1”, but, still. (Also, I haven’t read it, just looked briefly at it.)
I don’t know what particular description language you would want to use for this. I feel like this is such a small case that small differences in choice of description language might overwhelm any difference in complexity that these would have within the given description language?
I’ve now computed the volumes within the [-a,a]^3 cube for and, or, and the constant 1 function. I was surprised by the results.
(I hadn’t considered that the ratios between the volumes will not depend on the size of the cube)
If we select x,y,z uniformly at random within this cube, the probability of getting the and gate is 1⁄48, the probability of getting the or gate is 2⁄48, and the probability of getting the constant 1 function is 13⁄48 (more than 1⁄4).
This I found quite surprising, because the constant 1 function requires 4 half spaces to express the conditions for it.
So, now I’m guessing that the ones that required fewer half spaces to specify are the ones where the individual constraints already imply other constraints, and so will actually tend to have a smaller volume.
On the other hand, I still haven’t computed any of them for the projection onto the sphere, and this cube measure gives extra weight to directions near the corners of the cube, compared to the measure on the sphere.
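One way to compare the two measures without doing the integrals is to sample. Below is a minimal Python sketch with hypothetical names: uniform samples from [-1,1]^3 estimate the cube measure (by scale invariance this gives the same ratios as [-a,a]^3), while i.i.d. standard Gaussian samples are rotation invariant and so estimate the measure on the sphere. These are Monte Carlo estimates, so they match exact values only up to sampling noise.

```python
import random

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def gate_of(x, y, z):
    """Truth table, in the order (0,0), (0,1), (1,0), (1,1), of output = [a*x + b*y + z > 0]."""
    return "".join("1" if a * x + b * y + z > 0 else "0" for a, b in INPUTS)


def estimate(sampler, n=100_000, seed=0):
    """Fraction of n sampled weight vectors (x, y, z) that compute each truth table."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        gate = gate_of(*sampler(rng))
        counts[gate] = counts.get(gate, 0) + 1
    return {gate: c / n for gate, c in counts.items()}


cube = estimate(lambda rng: [rng.uniform(-1, 1) for _ in range(3)])
sphere = estimate(lambda rng: [rng.gauss(0, 1) for _ in range(3)])
for gate, name in [("0001", "and"), ("0111", "or"), ("1111", "constant 1")]:
    print(f"{name:10s} cube {cube.get(gate, 0):.4f}  sphere {sphere.get(gate, 0):.4f}")
```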
You said that you thought that this could be done in a categorical way. I attempted something which appears to describe the same thing when applied to the category FinSet, but I’m not sure it’s the sort of thing you meant when you suggested that the combinatorial part could potentially be done in a categorical way instead, and I’m not sure that it is fully categorical.
Let S be an object.
For i from 1 to k, let be an object, (which is not anything isomorphic to the product of itself with itself, or at least is not the terminal object) .
Let be an isomorphism.
Then, say that is a representation of a factorization of S.
If and are each a representative of a factorization of S, then say that they represent the same factorization of S iff there exist isomorphisms such that , where is the isomorphism obtained from the with the usual product map, the composition of it with f’ is equal to f, that is, .
Then say that a factorization is an equivalence class of representatives of the same factorization (being a representation of the same factorization is an equivalence relation).
For FinSet , the factorizations defined this way correspond to the factorizations as originally defined.
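As a small sanity check of the FinSet claim, here is a brute-force Python sketch (the function name is hypothetical). For S = {0, ..., n-1} and given factor sizes, it counts equivalence classes of isomorphisms f: B_1 × ... × B_k → S under f ~ f ∘ (g_1 × ... × g_k), and separately counts the tuples of partitions of S induced by the coordinate projections. Because the relation matches factors index by index, these tuples are ordered, so for sizes (2, 2) each unordered factorization of a 4-element set shows up twice. Since the product of the factors' automorphism groups acts freely on these isomorphisms, both counts should equal n!/∏ b_i!.

```python
import itertools
from math import factorial, prod


def classes_and_partitions(sizes):
    """Brute force for S = {0, ..., n-1}, n = prod(sizes). Returns a pair:
    (classes of isomorphisms f: B_1 x ... x B_k -> S under f ~ f o (g_1 x ... x g_k),
     distinct ordered tuples of partitions of S induced by the coordinate projections)."""
    domain = list(itertools.product(*(range(b) for b in sizes)))
    # Every choice (g_1, ..., g_k) of automorphisms (permutations) of the factors.
    factor_autos = list(itertools.product(*(list(itertools.permutations(range(b))) for b in sizes)))
    seen, classes, partition_tuples = set(), 0, set()
    for image in itertools.permutations(range(len(domain))):
        f = dict(zip(domain, image))
        # For each factor i, the partition of S into the fibers of the i-th projection.
        partition_tuples.add(tuple(
            frozenset(frozenset(f[d] for d in domain if d[i] == v) for v in range(b))
            for i, b in enumerate(sizes)))
        if image in seen:
            continue
        classes += 1
        for gs in factor_autos:
            # Record f o (g_1 x ... x g_k), listed in the same domain order as `image`.
            seen.add(tuple(f[tuple(g[c] for g, c in zip(gs, d))] for d in domain))
    return classes, len(partition_tuples)


for sizes in [(2, 2), (2, 3)]:
    n = prod(sizes)
    expected = factorial(n) // prod(factorial(b) for b in sizes)
    print(sizes, classes_and_partitions(sizes), expected)
```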
However, I’ve no idea whether this definition remains interesting if applied to other categories.
For example, if it were to be applied to the closed disk in a category of topological spaces and continuous functions, it seems that most of the isomorphisms from [0,1] × [0,1] to the disk would be distinct factorizations, even though there would still be many which are identified, and I don’t really see talking about the different factorizations of the closed disk as saying much of note. I guess the factorizations using [0,1] and [0,1] correspond to the cosets of a particular subgroup of the group of automorphisms of the closed disk, but I’m pretty sure it isn’t a normal subgroup, so no luck there.
If instead we try the category of vector spaces and linear maps over a particular field, then I guess it looks more potentially interesting. I guess things over sets having good analogues over vector spaces is a common occurrence. But here still, the subgroups of the automorphism groups given largely by the products of the automorphism groups of the factors seem like they still usually fail to be normal subgroups, I think. But regardless, it still looks like there are some ok properties to them, something kinda Grassmannian-ish? idk. Better properties than in the topological spaces case, anyway.
In the section about Non-Dogmatism, I believe something was switched around. It says that if the logical inductor assigns prices converging to $1 to a proposition that cannot be proven, then the trader can buy shares in that proposition at prices of $ and thereby gain infinite potential upside. I believe this should say that if the logical inductor assigns prices converging to $0 to a proposition that can’t be disproven, instead of prices converging to $1 for a proposition that can’t be proven.
(I think that if the price was converging to $1 for a proposition that cannot be proven, the trader would sell shares at prices $ , for potential gain of $1 each time, and potential losses of , so, to have this be $ , this should be .)
There’s also a little formatting error with the LaTeX in section 4.1
Nice summary/guide! It made the idea behind the construction of the algorithm much more clear to me.
(I had a decent understanding of the criterion, but I hadn’t really understood the big picture of the algorithm. I think I had previously been tripped up by the details around the continuity and such, and not following these led to me not getting the big picture of it.)
The part about Chimera functions was surprising, and I look forward to seeing where that will go, and to more of this in general.
In section 2.1 , Proposition 2 should presumably say that is a partial order on rather than on .
My understanding:
One could create a program which hard-codes the point about which the price oscillates (as well as some margin by which the price always eventually moves past that point in either direction), and have it buy once when the price is below, then wait until the price is above to sell, then wait until the price is below to buy, etc.
The programs receive as input the prices which the market maker is offering.
It doesn’t need to predict ahead of time how long until the next peak or trough, it only needs to correctly assume that it does oscillate sufficiently, and respond when it does.
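A minimal Python sketch of that strategy, under simplifying assumptions that are mine rather than the paper's: the trader holds at most one share, trades one share at the quoted price, and is given the hard-coded center and margin. `oscillation_trader`, `center`, and `margin` are hypothetical names, and the market maker's details are ignored.

```python
def oscillation_trader(prices, center, margin):
    """Buy one share when the price dips below center - margin; sell it once the
    price rises above center + margin. Returns (cash, whether a share is still held)."""
    holding = False
    cash = 0.0
    for price in prices:
        if not holding and price < center - margin:
            cash -= price
            holding = True
        elif holding and price > center + margin:
            cash += price
            holding = False
    return cash, holding


cash, holding = oscillation_trader([0.8, 0.2] * 5, center=0.5, margin=0.1)
print(cash, holding)
```

Each completed buy/sell round trip gains more than 2·margin, and because a share pays out between $0 and $1, the most that can be lost on a share still held is its purchase price. With the prices above, the trader completes four round trips and ends up holding one share.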
Do I understand correctly that in general the elements of A, B, C, are achievable probability distributions over the set of n possible outcomes? (But that in the examples given with the deterministic environments, these are all standard basis vectors / one-hot vectors / deterministic distributions ?)
And, in the case where these outcomes are deterministic, and A and B are disjoint, and A is much larger than B, then given a utility function on the possible outcomes in A or B, a random permutation of this utility function will, with high probability, have the optimal (or a weakly optimal) outcome be in A?
(Specifically, if I haven’t messed up, if asymptotically (as |B| goes to infinity) then the probability of there being something in A which is weakly better than anything in B goes to 1 , and if then the probability goes to at least , I think?
Coming from )
While I’d readily believe it, I don’t really understand why this extends to the case where the elements of A and B aren’t deterministic outcomes but distributions over outcomes. Maybe I need to review some of the prior posts.
Like, what if every element of A was a probability distribution over 3 different observation-histories (each with probability 1⁄3), and every element of B was a probability distribution over 2 different observation-histories (each with probability 1⁄2)? (E.g. if one changes pixel 1 at time 1, then in addition to the state of the pixel grid, one observes at random either an orange light or a purple light, while if one instead changes pixel 2 at time 1, then in addition to the pixel grid state, one observes at random either a red, green, or blue light.) Then no permutation of the set of observation-histories would convert any element of A into an element of B, nor vice versa.
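For the deterministic case with disjoint A and B, here is a brute-force Python check (hypothetical names), assuming the utility values are distinct. A uniformly random permutation then makes every outcome in A ∪ B equally likely to be the best one, so the probability that something in A is weakly better than everything in B is |A|/(|A|+|B|); with ties allowed it can only be larger. This says nothing about the case of distributions over outcomes.

```python
import itertools
from fractions import Fraction


def prob_a_weakly_best(size_a, size_b):
    """Exact probability, over all permutations of the distinct utilities 0..n-1,
    that the best outcome in A is at least as good as every outcome in B."""
    n = size_a + size_b
    hits = total = 0
    for utils in itertools.permutations(range(n)):
        # Outcomes 0..size_a-1 form A; the rest form B.
        hits += max(utils[:size_a]) >= max(utils[size_a:])
        total += 1
    return Fraction(hits, total)


for size_a, size_b in [(1, 1), (3, 2), (5, 2)]:
    print(size_a, size_b, prob_a_weakly_best(size_a, size_b))
```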
What came to mind for me before reading the spoiler-ed options, was a variation on #2, with the difference being that, instead of trying to extract P’s hypothesis about B, we instead modify T to get a T’ which has P replaced with a P’ which is a paperclip minimizer instead of maximizer, and then run both, and only use the output when the two agree, or if they give probabilities, use the average, or whatever.
Perhaps this could have an advantage over #2 if it is easier to negate what P is optimizing for than to extract P’s model of B. (edit: though, of course, if extracting the model from P is feasible, that would be better than the scheme I described)
On the other hand, maybe this could still be dangerous, if P and P’ have shared instrumental goals with regards to your predictions for B?
Though, if P has a good model of you, A, then presumably if you were to do this, both P and P’ would expect you would do this, and, so I don’t know what would make sense for them to do?
It seems like they would both expect that, while they may be able to influence you, insofar as the influence would affect the expected number of paperclips, it would be canceled out by the other’s influence (assuming that the ability to influence the number of paperclips via changing your prediction of B is symmetric, which, I guess, it might not be...).
I suppose this would be a reason why P would want its thought processes to be inscrutable to those simulating it, so that the simulators are unable to construct P’ .
__
As a variation on #4, if P is running on a computer in a physics simulation in T, then almost certainly a direct emulation of that computer running P would run faster than T does, and therefore whatever model of B that P has, can be computed faster than T can be. What if, upon discovering this fact about T, we restrict the search among Turing machines to only include machines that run faster than T?
This would include emulations of P, and would therefore include emulations of P’s model of B (which would probably be even faster than emulating P?), but I imagine that a description of an emulation of P without the physics simulation and such would have a longer description than a description of just P’s model of B. But maybe it wouldn’t.