How AI/AGI/Consciousness works: my layman theory

This is just my layman theory. Maybe it’s obvious to experts, and it probably has flaws. But it seems to make sense to me, and perhaps it will give you some ideas. I would love to hear your thoughts/feedback!

Consume input

The data the system needs from the world (like video), plus the useful metrics we want to optimize for, like the number of paperclips in the world.

Make predictions and take action

Like deep learning does.

How do human brains convert their structure into action?

Maybe something like this:

- Take the current picture of the world as an input.

- Come up with a random action.

- “Imagine” what will happen: take the current world + action, run it through the ANN, and predict the outcome of the action applied to the world.

- Does the output increase the metrics we want? If yes, send out the signals to take the action. If no, come up with another random action and repeat.
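The steps above can be sketched in Python. This is a minimal illustration, not a real system: `world_model` stands in for the ANN’s forward pass, and `metric` for whatever quantity we want to increase (e.g. paperclip count); all the names are hypothetical.

```python
import random

def choose_action(world_model, state, actions, metric, tries=100):
    """Pick an action by 'imagining' outcomes with a learned world model.

    world_model(state, action) -> predicted next state (the ANN forward pass)
    metric(state)              -> scalar we want to increase (e.g. paperclips)
    """
    current_score = metric(state)
    for _ in range(tries):
        action = random.choice(actions)        # come up with a random action
        imagined = world_model(state, action)  # "imagine" what will happen
        if metric(imagined) > current_score:   # does it improve the metric?
            return action                      # yes: take this action
    return None                                # nothing promising found

# Toy usage: the state is a paperclip count; actions add or remove a clip.
random.seed(0)
toy_model = lambda state, action: state + action
picked = choose_action(toy_model, 10, [-1, 0, 1], metric=lambda s: s)
# picked comes out as +1, the only action that increases the count
```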

Update beliefs

Look at the outcome of the action. Does the picture of the world correspond to the picture we imagined? Did this action increase the good metrics? Did the number of paperclips in the world increase? If it did, apply positive reinforcement: backpropagate and reinforce the weights.
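One way to make “reinforce the weights” concrete is a prediction-error update: compare the imagined outcome with the observed one, and nudge the model to reduce the surprise. Here is a minimal sketch with a hypothetical linear world model; the names are illustrative, not from any real library.

```python
# A linear world model whose weights are nudged toward the observed outcome.
# This gradient step is a tiny stand-in for backpropagation.

def predict(weights, state, action):
    w_s, w_a = weights
    return w_s * state + w_a * action          # the "imagined" outcome

def update_beliefs(weights, state, action, observed, lr=0.05):
    imagined = predict(weights, state, action)
    error = observed - imagined                # surprise: world vs. imagination
    w_s, w_a = weights
    # gradient step on squared error: reinforce weights that reduce surprise
    return (w_s + lr * error * state, w_a + lr * error * action)

# Toy usage: the true dynamics are next = state + action; the model starts wrong.
weights = (0.5, 0.0)
for _ in range(300):
    for state, action in [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]:
        observed = state + action              # outcome actually seen in the world
        weights = update_beliefs(weights, state, action, observed)
# predictions now closely match the true dynamics
```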


Take the current picture of the world ⇒ Imagine applying an action to it ⇒ Take the action ⇒ Positive/negative reinforcement to improve our model ⇒ Repeat until the metrics we want reach the goal we have set.


Consciousness is neurons observing/recognizing patterns of other neurons.

When you see the word “cat”, photons from the page reach your retina and are converted into a neural signal. A network of cells recognizes the shapes of the letters C, A, and T. Then a higher-level, more abstract network recognizes that these letters together form the concept of a cat.
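The layered recognition described above can be caricatured in a few lines. In a brain each level would be a network of neurons, not a lookup table, and the shape names here are invented purely for illustration.

```python
# Two-level recognition sketch: a low level maps shapes to letters,
# and a higher, more abstract level maps letter patterns to concepts.

def recognize_letters(shapes):
    # low level: networks of cells that each recognize one letter shape
    shape_to_letter = {
        "open-curve": "C",
        "two-legs-crossbar": "A",
        "post-with-cap": "T",
    }
    return "".join(shape_to_letter[s] for s in shapes)

def recognize_concept(letters):
    # higher level: an abstract network maps letter patterns to concepts
    lexicon = {"CAT": "concept:cat"}
    return lexicon.get(letters, "concept:unknown")

word = recognize_letters(["open-curve", "two-legs-crossbar", "post-with-cap"])
concept = recognize_concept(word)   # the letters together yield the cat concept
```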

You can also recognize signals coming from the nerve cells within your body, like feeling pain when you stub a toe.

In the same way, neurons in the brain recognize the signals coming from the other neurons within the brain. So the brain “observes/feels/experiences” itself. It builds a model of itself, just like it builds a map of the world around it; it “mirrors” itself (GEB).

Sentient and self-improving

So the structure of the network itself is fed in as one of its inputs, along with the video and the metrics we want to optimize for. The network can see itself as a part of the state of the world it bases its predictions on. That’s what being sentient means.

And then one of the possible actions it can take is to modify its own structure. “Imagine” modifying the structure a certain way; if the model predicts that this leads to better predictions/outcomes, modify it. If the change did lead to more paperclips, reinforce the weights to do more of that. So it keeps continually self-improving.
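Here is a sketch of self-modification treated as just another action: propose a change to your own structure, “imagine” it via a predicted score, and keep it only if the real outcome improves too. Everything here, including the toy score function, is hypothetical.

```python
import random

def self_improve(structure, evaluate, predict_score, steps=50):
    """Treat 'modify my own structure' as an action.

    structure:     the network's own parameters (here just a list of floats)
    predict_score: the system's prediction of how well a structure performs
                   ("imagining" the modification before making it)
    evaluate:      the outcome actually observed after the change
    """
    for _ in range(steps):
        candidate = [w + random.gauss(0, 0.1) for w in structure]  # proposed self-edit
        if predict_score(candidate) > predict_score(structure):    # imagined improvement?
            if evaluate(candidate) > evaluate(structure):          # real improvement?
                structure = candidate                              # reinforce: keep it
    return structure

# Toy usage: one weight, and the best possible structure is w = 3.
random.seed(1)
score = lambda s: -(s[0] - 3.0) ** 2
improved = self_improve([0.0], evaluate=score, predict_score=score)
# improved[0] has moved from 0.0 toward 3.0
```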


We don’t want this to lead to an infinite number of paperclips, and we don’t know how to quantify the things we value as humans. We can’t turn the “amount of happiness” in the world into a concrete metric without unintended consequences (like all human brains being hooked up to wires that stimulate our pleasure centers).

That’s why, instead of trying to encode abstract values to maximize, we encode very specific goals.

- Make 100 paperclips (the utility function is “Did I make 100 paperclips?”)

- Build 1000 cars

- Write a paper on how to cure cancer
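The difference between “maximize paperclips” and “make 100 paperclips” is that the latter is a check that can be satisfied, after which the agent stops. A toy sketch, with all names invented:

```python
def bounded_goal_agent(goal_count, make_paperclip):
    """Sketch: a specific, satisfiable goal instead of open-ended maximizing.

    The utility function is just "Did I make goal_count paperclips?";
    once it is satisfied, the agent stops rather than making more.
    """
    made = 0
    def goal_reached():            # the whole utility function
        return made >= goal_count
    while not goal_reached():
        make_paperclip()
        made += 1
    return made                    # stops at exactly goal_count

# Toy usage: "making a paperclip" just appends to a list.
clips = []
total = bounded_goal_agent(100, lambda: clips.append("clip"))
# total == 100, and the agent does not keep going once the goal is met
```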

Humans remain in charge: we determine the goals we want and let the AI figure out how to accomplish them. It could still go wrong, but it’s less likely.

(originally published on my main blog)