Charbel-Raphael Segerie
https://crsegerie.github.io/
Living in Paris
I read this book two years ago when it was published in French. I found it incredibly exciting to read, and that’s what motivated me to discover this site and then move on to a master’s degree in machine learning.
This book saved me a lot of time in discovering Bayesianism, and made a much deeper change in my way of thinking than if I had simply read a textbook of Bayesian machine learning.
I am of course happy to have read the sequences, but I think I am lucky to have started with The Equation of Knowledge, which is much shorter to read and which provides the theoretical assurances, motivation, main tools, enthusiasm and pedagogy needed to engage in the quest for Bayesianism.
upvoted.
But ln(370/20)/ln(2) = 4.2. This means that the new strain doubled about 4 times between September and mid-November, suggesting a typical doubling time of just over two weeks.
This is approximately what is observed at the end of December.
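The arithmetic above can be sketched as follows (the case counts are the ones quoted in this thread; the ~10-week window from early September to mid-November is my assumption):

```python
import math

# Figures quoted in the thread: ~20 cases in September, ~370 by mid-November.
cases_september = 20
cases_mid_november = 370
weeks_elapsed = 10  # assumed: early September to mid-November

# Number of doublings: log2 of the growth factor.
doublings = math.log(cases_mid_november / cases_september) / math.log(2)

# Doubling time implied by that growth.
doubling_time_weeks = weeks_elapsed / doublings

print(f"doublings: {doublings:.1f}")                # ~4.2
print(f"doubling time: {doubling_time_weeks:.1f} weeks")  # ~2.4 weeks
```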
But indeed, I don’t understand why the number of infected people suddenly decreases at the end of November. An explanation would be helpful.
Where can we find the source saying that there were about 20 cases of new strains in September?
Thanks for posting!
Could you just explain a bit “will only be likely to contain arbitrary patterns of sizes up to log(10^120)” please ? Or give some pointers with other usage of such calculation ?
Why would Vim be important? I mean, everybody uses VS Code nowadays, and it's much easier and more versatile, with no need to read a book to understand it...
You can SSH directly with VS Code in just one click with the remote extension.
Visual Studio Code lets you perform most tasks directly from the keyboard. You can even use a Vim emulator if you like.
But more importantly, regarding "faster edits gives me a faster iteration time": when developing complex stuff, your typing speed is clearly not the limiting factor. Using proper file-structure visualization and navigation tools is far more important.
Isn’t this paper already a shrieking fire alarm?
Just curious, what’s the problem with “nu” in verbal conversation ?
Interesting.
distillation: because the blog post should be shorter than the main paper.
blending: because we need to reduce the inferential gap by explaining the prerequisites.
dilution: a corollary of the two previous points.
Either consciousness is a mechanism that has been recruited by evolution for one of its abilities to efficiently integrate information, or consciousness is a type of epiphenomenon that serves no purpose.
Personally, I think that consciousness, whatever it is, serves a purpose: it matters for systems that try to sort anecdotal information from information that deserves more extensive consideration. It is possible that this is the only way to process information effectively, and therefore that in trying to program an AGI, one naturally comes across it.
I agree that today the concept of consciousness is very poorly defined, but I think that in the future it will be possible to define it in a way that will make sense, or at least that we will be able to correct our current intuitions.
How can one tell if a human is conscious?
In humans, we have clues. For example, it is possible to experiment by varying the duration of a light stimulus. There are stimuli of very short duration that are processed by visual areas V1 and V2, but which do not go up to the parietal cortex. For stimuli of slightly longer duration, the light spot induces a conscious response. The subject can then say: yes, I saw the spot. In humans, the study can be done on a declarative basis.
How can one tell if an AI is conscious?
At present, such an experiment cannot be performed on GPT (unless one draws an analogy between the duration of a stimulus for humans and the activations in the attention layers of the transformer?)
Indications of consciousness in an AI would be:
- A global workspace, which allows a central piece of information to be accessible from anywhere in the system (e.g. in the schema of the paper Attention Is All You Need, the link between the last encoder and all decoders; or, recently, in Socratic Models, the implementation of a common language module)
- Consciousness seems to be a Turing-complete way of processing information, but a slow one, whereas unconscious processes automatically handle a multitude of bits in parallel. So I would guess another clue would be a distinction between several modes of functioning:
an automatic mode, system 1, like today's GPU-parallelized vision networks,
vs. a slower mode, system 2, which would be a very general program that cannot be parallelized and should run on a CPU.
- Perhaps an implementation of metacognition or a reflexive system: parts of the neural network that attempt to predict the future or current state of other parts of the neural network. But this point depends on whether one considers the mirror test to be necessary for consciousness.
- ?
Are you using "conscious" and "sentient" as synonyms? → Yes. Maybe sentience is consciousness with only valence, but it is basically the same thing. You're right, though: I shouldn't have used both words at the same time without further explanation.
Do you think some, most, or all humans are conscious? → Yes, I think most humans are conscious. Why do you think this? Because I am conscious, though I'm not sure that really answers your question.
“I’m not sure what it means for it to be “desirable” for an entity to be conscious.” → Yes this is a good question!
Is it desirable for a stone to be conscious? Um, yeah. I’m not sure what that means.
Is it desirable for a human to be visually conscious? Yes. I wouldn't want you to have blindsight.
Is it desirable for a human to be conscious? By extension, yes, I think so.
Yeah, I know. I just wanted to begin my answer with this, and to present in one sentence (without naming it, my bad...) the concept of neural correlates of consciousness.
Super work! It must have required a crazy amount of technical and manual work.
The fact that you managed to reduce the number of failures by 3 orders of magnitude is quite impressive. Did you separate the train and test sets at the beginning of the project?
In conclusion, using a 300M-parameter model to supervise a model 10 times larger seems to work well; this gives a lot of hope.
I do not understand "we only doubled the amount of effort necessary to generate an adversarial counterexample." Aren't we talking about 3 OOM?
Ah, “The tool-assisted attack went from taking 13 minutes to taking 26 minutes per example.”
Interesting. Improving in-distribution robustness (3 OOM) does not influence out-of-distribution robustness much (only ×2).
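The asymmetry, in numbers (both figures as quoted in this thread):

```python
# In-distribution: failure rate reduced by ~3 orders of magnitude.
in_dist_improvement = 1000  # ×1000 fewer failures (3 OOM)

# Out-of-distribution: the tool-assisted adversarial attack went from
# 13 minutes to 26 minutes per counterexample.
attack_minutes_before = 13
attack_minutes_after = 26
out_dist_improvement = attack_minutes_after / attack_minutes_before

print(in_dist_improvement)   # 1000
print(out_dist_improvement)  # 2.0
```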
There is no fire alarm for AGIs? Maybe just subscribe to the DeepMind RSS feed…
On a more serious note, I’m curious about the internal review process for this article, what role did the DeepMind AI safety team play in it? In the Acknowledgements, there is no mention of their contribution.
Demis Hassabis is not even mentioned in the paper. Does that mean this is considered a minor paper for DeepMind?
Good Post!
Here is another point. The population of a city is constrained by the agricultural area accessible in less than 3-4 days, the time corresponding to the storage life of vegetables and fruits. During antiquity, Paris was the biggest town in France, with about 10,000 inhabitants, which corresponds to the population fed by the circle of arable land within three days' travel by oxcart. If transport becomes more constrained in the future (an oil shortage?), we should expect the size of cities to shrink greatly.
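A rough back-of-envelope sketch of this constraint. Every number below is my own assumption for illustration (oxcart speed, arable fraction, pre-industrial carrying capacity), not a figure from any source:

```python
import math

# Assumptions (all illustrative):
oxcart_km_per_day = 15    # assumed oxcart travel speed
days = 3                  # storage time of fruits and vegetables
arable_fraction = 0.3     # assumed share of land that is arable
people_per_arable_km2 = 5 # assumed pre-industrial people fed per km^2

radius_km = oxcart_km_per_day * days        # ~45 km reachable
area_km2 = math.pi * radius_km ** 2         # ~6,400 km^2 disk

population = area_km2 * arable_fraction * people_per_arable_km2
print(round(population))  # on the order of 10,000
```

With these (debatable) inputs, the supported population indeed comes out on the order of 10,000 people, consistent with the figure for ancient Paris.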
If you speak French, you can watch Jean-Marc Jancovici's lecture: https://www.youtube.com/watch?v=Ci_cz18A2F8