# Colonization models: a programming tutorial (Part 1/​2)

Introduction

Are we alone in the universe? How likely is our species to survive the transition from a Type 0 to a Type II civilization? The answers to these questions would be of immense interest to our race; however, we have few tools to reason about these questions. This does not stop us from wanting to find answers to these questions, often by employing controversial principles of inference such as ‘anthropic reasoning.’ The reader can find a wealth of stimulating discussion about anthropic reasoning at Katja Grace’s blog, the site from which this post takes its inspiration. The purpose of this post is to give a quantitatively oriented approach to anthropic reasoning, demonstrating how computer simulations and Bayesian inference can be used as tools for exploration.

The central mystery we want to examine is the Fermi paradox: the fact that

1. we are an intelligent civilization

2. we cannot observe any signs that other intelligent civilizations ever existed in the universe

One explanation for the Fermi paradox is that we are the only intelligent civilization in the universe. A far more chilling explanation is that intelligent civilizations emerge quite frequently, but that all other intelligent civilizations that have come before us ended up destroying themselves before they could manage to make their mark on their universe.

We can reason about which of the above two explanations are more likely if we have the audacity to assume a model for the emergence and development of civilizations in universe ‘similar to ours.’ In such a model, it is usually useful to distinguish different ‘types’ of civilizations. Type 0 civilizations are civilizations with similar levels of technology as ourselves. If a Type 0 civilization survives long enough and accumulates enough scientific knowledge, it can make a transition to a Type I civilization—a civilization which has attained mastery of their home planet. A Type I civilization, over time, can transition to a Type II civilization if it colonizes its solar system. We would suppose that a nearby civilization would have to have reached Type II in order for their activities to be prominent enough for us to be able to detect them. In the original terminology, a Type III civilization is one which has mastery of its galaxy, but in this post we take it to mean something else.

The simplest model for the emergence and development of civilizations would have to specify the following:

1. the rate at which intelligent life appears in universes similar to ours;

2. the rate at which these intelligent species transition from Type 0 to Type II, Type III civilizations—or self-destruct in the process;

3. the visibility of Type II and Type III civilizations to Type 0 civilizations elsewhere

5. the speed at which those Type III civilizations can expand and colonize the universe.

In the model we propose in the post, the above parameters are held to be constant throughout the entire history of the universe. The importance of the model is that after given a particular specification of the parameters, we can apply Bayesian inference to see how well the model explains the Fermi paradox. The idea is to simulate many different histories of universes for a given set of parameters, so as to find the expected number of observers who observe the Fermi paradox given a particular specification of the parameters. More details about Bayesian inference given in Part 2 of this tutorial.

This post is targeted at readers who are interested in simulating the emergence and expansion of intelligent civilizations in ‘universes similar to ours’ but who lack the programming knowledge to code these simulations. In this post we will guide the reader through the design and production of a relatively simple universe model and the methodology for doing ‘anthropic’ Bayesian inference using the model.

I. The model

We are going to simulate the entire history, or time-line, of a universe. We assume that the universe has a finite lifespan; readers are invited to attempt to extend the model to allow for indefinite lifespans. For computational simplicity, we will break up the history of the universe into, say, 1000 discrete epochs, where each epoch might represent twenty million years. The epoch number is represented by a variable t ranging from 1 to 1000. The epoch t = 1 does not represent the very beginning of the universe, but rather, the earliest period where intelligent life has a significant chance of appearing; likewise, the last epoch t=1000 would be the last period in which the cosmology of the universe allows for the emergence of intelligent life.

We assume the cosmological model of space as a (the three-dimensional surface) of a four-dimensional unit sphere centered at the origin. That is, the universe is represented as the set of all points (x,y,z,w) where x^2 + y^2 + z^2 + w^2 = 1. Note that since w can be calculated from x, y, z, only the points x, y, z need be stored in the simulation.

Now it is important to correctly specify the speed of light in terms of the units we have chosen for time and space. Assuming the maximum geodesic of the universe is 100 billion light years, and given that each time step is twenty million years, this would put the speed of light as s = 4 x 10^-4 in our units.

One could go further to populate this universe with randomly located habitable planets. However, we would need a lot of planets for a realistic model, so this approach would not be particularly computationally efficient. Instead, we will assume that the number of habitable planets in a region of space is proportional to its volume. This will allow us to do the simulation without specifying the location of the planets.

Now we need to specify the mechanism by which new Type 0 civilizations emerge in our universe. For the sake of simplicity, at most one new Type 0 civilization appears in each time step. (A more realistic model might employ a Poisson distribution to determine the number of new civilizations at each time step.) The location of the new civilization is located uniformly at random within the uninhabited volume of the universe. We will suppose that no new Type 0 civilization can emerge on planets already inhabited by others civilizations. Thus, the probability that a new Type 0 civilization emerges must depend on how much of the universe is uninhabited. We will posit that this probability is a constant factor, a, times the proportion of universe which is uninhabited. A more advanced model might have the propensity for life factor a vary with time, to account for planetary and stellar evolution.

Next we need to specify how existing civilizations transition to the next stage, self-destruct, become visible or colonize other planets. We will use four types of civilizations:

• Type 0. Civilizations at around our stage of development.

• Type II. Civilizations which are developed enough to be visible to others. However, we make a distinction between two classes of Type II civilizations, a distinction not present in the original formulation of the Kardeshev scale.

• Type IIa. Civilizations which have adopted a non-expansionist policy, and which will remain Type IIa civilizations forever.

• Type IIb. Civilizations which may become Type III (expansionist) civilizations in the future

• Type III. We make the extreme assumption that all Type III civilizations are expansionist. That is, once a civilization becomes Type III, it starts colonizing the surrounding space as quickly as possible. Space colonized by Type III civilizations is sterilized, it can not longer produce new Type 0 civilizations.

We have not included the “Type I” civilization in our model, since for our purposes Type I civilizations are virtually identical to Type 0 civilizations. In the same way, Type IIa civilizations would be virtually identical to ‘peaceful’ Type III civilizations for the purposes of out model, which is why hold all Type III civilizations to be expansionist.

Here will be our model for this process:

1. At each time step, the ‘sphere of colonization’ of a type III civilization increases its radius by the maximum speed of colonization, k * s. Here we have the speed multiplier k as constant, but a more realistic model might have the speed of colonization increase as the Type III civilization gains more technology

2. Each Type IIb civilization has a probability e of transitioning to a Type III civilization.

3. Each Type 0 civilization has probability b of self-destructing and probability c of transitioning to Type IIa and probability d of transitioning to Type IIb (and remains Type 0 with probability 1-b-c-d).

4. When a civilization transitions to either Type IIa or Type IIb, it becomes ‘visible.’ A sphere of ‘visibility’ expands from the location of the civilization at the speed of light, and all observers within this expanding sphere become alerted to the existence of the Type II civilization.

5. With probability a, locate a random point x within the unit sphere which is a candidate site for a new Type 0 civilization. If that point is within the colonization sphere of any existing civilization, no new Type 0 civilization is generated. Otherwise, create a new Type 0 civilization at that point.

To sum up, each civilization in our universe has four variables associated with it.

1. The location of its ‘home planet’, in three-dimensional coordinates.

2. Its civilization stage (Type 0, Type IIa, Type IIb, Type III)

In each step of the simulation, we need to store these quantities in a data structure. The data structure for the simulation should have a slot for each time step of the simulation; in each slot, we will store a list of civilizations with their associated quantities.

After we have run the simulation, we can extract various quantities of interest from the data structure, for example, we might want to know:

1. How many Type 0 civilizations exist at each time step?

2. How many of these Type 0 civilizations are in the ‘sphere of visibility’ of a Type II+ civilization, at each time step?

II. Implementation

The program we will be using is the R programming environment, an open-source computing environment commonly used for statistical data analysis. No programming experience is assumed, but readers are expected to be able to learn by example. Conversely, readers with programming skills can skip to the quoted code blocks, reading the explanations when needed.

After you install and run R, you should see a blank terminal. To run the code, simply copy and paste the provided code into the buffer and press ‘enter.’ We give the code in blocks so as to make copying and pasting more convenient. Here is the first block of code—don’t be alarmed! We will walk through the code line-by-line immediately afterwards.

# Example 1
s = 4e-4
a = 0.10
b = 0.20; c = 0.05; d = 0.05
e = 0.10
k = 0.90
civs = list()
earth_coordinates = c(0.1,0.2,0.3,sqrt(.86))
human_civ = list(location=earth_coordinates,type=0,r_v=0,r_c=0,age=0)
zerg_civ = list(location=c(0.2,0.1,0.3,sqrt(.86)),type=3,r_v=0.2,r_c=0.1,age=20)
civs[[1]] = human_civ
civs[[2]] = zerg_civ

Now we’ll explain the code for “Example 1”, starting from the top.

1. The first line is a comment. All of the text to the right of a hash ‘#’ symbol is ignored by the interpreter

2. ‘s = 4e-4’ sets the speed of light to 4 x 10^-4. The ‘e’ stands for “times ten to the power of...”

3. sets the probability a to 1100. Recall this is the base rate for a Type 0 civilization to emerge in a time step.

4. sets the probabilities b, c, d. The semicolon ‘;’ separates multiple commands that are on the same line

5. sets the probability e, (type IIb → type III transition)

6. sets the constant k to 0.9. That is, the maximum speed of colonization is taken to be 90% of the speed of light.

7. civs is a data structure for storing the information of all the civilizations which exist at a particular time step. It will be a list, with an element for each civilization in existence at that time

8. earth_coordinates is a vector with three elements: (0.1, 0.2, 0.3). The c() function forms a vector out of the enclosed numbers. As you might guess, this particular vector is the spatial coordinates of planet Earth in our example.

9. human_civ is a data structure containing all the information needed to describe (our) civilization: location will be the vector given by earth_coordinates: (0.1,0.2,0.3); type=0 indicates a Type 0 civilization, r_v is the radius of our sphere of visibility, which is zero because we haven’t reached Type II yet. r_c is the radius of our sphere of colonization, which is zero because we haven’t reached Type III yet.

10. zerg_civ holds the information for another civilization, which is older than ours by 400 million years. It has a home planet with coordinates (0.2,0.1,0.2), which has reached Type III, and which can be seen by all observers within distance 0.2, and which has colonized all planets within distance 0.1 of its home planet!

11. civs[[1]] = human_civ sets the first element of the list civs to be the data structure encapsulation information about our civilization. Yes, you can have the elements of lists be other lists—we will be doing that a lot.

12. civs[[2]] = zerg_civ sets the second element of the list civs to be the information about the Zerg civilization

(In this code, we will have the parameters a, b, c, d, e, k,s be global variables. This is not the best coding practice, but it simplifies the code for this tutorial.)

It is certainly cause for alarm that there is a hostile Type III civilization in our vicinity. Can we see them? Have we already been colonized by them? To answer these questions, we need to program a distance function. We’ll be lazy here and use the Euclidean distance, instead of calculating the length of the geodesic on a 3-sphere.

# Code Block 1
distance = function(x,y) {
return(sqrt(sum((x-y)^2)))
}

1. distance is the name of the function. = function(x,y) tells R that this is to be a function with two arguments, x and y. The { brace, or ‘begin-block symbol’, means that the subsequent code will be part of the functionthe return() command is for returning the ‘answer’ computed by the function. In this case, the answer is the Euclidean distance of the vectors x and y; sqrt() evaluates the square root of the argument. sum() evaluates the sum of the components of a vector. ^2 applied to a vector squares each component of the vector.

2. } or ‘end-block symbol’ indicates that we are done writing the code for the function

After you copy and paste the above code, there will be a new function in the environment, distance(). We will use it now

# Example 2
x = human_civ\$location; x
y = zerg_civ\$location; y
d = distance(x,y); d
d < zerg_civ\$r_v
if (d < zerg_civ\$r_v) { print(“enemy sighted!”) }
if (d < zerg_civ\$r_c) { print(“war has begun...”) }

Line 2: look back at how we defined the list human_civ. The \$ symbol allows you to access the element of that list labeled as the ‘location.’

Line 4: Recall x is the coordinates of our planet, and y is the coordinates of the zerg home planet. We call the function we just defined to find the distance between our home planet and the zerg home planet, and then we store the answer in a new variable, d

Line 5: A boolean expression: is d less than the zerg’s radius of visibility?

Line 6 & 7: The if() statement executes the code in brackets only if the statement in the parentheses is true

The ‘<’ symbol is obviously the comparison ‘less than’. Less obvious: ‘>=’ for greater-or-equal-to, ‘==’ for equal to. A common newbie mistake is to confuse the assignment ‘=’ symbol for the equality ‘==’ symbol—they have very different meanings!

EXERCISE 1: Add some additional civilizations to the list civs manually. We code the civilization types as follows

• type=0 for Type 0 civs

• type=1 for Type IIa (non-expansionist) civs

• type=2 for Type IIb civs

• type=3 for Type III (expansionist) civs

Now we start on coding the simulation step. Recall the simulation step consist of two parts. In the first step, we roll to see if any of the existing civilizations transition to another stage, and we update the radii of visibility and colonization for Type II+ civilizations. In the second step, we roll to see if a new Type 0 civilization emerges.

We will code a separate transition function for each Type of civilization. We’ll start with the Type IIb and Type III civilizations, they are the easiest.

# Code Block 2
transition_type2b = function(civ) {
civ\$r_v = civ\$r_v + s
civ
}
transition_type3 = function(civ) {
civ\$r_v = civ\$r_v + s
civ\$r_c = civ\$r_c + k * s
civ
}

Notice that there is no return() statement in either function. That is because a function automatically returns the value in the last line (civ in this case) whenever it makes it all the way to the last line.

Now, test to see if the transition function works properly on the Zerg civ. Also test the transition for Type 2b planets on any Type 2b planet you created in Exercise 1.

# Example 3
zerg_civ = civs[[2]]
zerg_civ = transition_type3(zerg_civ)
zerg_civ
civs[[2]] = zerg_civ

Notice how the transition function works. The transition function returns an updated version of the civ it was given. But it does not update the original civ automatically: you have to make sure that the updated state gets stored in the appropriate place.

Now, we want to make a function which automatically chooses the right transition for a given civ. Also, we can make it keep track of the civilization’s age.

# Code Block 3
transition_civ = function(civ) {
civ\$age = civ\$age + 1
type = civ\$type
if (type == 0) { civ = transition_type0(civ) }
if (type == 1) { civ = transition_type2a(civ) }
if (type == 2) { civ = transition_type2b(civ) }
if (type == 3) { civ = transition_type3(civ) }
civ
}

EXERCISE: What happens if you call the transition() function on a dead (type=-1) civilization?

In order to handle the transition rolls for type 0 and type IIa civilizations, we need a way to generate random numbers. R has a built-in function to generate random numbers from 0 to 1, the runif() function, which we demonstrate.

# Code Block 4
transition_type2a = function(civ) {
civ\$r_v = civ\$r_v + s
if (runif(1) < e) {
civ\$type = 3
}
civ
}

The command runif(1) returns one random number between 0 and 1. See that the boolean statement (runif(1) < e) is true with probability e.

Now the transition for a Type 0 civilization is more complicated: it has a probability of dying, of going to Type IIa, of going to Type IIb, or staying type 0, and all these events are mutually exclusive. There is a simple trick for handling all that with a single roll.

# Code Block 5
transition_type0 = function(civ) {
roll = runif(1)
if (roll < b) { civ\$type = −1; return(civ) }
roll = roll—b
if (roll < c) { civ\$type = 1; return(civ) }
roll = roll—c
if (roll < d) { civ\$type = 2; return(civ) }
civ
}

EXERCISE: Does the above code indeed transition the civ to: dead with probability b, Type IIa with probability c, Type IIb with probability d?

Finally, we make a function for automatically applying the transition function to every civ in a list. How we will go through the list is to use a for() loop to access each of the elements civs[[1]], civs[[2]], all the way up to civs[[no_civs]].

# Code Block 6
transition_all = function(civs) {
no_civs = length(civs)
if (no_civs == 0) { return(civs) }
for (i in 1:no_civs) {
civ = civs[[i]]
civ = transition_civ(civ)
civs[[i]] = civ
}
civs
}

1. [comment]

2. Recall the usage of the function: you should call civs = transition_all(civs) to update the list civs

3. This step is important since R will give an error if you try to access an element in an empty list

4. The function length() returns the number of elements contained in a list.

5. This for loop creates a variable i initialized at 1. At the end of every iteration, it increments i by one. It stops iterating when i exceeds no_civs.

6. Recall this accesses the ith element

8. Store the updated civ in the list

9. Marks the end of the for() loop

10. Returns the updates list civs

EXERCISE: Check to see that this properly updates the civilizations in your list. And by the way, what civilization does Humanity become in the end?

Now the last step of the simulation is to randomly decide if a new civilization emerges.

PLANET-GENERATION PROCEDURE

1. Generate a random number from 0 to 1. If it is less than a, no new civilization emerges, and we exit the procedure.

2. Supposing we passed step 1, now we pick a random point x in the universe. If that point is in an uninhabited region of space we create a new Type 0 civilization with home planet located at x. But if point x is located within the sphere of colonization of any Type III civilization, no new civilization is generated.

For this, we need a way to draw random points on the unit 3-sphere, and a method for checking whether a point lies within the sphere of colonization of an existing civilization. There is a cool trick from probability theory for generating points from the surface of an N-sphere, which takes advantage of the fact that the multivariate Gaussian has spherical symmetry. The rnorm() function samples from the standard Gaussian with mean 0 and variance 1. After we generate a vector which has 4 independently and identically distributed mean 0 Gaussian components, all we have to do is normalize it so that it has radius 1.

# Code Block 7
random_in_sphere = function() {
x = rnorm(4)
r = sqrt(sum(x^2))
return(x/​r)
}

EXERCISE: Try out the new function by entering the command random_in_sphere(), see that you get random points with radius 1.

Next we need to code the function is_uncolonized(x), which returns FALSE if the point x is within the sphere of colonization of an existing civilization, and TRUE otherwise. First, we check to see if there are any existing civilizations in the universe at all. If not, there is nothing to check, and we return TRUE instantly. But if there are any existing civilizations, we will have to see if our point x is within the colonization sphere of any of the existing civilizations. We will go through the list of existing civilizations one by one. using a for() loop. If at any point we find that x is within a colonization sphere, we return FALSE. However, if we make it through the for() loop, that means the point x was in none of the colonization spheres, so we can return TRUE.

# Code Block 8
is_uncolonized = function(civs,x) {
no_civs = length(civs)
if (no_civs == 0) {
return(TRUE)
}
for (i in 1:no_civs) {
current_civ = civs[[i]]
y = current_civ\$location
if (distance(x,y) < current_civ\$r_c) {
return(FALSE)
}
}
return(TRUE)
}

EXERCISE: Test these functions.

We are almost done with the simulation part of the code. We write a function to implement the planet generation step, and another to handle the simulation step in its entirety.

# Code Block 9
planet_generation = function(civs) {
no_civs = length(civs)
# with probability 1-a, no is created civilization
if (runif(1) > a) { return(civs) }
#now we pick a random point
x = random_in_sphere()
#is it uninhabited?
if (is_uncolonized(civs,x)) {
#yay! now we get to add a new civilization
no_civs = no_civs + 1
new_civ = list(location=x, type=0, r_v=0, r_c=0)
civs[[no_civs]] = new_civ
}
civs
}
simulation_step = function(civs) {
civs = transition_all(civs)
civs = planet_generation(civs)
civs
}

EXERCISE: Run a few simulation steps. Increase the parameters a, b, etc. if things are too slow.

Now, we’ll write a function for generating the entire simulation history. The function will return a list, history, whose tth element will be the list of civilizations civs at time step t of the simulation.

# Code Block 10
simulation = function() {
history = list()
civs = list()
for (t in 1:1000) {
civs = simulation_step(civs)
history[[t]] = civs
}
history
}

EXERCISE: Run an entire simulation. How many civilizations are around at time 1000?

Fantastic! Now we can generate entire simulated histories of universes with adjustable parameters! Now what?

The reader quickly finds that it’s a cumbersome task to manually inspect all of these history objects generated by the simulation.

We leave it as a challenge to any reader experienced in R to come up with a function for visually displaying history objects in a pleasing manner.

In the second part of this tutorial, we will generate lots and lots of history objects, and write functions for extracting relevant information from these objects, in order to do Bayesian inference.

EXERCISE: Modify the simulation so that all civilizations of type II or lower which come within the colonization sphere of a Type III civilization are destroyed. (Remember that destroyed Type II civs are still ‘visible’!)

EXERCISE: Implement the correct distance function for the length of the geodesic between two points on a 3-sphere. Does this affect results in Part 2?