Due to Python’s style of reference passing, most of these print statements will show matching id values even if you use any kind of object, not just True/False. Try to predict the output here, then run it to check:
def compare(x, y):
    print(x == y, id(x) == id(y), x is y)
a = {"0": "1"}
b = {"0": "1"}
print(a == b, id(a) == id(b), a is b)
compare(a, b)
c = a
d = a
print(c == d, id(c) == id(d), c is d)
compare(c, d)
When I was coming up with an answer to this question, I got stuck on what the operator is did. I only had a vague sense of how to use it—I knew comparison with None was done via is but didn’t know why—so I had to look up what is actually did.
Identity comparisons
The operators “is” and “is not” test for an object’s identity: “x is y” is true if and only if x and y are the same object. An Object’s identity is determined using the “id()” function. “x is not y” yields the inverse truth value.
Here’s the doc for id():
id(obj, /)
Return the identity of an object.
This is guaranteed to be unique among simultaneously existing objects.
(CPython uses the object’s memory address.)
Then I understood that is would literally check if two objects are the same object. So in the above example we’d get True False False from print(a == b, id(a) == id(b), a is b) and True True True from print(c == d, id(c) == id(d), c is d).
Object Storage in Memory
Speaking of checking whether two objects are the same object stored in the same location in memory, gilch made more comments about object storage models (paraphrased):
Compared to C/C++, Python has a more consistent object storage model: everything is an object, only references to objects are stored on the stack, pointing to the actual objects stored in the heap. This means that Python objects are scattered all over memory. One important aspect of CPU optimization is caching contiguous blocks of memory in CPU caches, but Python’s model causes a high cache-miss rate, since two objects adjacent to each other in memory are likely unrelated. This performance degradation is the price of Python’s simple memory model.
For computing tasks with high performance requirements, NumPy is optimized to make use of contiguous blocks of memory.
== and is
Some remarks gilch made about == and is:
The == operator calls the __eq__ method of an object. The default __eq__, inherited from object, uses is: it checks whether the two operands are the same object (the Python data model documentation describes this default under object.__eq__). We can have two instances of a number, but not two instances of True or False.
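A minimal sketch of the difference, assuming two made-up classes (Plain and Point are illustrative names, not from the session). Plain relies on the default __eq__, while Point defines its own:

```python
class Plain:
    """No __eq__ defined, so == falls back to the identity check from object."""
    def __init__(self, value):
        self.value = value


class Point:
    """Defines __eq__, so == compares values rather than identities."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.value == other.value


p1, p2 = Plain(1), Plain(1)
plain_equal = p1 == p2          # False: two distinct objects, default __eq__
plain_same = p1 == p1           # True: the very same object

q1, q2 = Point(1), Point(1)
point_equal = q1 == q2          # True: custom __eq__ compares values
point_identical = q1 is q2      # False: still two separate objects

bool_identical = bool(1) is True   # True: there is only one True object
```

So == can be customized per class, while is always compares identity and cannot be overridden.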
{} is used to represent both sets and dictionaries, but {} itself would be interpreted as an empty dictionary instead of an empty set:
>>> type({})
<class 'dict'>
To make an empty set, we’d use the set() constructor:
>>> set()
set()
Gilch gave me a puzzle: make an empty set without using the set() constructor.
I came up with the answer {1} - {1} pretty quickly, but gilch had another solution in mind that did not involve using any numbers or letters. Hint: passing in iterables to a constructor results in different values than passing in the same iterables in expressions:
Using splat, the other way to make an empty set without using the set() constructor is
>>> {*[]}
set()
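To compare the approaches side by side, here is a small sketch. The variable names are made up for illustration, and the tuple variant is an extra assumption of mine rather than something from the session:

```python
empty_by_difference = {1} - {1}    # set difference of two equal sets
empty_by_list_splat = {*[]}        # unpack an empty list into a set display
empty_by_tuple_splat = {*()}       # same idea with an empty tuple
empty_dict = {}                    # bare braces still produce a dict

results = [empty_by_difference, empty_by_list_splat, empty_by_tuple_splat]
all_empty_sets = all(type(r) is set and len(r) == 0 for r in results)
```

The splat works because {*iterable} is a set display: the star unpacks the (empty) iterable into it, so Python never has to parse the ambiguous {}.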
Magic Methods for Attributes (Continued from last time)
When I was working on the solution that involved modifying the __dict__ last time, I was getting pretty confused about the difference between dir(), vars() and __dict__.
Gilch started by asking me to construct a simple class and make an instance:
class SimpleClass:
    def __init__(self, x):
        self.x = x

sc = SimpleClass(42)
Then we listed out the attributes of sc in different ways:
The difference between dir and vars is that dir returns the names of all attributes of an object, including the attributes of its class and attributes inherited from its superclasses; on the other hand, vars only returns the attributes stored in the object’s own __dict__, which excludes inherited attributes. This StackOverflow question goes into more detail.
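A rough sketch of that difference, using a made-up two-level hierarchy (Base, Child and the attribute names are illustrative, not from the session):

```python
class Base:
    inherited = "from Base"


class Child(Base):
    on_class = "from Child"

    def __init__(self):
        self.on_instance = "from the instance"


obj = Child()

instance_names = set(vars(obj))        # only what lives in obj.__dict__
all_names = set(dir(obj))              # instance + class + superclasses
same_dict = vars(obj) is obj.__dict__  # vars(obj) simply returns obj.__dict__
```

Here vars(obj) sees only on_instance, vars(Child) sees on_class but not inherited, and dir(obj) sees all three names along with everything inherited from object.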
__mro__
__mro__ stands for “method resolution order,” which provides the inheritance path from the current class all the way up to object. It is honestly the most handy tool I’ve learned from this session.
Note that __mro__ is a class attribute, not an instance attribute:
>>> sc.__mro__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass' object has no attribute '__mro__'
>>> type(sc).__mro__
(<class '__main__.SimpleClass'>, <class 'object'>)
Magic Methods for Attributes
Now we can verify that dir(sc) returns the union of the attribute names in vars(sc), vars(SimpleClass) and vars(object):
>>> vars(sc)
{'x': 42}
>>> vars(type(sc))
mappingproxy({'__module__': '__main__', '__init__': <function SimpleClass.__init__ at 0x7f2ce3b79dc0>, '__dict__': <attribute '__dict__' of 'SimpleClass' objects>, '__weakref__': <attribute '__weakref__' of 'SimpleClass' objects>, '__doc__': None})
>>> vars(object)
mappingproxy({'__repr__': <slot wrapper '__repr__' of 'object' objects>, '__hash__': <slot wrapper '__hash__' of 'object' objects>, '__str__': <slot wrapper '__str__' of 'object' objects>, '__getattribute__': <slot wrapper '__getattribute__' of 'object' objects>, '__setattr__': <slot wrapper '__setattr__' of 'object' objects>, '__delattr__': <slot wrapper '__delattr__' of 'object' objects>, '__lt__': <slot wrapper '__lt__' of 'object' objects>, '__le__': <slot wrapper '__le__' of 'object' objects>, '__eq__': <slot wrapper '__eq__' of 'object' objects>, '__ne__': <slot wrapper '__ne__' of 'object' objects>, '__gt__': <slot wrapper '__gt__' of 'object' objects>, '__ge__': <slot wrapper '__ge__' of 'object' objects>, '__init__': <slot wrapper '__init__' of 'object' objects>, '__new__': <built-in method __new__ of type object at 0x955f60>, '__reduce_ex__': <method '__reduce_ex__' of 'object' objects>, '__reduce__': <method '__reduce__' of 'object' objects>, '__subclasshook__': <method '__subclasshook__' of 'object' objects>, '__init_subclass__': <method '__init_subclass__' of 'object' objects>, '__format__': <method '__format__' of 'object' objects>, '__sizeof__': <method '__sizeof__' of 'object' objects>, '__dir__': <method '__dir__' of 'object' objects>, '__class__': <attribute '__class__' of 'object' objects>, '__doc__': 'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'})
>>> type(sc)
<class '__main__.SimpleClass'>
>>> list(vars(sc).keys()) + list(vars(SimpleClass).keys()) + list(vars(object).keys())
['x', '__module__', '__init__', '__dict__', '__weakref__', '__doc__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__init__', '__new__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__', '__doc__']
>>> set(_) == set(dir(sc))
True
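Why did we need to convert the two lists to sets when comparing them at the end?
Two of the attributes, __init__ and __doc__, were overridden, so each appears twice in the concatenated list. The concatenated list is also in a different order from dir(sc), which returns its names sorted alphabetically.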
>>> SimpleClass.__init__
<function SimpleClass.__init__ at 0x7f2ce3b79dc0>
>>> object.__init__
<slot wrapper '__init__' of 'object' objects>
>>> SimpleClass.__doc__
>>> object.__doc__
'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'
Inheritance and __mro__
Noticing that I didn’t understand inheritance completely, gilch gave another example.
class SimpleClass:
    def __init__(self, x):
        self.x = x
    x = 42

class SimpleClass2:
    x = 24

class SimpleClass3(SimpleClass, SimpleClass2):
    pass
Here, SimpleClass3 inherits from SimpleClass and SimpleClass2. Both SimpleClass and SimpleClass2 define a class attribute x. Which one would SimpleClass3 have?
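One way to check is to read the attribute off the class and look at __mro__. The sketch below repeats the classes so it runs on its own; SimpleClass3Reversed is a hypothetical name I added to show the opposite base order:

```python
class SimpleClass:
    x = 42

    def __init__(self, x):
        self.x = x


class SimpleClass2:
    x = 24


class SimpleClass3(SimpleClass, SimpleClass2):
    pass


class SimpleClass3Reversed(SimpleClass2, SimpleClass):  # hypothetical name
    pass


first = SimpleClass3.x              # looked up along SimpleClass3.__mro__
second = SimpleClass3Reversed.x     # same lookup, bases listed the other way
mro_names = [cls.__name__ for cls in SimpleClass3.__mro__]
instance_value = SimpleClass3(0).x  # the instance attribute shadows the class one
```

The class-level lookup walks __mro__ from left to right and stops at the first class that defines x, so the order of the bases in the class statement decides the result.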
class SimpleClass4:
    __slots__ = ()

sc4 = SimpleClass4()
>>> sc4.__dict__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass4' object has no attribute '__dict__'
>>> SimpleClass4.x = 42
>>> sc4.x = 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass4' object attribute 'x' is read-only
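What happened here is that by defining __slots__ we have removed the __dict__ attribute from every instance of SimpleClass4. That is why sc4.x = 0 fails: x exists only as a class attribute, and the instance has nowhere to store a value of its own. Without a per-instance __dict__, no new instance attributes can be added, which means less memory used.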
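A small sketch of the same effect with a non-empty __slots__. WithSlots and WithoutSlots are hypothetical names chosen for illustration:

```python
class WithSlots:
    __slots__ = ("x",)    # only 'x' may be stored on an instance


class WithoutSlots:
    pass


slotted = WithSlots()
slotted.x = 1                               # allowed: 'x' is a declared slot
has_dict = hasattr(slotted, "__dict__")     # no per-instance dictionary

try:
    slotted.y = 2                           # 'y' was never declared
except AttributeError as exc:
    error = exc
else:
    error = None

regular = WithoutSlots()
regular.y = 2                               # stored in regular.__dict__
```

Declared slots get fixed storage on the instance, and anything else is rejected, whereas an ordinary class keeps a dictionary per instance that accepts any attribute name.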
We get a tuple object when we call NewTuple(). However, this only works when the first argument to super().__new__ is tuple or a subtype of tuple. If we pass in list, which is not a subclass of tuple, we would get an error:
class NewTuple(tuple):
    def __init__(self, x):
        print(x)
    def __new__(cls, y):
        return super().__new__(list, [y])  # passing in list instead of tuple
>>> NewTuple(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "code.py", line 8, in __new__
return super().__new__(list, [y])
TypeError: tuple.__new__(list): list is not a subtype of tuple
Of course, we can always pass in the current class to make the constructor return an instance of the current class:
class NewTuple(tuple):
    def __init__(self, x):
        print(x)
    def __new__(cls, y):
        return super().__new__(cls, [y])  # passing in cls
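One consequence worth seeing directly: Python only calls __init__ when __new__ returns an instance of the class being constructed. The sketch below uses two hypothetical classes (AlwaysPlain and Wrapped) and records __init__ calls in a list instead of printing:

```python
init_calls = []   # records which classes had __init__ run (illustration only)


class AlwaysPlain(tuple):                    # hypothetical name
    def __new__(cls, y):
        return super().__new__(tuple, [y])   # builds a plain tuple

    def __init__(self, y):
        init_calls.append("AlwaysPlain")


class Wrapped(tuple):                        # hypothetical name
    def __new__(cls, y):
        return super().__new__(cls, [y])     # builds an instance of cls

    def __init__(self, y):
        init_calls.append("Wrapped")


plain = AlwaysPlain(2)
wrapped = Wrapped(2)
```

Both objects compare equal to (2,), but plain is an ordinary tuple and never reaches AlwaysPlain.__init__, while wrapped is a Wrapped instance and does run its __init__.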
Next puzzle from gilch: make a @trace decorator that prints inputs and return values.
I came up with a first pass solution:
def trace(f):
    return lambda *args: print(*args, f(*args))

@trace
def addition(x, y):
    return x + y
>>> addition(2, 5)
2 5 7
Then gilch added a condition: the decorated function still needs to return the same value as the undecorated version.
I was pretty stumped on this one. It seemed that I’d need two different statements in the lambda function returned by the decorator for this to work, one to do the printing and the other one to return the value. So gilch gave me a hint: Think about what the expression print('hi') or 1 + 2 evaluates to. Then it occurred to me that, since print returns None, I could use or to combine statements as long as only one of them evaluates to something with boolean value True. After an attempt, I also realized that the statement that produces the True value would need to come last to prevent the expression evaluation being short-circuited.
def trace(f):
    r = []
    return lambda *args, **kwargs: r.append(f(*args, **kwargs)) or print(*r, args, kwargs) or r.pop()

@trace
def addition(x, y):
    return x + y
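The short-circuit behaviour that the hint relies on can be checked in isolation. In this sketch, noisy is a made-up helper that records when each operand is evaluated:

```python
calls = []


def noisy(label, value):
    """Record that this operand was evaluated, then return value."""
    calls.append(label)
    return value


# None is falsy, so `or` moves on and returns the right-hand operand.
first = noisy("left", None) or noisy("right", 3)
calls_after_first = list(calls)
calls.clear()

# A truthy left operand short-circuits: the right side never runs.
second = noisy("left", 3) or noisy("right", None)
calls_after_second = list(calls)

# The last operand is returned as-is, even when it is falsy.
third = None or None or 0
```

The last case is why the or chain in trace still returns the right value when the decorated function returns something falsy such as 0: r.pop() is the final operand, and or hands back the final operand unchanged.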
Gilch asked me to write a function named progn that takes any number of parameters and only returns the last one. Using progn, we can get rid of the or’s:
def progn(*args):
    return args[-1]

def trace(f):
    r = []
    return lambda *args, **kwargs: progn(r.append(f(*args, **kwargs)), print(*r, args, kwargs), r.pop())

@trace
def addition(x, y):
    return x + y
def progn(*args):
    return args[-1]

def trace(f):
    return lambda *args, **kwargs: progn(
        r := f(*args, **kwargs),  # moved r inside of the lambda
        print(args, kwargs, r),
        r)

@trace
def addition(x, y):
    return x + y
>>> addition(2, y=3)
(2,) {'y': 3} 5
5
Earlier I was stumped because I wanted to put two statements inside a lambda function but couldn’t. With progn and :=, it’s possible to combine multiple expressions into one, effectively creating a lambda with multiple statements.
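For comparison, here is a sketch of how the same decorator might look with an ordinary def instead of a lambda. This version is not from the session; functools.wraps is my own addition, on the assumption that one would usually want the wrapper to keep the decorated function’s name and docstring:

```python
import functools


def trace(f):
    @functools.wraps(f)              # keep f's __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        result = f(*args, **kwargs)
        print(args, kwargs, result)  # show inputs and return value
        return result                # pass the value through unchanged
    return wrapper


@trace
def addition(x, y):
    return x + y


total = addition(2, y=3)   # prints: (2,) {'y': 3} 5
```

The lambda versions squeeze the same three steps (call, print, return) into a single expression, which is what progn and := make possible.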
An Apprentice Experiment in Python Programming, Part 4
[Note to readers: The Jupyter notebook version of this post is here]
Previously: https://www.lesswrong.com/posts/fKTqwbGAwPNm6fyEH/an-apprentice-experiment-in-python-programming-part-3
Python Objects in Memory (from comments)
In the previous post, purge commented:
When I was coming up with an answer to this question, I got stuck on what the operator is did. I only had a vague sense of how to use it (I knew comparison with None was done via is but didn’t know why), so I had to look up what is actually did.
Here’s the doc for id():
Then I understood that is would literally check if two objects are the same object. So in the above example we’d get True False False from print(a == b, id(a) == id(b), a is b) and True True True from print(c == d, id(c) == id(d), c is d).
Object Storage in Memory
Speaking of checking whether two objects are the same object stored in the same location in memory, gilch made more comments about object storage models (paraphrased):
Compared to C/C++, Python has a more consistent object storage model: everything is an object, only references to objects are stored on the stack, pointing to the actual objects stored in the heap. This means that Python objects are scattered all over memory. One important aspect of CPU optimization is caching contiguous blocks of memory in CPU caches, but Python’s model causes a high cache-miss rate, since two objects adjacent to each other in memory are likely unrelated. This performance degradation is the price of Python’s simple memory model.
For computing tasks with high performance requirements, NumPy is optimized to make use of contiguous blocks of memory.
== and is
Some remarks gilch made about == and is:
The == operator calls the __eq__ method of an object. The default __eq__, inherited from object, uses is: it checks whether the two operands are the same object (the Python data model documentation describes this default under object.__eq__). We can have two instances of a number, but not two instances of True or False.
{} and Set Constructor
We went into a tangent where gilch checked my understanding of sets. We encountered some corner cases like Python interpreting True as 1 and False as 0:
{} is used to represent both sets and dictionaries, but {} itself would be interpreted as an empty dictionary instead of an empty set:
To make an empty set, we’d use the set() constructor:
Gilch gave me a puzzle: make an empty set without using the set() constructor.
I came up with the answer {1} - {1} pretty quickly, but gilch had another solution in mind that did not involve using any numbers or letters. Hint: passing in iterables to a constructor results in different values than passing in the same iterables in expressions:
Using splat, the other way to make an empty set without using the set() constructor is
Magic Methods for Attributes (Continued from last time)
When I was working on the solution that involved modifying the __dict__ last time, I was getting pretty confused about the difference between dir(), vars() and __dict__.
Gilch started by asking me to construct a simple class and make an instance:
Then we listed out the attributes of sc in different ways:
The difference between dir and vars is that dir returns the names of all attributes of an object, including the attributes of its class and attributes inherited from its superclasses; on the other hand, vars only returns the attributes stored in the object’s own __dict__, which excludes inherited attributes. This StackOverflow question goes into more detail.
__mro__
__mro__ stands for “method resolution order,” which provides the inheritance path from the current class all the way up to object. It is honestly the most handy tool I’ve learned from this session.
Note that __mro__ is a class attribute, not an instance attribute:
Magic Methods for Attributes
Now we can verify that dir(sc) returns the union of the attribute names in vars(sc), vars(SimpleClass) and vars(object):
Why did we need to convert the two lists to sets when comparing them at the end?
Two of the attributes, __init__ and __doc__, were overridden, so each appears twice in the concatenated list. The concatenated list is also in a different order from dir(sc), which returns its names sorted alphabetically.
Inheritance and __mro__
Noticing that I didn’t understand inheritance completely, gilch gave another example.
Here, SimpleClass3 inherits from SimpleClass and SimpleClass2. Both SimpleClass and SimpleClass2 define a class attribute x. Which one would SimpleClass3 have?
However, this changes when we switch the order of inheritance:
So the inheritance order decides which superclass takes precedence. The Python documentation on method resolution order as well as this talk give more detailed explanations of the algorithm.
__slots__
__slots__ is used for saving memory.
What happened here is that by defining __slots__ we have removed the __dict__ attribute from every instance of SimpleClass4. Without a per-instance __dict__, no new instance attributes can be added, which means less memory used.
As we can see here, sc4 does not have a __dict__ attribute, so vars(sc4) has become invalid too.
Accessing Attributes of a Superclass
Next, gilch provided an example of using the built-in super. First, we create a class NewTuple that inherits from tuple:
Then we can access the constructor of the superclass by calling super().__new__ and passing in the tuple class as the first argument:
We get a tuple object when we call NewTuple(). However, this only works when the first argument to super().__new__ is tuple or a subtype of tuple. If we pass in list, which is not a subclass of tuple, we would get an error:
Of course, we can always pass in the current class to make the constructor return an instance of the current class:
Trace
Next puzzle from gilch: make a @trace decorator that prints inputs and return values.
I came up with a first pass solution:
Then gilch added a condition: the decorated function still needs to return the same value as the undecorated version.
I was pretty stumped on this one. It seemed that I’d need two different statements in the lambda function returned by the decorator for this to work, one to do the printing and the other one to return the value. So gilch gave me a hint: Think about what the expression print('hi') or 1 + 2 evaluates to. Then it occurred to me that, since print returns None, I could use or to combine statements as long as only one of them evaluates to something with boolean value True. After an attempt, I also realized that the statement that produces the True value would need to come last to prevent the expression evaluation being short-circuited.
Progn
Gilch asked me to write a function named progn that takes any number of parameters and only returns the last one. Using progn, we can get rid of the or’s:
Assignment Expression
Gilch introduced the assignment expression, and we rewrote the solution to use it:
Earlier I was stumped because I wanted to put two statements inside a lambda function but couldn’t. With progn and :=, it’s possible to combine multiple expressions into one, effectively creating a lambda with multiple statements.