Due to Python’s style of reference passing, most of these print statements will show matching id values even if you use any kind of object, not just True/False. Try to predict the output here, then run it to check:
def compare(x, y):
    print(x == y, id(x) == id(y), x is y)
a = {"0": "1"}
b = {"0": "1"}
print(a == b, id(a) == id(b), a is b)
compare(a, b)
c = a
d = a
print(c == d, id(c) == id(d), c is d)
compare(c, d)
When I was coming up with an answer to this question, I got stuck on what the operator is did. I only had a vague sense of how to use it—I knew comparison with None was done via is but didn’t know why—so I had to look up what is actually did.
Identity comparisons
The operators “is” and “is not” test for an object’s identity: “x is y” is true if and only if x and y are the same object. An Object’s identity is determined using the “id()” function. “x is not y” yields the inverse truth value.
Here’s the doc for id():
id(obj, /)
Return the identity of an object.
This is guaranteed to be unique among simultaneously existing objects.
(CPython uses the object’s memory address.)
Then I understood that is would literally check if two objects are the same object. So in the above example we’d get True False False from print(a == b, id(a) == id(b), a is b) and True True True from print(c == d, id(c) == id(d), c is d).
Object Storage in Memory
Speaking of checking whether two objects are the same object stored at the same location in memory, gilch made more comments about object storage models (paraphrased):
Compared to C/C++, Python has a more consistent object storage model: everything is an object, and only references to objects are stored on the stack, pointing to the actual objects stored in the heap. This means that Python objects are scattered all over the place. One important aspect of CPU optimization is caching contiguous blocks of memory in CPU caches, but Python’s model causes a high cache-miss rate, since two objects adjacent to each other in memory are likely unrelated. This performance degradation is the price for Python’s simple memory model.
For computing tasks with high performance requirements, NumPy is optimized to make use of blocks of contiguous memory.
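To make the contiguous-versus-scattered point a bit more concrete, here is a minimal sketch. It uses the standard library's array module as a stand-in for NumPy (my assumption, so the example runs without third-party packages); the variable names are made up for illustration.

```python
import sys
from array import array

values = [float(i) for i in range(1000)]

# A list stores references; each float is a separate object on the heap.
as_list = list(values)

# array('d') stores the raw doubles back to back in a single buffer.
as_array = array("d", values)

# The buffer holds exactly itemsize bytes per element, nothing more.
buffer_bytes = as_array.itemsize * len(as_array)
print(as_array.itemsize, buffer_bytes)

# Every element of the list is a full Python object with its own header,
# so it takes more room than the raw double it wraps.
print(sys.getsizeof(as_list[0]))
```

The array keeps the numbers themselves next to each other, while the list only keeps pointers next to each other; the floats they point to can live anywhere.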
== and is
Some remarks gilch made about == and is:
The == operator calls the __eq__ method of an object. The default __eq__, inherited from object, falls back on is: it checks whether the two objects are the same object. (Source: the description of object.__eq__ in the Python data model documentation.) We can have two instances of a number, but not two instances of True or False.
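A small sketch of these remarks, with two hypothetical classes (Plain and ByValue are names I made up): one relies on the default __eq__, the other defines its own.

```python
class Plain:
    """No __eq__ defined, so the default from object is used."""
    def __init__(self, value):
        self.value = value


class ByValue:
    """Hypothetical class that defines equality by comparing values."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, ByValue):
            return NotImplemented
        return self.value == other.value


p1, p2 = Plain(1), Plain(1)
v1, v2 = ByValue(1), ByValue(1)

print(p1 == p2, p1 is p2)   # False False: default __eq__ is an identity check
print(p1 == p1)             # True: an object is always identical to itself
print(v1 == v2, v1 is v2)   # True False: equal values, distinct objects
print(bool(1) is True, bool("x") is True)   # True True: True is a singleton
```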
{} is used to represent both sets and dictionaries, but {} itself would be interpreted as an empty dictionary instead of an empty set:
>>> type({})
<class 'dict'>
To make an empty set, we’d use the set() constructor:
>>> set()
set()
Gilch gave me a puzzle: make an empty set without using the set() constructor.
I came up with the answer {1} - {1} pretty quickly, but gilch had another solution in mind that did not involve using any numbers or letters. Hint: passing in iterables to a constructor results in different values than passing in the same iterables in expressions:
Using splat, the other way to make an empty set without using the set() constructor is
>>> {*[]}
set()
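Here is a short sketch of the hint: a constructor unpacks the iterable it receives, a set display does not, and a star inside the display brings the unpacking back. The variable names are only for illustration.

```python
# Passing an iterable to the constructor unpacks its elements...
from_constructor = set("ab")       # {'a', 'b'}

# ...while a set display keeps the iterable as a single element.
from_display = {"ab"}              # {'ab'}

# A star inside the display unpacks, like the constructor does.
from_splat = {*"ab"}               # {'a', 'b'}

# Unpacking any empty iterable therefore leaves an empty set.
empty_candidates = [{*[]}, {*()}, {*""}, {1} - {1}]
print(all(s == set() and type(s) is set for s in empty_candidates))   # True
```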
Magic Methods for Attributes (Continued from last time)
When I was working on the solution that involved modifying the __dict__ last time, I was getting pretty confused about the difference between dir(), vars() and __dict__.
Gilch started by asking me to construct a simple class and making an instance:
class SimpleClass:
    def __init__(self, x):
        self.x = x
sc = SimpleClass(42)
Then we listed out the attributes of sc in different ways:
The difference between dir and vars is that dir returns all attributes of an object, including the attributes of its class and attributes inherited from its superclasses; on the other hand, vars only returns attributes stored in the default __dict__ attribute, which excludes inherited attributes. This StackOverflow question goes into more details.
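A minimal sketch of that difference, using two hypothetical classes (Base and Child are not from the session) so that there is something inherited to look for:

```python
class Base:
    shared = "class attribute"

    def method(self):
        return "from Base"


class Child(Base):
    def __init__(self, x):
        self.x = x


obj = Child(42)

print(vars(obj))                    # {'x': 42}: only the instance __dict__
print(vars(obj) is obj.__dict__)    # True: vars(obj) returns that very dict
print("shared" in vars(obj), "shared" in dir(obj))   # False True
print("method" in vars(obj), "method" in dir(obj))   # False True
```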
__mro__
__mro__ stands for “method resolution order,” which provides the inheritance path from the current class all the way up to object. It is honestly the most handy tool I’ve learned from this session.
Note that __mro__ is a class attribute, not an instance attribute:
>>> sc.__mro__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass' object has no attribute '__mro__'
>>> type(sc).__mro__
(<class '__main__.SimpleClass'>, <class 'object'>)
Magic Methods for Attributes
Now we can verify that dir(sc) returns the union of the keys of vars(sc), vars(SimpleClass) and vars(object):
>>> vars(sc)
{'x': 42}
>>> vars(type(sc))
mappingproxy({'__module__': '__main__', '__init__': <function SimpleClass.__init__ at 0x7f2ce3b79dc0>, '__dict__': <attribute '__dict__' of 'SimpleClass' objects>, '__weakref__': <attribute '__weakref__' of 'SimpleClass' objects>, '__doc__': None})
>>> vars(object)
mappingproxy({'__repr__': <slot wrapper '__repr__' of 'object' objects>, '__hash__': <slot wrapper '__hash__' of 'object' objects>, '__str__': <slot wrapper '__str__' of 'object' objects>, '__getattribute__': <slot wrapper '__getattribute__' of 'object' objects>, '__setattr__': <slot wrapper '__setattr__' of 'object' objects>, '__delattr__': <slot wrapper '__delattr__' of 'object' objects>, '__lt__': <slot wrapper '__lt__' of 'object' objects>, '__le__': <slot wrapper '__le__' of 'object' objects>, '__eq__': <slot wrapper '__eq__' of 'object' objects>, '__ne__': <slot wrapper '__ne__' of 'object' objects>, '__gt__': <slot wrapper '__gt__' of 'object' objects>, '__ge__': <slot wrapper '__ge__' of 'object' objects>, '__init__': <slot wrapper '__init__' of 'object' objects>, '__new__': <built-in method __new__ of type object at 0x955f60>, '__reduce_ex__': <method '__reduce_ex__' of 'object' objects>, '__reduce__': <method '__reduce__' of 'object' objects>, '__subclasshook__': <method '__subclasshook__' of 'object' objects>, '__init_subclass__': <method '__init_subclass__' of 'object' objects>, '__format__': <method '__format__' of 'object' objects>, '__sizeof__': <method '__sizeof__' of 'object' objects>, '__dir__': <method '__dir__' of 'object' objects>, '__class__': <attribute '__class__' of 'object' objects>, '__doc__': 'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'})
>>> type(sc)
<class '__main__.SimpleClass'>
>>> list(vars(sc).keys()) + list(vars(SimpleClass).keys()) + list(vars(object).keys())
['x', '__module__', '__init__', '__dict__', '__weakref__', '__doc__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__init__', '__new__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__', '__doc__']
>>> set(_) == set(dir(sc))
True
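Why did we need to convert the two lists to sets when comparing them at the end?
Two of the attributes, __init__ and __doc__, are defined on both SimpleClass and object, so they appear twice in the concatenated list, while dir(sc) lists each name once, in sorted order. The SimpleClass versions override the ones from object: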
>>> SimpleClass.__init__
<function SimpleClass.__init__ at 0x7f2ce3b79dc0>
>>> object.__init__
<slot wrapper '__init__' of 'object' objects>
>>> SimpleClass.__doc__
>>> object.__doc__
'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'
Inheritance and __mro__
Noticing that I didn’t understand inheritance completely, gilch gave another example.
class SimpleClass:
    def __init__(self, x):
        self.x = x
    x = 42  # class attribute
class SimpleClass2:
    x = 24  # class attribute
class SimpleClass3(SimpleClass, SimpleClass2):
    pass
Here, SimpleClass3 inherits from SimpleClass and SimpleClass2. Both SimpleClass and SimpleClass2 define a class attribute x. Which one would SimpleClass3 have?
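One way to check the answer is to look the attribute up on the class and compare it with __mro__. The sketch below uses shortened, hypothetical class names (A, B, AB, BA) and also tries the bases in the opposite order:

```python
class A:
    x = 42

class B:
    x = 24

class AB(A, B):    # A is listed first
    pass

class BA(B, A):    # same bases, opposite order
    pass

# Attribute lookup walks __mro__ from left to right and stops at the
# first class that defines the name.
print(AB.x, [c.__name__ for c in AB.__mro__])   # 42 ['AB', 'A', 'B', 'object']
print(BA.x, [c.__name__ for c in BA.__mro__])   # 24 ['BA', 'B', 'A', 'object']
```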
class SimpleClass4:
    __slots__ = ()
sc4 = SimpleClass4()
>>> sc4.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass4' object has no attribute '__dict__'
>>> SimpleClass4.x = 42
>>> sc4.x = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass4' object attribute 'x' is read-only
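What happened here is that by defining __slots__ as an empty tuple, we stopped Python from creating a __dict__ for instances of SimpleClass4, so an instance can no longer store attributes of its own; assigning sc4.x fails because x exists only on the class. Not allocating a per-instance __dict__ means less memory used.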
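For contrast, here is a sketch of the more common case where __slots__ names the attributes an instance may have. Point is a hypothetical class, not one from the session.

```python
class Point:
    __slots__ = ("x", "y")   # only these two instance attributes are allowed

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10                        # fine: x is a declared slot
print(hasattr(p, "__dict__"))   # False: no per-instance dictionary exists

try:
    p.z = 3                     # z is not a declared slot
except AttributeError as exc:
    print("AttributeError:", exc)
```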
We get a tuple object when we call NewTuple(). However, this only works when the class passed to super().__new__ is a subtype of tuple. If we pass in list, which is not a subclass of tuple, we would get an error:
class NewTuple(tuple):
    def __init__(self, x):
        print(x)
    def __new__(cls, y):
        return super().__new__(list, [y]) # passing in list instead of tuple
>>> NewTuple(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "code.py", line 8, in __new__
    return super().__new__(list, [y])
TypeError: tuple.__new__(list): list is not a subtype of tuple
Of course, we can always pass in the current class to make the constructor return an instance of the current class:
class NewTuple(tuple):
    def __init__(self, x):
        print(x)
    def __new__(cls, y):
        return super().__new__(cls, [y]) # passing in cls
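A side effect worth seeing is that the class passed to super().__new__ also decides whether __init__ runs at all: Python only calls __init__ when __new__ returns an instance of the class being constructed. The sketch below uses two hypothetical classes (ReturnsTuple and ReturnsSelf) and records the __init__ calls in a list instead of printing.

```python
init_calls = []

class ReturnsTuple(tuple):
    def __new__(cls, y):
        # Builds a plain tuple, not an instance of ReturnsTuple.
        return super().__new__(tuple, [y])

    def __init__(self, x):
        init_calls.append(("ReturnsTuple", x))


class ReturnsSelf(tuple):
    def __new__(cls, y):
        # Passing cls makes tuple.__new__ build an instance of the subclass.
        return super().__new__(cls, [y])

    def __init__(self, x):
        init_calls.append(("ReturnsSelf", x))


a = ReturnsTuple(2)
b = ReturnsSelf(2)
print(type(a).__name__, type(b).__name__)   # tuple ReturnsSelf
print(init_calls)                           # [('ReturnsSelf', 2)]
```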
Next puzzle from gilch: make a @trace decorator that prints inputs and return values.
I came up with a first pass solution:
def trace(f):
    return lambda *args: print(*args, f(*args))
@trace
def addition(x, y):
    return x + y
>>> addition(2, 5)
2 5 7
Then gilch added a condition: the decorated function still needs to return the same value as the undecorated version.
I was pretty stumped on this one. It seemed that I’d need two different statements in the lambda function returned by the decorator for this to work, one to do the printing and the other one to return the value. So gilch gave me a hint: Think about what the expression print('hi') or 1 + 2 evaluates to. Then it occurred to me that, since print returns None, I could use or to combine statements as long as only one of them evaluates to something with boolean value True. After an attempt, I also realized that the statement that produces the True value would need to come last to prevent the expression evaluation being short-circuited.
def trace(f):
    r = []
    return lambda *args, **kwargs: r.append(f(*args, **kwargs)) or print(*r, args, kwargs) or r.pop()
@trace
def addition(x, y):
    return x + y
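To see why the ordering inside the or chain matters, here is a sketch that applies the same decorator to a function returning a falsy value. is_even is a hypothetical function added for the test; the decorator is the one from the session.

```python
def trace(f):
    r = []
    return lambda *args, **kwargs: (
        r.append(f(*args, **kwargs)) or print(*r, args, kwargs) or r.pop()
    )

@trace
def is_even(n):
    return n % 2 == 0

# print returns None, so `or` moves on and the whole expression is 3.
print(print("hi") or 1 + 2)

# append and print both return None, so evaluation always reaches r.pop().
# Being the last operand, its value is returned even when it is falsy.
result = is_even(3)
print(result)   # False
```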
Gilch asked me to write a function named progn that takes any number of parameters and only returns the last one. Using progn, we can get rid of the or’s:
def progn(*args):
    return args[-1]
def trace(f):
    r = []
    return lambda *args, **kwargs: progn(r.append(f(*args, **kwargs)), print(*r, args, kwargs), r.pop())
@trace
def addition(x, y):
    return x + y
def progn(*args):
    return args[-1]
def trace(f):
    return lambda *args, **kwargs: progn(
        r := f(*args, **kwargs), # moved r inside of the lambda
        print(args, kwargs, r),
        r)
@trace
def addition(x, y):
    return x + y
>>> addition(2, y=3)
(2,) {'y': 3} 5
5
Earlier I was stumped because I wanted to put two statements inside a lambda function but couldn’t. With progn and :=, it’s possible to combine multiple statements into one, so effectively create a lambda with multiple statements.
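For comparison only, this is roughly how the same decorator might look when written with an ordinary def instead of a lambda. It is a sketch, not something from the session; using functools.wraps to keep the original function name is my own addition.

```python
import functools

def trace(f):
    @functools.wraps(f)   # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        result = f(*args, **kwargs)
        print(args, kwargs, result)
        return result
    return wrapper

@trace
def addition(x, y):
    return x + y

total = addition(2, y=3)          # prints: (2,) {'y': 3} 5
print(total, addition.__name__)   # 5 addition
```

The def version needs no tricks because a function body can hold several statements; the lambda versions are interesting precisely because they have to squeeze everything into one expression.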
An Apprentice Experiment in Python Programming, Part 4
[Note to readers: The Jupyter notebook version of this post is here]
Previously: https://www.lesswrong.com/posts/fKTqwbGAwPNm6fyEH/an-apprentice-experiment-in-python-programming-part-3
Python Objects in Memory (from comments)
In the previous post, purge commented:
{} and Set Constructor
We went into a tangent where gilch checked my understanding of sets. We encountered some corner cases like Python interpreting True as 1 and False as 0:
SimpleClass3 takes x from SimpleClass, the first base class listed. However, this changes when we switch the order of inheritance:
So the inheritance order decides which superclass takes precedence. The Python documentation on method resolution order as well as this talk gives more detailed explanations of the algorithm.
__slots__
__slots__ is used for saving memory.
As we can see here, sc4 does not have a __dict__ attribute, so vars(sc4) has become invalid too.
Accessing Attributes of a Superclass
Next, gilch provided an example of using the built-in super. First, we create a class NewTuple that inherits from tuple:
Then we can access the constructor of the superclass by calling super().__new__ and passing in the tuple class as the first argument:
Trace
Progn
Assignment Expression
Gilch introduced the assignment expression, and we rewrote the solution to use it: