Decoding `for` statements in Python

Shaiq Kar
6 min readMar 14, 2023

--

Let’s start with a simple for statement:

for x in y:
z

Passing the code through the disassembler (dis module) gives us:

 2            0 LOAD_GLOBAL              0 (y)
2 GET_ITER
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 0 (x)

3 8 LOAD_GLOBAL 1 (z)
10 POP_TOP
12 JUMP_ABSOLUTE 4
>> 14 LOAD_CONST 0 (None)
16 RETURN_VALUE

Looking at the output you may notice that the GET_ITER and FOR_ITER opcodes seem to be specific to for statements while the rest are general opcodes.

Looking up GET_ITER in the eval loop, it appears to mostly be calling PyObject_GetIter(), that it is mostly equivalent to the built-in function iter(). That means there will probably be something to do with next().

Speaking of next(), the FOR_ITER opcode is semantically a call to __next__() on an object's type (as represented by (*iter->ob_type->tp_iternext)(iter) in the C code). What that means is that understanding how iter() and next() work is critical to understanding how for statements work.

The built-ins

Before we go into iter() and next(), we should define two terms. An iterable is a container that returns its contents one by one. An iterator is a type of object that streams those contained items. So, while every iterator is conceptually an iterable (and you should always make your iterators iterables; it’s not much effort), the opposite is not true, and not all iterables are iterators on their own.

iter()

The reason I bothered defining those words is because the purpose of iter() is to take an iterable and return its iterator. Now, what iter() considers an iterable depends on whether you give it one or two arguments.

iter(iterable)

Starting with the semantics of the function when there’s a single argument, we see that the implementation calls PyObject_GetIter() to get the iterator. The pseudocode for this function, which I will explain in a moment, is:

def iter(iterable):
type_ = type(iterable)
if hasattr(type_, "__iter__"): # type_->tp_iter != NULL
iterator = type_.__iter__(iterable)
if hasattr(type(iterator), "__next__"): # PyIter_Check
return iterator
else:
raise TypeError
elif hasattr(type_, "__getitem__"): # PySequence_Check
return SeqIter(iterable) # PySeqIter_New
else:
raise TypeError

The first step is looking for the __iter__() special method on an iterable. Calling this is meant to return an iterator (which is explicitly checked for on the return value). At the C level, the definition of an "iterator" is an object that defines __next__(); both the iterable and iterator protocols can also be checked via their requisite collections.abc classes and issubclass() (I didn't do it this way in the pseudocode simply because the hasattr() checks are closer to how the C code is written; in actual Python code I would probably use the abstract base classes).

There is a footnote relating to __iter__() that says if the attribute on the class is set to None that it won't be used. I think this is implicitly supported due to how the implementation is written, i.e. None is not callable and lacks the appropriate methods that are being checked for.

The second step is: what happens if __iter__() isn't available? In that case, there's a check to see if we are dealing with a sequence by looking for the __getitem__() special method. If the object turns out to be a sequence, then an instance of PySeqIter_Type is returned whose approximate Python implementation would be:

def _seq_iter(seq):
"""Yield the items of the sequence starting at 0."""
index = 0
while True:
try:
yield seq[index]
index += 1
except (IndexError, StopIteration):
return

I say “approximate” because the CPython version supports pickling.

If all of the above fails, then TypeError is ultimately raised.

iter(callable, sentinel)

If you recall earlier, I mentioned that iter() had a two-argument version. In this case iter() is rather different than what we have already discussed:

def iter(callable, sentinel):
if not hasattr(callable, "__call__"): # PyCallable_Check
raise TypeError
return CallIter(callable, sentinel) # PyCallIter_Type

As you can see there’s a check to see if the first argument is callable, and if it is then an instance of the PyCallIter_Type iterator is returned. With the function being so short, the important question is what does the PyCallIter_Type iterator do?

The use-case for this form of iter() is for when you have a no-argument callable that you persistently call until a certain value is returned representing that there isn't anything more coming from the callable.

def _call_iter(callable, sentinel):
while True:
val = callable()
if val == sentinel:
return
else:
yield val

An example of where this could be useful is if you are reading a file a chunk at a time, stopping once b"" is returned:

with open(path, "rb") as file:
for chunk in iter(lambda: file.read(chunk_size), b""):
...

Pulling it all together and doing everything appropriately, the definition of iter() is:

def iter(obj, /, sentinel=_NOTHING,):
"""Return an iterator for the object.

If 'sentinel' is unspecified, the first argument must either be an iterable
or a sequence. If the argument is a sequence, an iterator will be returned
which will index into the argument starting at 0 and continue until
IndexError or StopIteration is raised.

With 'sentinel' specified, the first argument is expected to be a callable
which takes no arguments. The returned iterator will execute the callable on
each iteration until an object equal to 'sentinel' is returned.
"""
# Python/bltinmodule.c:builtin_iter
obj_type = builtins.type(obj)
if sentinel is _NOTHING:
# Python/abstract.c:PyObject_GetIter
try:
__iter__ = _mro_getattr(obj_type, "__iter__")
except AttributeError:
try:
_mro_getattr(obj_type, "__getitem__")
except AttributeError:
raise TypeError(f"{obj_type.__name__!r} is not iterable")
else:
return _seq_iter(obj)
else:
iterator = __iter__(obj)
# Python/abstract.c:PyIter_Check
iterator_type = builtins.type(iterator)
try:
_mro_getattr(iterator_type, "__next__")
except AttributeError:
raise TypeError(
f"{obj_type.__name__!r}.__iter__() returned a non-iterator of type {builtins.type(__iter__)!r}"
)
else:
return __iter__(obj)
else:
# Python/object.c:PyCallable_Check
try:
_mro_getattr(obj_type, "__call__")
except AttributeError:
raise TypeError(f"{obj_type.__name__!r} must be callable")
else:
return _call_iter(obj, sentinel)

next()

To get the next value from an iterator, we pass it to the next() built-in function. It takes an iterator and can optionally take a default value. If the iterator has a value to return, then it's returned. If the iterator is exhausted, though, either StopIteration is raised or the default value is returned if it's been provided.

def next(iterator, /, default=_NOTHING):
"""Return the next value from the iterator by calling __next__().

If a 'default' argument is provided, it is returned if StopIteration is
raised by the iterator.
"""
# Python/bltinmodule.c:builtin_next
iterator_type = builtins.type(iterator)
try: # Python/abstract.c:PyIter_Check
__next__ = _mro_getattr(iterator_type, "__next__")
except AttributeError:
raise TypeError(f"{iterator_type.__name__!r} is not an iterator")
else:
try:
val = __next__(iterator)
except StopIteration:
if default is _NOTHING:
raise
else:
return default
else:
return val

The semantics

For each value returned by the iterable’s iterator, they are assigned to the loop’s target (sometimes called the loop variant), the statements in the body are run, and this repeats until the iterator is exhausted. If there is an else clause for the for loop, then it is executed if a break statement isn't encountered. Breaking this all down gives us:

  1. Get the iterable’s iterator
  2. Assign the iterator’s next value to the loop target
  3. Execute the body
  4. Repeat until the iterator is done
  5. If there’s an else clause and a break statement isn't hit, execute the else clause's body

It kind of sounds like a while loop with an assignment, doesn't it?

  1. Get the iterator with iter()
  2. Call next() and assign the result to the loop target
  3. As long as calling next didn’t raise StopIteration, execute the body
  4. Repeat as necessary
  5. Run the else clause as appropriate

for without an else clause

Let’s start with the simpler case of not having an else clause. In that case we can translate:

for x in y:
z

to:

_iter = iter(y)
while True:
try:
x = next(_iter)
except StopIteration:
break
else:
z
del _iter

If you squint a bit it sort of looks like the for loop example. It's definitely a lot more verbose and would be a pain to write out every time you wanted to iterate, but it would get the work done.

for with an else clause

Now you might have noticed that we used a break statement in our decoding above which would cause a while loop's else clause to always be skipped. How do we change the translation to work when an else clause is present?

Let’s update our example first:

for x in y:
z
else:
s

To eliminate our use of break in our original decoding, we need to come up with another way to denote when the iterator is out of items. A variable simply tracking whether there are more values should be enough to get us what we are after.

_iter = iter(y)
_looping = True
while _looping:
try:
x = next(_iter)
except StopIteration:
_looping = False
continue
else:
z
else:
s
del _iter, _looping

This decoding has a rather convenient side-effect of getting to rely on the while loop's own else clause and its semantics to get what we are after for the for loop!

And that’s it! We can leverage a while loop to implement for loops with no discernible semantic changes (except for the temp variables).

--

--

Shaiq Kar
Shaiq Kar

Written by Shaiq Kar

Uniting the ideals of math with the efficiency of computers to solve problems in our complex world.

No responses yet