This post briefly summarizes Chapter 14 of Fluent Python by Luciano Ramalho. Please check the book for more detailed examples and usage.
Iteration is fundamental to data processing. When scanning datasets that don’t fit in memory, we need a way to fetch items lazily, that is, one at a time and on demand. This is what the Iterator pattern is about.
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __getitem__(self, index):  # makes the for loop work
        return self.words[index]

    def __len__(self):  # completes the sequence protocol
        return len(self.words)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

>>> s = Sentence('Hello there! Mighty fine morning! if you ask me, I am Waldo.')
>>> s
Sentence('Hello there!..., I am Waldo.')
>>> for word in s:
...     print(word)
Hello
there
Mighty
...
Waldo
>>> list(s)  # list() takes any iterable to build a list instance
['Hello', 'there', 'Mighty', ..., 'Waldo']
How is the above for loop possible? Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x). The iter built-in function looks at two methods of the object: it first checks whether the object implements __iter__ and, if so, calls it to obtain an iterator; if __iter__ is not implemented but __getitem__ is, iter creates an iterator that fetches items by index, starting from 0; if that also fails, Python raises TypeError, complaining that the object is not iterable.
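As a quick sketch of this fallback behavior (the class names Spam and Eggs are illustrative, not from the book):

```python
class Spam:  # no __iter__, but __getitem__ takes 0-based indexes
    def __getitem__(self, index):
        if index > 2:
            raise IndexError(index)
        return index * 10

print(list(Spam()))  # iter() builds an index-based iterator: [0, 10, 20]

class Eggs:  # neither __iter__ nor __getitem__
    pass

try:
    iter(Eggs())
except TypeError as exc:
    print(exc)  # 'Eggs' object is not iterable
```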
Based on these observations, we can describe how the Python iterable quacks:
Iterable: Any object from which the iter built-in function can obtain an iterator. Objects implementing an __iter__ method returning an iterator are iterable. Sequences are always iterable, as are objects implementing a __getitem__ method that takes 0-based indexes.
Note also that goose typing is supported with abc.Iterable, which strictly requires the __iter__ method; it does not consider __getitem__.
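For example (WithIter and IndexOnly are illustrative class names, not from the book):

```python
from collections.abc import Iterable

class WithIter:  # implements __iter__: passes the goose-typing check
    def __iter__(self):
        return iter([1, 2, 3])

class IndexOnly:  # only __getitem__: iterable in practice...
    def __getitem__(self, index):
        return [10, 20, 30][index]

print(isinstance(WithIter(), Iterable))   # True
print(isinstance(IndexOnly(), Iterable))  # False: abc.Iterable ignores __getitem__
print(list(IndexOnly()))                  # [10, 20, 30]: iter() still handles it
```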
>>> s = 'ABC'
>>> for c in s:
...     print(c)
The above code is equivalent to the code below.
>>> s = 'ABC'
>>> it = iter(s)
>>> while True:
...     try:
...         print(next(it))
...     except StopIteration:  # signals that the iterator is exhausted
...         del it
...         break
Iterator: Any object that implements the __next__ no-argument method, which returns the next item in a series or raises StopIteration when there are no more items. Python iterators also implement the __iter__ method, so they are iterable as well.
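A hand-rolled iterator following this protocol might look like the following (Countdown is my own toy example, not from the book):

```python
class Countdown:  # a classic iterator: __next__ plus __iter__
    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # no more items
        self.current -= 1
        return self.current + 1

    def __iter__(self):  # iterators return themselves, so they are iterable
        return self

print(list(Countdown(3)))  # [3, 2, 1]
```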
Any Python function that has the yield keyword in its body is a generator function: a function which, when called, returns a generator object. In other words, a generator function is a generator factory. Generator functions make implementing iterators much easier.
def fib():  # calling it does not run the infinite loop
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

>>> fibgen = fib()  # it just gives us a generator
>>> fibgen  # ready to be used in for loops
<generator object fib at 0x7efec9e86620>
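Assuming the fib generator function above, items are computed only when requested, so we can even take a finite slice of an infinite series:

```python
from itertools import islice

def fib():  # same generator function as above
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

fibgen = fib()
print(next(fibgen))            # 1: the first item, computed on demand
print(next(fibgen))            # 1: the generator resumes where it left off
print(list(islice(fib(), 8)))  # [1, 1, 2, 3, 5, 8, 13, 21]
```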
We can make Sentence truly lazy via re.finditer, a lazy version of re.findall.
class Sentence:
    def __init__(self, text):
        self.text = text

    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()
Sentence gets even shorter using a generator expression. A generator expression can be understood as a lazy version of a list comprehension.
class Sentence:
    def __init__(self, text):
        self.text = text

    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))
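To see the laziness in isolation, compare when each form consumes its input (gen_AB is a toy helper of mine, not part of Sentence):

```python
def gen_AB():
    print('start')
    yield 'A'
    print('continue')
    yield 'B'
    print('end.')

# list comprehension: consumes gen_AB() eagerly, printing everything now
res1 = [x * 3 for x in gen_AB()]
print(res1)  # ['AAA', 'BBB']

# generator expression: nothing runs until we iterate over it
res2 = (x * 3 for x in gen_AB())
for item in res2:  # only now does gen_AB() start executing
    print('-->', item)
```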
This toy chain implementation shows what the yield from keyword does for generators.
def chain(*iters):
    for i in iters:
        yield from i

>>> list(chain('ABC', range(3)))
['A', 'B', 'C', 0, 1, 2]
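The standard library already ships this as itertools.chain, which behaves the same way:

```python
from itertools import chain

print(list(chain('ABC', range(3))))  # ['A', 'B', 'C', 0, 1, 2]
```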
See the docs.