This section is very well written; if this material has gone rusty, it is worth coming back to review.
Why Sequences Are Iterable: The iter Function
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __getitem__(self, index):
        return self.words[index]

    def __len__(self):
        return len(self.words)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
```
When the Python interpreter needs to iterate over an object `x`, it first calls `iter(x)` to obtain an iterator. `iter` does the following:
- Checks whether the object implements `__iter__`, and calls that to obtain an iterator.
- If `__iter__` is not implemented, but `__getitem__` is implemented, Python creates an iterator that attempts to fetch items in order, starting from index 0 (zero).
- If that fails, Python raises `TypeError`, usually saying "C object is not iterable," where C is the class of the target object.
The `__getitem__` fallback exists only for backward compatibility; it is not encouraged (though it has not been deprecated either). All of Python's built-in sequence types implement `__iter__`. The `__subclasshook__` of the `Iterable` ABC only checks whether a class has an `__iter__` method; it ignores `__getitem__`. So the `Sentence` class above does not pass an `issubclass(Sentence, abc.Iterable)` check. If you want to know whether an object is iterable, the best practice as of Python 3.4 is to call `iter(x)` and see whether it raises `TypeError`.
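These claims are easy to check with a trimmed-down version of the class above (a sketch; the sample text is illustrative):

```python
import re
from collections import abc

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __getitem__(self, index):
        return self.words[index]

s = Sentence('The quick brown fox')
print(list(s))                             # iteration works via the __getitem__ fallback
print(issubclass(Sentence, abc.Iterable))  # False: __subclasshook__ only checks __iter__
print(isinstance(iter(s), abc.Iterator))   # yet iter(s) happily builds an iterator
```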
Iterables Versus Iterators
- iterable
  - Any object from which the `iter` built-in function can obtain an iterator. Objects implementing an `__iter__` method returning an iterator are iterable. Sequences are always iterable; as are objects implementing a `__getitem__` method that takes 0-based indexes.
Python obtains iterators from iterables.
The two snippets below demonstrate Python's iteration machinery:
```python
>>> s = 'ABC'
>>> for char in s:
...     print(char)
...
A
B
C
```

The `for` loop above is equivalent to:

```python
>>> s = 'ABC'
>>> it = iter(s)
>>> while True:
...     try:
...         print(next(it))
...     except StopIteration:
...         del it
...         break
...
A
B
C
```
The `Iterator` ABC defines two methods, `__next__` and `__iter__`. `__next__` is invoked by the built-in `next(it)`; it must return the next item, or raise `StopIteration` when there are no more items. `__iter__` normally just returns `self`, i.e. `iter(it) is it`, so that an iterator is itself iterable.
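The `iter(it) is it` property is easy to verify:

```python
s = 'ABC'
it = iter(s)           # obtain an iterator from the iterable
print(iter(it) is it)  # calling iter on an iterator returns the iterator itself: True
```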
- iterator
  - Any object that implements the `__next__` no-argument method that returns the next item in a series or raises `StopIteration` when there are no more items. Python iterators also implement the `__iter__` method so they are iterable as well.
Sentence Take #2: A Classic Iterator
Reworking the example above to introduce an explicit iterator class:
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return SentenceIterator(self.words)

class SentenceIterator:

    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self
```
This is a deliberately verbose example, written to set up the generator mechanism introduced next.
Sentence Take #3: A Generator Function
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for word in self.words:
            yield word
```
Any function with the `yield` keyword in its body is a generator function. How generator functions work:

A generator function builds a generator object that wraps the body of the function. When we invoke `next(…)` on the generator object, execution advances to the next `yield` in the function body, and the `next(…)` call evaluates to the value yielded when the function body is suspended. Finally, when the function body returns, the enclosing generator object raises `StopIteration`, in accordance with the `Iterator` protocol.

Calling a generator function returns a generator. A generator yields or produces values. A generator doesn't "return" values in the usual way: the `return` statement in the body of a generator function causes `StopIteration` to be raised by the generator object.
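A minimal illustration of this behavior (a toy generator, not from the book):

```python
def gen_12():
    yield 1
    yield 2

g = gen_12()    # builds a generator object; no body code has run yet
print(next(g))  # 1: runs up to the first yield, then suspends
print(next(g))  # 2
try:
    next(g)     # body falls off the end: the generator raises StopIteration
except StopIteration:
    print('done')
```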
Sentence Take #4: A Lazy Implementation
The previous examples are not lazy, because `words` is built eagerly up front. If you only need the first few words, materializing the whole `words` list may be an unnecessary cost in memory and CPU. Here is a lazy implementation:
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()
```
Generator functions are convenient, but generator expressions are even more so.
Sentence Take #5: A Generator Expression
A helpful way to understand generator expressions:
A generator expression can be understood as a lazy version of a list comprehension: it does not eagerly build a list, but returns a generator that will lazily produce the items on demand. In other words, if a list comprehension is a factory of lists, a generator expression is a factory of generators.
In the simplified code, the generator is no longer produced by a generator function but by a generator expression:
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))
```
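The laziness is observable: a list comprehension computes everything up front, while a generator expression computes each item only when asked:

```python
squares_list = [n * n for n in range(5)]   # eager: the list exists immediately
squares_gen = (n * n for n in range(5))    # lazy: nothing computed yet

print(squares_list)       # [0, 1, 4, 9, 16]
print(next(squares_gen))  # 0 -- computed only now
print(list(squares_gen))  # [1, 4, 9, 16] -- the remaining items
```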
Another Example: Arithmetic Progression Generator
The previous examples all iterate over items in a collection; the author now gives some examples of a different kind, such as this `range`-like class:
```python
class ArithmeticProgression:

    def __init__(self, begin, step, end=None):
        self.begin = begin
        self.step = step
        self.end = end  # None -> "infinite" series

    def __iter__(self):
        result = type(self.begin + self.step)(self.begin)  # [1]
        forever = self.end is None
        index = 0
        while forever or result < self.end:
            yield result
            index += 1
            result = self.begin + self.step * index  # [2]
```
The code is not especially complicated, but the author wrote it with such care and rigor that I wanted to record it. His notes on the code:

At [1], he carefully explains why the implicit type coercion is done, and how he asked the mailing list for an idiomatic way to write it:
This line produces a result value equal to `self.begin`, but coerced to the type of the subsequent additions. In Python 2, there was a coerce() built-in function but it's gone in Python 3, deemed unnecessary because the numeric coercion rules are implicit in the arithmetic operator methods. So the best way I could think of to coerce the initial value to be of the same type as the rest of the series was to perform the addition and use its type to convert the result. I asked about this in the Python-list and got an excellent response from Steven D'Aprano.
At [2], he deliberately maintains an `index` variable to keep floating-point errors from accumulating:
[...], instead of simply incrementing the `result` with `self.step` iteratively, I opted to use an index variable and calculate each result by adding `self.begin` to `self.step` multiplied by `index` to reduce the cumulative effect of errors when working with floats.
Generator Functions in the Standard Library
作者将分散在各处的函数按功能做了分类,这部分值得翻阅。具体的代码例子,参考书中代码以及 itertools 模块官方文档。
感觉 itertools 的能力难掌握,建议看看 Future Reading 部分提供的延伸阅读,提供了一些 recipes 供参考。
感觉这部分内容,很有函数式编程的意味。
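As a small taste of what the module offers, the arithmetic progression above can be assembled from `itertools.count` and `itertools.takewhile`:

```python
import itertools

# count(1, .5) is an endless arithmetic progression;
# takewhile consumes it until the predicate fails
gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, .5))
print(list(gen))  # [1, 1.5, 2.0, 2.5]
```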
New Syntax in Python 3.3: yield from
```python
>>> s = 'ABC'
>>> t = tuple(range(3))
>>> def chain(*iterables):
...     for i in iterables:
...         yield from i
...
>>> list(chain(s, t))
['A', 'B', 'C', 0, 1, 2]
```
`yield from` looks like syntactic sugar, but it is more than that. It comes up again in Chapter 16, on coroutines.
A Closer Look at the iter Function
`iter` has one more trick: it can be called with two arguments, a zero-argument callable and a sentinel value. The callable is invoked repeatedly to produce values, and iteration stops as soon as it returns a value equal to the sentinel:
```python
with open('mydata.txt') as fp:
    for line in iter(fp.readline, ''):
        process_line(line)
```
The loop ends once `fp.readline` returns an empty string, which happens when the file is exhausted (a blank line in the file is read as `'\n'`, not `''`).
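Any zero-argument callable works; here is another sketch, using `functools.partial` to pop items from a list until a sentinel appears:

```python
from functools import partial

nums = [3, 1, 4, 1, 5, 0, 9]
# call nums.pop(0) repeatedly; stop when it returns the sentinel 0
it = iter(partial(nums.pop, 0), 0)
print(list(it))  # [3, 1, 4, 1, 5]
print(nums)      # [9] -- the sentinel itself was consumed, and 9 was never read
```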
Soapbox
A discussion of whether generator functions should have required a new keyword instead of `def`. Guido dislikes adding new keywords, but the author thinks one is warranted here, because generator functions are hard to refactor:
```python
def f():
    x = 0
    while True:
        x += 1
        yield x

# Trying to factor out the yield like this does not work:
# do_yield(x) just creates a generator object and discards it.
def f():
    def do_yield(n):
        yield n
    x = 0
    while True:
        x += 1
        do_yield(x)

# You could write `yield from do_yield(x)`, but that departs from
# ordinary function-call semantics.
```