Chapter 4 Data Processing

4.2 Implicit Sequences

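A sequence does not have to store every element in memory explicitly. A range, for example, computes each element on demand when it is accessed rather than allocating memory for all of its elements when it is defined.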

>>> r = range(10000, 1000000000)
>>> r[45006230]
45016230
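
As a rough illustration of the point above, the sketch below compares the size of a very large range with that of a small list. Using sys.getsizeof as the measurement is an assumption of this example, not something the notes prescribe; in CPython it reports only the container's own size, so the comparison is indicative rather than exact.

```python
import sys

# A range stores only start, stop, and step; elements are computed on demand.
big = range(10000, 1000000000)
small_list = list(range(1000))   # a list stores a reference to every element

element = big[45006230]          # computed as start + index
range_size = sys.getsizeof(big)
list_size = sys.getsizeof(small_list)

print(element)                   # 45016230
print(range_size < list_size)    # the huge range is smaller than a 1000-item list
```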

4.2.1 Iterators

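An iterator is an object that provides sequential access to a series of values, one at a time. The built-in iter() function returns an iterator over an iterable value, and next() returns the iterator's next element.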

>>> primes = [2, 3, 5, 7]
>>> type(primes)
<class 'list'>
>>> iterator = iter(primes)
>>> type(iterator)
<class 'list_iterator'>
>>> next(iterator)
2
>>> next(iterator)
3
>>> next(iterator)
5

Once next() has returned the last element of the sequence, any further call raises a StopIteration exception. The exception can be caught with a try statement.

>>> next(iterator)
7
>>> next(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> try:
        next(iterator)
    except StopIteration:
        print('No more values')
No more values
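
As an alternative to catching StopIteration, the built-in next() accepts an optional second argument that is returned when the iterator is exhausted. The sketch below is only an illustration; the names DONE and collected are hypothetical and do not appear in the notes.

```python
primes = [2, 3, 5, 7]
iterator = iter(primes)

DONE = object()   # unique sentinel, so any real element (even None) stays valid
collected = []
while True:
    # With a default supplied, next() returns it instead of raising StopIteration.
    value = next(iterator, DONE)
    if value is DONE:
        break
    collected.append(value)

print(collected)  # [2, 3, 5, 7]
```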

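An iterator maintains internal state that records its current position, and each call to next() advances that position. Several iterators over the same sequence keep track of their positions independently.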

>>> r = range(3, 13)
>>> s = iter(r)  # 1st iterator over r
>>> next(s)
3
>>> next(s)
4
>>> t = iter(r)  # 2nd iterator over r
>>> next(t)
3
>>> next(t)
4
>>> u = t        # Alternate name for the 2nd iterator
>>> next(u)
5
>>> next(u)
6

Calling iter() on an iterator returns the iterator itself, not a copy.

>>> v = iter(t)  # Another alternate name for the 2nd iterator
>>> next(v)
7
>>> next(u)
8
>>> next(t)
9
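
The identity claim above can be checked directly with the is operator. This is a minimal sketch; the variable names same_object, fresh, first_pass, and second_pass are chosen here for illustration.

```python
r = range(3, 13)
t = iter(r)
v = iter(t)

same_object = v is t               # iter() on an iterator returns that same iterator
fresh = iter(r) is not iter(r)     # iter() on the range builds a new iterator each time

first_pass = list(t)               # consumes every remaining element
second_pass = list(v)              # v is t, so nothing is left

print(same_object, fresh)          # True True
print(second_pass)                 # []
```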

4.2.2 Iterables

Any value that can produce an iterator is called an iterable value.

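Strings, tuples, sets, and dictionaries are all iterable values. Iterating over a dictionary yields its keys; since Python 3.7, keys are guaranteed to appear in insertion order.

>>> d = {'one': 1, 'two': 2, 'three': 3}
>>> d
{'one': 1, 'two': 2, 'three': 3}
>>> k = iter(d)
>>> next(k)
'one'
>>> next(k)
'two'
>>> v = iter(d.values())
>>> next(v)
1
>>> next(v)
2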

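If a dictionary changes size because a key is added or removed, every iterator over it that was created before the change becomes invalid, and using it raises a RuntimeError.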

>>> d.pop('two')
2
>>> next(k)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
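
One common way to avoid this error, sketched below, is to iterate over a snapshot of the keys (here made with list()) while modifying the dictionary itself. The dictionary e and the flag invalidated are hypothetical names used only for this illustration.

```python
d = {'one': 1, 'two': 2, 'three': 3}
k = iter(d)
next(k)                    # 'one'
d.pop('two')               # the dictionary changes size

try:
    next(k)                # the old iterator is now invalid
    invalidated = False
except RuntimeError:
    invalidated = True

# Iterating over a copy of the keys makes deletion inside the loop safe.
e = {'one': 1, 'two': 2, 'three': 3}
for key in list(e):
    if e[key] % 2 == 0:
        del e[key]

print(invalidated)         # True
print(e)                   # {'one': 1, 'three': 3}
```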

4.2.3 Built-in Iterators

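The built-in functions map, filter, zip, and reversed each return an iterator.

map(function, iterable) returns an iterator that applies function to each element of iterable. The computation is lazy: function is called only when the next value is requested.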

>>> def double_and_print(x):
	print('***', x, '=>', 2*x, '***')
	return 2*x
>>> s = range(3, 7)
>>> doubled = map(double_and_print, s)  # double_and_print not yet called
>>> next(doubled)                       # double_and_print called once
*** 3 => 6 ***
6
>>> next(doubled)                       # double_and_print called again
*** 4 => 8 ***
8
>>> list(doubled)                       # double_and_print called twice more
*** 5 => 10 ***
*** 6 => 12 ***
[10, 12]

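filter(function, iterable) returns an iterator over the elements of iterable for which function returns a true value.

zip(iterable1, iterable2, ...) returns an iterator of tuples, where the i-th tuple pairs the i-th elements of iterable1, iterable2, and so on. It stops as soon as the shortest input is exhausted.

reversed(sequence) returns an iterator over the elements of sequence in reverse order.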
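
The three functions described above can be tried on the same range used in the map example. This is a small sketch; the lambda and the variable names are chosen here for illustration.

```python
s = range(3, 7)                          # 3, 4, 5, 6

evens = filter(lambda x: x % 2 == 0, s)  # keeps elements for which the function is true
pairs = zip('abc', s)                    # stops at the shorter input, 'abc'
backwards = reversed(s)

first_even = next(evens)                 # 4
remaining_evens = list(evens)            # [6]
pairs_list = list(pairs)                 # [('a', 3), ('b', 4), ('c', 5)]
backwards_list = list(backwards)         # [6, 5, 4, 3]

print(first_even, remaining_evens)
print(pairs_list)
print(backwards_list)
```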

4.2.4 For Statements

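The for statement is implemented with iterators. An iterable object implements the __iter__ method, which returns an iterator; the for statement then calls that iterator's __next__ method repeatedly until StopIteration is raised.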

>>> counts = [1, 2, 3]
>>> for item in counts:
	print(item)
1
2
3

This is equivalent to:

>>> items = counts.__iter__()
>>> try:
        while True:
            item = items.__next__()
            print(item)
    except StopIteration:
        pass
1
2
3

4.2.5 Generators and Yield Statements

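Generators let us define more complex iterations. A generator is a kind of iterator.

A generator function is one way to define a generator. It uses yield statements, rather than return, to produce its values. Calling a generator function returns a generator object without running the body; the body then runs up to the next yield each time a value is requested.

A generator does not require hand-written __iter__ and __next__ methods. Python supplies both automatically, so a generator can be used in a for statement or advanced with __next__ (or the built-in next()).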

>>> def letters_generator():
	current = 'a'
	while current <= 'd':
	    yield current
	    current = chr(ord(current)+1)
>>> for letter in letters_generator():
	print(letter)
a
b
c
d

>>> letters = letters_generator()
>>> type(letters)
<class 'generator'>
>>> letters.__next__()
'a'
>>> letters.__next__()
'b'
>>> letters.__next__()
'c'
>>> letters.__next__()
'd'
>>> letters.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
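
Because a generator produces values only when asked, it can describe a series with no end. The sketch below assumes a hypothetical naturals generator function and uses itertools.islice from the standard library to take a finite prefix.

```python
from itertools import islice

def naturals(start=0):
    """Yield start, start + 1, start + 2, ... without end."""
    current = start
    while True:
        yield current
        current += 1

gen = naturals(1)
is_own_iterator = iter(gen) is gen     # a generator is its own iterator
first_three = list(islice(gen, 3))     # [1, 2, 3]
following = next(gen)                  # 4: the generator resumes where it paused

print(is_own_iterator, first_three, following)
```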

4.2.6 Iterable Interface

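An object is iterable if it implements an __iter__ method that returns an iterator. In the example below, LetterIter is an iterator class (one that implements __next__) and is assumed to be defined elsewhere.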

>>> class Letters:
	def __init__(self, start='a', end='e'):
	    self.start = start
	    self.end = end
	def __iter__(self):
	    return LetterIter(self.start, self.end)

>>> b_to_k = Letters('b', 'k')
>>> first_iterator = b_to_k.__iter__()
>>> next(first_iterator)
'b'
>>> next(first_iterator)
'c'
>>> second_iterator = iter(b_to_k)
>>> second_iterator.__next__()
'b'
>>> first_iterator.__next__()
'd'
>>> first_iterator.__next__()
'e'
>>> second_iterator.__next__()
'c'
>>> second_iterator.__next__()
'd'

4.2.7 Creating Iterables with Yield
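
The notes leave this section empty. As a tentative sketch of what the heading suggests, an __iter__ method can itself be a generator function, so each call returns a fresh generator and no separate iterator class is needed. The class name LettersWithYield and the exclusive end bound are assumptions made for this example.

```python
class LettersWithYield:
    """Iterable over letters from start (inclusive) to end (exclusive, by assumption)."""

    def __init__(self, start='a', end='e'):
        self.start = start
        self.end = end

    def __iter__(self):
        # Because this method contains yield, calling it returns a new generator.
        next_letter = self.start
        while next_letter < self.end:
            yield next_letter
            next_letter = chr(ord(next_letter) + 1)

letters = LettersWithYield('a', 'd')
first_pass = list(letters)    # ['a', 'b', 'c']
second_pass = list(letters)   # a new generator each time, so the result is the same

print(first_pass, second_pass)
```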

4.2.8 Iterator Interface

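An object is an iterator if it implements a __next__ method that returns the next element each time it is called and raises StopIteration when no elements remain.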
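
The notes do not show an implementation, so the following is only one possible sketch of a LetterIter class like the one referenced in 4.2.6. The attribute name next_letter, the exclusive end bound, and the extra __iter__ method (added so the iterator also works with for and list) are assumptions.

```python
class LetterIter:
    """Iterator over letters from start (inclusive) to end (exclusive, by assumption)."""

    def __init__(self, start='a', end='e'):
        self.next_letter = start
        self.end = end

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.next_letter >= self.end:
            raise StopIteration
        letter = self.next_letter
        self.next_letter = chr(ord(letter) + 1)
        return letter

letter_iter = LetterIter('a', 'd')
first = next(letter_iter)     # 'a'
rest = list(letter_iter)      # ['b', 'c']

print(first, rest)
```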