- Generator Functions and Expressions
In computer science, a generator is a special routine that can be used to control the iteration behavior of a loop.
A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.
Python provides tools that produce results only when needed:
- Generator functions
They are coded as normal def statements, but use yield to return results one at a time, suspending and resuming their state between calls.
- Generator expressions
These are similar to list comprehensions, but they return an object that produces results on demand instead of building a result list.
Because neither constructs a result list all at once, they save memory and spread the computation time across loop iterations by implementing the iteration protocol.
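As a quick preview, here is a minimal sketch of both forms side by side; both produce the same squares on demand (the names squares_fn and squares_expr are illustrative, not from the text above):

```python
# Minimal sketch: the same on-demand squares from both tools.
# (squares_fn / squares_expr are illustrative names.)

def squares_fn(n):                          # generator function: def + yield
    for i in range(n):
        yield i ** 2

squares_expr = (i ** 2 for i in range(4))   # generator expression

print(list(squares_fn(4)))      # [0, 1, 4, 9]
print(list(squares_expr))       # [0, 1, 4, 9]
```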
We can write functions that send back a value and can later be resumed, picking up where they left off. Such functions are called generator functions because they generate a sequence of values over time.
Generator functions are coded with ordinary def statements and are not much different from normal functions. When created, however, they are automatically made to implement the iteration protocol so that they can appear in iteration contexts.
Normal functions return a value and then exit. But generator functions automatically suspend and resume their execution. Because of that, they are often a useful alternative to both computing an entire series of values up front and manually saving and restoring state in classes. Because the state that generator functions retain when they are suspended includes their local scope, their local variables retain information and make it available when the functions are resumed.
The primary difference between generator and normal functions is that a generator yields a value rather than returning one. The yield statement suspends the function and sends a value back to the caller, while retaining enough state for the function to resume immediately after the last yield. This allows the generator function to produce a series of values over time rather than computing them all at once and sending them back in a list.
Generators are closely bound up with the iteration protocol. Iterator objects define a __next__() method which either returns the next item or raises the special StopIteration exception to end the iteration. An object's iterator is fetched with the iter() built-in function.
The for loop uses this iteration protocol to step through a sequence or value generator if the protocol is supported. Otherwise, iteration falls back on repeatedly indexing sequences.
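The same protocol can be driven by hand on a built-in list; this is a minimal sketch of what a for loop does under the hood:

```python
# Drive the iteration protocol manually on a list.
L = [10, 20]
it = iter(L)           # fetch the list's iterator
print(next(it))        # 10 -- calls it.__next__()
print(next(it))        # 20
try:
    next(it)           # exhausted: __next__() raises StopIteration
except StopIteration:
    print('done')
```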
To support this protocol, functions containing a yield statement are compiled specially as generators. When called, they return a generator object that supports the iteration interface with an automatically created __next__() method to resume execution. In a generator function, a return statement (or simply falling off the end of the function body) terminates the generation of values by raising a StopIteration exception.
The net effect is that generator functions, coded as def statements containing yield statements, are automatically made to support the iteration protocol and thus may be used in any iteration context to produce results over time and on demand.
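For example, a return inside a generator simply ends the stream of values; the caller sees a normal end of iteration, not a returned value. This sketch uses a hypothetical count_up_to() helper:

```python
# A return in a generator ends the value stream (raises StopIteration
# under the hood); count_up_to is a hypothetical helper for illustration.
def count_up_to(limit, stop_at=None):
    n = 1
    while n <= limit:
        if stop_at is not None and n > stop_at:
            return              # ends the generator early
        yield n
        n += 1

print(list(count_up_to(5)))             # [1, 2, 3, 4, 5]
print(list(count_up_to(5, stop_at=3)))  # [1, 2, 3]
```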
Let's look at the interactive example below:
>>> def create_counter(n):
        print('create_counter()')
        while True:
            yield n
            print('increment n')
            n += 1

>>> c = create_counter(2)
>>> c
<generator object create_counter at 0x03004B48>
>>> next(c)
create_counter()
2
>>> next(c)
increment n
3
>>> next(c)
increment n
4
>>>
Here are the things happening in the code:
- The presence of the yield keyword in create_counter() means that this is not a normal function. It is a special kind of function which generates values one at a time. We can think of it as a resumable function. Calling it will return a generator that can be used to generate successive values of n.
- To create an instance of the create_counter() generator, just call it like any other function. Note that this does not actually execute the function code. We can tell this because the first line of the create_counter() function calls print(), but nothing was printed from the line:
>>> c = create_counter(2)
- The create_counter() function returns a generator object.
- The next() function takes a generator object and returns its next value. The first time we call next() with the counter generator, it executes the code in create_counter() up to the first yield statement, then returns the value that was yielded. In this case, that will be 2, because we originally created the generator by calling create_counter(2).
- Repeatedly calling next() with the same generator object resumes exactly where it left off and continues until it hits the next yield statement. All variables and local state are saved on yield and restored on next(). The next line of code waiting to be executed calls print(), which prints increment n. After that, the statement n += 1 executes. Then it loops through the while loop again, and the first thing it hits is the statement yield n, which saves the state of everything and returns the current value of n (now 3).
- The next time we call next(c), we do all the same things again, and this time n is 4.
- Since create_counter() sets up an infinite loop, we could theoretically do this forever, and it would just keep incrementing n and spitting out values.
The generator function in the following example generates the cubes of numbers over time:
>>> def cubic_generator(n):
        for i in range(n):
            yield i ** 3

>>>
The function yields a value and so returns to its caller each time through the loop. When it is resumed, its prior state is restored and control picks up again after the yield statement. When it's used in a for loop, control returns to the function after its yield statement each time through the loop:
>>> for i in cubic_generator(5):
        print(i, end=' : ')    # Python 3.x
        # print i,             # Python 2.x

0 : 1 : 8 : 27 : 64 :
>>>
If we use return instead of yield, the result is:
>>> def cubic_generator(n):
        for i in range(n):
            return i ** 3

>>> for i in cubic_generator(5):
        print(i, end=' : ')    # Python 3.x

Traceback (most recent call last):
  File "<pyshell#…>", line 1, in <module>
    for i in cubic_generator(5):
TypeError: 'int' object is not iterable
>>>
Here is an example of using a generator and yield.
>>> # Fibonacci version 1
>>> def fibonacci():
        Limit = 10
        count = 0
        a, b = 0, 1
        while True:
            yield a
            a, b = b, a + b
            if (count == Limit):
                break
            count += 1

>>> for n in fibonacci():
        print(n, end=' ')

0 1 1 2 3 5 8 13 21 34 55
>>>
Because generators preserve their local state between invocations, they are particularly well-suited for complicated, stateful iterators such as Fibonacci numbers. The generator above returns the Fibonacci numbers using Python's yield statement.
Here is another version of Fibonacci:
>>> # Fibonacci version 2
>>> def fibonacci(max):
        a, b = 0, 1            # (1)
        while a < max:
            yield a            # (2)
            a, b = b, a + b    # (3)
Simple summary for this version:
- It starts with 0 and 1, goes up slowly at first, then more and more rapidly. To start the sequence, we need two variables: a starts at 0, and b starts at 1.
- a is the current number in the sequence, so yield it.
- b is the next number in the sequence, so assign that to a, but also calculate the next value a + b and assign that to b for later use. Note that this happens in parallel; if a is 3 and b is 5, then a, b = b, a + b will set a to 5 (the previous value of b) and b to 8 (the sum of the previous values of a and b).
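The parallel-assignment behavior described above can be checked directly; the right-hand side is evaluated in full before either name is rebound:

```python
# Tuple assignment evaluates the right side first, then binds both names.
a, b = 3, 5
a, b = b, a + b
print(a, b)     # 5 8
```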
>>> for n in fibonacci(500):
        print(n, end=' ')

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377
>>>
>>> list(fibonacci(500))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]
>>>
As we can see from the output, we can use a generator like fibonacci() in a for loop directly. The for loop will automatically call the next() function to get values from the fibonacci() generator and assign them to the for loop index variable (n). Each time through the for loop, n gets a new value from the yield statement in fibonacci(), and all we have to do is print it out. Once fibonacci() runs out of numbers (a becomes bigger than max, which in this case is 500), then the for loop exits gracefully.
This is a useful idiom: pass a generator to the list() function, and it will iterate through the entire generator (just like the for loop in the previous example) and return a list of all the values.
To end the generation of values, functions use either a return with no value or simply allow control to fall off the end of the function body.
To see what's happening inside the for loop, we can call the generator function directly:
>>> x = cubic_generator(5)
>>> x
<generator object cubic_generator at 0x000000000315F678>
>>>
We got back a generator object that supports the iteration protocol. The next(iterator) built-in calls an object's __next__() method:
>>> next(x)
0
>>> next(x)
1
>>> next(x)
8
>>> next(x)
27
>>> next(x)
64
>>> next(x)
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    next(x)
StopIteration
>>>
We could have built the list of yielded values all at once:
>>> def cubic_builder(n):
        result = []
        for i in range(n):
            result.append(i ** 3)
        return result

>>> for x in cubic_builder(5):
        print(x, end=' : ')

0 : 1 : 8 : 27 : 64 :
>>>
>>> for x in [n ** 3 for n in range(5)]:
        print(x, end=' : ')

0 : 1 : 8 : 27 : 64 :
>>>
>>> for x in map((lambda n: n ** 3), range(5)):
        print(x, end=' : ')

0 : 1 : 8 : 27 : 64 :
>>>
As we've seen, we could have had the same result using other approaches. However, generators can be better in terms of memory usage and performance. They allow functions to avoid doing all the work up front, which is especially useful when the resulting lists are huge or when it takes a lot of computation to produce each value. Generators distribute the time required to produce the series of values among loop iterations.
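A rough way to see the memory difference is sys.getsizeof: a generator object stays small regardless of how many values it can produce, while a list grows with its contents (exact byte counts are CPython-specific):

```python
# Compare object sizes: generator vs. materialized list.
# Exact byte counts vary by Python version; the ordering should not.
import sys

gen = (i ** 3 for i in range(1_000_000))    # can produce a million values
lst = [i ** 3 for i in range(1_000)]        # holds a thousand values

print(sys.getsizeof(gen))   # small, fixed-size generator object
print(sys.getsizeof(lst))   # grows with the number of elements
```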
As a more advanced usage example, generators can provide a simpler alternative to manually saving the state between iterations in class objects. With generators, variables accessible in the function's scope are saved and restored automatically.
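For comparison, here is a sketch of the class-based equivalent of cubic_generator(): the state a generator keeps automatically (the loop counter) must be stored and updated by hand. CubicIterator is an illustrative name, not a standard class.

```python
# Class-based iterator equivalent to cubic_generator(n).
# The instance attributes play the role of the generator's saved locals.
class CubicIterator:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self                 # the object is its own iterator

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration     # end of the series
        value = self.i ** 3
        self.i += 1
        return value

print(list(CubicIterator(5)))   # [0, 1, 8, 27, 64]
```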
The notions of iterators and list comprehensions have been combined in a new feature, generator expressions. Generator expressions are similar to list comprehensions, but they are enclosed in parentheses instead of square brackets:
>>> # List comprehension makes a list
>>> [x ** 3 for x in range(5)]
[0, 1, 8, 27, 64]
>>>
>>> # Generator expression makes an iterable
>>> (x ** 3 for x in range(5))
<generator object <genexpr> at 0x000000000315F678>
>>>
Actually, coding a list comprehension is essentially the same as wrapping a generator expression in a list() built-in call to force it to produce all its results in a list at once:
>>> list(x ** 3 for x in range(5))
[0, 1, 8, 27, 64]
>>>
But in terms of operation, generator expressions are very different. Instead of building the result list in memory, they return a generator object. The returned object supports the iteration protocol to yield one piece of the result list at a time in any iteration context:
>>> Generator = (x ** 3 for x in range(5))
>>> next(Generator)
0
>>> next(Generator)
1
>>> next(Generator)
8
>>> next(Generator)
27
>>> next(Generator)
64
>>> next(Generator)
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    next(Generator)
StopIteration
>>>
Typically, we don't see the next() iterator machinery under the hood of a generator expression like this, because for loops trigger it for us automatically:
>>> for n in (x ** 3 for x in range(5)):
        print('%s, %s' % (n, n * n))

0, 0
1, 1
8, 64
27, 729
64, 4096
>>>
In the above example, no extra parentheses were needed: parentheses are not required around a generator expression that is the sole item already enclosed in other parentheses, such as those of a function call. However, there are cases where the extra parentheses are required, as in the example below:
>>> sum(x ** 3 for x in range(5))
100
>>>
>>> sorted(x ** 3 for x in range(5))
[0, 1, 8, 27, 64]
>>>
>>> sorted((x ** 3 for x in range(5)), reverse=True)
[64, 27, 8, 1, 0]
>>>
>>> import math
>>> list(map(math.sqrt, (x ** 3 for x in range(5))))
[0.0, 1.0, 2.8284271247461903, 5.196152422706632, 8.0]
>>>
Generator expressions are a memory-space optimization: they do not require the entire result list to be constructed all at once, whereas the square-bracketed list comprehension does. They may also run slightly slower in practice, so they are probably best used for very large result sets.
The same iteration can be coded with either a generator function or a generator expression. Let's look at the following example which repeats each character in a string five times:
>>> G = (c * 5 for c in 'Python')
>>> list(G)
['PPPPP', 'yyyyy', 'ttttt', 'hhhhh', 'ooooo', 'nnnnn']
The equivalent generator function requires slightly more code, but as a multistatement function it can contain more logic and use more state information if needed:
>>> def repeat5times(x):
        for c in x:
            yield c * 5

>>> G = repeat5times('Python')
>>> list(G)
['PPPPP', 'yyyyy', 'ttttt', 'hhhhh', 'ooooo', 'nnnnn']
>>>
Both expressions and functions support automatic and manual iteration. The list() calls in the examples above iterated automatically; the following iterates manually:
>>> G = (c * 5 for c in 'Python')
>>> I = iter(G)
>>> next(I)
'PPPPP'
>>> next(I)
'yyyyy'
>>>
>>> G = repeat5times('Python')
>>> I = iter(G)
>>> next(I)
'PPPPP'
>>> next(I)
'yyyyy'
>>>
Note that we make new generators here to iterate again; generators are one-shot iterators.
Both generator functions and generator expressions are their own iterators, so they support just one active iteration; we cannot have multiple independent iterators over the same generator. As the following example shows, a generator's iterator is the generator itself.
>>> G = (c * 5 for c in 'Python')
>>> # My iterator is myself: G has a __next__() method
>>> iter(G) is G
True
>>>
If we iterate over the results stream manually with multiple iterators, they will all point to the same position:
>>> G = (c * 5 for c in 'Python')
>>> # Iterate manually
>>> I1 = iter(G)
>>> next(I1)
'PPPPP'
>>> next(I1)
'yyyyy'
>>> I2 = iter(G)
>>> next(I2)
'ttttt'
>>>
Once any iteration runs to completion, all are exhausted. We have to make a new generator to start again:
>>> # Collect the rest of I1's items
>>> list(I1)
['hhhhh', 'ooooo', 'nnnnn']

>>> # Other iterators exhausted too
>>> next(I2)
Traceback (most recent call last):
  File "<pyshell#45>", line 1, in <module>
    next(I2)
StopIteration

>>> # Same for new iterators
>>> I3 = iter(G)
>>> next(I3)
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    next(I3)
StopIteration

>>> # New generator to start over
>>> I3 = iter(c * 5 for c in 'Python')
>>> next(I3)
'PPPPP'
>>>
The same applies to generator functions:
>>> def repeat5times(x):
        for c in x:
            yield c * 5

>>> # Generator functions work the same way
>>> G = repeat5times('Python')
>>> iter(G) is G
True
>>> I1, I2 = iter(G), iter(G)
>>> next(I1)
'PPPPP'
>>> next(I1)
'yyyyy'
>>> # I2 is at the same position as I1
>>> next(I2)
'ttttt'
>>>
This is different from the behavior of some built-in types, which support multiple iterators and passes, and reflect their in-place changes in active iterators:
>>> L = [1, 2, 3, 4]
>>> I1, I2 = iter(L), iter(L)
>>> next(I1)
1
>>> next(I1)
2
>>> # Lists support multiple iterators
>>> next(I2)
1
>>> # Changes are reflected in active iterators
>>> del L[2:]
>>> next(I1)
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    next(I1)
StopIteration
>>>
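When independent passes over a generator's results really are needed, one option is itertools.tee, which buffers values so that separate iterators can advance at their own pace; a sketch:

```python
# itertools.tee gives independent iterators over one generator's output.
from itertools import tee

G = (c * 5 for c in 'Py')
I1, I2 = tee(G)
print(next(I1))     # 'PPPPP'
print(next(I1))     # 'yyyyy'
print(next(I2))     # 'PPPPP' -- I2 starts from the beginning
```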