At a glance, the yield statement is used to define generators, replacing the return of a function to provide a result to its caller without destroying local variables. Unlike a function, which starts with a fresh set of local variables on every call, a generator resumes execution where it left off.

About Python Generators

Since the yield keyword is only used with generators, it makes sense to recall the concept of generators first.

The idea of generators is to calculate a series of results one by one, on demand (on the fly). In the simplest case, a generator can be used like a list whose elements are calculated lazily. Let's compare a list and a generator that do the same thing: return powers of two.
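A minimal sketch of that comparison, assuming five powers of two are enough to illustrate the point (the names powers_list and powers_gen are chosen here for illustration):

```python
# Eager: the list comprehension computes and stores all five values at once.
powers_list = [2 ** x for x in range(5)]

# Lazy: the generator expression computes each value only when asked for it.
powers_gen = (2 ** x for x in range(5))

# The two loops are written the same way and print the same values.
for value in powers_list:
    print(value)

for value in powers_gen:
    print(value)
```

The only syntactic difference is square brackets versus parentheses.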

Iterating over the list and the generator looks exactly the same. However, although the generator is iterable, it is not a collection, and thus has no length. Collections (lists, tuples, sets, etc.) keep all values in memory, and we can access them whenever needed. A generator calculates the values on the fly and forgets them, so it has no overview of its own result set.
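To illustrate the difference, here is a small sketch (variable names are illustrative) showing that a generator rejects len() and can only be walked through once:

```python
powers_gen = (2 ** x for x in range(5))

# A generator is not a collection, so asking for its length fails.
try:
    len(powers_gen)
except TypeError as error:
    print("no length:", error)

# The first pass consumes the values; a second pass finds nothing left.
first_pass = list(powers_gen)
second_pass = list(powers_gen)

print(first_pass)   # [1, 2, 4, 8, 16]
print(second_pass)  # []
```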

Generators are especially useful for memory-intensive tasks, where there is no need to keep all of the elements of a memory-heavy list accessible at the same time. Calculating a series of values one-by-one can also be useful in situations where the complete result is never needed, yielding intermediate results to the caller until some requirement is satisfied and further processing stops.
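As a rough sketch of the "stop as soon as the requirement is satisfied" case, the following pulls squares from a conceptually endless stream until one exceeds 1000. The threshold and the use of itertools.count are choices made for this illustration only:

```python
import itertools

# An endless stream of squares; nothing is computed until we iterate.
squares = (x * x for x in itertools.count(1))

# Pull values one by one and stop at the first square above 1000.
for square in squares:
    if square > 1000:
        first_big_square = square
        break

print(first_big_square)  # 1024
```

An equivalent list could never be built, since the stream has no end.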

Using the Python “yield” keyword

A good example is a search task, where typically there is no need to wait for all results to be found. When performing a file-system search, a user would be happier to receive results on the fly than to wait for the search engine to go through every single file and only afterwards return results. Are there any people who really navigate through all Google search results until the last page?

Since a search like this does not fit a list comprehension, which would build the complete result up front, we are going to define a generator using a function with the yield statement. The yield should be placed where the generator hands an intermediate result to the caller; the generator is then suspended until the next value is requested. Let's define a generator that searches for a keyword in a huge text file, line by line.
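One possible way to write such a generator is sketched below. The print call is there only to make it visible when the body actually starts running, and the variable names are illustrative:

```python
def search(keyword, filename):
    """Yield every line of `filename` that contains `keyword`."""
    print("generator started")
    with open(filename, "r") as text_file:
        # Read the file line by line instead of loading it all at once.
        for line in text_file:
            if keyword in line:
                # Hand the match to the caller and pause here until
                # the caller asks for the next result.
                yield line
```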

Now, assume that my “directory.txt” file contains a huge list of names and phone numbers, and we want to find someone with “Python” in the name.

When we call the search function, its body does not run. The generator function only returns the generator object, acting as a constructor.
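This can be checked with a short sketch. It repeats a compact version of the search generator so that it runs on its own, and uses inspect.getgeneratorstate from the standard library to look at the generator's state:

```python
import inspect


def search(keyword, filename):
    print("generator started")
    with open(filename, "r") as text_file:
        for line in text_file:
            if keyword in line:
                yield line


# Calling the function only builds the generator object. Nothing is printed
# and the file is not opened yet, so this line succeeds even if
# "directory.txt" does not exist.
the_generator = search("Python", "directory.txt")

print(the_generator)                             # <generator object search at 0x...>
print(inspect.getgeneratorstate(the_generator))  # GEN_CREATED
```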

This is a bit tricky, since everything below def search(keyword, filename): would normally execute when the function is called, but that is not the case for generators. In fact, there was even a long discussion suggesting the use of “gen” or other keywords to define a generator. However, Guido decided to stick with “def”, and that’s it. You can read the motivation in PEP 255.

To make the newly created generator calculate something, we need to access it via the iterator protocol, i.e. call its __next__ method, usually through the built-in next() function.
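The sketch below shows this step by step. The sample names and numbers are invented stand-ins for a real “directory.txt”, written to a temporary directory so the example runs anywhere:

```python
import os
import tempfile


def search(keyword, filename):
    print("generator started")
    with open(filename, "r") as text_file:
        for line in text_file:
            if keyword in line:
                yield line


# Hypothetical sample data standing in for a real directory file.
sample = (
    "Alice Smith 555-0100\n"
    "Monty Python 555-0101\n"
    "Bob Jones 555-0102\n"
    "Python Fan 555-0103\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "directory.txt")
    with open(path, "w") as text_file:
        text_file.write(sample)

    the_generator = search("Python", path)

    # The body starts running only now and stops at the first yield.
    first = next(the_generator)
    # Execution resumes right after the yield and runs to the next match.
    second = next(the_generator)

    # With no matches left, the generator finishes and raises StopIteration.
    exhausted = False
    try:
        next(the_generator)
    except StopIteration:
        exhausted = True

print(first, second, exhausted)
```

A for loop calls next() and handles StopIteration automatically, which is why explicit next() calls are rarely needed in everyday code.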

And finally, a “classical” example of generators: calculating the first n Fibonacci numbers.
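A minimal sketch of such a generator, assuming the sequence starts 1, 1, 2, 3 and using an explicit counter to stop after n values:

```python
def fibonacci(n):
    """Yield the first n Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    prev, curr = 0, 1
    counter = 0
    while counter < n:
        # Hand the current number to the caller, then pause.
        yield curr
        # Advance the pair: the new number is the sum of the last two.
        prev, curr = curr, prev + curr
        counter += 1


print(list(fibonacci(10)))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```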

Numbers are calculated until the counter reaches n. This example is so popular because the Fibonacci sequence is infinite, so it can never fit in memory as a whole.

So far, the most practical aspects of Python generators have been described. For more detailed information and an interesting discussion, take a look at Python Enhancement Proposal 255, which covers this language feature in detail.

Happy Pythoning!

About The Author