Pure vs. impure iterators in Go
TL;DR ¶
- Go has now standardised iterators.
- Iterators are powerful.
- Being functions under the hood, iterators can be closures.
- The classification of iterators suggested by the documentation is ambiguous.
- Dividing iterators into two categories, “pure” and “impure”, seems to me preferable.
- Whether iterators should be designed as “pure” whenever possible is unclear.
The advent of iterators in Go ¶
The iterator pattern was popularised by the classic “Gang of Four” book as
[providing] a way to access the elements of an aggregate object sequentially without exposing its underlying representation.
Until recently, the data structures over which you could iterate via a for-range loop were limited to arrays (either directly or through a pointer), slices, strings, maps, channels, and integers. However, in the wake of Go 1.18’s support for parametric polymorphism (a.k.a. “generics”), Go 1.23 standardised the way of defining custom iterators, saw the addition of the iter package in the standard library, and welcomed a couple of iterator factories (i.e. functions or methods that return an iterator) in the slices and maps packages. And Go 1.24 marked the inception in the standard library of even more iterator factories, such as strings.SplitSeq:
// SplitSeq returns an iterator over all substrings of s separated by sep.
// The iterator yields the same strings that would be returned by Split(s, sep),
// but without constructing the slice.
// It returns a single-use iterator.
func SplitSeq(s, sep string) iter.Seq[string]
If you’re not familiar with the syntax and semantics of iterators in Go 1.23+, I recommend you peruse Ian Lance Taylor’s introductory post published on the Go blog.
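If it helps, keep in mind that the iter package exports just two sequence types, both plain function types that accept a yield callback (signatures as of Go 1.23; the comments below are mine, paraphrasing the official documentation):
// Seq is an iterator over sequences of individual values: when called as
// seq(yield), it calls yield(v) for each value v in the sequence,
// stopping early if yield returns false.
type Seq[V any] func(yield func(V) bool)

// Seq2 is an iterator over sequences of pairs of values, most commonly
// key-value pairs; it also stops early if yield returns false.
type Seq2[K, V any] func(yield func(K, V) bool)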
Once you get past the first impression of bewilderment (“Why so many funcs?!”), it’s pretty smooth sailing. Moreover, callers of iterators are typically isolated from whatever complexity is involved in their implementation.
As a first example, consider the program below (playground), which
features a factory function named fib0
that produces an iterator over the
sequence of Fibonacci numbers:
package main
import (
"fmt"
"iter"
)
func fib0() iter.Seq[int] {
return func(yield func(int) bool) {
for a, b := 0, 1; yield(a); a, b = b, a+b {
// deliberately empty
}
}
}
func main() {
for n := range fib0() {
if n > 100 {
break
}
fmt.Printf("%d ", n)
}
}
As you might expect, the program prints the Fibonacci numbers less than 100 in increasing order:
0 1 1 2 3 5 8 13 21 34 55 89
The power of iterators ¶
The standardisation of iterators is a welcome addition to the Go language. Iterators indeed provide tangible benefits:
- They promote flexibility and separation of concerns: callers can remain oblivious about how the sequence is produced and can instead focus on what to do with the data; an iterator that abstracts the paginated consumption of GitHub’s API is a good example.
- They encourage encapsulation insofar as they expose data as sequences that cannot be mutated like slices and maps can.
- They have the potential to boost performance: instead of materialising a data structure containing the entire body of data, they dole out elements one by one only as required by the caller, thereby promising, in many (though not all) situations, lower latency and fewer heap allocations; they’re also more performant than iterator implementations relying on channels.
- They allow for infinite sequences (e.g. the sequence of prime numbers), an affordance that finite data structures like slices and maps could never provide; see the sketch after this list.
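To make that last point concrete, here is a minimal sketch of my own (not taken from any library) of an iterator factory over the infinite sequence of prime numbers, using naive trial division:
package main

import (
	"fmt"
	"iter"
)

// primes returns an iterator over the infinite sequence of prime numbers.
// Trial division keeps the sketch short; it is not an efficient sieve.
func primes() iter.Seq[int] {
	return func(yield func(int) bool) {
		for n := 2; ; n++ {
			composite := false
			for d := 2; d*d <= n; d++ {
				if n%d == 0 {
					composite = true
					break
				}
			}
			if !composite && !yield(n) {
				return
			}
		}
	}
}

func main() {
	for p := range primes() {
		if p > 50 {
			break // the caller, not the iterator, decides when to stop
		}
		fmt.Printf("%d ", p)
	}
}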
Because iterators are so powerful, they’re likely to mushroom in libraries even beyond Go’s standard library. Therefore, to forestall any confusion in the discourse about iterators, the terminology surrounding them should be as precise as possible.
Official guidance on iterator classification ¶
The phrase “single-use iterator” may have caught your eye in strings.SplitSeq’s documentation.
It is explained in a section of the iter package:
Most iterators provide the ability to walk an entire sequence: when called, the iterator does any setup necessary to start the sequence, then calls yield on successive elements of the sequence, and then cleans up before returning. Calling the iterator again walks the sequence again.
Some iterators break that convention, providing the ability to walk a sequence only once. These “single-use iterators” typically report values from a data stream that cannot be rewound to start over. Calling the iterator again after stopping early may continue the stream, but calling it again after the sequence is finished will yield no values at all. Doc comments for functions or methods that return single-use iterators should document this fact:
// Lines returns an iterator over lines read from r.
// It returns a single-use iterator.
func (r *Reader) Lines() iter.Seq[string]
This passage of the documentation seemingly divides iterators into two categories. I’ll attempt to elucidate them through a couple of examples.
An example of a “pure” iterator ¶
The result of my fib0
function clearly exemplifies the first category:
Most iterators provide the ability to walk an entire sequence: when called, the iterator does any setup necessary to start the sequence, then calls yield on successive elements of the sequence, and then cleans up before returning. Calling the iterator again walks the sequence again.
If we assign fib0’s result to a variable and then range over that iterator multiple times, we indeed walk the sequence of Fibonacci numbers from the beginning each time:
package main
import (
"fmt"
"iter"
)
func fib0() iter.Seq[int] {
return func(yield func(int) bool) {
for a, b := 0, 1; yield(a); a, b = b, a+b {
// deliberately empty
}
}
}
func main() {
seq := fib0()
for n := range seq {
if n > 10 {
break
}
fmt.Printf("%d ", n)
}
fmt.Println()
for n := range seq {
if n > 100 {
break
}
fmt.Printf("%d ", n)
}
fmt.Println()
for n := range seq {
if n > 1000 {
break
}
fmt.Printf("%d ", n)
}
}
Output:
0 1 1 2 3 5 8
0 1 1 2 3 5 8 13 21 34 55 89
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
In my interpretation of the documentation, the first category of iterators corresponds to externally pure functions, i.e. iterators that may internally rely on mutation but display no externally observable side effects. Therefore, “pure” seems to me an apt qualifier for such iterators; “stateless” comes a close second.
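Iterator factories in the standard library also produce iterators of this kind. For instance, the result of slices.Values (added in Go 1.23) can be ranged over any number of times, and each walk starts from the beginning of the slice; here is a small program of my own illustrating that:
package main

import (
	"fmt"
	"slices"
)

func main() {
	seq := slices.Values([]string{"a", "b", "c"})
	for s := range seq {
		fmt.Print(s, " ")
	}
	fmt.Println()
	for s := range seq { // walks the slice from the start again
		fmt.Print(s, " ")
	}
	fmt.Println()
}
Strictly speaking, the caller could still observe changes if the underlying slice were mutated elsewhere, but the iterator itself displays no externally observable side effects, which is the criterion used here.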
An example of a “single-use” iterator ¶
What about the second category?
Some iterators break that convention, providing the ability to walk a sequence only once. These “single-use iterators” typically report values from a data stream that cannot be rewound to start over. Calling the iterator again after stopping early may continue the stream, but calling it again after the sequence is finished will yield no values at all.
Consider the example below:
func fib1() iter.Seq[int] {
a, b := 0, 1
return func(yield func(int) bool) {
for ; yield(a); a, b = b, a+b {
// deliberately empty body
}
}
}
At a glance, iterator factory fib1 seems almost identical to fib0, and you would be forgiven for dismissing their difference in implementation as unimportant; in fact, you’d be in very good company if you did. However, the difference between fib1 and fib0, far from being merely cosmetic, actually changes the semantics of their results. To understand why, you need some familiarity with closures.
If necessary, refer to the appendix to this post for a refresher on closures.
Recall that, in Go, iterators are functions; as such, iterators can be closures! In fib0, variables a and b are declared within the resulting iterator. By contrast, in fib1, variables a and b are free variables of the resulting iterator, which also happens to mutate those variables. Therefore, if we repeatedly range over fib1’s result (playground) as we did over fib0’s result earlier, we observe a very different behaviour:
package main
import (
"fmt"
"iter"
)
func fib1() iter.Seq[int] {
a, b := 0, 1
return func(yield func(int) bool) {
for ; yield(a); a, b = b, a+b {
// deliberately empty
}
}
}
func main() {
seq := fib1()
for n := range seq {
if n > 10 {
break
}
fmt.Printf("%d ", n)
}
fmt.Println()
for n := range seq {
if n > 100 {
break
}
fmt.Printf("%d ", n)
}
fmt.Println()
for n := range seq {
if n > 1000 {
break
}
fmt.Printf("%d ", n)
}
}
Output:
0 1 1 2 3 5 8
13 21 34 55 89
144 233 377 610 987
In plain terms, the iterator “remembers” where in the sequence of Fibonacci numbers the caller last stopped; when the caller resumes ranging over it, the iterator picks up right where it left off. Because the sequence produced by this iterator cannot be rewound but can be resumed, the iterator could perhaps be described as “single-use” but “resumable”.
A problematic classification ¶
The classification of iterators suggested in the documentation is not ideal, in my opinion.
First, its lack of a term for iterators that I described earlier as “pure” is puzzling. Among all iterators, “pure” iterators arguably distinguish themselves as the easiest ones to reason about; as such, don’t they deserve their own qualifier? A parallel with dynamical systems comes to mind: linear dynamical systems are singled out from nonlinear ones precisely because they’re regarded as simpler and much easier to apprehend.
Second, I find the description of the second category confusingly imprecise and ambiguous. In particular, whether the term “single-use” is intended to encompass only some or all “impure” iterators is unclear to me… and to other Gophers too, if the results of an informal poll that I recently ran on Gophers Slack are to be believed. If the “single-use” qualifier is intended for only a subcategory of “impure” iterators, I’m not sure why this subcategory should deserve a particular focus in the documentation. And if the “single-use” qualifier is intended for all “impure” iterators, it strikes me as a misnomer; after all, many forms of “impure” iterators are possible, not all of which could reasonably be described as “single-use”, as we shall see.
A wondrous zoo of “impure” iterators ¶
In the program below, the iterator resulting from fib2
(playground)
could reasonably be described as “usable twice” and “non-resumable”
(or, perhaps, restarting):
package main
import (
"fmt"
"iter"
)
func fib2() iter.Seq[int] {
var n int
return func(yield func(n int) bool) {
if n > 1 {
return
}
for a, b := 0, 1; yield(a); a, b = b, a+b {
// deliberately empty body
}
n++
}
}
func main() {
seq := fib2()
for n := range seq {
if n > 10 {
break
}
fmt.Printf("%d ", n)
}
fmt.Println()
for n := range seq {
if n > 100 {
break
}
fmt.Printf("%d ", n)
}
fmt.Println()
for n := range seq {
if n > 1000 {
break
}
fmt.Printf("%d ", n)
}
}
Output:
0 1 1 2 3 5 8
0 1 1 2 3 5 8 13 21 34 55 89
And, in the program below, the iterator resulting from fib3
(playground) could reasonably be described as “usable twice” and
“resumable”:
package main
import (
"fmt"
"iter"
)
func fib3() iter.Seq[int] {
var n int
a, b := 0, 1
return func(yield func(n int) bool) {
if n > 1 {
return
}
for ; yield(a); a, b = b, a+b {
// deliberately empty body
}
n++
}
}
func main() {
seq := fib3()
for n := range seq {
if n > 10 {
break
}
fmt.Printf("%d ", n)
}
fmt.Println()
for n := range seq {
if n > 100 {
break
}
fmt.Printf("%d ", n)
}
fmt.Println()
for n := range seq {
if n > 1000 {
break
}
fmt.Printf("%d ", n)
}
}
Output:
0 1 1 2 3 5 8
13 21 34 55 89
I’m sure you could think of more variations around this theme. As soon as function purity goes out the window, the possibilities are endless.
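As one more illustration (a hypothetical variation of my own, generalising fib2), here is a factory whose parameter caps the number of times its result can be walked from the beginning:
package main

import (
	"fmt"
	"iter"
)

// fibN returns an iterator over the Fibonacci numbers that can be walked
// from the beginning at most max times; any further walk yields nothing.
func fibN(max int) iter.Seq[int] {
	var walks int
	return func(yield func(int) bool) {
		if walks >= max {
			return
		}
		walks++
		for a, b := 0, 1; yield(a); a, b = b, a+b {
			// deliberately empty body
		}
	}
}

func main() {
	seq := fibN(3)
	for range 4 { // the fourth walk yields nothing
		for n := range seq {
			if n > 10 {
				break
			}
			fmt.Printf("%d ", n)
		}
		fmt.Println()
	}
}
Describing such an iterator as “single-use” would clearly be a stretch.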
Should iterators be “pure” whenever possible? ¶
Another question arises: if “pure” iterators are easier to reason about than “impure” ones are, shouldn’t iterators be designed as “pure” whenever possible?
Performance considerations ¶
Perhaps they should, at least when we emphasise performance as a design
criterion. “Pure” iterators indeed tend to incur fewer heap allocations than
their “impure” counterparts do. Consider strings.Lines
as a
case study; shown below is its source code in Go
1.24.3:
// Lines returns an iterator over the newline-terminated lines in the string s.
// The lines yielded by the iterator include their terminating newlines.
// If s is empty, the iterator yields no lines at all.
// If s does not end in a newline, the final yielded line will not end in a newline.
// It returns a single-use iterator.
func Lines(s string) iter.Seq[string] {
return func(yield func(string) bool) {
for len(s) > 0 {
var line string
if i := IndexByte(s, '\n'); i >= 0 {
line, s = s[:i+1], s[i+1:]
} else {
line, s = s, ""
}
if !yield(line) {
return
}
}
return
}
}
The resulting iterator is “impure” because it mutates s, which happens to be its only free variable. Consequently, escape analysis determines that s escapes to the heap:
$ go build -gcflags='-m=2' ./strings
-snip-
strings/iter.go:18:12: s escapes to heap:
-snip-
But strings.Lines
could instead have been designed to produce a “pure”
iterator:
func Lines(s string) iter.Seq[string] {
return func(yield func(string) bool) {
+ s := s // local copy!
for len(s) > 0 {
var line string
if i := strings.IndexByte(s, '\n'); i >= 0 {
line, s = s[:i+1], s[i+1:]
} else {
line, s = s, ""
}
if !yield(line) {
return
}
}
return
}
}
In this alternative version of strings.Lines, its s parameter remains a free variable of the resulting iterator, but the iterator does not mutate that variable; instead, the iterator exclusively operates on a local copy of the source string. With this simple change, escape analysis no longer concludes that variable s needs to escape to the heap, and one fewer memory allocation is required.
See this related comment by Alan Donovan on GitHub.
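If you want to check the effect yourself, one rough approach (a sketch of my own, to be dropped into a _test.go file; linesImpure and linesPure below are local copies of the two variants discussed above) is to compare their allocation counts with testing.AllocsPerRun; if the reasoning above holds, the impure variant should report one more allocation per call:
package lines

import (
	"iter"
	"strings"
	"testing"
)

// linesImpure mirrors strings.Lines: the iterator mutates its free variable s.
func linesImpure(s string) iter.Seq[string] {
	return func(yield func(string) bool) {
		for len(s) > 0 {
			var line string
			if i := strings.IndexByte(s, '\n'); i >= 0 {
				line, s = s[:i+1], s[i+1:]
			} else {
				line, s = s, ""
			}
			if !yield(line) {
				return
			}
		}
	}
}

// linesPure operates on a local copy of s instead of mutating the free variable.
func linesPure(s string) iter.Seq[string] {
	return func(yield func(string) bool) {
		s := s // local copy
		for len(s) > 0 {
			var line string
			if i := strings.IndexByte(s, '\n'); i >= 0 {
				line, s = s[:i+1], s[i+1:]
			} else {
				line, s = s, ""
			}
			if !yield(line) {
				return
			}
		}
	}
}

// TestAllocs reports the average number of heap allocations incurred by
// building and fully draining each iterator (run with go test -v).
func TestAllocs(t *testing.T) {
	const src = "foo\nbar\nbaz\n"
	for name, factory := range map[string]func(string) iter.Seq[string]{
		"impure": linesImpure,
		"pure":   linesPure,
	} {
		allocs := testing.AllocsPerRun(100, func() {
			for range factory(src) {
			}
		})
		t.Logf("%s: %.0f allocs per call", name, allocs)
	}
}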
Consistency with related iterators ¶
However, performance is only one possible design criterion: as Ian Lance
Taylor shrewdly pointed out to me, consistency in behaviour
with closely linked iterator factories is another. For example, package
bytes
provides a function named Lines
that’s
analogous to strings.Lines
; shown below is its source code in Go
1.24.3:
// Lines returns an iterator over the newline-terminated lines in the byte slice s.
// The lines yielded by the iterator include their terminating newlines.
// If s is empty, the iterator yields no lines at all.
// If s does not end in a newline, the final yielded line will not end in a newline.
// It returns a single-use iterator.
func Lines(s []byte) iter.Seq[[]byte] {
return func(yield func([]byte) bool) {
for len(s) > 0 {
var line []byte
if i := IndexByte(s, '\n'); i >= 0 {
line, s = s[:i+1], s[i+1:]
} else {
line, s = s, nil
}
if !yield(line[:len(line):len(line)]) {
return
}
}
return
}
}
As much as I’d like to redesign bytes.Lines so as to make its product “pure”, I can’t think of an efficient way to do so. Unlike strings, slices are mutable; moreover, multiple slices can reference the same underlying array. Therefore, even if the resulting iterator were modified to operate on a local copy of the source slice, it still wouldn’t be “pure”, as demonstrated by the following program (playground):
package main
import (
"bytes"
"fmt"
"iter"
"os"
)
func main() {
src := []byte("foo\nbar\nbaz\n")
seq := Lines(src)
for s := range seq {
os.Stdout.Write(s)
}
for s := range seq { // pass that mutates the source slice
if len(s) == 0 {
continue
}
s[0] = 'a'
}
fmt.Println()
for s := range seq {
os.Stdout.Write(s)
}
}
func Lines(s []byte) iter.Seq[[]byte] {
return func(yield func([]byte) bool) {
s := s // local copy
for len(s) > 0 {
var line []byte
if i := bytes.IndexByte(s, '\n'); i >= 0 {
line, s = s[:i+1], s[i+1:]
} else {
line, s = s, nil
}
if !yield(line[:len(line):len(line)]) {
return
}
}
return
}
}
Output:
foo
bar
baz

aoo
aar
aaz
For bytes.Lines
to produce a pure iterator, a “deep” copy of the source slice
(i.e. a slice that would reference a clone of the source slice’s backing array)
would be required (playground); and such an approach
seems to me like it would defeat the purpose of iterators.
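For reference, such a deep-copying variant could look like the sketch below (my own illustration, relying on bytes.Clone; the playground linked above may differ in its details):
func Lines(s []byte) iter.Seq[[]byte] {
	return func(yield func([]byte) bool) {
		s := bytes.Clone(s) // deep copy: each walk operates on a fresh clone of the backing array
		for len(s) > 0 {
			var line []byte
			if i := bytes.IndexByte(s, '\n'); i >= 0 {
				line, s = s[:i+1], s[i+1:]
			} else {
				line, s = s, nil
			}
			if !yield(line[:len(line):len(line)]) {
				return
			}
		}
	}
}
Mutations made through the yielded lines now affect only the clone, so they no longer leak into the source slice or into subsequent walks; the price is an allocation and a copy of the entire input on every walk, which is hard to justify for an iterator.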
If bytes.Lines
can’t easily be designed to produce a “pure” iterator,
strings.Lines
should arguably still return an “impure” iterator as well. In
general, should iterator “purity” be pursued at all costs, or should
consistency with related iterators be a factor? The jury is still out.
Conclusion ¶
As Austin Clements recently wrote:
Iterators are still young in Go […]
Conventions around iterators have not yet been firmly established, let alone percolated through the Go community; there is still room for experimentation and time to iron out the terminology kinks. The sooner, the better, though, if you ask me.
Acknowledgments ¶
Thanks to Valentin Deleplace for spurring me to write this post and for reviewing an early draft of it.
Appendix: refresher on closures ¶
A closure is a function that captures variables declared in an outer lexical scope. Such variables are called free variables of the function in question. In the following example, anonymous function f reads a variable named truth that is declared in the main function:
package main
import "fmt"
func main() {
truth := true
f := func() bool { return truth }
fmt.Println(f())
}
Output:
true
So far, nothing to write home about. However, the power of closures lies in their ability to update their free variables! Such closures essentially are stateful functions: functions that maintain a state (composed of their free variables) from one invocation to the next.
In the following example (playground), factory function falseAfterN returns a closure that captures its parameter n and a local variable named count, updates count during each invocation, and varies its boolean result on the basis of those free variables:
package main
import "fmt"
func falseAfterN(n uint) func() bool {
var count uint
return func() bool {
if count < n {
count++
return true
}
return false
}
}
func main() {
const n = 4
f := falseAfterN(n)
for i := range 2 * n {
fmt.Printf("%d %t\n", i, f())
}
}
If you’ve been writing Go for some time, you’ve likely used closures, perhaps even without realising that you were. In the classic example of a concurrent program (playground) shown below, the anonymous function is a closure, since it captures (and acts on) variables i and wg, which are both declared in an outer scope:
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
for i := range 10 {
wg.Add(1)
go func() {
defer wg.Done()
fmt.Println(i)
}()
}
wg.Wait()
}