Challenge: make this Go function inlinable and free of bounds checks

In this post, I challenge you to refactor a small Go function in such a way as to make it inlinable and free of bounds checks, for better performance.

Disclaimer: this post assumes version 1.24.2 of the (official) Go compiler; you may get different results with other versions of the Go compiler or with other implementations of the Go language.

Function inlining & bounds-check elimination ¶

Some familiarity with function inlining and bounds-check elimination is a prerequisite for attempting my challenge. The following three sections serve as a crash course on those topics. Feel free to skip straight to the challenge itself.

Function inlining 101 ¶

Inlining is a compiler strategy that can roughly be described as “substituting the body of a function for calls to that function”. Because it eliminates some function-call overhead, inlining often results in faster program execution. However, function inlining also increases the size of the resulting binaries, and indiscriminate inlining could cause compilers to produce prohibitively large binaries. Therefore, compilers typically restrict eligibility for inlining to functions that they deem “simple enough”.

For each function, the Go compiler allocates an inlining budget and computes an inlining cost. If a function’s inlining cost does not exceed its budget, the compiler deems the function eligible for inlining (a.k.a. inlinable) and inlines it; otherwise, it doesn’t. At the time of writing (Go 1.24.2), the default inlining budget is 80, but the compiler may, thanks to profile-guided optimisation (PGO), decide to increase the inlining budget for functions called on the hot path; for this post, let’s keep things simple and ignore PGO, though. However, the presence of some language constructs (such as recursion, “go” statements, “defer” statements, and calls to the recover function) in a function outright bars the latter from being inlined. The Go compiler’s inlining strategy is not specified by the language itself and may change from one release to the next.

The most straightforward way to determine whether a function is eligible for inlining is to run a command of the following form and inspect its output:

$ go build -gcflags '-m=2' <package-path> 2>&1 | grep <function-name>

Even if a function is inlinable, you may occasionally want to prevent it from being inlined; you can instruct the compiler not to inline a function by adding a //go:noinline directive above the function’s declaration. But you cannot force the compiler to inline arbitrary functions, for reasons explained above; a more thorough rationale for this limitation can be found in issue #21536.

By refactoring an initially non-inlinable function (and/or by relaxing its semantics), you can sometimes lower a function’s inlining cost to the point of making it eligible for inlining; however, unless you’re very familiar with the Go compiler’s inliner (and keep abreast of changes brought to it by maintainers of the Go project), you may (rightly) feel that making a function eligible for inlining is more an art than a science.

Bounds-check elimination 101 ¶

I touched upon bounds-check elimination (BCE) in a post published earlier this year on this blog; allow me to quote it:

[…] Go is said to be memory-safe; in particular, per the language specification, implementations must trigger a run-time panic if a slice-indexing operation is ever out of bounds. Such bounds checks are relatively cheap, but they are not free. When the compiler can prove that some slice access cannot be out of bounds, it may omit, for better performance, the corresponding bounds check from the resulting executable.

What I wrote about slices in that post is also true of strings, which, under the hood, are little more than slices of bytes.

Bounds-check elimination, like inlining strategies, remains an implementation detail; it is not specified by the language itself and may change from one release of the Go compiler to the next.

To identify where the Go compiler introduces bounds checks in a package, run a command of the following form:

$ go build -gcflags '-d=ssa/check_bce/debug=1' <package-path>

The presence of bounds checks is not systematically detrimental to performance, and eliminating all bounds checks from a program often proves impossible anyway. However, eliminating some bounds checks, perhaps by hoisting them out of loops, typically is conducive to faster program execution. You can find instances of hoisting bounds checks out of loops in Go’s standard library.

Despite frequent improvements to the Go compiler’s BCE logic, you sometimes need to refactor your code in order to convince the compiler to eliminate or displace some bounds checks. This too is more an art than a science, and typically requires some trial and error.

A delicate balancing act ¶

Function inlining and BCE are closely intertwined. Inlinability sometimes comes at the cost of a couple of additional bounds checks; conversely, eliminating some bounds checks may deprive a function from its eligibility for inlining. But in some lucky cases, you can have your cake and eat it too!

Here is a tip that I’ve learnt from experience: eliminating bounds checks first and achieving inlinability second tends to be easier than the other way around or doing both in one fell swoop.

Can you make this function inlinable and free of bounds checks? ¶

Consider the following source file (which I’ve uploaded to a Gist, for your convenience):

package headers

// TrimOWS trims up to n bytes of optional whitespace (OWS)
// from the start of and/or the end of s.
// If no more than n bytes of OWS are found at the start of s
// and no more than n bytes of OWS are found at the end of s,
// it returns the trimmed result and true.
// Otherwise, it returns some unspecified string and false.
func TrimOWS(s string, n int) (string, bool) {
	if s == "" {
		return s, true
	}
	s, ok := trimLeftOWS(s, n)
	if !ok {
		return "", false
	}
	s, ok = trimRightOWS(s, n)
	if !ok {
		return "", false
	}
	return s, true
}

func trimLeftOWS(s string, n int) (string, bool) {
	var i int
	for len(s) > 0 {
		if i > n {
			return "", false
		}
		if !isOWS(s[0]) {
			break
		}
		s = s[1:]
		i++
	}
	return s, true
}

func trimRightOWS(s string, n int) (string, bool) {
	var i int
	for len(s) > 0 {
		if i > n {
			return "", false
		}
		if !isOWS(s[len(s)-1]) {
			break
		}
		s = s[:len(s)-1]
		i++
	}
	return s, true
}

// See https://httpwg.org/specs/rfc9110.html#whitespace.
func isOWS(b byte) bool {
	return b == '\t' || b == ' '
}

The implementation of TrimOWS above incurs no bounds checks, as evidenced by the empty output of the following command:

$ go build -gcflags '-d=ssa/check_bce/debug=1'
$

However, it is not inlinable:

$ go build -gcflags '-m=2' 2>&1 | grep TrimOWS
./ows.go:9:6: cannot inline TrimOWS: function too complex: \
    cost 130 exceeds budget 80

Here is my challenge to you: can you refactor function TrimOWS so that it be inlinable (without relying on profile-guided optimisation) and free of bounds checks?

To allow you to catch bugs as you refactor the code, I’ve included a suite of unit tests; and to allow you to later measure performance changes between implementations, I’ve also included some benchmarks:

package headers

import "testing"

const maxOWS = 1 // number of OWS bytes tolerated on either side

var trimOWStests = []struct {
	desc string
	s    string
	want string
	ok   bool
}{
	{
		desc: "empty",
		s:    "",
		want: "",
		ok:   true,
	}, {
		desc: "no OWS",
		s:    "foo",
		want: "foo",
		ok:   true,
	}, {
		desc: "internal OWS",
		s:    "foo  \t\tbar",
		want: "foo  \t\tbar",
		ok:   true,
	}, {
		desc: "leading and trailing OWS",
		s:    "\tfoo ",
		want: "foo",
		ok:   true,
	}, {
		desc: "a tolerated amount of OWS around non-OWS",
		s:    " a ",
		want: "a",
		ok:   true,
	}, {
		desc: "a tolerated amount of OWS and nothing else",
		s:    "\t ",
		want: "",
		ok:   true,
	}, {
		desc: "non-OWS whitespace",
		s:    "\nfoo\t",
		want: "\nfoo",
		ok:   true,
	}, {
		desc: "too much leading OWS",
		s:    " \tfoo\t",
		ok:   false,
	}, {
		desc: "too much trailing OWS",
		s:    " foo\t ",
		ok:   false,
	}, {
		desc: "too much leading and trailing OWS",
		s:    " \tfoo\t ",
		ok:   false,
	},
}

func TestTrimOWS(t *testing.T) {
	for _, tc := range trimOWStests {
		f := func(t *testing.T) {
			allocs := testing.AllocsPerRun(10,
				func() { str, truth = TrimOWS(tc.s, maxOWS) },
			)
			if allocs > 0 {
				const tmpl = "TrimOWS(%q, %d) allocates %.2f objects"
				t.Errorf(tmpl, tc.s, maxOWS, allocs)
			}
			got, ok := TrimOWS(tc.s, maxOWS)
			if !tc.ok && ok {
				// In cases where TrimOWS must fail,
				// its string result is unspecified.
				const tmpl = "TrimOWS(%q, %d): _, %t; want _, %t"
				t.Fatalf(tmpl, tc.s, maxOWS, ok, tc.ok)
			}
			if tc.ok && (!ok || got != tc.want) {
				const tmpl = "TrimOWS(%q, %d): %q, %t; want %q, %t"
				t.Errorf(tmpl, tc.s, maxOWS, got, ok, tc.want, tc.ok)
			}
		}
		t.Run(tc.desc, f)
	}
}

func BenchmarkTrimOWS(b *testing.B) {
	for _, tc := range trimOWStests {
		f := func(b *testing.B) {
			for range b.N {
				str, truth = TrimOWS(tc.s, maxOWS)
			}
		}
		b.Run(tc.desc, f)
	}
}

var ( // sinks for avoiding dead-code elimination in benchmarks
	str   string
	truth bool
)

To control the benchmark loop, I’m deliberately relying on b.N rather than on b.Loop because b.Loop currently precludes inlining; see issue #73137.

Ready for some micro-optimisation? Clone the Gist and cd into your clone:

$ git clone https://gist.github.com/jub0bs/0aca3c6bd041e838fe4add58a060be35
$ cd 0aca3c6bd041e838fe4add58a060be35

And off you go!

Hints ¶

Here is series of hints that you might find useful in case you get stuck; click on each hint to reveal it, as needed.

Hint 0

Try eliminating unnecessary branching. In particular, the initial empty-string check in TrimOWS is redundant; it’s an attempt at micro-optimisation, but its performance benefits are moot. Moreover, TrimOWS doesn’t need to check the boolean results of trimRightOWS; instead, it could simply bubble up trimRightOWS’s results to the caller.

 func TrimOWS(s string, n int) (string, bool) {
-       if s == "" {
-               return s, true
-       }
        s, ok := trimLeftOWS(s, n)
        if !ok {
                return "", false
        }
-       s, ok = trimRightOWS(s, n)
-       if !ok {
-               return "", false
-       }
-       return s, true
+       return trimRightOWS(s, n)
 }

These changes don’t introduce any bounds check and somewhat reduce TrimOWS’s inlining cost:

$ go build -gcflags '-d=ssa/check_bce/debug=1'
$
$ go build -gcflags '-m=2' 2>&1 | grep TrimOWS
./ows.go:9:6: cannot inline TrimOWS: function too complex: \
    cost 121 exceeds budget 80

Hint 1

Although helper functions trimLeftOWS and trimRightOWS are themselves inlinable, manually inlining them in TrimOWS helps inlinability without hurting readability:

 func TrimOWS(s string, n int) (string, bool) {
-       s, ok := trimLeftOWS(s, n)
-       if !ok {
-               return "", false
-       }
-       return trimRightOWS(s, n)
-}
-
-func trimLeftOWS(s string, n int) (string, bool) {
        var i int
        for len(s) > 0 {
                if i > n {
                        return "", false
                }
                if !isOWS(s[0]) {
                        break
                }
                s = s[1:]
                i++
        }
-       return s, true
-}
-
-func trimRightOWS(s string, n int) (string, bool) {
-       var i int
+       i = 0
        for len(s) > 0 {
                if i > n {
                        return "", false
                }
                if !isOWS(s[len(s)-1]) {
                        break
                }
                s = s[:len(s)-1]
                i++
        }
        return s, true
 }

Not only do these changes not introduce any bounds checks, but they dramatically reduce TrimOWS’s inlining cost:

$ go build -gcflags '-d=ssa/check_bce/debug=1'
$
$ go build -gcflags '-m=2' 2>&1 | grep TrimOWS
./ows.go:9:6: cannot inline TrimOWS: function too complex: \
    cost 88 exceeds budget 80

Manually inlining my little isOWS function at its two call sites is tempting and does somewhat reduce TrimOWS’s inlining cost, but doing so isn’t ideal because it violates the DRY principle:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

Hint 2

Try traversing the string rather than repeatedly munching at it:

 func TrimOWS(s string, n int) (string, bool) {
        var i int
-       for len(s) > 0 {
+       for i = 0; i < len(s); i++ {
                if i > n {
                        return "", false
                }
-               if !isOWS(s[0]) {
+               if !isOWS(s[i]) {
                        break
                }
-               s = s[1:]
-               i++
        }
-       i = 0
-       for len(s) > 0 {
-               if i > n {
+       var j int
+       for j = len(s) - 1; j >= i; j-- {
+               if j < len(s)-1-n {
                        return "", false
                }
-               if !isOWS(s[len(s)-1]) {
+               if !isOWS(s[j]) {
                        break
                }
-               s = s[:len(s)-1]
-               i++
        }
-       return s, true
+       return s[i : j+1], true
 }

Unfortunately, these changes are somewhat counterproductive because, not only do they increase TrimOWS’s inlining cost, but they also introduce bounds checks!

$ go build -gcflags '-d=ssa/check_bce/debug=1'
 gist.github.com/jub0bs/0aca3c6bd041e838fe4add58a060be35
./ows.go:24:14: Found IsInBounds
./ows.go:28:10: Found IsSliceInBounds
$ go build -gcflags '-m=2' 2>&1 | grep TrimOWS
./ows.go:9:6: cannot inline TrimOWS: function too complex: \
    cost 91 exceeds budget 80

Fret not! Performance optimisation seldom is a straight line, and refactorings initially perceived as setbacks may in fact unlock new opportunities for improvement.

Hint 3

The compiler feels the need to introduce bounds checks because it does not (yet?) keep track of enough context to realise that j is always within bounds of s. To help the compiler, we can substring and re-assign s within the loops rather than later; this change also allows us to reduce the scope of the two integer variables controlling the loops and simply return the empty string if execution reaches the bottom of TrimOWS:

 func TrimOWS(s string, n int) (string, bool) {
-       var i int
-       for i = 0; i < len(s); i++ {
+       for i := 0; i < len(s); i++ {
                if i > n {
                        return "", false
                }
                if !isOWS(s[i]) {
+                       s = s[i:]
                        break
                }
        }
-       var j int
-       for j = len(s) - 1; j >= i; j-- {
-               if j < len(s)-1-n {
+       for i := len(s) - 1; i >= 0; i-- {
+               if i < len(s)-1-n {
                        return "", false
                }
-               if !isOWS(s[j]) {
-                       break
+               if !isOWS(s[i]) {
+                       return s[:i+1], true
                }
        }
-       return s[i : j+1], true
+       return "", true
 }

Bounds checks, be gone!

$ go build -gcflags '-d=ssa/check_bce/debug=1'
$

TrimOWS’s inlining cost has again moved in the wrong direction, though:

$ go build -gcflags '-m=2' 2>&1 | grep TrimOWS
./ows.go:9:6: cannot inline TrimOWS: function too complex: \
    cost 93 exceeds budget 80

Again, don’t let this temporary setback stop you. Small steps!

Hint 4

Use range-over-int loops wherever possible, as they tend to be cheaper (and easier to reason about!) than classic three-clause loops:

 func TrimOWS(s string, n int) (string, bool) {
-       for i := 0; i < len(s); i++ {
+       for i := range len(s) {
                if i > n {
                        return "", false
                }

No bounds checks have been introduced, and TrimOWS’s inlining cost is now at its lowest so far:

$ go build -gcflags '-d=ssa/check_bce/debug=1'
$
$ go build -gcflags '-m=2' 2>&1 | grep TrimOWS
./ows.go:9:6: cannot inline TrimOWS: function too complex: \
    cost 86 exceeds budget 80

Back on track!

Hint 5

Using naked returns can help reduce a function’s inlining cost:

-func TrimOWS(s string, n int) (string, bool) {
+func TrimOWS(s string, n int) (_ string, _ bool) {
        for i := range len(s) {
                if i > n {
-                       return "", false
+                       return
                }
                if !isOWS(s[i]) {
                        s = s[i:]
                        break
                }
        }
        for i := len(s) - 1; i >= 0; i-- {
                if i < len(s)-1-n {
-                       return "", false
+                       return
                }
                if !isOWS(s[i]) {
                        return s[:i+1], true
                }
        }
        return "", true
 }

Still no bounds checks to deplore, and TrimOWS’s inlining cost is lower than ever:

$ go build -gcflags '-d=ssa/check_bce/debug=1'
$
$ go build -gcflags '-m=2' 2>&1 | grep TrimOWS
./ows.go:9:6: cannot inline TrimOWS: function too complex: \
    cost 82 exceeds budget 80

Down to 82, now. We’re breathing rare air, here!

Hint 6

At this stage, you may be (as I was) running out of ideas for further reducing TrimOWS’s inlining cost. But try offsetting the loop variable of the second loop by one:

 func TrimOWS(s string, n int) (_ string, _ bool) {
        for i := range len(s) {
                if i > n {
                        return
                }
                if !isOWS(s[i]) {
                        s = s[i:]
                        break
                }
        }
-       for i := len(s) - 1; i >= 0; i-- {
-               if i < len(s)-1-n {
+       for i := len(s); i > 0; i-- {
+               if i < len(s)-n {
                        return
                }
-               if !isOWS(s[i]) {
-                       return s[:i+1], true
+               if !isOWS(s[i-1]) {
+                       return s[:i], true
                }
        }
        return "", true
 }

And voilà! TrimOWS still doesn’t incur any bounds checks but is now inlinable:

$ go build -gcflags '-d=ssa/check_bce/debug=1'
$
$ go build -gcflags '-m=2' 2>&1 | grep TrimOWS
./ows.go:9:6: can inline TrimOWS with cost 78 as: \
    func(string, int) (string, bool) { for loop; for loop; return "", true }

Mission accomplished. 🎉

My solution ¶

Below is my complete final implementation. As required, it is both inlinable and free of bounds checks; besides, it remains easy to read, in my opinion.

Final implementation

package headers

// TrimOWS trims up to n bytes of optional whitespace (OWS)
// from the start of and/or the end of s.
// If no more than n bytes of OWS are found at the start of s
// and no more than n bytes of OWS are found at the end of s,
// it returns the trimmed result and true.
// Otherwise, it returns some unspecified string and false.
func TrimOWS(s string, n int) (_ string, _ bool) {
	for i := range len(s) {
		if i > n {
			return
		}
		if !isOWS(s[i]) {
			s = s[i:]
			break
		}
	}
	for i := len(s); i > 0; i-- {
		if i < len(s)-n {
			return
		}
		if !isOWS(s[i-1]) {
			return s[:i], true
		}
	}
	return "", true
}

// See https://httpwg.org/specs/rfc9110.html#whitespace.
func isOWS(b byte) bool {
	return b == '\t' || b == ' '
}

You might have gotten there by a less circuitous way than I did, or you may have arrived at an entirely different implementation. However, if your implementation of TrimOWS is correct, inlinable, free of bounds checks, and has an inlining cost below 78 (without inlining the isOWS function), I want to hear from you; do let me know on Bluesky, Mastodon, Gophers Slack, or Reddit.

Benchmark results ¶

Were all these contortions worth the time and effort? As often with optimisation, it depends™. Here are some benchmark results pitting my final implementation against my initial one:

goos: darwin
goarch: amd64
pkg: gist.github.com/jub0bs/0aca3c6bd041e838fe4add58a060be35
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                                                     │     old     │                 new                 │
                                                     │   sec/op    │   sec/op     vs base                │
TrimOWS/empty-8                                        2.464n ± 4%   2.208n ± 3%  -10.41% (p=0.000 n=20)
TrimOWS/no_OWS-8                                       3.955n ± 1%   3.624n ± 8%   -8.36% (p=0.000 n=20)
TrimOWS/internal_OWS-8                                 3.946n ± 1%   3.638n ± 1%   -7.79% (p=0.000 n=20)
TrimOWS/leading_and_trailing_OWS-8                     5.283n ± 1%   5.890n ± 1%  +11.49% (p=0.000 n=20)
TrimOWS/a_tolerated_amount_of_OWS_around_non-OWS-8     6.666n ± 0%   5.915n ± 1%  -11.27% (p=0.000 n=20)
TrimOWS/a_tolerated_amount_of_OWS_and_nothing_else-8   5.138n ± 1%   6.901n ± 1%  +34.30% (p=0.000 n=20)
TrimOWS/non-OWS_whitespace-8                           4.876n ± 1%   4.697n ± 1%   -3.66% (p=0.000 n=20)
TrimOWS/too_much_leading_OWS-8                         4.243n ± 1%   3.911n ± 0%   -7.82% (p=0.000 n=20)
TrimOWS/too_much_trailing_OWS-8                        7.001n ± 2%   6.024n ± 1%  -13.96% (p=0.000 n=20)
TrimOWS/too_much_leading_and_trailing_OWS-8            4.233n ± 1%   3.887n ± 1%   -8.17% (p=0.000 n=20)
geomean                                                4.603n        4.445n        -3.43%

The latter is overall faster than the former. You could dismiss the final implementation by arguing that it saves only a fraction of a nanosecond in each benchmark case, but you should pay closer attention to the relative difference than to the absolute difference; and such seemingly insignificant improvements may matter in a performance-critical scenario.

Can this speedup of execution really be ascribed to inlinability, though? To find out, I slapped a //go:noinline directive and re-ran the benchmarks. According to the results, inlinability helps a lot: my final implementation, when disallowed from being inlined, turns out to be overall slower than my initial, non-inlinable implementation.

You may have noticed one fly in the ointment: although the final implementation’s benchmark results are positive overall, they consist in a mix of improvements and regressions. I can think of one ultimate refactoring that, without introducing bounds checks or compromising eligibility to inlining, speeds up TrimOWS’s execution in all cases. I’m including some benchmark results below as a teaser, but I’ll leave this mysterious change as an exercise.

goos: darwin
goarch: amd64
pkg: gist.github.com/jub0bs/0aca3c6bd041e838fe4add58a060be35
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                                                     │     old     │               latest                │
                                                     │   sec/op    │   sec/op     vs base                │
TrimOWS/empty-8                                        2.464n ± 4%   1.541n ± 4%  -37.46% (p=0.000 n=20)
TrimOWS/no_OWS-8                                       3.955n ± 1%   3.401n ± 1%  -14.01% (p=0.000 n=20)
TrimOWS/internal_OWS-8                                 3.946n ± 1%   3.401n ± 0%  -13.81% (p=0.000 n=20)
TrimOWS/leading_and_trailing_OWS-8                     5.283n ± 1%   5.046n ± 2%   -4.49% (p=0.000 n=20)
TrimOWS/a_tolerated_amount_of_OWS_around_non-OWS-8     6.666n ± 0%   4.897n ± 2%  -26.54% (p=0.000 n=20)
TrimOWS/a_tolerated_amount_of_OWS_and_nothing_else-8   5.138n ± 1%   3.668n ± 5%  -28.61% (p=0.000 n=20)
TrimOWS/non-OWS_whitespace-8                           4.876n ± 1%   4.093n ± 2%  -16.06% (p=0.000 n=20)
TrimOWS/too_much_leading_OWS-8                         4.243n ± 1%   3.971n ± 3%   -6.40% (p=0.000 n=20)
TrimOWS/too_much_trailing_OWS-8                        7.001n ± 2%   7.096n ± 1%        ~ (p=0.172 n=20)
TrimOWS/too_much_leading_and_trailing_OWS-8            4.233n ± 1%   3.946n ± 2%   -6.78% (p=0.000 n=20)
geomean                                                4.603n        3.861n       -16.12%

Can and should regressions be automatically detected? ¶

Manually verifying that a function remains inlinable and free of bounds checks quickly becomes tedious, to the point that I (and other Gophers) sometimes wish for compiler directives that would automatically alert me to such regressions. For instance, you could imagine a function directive for marking a function as “intended for inlining” and that would cause compilation to fail if the compiler in fact deemed the function ineligible for inlining:

//go:inlinable
func TrimOWS(s string, n int) (_ string, _ bool) { /* ... */ }

You could also imagine a compiler directive that would cause compilation to fail if the compiler detected bounds checks on a given line:

		if !isOWS(s[i-1]) { //go:nobc

Alternatively, you could imagine a more granular compiler directive that would fail compilation if a line of interest incurred a prohibitive (and configurable) number of bounds checks:

func Cut(s, sep string) (before, after string, found bool) {
	if i := Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true //go:maxbce:1
	}
	return s, "", false
}

However, I think that such directives are unlikely to ever be supported by the Go compiler. Recall that neither the inlining logic nor the bounds-check-elimination logic are enshrined in the language specification; they’re mere implementation details of the compiler and, as such, they may (and do, in my experience) evolve. The introduction of such compiler directives as discussed above would open the undesirable possibility that existing programs break as a result of changes to the compiler, a possibility that would violate Go 1.0’s compatibility promise.

Third-party tools, such as Jordan Lewis’s gcassert command, purport (among other goals) to fill the gap left by the absence of such compiler directives. However, I must admit that I’m reluctant to pepper my code with third-party annotations; doing so would conflict with the minimalism that I’ve grown accustomed to since I took my first step in the world of Go.

And perhaps a good suite of benchmarks is all I need. After all, some inlinability regressions and bounds-check-elimination regressions may only have a negligible effect on performance and may not be worth attending to.

Takeaways ¶

Eliminating most bounds checks from a function while also making it eligible for inlining can unlock an execution speedup.
Through trial and error, a series of simple refactorings can lower a function’s inlining cost and reduce the number of expensive bounds checks deemed necessary by the Go compiler.
Micro-optimisation sometimes is a futile endeavour but often proves intellectually rewarding.

If you’d like to learn more about inlining and bounds-check elimination, I recommend you peruse Tapir Liu’s Go Optimizations 101 e-book and watch the talk that Agniva de Sarker gave at GopherCon 2020.

I hope that you’ve enjoyed this challenge. Till next time.

Acknowledgments ¶

Thanks to Laurent Demailly for kindly alerting me to the presence of a bug, which I’ve now fixed, and for his constructive feedback about the design of this challenge.

Edit (2025/08/07): Hugo Chargois has since identified an infelicity in my spec and test suite and came up with an alternative that further reduces the inlining cost (down to 74). Do check out his blog post about his solution.