Skip to main content
Should you use Kotlin Sequences for Performance?

Should you use Kotlin Sequences for Performance?

·1016 words·5 mins
Table of Contents

In Kotlin, we’re spoilt for choice in how we model collections of items. At a high-level, we have…

Collections
#

First up, we have Collections (big C), which are your standard data structures for day to day use, such as List, Set and Map. When you want to iterate over one of these, it will nearly always result in an Iterator being created and used.

Even if you use the higher order functions, such as filter, map and any , they’ll be using a loop and iterator underneath. These operations are eagerly executed, meaning each function call immediately produces a new filtered list.

Sequences
#

Next up we have Sequences. These take the Iterator concept, but make them a first-class citizen. An Iterator is backed by a Collection, but a Sequence is a self-contained thing. Unlike collections, sequences execute lazily, processing elements only when needed. The result of a filter call on a sequence is another sequence to be executed later.

Flow
#

The last one I’ll mention (briefly) is Flow. Conceptually, I find sequences and flows similar, the difference being the suspending and concurrent nature of Flows being built on top of Coroutines, whereas sequence is single-thread and blocking.

Flows are great when you need to process data-sets asynchronously and/or concurrently. For example, fetching paginated data from an API, and processing it in chunks.

Applying transformations
#

So why have I just told you about some core Kotlin things which you probably already know? Well I’ve been under the impression that using a sequences to perform chained operations on a Collection is more performant than chained operations on a list.

The reason for hypothesis was that the majority of the higher order functions on collections will return a new Collection every time. If we look at the filter function, it is implemented like so:

public inline fun <T> Iterable<T>.filter(predicate: (T) -> Boolean): List<T> {
    return filterTo(ArrayList<T>(), predicate)
}

You can see that it filters into a new ArrayList. The more chained operations you use, the more intermediary (and garbage) collections are created.

Sequences avoid creating intermediate collections, meaning they should reduce memory overhead and improve performance in long transformation chains. IntelliJ even suggests using them for efficiency, so I assumed they were the best choice. But is that actually true?

IntelliJ code suggestion to convert chained operation to use a sequence
IntelliJ code suggestion to convert chained operation to use a sequence

This blog post was originally written as an ode to Sequence… right until I wrote a benchmark and measured it.

Benchmark
#

Let’s zoom out though, and think about a common use case when handling collections of data. Imagine that we have a database query, which returns a collection of items. In a typical application, you will get that data, apply some filtering, maybe sort the items, and finally map it to a local model class.

So let’s put this to the test. The following code structure is probably something which you’ve written hundreds of times:

object Db {
    fun getItems(): List<DbModel>
}

object DataRepository {
    fun getItemsList(): List<UiModel> {
        return Db.getItems()
            .filter { it.isEnabled }
            .map { UiModel(...) }
    }
}

You can see that the DataRepository.getItemsList() function is very simple. It literally just filters the list, and then maps the entries to some other model class.

Now let’s add in a Sequence’d version. This version converts the list to a Sequence, performs the same operations, and finally converts it to a List to return.

fun getItemsListUsingSequence(): List<UiModel> {
    return Db.getItems()
        .asSequence()
        .filter { it.isEnabled }
        .map { UiModel(it.id) }
        .toList()
}

And here’s one using Flow, although this isn’t realistic, so take this one with a pinch of salt 🧂:

suspend fun getItemsListUsingFlow(): List<UiModel> {
  return Db.getItems()
    .asFlow()
    .filter { it.isEnabled }
    .map { UiModel(it.id) }
    .toList()
}

I then wrote a quick benchmark test, using kotlinx-benchmark. The test runs each function repeatedly for a total duration of 1 second, over 10 iterations:

Sequence vs List benchmark
Sequence vs List benchmark. GitHub Gist: instantly share code, notes, and snippets.
Gist

Benchmark Results: Sequences can be slow
#

Here’s the results on my MacBook Pro M1, running on Temurin JDK 21:

TestOperations per second
List1,636,222
Sequence1,491,436
Flow1,192,928

So here’s me eating humble pie: using a sequence for simple chained operations is about 9% slower than not.

So I went ahead and tweaked each function to be more extreme, and perform a bunch more filtering and mapping. I count 7 intermediary collections created in this example, but the Flow and Sequence versions should still be creating zero. With this in mind, I expected the sequence version to pull ahead…

fun getItemsList(): List<UiModel> {
return Db.getItems()
    .filter { it.isEnabled }
    .map { UiModel(it.id) }
    .filter { true }
    .map { UiModel(it.id) }
    .filter { true }
    .map { UiModel(it.id) }
    .filter { true }
    .map { UiModel(it.id) }
}

But, no:

TestOperations per second
List663,391
Sequence364,947
Flow671,243

The sequence version is 45% slower than collection operations. 🤯

Surprisingly, Flow pulled ahead in the extreme test case. This suggests Flow may optimize and merge chained operations better than expected.

Lessons Learnt
#

  • Sequences can be slower due to per-element function call overhead. I’d go as far to say that they are nearly always slower today. The more complex your operation, the higher the cost.
  • Flows can optimize some chained operations better than expected, but don’t use them for that. Use them for their asynchronousity.
  • Collections are often the best fastest choice for performance.

Eating humble pie 🥧
#

Apologies for the many times I’ve asked my coworkers in code reviews to use asSequence to improve performance.

Update 1: Using Immutable collections
#

Kaushik made a suggestion of trying ImmutableArray in Pods4k, and indeed it was faster than Kotlin’s built-in List:

TestOperations per second
List641,356
Sequence344,254
Flow651,493
ImmutableArray772,859

Update 2: Large data set
#

As there were few people commenting to the effect of “that list is too small”, I re-ran the benchmark using 100,000 items (instead of 100). The differences grew…

TestOperations per second
List623
Sequence245
Flow792
ImmutableArray757