Java Streams
Java streams have been around since Java 8 [1]. They have been extensively covered by tutorials and documentation [2][3][4] and are heavily used in the wild [5]. This blog post explains why Java streams are so important for the Java ecosystem.
The main point developed is that, compared to imperative for loops and if statements, Java streams produce “better” code at very little cost.
Streams Complexity
Before entering into the main argument, it is important to note that Java streams make Java more complex. They add a domain-specific language (DSL) on top of the existing language. For new programmers, this makes Java harder: there is one more thing to learn before being comfortable in the ecosystem. As demonstrated by the ubiquity of streams in popular Java projects [5], this complexity is here to stay. It needs to be embraced as a major part of the Java programming language. In 2024, every Java programmer must be very comfortable with the stream API.
Streams Are Cheaply Composable
One of the best features of Java streams is that they compose with little overhead. The overhead is small because streams are lazy and only do the computation when a “terminal operation” is called [6]. How a programmer builds the pipeline has little effect on the performance of the pipeline itself.
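The laziness can be observed directly. The sketch below is only an illustration: the class name LazinessSketch, the evens method, and the counter are made up for this example, and the counting lambda deliberately has a side effect so that the evaluation order becomes visible.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original examples.
class LazinessSketch {

    // Builds a pipeline that keeps even numbers.
    // The counter records how many elements the predicate has inspected;
    // it exists only to make laziness observable.
    static Stream<Integer> evens(List<Integer> numbers, AtomicInteger inspected) {
        return numbers.stream().filter(n -> {
            inspected.incrementAndGet();
            return n % 2 == 0;
        });
    }

    public static void main(String[] args) {
        AtomicInteger inspected = new AtomicInteger();

        // Only intermediate operations so far: nothing has been computed.
        Stream<Integer> pipeline = evens(List.of(1, 2, 3, 4), inspected);
        System.out.println("inspected before terminal operation: " + inspected.get());

        // toList() is the terminal operation that triggers the computation.
        List<Integer> result = pipeline.toList();
        System.out.println("inspected after terminal operation: " + inspected.get());
        System.out.println(result);
    }
}
```

Running main reports 0 inspected elements before the terminal operation and 4 after it, which is why building a pipeline out of several small methods costs so little.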
To illustrate the point, let’s look at the code below, which extracts recent Taylor Swift songs from a list of albums.
    List<Song> newTaylorSwift = new ArrayList<>();
    for(Album album: albums){
        if(album.releaseYear() > 2020){
            for(Song song: album.songs()){
                if(song.composers().contains("taylor swift")){
                    newTaylorSwift.add(song);
                }
            }
        }
    }
The above example has high cyclomatic complexity and could be improved by extracting methods: one method to extract recent albums, and one method to extract Taylor Swift’s songs.
    // Keeps only the albums released after 2020.
    public List<Album> getRecentAlbums(List<Album> albums){
        List<Album> ret = new ArrayList<>();
        for(Album album: albums){
            if(album.releaseYear() > 2020){
                ret.add(album);
            }
        }
        return ret;
    }

    // Collects every song that lists Taylor Swift as a composer.
    public List<Song> getTaylorSwiftSongs(List<Album> albums){
        List<Song> ret = new ArrayList<>();
        for(Album album: albums){
            for(Song song: album.songs()){
                if(song.composers().contains("taylor swift")){
                    ret.add(song);
                }
            }
        }
        return ret;
    }

    List<Album> recentAlbums = getRecentAlbums(albums);
    List<Song> newTaylorSwift = getTaylorSwiftSongs(recentAlbums);
An issue in the above code is the creation of the recentAlbums collection. The recentAlbums list is only used to extract Taylor Swift’s songs but could have a large memory footprint.
Using streams in similar method extractions reduces the memory footprint of the application.
    public Stream<Album> getRecentAlbums(Stream<Album> albums){
        return albums.filter(album -> album.releaseYear() > 2020);
    }

    public Stream<Song> getTaylorSwiftSongs(Stream<Album> albums){
        return albums
            .flatMap(album -> album.songs().stream())
            .filter(song -> song.composers().contains("taylor swift"));
    }

    // Nothing is computed until toList() is called on the composed pipeline.
    Stream<Album> recentAlbums = getRecentAlbums(albums.stream());
    List<Song> newTaylorSwift = getTaylorSwiftSongs(recentAlbums).toList();
In the above code, the memory footprint of recentAlbums is negligible compared to its collection counterpart. In addition, avoiding the intermediate collection’s creation could speed up the processing.
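For readers who want to run the stream version, here is a self-contained sketch. The Album and Song records, their fields, and the sample data are assumptions made for this example (the post does not show the data model); records and Stream.toList() require Java 16 or later.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical wrapper class so the example compiles on its own.
class StreamCompositionSketch {

    // Assumed data model: only releaseYear(), songs() and composers() are used by the pipeline.
    record Song(String title, List<String> composers) {}
    record Album(String title, int releaseYear, List<Song> songs) {}

    static Stream<Album> getRecentAlbums(Stream<Album> albums) {
        return albums.filter(album -> album.releaseYear() > 2020);
    }

    static Stream<Song> getTaylorSwiftSongs(Stream<Album> albums) {
        return albums
            .flatMap(album -> album.songs().stream())
            .filter(song -> song.composers().contains("taylor swift"));
    }

    // Composes the two steps; no intermediate list of albums is ever built.
    static List<Song> newTaylorSwift(List<Album> albums) {
        return getTaylorSwiftSongs(getRecentAlbums(albums.stream())).toList();
    }

    public static void main(String[] args) {
        // Invented sample data, for illustration only.
        List<Album> albums = List.of(
            new Album("older album", 2019, List.of(
                new Song("old song", List.of("taylor swift")))),
            new Album("newer album", 2022, List.of(
                new Song("new song", List.of("taylor swift", "hypothetical co-writer")),
                new Song("cover song", List.of("someone else")))));

        // Prints [new song]
        System.out.println(newTaylorSwift(albums).stream().map(Song::title).toList());
    }
}
```

The two extracted methods stay independently testable, yet composing them only wires stages together; the work happens once, in the final toList().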
Streams Produce Code That Is Easier to Read
Code readability is subjective. It is easy to find multiple posts from people arguing that Java streams are easier or harder to read. Nonetheless, some data points indicate that streams are actually easier to read.
Streams Are Generally Well Used
First, according to Nostas et al., 2021 [5], Java streams are usually used in a simple way. Parallel streams are rarely used and the majority of the APIs used are simple maps, filters, and flat maps. Moreover, Khatchadourian et al., 2020 [7] found that stream pipelines usually do not cause side effects. Putting these two facts together makes a strong argument that reading stream pipelines should be easy for an average Java programmer.
Streams Have Low Cognitive Complexity
The second strong argument behind the readability of Java streams is that measurable metrics are usually better for Java streams than for their imperative analogs. The following discussion is built around Sonar’s cognitive complexity [8] but could have been made with other similar metrics such as cyclomatic complexity [9].
The three examples above illustrate how streams reduce cognitive complexity [8]. The purely imperative implementation has a cognitive complexity of 10. The implementation that breaks the main loop into multiple methods brings the total cognitive complexity down to 9. The total cognitive complexity of the stream implementation is zero. Clearly, these numbers do not say everything, but they do say something.
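The score of 10 can be reconstructed by hand. The annotations below follow my reading of Sonar’s rules (each loop or if adds 1, plus 1 for every level of nesting it sits in); they are not the output of a Sonar run, and the wrapper class and records are assumptions added so the sketch compiles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class; Song and Album are an assumed data model.
class CognitiveComplexitySketch {

    record Song(String title, List<String> composers) {}
    record Album(int releaseYear, List<Song> songs) {}

    // Imperative version: 1 + 2 + 3 + 4 = 10.
    static List<Song> imperative(List<Album> albums) {
        List<Song> newTaylorSwift = new ArrayList<>();
        for (Album album : albums) {                                  // +1
            if (album.releaseYear() > 2020) {                         // +2 (nesting level 1)
                for (Song song : album.songs()) {                     // +3 (nesting level 2)
                    if (song.composers().contains("taylor swift")) {  // +4 (nesting level 3)
                        newTaylorSwift.add(song);
                    }
                }
            }
        }
        return newTaylorSwift;
    }

    // Stream version: no loop or if statement, hence no increment.
    static List<Song> streamed(List<Album> albums) {
        return albums.stream()
            .filter(album -> album.releaseYear() > 2020)
            .flatMap(album -> album.songs().stream())
            .filter(song -> song.composers().contains("taylor swift"))
            .toList();
    }
}
```

The same counting gives 3 for getRecentAlbums (1 + 2) and 6 for the list-based getTaylorSwiftSongs (1 + 2 + 3), hence the total of 9 for the method-extraction variant.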
Streams Are Strongly Typed
A subjective argument is that one can see Java streams as type-checked loops. When reading a Java stream pipeline, it is clear from the start what the output of the pipeline is. When writing a Java stream pipeline, the compiler checks the validity of the pipeline. Java is a strongly-ish typed programming language and a programmer should leverage its type system.
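To make the “type-checked loop” idea concrete, the sketch below spells out the static type of every stage. The class, the records, and the recentTitles method are hypothetical names introduced for this illustration.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical wrapper class; Song and Album are an assumed data model.
class TypedPipelineSketch {

    record Song(String title, List<String> composers) {}
    record Album(int releaseYear, List<Song> songs) {}

    static List<String> recentTitles(List<Album> albums) {
        // Each intermediate variable documents what flows through that stage.
        Stream<Album> recent = albums.stream().filter(album -> album.releaseYear() > 2020);
        Stream<Song> songs = recent.flatMap(album -> album.songs().stream());
        Stream<String> titles = songs.map(Song::title);

        // The compiler rejects a mismatch, for example:
        // List<Integer> wrong = titles.toList(); // does not compile: incompatible types
        return titles.toList();
    }
}
```

In real code the stages would normally be chained; the intermediate variables are there only to show the types the compiler checks along the way.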
Performance
No arguments will be made around Java stream performance. The blog’s author tried to run some benchmarks and they have all been inconclusive [10]; benchmark results varied widely depending on the JDK and hardware used. Nonetheless, three things must be said about Java stream performance.
First, Java streams are part of the Java standard library and have been implemented by, arguably, the best Java programmers [11]. It is reasonable to assume that the implementation is about as fast as it can get.
Second, using Java streams will not alter the “big O” [12] of a loop. An iteration that is $O(n)$ will not become $O(n^2)$ because it is performed via streams. Streams add an extra constant $k_1$ to the loop as a whole plus an extra constant $k_2$ to each iteration in the loop.
Third, pretending that “iterating” is a bottleneck is an extraordinary statement that requires extraordinary evidence.
Despite the above points, it is important to state that Java streams will create more objects and, in particular situations, this might put pressure on the garbage collector.
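The second point can be written as a rough cost model. This is a sketch rather than a measurement, and it assumes that $k_1$ and $k_2$ are constants that do not depend on $n$. If the plain loop costs $T_{\text{loop}}(n)$, the stream version costs roughly

$$T_{\text{stream}}(n) = T_{\text{loop}}(n) + k_1 + k_2\,n$$

For a loop with $T_{\text{loop}}(n) = c\,n$, this gives $T_{\text{stream}}(n) = k_1 + (c + k_2)\,n$, which is still $O(n)$: only the constants change.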
Good practices
As with everything in programming, tools come with good practices that should generally be followed and sometimes be broken.
Clearly, the main reason to use stream pipelines over imperative programming is to make the program easier to read for other programmers [13]. It is fair to assume the “other programmer” is familiar with Java streams, but it is not fair to assume the “other programmer” has had a good night of sleep and has a lot of time to read the code.
The rules below try to codify how to make stream pipelines readable.
Good Practice 1: Stream Pipelines Should Be Short
There’s no reason to break Uncle Bob’s rule of “Functions should be small. The second rule of functions is that they should be smaller than that”. A stream pipeline should be very short, and its length should never get in the way of reading it quickly.
Sometimes, it is possible to have a “long” stream pipeline but a “long” pipeline should never be “wide” and a “wide” stream pipeline should never be “long”.
    public IntStream leapYear(){
        return IntStream.range(-5000, 5000)
            .filter(year -> year != 0)
            .filter(year -> year % 4 == 0)
            .filter(year -> year % 100 != 0);
    }
Good Practice 2: Use Lambda Expressions Only For Trivial Functions
Lambda expressions (arrow functions) should be kept for trivial usage such as (x) -> x > 18 or to use a closure such as (person) -> this.isAdult(person, country). Never use lambdas for fancy stuff.
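For instance, the leapYear method above contains a bug: it rejects every year divisible by 100, even though those that are also divisible by 400 (such as 2000) are leap years. The bug should not be fixed by making the lambda expression more complex but by delegating to a dedicated method.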
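    // True when the year passes the century rule: a year divisible by 100
    // is a leap year only if it is also divisible by 400.
    public boolean is100YearLeapYear(int year){
        return year % 100 != 0 || year % 400 == 0;
    }

    public IntStream leapYear(){
        return IntStream.range(-5000, 5000)
            .filter(year -> year != 0)
            .filter(year -> year % 4 == 0)
            .filter(this::is100YearLeapYear);
    }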
Good Practice 3: Streams Should Be Stateless
Streams should not modify state. Each function called by the stream pipeline should be pure or only modify the passed element.
Consequently, the use of .forEach should be avoided. forEach shouts “side effects”, just as do does in Clojure [14].
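As a small illustration of the rule (the class and method names are invented for this sketch), compare a pipeline that fills an outside list through forEach with one that lets the terminal operation build the result:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class contrasting a stateful and a stateless pipeline.
class StatelessPipelineSketch {

    // Discouraged: the pipeline mutates a list that lives outside of it.
    static List<String> upperCaseWithForEach(List<String> words) {
        List<String> result = new ArrayList<>();
        words.stream()
            .map(String::toUpperCase)
            .forEach(result::add); // side effect on external state
        return result;
    }

    // Preferred: every stage is pure and the terminal operation creates the list.
    static List<String> upperCase(List<String> words) {
        return words.stream()
            .map(String::toUpperCase)
            .toList();
    }
}
```

Both methods return the same elements for a sequential stream, but only the second one can be read without tracking what happens to result along the way.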
Good Practice 4: Be Careful When Using Parallel Streams
People are not used to parallel streams; be careful with them.
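One way to stay careful, sketched here with hypothetical names, is to keep every stage free of shared mutable state so that the pipeline gives the same answer whether it runs sequentially or in parallel.

```java
import java.util.List;

// Hypothetical demo class for a side-effect-free parallel pipeline.
class ParallelStreamSketch {

    // Safe: each element is processed independently and toList() assembles
    // the result in encounter order, even when the stream runs in parallel.
    static List<Integer> squaresInParallel(List<Integer> numbers) {
        return numbers.parallelStream()
            .map(n -> n * n)
            .toList();
    }

    // Unsafe pattern, shown only as a comment: several threads would call add
    // on an ArrayList, which is not thread-safe.
    //
    //   List<Integer> shared = new ArrayList<>();
    //   numbers.parallelStream().forEach(n -> shared.add(n * n));

    public static void main(String[] args) {
        // Prints [1, 4, 9, 16]
        System.out.println(squaresInParallel(List.of(1, 2, 3, 4)));
    }
}
```

Whether going parallel pays off at all depends on the workload, so it is worth measuring before switching.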
Conclusions
The point has been made that Java streams are an important tool in the Java programmer’s toolbox. Now, the curious programmer will be interested in how similar pipelines are built in other programming languages.
Many other programming languages have opted for a different approach to “iteration”, namely yield statements [15][16][17]. Golang programmers can use channels [18]. The coolest of all approaches is Clojure’s transducers [19]. Rich Hickey’s presentation about transducers is worth every minute.
https://en.wikipedia.org/wiki/Java_version_history#Java_SE_8
https://reflectoring.io/comprehensive-guide-to-java-streams/
https://www.researchgate.net/publication/353738678_How_do_developers_use_the_Java_Stream_API
https://ia903201.us.archive.org/14/items/oapen-20.500.12657-37725/2020_Book_FundamentalApproachesToSoftwar.pdf, page 97
https://www.sonarsource.com/docs/CognitiveComplexity.pdf
https://gitlab.com/all-dressed-programming/stream-examples/
https://github.com/openjdk/jdk/blame/master/src/java.base/share/classes/java/util/stream/Stream.java
https://www.amazon.ca/Effective-Java-3rd-Joshua-Bloch/dp/0134685997
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/yield
https://realpython.com/introduction-to-python-generators/
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/yield