Dave Orme muses about data-first development.

My current work emphasizes data engineering and analysis using Kubernetes, Clojure, Scala, Eclipse, and Google Cloud Platform or AWS.


Disclaimer:

Everything I say here is my own opinion and not necessarily that of my employer.


Lazy, Lazy Iterables

In Java, Iterable<T> is almost a Monad. Even though it falls short of being one, it's useful in many of the same ways a monad is, such as abstracting transformations from collections of one type to collections of another.

(Whaaaaaiiiiit! You used the word “MONAD”! Okay, I admit it; I did. But guess what? Although I'm going to talk about Monads here a bit, I'm NOT going to assume that you know what they are in order to get value from this post. Feel better yet?)

In Java 8, the Streams API helps a lot, but even without Java 8 (which I can't use just yet), treating Iterable this way is still a really useful technique. One place I've found it particularly useful is as an abstraction for data input.

Suppose you have a cluster and want to run commands remotely through SSH and collect the output. Assuming that you have set up SSH keys properly, something like the following is really useful:

SshClient remoteServer = new SshClient("username", "host", 22);
Iterable<String> lines = remoteServer.relay("tail /var/log/serverlog");

Traditionally, a construct like this would SSH to the server, run the specified command on the remote machine, collect the results, and return the resulting collection as an Iterable<String> of lines.

Simple and useful, yes?

But we can make it better than that! How? What about this?

SshClient remoteServer = new SshClient("username", "host", 22);
Iterable<String> lines = remoteServer.relay("tail -f /var/log/serverlog");

“Wait,” you say, “that's exactly the same code as before, except that the 'relay' command won't ever terminate!”

Not so fast.

What if that “relay” command returned immediately after initiating the remote session, and then delivered lines through the “lines” Iterable as they became available?

In other words, what if “relay” implemented an infinite lazy list when appropriate?
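To make the idea concrete, here is a minimal sketch of a lazy line source. It is not the SshClient from the examples above; LazyLines is a hypothetical class that assumes the remote session's output is available as a java.io.Reader. Each line is read only when the consumer asks for it, so a stream that never ends (like "tail -f") simply blocks in hasNext() until more output arrives.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: exposes a character stream as a lazy Iterable of lines. */
final class LazyLines implements Iterable<String> {
    private final BufferedReader reader;

    LazyLines(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next;      // line read ahead, or null if none is buffered
            private boolean finished; // true once the stream has reported end-of-input

            @Override
            public boolean hasNext() {
                if (next == null && !finished) {
                    try {
                        // Blocks until a full line arrives or the stream ends.
                        next = reader.readLine();
                    } catch (IOException e) {
                        throw new IllegalStateException("Failed to read next line", e);
                    }
                    finished = (next == null);
                }
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String line = next;
                next = null;
                return line;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}

final class LazyLinesDemo {
    public static void main(String[] args) {
        // A StringReader stands in for the remote session's output stream.
        Iterable<String> lines = new LazyLines(new StringReader("first\nsecond\n"));
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Note that this sketch is single-pass: every iterator it hands out shares the same underlying reader, which matches a live stream but not an in-memory collection.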

“Then you would never close the SSH connection and might leak resources,” someone might object.

Fair comment. But what if that “lines” Iterable also implemented Closeable? I think this solves all of the technical challenges…
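One way this could look, as a sketch only: CloseableIterable and ResourceBackedIterable below are hypothetical names, not part of the JDK or of the SshClient shown earlier. The wrapper pairs any Iterable with the resource that feeds it, so Java 7's try-with-resources can release the connection even when the consumer stops reading early.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical interface: an Iterable that owns a resource and can release it. */
interface CloseableIterable<T> extends Iterable<T>, Closeable {
}

/** Hypothetical wrapper tying an Iterable to the resource that backs it. */
final class ResourceBackedIterable<T> implements CloseableIterable<T> {
    private final Iterable<T> delegate;
    private final Closeable resource;
    private volatile boolean closed;

    ResourceBackedIterable(Iterable<T> delegate, Closeable resource) {
        this.delegate = delegate;
        this.resource = resource;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<T> it = delegate.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                // Once closed, report no further elements.
                return !closed && it.hasNext();
            }

            @Override
            public T next() {
                if (closed) {
                    throw new NoSuchElementException("iterable has been closed");
                }
                return it.next();
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    @Override
    public void close() throws IOException {
        closed = true;
        resource.close(); // e.g. the SSH session in the scenario above
    }
}

final class CloseableIterableDemo {
    public static void main(String[] args) throws IOException {
        final boolean[] released = {false};
        // Stand-in for a real connection; it only records that close() ran.
        Closeable fakeSession = new Closeable() {
            @Override
            public void close() {
                released[0] = true;
            }
        };
        try (CloseableIterable<String> lines = new ResourceBackedIterable<String>(
                Arrays.asList("a", "b", "c"), fakeSession)) {
            for (String line : lines) {
                System.out.println(line);
                if (line.equals("b")) {
                    break; // stop early; try-with-resources still closes the session
                }
            }
        }
        System.out.println("released: " + released[0]);
    }
}
```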

Then all that's needed is something like Guava's “Iterables” class to finish the Monad implementation for Java's Iterables, so we can transform, transform-and-flatten (Guava's FluentIterable calls this transformAndConcat), and so on, directly over an Iterable…
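For readers who want to see what those operations amount to, here is a rough sketch in plain Java 7 with no Guava dependency. Fn and LazyIterables are hypothetical names invented for this example; in Guava the corresponding pieces are Function, Iterables.transform, and FluentIterable.transformAndConcat. Both operations are lazy: nothing is computed until someone iterates, so they work just as well over a stream that never ends.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a one-argument function (pre-Java 8). */
interface Fn<A, B> {
    B apply(A input);
}

final class LazyIterables {
    private LazyIterables() {
    }

    /** Lazily applies fn to each element as it is requested. */
    static <A, B> Iterable<B> transform(final Iterable<A> source, final Fn<A, B> fn) {
        return new Iterable<B>() {
            @Override
            public Iterator<B> iterator() {
                final Iterator<A> it = source.iterator();
                return new Iterator<B>() {
                    @Override
                    public boolean hasNext() {
                        return it.hasNext();
                    }

                    @Override
                    public B next() {
                        return fn.apply(it.next());
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }

    /** Lazily applies fn, then flattens the resulting Iterables into one sequence. */
    static <A, B> Iterable<B> transformAndConcat(final Iterable<A> source,
                                                 final Fn<A, Iterable<B>> fn) {
        return new Iterable<B>() {
            @Override
            public Iterator<B> iterator() {
                final Iterator<A> outer = source.iterator();
                return new Iterator<B>() {
                    private Iterator<B> inner; // current sub-sequence, or null before the first

                    @Override
                    public boolean hasNext() {
                        // Advance to the next non-empty sub-sequence, if there is one.
                        while ((inner == null || !inner.hasNext()) && outer.hasNext()) {
                            inner = fn.apply(outer.next()).iterator();
                        }
                        return inner != null && inner.hasNext();
                    }

                    @Override
                    public B next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return inner.next();
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}

final class LazyIterablesDemo {
    public static void main(String[] args) {
        Iterable<String> lines = Arrays.asList("error disk full", "warn retry");
        // One line becomes many words: the "flatten" half of the pair.
        Iterable<String> words = LazyIterables.transformAndConcat(lines,
                new Fn<String, Iterable<String>>() {
                    @Override
                    public Iterable<String> apply(String line) {
                        return Arrays.asList(line.split(" "));
                    }
                });
        for (String word : words) {
            System.out.println(word);
        }
    }
}
```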

Some will point out that the Guava engineers already use lazy Iterables.

You're right. I'm writing this because I want to tip my hat to them and give the idea some more visibility.

Here's the link: https://code.google.com/p/guava-libraries/wiki/CollectionUtilitiesExplained

Now go have fun!


blog/lazy_lazy_iterables.txt · Last modified: 2014/10/19 22:35 by djo