Dave Orme muses about data-first development.

My current work emphasizes data engineering and analysis using Kubernetes, Clojure, Scala, Eclipse, and Google Cloud Platform or AWS.




Everything I say here is my own opinion and not necessarily that of my employer.


Write all the code examples mentioned below.


Introduce trigram exercise. For our purposes, we'll just examine the trigram-generating part of the algorithm.
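As a starting point, the trigram-generating step might look like the sketch below. The class name `TrigramSketch` and the method `wordsToTrigrams` are placeholders invented for these notes, and the sketch assumes a trigram is simply every run of three consecutive words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrigramSketch {
    // Hypothetical helper: each run of three consecutive words becomes one trigram.
    static List<List<String>> wordsToTrigrams(List<String> words) {
        List<List<String>> trigrams = new ArrayList<>();
        for (int i = 0; i + 2 < words.size(); i++) {
            // Copy the window so the trigram does not depend on the source list.
            trigrams.add(new ArrayList<>(words.subList(i, i + 3)));
        }
        return trigrams;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("I wish I may I wish I might".split(" "));
        System.out.println(wordsToTrigrams(words));
    }
}
```

Inputs with fewer than three words yield an empty list rather than an error.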

Introduce TransformAndConcat as part of the solution. Use a single-threaded, List-based solution.

  • Benefits
    • More testable code
    • Abstracts away the iteration; the algorithm becomes more apparent
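A minimal sketch of a List-based transformAndConcat, to make the benefits above concrete. The class name `ListHelperSketch` is a placeholder, and `java.util.function.Function` is assumed here in place of whatever function-object interface the series ends up using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class ListHelperSketch {
    // Applies the transformer to each element and concatenates the results in order.
    static <A, B> List<B> transformAndConcat(List<A> input, Function<A, List<B>> transformer) {
        List<B> out = new ArrayList<>();
        for (A a : input) {
            out.addAll(transformer.apply(a));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("I wish I may", "I wish I might");
        // The client supplies only the per-element logic; the loop lives in the helper.
        List<String> words = transformAndConcat(lines, line -> Arrays.asList(line.split("\\s+")));
        System.out.println(words);
    }
}
```

The function object can be unit-tested on a single line of text, without constructing any collection around it.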


Recap the previous solution. Notice that we:

  • Take up a lot of memory
  • Process entire lists at once
  • Could do better by processing a single element of step 1, a single element of step 2, and so on
  • This works as long as our underlying collection supports a producer/consumer model and its iterators don't throw ConcurrentModificationException…

Might have to implement a custom collection based on BlockingArrayQueue or somesuch. @see:

Fix/improve this part of the solution (still single-threaded)
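One way to get element-at-a-time processing while staying single-threaded is a lazy, pull-based iterator. The sketch below is an assumption about how this step could look, not the final design: it uses plain iterators rather than a blocking queue, and `LazyHelperSketch` is a placeholder name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class LazyHelperSketch {
    // Returns an Iterable that transforms source elements only when a consumer asks for more.
    static <A, B> Iterable<B> transformAndConcat(Iterable<A> input, Function<A, Iterable<B>> transformer) {
        return () -> new Iterator<B>() {
            private final Iterator<A> outer = input.iterator();
            private Iterator<B> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Pull the next source element only once the current result is exhausted.
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = transformer.apply(outer.next()).iterator();
                }
                return inner.hasNext();
            }

            @Override
            public B next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }

    public static void main(String[] args) {
        Iterable<String> words = transformAndConcat(
                Arrays.asList("I wish I may", "I wish I might"),
                line -> Arrays.asList(line.split("\\s+")));
        for (String word : words) {
            System.out.println(word);
        }
    }
}
```

Only one transformed element's output is held at a time, so memory use no longer grows with the size of the whole input.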

But wait! Now that we have a producer-consumer model, we can make a multithreaded IterableHelper that accepts as a parameter the size of the thread pool it should create. We'll cover that next time.

Notice that except for switching to a ProducerConsumerIterable for the initial Lines object, all of our client-side logic (in the function objects) has remained identical.


Make IterableHelper multithreaded.

Run on “War and Peace”; demonstrate speedup / scaling
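A rough sketch of what a multithreaded helper could look like, assuming a fixed-size `ExecutorService` and assuming that results must come back in source order. `ParallelHelperSketch` and its `poolSize` parameter are placeholder names, and no timings are claimed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelHelperSketch {
    // Runs the transformer on a thread pool, then concatenates results in source order.
    static <A, B> List<B> transformAndConcat(List<A> input, Function<A, List<B>> transformer, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<List<B>>> pending = new ArrayList<>();
            for (A a : input) {
                Callable<List<B>> task = () -> transformer.apply(a);
                pending.add(pool.submit(task));
            }
            List<B> out = new ArrayList<>();
            for (Future<List<B>> future : pending) {
                out.addAll(future.get());   // waits for this element's result
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("I wish I may", "I wish I might");
        System.out.println(transformAndConcat(lines, line -> Arrays.asList(line.split("\\s+")), 4));
    }
}
```

The function object passed in is the same one a single-threaded helper would take. All thread handling stays inside the helper, which is only safe if the function object has no shared mutable state.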


  • No explicit synchronization anywhere in our client code
  • Throughout the entire series, client code has remained identical

Why can we do this? List / BlockingQueue obey the following invariants:

Explain in English:

  • Left identity over List/Queue contents
  • Right identity over List/Queue contents
  • Associativity over List/Queue contents

Wait: Doesn't work with the “words2trigrams” function? Think about this more…

Say it again in math
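A possible way to state the three laws, using notation assumed for these notes: $\mathrm{unit}(a)$ wraps a single value in a List/Queue, $\mathrm{bind}(m, f)$ is transformAndConcat applied to collection $m$ with function object $f$, and $f$, $g$ each map one element to a collection.

```latex
\begin{align*}
\text{Left identity:}  &\quad \mathrm{bind}(\mathrm{unit}(a),\, f) = f(a) \\
\text{Right identity:} &\quad \mathrm{bind}(m,\, \mathrm{unit}) = m \\
\text{Associativity:}  &\quad \mathrm{bind}(\mathrm{bind}(m, f),\, g)
                          = \mathrm{bind}\bigl(m,\ \lambda x.\ \mathrm{bind}(f(x), g)\bigr)
\end{align*}
```

Each law is stated for functions that see one element at a time. A function that needs a window over several neighbouring elements does not have that shape, which may be where the doubt about “words2trigrams” comes from.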

What have we done?

  • We have defined a class of iterables that can also be used automatically in a parallel manner.
  • Why/how?
  • If they implement the transformAndConcat / transformAndFlatten / flatMap / bind interface.
  • If they obey the above three invariants / laws.

This design pattern has a name. Unfortunately, the name is rather obscure and has nothing obvious to do with what it buys us, but you've undoubtedly heard that name before:

Monad. Sort of. To be explained in the next blog post.


Notice that Possible/Option only encapsulates a value. What if we also want to encapsulate some status information suitable for logging?

Introduce the LoggedResult monad.

LoggedResult<T>(T result, List<IStatus> log)

Notice, we can't implement Iterable over this without losing information: either the result or the log.

But we can implement the following method:

private final A contents;

public <B, IterableB extends Iterable<B>> LoggedResult<B> transformAndFlatten(F<A, IterableB> transformer) {
  IterableB result = transformer.apply(contents);
  // Open question: how does one flatten IterableB into a new contents variable?
  throw new UnsupportedOperationException("flattening is not defined yet");
}
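One possible direction, offered only as a sketch: have the function object return another logged result instead of an Iterable, so that "flattening" means taking the new result and concatenating the two logs. Everything below is an assumption for illustration: the class is named `LoggedResultSketch` to keep it distinct from the real LoggedResult, and plain `String` log entries stand in for `IStatus`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class LoggedResultSketch<A> {
    private final A contents;
    private final List<String> log;   // String stands in for IStatus in this sketch

    LoggedResultSketch(A contents, List<String> log) {
        this.contents = contents;
        this.log = Collections.unmodifiableList(new ArrayList<>(log));
    }

    A result() { return contents; }

    List<String> log() { return log; }

    // Applies the transformer to the result and keeps both logs, oldest entries first.
    <B> LoggedResultSketch<B> flatMap(Function<A, LoggedResultSketch<B>> transformer) {
        LoggedResultSketch<B> next = transformer.apply(contents);
        List<String> combined = new ArrayList<>(log);
        combined.addAll(next.log);
        return new LoggedResultSketch<>(next.contents, combined);
    }

    public static void main(String[] args) {
        LoggedResultSketch<Integer> start = new LoggedResultSketch<>(3, Arrays.asList("parsed input"));
        LoggedResultSketch<String> end =
                start.flatMap(n -> new LoggedResultSketch<String>("n=" + n, Arrays.asList("formatted")));
        System.out.println(end.result() + " " + end.log());
    }
}
```

With this shape neither the result nor the log is lost, at the cost of no longer accepting functions that return an arbitrary Iterable.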


What if Observable<T> were Iterable-over-time?

Introduce Observable<T>

Introduce an implementation of Observable<T> that uses continuations (JYield) to become Iterable using the Consumer/Producer pattern described previously.
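As a stand-in until the continuation-based version is written, the sketch below gets a similar effect with a blocking queue and a separate producer thread instead of JYield continuations. `ObservableSketch`, `publish`, and `complete` are placeholder names, and the sketch assumes a single consumer and non-null events.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ObservableSketch<T> implements Iterable<T> {
    private static final Object END = new Object();   // marks the end of the event stream
    private final BlockingQueue<Object> events = new LinkedBlockingQueue<>();

    void publish(T event) { events.add(event); }      // called by the producer

    void complete() { events.add(END); }              // no more events will arrive

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Object pending;                   // null means nothing fetched yet

            @Override
            public boolean hasNext() {
                if (pending == null) {
                    try {
                        pending = events.take();      // blocks until an event is published
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        pending = END;
                    }
                }
                return pending != END;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T value = (T) pending;
                pending = null;
                return value;
            }
        };
    }

    public static void main(String[] args) {
        ObservableSketch<String> mouse = new ObservableSketch<>();
        new Thread(() -> {
            mouse.publish("down");
            mouse.publish("move");
            mouse.publish("up");
            mouse.complete();
        }).start();
        for (String event : mouse) {
            System.out.println(event);
        }
    }
}
```

The consuming loop reads like ordinary iteration, but it blocks the consuming thread while it waits, which would not be acceptable on a UI thread.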

Is there an Observable mouse event? Down/Move/Up?

If so, rewrite the droppable widget snippet to use foreach rather than separate events…?

Now we can foreach over Observables; notice how algorithms become clearer?

java/trigram_generator_series_notes.txt · Last modified: 2014/10/17 22:08 (external edit)