Dave Orme muses about data-first development.

My current work emphasizes data engineering and analysis using Kubernetes, Clojure, Scala, Eclipse, and Google Cloud Platform or AWS.


java:trigram_generator_series_notes


Prereq

Write all the code examples mentioned below.

1

Introduce the trigram exercise. For our purposes, we'll just examine the trigram-generating part of the algorithm.

Introduce TransformAndConcat as part of the solution. Use a single-threaded, List-based solution.

  • Benefits
    • More testable code
    • Abstracts away the iteration; the algorithm becomes more apparent
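A minimal sketch of what the single-threaded, List-based version might look like. The class names, method signatures, and the use of java.util.function.Function in place of a hand-written function-object interface are all assumptions made for illustration. Note that the trigram step here is a sliding window over the whole word list, so it is applied once rather than per element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: the method name follows the notes, the signature is assumed.
final class IterableHelper {
    private IterableHelper() {}

    /** Applies transformer to every input element and concatenates the resulting lists. */
    static <A, B> List<B> transformAndConcat(List<A> input, Function<A, List<B>> transformer) {
        List<B> output = new ArrayList<>();
        for (A element : input) {
            output.addAll(transformer.apply(element));
        }
        return output;
    }
}

// Hypothetical client code: small, separately testable functions.
final class Trigrams {
    private Trigrams() {}

    /** Splits a line into its whitespace-separated words. */
    static List<String> lineToWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Returns every run of three consecutive words, joined by single spaces. */
    static List<String> wordsToTrigrams(List<String> words) {
        List<String> trigrams = new ArrayList<>();
        for (int i = 0; i + 2 < words.size(); i++) {
            trigrams.add(words.get(i) + " " + words.get(i + 1) + " " + words.get(i + 2));
        }
        return trigrams;
    }

    /** Lines to words via transformAndConcat, then words to trigrams. */
    static List<String> generate(List<String> lines) {
        List<String> words = IterableHelper.transformAndConcat(lines, Trigrams::lineToWords);
        return wordsToTrigrams(words);
    }
}
```

Each function can be unit-tested on its own, and the only loop over lines lives inside the helper.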

2

Recap the previous solution. Notice that we:

  • Take up a lot of memory
  • Process entire lists at once
  • Could do better by processing a single element of step 1, a single element of step 2, and so on
  • As long as our underlying collection supports a producer/consumer model and its iterators don't throw ConcurrentModificationException…

Might have to implement a custom collection based on BlockingArrayQueue or somesuch. @see:

Fix/improve this part of the solution (still single-threaded)
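One possible shape for the improved, still single-threaded version: a lazy transformAndConcat that pulls one source element at a time instead of building whole intermediate lists. The class name LazyIterableHelper and the signature are assumptions; this is a sketch of the idea, not the series' actual code.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical lazy variant: nothing is transformed until the consumer asks for it.
final class LazyIterableHelper {
    private LazyIterableHelper() {}

    static <A, B> Iterable<B> transformAndConcat(
            Iterable<A> input, Function<A, ? extends Iterable<B>> transformer) {
        return () -> new Iterator<B>() {
            private final Iterator<A> outer = input.iterator();
            private Iterator<B> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Pull the next source element only when the current batch is exhausted.
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = transformer.apply(outer.next()).iterator();
                }
                return inner.hasNext();
            }

            @Override
            public B next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }
}
```

Only one transformed batch is held in memory at a time, and chained calls stay lazy end to end.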

But wait! Now that we have a producer-consumer model, we can make a multithreaded IterableHelper that accepts as a parameter the size of the thread pool it should create. We'll cover that next time.

Notice that except for switching to a ProducerConsumerIterable for the initial Lines object, all of our client-side logic (in the function objects) has remained identical.

3

Make IterableHelper multithreaded.

Run on “War and Peace”; demonstrate speedup / scaling

http://www.gutenberg.org/ebooks/2600

Notice:

  • No explicit synchronization anywhere in our client code
  • Throughout the entire series, client code has remained identical

Why can we do this? List / BlockingQueue obey the following invariants:

Explain in English:

  • Left identity over List/Queue contents
  • Right identity over List/Queue contents
  • Associativity over List/Queue contents. Wait: doesn't work with the “words2trigrams” function? Think about this more…

Say it again in math
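One way the three laws might be written down, using assumed notation: unit(a) wraps a single element (for example, the one-element List containing a), and bind(m, f) stands for transformAndConcat applied to collection m with function f. Both names are placeholders chosen for this sketch.

```latex
% Assumed notation: unit(a) is the one-element collection [a];
% bind(m, f) is transformAndConcat over m with function f.
\begin{align*}
\operatorname{bind}(\operatorname{unit}(a),\, f) &= f(a)
  && \text{(left identity)} \\
\operatorname{bind}(m,\, \operatorname{unit}) &= m
  && \text{(right identity)} \\
\operatorname{bind}(\operatorname{bind}(m, f),\, g)
  &= \operatorname{bind}\bigl(m,\; x \mapsto \operatorname{bind}(f(x),\, g)\bigr)
  && \text{(associativity)}
\end{align*}
```

Here f and g each map a single element to a collection, which is why a sliding-window function over the whole word list does not obviously fit the same mold.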

What have we done?

  • We have defined a class of iterables that can also be used automatically in a parallel manner.
  • Why/how?
  • If they implement the transformAndConcat / transformAndFlatten / flatMap / bind interface.
  • If they obey the above three invariants / laws.

This design pattern has a name. Unfortunately, the name is rather obscure and has nothing obvious to do with what it buys us, but you've undoubtedly heard that name before:

Monad. Sort of. To be explained in the next blog post.

4

Notice that Possible/Option only encapsulates a value. What if we also want to encapsulate some status information suitable for logging?

Introduce the LoggedResult monad.

LoggedResult<T>(T result, List<IStatus> log)

Notice, we can't implement Iterable over this without losing information: either the result or the log.

But we can implement the following method:

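private final A contents;
//...
public <B, IterableB extends Iterable<B>> LoggedResult<B> transformAndFlatten(F<A, IterableB> transformer) {
  IterableB result = transformer.apply(contents);
  // Open question: how does one flatten IterableB into a new contents variable?
  // A LoggedResult<B> holds a single B, but result may yield zero or many.
  throw new UnsupportedOperationException("flattening IterableB is not defined yet");
}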
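One way around the flattening problem, offered as an assumption rather than the series' answer: let the transformer return a LoggedResult<B> instead of an Iterable<B>, and have transformAndFlatten concatenate the two logs. In the sketch below, plain String log entries stand in for IStatus, and java.util.function.Function stands in for F; the accessor names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Sketch only: String log entries stand in for IStatus.
final class LoggedResult<A> {
    private final A contents;
    private final List<String> log;

    LoggedResult(A contents, List<String> log) {
        this.contents = contents;
        this.log = Collections.unmodifiableList(new ArrayList<>(log));
    }

    A result() {
        return contents;
    }

    List<String> log() {
        return log;
    }

    /** Runs the next step on the current result and appends that step's log to this one. */
    <B> LoggedResult<B> transformAndFlatten(Function<A, LoggedResult<B>> transformer) {
        LoggedResult<B> next = transformer.apply(contents);
        List<String> combined = new ArrayList<>(log);
        combined.addAll(next.log);
        return new LoggedResult<>(next.contents, combined);
    }
}
```

With this shape neither the result nor the log is lost: the result comes from the latest step, and the log accumulates across steps.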

5

What if Observable<T> were Iterable-over-time?

Introduce Observable<T>

Introduce an implementation of Observable<T> that uses continuations (JYield) to become Iterable using the Consumer/Producer pattern described previously.

Is there an Observable mouse event? Down/Move/Up?

If so, rewrite the droppable widget snippet to use foreach rather than separate events…?

Now we can foreach over Observables; notice how algorithms become clearer?
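A sketch of the foreach-over-Observable idea. Instead of continuations (JYield), this version substitutes a BlockingQueue as the producer/consumer bridge. The Observer and Observable interfaces below are hypothetical, minimal stand-ins and are not taken from any real library; ObservableIterable is likewise an invented name.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical minimal observer callbacks. */
interface Observer<T> {
    void onNext(T value);
    void onCompleted();
}

/** Hypothetical minimal push-based event source. */
interface Observable<T> {
    void subscribe(Observer<T> observer);
}

/** Adapts pushed events to a pull-based Iterable. Single use; null events are not supported. */
final class ObservableIterable<T> implements Iterable<T> {
    private static final Object END = new Object();  // end-of-stream marker
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    ObservableIterable(Observable<T> source) {
        source.subscribe(new Observer<T>() {
            @Override public void onNext(T value) { queue.add(value); }
            @Override public void onCompleted() { queue.add(END); }
        });
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Object buffered;  // null until fetched; END once the source completed

            @Override
            public boolean hasNext() {
                if (buffered == null) {
                    try {
                        buffered = queue.take();  // blocks until the producer pushes an event
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        buffered = END;  // treat interruption as end of stream
                    }
                }
                return buffered != END;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T value = (T) buffered;
                buffered = null;
                return value;
            }
        };
    }
}
```

A consumer can then write `for (String event : new ObservableIterable<String>(source)) { ... }` and handle a down/move/up sequence in one loop body, at the cost of blocking the consuming thread while it waits.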

java/trigram_generator_series_notes.txt · Last modified: 2014/10/17 22:08 (external edit)