Dave Orme muses about data-first development.

My current work emphasizes data engineering and analysis using Kubernetes, Clojure, Scala, Eclipse, and Google Cloud Platform or AWS.


Disclaimer:

Everything I say here is my own opinion and not necessarily that of my employer.

'Enterprise Clojure' and Specs

Over the past two years I have been using Clojure to deliver an Extract-Transform-Load (ETL) pipeline. During this time, I have spoken with a number of other developers who use Clojure to deliver larger-scale applications, and among some of these developers a consensus has arisen: “Clojure is too hard to use at larger scales.”

In what ways might they be right? Is there anything we can do to improve this situation?

This post looks at one concern I have seen raised and proposes a direction the community could take toward a solution.

Naturally, feedback is welcome.

A while back I needed to split a string into words, with one exception: a single- or double-quoted substring had to be treated as a single word. Nested quotes are not supported.

This is similar to the way command line arguments function in Unixish shells:

$ java com.hello.Hello 'Hello world'    # 'Hello world' is parsed as a single entity

Here is the Clojure code I initially wrote to parse this. When I recently revisited this code, I found it more challenging than I expected to deeply understand again. (I'll get into why in a bit.)

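(require '[clojure.string :as str])   ; provides str/split, used in delimited-words

;; ^:private must annotate the var name; placed on `def` it has no effect.
(def ^:private delimiters [\' \"])    ; single and double quote, per the docstrings
(def ^:private delimiter-set (set delimiters))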
 
(defn merge-strings
  "Given a vector of strings, merge strings beginning/ending with quotes into
  a single string and return a vector of standalone words and quoted strings.
  Nested / unbalanced quotes will return undefined results."
  [[result delimiter merging] next]
 
  (let [start (first (seq next))
        end   (last (seq next))]
    (cond
      (and ((set delimiters) start)
           ((set delimiters) end))   [(conj result next) nil ""]
      ((set delimiters) start)       [result start next]
      ((set delimiters) end)         [(conj result (str merging " " next)) nil ""]
      (nil? delimiter)               [(conj result next) nil ""]
      :else                          [result delimiter (str merging " " next)])))
 
 
(defn delimited-words
  "Split a string into words, respecting single or double quoted substrings.
  Nested quotes are not supported.  Unbalanced quotes will return undefined
  results."
  [s]
  (let [words (str/split s #"\s")
        delimited-word-machine (reduce merge-strings [[] nil ""] words)
        merged-strings (first delimited-word-machine)
        remainder (last delimited-word-machine)
        delimiter (second delimited-word-machine)]
    (if (empty? remainder)
      merged-strings
      (conj merged-strings (str remainder delimiter)))))

At the time I wrote the code, it made perfect sense to me. But I recently had to revisit it to write and update tests, which meant understanding it again.

I found myself staring at the parameter list and body of the merge-strings function, trying to work out what values each parameter could take. I was surprised at how non-obvious this was to me only a few months later.

To my thinking, this illustrated a common pain point my colleagues have expressed about Clojure, namely…

As complexity increases, the data expected in a function's parameters can quickly become non-obvious

Even though I had written a docstring for the merge-strings function, I had not rigorously described each parameter's possible values and usage within the function, because it is a reducer, is used as an internal implementation detail, and is not API.
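To make the problem concrete, here is a minimal sketch that traces the reducer's accumulator with reductions. The merge-strings body is copied (lightly condensed) from the listing above so the block runs on its own; the delimiter set containing both quote characters is an assumption based on the docstrings, and the input words are made up for illustration.

```clojure
;; Assumption: both quote characters are delimiters, as the docstrings describe.
(def delimiter-set #{\' \"})

;; Copied from the listing above so this block is self-contained.
(defn merge-strings
  [[result delimiter merging] next]
  (let [start (first (seq next))
        end   (last (seq next))]
    (cond
      (and (delimiter-set start)
           (delimiter-set end))  [(conj result next) nil ""]
      (delimiter-set start)      [result start next]
      (delimiter-set end)        [(conj result (str merging " " next)) nil ""]
      (nil? delimiter)           [(conj result next) nil ""]
      :else                      [result delimiter (str merging " " next)])))

;; Each element is the accumulator [result delimiter merging] after one more word.
(def trace
  (reductions merge-strings [[] nil ""] ["say" "'Hello" "world'" "now"]))

(doseq [state trace] (prn state))
;; [[] nil ""]
;; [["say"] nil ""]
;; [["say"] \' "'Hello"]
;; [["say" "'Hello world'"] nil ""]
;; [["say" "'Hello world'" "now"] nil ""]
```

The trace shows what the parameter list alone does not: result is a vector of finished words, delimiter is either nil or the quote character that opened the current quoted run, and merging is the partial quoted string built so far.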

This week, I decided to use the upcoming Specs library (clojure.spec) from Clojure 1.9 to document each parameter's possible values and see whether this helped the readability and maintainability of this particular example.

(The rest of this blog assumes you've read through the Specs documentation linked above. Go ahead; I'll wait. ;-) )

Back to the code here… In my case I wanted to use Specs according to the following form from the documentation:

(defn person-name
  [person]
  {:pre [(s/valid? ::person person)]
   :post [(s/valid? string? %)]}
  (str (::first-name person) " " (::last-name person)))
 
(person-name 42)
;;=> java.lang.AssertionError: Assert failed: (s/valid? :my.domain/person person)
 
(person-name {::first-name "Elon" ::last-name "Musk" ::email "elon@example.com"})
;; Elon Musk
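The form above relies on a ::person spec that is defined elsewhere in the documentation. As a rough sketch of what the :pre condition is checking, the specs below are simplified assumptions (in particular, ::email is only required to be a string here), written against the clojure.spec.alpha namespace.

```clojure
(require '[clojure.spec.alpha :as s])

;; Simplified, assumed definitions; the documentation's versions are stricter.
(s/def ::first-name string?)
(s/def ::last-name  string?)
(s/def ::email      string?)
(s/def ::person (s/keys :req [::first-name ::last-name ::email]))

(def elon {::first-name "Elon" ::last-name "Musk" ::email "elon@example.com"})

(s/valid? ::person elon)                     ;=> true
(s/valid? ::person 42)                       ;=> false (not a map)
(s/valid? ::person {::first-name "Elon"})    ;=> false (required keys missing)

;; s/explain-str returns a human-readable description of why a value fails.
(println (s/explain-str ::person {::first-name "Elon"}))
```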

After trying this in a few places, I became dissatisfied with the repetitiveness of manually calling s/valid? for each (destructured) parameter value, so I wrote a macro to DRY this pattern up. (The code is in the clj-foundation project.) With the macro, the above defn can be rewritten in either of the following two ways:

(=> person-name [::person] string?
  [person]
  (str (::first-name person) " " (::last-name person)))
 
;; Or:
 
(defn person-name
  [person]
  (str (::first-name person) " " (::last-name person)))
 
(=> person-name [::person] string?)
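To give a feel for how a macro can remove the repeated s/valid? calls, here is a deliberately small sketch. It is not the clj-foundation implementation: defn-checked is a hypothetical name, and it only supports the first calling form, with plain symbol parameters, no docstring, and exactly one spec per parameter. The ::person specs are simplified assumptions.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical stand-in for =>, NOT the clj-foundation macro.
;; Expands to a defn whose :pre checks each argument against the spec in the
;; same position and whose :post checks the return value.
(defmacro defn-checked
  [fn-name arg-specs ret-spec params & body]
  `(defn ~fn-name ~params
     {:pre  [(every? true? (map s/valid? ~arg-specs ~params))]
      :post [(s/valid? ~ret-spec ~'%)]}
     ~@body))

;; Simplified, assumed specs for the example.
(s/def ::first-name string?)
(s/def ::last-name  string?)
(s/def ::person (s/keys :req [::first-name ::last-name]))

(defn-checked person-name [::person] string?
  [person]
  (str (::first-name person) " " (::last-name person)))

(person-name {::first-name "Elon" ::last-name "Musk"})
;;=> "Elon Musk"

;; (person-name 42) throws java.lang.AssertionError from the :pre condition.
```

A single every? in :pre keeps the sketch short, at the cost of an error message that does not say which argument failed; a fuller version would presumably emit one condition per parameter.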

To my eyes, this significantly enhanced the readability of the spec information added to the person-name function, so I applied the macro to my string parsing functions. That code now reads as follows:

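(require '[clojure.string :as str]        ; provides str/split
         '[clojure.spec.alpha :as s])     ; the s/ alias; => comes from clj-foundation

;; ^:private must annotate the var name; placed on `def` it has no effect.
(def ^:private delimiters [\' \"])        ; single and double quote, per the docstrings
(def ^:private delimiter-set (set delimiters))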
 
(s/def ::word-vector     (s/coll-of string?))
(s/def ::maybe-delimiter #(or (delimiter-set %)
                              (nil? %)))
(s/def ::merge-result    (s/tuple ::word-vector ::maybe-delimiter string?))
 
 
(=> merge-strings [::word-vector ::maybe-delimiter string? string?] ::merge-result
  "Given a vector of strings, merge strings beginning/ending with quotes into
  a single string and return a vector of standalone words and quoted strings.
  Nested / unbalanced quotes will return undefined results."
  [[result delimiter merging] next]
 
  (let [start (first (seq next))
        end   (last (seq next))]
    (cond
      (and ((set delimiters) start)
           ((set delimiters) end))   [(conj result next) nil ""]
      ((set delimiters) start)       [result start next]
      ((set delimiters) end)         [(conj result (str merging " " next)) nil ""]
      (nil? delimiter)               [(conj result next) nil ""]
      :else                          [result delimiter (str merging " " next)])))
 
 
(=> delimited-words [string?] ::word-vector
  "Split a string into words, respecting single or double quoted substrings.
  Nested quotes are not supported.  Unbalanced quotes will return undefined
  results."
  [s]
  (let [words (str/split s #"\s")
        delimited-word-machine (reduce merge-strings [[] nil ""] words)
        merged-strings (first delimited-word-machine)
        remainder (last delimited-word-machine)
        delimiter (second delimited-word-machine)]
    (if (empty? remainder)
      merged-strings
      (conj merged-strings (str remainder delimiter)))))

With this code, it becomes easy to see that the result (destructured) parameter contains a word-vector, which must be a collection of strings. Similarly, without reading the function body, one can immediately note that the delimiter parameter is a character from the delimiter-set or nil.

And skipping to the end of the line, one can immediately see that the entire function returns a merge-result, which is defined as a tuple containing a word-vector, a delimiter (or nil), and a string, all without reading the function body.

I found that this bit of up-front information made (re)reading the body of the merge-strings function much easier, even after only a few days away from it.
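The same specs can also be exercised directly at the REPL. The following is a small sketch under the assumption that delimiter-set holds both quote characters; the sample values are invented for illustration.

```clojure
(require '[clojure.spec.alpha :as s])

;; Assumption: both quote characters are delimiters, as the docstrings describe.
(def delimiter-set #{\' \"})

(s/def ::word-vector     (s/coll-of string?))
(s/def ::maybe-delimiter #(or (delimiter-set %)
                              (nil? %)))
(s/def ::merge-result    (s/tuple ::word-vector ::maybe-delimiter string?))

(s/valid? ::merge-result [["say"] nil ""])        ;=> true  (no quoted run open)
(s/valid? ::merge-result [["say"] \' "'Hello"])   ;=> true  (inside a quoted run)
(s/valid? ::merge-result [["say"] \x ""])         ;=> false (\x is not a delimiter)
(s/valid? ::merge-result [["say"] nil])           ;=> false (a tuple of 3 is required)
(s/valid? ::word-vector  ["say" 42])              ;=> false (42 is not a string)
```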

Retrospective

With this in mind, I would like to offer the following thoughts about this experiment:

  • I felt the experiment was successful. I believe the code I wound up with explains the original author's intentions better than the original code.
  • Only time will validate the => macro, and I'm sure it will evolve over time. But I sincerely hope something like it makes it into Specs in the end.
  • More generally, I feel that this code illustrates how even quite straightforward functions can become opaque very quickly, and how providing explicit specifications describing what data a function accepts/provides can significantly enhance communication.

...and a word from our sponsors

In closing, I'm available for new gigs right now. If this kind of thinking and expertise is welcome on your team or on your project (whatever the language), feel free to email me using the address on the “Contacts” page off my home page.

blog/enterprise_clojure_and_specs.txt · Last modified: 2017/05/25 18:38 by djo