
Dave Orme muses about data-first development.

My current work emphasizes data engineering and analysis using Kubernetes, Clojure, Scala, Eclipse, and Google Cloud Platform or AWS.


Blog

The Cloud

Scala, Clojure, and FP

Data-First Development

Agile

Older work

Coconut Palm Software home


Donate Bitcoin:

1Ecnr9vtkC8b9FvmQjQaJ9ZsHB127UzVD6

Keywords:

Kubernetes, Docker, Streaming Data, Spark, Scala, Clojure, OSGi, Karaf, GCP, AWS, SQL

Disclaimer:

Everything I say here is my own opinion and not necessarily that of my employer.


'Enterprise Clojure' and Specs

Over the past two years I have been using Clojure to deliver an Extract-Transform-Load (ETL) pipeline. During this time, I have spoken with a number of other developers who use Clojure to deliver larger-scale applications, and among some of these developers a consensus has arisen: “Clojure is too hard to use at larger scales.”

In what ways might they be right? Is there anything we can do to improve this situation?

This post looks at one concern I have seen raised and proposes a direction the community could take toward a solution.

Naturally, feedback is welcome.

A while back I needed to parse a string into words, except that single- or double-quoted substrings must function as a single word. Nested quotes are not supported.

This is similar to the way command line arguments function in Unixish shells:

$ java com.hello.Hello 'Hello world'    # 'Hello world' is parsed as a single entity
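To pin the requirement down, here is a minimal sketch that expresses it as a single regular expression passed to Clojure's built-in `re-seq`. This is a swapped-in technique, not the approach this post takes, and the name `shell-words` is hypothetical. It assumes quotes are balanced and unnested, and it keeps the quote characters in the returned words.

```clojure
;; Hypothetical regex-based tokenizer, shown only to pin down the requirement.
;; At each position the alternatives are tried left to right:
;;   '[^']*'   a single-quoted run (quotes included)
;;   "[^"]*"   a double-quoted run (quotes included)
;;   \S+       any other run of non-whitespace characters
;; Assumes quotes are balanced and not nested.
(defn shell-words
  [s]
  (vec (re-seq #"'[^']*'|\"[^\"]*\"|\S+" s)))

(shell-words "java com.hello.Hello 'Hello world'")
;; => ["java" "com.hello.Hello" "'Hello world'"]
```

Double quotes behave the same way: `(shell-words "say \"hi there\" now")` returns `["say" "\"hi there\"" "now"]`.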

Here is the Clojure code I initially wrote to parse this. When I recently revisited this code, I found it more challenging than I expected to deeply understand again. (I'll get into why in a bit.)

(require '[clojure.string :as str])

(def ^:private delimiters [\' \"])
(def ^:private delimiter-set (set delimiters))
 
(defn merge-strings
  "Given a vector of strings, merge strings beginning/ending with quotes into
  a single string and return a vector of standalone words and quoted strings.
  Nested / unbalanced quotes will return undefined results."
  [[result delimiter merging] next]
 
  (let [start (first (seq next))
        end   (last (seq next))]
    (cond
      (and (delimiter-set start)
           (delimiter-set end))      [(conj result next) nil ""]
      (delimiter-set start)          [result start next]
      (delimiter-set end)            [(conj result (str merging " " next)) nil ""]
      (nil? delimiter)               [(conj result next) nil ""]
      :else                          [result delimiter (str merging " " next)])))
 
 
(defn delimited-words
  "Split a string into words, respecting single or double quoted substrings.
  Nested quotes are not supported.  Unbalanced quotes will return undefined
  results."
  [s]
  (let [words (str/split s #"\s")
        delimited-word-machine (reduce merge-strings [[] nil ""] words)
        merged-strings (first delimited-word-machine)
        remainder (last delimited-word-machine)
        delimiter (second delimited-word-machine)]
    (if (empty? remainder)
      merged-strings
      (conj merged-strings (str remainder delimiter)))))

At the time I wrote the code, it made perfect sense to me. But I recently needed to revisit it to write and update tests, which meant understanding it all over again.

And I found myself staring at the parameter list and code of the merge-strings function, trying to understand what values each parameter could take. I was surprised at how non-obvious it was to me a few short months later.

To my thinking, this illustrated a common pain point my colleagues have expressed about Clojure, namely…

As complexity increases, the data expected in a function's parameters can quickly become non-obvious

Even though I had written a docstring for the merge-strings function, notice that because this function is a reducer, is used as an internal implementation detail, and is not part of the public API, I had not rigorously described each parameter's possible values or how the function uses them.
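One way to see what `merge-strings` actually receives is to replay the reduction with `clojure.core/reductions`, which returns every intermediate accumulator. The sketch below repeats the `merge-strings` logic so it runs standalone. Two changes are assumptions made for illustration: the parameter `next` is renamed `word` (to avoid shadowing `clojure.core/next`), and the delimiter set includes double quotes.

```clojure
(require '[clojure.string :as str])

(def delimiter-set #{\' \"})

;; Same logic as merge-strings above; the accumulator is [result delimiter merging].
(defn merge-strings
  [[result delimiter merging] word]
  (let [start (first word)
        end   (last word)]
    (cond
      (and (delimiter-set start)
           (delimiter-set end)) [(conj result word) nil ""]                 ; whole quoted word
      (delimiter-set start)     [result start word]                         ; a quote opens
      (delimiter-set end)       [(conj result (str merging " " word)) nil ""] ; the quote closes
      (nil? delimiter)          [(conj result word) nil ""]                 ; plain word
      :else                     [result delimiter (str merging " " word)])))  ; inside quotes

;; Every intermediate accumulator, starting from the initial [[] nil ""].
(def steps
  (reductions merge-strings [[] nil ""]
              (str/split "java 'Hello big world' x" #"\s")))

(doseq [step steps] (prn step))
;; Prints:
;; [[] nil ""]
;; [["java"] nil ""]
;; [["java"] \' "'Hello"]
;; [["java"] \' "'Hello big"]
;; [["java" "'Hello big world'"] nil ""]
;; [["java" "'Hello big world'" "x"] nil ""]
```

The middle entries are the states that are hard to infer from the parameter list alone: a non-nil delimiter means a quote is open, and `merging` holds the partial quoted string.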

This week, I decided to use the upcoming Specs library from Clojure 1.9 to document each parameter's possible values and see if this helped with the readability and maintainability of this particular example.

(The rest of this blog assumes you've read through the Specs documentation linked above. Go ahead; I'll wait. ;-) )

Back to the code. I wanted to use Specs in the following form from the documentation:

(defn person-name
  [person]
  {:pre [(s/valid? ::person person)]
   :post [(s/valid? string? %)]}
  (str (::first-name person) " " (::last-name person)))
 
(person-name 42)
;;=> java.lang.AssertionError: Assert failed: (s/valid? :my.domain/person person)
 
(person-name {::first-name "Elon" ::last-name "Musk" ::email "elon@example.com"})
;; Elon Musk

After trying this in a few places, I became dissatisfied with the repetitiveness of manually calling s/valid? for each (destructured) parameter value, so I wrote a macro to DRY this pattern up. (The code is in the clj-foundation project.) With the macro, the above defn can be rewritten in either of the following two ways:

(=> person-name [::person] string?
  [person]
  (str (::first-name person) " " (::last-name person)))
 
;; Or:
 
(defn person-name
  [person]
  (str (::first-name person) " " (::last-name person)))
 
(=> person-name [::person] string?)
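The clj-foundation `=>` macro itself is not reproduced here. To give a rough idea of how such a macro can work, here is a minimal, hypothetical `spec-defn` that expands a vector of argument specs and a return spec into the same `:pre`/`:post` map written by hand in the `person-name` example above. It assumes plain symbol parameters (no destructuring) and no docstring. The `::first-name`, `::last-name` and `::person` specs are filled in here as assumptions, since the example above does not show them.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical macro, NOT the clj-foundation `=>` implementation.
;; (spec-defn name [arg-spec ...] ret-spec [param ...] body ...)
;; expands to a defn whose :pre checks each param against its spec
;; and whose :post checks the return value (%) against ret-spec.
;; Assumes one spec per plain symbol parameter; no docstring support.
(defmacro spec-defn
  [fname arg-specs ret-spec params & body]
  `(defn ~fname ~params
     {:pre  [~@(map (fn [spec param] `(s/valid? ~spec ~param))
                    arg-specs params)]
      :post [(s/valid? ~ret-spec ~'%)]}
     ~@body))

;; Assumed specs for the person-name example.
(s/def ::first-name string?)
(s/def ::last-name string?)
(s/def ::person (s/keys :req [::first-name ::last-name]))

(spec-defn person-name [::person] string?
  [person]
  (str (::first-name person) " " (::last-name person)))

(person-name {::first-name "Elon" ::last-name "Musk"})
;; => "Elon Musk"
;; (person-name 42) fails the :pre check with an AssertionError.
```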

To my eyes, this significantly enhanced the readability of the spec information added to the person-name function, so I applied the macro to my string parsing functions. That code now reads as follows:

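(require '[clojure.string :as str]
         '[clojure.spec.alpha :as s])
;; The => macro comes from the clj-foundation project (its require is not shown).

(def ^:private delimiters [\' \"])
(def ^:private delimiter-set (set delimiters))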
 
(s/def ::word-vector     (s/coll-of string?))
(s/def ::maybe-delimiter #(or (delimiter-set %)
                              (nil? %)))
(s/def ::merge-result    (s/tuple ::word-vector ::maybe-delimiter string?))
 
 
(=> merge-strings [::word-vector ::maybe-delimiter string? string?] ::merge-result
  "Given a vector of strings, merge strings beginning/ending with quotes into
  a single string and return a vector of standalone words and quoted strings.
  Nested / unbalanced quotes will return undefined results."
  [[result delimiter merging] next]
 
  (let [start (first (seq next))
        end   (last (seq next))]
    (cond
      (and (delimiter-set start)
           (delimiter-set end))      [(conj result next) nil ""]
      (delimiter-set start)          [result start next]
      (delimiter-set end)            [(conj result (str merging " " next)) nil ""]
      (nil? delimiter)               [(conj result next) nil ""]
      :else                          [result delimiter (str merging " " next)])))
 
 
(=> delimited-words [string?] ::word-vector
  "Split a string into words, respecting single or double quoted substrings.
  Nested quotes are not supported.  Unbalanced quotes will return undefined
  results."
  [s]
  (let [words (str/split s #"\s")
        delimited-word-machine (reduce merge-strings [[] nil ""] words)
        merged-strings (first delimited-word-machine)
        remainder (last delimited-word-machine)
        delimiter (second delimited-word-machine)]
    (if (empty? remainder)
      merged-strings
      (conj merged-strings (str remainder delimiter)))))

With this code, it becomes easy to see that the result (destructured) parameter contains a word-vector, which must be a collection of strings. Similarly, without reading the function body, one can immediately note that the delimiter parameter is a character from the delimiter-set or nil.

And skipping to the end (of the line), one can immediately see that the entire function returns a merge-result, which is defined as a tuple containing a word-vector, a delimiter (or nil), and a string, all without reading the function body.

I found that this bit of up-front information made (re)reading the body of the merge-strings function much easier, even after only a few days away from it.
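Because specs are ordinary values, the shapes they describe can also be checked directly at the REPL. The sketch below restates the three specs from the listing above, with a delimiter set that includes double quotes (an assumption), and validates a few candidate accumulator values with `s/valid?`.

```clojure
(require '[clojure.spec.alpha :as s])

(def delimiter-set #{\' \"})

;; The three specs from the listing above.
(s/def ::word-vector     (s/coll-of string?))
(s/def ::maybe-delimiter #(or (delimiter-set %) (nil? %)))
(s/def ::merge-result    (s/tuple ::word-vector ::maybe-delimiter string?))

(s/valid? ::merge-result [["java"] \' "'Hello"])  ;; => true: a quote is open
(s/valid? ::merge-result [["java" "x"] nil ""])   ;; => true: no quote is open
(s/valid? ::merge-result [["java"] \x "'Hello"])  ;; => false: \x is not a delimiter
(s/valid? ::merge-result [["java"] nil])          ;; => false: the tuple needs all three parts
```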

Retrospective

With this in mind, I would like to offer the following thoughts about this experiment:

  • I felt the experiment was successful. I believe the code I wound up with explains the original author's intentions better than the original code.
  • Only time will validate the => macro, and I'm sure it will evolve over time. But I sincerely hope something like it makes it into Specs in the end.
  • More generally, I feel that this code illustrates how even quite straightforward functions can become opaque very quickly, and how providing explicit specifications describing what data a function accepts/provides can significantly enhance communication.
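For comparison, clojure.spec already ships a built-in way to attach specs to a function: `s/fdef`, with checking switched on by `clojure.spec.test.alpha/instrument`. Unlike `:pre`/`:post`, instrumentation is opt-in (typically during development and testing) and checks `:args` but not `:ret`. The function `split-words` below is a hypothetical stand-in, not the parser from this post.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest]
         '[clojure.string :as str])

(s/def ::word-vector (s/coll-of string?))

;; Hypothetical stand-in function: plain whitespace split, no quote handling.
(defn split-words [s]
  (str/split s #"\s+"))

;; The specs live beside the function rather than in its parameter list.
(s/fdef split-words
  :args (s/cat :s string?)
  :ret  ::word-vector)

;; Wrap the var so that each call validates its arguments against :args.
(stest/instrument `split-words)

(split-words "a b c")
;; => ["a" "b" "c"]
;; (split-words 42) now throws an ExceptionInfo describing the :args failure.
```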

...and a word from our sponsors

In closing, I'm available for new gigs right now. If this kind of thinking and expertise is welcome on your team or on your project (whatever the language), feel free to email me using the address on the “Contacts” page off my home page.

2017/05/25 16:17 · djo

Introducing Clojure Foundation, Infrastructure, and Dev on GitHub

At Brad's Deals, I've spent the last year learning Clojure, using it to help rewrite their extract-transform-load data warehouse system on AWS, and have developed a few opinions about how to most effectively use it. In addition, we have several other production systems built using Clojure.

Recently, we have begun extracting common useful utilities from these systems into a set of core libraries on GitHub:

  • clj-foundation – Code to extend or enhance the Clojure environment itself.
  • clj-infrastructure – Infrastructure helpers: AWS, database, etc.
  • clj-dev – Libraries to enhance the development experience.

In future posts, I'll explore some of the lessons we have learned and captured in these libraries.

2016/09/17 15:58 · djo

Combining Eclipse Way with XP and Scrum -- a Case Study

In our previous installment, I gave Eclipse another shout-out for its perfect 13-year track record of shipping on time. I then described how I had the privilege of working with a team that, by intentionally adopting practices from Eclipse Way into our XP/Scrum process, achieved a similar perfect track record over a period of four years, shipping every six weeks, with only a single emergency bug fix release.

Here I'd like to describe the main practices from Eclipse Way that we incorporated into our XP and Scrum process that I believed played a crucial role in our success. These practices are, in the order I will discuss them:

  1. The Eclipse Modularity Story
  2. 6-week Release Cadence
  3. Ship Each Milestone
  4. Strut your Successes

Let's look at each in turn.

The Eclipse Modularity Story

Eclipse is built by multiple distributed teams numbering nearly 700 concurrent engineers (as of the 2014 release). In order to enable this many engineers to work together without constantly breaking each other's code, Eclipse designed a plug-in system into the architecture from the beginning.

A plug-in is a Java JAR whose APIs are divided into two kinds:

  • Internal APIs that are not for external usage
  • External APIs that can be used by anybody and will be supported as a “contract” forever.

Eclipse Platform commits to binary class level compatibility for all APIs that are marked External. This provides a strong contract that customers can rely on to ensure that code they write today will work tomorrow.

Steve Northover, the “father of SWT” (the Eclipse GUI library), likes to say, “API is forever and @deprecated is a lie; somebody will always depend on your code”.

The article How to use the Eclipse API, while deprecated as of this writing, in my opinion still provides valuable perspective on API best practices. In addition, Eclipse's wiki contains several articles on Evolving Java-based APIs with detailed information for practitioners interested in building strong API guarantees into their products, even if they are not building on top of Eclipse or OSGi infrastructure.

API is a deep topic and worth much more than this brief introduction. If there is demand, I can write more about that.

In Eclipse, plug-ins run inside Eclipse's OSGi container. OSGi adds an additional benefit to Eclipse's plug-in and API story:

In addition to allowing plug-in jars to declare what packages are APIs, the Eclipse container allows plug-ins to declare their dependencies similar to the way Maven projects declare dependencies. And the container enforces these dependencies at run-time.

We obtained substantial benefit by using Eclipse's OSGi container. Non-Eclipse projects may wish to consider Apache Karaf as an alternative.

However, while we realized substantial gains by using an OSGi container, I believe that disciplined adoption of Eclipse's API conventions can realize significant modularity gains, even when implemented outside of an OSGi container.

Lastly, this blogger would encourage the Java community to evolve a solution that unifies the current dichotomy between Maven's dependency architecture and OSGi/P2's.

To summarize, one of our critical success factors was our adoption of Eclipse's philosophy of strong APIs with strong API guarantees. This enabled us to build our own product as a set of small modules that could be evolved independently. And it enabled us to evolve independently of our customers without fear of breaking them.

6-week Release Cadence

Scrum or XP projects typically utilize a two-week sprint / iteration as their basic unit of product delivery. Stories are planned at the beginning of the sprint and are demoed/delivered at the end. The team's velocity is measured over time and this measurement is used to determine how much work can be committed in subsequent sprints.

Eclipse Way is different here, but not in an incompatible manner. Eclipse uses six-week iterations, each ending in a formal milestone release.

There are multiple ways one can combine Eclipse Way with Scrum/Agile. In our project, we simply overlaid the Eclipse Way 6-week release cadence over the top of our two-week sprints. This had several interesting effects.

  • Scrum stories could be defined at developer granularity, rather than as strict user stories. Here is the distinction:
    • User stories are a discrete bit of user-visible functionality
    • Developer granularity is a task that is a part of a larger “theme” or group of user-visible functions

In Eclipse Way, user stories only need to be delivered every six weeks–on the release schedule–as opposed to at the end of each sprint.

This encouraged us to group related user stories together as “themes” to be delivered in a single milestone. This improved our estimations by enabling us to subdivide user stories to a small enough granularity that they could easily be estimated. And this reduced end-of-sprint stress by eliminating the need to get the burn-down chart to absolute zero at the end of the sprint.

In addition, since the commitment we made to the customer was at the level of themes, (which are imprecise groupings of stories that deliver substantially on a particular set of features) we had flexibility about the exact amount of functionality that was delivered as a part of that theme in each release. Consequently, we found that delivering on themes rather than specific user stories tended to be less stressful and easier to negotiate with the customer.

In practice, it is often unnecessary to pin down the exact two-week period during which a specific user story will be delivered. A six-week window is usually more than enough precision for most customers, and many only require once-per-quarter precision.

Lastly, adopting a six-week release schedule does not preclude introducing traceability into one's process, where customer requirements can be traced directly through themes, stories, and tasks to lines of code.

Ship Each Milestone

Shipping software should be like a good airplane trip–boring and uneventful. However, all too often, shipping software is stressful and painful.

At Eclipse, shipping is a core value. Software has no value unless it is delivered to customers. So, in order to reduce and/or eliminate the pain of shipping, Eclipse practices shipping often.

At Eclipse, each six-week milestone build is verified and delivered as a full shipped product.

For the development team, this requires turning the “ship product crank” every six weeks. This has several benefits:

  • If a team ships often, they have a strong incentive to automate as much as possible about the entire process. This goes beyond automated unit and integration test suites to automating the release process itself to the greatest extent possible.
  • Successful shipping teams create release checklists and execute the checklist like a seasoned airline pilot readying an airplane for takeoff.
  • If a team ships often long enough, they will create and document contingency plans such that even catastrophic events like failure of crucial build servers can be taken in stride without jeopardizing the ship date.

As a team matures in its shipment process, shipping software can become truly the most dull part of the sprint. And to this engineer, that's a good thing. :)

But the benefits don't stop there. Shipping themes of user stories on a six-week schedule in a form the customer can immediately use (as opposed to the Chinese Water Torture method of shipping every two weeks) provides the customer with a coherent view of how multiple features fit together and helps develop the relationship with the customer.

The customer sees visible progress; this inspires confidence; this results in more progress.

None of this supersedes the Agile value of an on-site customer. Again, in our experience, it was valuable to layer this practice on top of the other Agile values and to adapt them all to work cohesively together.

Ship often. Make shipping as dull as a smooth airplane flight. Have happy developers and happy customers.

Strut Your Successes

Eclipse milestone builds are accompanied by a document labelled “New and Noteworthy” from all of the core Eclipse teams. This document is akin to the “sprint demo” in Scrum–an opportunity to strut what's new and exciting.

No matter the form this takes, we found it important to publicize our successes and tell our customers what clear benefits they would obtain by adopting the latest release.

In my experience, the six-week milestone release interval of Eclipse can be a much better granularity for producing New and Noteworthy documents and/or sprint demos. There simply is more that can be demoed. And since the new features are grouped together into themes, it is much easier to demo a theme of related functionality every six weeks (and to tell a compelling story around a theme) than around a few isolated user stories every two.

Conclusion

In this article, I have reviewed four key Eclipse Way practices that in my experience work very nicely in tandem with practices from Scrum and XP. In a previous job, these practices synergized into a process enabling our team to consistently deliver quality software with new valuable features, on time, every six weeks, without stress. I have delivered this as a case study because I believe that this success can be replicated, and in combination with best practices from XP and Scrum can result in high-value software delivered consistently on-time, at or under budget.


2014/11/18 22:03 · djo

Agile Success, the Eclipse Way

Eclipse's success has become boring. And that's a problem.

I can't think of another organization (or company) with a world-wide distributed team of hundreds of developers that has shipped on time every year for 13 years: a perfect 13-year track record.

So let's learn from Eclipse. Can Eclipse's success be duplicated in a commercial environment? My experience is that the answer to this question can be a resounding “YES!”

Background Success

Several years ago I had the privilege of serving on the Eclipse Platform project for two years as we developed the Eclipse Databinding framework.

Immediately after this experience, I had the privilege of working with another organization on an Agile project. Drawing on Eclipse's success, we decided to intentionally adopt what we could from Eclipse's own processes.

This new commercial project experienced similar success to Eclipse itself. In five years of development, after a year of team building and process refinement, we achieved four years of perfectly scheduled production releases (delivered on a six-week schedule) with only a single emergency patch release over the lifetime of the project.

Lessons

How did Eclipse do it? How did we do it? What are the main lessons we learned from Eclipse that enabled us to achieve similar success?

In my next blog post I'll outline the main values we took from the Eclipse organization that I feel enabled us to replicate Eclipse's success.


2014/11/16 20:59 · djo

Eclipse Way - Or how to never deliver late!

There is an agile software development organization and methodology that has produced an amazing string of successes, yet nobody seems to be talking about them:

  • Never shipped late in 13 years; has never had to make an emergency patch release of a production build
  • The 2014 Luna release's achievements included
    • 61 million lines of code 1)
    • 65 companies, composed of nearly 700 engineers distributed worldwide 2)
    • With a low stress process as required for voluntary participation

If a company had this kind of track record, they would be talking about it. Loudly! But this organization is Eclipse, an open-source project.

Why has Eclipse been so successful? And can we apply lessons learned from Eclipse's success to commercial projects?

I believe so. And with this article, I intend to start a series discussing my experiences, both positive and negative, applying Eclipse Way, because in one project in particular, I credit Eclipse Way and its lessons for playing a major role in five years of consistent on-time production releases, on a schedule synchronized with Eclipse's milestone releases. And I'd like to invite anyone else interested in this topic to jump in and comment and share their insights too.

Let's have fun and learn together!


2014/11/08 21:08 · djo

Distributed Pair Programming

Distributed pair programming has come a long way since I first tried it ten years ago. Even back then, it could work surprisingly well. Now, there is a huge variety of choices for how to do it.

This afternoon I had a need to do it and here's what we did:

  1. Created a Google Hangout that we used to share screens.
  2. Dialed each other; one of us had a mobile phone + headset, the other was on a VOIP system.
  3. Went to work!

It worked surprisingly well!

If you try this, here are a few things to be wary of:

  • Background noise coupled with headsets that don't do noise cancellation.
  • Any generally distracting environment.

For reference, here is a modern list of cross-platform ways to share screens. The old standby, VNC, still works. But these days there are almost always better alternatives.

http://askubuntu.com/questions/335158/share-desktop-via-web-browser


2014/10/27 22:17 · djo

Eclipse Sirius

For most of the past year I've been working quite a bit with Twitter Storm parallel streaming data clusters on Amazon Web Services.

One of Storm's weaknesses is the lack of a graphical editor for Storm topologies, so I built one using Eclipse Sirius in literally a couple of days.

So I'd like to tip my hat to the Sirius team for an amazing tool!


2014/10/25 13:02 · djo


start.txt · Last modified: 2014/10/20 15:40 (external edit)