Save Effort: Build a Bash One-Liner

Luke Worth, 2024-08-22

Preliminary disclaimer: the author firmly believes that an operating system makes for a perfectly good IDE; he won’t be taking questions about this.

The Bellroy tech team has a longstanding goal to eliminate our Ruby codebase[1]. To help us reach that goal, the whole team can see a graph of LOC (Lines Of Code) currently stored in git for each programming language we use, plotted over time. This is updated once per week, and watching the Ruby line go downward encourages us to reduce the amount of Ruby we add each week, and (more importantly) to increase the amount we delete. However, like most metrics, it doesn’t quite directly measure progress toward our goal: if someone splits a complicated line of code into multiple lines, the code becomes easier to understand, and thus easier to remove, but the total LOC moves away from zero. This problem was brought to the forefront recently when we introduced automatic formatting of Ruby code using the syntax_tree tool (which, like all our other formatters, has almost no configuration and so avoids bike-shedding about code layout). If this tool can’t fit a single Ruby statement into one line, it will usually break it into many lines: when we run it on every Ruby file of the bellroy.com codebase we gain about 6,700 lines of Ruby, which looks like we are adding Ruby code, which makes us sad.

To deal with this problem I looked for an alternative metric that would be easy to apply to our codebase, and robust to re-formatting. I quickly discovered the ABC Software Metric, which is calculated from the number of Assignments, Branches, and Conditionals in the code, and which looked like a much better proxy for how much functionality lives in Ruby. And it turns out that RuboCop contains an implementation of this metric! But RuboCop is designed only to alert you when a method exceeds the configured maximum; at present it provides no easy way to display the raw ABC size.

At this point, the obvious thought crossed my mind: perhaps I could fork RuboCop, add the functionality I need, and submit a PR (because I am a good person). But another thought also crossed my mind: that sounds like hard work that I’d prefer to avoid. And of course, because I use UNIX[2] (the world’s best IDE), wouldn’t it be easier to use RuboCop as-is and use a bunch of duct tape and WD-40[3] to get the outcome I want? The short answer is “yes, much easier.” Let’s work through this together!

First we must ask the most basic question: is the raw metric exposed by RuboCop at all? A quick search reveals what seems to be an error message format string, referencing something promisingly called complexity. Let’s see this error message in action:

$ nix run nixpkgs#rubocop -- --only Metrics/AbcSize config/application.rb
Inspecting 1 file
.

1 file inspected, no offenses detected

Ah, the documentation says default Max value is 17. Let’s see what happens if we set it to 0:

$ cat >.rubocop.yml <<<'Metrics/AbcSize: {Max: 0}'
$ nix run nixpkgs#rubocop -- --no-display-cop-names --only Metrics/AbcSize config/application.rb
...snip...
Offenses:

...snip...
config/application.rb:59:3: C: Assignment Branch Condition size for settings is too high. [<2, 4, 1> 4.58/0]
  def self.settings ...
  ^^^^^^^^^^^^^^^^^
...snip...

There it is! 4.58 is indeed greater than 0. RuboCop displays one of these messages for each method, and now the end is in sight. Now all we have to do is run this for every file in the codebase, extract all the complexity numbers, and sum them up.
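
As a sanity check on that output: the ABC metric is usually defined as the magnitude of the vector <Assignments, Branches, Conditions>, so the bracketed <2, 4, 1> should reproduce the 4.58. Here is a minimal sketch, assuming that definition and reusing the numbers from the message above (the variable names are mine, not RuboCop's):

```shell
# Counts taken from the RuboCop message: <2, 4, 1>
assignments=2
branches=4
conditions=1

# ABC size = sqrt(A^2 + B^2 + C^2), rounded to two decimals
abc_size=$(awk -v a="$assignments" -v b="$branches" -v c="$conditions" \
  'BEGIN { printf "%.2f\n", sqrt(a*a + b*b + c*c) }')

echo "$abc_size"   # prints 4.58
```

The square root of 21 is roughly 4.583, which agrees with what RuboCop reported for settings.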

POSIX comes with a number of raw text processing tools but, no matter how powerful they are, it is simply easier to deal with structured data. I’m talking about JSON, a data format that began life as a quick way to transmit data to and from web browsers, which is now the de-facto standard for encoding structured data in plaintext; and RuboCop knows how to produce it.

$ nix run nixpkgs#rubocop -- --no-display-cop-names --only Metrics/AbcSize --format json config/application.rb
{"metadata":{"rubocop_version":"1.62.1","ruby_engine":"ruby","ruby_version":"3.1.6","ruby_patchlevel":"260","ruby_platform":"arm64-darwin23"},"files":[{"path":"config/application.rb","offenses":[..snip..,{"severity":"convention","message":"Assignment Branch Condition size for settings is too high. [<2, 4, 1> 4.58/0]","cop_name":"Metrics/AbcSize","corrected":false,"correctable":false,"location":{"start_line":59,"start_column":3,"last_line":61,"last_column":5,"length":114,"line":59,"column":3}},..snip..]}],..snip..}

Looks worse to you and me, but for jq it is a delectable treat. Let’s pull out the bit we care about:

$ nix run nixpkgs#rubocop -- --no-display-cop-names --only Metrics/AbcSize --format json config/application.rb | jq -r '.files[].offenses[].message'
..snip..
Assignment Branch Condition size for settings is too high. [<2, 4, 1> 4.58/0]
..snip..
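
To play with that jq filter without running RuboCop, something like the following should do. It is only a sketch: the JSON is a hand-written sample in the same shape as the output above, trimmed to the keys the filter touches, and the second offense (for a method called configure) is made up for illustration.

```shell
# Hand-written sample in the shape of RuboCop's JSON output.
# The "configure" offense is hypothetical; the "settings" one is from above.
sample_json='{"files":[{"path":"config/application.rb","offenses":[
  {"message":"Assignment Branch Condition size for settings is too high. [<2, 4, 1> 4.58/0]"},
  {"message":"Assignment Branch Condition size for configure is too high. [<0, 3, 0> 3/0]"}
]}]}'

# .files[] and .offenses[] each iterate over an array; -r prints raw strings
# (no JSON quotes), one message per line.
messages=$(printf '%s\n' "$sample_json" | jq -r '.files[].offenses[].message')
echo "$messages"
```

Each `[]` fans out over one array, so a run over many files still yields a flat list of messages, one per line.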

Getting closer, but it’s time for a short Q&A:

  • Q: What the heck is nix run nixpkgs#rubocop -- and why don’t you use bundle exec rubocop like a normal person?

    A: nix is awesome. We use it all the time, on Linux, macOS, and Windows. Invoking RuboCop this way means I don’t have to install it, nor add it to the Gemfile, nor modify the code at all. Anyone can run this command on any Rails codebase and it’ll just work.

  • Q: What was that <<< thing before?

    A: That is a here string. In Bash, this basically means “send the adjoining text to standard input”. Bash is full of hidden gems. I recommend periodically reading a random section of the reference manual.

  • Q: Isn’t that command getting long?

    A: Yes, it is. That doesn’t matter because nobody else is going to see this (except in my case, where I’m publishing it onto the internet). We want to move quickly, not produce the “best” code.

  • Q: How good is jq?

    A: I know, right?!
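
For the here string from the Q&A, a small sketch of what it does in isolation, assuming Bash (here strings are not part of POSIX sh). The tr command is only a stand-in for any program that reads standard input:

```shell
# <<< feeds the word to the command's standard input, with a newline appended.
shouted=$(tr '[:lower:]' '[:upper:]' <<<'metrics/abcsize')
echo "$shouted"   # prints METRICS/ABCSIZE

# Roughly the same thing written as a pipeline
piped=$(printf '%s\n' 'metrics/abcsize' | tr '[:lower:]' '[:upper:]')
echo "$piped"     # prints METRICS/ABCSIZE
```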

Unfortunately, for our ABC size calculator, we have eliminated all of the available JSON so jq is no longer useful; but some folks predicted this problem in the 1970s, and gave us AWK. Let’s try to pick out just the complexity number (which is the 4.58 we keep seeing):

$ MESSAGE='Assignment Branch Condition size for settings is too high. [<2, 4, 1> 4.58/0]'
$ awk '{print $13}' <<<"$MESSAGE"
4.58/0]

And let’s get rid of that /0]:

$ awk '{gsub(/\/0]/, "", $13); print $13}' <<<"$MESSAGE"
4.58
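
Counting to field 13 works because this message always has the same number of words. A slightly more forgiving variant, sketched on the assumption that the complexity is always the last whitespace-separated field and is followed by a slash and the Max value:

```shell
message='Assignment Branch Condition size for settings is too high. [<2, 4, 1> 4.58/0]'

# $NF is the last field, "4.58/0]"; sub() deletes from the slash to the end,
# so this keeps working if Max is set to something other than 0.
complexity=$(printf '%s\n' "$message" | awk '{ sub(/\/.*/, "", $NF); print $NF }')
echo "$complexity"   # prints 4.58
```

For a throwaway one-liner the `$13` version is perfectly fine; this one just cares less about the exact wording of the message.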

Putting it all together:

$ nix run nixpkgs#rubocop -- --no-display-cop-names --only Metrics/AbcSize --format json config/application.rb | jq -r '.files[].offenses[].message' | awk '{gsub(/\/0]/, "", $13); print $13}'
..snip..
6.16
3.74
4.58
3
3.74

That’s a list of ABC sizes of all methods in config/application.rb! We can use AWK to sum them up:

$ nix run nixpkgs#rubocop -- --no-display-cop-names --only Metrics/AbcSize --format json config/application.rb | jq -r '.files[].offenses[].message' | awk '{gsub(/\/0]/, "", $13); sum += $13} END {print sum}'
21.22
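
The summing step can be tried without RuboCop at all. This sketch feeds the five sizes printed above straight into AWK; the `+ 0` is my addition, so that empty input prints 0 rather than a blank line:

```shell
# sum starts out uninitialised, which AWK treats as 0 in arithmetic.
total=$(printf '%s\n' 6.16 3.74 4.58 3 3.74 | awk '{ sum += $1 } END { print sum + 0 }')
echo "$total"         # prints 21.22

# With no input lines at all, "sum + 0" still yields a number.
empty_total=$(awk '{ sum += $1 } END { print sum + 0 }' </dev/null)
echo "$empty_total"   # prints 0
```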

“Do it on the whole codebase,” you yell. I yell back, “OK, give me a minute” and run:

$ find . -name '*.rb'
..big list of every .rb file including vendored gems and other junk..

This is not quite what you wanted, and a bit unprincipled. We don’t want the files on the filesystem, we want the ones in the codebase. The canonical list of codebase files lives in Git, so let’s ask Git instead:

$ git ls-files | grep '\.rb$'
..exactly the list of Ruby files in our codebase..
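
Git can also do the filtering itself, since `git ls-files` accepts a pathspec. A sketch, assuming a throwaway repository created just for the demonstration (the file names are made up): the pathspec `'*.rb'` matches tracked Ruby files at any depth and ignores untracked ones.

```shell
# Build a scratch repository with one tracked and one untracked Ruby file.
repo=$(mktemp -d)
git -C "$repo" init --quiet
mkdir -p "$repo/app/models" "$repo/vendor"
touch "$repo/app/models/user.rb" "$repo/vendor/junk.rb" "$repo/README.md"
git -C "$repo" add app/models/user.rb README.md

# Only files in the index are listed; vendor/junk.rb was never added.
tracked_ruby=$(git -C "$repo" ls-files -- '*.rb')
echo "$tracked_ruby"   # prints app/models/user.rb

rm -rf "$repo"
```

Either spelling gives the same list of tracked .rb files; the grep version is arguably easier to read at a glance.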

Good. Let’s plumb this into our “program”:

$ nix run nixpkgs#rubocop -- --no-display-cop-names --only Metrics/AbcSize --format json $(git ls-files | grep '\.rb$') | jq -r '.files[].offenses[].message' | awk '{gsub(/\/0]/, "", $13); sum += $13} END {print sum}'
9910.94

Voilà! The ABC size of bellroy.com’s Rails backend[4].

I want to share this one-liner so that anyone can run it in any Ruby codebase, but it still depends on the .rubocop.yml we deposited near the beginning. One way around this is to simply include the cat step we did earlier, but that would overwrite any existing .rubocop.yml, and would leave junk behind. Let us reach into our bag of fancy Bash features that everyone should know, and pull out process substitution. If you pass --config some_file.yml to RuboCop, it will read its configuration from some_file.yml instead of .rubocop.yml. If you pass --config <(echo 'Metrics/AbcSize: {Max: 0}'), RuboCop will read its configuration from a temporary named pipe which appears to contain the desired configuration. That lets us distribute a truly stand-alone tool that can run anywhere and leaves no trace, and we didn’t need to write any Ruby to do it. Here’s the final version, formatted for readability:

nix run nixpkgs#rubocop -- \
        --config <(echo 'Metrics/AbcSize: {Max: 0}') \
        --no-display-cop-names \
        --only Metrics/AbcSize \
        --format json \
        $(git ls-files | grep '\.rb$') \
        | jq -r '.files[].offenses[].message' \
        | awk '{gsub(/\/0]/, "", $13); sum += $13} END {print sum}'
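
To see what RuboCop receives from that --config argument, the process substitution can be poked at on its own. A sketch, assuming Bash; cat stands in for RuboCop here, and the exact path printed by the first command varies by system:

```shell
# <(...) expands to a path (typically /dev/fd/N or a named pipe) that yields
# the command's output when read.
substituted_path=$(echo <(true))
echo "$substituted_path"

# Any program that takes a file name can read the "file".
config=$(cat <(echo 'Metrics/AbcSize: {Max: 0}'))
echo "$config"   # prints Metrics/AbcSize: {Max: 0}
```

Because nothing is written to disk, an existing .rubocop.yml in the project is left untouched.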

  1. We’ve written about this before:

    ↩︎
  2. I use macOS but anything vaguely POSIX-compatible should work.↩︎

  3. Clearly this is a metaphor. In this post I’ll be using nix, bash, jq, and gawk, but if you want to follow along with actual duct tape and WD-40, be my guest.↩︎

  4. Two weeks ago this number was 11574.6!↩︎