Making sense of SBT

SBT is probably the most popular (as in “most used”) build tool for Scala, yet many people (including me) have a hard time figuring out why it doesn’t behave as they expect.

Originally known as the “Simple Build Tool”, it was later renamed to “Scala Build Tool” (maybe to acknowledge the fact that it was not so simple?).

Part of the reason is that it’s not really intuitive. True, there is documentation available, but many developers don’t bother reading it (at least until the pain of using SBT exceeds the effort of reading the docs), and it’s much easier to copy/paste code snippets without completely understanding what they do.

That being said, SBT is not magic, so let’s try to understand how it works so that next time we’re not reduced to copy/pasting cryptic code snippets until something works.

A build DSL

If you look at a simple build.sbt file it may look something like this:

organization := "my.company"
name := "demo"

licenses += "Apache-2.0" -> url("http://www.apache.org/licenses/LICENSE-2.0")

libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.0-RC1"

At first glance it looks similar to a Maven POM file, only simpler as it doesn’t use the more verbose XML syntax.
We can find similar elements:

  • declaration of the project name and organisation
  • declaration of the dependencies

So if it’s not XML, what language is it? Well, this is plain Scala code. In fact it’s a Scala DSL for describing a build and, of course, as it’s Scala code it has to be compiled before you can build your project.

This brings us to the lifecycle of the build:

  • Compile the build project
  • Execute the build project to create the build definition
  • Interpret the build definition to create a task graph for the build
  • Execute the task graph to actually build your project

Let’s go through each of them in more detail:

Compiling the build project

This is where the build.sbt file is compiled (Remember that although it looks like a declarative language this is actually plain Scala code and therefore has to be compiled).

There is a special folder (called project) which can also contain some Scala code. The code in project is part of the build itself and not of your project code (it starts to get confusing, doesn’t it?).

It is typically in this folder that you add the plugins needed for the build. You can also declare your dependencies or define custom tasks in this folder (more on that later).
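
For example, a project/plugins.sbt adding the sbt-buildinfo plugin (used as an example later in this post) might look like this; the version number is only illustrative:

// project/plugins.sbt: this dependency belongs to the build, not to your project
addSbtPlugin("com.eed3si9n" % "sbt-buildinfo" % "0.7.0")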

project is itself recursive: you can nest a project folder inside project and add some code (e.g. dependency declarations) in it. This will be compiled before the parent project folder is compiled.

Basically the project is a build inside your build that knows how to build your build. And it’s recursive so project/project is a build that knows how to build the build of your build (as explained here).

A build compilation failure (not a project compilation failure) will prevent SBT from starting, with this kind of error:

[error] (*:update) sbt.ResolveException: unresolved dependency: xxx#yyy;x.y.z: not found
Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore?

In this case it indicates a problem with the dependencies of the build itself (SBT couldn’t resolve them, and therefore couldn’t compile and load the build project).

This shouldn’t be mistaken for a problem with your project’s dependencies: a project dependency issue won’t prevent SBT from starting. Here you should have a look at the dependencies of the build itself (e.g. check the plugin declarations).

The difference between build.sbt and a Scala file is that build.sbt already imports the following:

import sbt._
import Process._
import Keys._

This allows you to start defining your build straight away. When using a Scala file you need to add these imports explicitly when you need them.
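
For example, a hypothetical project/Dependencies.scala that centralises the dependency versions of your project needs its own import:

// project/Dependencies.scala
import sbt._ // required explicitly here, unlike in build.sbt

object Dependencies {
  val catsCore = "org.typelevel" %% "cats-core" % "1.0.0-RC1"
}

It can then be used from build.sbt with libraryDependencies += Dependencies.catsCore.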

Execute the build project to create the build definition

I believe this is the most confusing stage as this phase just doesn’t exist in the Maven world. In Maven it’s simple: you parse the XML to create the build definition and then you execute the build.

Here it’s different: we need to execute the Scala code in the build to create the build definition, and only then can we run the build itself.

Moreover setting up the build is done in 2 steps:

  • Evaluate the compiled Scala code to create the build definition (a set of projects containing some settings)
  • Interpret the build definition to create the “build execution plan” (a task graph that models dependencies between tasks)

Only then can the task graph be executed to run the build.

But let’s get back to what the build definition is. Simply put, the build definition is just a list of key-value pairs. And as SBT supports multi-project builds (a.k.a. submodules) these key-value pairs are grouped by project.

lazy val root = (project in file("."))
  .settings(
    name := "demo",
    scalaVersion := "2.12.4"
  )

If you have a single project there is no need to declare the root project: you can place your key-value pairs directly in build.sbt.
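
In a multi-project build the same pattern applies per sub-project; a minimal sketch with two hypothetical sub-projects could look like this:

lazy val core = (project in file("core"))
  .settings(
    name := "demo-core",
    scalaVersion := "2.12.4"
  )

lazy val api = (project in file("api"))
  .dependsOn(core)
  .settings(
    name := "demo-api",
    scalaVersion := "2.12.4"
  )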

These key-value pairs are known as settings. Setting keys are typed and there are 3 different kinds of keys:

  • SettingKey[T] is evaluated once when SBT starts (or with the reload command)
  • TaskKey[T] is evaluated every time the task is run
  • InputKey[T] is for tasks taking arguments (e.g. testOnly *.SomeSpec)

The value itself is known as the task body and needs to return a value of type T (the same type as declared in the key). The body can contain any Scala code.

Of course there are many predefined keys but it’s also possible to define your own.
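
As an illustration, here is a hypothetical key of each kind (the names and bodies are made up for this example):

import sbt.complete.DefaultParsers._

// SettingKey: evaluated once when the build is loaded
lazy val greeting = settingKey[String]("A greeting message")
greeting := "Hello"

// TaskKey: evaluated every time the task is executed
lazy val greet = taskKey[Unit]("Prints the greeting")
greet := println(greeting.value)

// InputKey: takes arguments from the command line, e.g. `greetPerson John`
lazy val greetPerson = inputKey[Unit]("Greets the given person")
greetPerson := {
  val args = spaceDelimited("<name>").parsed
  println(s"${greeting.value} ${args.mkString(" ")}")
}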

You can find more information on this phase over here.

Interpret the build definition to create a task graph for the build

We’ve seen that SBT provides a DSL to define tasks (a setting can be considered as a task that runs only once).

This makes SBT a task engine, but in order to run the tasks correctly (i.e. in the right order) SBT must analyse the dependencies between them.

Let’s take an example. Imagine a task “buildInfo” that generates Scala code (a Scala object) with some information about the build. (This is a very basic example – there is a real plugin for that: https://github.com/sbt/sbt-buildinfo)

version := "1.0.3"

// define the task key
lazy val buildInfo = taskKey[Seq[File]]("Generates basic build information")

// define the task body
buildInfo := {
  val f = sourceManaged.value / "BuildInfo.scala"
  val v = version.value
  val i = java.time.Instant.now()
  IO.write(f, 
    s"""
      |import java.time.Instant
      |
      |object BuildInfo {
      |   val version: String = "$v"
      |   val time: Instant = Instant.ofEpochMilli(${i.toEpochMilli}L)
      |}
     """.stripMargin
  )
  // returns a Seq[File] as declared in the key
  f :: Nil
}

// add the task to the list of source generators
sourceGenerators in Compile += buildInfo.taskValue

Inside the task body there are 2 variables, f and v, that actually depend on other tasks (or settings). You can see that to retrieve the values of version and the sourceManaged folder we couldn’t use the keys directly (because they are tasks, not values), so we have to call .value to retrieve the value of the task.

This .value is the trick used by SBT to build the dependency graph. Behind the scenes it triggers a macro that allows SBT to lift the dependencies outside of the task body.

You can query the dependency graph by using the inspect tree command.

sbt:demo> inspect tree buildInfo
[info] *:buildInfo = Task[scala.collection.Seq[java.io.File]]
[info]   +-*:sourceManaged = target/scala-2.12/src_managed
[info]   | +-*:crossTarget = target/scala-2.12
[info]   |   +-*/*:crossPaths = true
[info]   |   +-*:pluginCrossBuild::sbtBinaryVersion = 1.0
[info]   |   | +-*/*:pluginCrossBuild::sbtVersion = 1.0.3
[info]   |   |
[info]   |   +-*/*:sbtPlugin = false
[info]   |   +-*/*:scalaBinaryVersion = 2.12
[info]   |   +-*:target = target
[info]   |     +-*:baseDirectory =
[info]   |       +-*:thisProject = Project(id tmp, base: /private/tmp, configurations: List(compile, runtime, test, provided, optional), plugins: List(<none>), autoPlugins: List(sbt.plugins.CorePlugin,..
[info]   |
[info]   +-*:version = 1.0.3
[info]

We can see that buildInfo depends on sourceManaged and version.

The official documentation is here.

Execute the task graph to actually build the project

SBT is now ready to execute a task by running as many dependent tasks in parallel as it can (according to the graph). If task C depends on tasks A and B (and A and B don’t depend on each other) then SBT can run A and B in parallel and then C.

So far we’ve covered quite a lot of ground. We’ve learned the lifecycle of a build, how to define tasks or settings and how to check the dependencies of a task.

By now we should start to feel more comfortable working with SBT, but there is still something quite puzzling that we need to understand to leverage all the power of SBT: scopes.

Scopes

So far we’ve seen that a setting or task key is always linked to a single value or task body. In fact that’s not entirely true: a key also depends on a context.

E.g. the project name (key name) is different inside every sub-project, and so can be the Scala version (scalaVersion), …

SBT calls these context dependencies scope axes. There are 3 different axes needed to fully qualify a task:

  • The project axis
  • The configuration axis
  • The task axis

The project axis

This is the axis we used as an example. You can obviously use different values (or tasks) across sub-projects, so the project is one of the axes.

The configuration axis

Similarly, inside a project the sources are different when you compile the main code and when you compile the tests. The sources task depends on the “Configuration” (probably not the most suitable name but this is the terminology used in SBT).

You can think of the configuration axis as something similar to a Maven scope (remember how we specify that a dependency is already provided or only used in tests).
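
For example, a library that is only needed by the tests is declared in the Test configuration, much like Maven’s test scope (the library and version below are just an illustration):

libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.4" % Test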

In addition there is also a relation between the configurations. E.g. the Test configuration extends the Runtime configuration which extends the Compile configuration. (It kind of makes sense because when you run the tests you want everything available at runtime plus the things needed for testing).

The task axis

The last axis is the task axis. This is when the value depends on the currently running task.

E.g. there are 3 packaging tasks: packageBin, packageSrc, packageDoc. They all depend on packageOptions but we can have different values of packageOptions for each of these 3 tasks. This is done by using the task axis.
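
For instance, a manifest attribute can be added only to the jar produced by packageBin by scoping packageOptions with the task axis (a minimal sketch, the attribute is just an example):

packageOptions in (Compile, packageBin) +=
  Package.ManifestAttributes(java.util.jar.Attributes.Name.SEALED -> "true")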

Specifying scopes

When you create a task (or setting) in your build it is always scoped even when you don’t specify any axes. In this case SBT uses the following scopes by default:

  • the project axis is set to the current project
  • the configuration axis is set to Global
  • the task axis is set to Global

But what is this Global scope? Well, you can think of it as the default or fallback scope. If there is no value defined for the current scope SBT will try to find a value using a more general scope until it reaches the Global scope.

This brings us to the topic of scope resolution. Remember that we have 3 different axes available: project, configuration and task. In theory you can combine any possible values along these 3 axes to define a task. Think of it like a cube or 3D matrix. That’s a lot of possible combinations, but in practice many of them don’t make any sense at all, so there is no need to define a task body for these “impossible” combinations.

Then there are several scopes that actually need to use the same task body or value. In this case you want to define the task only once, typically in the most generic scope, and have the more specific scopes fall back to it when SBT resolves the task’s scope.

This is how the scope resolution (or scope delegation) works:

  • On the task axis: if the task is undefined under the current task, fall back to Global
  • On the configuration axis: if the task is undefined under the current configuration, try the parent configuration (if there is no parent, fall back to Global)
  • On the project axis: if the task is undefined under the current project, try the special scope ThisBuild, then the Global scope
  • If several scopes match, the project axis takes precedence over the configuration axis, which takes precedence over the task axis

More information and examples can be found in the official documentation.

Something that might look like a recursive task is actually not recursive but delegating to a more generic scope:

lazy val lines = settingKey[List[String]]("Demonstrate scope delegation")
// the initial list
lines in (Global) := "line in scope (*,*,*)" :: Nil
// prepend to the initial list
lines := "line in scope(ThisBuild,*,*)" :: lines.value

If you run this in SBT you’ll get:

sbt> lines
[info] * line in scope(ThisBuild,*,*)
[info] * line in scope (*,*,*)

lines.value on the last line resolved to the lines defined in the global scope (*,*,*) (* means Global and {.} means ThisBuild).

As you can see, resolving a task might be rather tricky. Fortunately SBT provides some help with the inspect command:

sbt:tmp> inspect lines
[info] Setting: scala.collection.immutable.List[java.lang.String] = List(line in scope(ThisBuild,*,*), line in scope (*,*,*))
[info] Description:
[info] 	Demonstrate scope delegation
[info] Provided by:
[info] 	{file:/tmp/}tmp/*:lines
[info] Defined at:
[info] 	/tmp/build.sbt:3
[info] Delegates:
[info] 	*:lines
[info] 	{.}/*:lines
[info] 	*/*:lines
[info] Related:
[info] 	*/*:lines

You can invoke a specifically scoped task or setting from the SBT command line by specifying the scope as follows:

sbt> project/Configuration:Task::command

Of course you don’t have to specify the complete scope; it’s possible to omit any of the scope axes.
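
For example, assuming a hypothetical sub-project named core, the following are all valid: the first runs compile in the Test configuration of the core project, the second shows the packageOptions setting scoped to the packageBin task, and the third omits the project axis so the current project is used.

sbt> core/test:compile
sbt> show compile:packageBin::packageOptions
sbt> test:compile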

Chaining tasks

So far we’ve seen that we can use the value returned by a task inside another task. It works fine but it might be hard to manage. Consider the case where you have a task that depends on 2 other tasks.

lazy val A = taskKey[Unit]("Task A")
A in Global := { println("Task A") }

lazy val B = taskKey[Unit]("Task B")
B in Global := { println("Task B") }

lazy val C = taskKey[Unit]("Task C")
C := {
  A.value
  B.value
}

When you run task C, tasks A and B run too (because C depends on both of them). However A and B don’t depend on each other so SBT runs them in parallel. There is nothing wrong with it but it might not be what you want.

Sequential tasks

Let’s change the definition of A to make it fail.

lazy val A = taskKey[Unit]("Task A")
A in Global := {
  println("Task A")
  throw new Exception("Oh no!")
}

You may think that B runs only when A succeeds, but it’s not the case. For that we need to run the tasks in sequence. This is done using Def.sequential:

C := Def.sequential(A, B).value

Rewiring with dynamic tasks

Dynamic tasks (Def.taskDyn) can also be used to chain tasks together. They allow you to return a task instead of a value.

C := (Def.taskDyn {
  val a = A.value
  Def.task {
    B.value
    a
  }
}).value

We execute task A and then a task that executes B and returns the result of A. What’s cool is that since we return the result of A we can rewire A directly:

A := (Def.taskDyn {
  val a = A.value
  Def.task {
    B.value
    a
  }
}).value

For a more concrete example you can check the documentation where compile is rewired to run both compile and scalastyle (while still returning the result of compile).
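
As a sketch of that technique (assuming the sbt-scalastyle plugin is on the build, which provides the scalastyle input task), the rewiring looks roughly like this:

compile in Compile := (Def.taskDyn {
  // run the regular compilation first and keep its result
  val result = (compile in Compile).value
  Def.task {
    // then run scalastyle (an input task, invoked here with no arguments)
    (scalastyle in Compile).toTask("").value
    // still return the result of compile
    result
  }
}).value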

Commands

If you find yourself having to type too many SBT commands to perform a single operation, you probably want to group them together using a Command.

Imagine that you’re using both scalastyle and scalafmt on your project and you want to run them across all configurations (compile, test, it). That’s 6 commands to type in. You can easily group them together:

commands += Command.command("validate") { state =>
  "compile:scalastyle" ::
    "test:scalastyle" ::
    "it:scalastyle" ::
    "compile:scalafmt::test" ::
    "test:scalafmt::test" ::
    "it:scalafmt::test" ::
    "sbt:scalafmt::test" ::
    state
}

Conclusion

This pretty long post on SBT should have covered enough to make you understand most of SBT’s cryptic syntax. Hopefully you are now ready to understand most SBT files and no longer afraid to try and experiment for yourself.

SBT plugins are obviously missing from this post, but they’re nothing more than some code that creates additional settings on your projects (and you should now be able to understand that too).
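
For the curious, a minimal (hypothetical) auto plugin is not much more than this:

import sbt._

// a hypothetical plugin adding a single setting and task to every project
object GreetingPlugin extends AutoPlugin {

  object autoImport {
    val greeting = settingKey[String]("The greeting message")
    val greet    = taskKey[Unit]("Prints the greeting")
  }
  import autoImport._

  // enable the plugin automatically on all projects
  override def trigger = allRequirements

  override lazy val projectSettings = Seq(
    greeting := "Hello from GreetingPlugin",
    greet    := println(greeting.value)
  )
}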