Generating protobuf formats with scala.meta macros

Today’s focus is on scalameta. In this introductory post we’re going to see how to create a macro annotation that generates protobuf formats for case classes.

The idea is to be able to serialise any case class to protobuf just by adding a @PBSerializable annotation to the case class declaration.

Then, behind the scenes, the macro generates implicit formats in the companion object. These implicit formats can then be used to serialise the case class to/from the protobuf binary format.

This is quite similar to the JSON formats of play-json.
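
Concretely, the goal is to be able to write something like the following (a sketch of the intent; Person is an invented example, and PBReads/PBWrites are the typeclasses we’ll meet below):

@PBSerializable
case class Person(name: String, age: Int)

// After macro expansion the companion object contains (roughly):
//   implicit val pbWrites: PBWrites[Person] = ...
//   implicit val pbReads: PBReads[Person] = ...
// These implicits can then be summoned wherever Person needs to be
// written to or read from the protobuf binary format.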

In this post we’re going to cover the main principles of scalameta and how to apply them to create our own macros.

Scala.meta

Setup

Getting started with scalameta is quite straightforward. You only need to add a dependency in your build.sbt:

libraryDependencies += "org.scalameta" %% "scalameta" % "1.7.0"

Then in your code all you have to do is

import scala.meta._

Macro setup

The setup to write a macro is slightly more involved. First you need two separate projects, as it’s not possible to use macro annotations in the same project where they are defined: the macro annotations must be compiled before they can be used.

Once compiled, you don’t even need a dependency on scalameta to use your macro annotations; you only need a dependency on the project that declares them.

The setup for the macro definition project is slightly more complex as you need to enable the macroparadise plugin, but it’s just a single line to add to your build.sbt:

addCompilerPlugin("org.scalameta" % "paradise" % "3.0.0-M8" cross CrossVersion.full)

Of course you can use sbt subprojects to create one subproject for the macro definitions and one for the application that uses the macro annotations:

lazy val metaMacroSettings: Seq[Def.Setting[_]] = Seq(
  addCompilerPlugin("org.scalameta" % "paradise" % "3.0.0-M8" cross CrossVersion.full),
  scalacOptions += "-Xplugin-require:macroparadise",
  scalacOptions in (Compile, console) := Seq(), // macroparadise plugin doesn't work in repl yet.
  sources in (Compile, doc) := Nil // macroparadise doesn't work with scaladoc yet.
)

lazy val macros = project.settings(
  metaMacroSettings,
  name := "pbmeta",
  libraryDependencies += "org.scalameta" %% "scalameta" % "1.7.0"
)

lazy val app = project.settings(
  metaMacroSettings,
  libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % Test
).dependsOn(macros)

Parsing

At the heart of scalameta is a high-fidelity parser. The scalameta parser is able to parse Scala code while capturing all the context (comments, token positions, …), hence the high fidelity.

It’s easy to try out:

scala> import scala.meta._

scala> "val number = 3".parse[Stat]
res1: scala.meta.parsers.Parsed[scala.meta.Stat] = val number = 3

scala> "Map[String, Int]".parse[Type]
res2: scala.meta.parsers.Parsed[scala.meta.Type] = Map[String, Int]

scala> "number + 2".parse[Term]
res3: scala.meta.parsers.Parsed[scala.meta.Term] = number + 2

scala> "case class MyInt(i: Int /* it's an Int */)".parse[Stat]
res4: scala.meta.parsers.Parsed[scala.meta.Stat] = case class MyInt(i: Int /* it's an Int */)

Tokens

As you can see, the parser captures all the details (including the comments). It’s easy to get the captured tokens:

scala> res4.get.tokens
res5: scala.meta.tokens.Tokens = Tokens(, case,  , class,  , MyInt, (, i, :,  , Int,  , /* it's an Int */, ), )

Scalameta also captures the position of each token.
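
For example, each token carries its position (using the scalameta 1.x Position API, where lines and columns are zero-based):

scala> val start = res4.get.tokens.find(_.syntax == "MyInt").get.pos.start
scala> (start.line, start.column) // (0, 11): the token starts at column 11 of line 0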

Trees

The structure is captured as a tree.

scala> res4.get.children
res6: scala.collection.immutable.Seq[scala.meta.Tree] = List(case, MyInt, def this(i: Int /* it's an Int */), )

scala> res6(2).children
res7: scala.collection.immutable.Seq[scala.meta.Tree] = List(, i: Int)
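
Beyond children, the structure method prints the full AST, which is handy for figuring out which node types to pattern match on (output shown approximately for scalameta 1.x):

scala> "val number = 3".parse[Stat].get.structure
Defn.Val(Nil, Seq(Pat.Var.Term(Term.Name("number"))), None, Lit(3))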

Transform

This is nice but it’s not getting us anywhere. It’s great to capture all these details, but to generate some code we need to transform the tree. This is where the transform method comes in.

scala> "val number = 3".parse[Stat].get.transform {
     |   case q"val $name = $expr" =>
     |     val newName = Term.Name(name.syntax + "Renamed")
     |     q"val ${Pat.Var.Term(newName)} = $expr"
     | }
res8: scala.meta.Tree = val numberRenamed = 3

Quasiquotes

Here we have transformed a Tree into another Tree, but instead of manipulating the Tree directly (which is possible as well) we have used quasiquotes both to deconstruct the existing Tree in the pattern match and to construct a new Tree as a result.

Quasiquotes make it much more convenient to manipulate Trees. The difficulty (especially at the beginning) is to get familiar with all the scalameta ASTs. Fortunately there is a very useful cheat sheet that summarises them all.
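
For example, the same quasiquote shape works both for building and for deconstructing a tree in the REPL:

scala> val tree = q"val answer = 42" // builds a Defn.Val
scala> val q"val $name = $expr" = tree // deconstructs it: name = answer, expr = 42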

Macros

With all this knowledge we’re now ready to enter the world of metaprogramming and write our first macro. Writing a macro is quite similar to the transformation we did above.

In fact only the declaration changes but the principle remains: we pattern match on the parsed tree using quasiquotes, apply some transformation and return a modified tree.

import scala.collection.immutable.Seq
import scala.meta._

class Hello extends scala.annotation.StaticAnnotation {
  inline def apply(defn: Any): Any = meta {
    defn match {
      case cls @ Defn.Class(_, _, _, ctor, template) =>
        // the method we want to add to the annotated class
        val hello = q"""def hello: Unit = println("Hello")"""
        // prepend it to the existing statements of the class body (if any)
        val stats = hello +: template.stats.getOrElse(Nil)
        // return a copy of the class with the updated template
        cls.copy(templ = template.copy(stats = Some(stats)))
    }
  }
}

Here we just create an @Hello annotation to add a method hello (that prints "Hello" to the standard output) to a case class.

We can use it like this:

@Hello case class Greetings()
val greet = Greetings()
greet.hello // prints "Hello"

Congratulations! If you understand this, you understand scalameta macros. You can head over to the scalameta tutorial for additional examples.

PBMeta

Now that you understand scalameta macros we are ready to discuss the PBMeta implementation, as it is built on these concepts.

It defines a @PBSerializable annotation that adds implicit PBReads and PBWrites instances into the companion object of the case class.

The pattern match is used to detect whether the companion object already exists or whether we have to create it. The third case handles Scala enumerations:

defn match {
  case Term.Block(Seq(cls@Defn.Class(_, name, _, ctor, _), companion: Defn.Object)) =>
    // companion object exists
    ...
  case cls@Defn.Class(_, name, _, ctor, _) =>
    // companion object doesn't exist
    ...
  case obj@Defn.Object(_, name, template) if template.parents.map(_.syntax).contains("Enumeration()") =>
    // Scala enumeration
    ...
}

Note how we check that the object extends Enumeration. We don’t have all the type information available at compile time (no typer phase runs as part of the macro expansion, which is why scalameta is quite fast). As we don’t have the whole type hierarchy available, the only check we can do is whether the object extends Enumeration directly. If it extends it indirectly we’re not going to catch it! (This is probably something that could be addressed with the semantic API.)
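
To make the limitation concrete, here is an illustrative example (the enum names are invented, not part of PBMeta):

// Caught: the template parents syntactically contain "Enumeration()"
@PBSerializable
object Colour extends Enumeration {
  val Red, Green, Blue = Value
}

// Not caught: the macro only sees the syntax "RGB()"; without a typer
// phase it cannot know that RGB itself extends Enumeration
class RGB extends Enumeration
@PBSerializable
object Colours extends RGB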

All the remaining code generates the PBReads and PBWrites instances.

PBWrites

The PBWrites trait defines two methods (a sketch of the trait follows the list):

  • write(a: A, to: CodedOutputStream, at: Option[Int]): Unit writes the given object a to the output stream to, at index at. The index is optional and is used to compute the tag (if any).
  • sizeOf(a: A, at: Option[Int]): Int computes the size (number of bytes) needed to encode the object a. If an index at is specified, the size of the associated tag is also added to the result.
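
Based on those signatures, the trait looks roughly like this (a sketch inferred from the descriptions above; the default values for at are an assumption, suggested by the generated code calling sizeOf(a) with a single argument):

import com.google.protobuf.CodedOutputStream

trait PBWrites[A] {
  // writes `a` to the stream, preceded by a tag when an index is given
  def write(a: A, to: CodedOutputStream, at: Option[Int] = None): Unit
  // number of bytes needed to encode `a` (tag size included when an index is given)
  def sizeOf(a: A, at: Option[Int] = None): Int
}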

Quasiquotes are used to generate these methods:

q"""
  implicit val pbWrites: pbmeta.PBWrites[$name] =
    new pbmeta.PBWrites[$name] {
      override def write(a: $name, to: com.google.protobuf.CodedOutputStream, at: Option[Int]): Unit = {
        at.foreach { i =>
          to.writeTag(i, com.google.protobuf.WireFormat.WIRETYPE_LENGTH_DELIMITED)
          to.writeUInt32NoTag(sizeOf(a))
        }
        ..${params.zipWithIndex.map(writeField)}
      }
      override def sizeOf(a: $name, at: Option[Int]): Int = {
        val sizes: Seq[Int] = Seq(..${params.zipWithIndex.map(sizeField)})
        sizes.reduceOption(_+_).getOrElse(0) +
        at.map(com.google.protobuf.CodedOutputStream.computeTagSize).getOrElse(0)
      }
    }
 """

In case you’re wondering what the ..$ syntax is, it’s just how to deal with sequences in quasiquotes. Here we create a collection of Term.Apply nodes to write each field into the CodedOutputStream. The ..$ syntax allows us to insert the whole sequence directly into the quasiquote.

(Similarly there is a ...$ syntax to deal with sequences of sequences).
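
Here is a quick REPL illustration of ..$ (output abbreviated; note the immutable List, as scalameta quasiquotes splice immutable sequences):

scala> val greetings = List(q"""println("hello")""", q"""println("world")""")
scala> q"def run(): Unit = { ..$greetings }"
// yields: def run(): Unit = { println("hello"); println("world") }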

PBReads

PBReads instances are generated in a similar way. The idea is to generate code that extracts each field value from the CodedInputStream and, at the end, creates a new instance of the object from the extracted fields.

val fields: Seq[Defn.Var] = ctor.paramss.head.map(declareField)
val cases: Seq[Case] = ctor.paramss.head.zipWithIndex.map(readField)
val args = ctor.paramss.head.map(extractField)
val constructor = Ctor.Ref.Name(name.value)
q"""
  implicit val pbReads: pbmeta.PBReads[$name] =
    new pbmeta.PBReads[$name] {
      override def read(from: com.google.protobuf.CodedInputStream): $name = {
        var done = false
        ..$fields
        while (!done) {
          from.readTag match {
            case 0 => done = true
            ..case $cases
            case tag => from.skipField(tag)
          }
        }
        new $constructor(..$args)
      }
    }
"""

IDE Support and debugging

In theory macro expansion is supported in IntelliJ IDEA. From what I experienced while developing PBMeta it works well in simple cases (e.g. adding a method to an existing case class): you can expand the annotated class and see the generated code, which is great for debugging and seeing what code is actually executed.

However it fails in more complex situations (e.g. creating a companion object).

In this case you’re left with inserting debug statements (i.e. println) in the generated code. It’s simple and effective, but don’t forget to clean them up when debugging is over.
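
Debug statements can go into the generated code itself, or you can print the expanded tree from inside the macro to dump the generated source at compile time. For example, sketched on the Hello macro from earlier:

case cls @ Defn.Class(_, _, _, _, template) =>
  val stats = q"""def hello: Unit = println("Hello")""" +: template.stats.getOrElse(Nil)
  val expanded = cls.copy(templ = template.copy(stats = Some(stats)))
  println(expanded.syntax) // prints the generated class during compilation
  expanded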

Conclusion

Scalameta is an amazing tool: it makes metaprogramming easy and enjoyable. However there are some shortcomings you need to be aware of:

  • You need to get familiar with all the quasiquote paraphernalia. There are many different terms, but once you get to know them things become much easier. Plus you can try things out in the console.
  • IDE support is great … when it works. When it doesn’t, debugging isn’t easy and you’re left with adding println statements to the generated code. Not ideal!
  • Scalameta doesn’t provide all the type analysis performed by the compiler. Yet we can do amazing things with the available information. Plus it’s fast (no heavy type inference needed)!

I used PBMeta as an introduction to Scalameta and, without any prior knowledge, I managed to build all the functionality I wanted. I even managed to add custom field positions with the @Pos annotation. The only thing I missed is support for mapping sealed traits to protobuf’s oneOf structure.

For more details you can head over to PBMeta, try it out and let me know what you think in the comments below.