Refined types, what are they good for?

Tweet about this on TwitterShare on LinkedInShare on FacebookShare on Google+Share on Reddit

Type refinement is all about making the types more precise. But why would do that? Because using the correct types makes your program safer as you reduce the possibility to introduce bugs.

First let’s think about types. For instance String is a type we use all the time. A variable of type String can have many different values (in theory an infinity of values) but it’s quite unlikely that all these values make sense in our application.

Domain model

Be more precise with strings

Consider the case where we need to tag an asset with a version.

def tag(asset: String, version: String) = ???
// and then
tag("image.png", "v0.8")

We could also have written:

tag("v0.8", "image.png")

which introduces a subtle bug that would only cause exceptions at runtime (or worse corrupt the data).

One solution is to wrap those values into case classes:

case class Version(value: String)
case class Asset(name: String)

def tag(asset: Asset, version: Version) = ???

Now it’s no longer possible to switch the arguments. However it’s still possible to build a wrong version:

val version = Version("image.png")

To solve the problem we can add validation using smart constructors

sealed abstract case class Version(value: String)
object Version {
  val version = raw"v\d+(\.\d+(\.\d+)?)?".r
  def fromString(value: String): Option[String] =
    version match {
      case version(_*) => Some(new Version(value) {})
      case _ => None
    }
}

That’s much better we can now only construct valid Version objects. But as not all strings map to a valid version number our function can’t map any strings to a version number. This is indicated by the Option in the return type.

In fact only a subset of strings are valid version numbers so how about having a string sub-type that says String Refined MatchesRegex[W.`"""v\\d+(\\.\\d+(\\.\\d+)?)?"""`.T]? Well the Scala Refined is a library that allows to write just that!

Limiting the number values

Same thing applies to numbers. There’s an infinity of number values (in theory) but not all values make sense. For instance to index an array we need an integer not a decimal. In this case we don’t use the Number type but the Integer (or Int) type. It’s more precise and avoids the possibility that someone will use meaningless values (e.g. 2.54). But still not all integers are valid here. Only positive values should be allowed (unless negative values are used to index from the end of the array). Let’s consider what happens with this code:

val arr: Array[Int] = Array(4,3,5,7,2,1)
arr(-3)

Exactly we get an error (an IndexOutOfBoundsException) when we execute the program. It’s a runtime error. The compilation was successful although the error could have been caught if the compiler knew the Int should be positive.

Now imagine that we have a type PositiveInt that represents all the integers values from 0: 0, 1, 2, 3, … Something like

type PositiveInt = Int Refined Positive

We could then have defined our method like this:

def elemAt[A](arr: Array[A], index: PositiveInt): A =
  arr(index)

and now it is no longer possible to call elemAt(arr, -2). This error fails at compile time.

Refined allows us to define such types.

import eu.timepit.refined._
import eu.timepit.refined.api.Refined
import eu.timepit.refined.auto._
import eu.timepit.refined.numeric._

type PositiveInt = Int Refined Positive

Great! but we have a problem as 0 is not included in this type:

elemAt(arr, 0)

gives the compilation error: Predicate failed: (0 > 0).

In fact we should define our PositiveInt as all the integers that are null or positive (or equivalently all the integers that are not less than 0).

The problem is that we need to encode 0 at the type level, 0 should be part of the PositiveInt definition. Well thinking about it PositiveInt is a sub-type (or sub-set) of Int and so is 0. In this sense 0 is a perfectly valid type that can only take a single value 0. Such types are called singleton types and as they are not currently supported by the compiler (they will be in Dotty) we must encode them in specific way using Shapeless Witness type (or Shapeless natural number types).

import eu.timepit.refined.boolean._
import shapeless.nat._
import shapeless.{Witness => W}

// using Nat numbers
type PositiveInt = Int Refined Not[Less[_0]]
// or using Witness
type PosInt = Int Refined GreaterEqual[W.`0`.T]

Obviously PositiveInt is a subtype of Int. All PositiveInts are also Ints.

That’s much better but IndexOutOfBoundsException can still be thrown if the index is outside of the array. We can change the return type to Option[A] (or Either[String, A] to give some context about the error).

def elementAt[A](arr: Array[A], index: PositiveInt): Option[A] =
  if (index < arr.length) Some(arr(index))
  else None

It’s safe but it forces all the calling code to deal with the Option.

We could avoid that if we could encode the array’s size at the type:

trait Size[L]
val c = Array("a", "b", "c") with Size[_3]

type PositiveIntLessThan[N] = Int Refined Interval.ClosedOpen[_0, N]

def elementAt[A, L](arr: Array[A] with Size[L], index: PositiveIntLessThan[L]): A

Compile-time vs Run-time

Anyway most of the time the array size is not known at compile time. Which makes all this ‘type-safetyness’ useless.

This brings up the question: How useful are refined types? (because most of the data a program has to deal with is only known at runtime).

The idea is to use refined types as much as possible inside your business logic. E.g. If you need to compute the average price as part of your business logic you might have something like:

def average(values: List[Double]): Double =
  values.sum / values.size

However we fell in the same trap as before as the function is actually a partial function. It applies only to non-empty lists. So why not reflect this fact in the method signature (We can use refined or cats NonEmptyList for that matter).

def average(values: List[Double] Refined NonEmpty): Double =
  values.sum / values.size
// or with cats NonEmptyList
def avergage(values: NonEmptyList[Double]): Double =
  values.toList.sum / values.toList.size

Now where does values comes from? Probably from some user inputs at runtime. And the earliest we can validate the user input is at runtime at the edge of the program. This is where user input validation should occur.

As soon as the data enters the program (e.g. calling an HTTP endpoint, reading a file, …) we must validate that it’s valid and warn the user if it’s not.

It can mean adding validation to JSON readers to return a NonEmptyList instead of a List, adding an extra validation layer to the protobuf deserialisation or even validating the configuration values on startup.

Fortunately we can also use refined to validate the data at runtime but this time we’ll have to deal with the return type Either[String, T Refined R] that may contain the error (on the left side).

import eu.timepit.refined._
import eu.timepit.string._
// let's pretend this is known at runtime
val input: String = "v1.0.3"

type ValidVersion = MatchesRegex[W.`"""v\\d+(\\.\\d+(\\.\\d+)?)?"""`.T]

val version: Either[String, String Refined ValidVersion] =
  refinedV[ValidVersion](input)

We can then define our version type as

type Version = String Refined ValidVersion

And then use Version everywhere we need a version in the business logic

def tag(asset: Asset, version: Version) = ???

Conclusion

Refined types looks very promising in theory but in practice it’s very much limited as most of the data is only known at compile time.
However it still makes sense to use in the business logic as it avoids dealing with some kind of errors.
By doing this the validation is pushed to the edges of the system (reading configuration values and user inputs).

Fortunately Refined provides runtime validation out-of-the box.

You can check Ciris for configuration validation using refined types. Ciris is an opinionated library that validates configuration using refined types.

To validate JSON inputs you can easily use refineV to add validation to JSON deserialisation. E.g. with play-json:

import play.api.libs.json._

implicit def refinedReads[T, P](
  implicit reads: Reads[T], v: Validate[T, P]
): Reads[T Refined P] =
  Reads[T Refined P] { json =>
    reads
      .reads(json)
      .flatMap { t: T =>
        refineV[P](t) match {
          case Left(error) => JsError(error)
          case Right(value) => JsSuccess(value)
        }
      }
  } 

On this note you can check play-json-refined, a tiny library that defines implicit Reads and Writes for refined types.

  • NickR

    I love the idea of Refined types though I haven’t looked into it much yet.

    I find it very analagous to declaring a Constraint Satisfaction Problem or an Optimisation Problem (e.g. solve/minimize/maximize Expr such that Constraints a,b,c are met).

  • Yurii Ostapchuk

    really really nice