Type refinement is all about making the types more precise. But why would do that? Because using the correct types makes your program safer as you reduce the possibility to introduce bugs.
First let’s think about types. For instance String
is a type we use all the time. A variable of type String
can have many different values (in theory an infinity of values) but it’s quite unlikely that all these values make sense in our application.
Domain model
Be more precise with strings
Consider the case where we need to tag an asset with a version.
def tag(asset: String, version: String) = ??? // and then tag("image.png", "v0.8")
We could also have written:
tag("v0.8", "image.png")
which introduces a subtle bug that would only cause exceptions at runtime (or worse corrupt the data).
One solution is to wrap those values into case classes:
case class Version(value: String) case class Asset(name: String) def tag(asset: Asset, version: Version) = ???
Now it’s no longer possible to switch the arguments. However it’s still possible to build a wrong version:
val version = Version("image.png")
To solve the problem we can add validation using smart constructors
sealed abstract case class Version(value: String) object Version { val version = raw"v\d+(\.\d+(\.\d+)?)?".r def fromString(value: String): Option[String] = version match { case version(_*) => Some(new Version(value) {}) case _ => None } }
That’s much better we can now only construct valid Version
objects. But as not all strings map to a valid version number our function can’t map any strings to a version number. This is indicated by the Option
in the return type.
In fact only a subset of strings are valid version numbers so how about having a string sub-type that says String Refined MatchesRegex[W.`"""v\\d+(\\.\\d+(\\.\\d+)?)?"""`.T]
? Well the Scala Refined is a library that allows to write just that!
Limiting the number values
Same thing applies to numbers. There’s an infinity of number values (in theory) but not all values make sense. For instance to index an array we need an integer not a decimal. In this case we don’t use the Number
type but the Integer
(or Int
) type. It’s more precise and avoids the possibility that someone will use meaningless values (e.g. 2.54). But still not all integers are valid here. Only positive values should be allowed (unless negative values are used to index from the end of the array). Let’s consider what happens with this code:
val arr: Array[Int] = Array(4,3,5,7,2,1) arr(-3)
Exactly we get an error (an IndexOutOfBoundsException
) when we execute the program. It’s a runtime error. The compilation was successful although the error could have been caught if the compiler knew the Int should be positive.
Now imagine that we have a type PositiveInt
that represents all the integers values from 0: 0, 1, 2, 3, … Something like
type PositiveInt = Int Refined Positive
We could then have defined our method like this:
def elemAt[A](arr: Array[A], index: PositiveInt): A = arr(index)
and now it is no longer possible to call elemAt(arr, -2)
. This error fails at compile time.
Refined allows us to define such types.
import eu.timepit.refined._ import eu.timepit.refined.api.Refined import eu.timepit.refined.auto._ import eu.timepit.refined.numeric._ type PositiveInt = Int Refined Positive
Great! but we have a problem as 0 is not included in this type:
elemAt(arr, 0)
gives the compilation error: Predicate failed: (0 > 0)
.
In fact we should define our PositiveInt
as all the integers that are null or positive (or equivalently all the integers that are not less than 0).
The problem is that we need to encode 0 at the type level, 0 should be part of the PositiveInt
definition. Well thinking about it PositiveInt
is a sub-type (or sub-set) of Int
and so is 0. In this sense 0 is a perfectly valid type that can only take a single value 0. Such types are called singleton types and as they are not currently supported by the compiler (they will be in Dotty) we must encode them in specific way using Shapeless Witness
type (or Shapeless natural number types).
import eu.timepit.refined.boolean._ import shapeless.nat._ import shapeless.{Witness => W} // using Nat numbers type PositiveInt = Int Refined Not[Less[_0]] // or using Witness type PosInt = Int Refined GreaterEqual[W.`0`.T]
Obviously PositiveInt
is a subtype of Int
. All PositiveInt
s are also Int
s.
That’s much better but IndexOutOfBoundsException
can still be thrown if the index is outside of the array. We can change the return type to Option[A]
(or Either[String, A]
to give some context about the error).
def elementAt[A](arr: Array[A], index: PositiveInt): Option[A] = if (index < arr.length) Some(arr(index)) else None
It’s safe but it forces all the calling code to deal with the Option
.
We could avoid that if we could encode the array’s size at the type:
trait Size[L] val c = Array("a", "b", "c") with Size[_3] type PositiveIntLessThan[N] = Int Refined Interval.ClosedOpen[_0, N] def elementAt[A, L](arr: Array[A] with Size[L], index: PositiveIntLessThan[L]): A
Compile-time vs Run-time
Anyway most of the time the array size is not known at compile time. Which makes all this ‘type-safetyness’ useless.
This brings up the question: How useful are refined types? (because most of the data a program has to deal with is only known at runtime).
The idea is to use refined types as much as possible inside your business logic. E.g. If you need to compute the average price as part of your business logic you might have something like:
def average(values: List[Double]): Double = values.sum / values.size
However we fell in the same trap as before as the function is actually a partial function. It applies only to non-empty lists. So why not reflect this fact in the method signature (We can use refined or cats NonEmptyList
for that matter).
def average(values: List[Double] Refined NonEmpty): Double = values.sum / values.size // or with cats NonEmptyList def avergage(values: NonEmptyList[Double]): Double = values.toList.sum / values.toList.size
Now where does values comes from? Probably from some user inputs at runtime. And the earliest we can validate the user input is at runtime at the edge of the program. This is where user input validation should occur.
As soon as the data enters the program (e.g. calling an HTTP endpoint, reading a file, …) we must validate that it’s valid and warn the user if it’s not.
It can mean adding validation to JSON readers to return a NonEmptyList
instead of a List
, adding an extra validation layer to the protobuf deserialisation or even validating the configuration values on startup.
Fortunately we can also use refined to validate the data at runtime but this time we’ll have to deal with the return type Either[String, T Refined R]
that may contain the error (on the left side).
import eu.timepit.refined._ import eu.timepit.string._ // let's pretend this is known at runtime val input: String = "v1.0.3" type ValidVersion = MatchesRegex[W.`"""v\\d+(\\.\\d+(\\.\\d+)?)?"""`.T] val version: Either[String, String Refined ValidVersion] = refinedV[ValidVersion](input)
We can then define our version type as
type Version = String Refined ValidVersion
And then use Version
everywhere we need a version in the business logic
def tag(asset: Asset, version: Version) = ???
Conclusion
Refined types looks very promising in theory but in practice it’s very much limited as most of the data is only known at compile time.
However it still makes sense to use in the business logic as it avoids dealing with some kind of errors.
By doing this the validation is pushed to the edges of the system (reading configuration values and user inputs).
Fortunately Refined provides runtime validation out-of-the box.
You can check Ciris for configuration validation using refined types. Ciris is an opinionated library that validates configuration using refined types.
To validate JSON inputs you can easily use refineV
to add validation to JSON deserialisation. E.g. with play-json:
import play.api.libs.json._ implicit def refinedReads[T, P]( implicit reads: Reads[T], v: Validate[T, P] ): Reads[T Refined P] = Reads[T Refined P] { json => reads .reads(json) .flatMap { t: T => refineV[P](t) match { case Left(error) => JsError(error) case Right(value) => JsSuccess(value) } } }
On this note you can check play-json-refined, a tiny library that defines implicit Reads
and Writes
for refined types.