Reducing type class boilerplate with Shapeless

If you followed our previous post on forging a DSL using type classes, you surely notice that writing type class instances is a rather repetitive task.

In today’s post we’re going to get rid of this by derivating type class instances automatically using Shapeless. We also use this post as an excuse to experiment with Shapeless and try to understand all the “magic” that’s happening. Continue reading “Reducing type class boilerplate with Shapeless”

Forging a DSL using Scala type classes

In this post we’re going to explore how to build a DSL (Domain Specific Language) with a user-friendly syntax while maintaining as much type-safety as possible. We want that any operations that is not allowed by the business rules fail at compile time. This would be really nice as it makes sure that no one writes such forbidden logic (even by mistake).

More over Scala provides really nice syntactic sugar that can make a DSL syntax pretty neat.

If you don’t know what type classes or don’t feel very comfortable with this concept, follow along as we’ll also explore how we can use them to dissociate data and behaviours (always a good practice). Continue reading “Forging a DSL using Scala type classes”

Querying Cassandra from Scala

When it comes to accessing Cassandra from Scala there are 2 possible approaches:

  • The official Java driver
  • A custom DSL like Quill or Phantom

Custom-DSL are nice as they provide all the type-safety you need against your data schema. However in this post I will focus only on the Java driver. Why? Because it’s both a simple and decent solution in my opinion.

The bad thing is that you lose any type-safety as all the queries are just plain strings. On the other hand you don’t have to learn a new DSL because your queries are just CQL. Add a thorough test coverage and you have a viable solution.

Moreover the Java driver provides an async API backed by Guava’s futures and it’s not that difficult to turn these futures into Scala futures – which makes a quite natural API in Scala.

There are still some shortcomings that you’d better be aware of when consuming a result set but overall I think that it’s still a simple solution that is worth considering. Continue reading “Querying Cassandra from Scala”

Understanding Cassandra tombstones

We recently deployed in production a distributed system that uses Cassandra as its persistent storage.

Not long after we noticed that there were many warnings about tombstones in Cassandra logs.

WARN  [SharedPool-Worker-2] 2017-01-20 16:14:45,153 ReadCommand.java:508 - 
Read 5000 live rows and 4771 tombstone cells for query 
SELECT * FROM warehouse.locations WHERE token(address) >= token(D3-DJ-21-B-02) LIMIT 5000 
(see tombstone_warn_threshold)

We found it quite surprising at first because we’ve only inserted data so far and didn’t expect to see that many tombstones in our database. After asking some people around no one seemed to have a clear explanation on what was going on in Cassandra.

In fact, the main misconception about tombstones is that people associate it with delete operations. While it’s true that tombstones are generated when data is deleted it is not the only case as we shall see. Continue reading “Understanding Cassandra tombstones”

Akka persistence

The actor model allows us to write complex distributed applications by containing the mutable state inside an actor boundary. However with Akka this state is not persistent. If the actor dies and then restarts all its state is lost.

To address this problem Akka provides the Akka Persistence framework. Akka Persistence is an effective way to persist an actor state but it’s integration needs to be well thought as it can greatly impact your application design. It fits nicely with the actor model and distributed system design – but is quite different from what a “more classic” application looks like.

In this post I am going to gloss over the different components of Akka Persistence and see how they influence the design choices. I’ll also try to cover some of the common pitfalls to avoid when building a distributed application with Akka Persistence.

Although Akka Persistence allows you to plug in various storage backends in this post I mainly discuss using the Cassandra backend. Continue reading “Akka persistence”

Markov Decision Process

Now that we know about Markov chain, let’s focus on a slightly different process: the Markov Decision Process.

This process is quite similar to a Markov chain but adds more concept into it: Actions and Rewards. Having a reward means that it’s possible to learn which action yield the best rewards. This type of learning is also known as reinforcement learning.

In this post we’re going to see what exactly is a Markov decision process and how to solve it in an optimal way. Continue reading “Markov Decision Process”

Hidden Markov Model

Last time we talk about what is a Markov chain. However there is one big limitation:

A Markov chain implies that we can directly observe the state of the process. (e.g the number of people in the queue).

Many times we can only access an indirect representation or noisy measure of the state of the system. (e.g. we know the noisy GPS coordinates of a robot but we want to know it’s real position).

Trellis diagram

In this post we’re going to focus on the second point and see how to deal with HMMs. In fact HMM can be useful every time that we don’t have direct access to the system state. Let’s take some motivational examples first before we dig into the maths. Continue reading “Hidden Markov Model”

Markov chain

I’ve heard the term “Markov chain” a couple of times but never had a look at what it actually is. Probably because it sounded too impressive and complicated.

It turns out it’s not as bad as it sounds. In fact the whole idea is pretty intuitive and once you get a feeling of how things work it’s much easier to get your head around the mathematics.

Andrey Markov was a Russian mathematician who studied stochastic processes (a stochastic process is a random process) and specially systems that follow a suite of linked events. Markov found interesting results about discrete processes that form a chain. Continue reading “Markov chain”

The free monad without the boilerplate

The free monad is really neat for creating DSL as it allows to completely separate the business logic from the implementation.

It leaves a great freedom for the implementation choices and should you change your implementation you just need to rewrite your interpreter without changing any of the business logic.

That’s great! However after playing a little around with the free monad I was a bit skeptical about the amount of boilerplate required to plug everything together. Writing smart constructors and injectors is not the most trivial thing (depending on how your team is familiar with functional programming).

To conclude this serie on the free monad we’ll have a look at some solutions to remove this boilerplate code, especially freasy and freek.
Continue reading “The free monad without the boilerplate”

The free monad in (almost) real life

Now that we have a fairly good understanding of what the free monad is and how it works. Let’s see what it looks like to write a real program based on this concept. This time we won’t implement Free ourselves but use the cats library for that. If you don’t know all the theory behind the free monad and just want to see how to use it then read on as this post builds an application from scratch.

In this post we’ll write simple DSLs, lift them into the free monad and even combine them into one program. There is some boiler plate around but I want to get a feeling of what it looks like to combine more DSLs and see how we can architecture a whole application around the free monad. Continue reading “The free monad in (almost) real life”