Kafka concepts and common patterns

Many people see Kafka as a messaging system, but in reality it’s more than that: it’s a distributed streaming platform. It can be used as a traditional messaging platform, but the extra capabilities also make it more complex.

In this post we’ll introduce the main concepts present in Kafka and see how they can be used to build different applications, from traditional publish/subscribe all the way up to streaming applications.

Generating protobuf formats with scala.meta macros

Today’s focus is on scalameta. In this introductory post we’re going to see how to create a macro annotation to generate protobuf formats for case classes.

The idea is to be able to serialise any case class to protobuf just by adding a @PBSerializable annotation to the case class declaration.

Then, behind the scenes, the macro will generate implicit formats in the companion object. These implicit formats can then be used to serialise the case class to/from the protobuf binary format.

This is quite similar to the JSON formats of play-json.

In this post we’re going to cover the main principles of scalameta and how to apply them to create our own macros.

Building anti-corruption layers with Akka

Akka actors fit nicely with DDD (Domain Driven Design) when designing an application. For example, it’s quite natural to model entities as individual actors that can be persisted, …

One of the key aspects of DDD is the notion of bounded context. A bounded context is simply a “self-contained” component. It can interact with other components but it is coherent on its own. Each bounded context has its own domain model, which belongs only to itself and should not leak into or be influenced by other bounded contexts.

Anti-corruption layers (aka translation layers or adapter layers) are used to enforce this principle. Basically their role is to translate the core domain objects into/from another domain that is used for communication or persistence.
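To make the idea concrete, here is a minimal sketch of such a translation layer in plain Scala. All the names (Customer, CustomerDto, CustomerTranslator) are hypothetical and only serve the illustration; this is one possible shape under those assumptions, not the design used in the post.

```scala
object AntiCorruptionSketch {
  // Core domain model: owned by the bounded context (hypothetical names).
  final case class CustomerId(value: String)
  final case class Customer(id: CustomerId, name: String)

  // External representation used for communication or persistence.
  final case class CustomerDto(customer_id: String, full_name: String)

  // The translation layer: the only place that knows about both models.
  object CustomerTranslator {
    def toDto(customer: Customer): CustomerDto =
      CustomerDto(customer.id.value, customer.name)

    // Incoming data is validated before it is allowed into the domain.
    def fromDto(dto: CustomerDto): Either[String, Customer] =
      if (dto.customer_id.trim.isEmpty) Left("missing customer id")
      else Right(Customer(CustomerId(dto.customer_id), dto.full_name))
  }
}
```

The domain classes never mention the external format, so either side can change without touching the other; only the translator has to follow.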

In this blog post we’re going to try to follow the DDD principles to build a small (contrived) application using Akka and try to figure out the best way to build efficient anti-corruption layers.

PBDirect – Protobuf without the .proto files

Protocol Buffers (aka Protobuf) is an efficient and fast way to serialise data into a binary format. It is much more compact than Java serialisation or any text-based format (JSON, XML, CSV, …).

Protobuf is schema-based: it needs a description (in a .proto file) of the data structures to be serialised/deserialised.

On the JVM, protoc (the Protobuf compiler) reads the .proto description files and generates the corresponding classes.
For Scala there is a very good sbt plugin, “ScalaPB”, that follows the same process and generates case classes corresponding to the .proto file definitions.

The .proto files are an easy way to describe a protocol between 2 components (e.g. services). However there are some cases (e.g. writing to persistent storage) where the .proto file definitions are just unnecessary and add superfluous complexity. (Who likes to read auto-generated code?)

In such cases it would be much easier to serialise an object directly into protobuf (using its class definition as a schema). After all, this is what the protobuf Java binding does: it serialises (auto-generated) Java classes into the protobuf binary format.

To that end, let me introduce PBDirect, a Scala library to directly encode Scala objects into protobuf.

Reinforcement learning

It’s been a while since we covered any machine learning algorithm. Last time we discussed the Markov Decision Process (or MDP).

Today we’re going to build our knowledge on top of the MDP and see how we can generalise our MDP to solve more complex problems.

Reinforcement learning really hit the news back in 2013 when a computer learned how to play a bunch of old Atari games (like Breakout) just by observing the pixels on the screen. Let’s find out how this is possible!

Rethinking logging on the JVM with Logoon

Logging has been around on the JVM for a while now. It all started with Log4J back in 2001. Log4J was the first logging framework and it is still around today (in its version 2). It provides a simple and efficient API (compared to the System.out.println that was in use before).

  1. Get a logger for a class
  2. Use that logger to log messages

import org.apache.log4j.{Level, Logger}

// 1. Get a logger for a class
val logger = Logger.getLogger(classOf[MyClass])
...
// 2. Use that logger to log messages
logger.log(Level.DEBUG, "I am doing something right now")
...
logger.error("Oops, something went wrong", theException)
...

Today there are a few more frameworks on the JVM but they all provide similar APIs as Log4J:

  • JUL (2002): java.util.logging provides a standardisation of Log4J and of course provides a similar API
  • Commons-logging (2002): Apache project providing a façade over Log4J, JUL, … still the same API
  • SLF4J (2005): Another façade over Log4J (1&2), JUL, JCL, … not many changes in the API
  • Logback (2006): Brings structured logging with an API compatible with (and similar to) SLF4J (and Log4J)
  • Log4J2 (2012): Rewrite of Log4J inspired by Log4J and Logback with improved performance. The API does not change much though.
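To illustrate how close these APIs are, here is the same two-step pattern written against java.util.logging, which ships with the JDK. This is only a sketch: the object name JulExample, the logger name and the messages are made up for the example.

```scala
import java.util.logging.{Level, Logger}

object JulExample {
  // 1. Get a logger (JUL identifies loggers by name rather than by class)
  val logger: Logger = Logger.getLogger("JulExample")

  def main(args: Array[String]): Unit = {
    // 2. Use that logger to log messages
    logger.log(Level.INFO, "I am doing something right now")
    try {
      throw new IllegalStateException("boom")
    } catch {
      case e: IllegalStateException =>
        // The exception is passed as an extra argument, as with Log4J
        logger.log(Level.SEVERE, "Oops, something went wrong", e)
    }
  }
}
```

Apart from the level names (SEVERE instead of ERROR, for instance), the shape is the same: obtain a logger, then call it with a level, a message and optionally an exception.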

As you can see, the logging APIs available on the JVM haven’t changed much over the last 15 years. The most interesting additions are structured logging and the Mapped Diagnostic Context (MDC), as we shall see later.

In this post I am going to look at the current limitations of these APIs and see how we can overcome them while still relying on these frameworks to actually write the logs.

Fluent – A deep dive into Shapeless and implicit resolution

As promised in my previous post, we’re going to explore the internals of Fluent and how it uses Shapeless and implicit resolution to transform case classes.

Fluent started as an experiment (and still is one). The code is rather small (about 300 lines) and yet I am still impressed by the variety of cases it can handle.

Before working with Shapeless I’d often heard that it is pure magic, and I got the impression that most people (including me) don’t really know how it works. It turns out that the principles used in Shapeless are not really difficult to understand, especially if you read the well-written Type Astronaut’s Guide to Shapeless.

Understanding how Shapeless works doesn’t mean it’s easy to work with. Shapeless makes heavy use of implicits, and working with implicits is hard. Remember that implicit resolution is performed at compile time, so when it fails there is nothing to debug: no log messages, no stack trace. We are just left with rather blunt messages like could not find implicit value for parameter ...
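The following sketch shows the mechanism in isolation, without Shapeless. The Show type class and the render function are hypothetical names invented for this example; they are not part of Fluent or Shapeless. The only library feature used is the standard scala.annotation.implicitNotFound annotation, which lets a type class replace the blunt default message with a more helpful one.

```scala
import scala.annotation.implicitNotFound

// A tiny type class; the annotation customises the compile error
// reported when no instance can be found.
@implicitNotFound("No Show instance available for ${A}")
trait Show[A] {
  def show(a: A): String
}

object Show {
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = a.toString
  }

  // Derived instance: the compiler builds Show[List[A]] from Show[A].
  implicit def listShow[A](implicit ev: Show[A]): Show[List[A]] =
    new Show[List[A]] {
      def show(as: List[A]): String = as.map(ev.show).mkString("[", ", ", "]")
    }
}

object ImplicitDemo {
  def render[A](a: A)(implicit s: Show[A]): String = s.show(a)

  def main(args: Array[String]): Unit = {
    // Resolved at compile time as listShow(intShow)
    println(render(List(1, 2, 3))) // prints [1, 2, 3]

    // Uncommenting the next line fails at compile time, not at runtime,
    // because there is no Show[String] to build Show[List[String]] from:
    // println(render(List("a", "b")))
  }
}
```

Note that the failure for List("a", "b") is reported against the outer call even though the missing piece is the inner Show[String]; with longer derivation chains, this is what makes such errors hard to track down.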

In this post I am going to explain the concepts used in Fluent and the problems I faced during implementation. Hopefully, by the end of the post, you’ll know enough to understand and edit the code (pull requests welcome!).

Introducing Fluent – the seamless translation layer

In Domain Driven Design (DDD) it is recommended to introduce a translation layer (aka anti-corruption layer) between 2 bounded contexts. The role of the anti-corruption layer is to prevent concepts from leaking from one domain into the other.

This is a sound idea as it keeps the domains isolated from each other, ensuring they can evolve independently. After having implemented several anti-corruption layers I realised that, although useful, they also introduce a lot of boilerplate code that doesn’t add much value to the business.

To this end, let me introduce Fluent, a library that aims at getting rid of this boilerplate code by leveraging all the power of Shapeless and its generic programming.

Akka Streams patterns

Stream processing has been around for a while and encompasses a great number of applications:

  • HTTP servers handling a stream of incoming HTTP requests
  • Message streams: Twitter firehose, user posts, …
  • Time-series messaging: streams from IoT sensors
  • Database querying: a result set contains a stream of records
  • …

Most interestingly, reactive streams have gained traction over the past few years. They bring back-pressure into the game in order to avoid having the destination stream flooded by messages from the source stream.
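The core idea behind back-pressure can be sketched in a few lines of plain Scala: the source only emits as many elements as the destination has asked for. The Source class below is a hypothetical, synchronous toy loosely modelled on the request(n) demand signal of Reactive Streams; it is not the Akka Streams API, which does this asynchronously.

```scala
object BackPressureSketch {
  // A source that never pushes on its own: it only hands out
  // elements in response to explicit demand from downstream.
  final class Source(elements: Iterator[Int]) {
    def request(n: Int): Vector[Int] = {
      val batch = Vector.newBuilder[Int]
      var emitted = 0
      while (emitted < n && elements.hasNext) {
        batch += elements.next()
        emitted += 1
      }
      batch.result()
    }
  }

  def main(args: Array[String]): Unit = {
    // An infinite source cannot flood a slow consumer,
    // because the consumer controls how much it receives.
    val source = new Source(Iterator.from(1))
    println(source.request(3)) // prints Vector(1, 2, 3)
    println(source.request(2)) // prints Vector(4, 5)
  }
}
```

Because demand flows upstream, a slow destination simply requests less, and the source slows down accordingly instead of filling up buffers.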

This post focuses on Akka Streams, a reactive streams implementation based on Akka actors. Unlike actors, which are untyped, Akka Streams provides type safety at every stage of the stream pipeline and also comes with a nice and fluent API. However, the documentation is sometimes lacking or not easy to search when someone needs to implement common patterns. This post tries to cover the most common ones in a clear and concise way.

The Cassandra Java Driver

Cassandra drivers are not just a dumb piece of software that sends CQL strings to a Cassandra node and waits for responses.

They are actually quite smart and are architected in a way that should make your life easier while still attempting to get the most performance out of Cassandra.

In this post I am going to focus on the Java driver and have a quick look at its architecture and at some of the features it offers.