Categories
Computing

Akka Streams patterns

Streams processing have been around for a while and encompasses a great number of applications:

  • HTTP servers handling stream of incoming HTTP requests
  • Message streams: Twitter hose, user posts, …
  • Time-series messaging: stream from IoT sensors
  • Database querying: result set contains a stream of record
  • ….

Most interestingly reactive streams have gain traction over the past few years. They bring back-pressure into the game in order to avoid having the destination stream over flooded by messages from the source stream.

This post focuses on AkkaStream, a reactive stream implementation based on Akka actors. Unlike actors which are untyped, AkkaStreams provides type safety at every stage of the stream pipeline and also comes with a nice and fluent API. However the documentation is sometimes lacking or not easy to search when someone needs to implement common patterns. This post tries to cover the most common ones in a clear and concise way.

Categories
Computing

Akka persistence

The actor model allows us to write complex distributed applications by containing the mutable state inside an actor boundary. However with Akka this state is not persistent. If the actor dies and then restarts all its state is lost.

To address this problem Akka provides the Akka Persistence framework. Akka Persistence is an effective way to persist an actor state but it’s integration needs to be well thought as it can greatly impact your application design. It fits nicely with the actor model and distributed system design – but is quite different from what a “more classic” application looks like.

In this post I am going to gloss over the different components of Akka Persistence and see how they influence the design choices. I’ll also try to cover some of the common pitfalls to avoid when building a distributed application with Akka Persistence.

Although Akka Persistence allows you to plug in various storage backends in this post I mainly discuss using the Cassandra backend.

Categories
Computing

Synchrone computation in Akka actor

The problem

Last time we’ve seen how to deal with future inside an actor. However some times you just need to wait for a future to finish before processing the next message.

Categories
Computing

Futures in actor

Future and Actor are two different paradigms to express concurrent computations. Both of them are perfectly valid abstractions. However one must be careful when it comes to mixing them up.

Categories
Computing

The return of the typed Actor

People regularly complains about the lack of type safety within Akka actors. After 2 rather unsuccessful attempts (using byte-code generation at runtime for the first and java proxies for the second) the third attempt seems much more promising.

Let’s start with regular non-typed actors to implement a very basic toy example and then move on towards a fully typed actor.

Categories
Computing

Kafka streams

Stream computing is one of the hot topic at the moment. It’s not just hype but actually a more generic abstraction that unifies the classical request/response processing with batch processing.

Stream paradigm

The request/response is a 1-1 scheme: 1 request gives 1 response. On the other hand the batch processing is an all-all scheme: all requests are processed at once and gives all response back.

Stream processing lies in between where some requests gives some responses. Depending on how you configure the stream processing you lie closer to one end than the other.

Categories
Computing

Introduction to Alluxio

Continuing my tour of the Spark ecosystem today’s focus will be on Alluxio, a distributed storage system that integrates nicely with many compute engines – including Spark.

What is Alluxio ?

The official definition of Alluxio is (or at least that’s how one of its author presents it):

Alluxio is an open source memory speed virtual distributed storage

Let’s see what each of these terms actually means:

Categories
Computing

Apache Spark data structures

Apache Spark is a computation engine for large scale data processing. Over the past few months a couple of new data structures have been available. In this post I am going to review each data structure trying to highlight their forces and weaknesses.

I also compares how to express a basic word count example using each data structure.