Cassandra lightweight transactions

We all know that Cassandra is a distributed database. However, there are situations where one needs to perform an atomic operation, and in such cases the replicas must reach a consensus.

For instance, when dealing with payments, we might require that a row is inserted only once.
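
In CQL, the "insert only once" requirement is expressed with a conditional insert (`INSERT ... IF NOT EXISTS`). The sketch below illustrates only the semantics, not how Cassandra implements it: `PaymentTable` is a hypothetical in-memory class, the payment ids are made up, and a local lock stands in for the consensus round between replicas.

```python
import threading


class PaymentTable:
    """Toy in-memory stand-in for a table; NOT Cassandra's implementation."""

    def __init__(self):
        self._rows = {}
        # A local lock plays the role that a consensus round plays between replicas.
        self._lock = threading.Lock()

    def insert_if_not_exists(self, payment_id, amount):
        """Mimic `INSERT ... IF NOT EXISTS`: return True only if the row was applied."""
        with self._lock:
            if payment_id in self._rows:
                return False
            self._rows[payment_id] = amount
            return True

    def get(self, payment_id):
        return self._rows.get(payment_id)


table = PaymentTable()
first = table.insert_if_not_exists("pay-42", 100)   # applied
second = table.insert_if_not_exists("pay-42", 250)  # rejected: the row already exists
print(first, second, table.get("pay-42"))
```

The second call leaves the original row untouched, which is the behaviour we want for a payment that must be recorded exactly once.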

DSEGraph under the hood

Today we’ll dig deeper inside DSEGraph and see how it uses Cassandra to distribute the graph over the cluster.

We’ll illustrate this post with a classic toy example: a movie graph.

Our model is very simple. It consists of “persons” who “acted in” “movies” which “belong to” “genres”.
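
To make the model concrete, here is a minimal sketch of that schema as a plain in-memory graph. It has nothing to do with how DSEGraph stores data; `ToyGraph`, the label spellings and the sample vertices are all made up for illustration.

```python
from collections import defaultdict

# Allowed edges of the toy schema: (source vertex label, edge label, target vertex label)
SCHEMA = {
    ("person", "acted_in", "movie"),
    ("movie", "belongs_to", "genre"),
}


class ToyGraph:
    """Hypothetical in-memory graph that enforces the schema above."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> vertex label
        self.out_edges = defaultdict(list)  # (vertex id, edge label) -> target ids

    def add_vertex(self, vid, label):
        self.vertices[vid] = label

    def add_edge(self, src, label, dst):
        triple = (self.vertices[src], label, self.vertices[dst])
        if triple not in SCHEMA:
            raise ValueError(f"edge not allowed by the schema: {triple}")
        self.out_edges[(src, label)].append(dst)

    def out(self, vid, label):
        # Use .get so that reading never creates empty entries.
        return list(self.out_edges.get((vid, label), []))


g = ToyGraph()
g.add_vertex("alice", "person")      # made-up sample data
g.add_vertex("some-movie", "movie")
g.add_vertex("drama", "genre")
g.add_edge("alice", "acted_in", "some-movie")
g.add_edge("some-movie", "belongs_to", "drama")

# Traverse person -> movies -> genres
genres = [genre
          for movie in g.out("alice", "acted_in")
          for genre in g.out(movie, "belongs_to")]
print(genres)
```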

The graph schema used in the post


Multi-Paxos

Last week we saw how basic Paxos works. Today we’re going to extend it in order to run a distributed state machine – a state machine with the same state on all the nodes.

The idea is to use a distributed log to run the state machine. Each entry in the log is an operation to apply to the state machine. If the log is the same on every node then the same operations are applied in the same order on all the nodes and therefore the state machines are all in the same state.
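
The argument above can be checked with a tiny sketch. The `Counter` state machine and its two operations are hypothetical, chosen only because they do not commute: replaying the same log gives the same state, while a reordered log does not.

```python
class Counter:
    """A trivial deterministic state machine (made up for this illustration)."""

    def __init__(self):
        self.value = 0

    def apply(self, op):
        kind, amount = op
        if kind == "add":
            self.value += amount
        elif kind == "mul":
            self.value *= amount
        else:
            raise ValueError(f"unknown operation: {kind}")


def replay(log):
    """Apply every log entry, in order, to a fresh state machine."""
    machine = Counter()
    for op in log:
        machine.apply(op)
    return machine.value


log = [("add", 2), ("mul", 5), ("add", 1)]

node_a = replay(log)        # first node
node_b = replay(list(log))  # second node holding an identical copy of the log
reordered = replay([log[1], log[0], log[2]])  # same entries, different order

print(node_a, node_b, reordered)
```

The two nodes agree because they apply identical entries in identical order; the reordered log ends in a different state, which is why agreeing on the order matters as much as agreeing on the entries.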

Two nodes in a cluster using Paxos to maintain a distributed log that replicates a state machine

The question is:

How to make sure the log is the same on all the nodes?

The solution is to run basic Paxos for every log entry (plus some tweaks to solve a few issues and improve performance).
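
As a rough sketch of "one basic Paxos instance per log entry", the code below runs the two textbook phases (prepare/promise, then accept) independently for each slot. It is deliberately simplified: everything is synchronous and in memory, nothing fails, and none of the tweaks or performance improvements are included. The names `Acceptor` and `propose` and the sample operations are assumptions made for the example.

```python
class Acceptor:
    """Keeps independent Paxos state for every log slot."""

    def __init__(self):
        self.promised = {}  # slot -> highest proposal number promised
        self.accepted = {}  # slot -> (proposal number, value)

    def prepare(self, slot, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised.get(slot, -1):
            self.promised[slot] = n
            return True, self.accepted.get(slot)
        return False, None

    def accept(self, slot, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised.get(slot, -1):
            self.promised[slot] = n
            self.accepted[slot] = (n, value)
            return True
        return False


def propose(acceptors, slot, n, value):
    """Run one basic Paxos round for one slot; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(slot, n) for a in acceptors]
    granted = [previous for ok, previous in replies if ok]
    if len(granted) < majority:
        return None
    # If a value was already accepted, adopt the one with the highest proposal number.
    already = [previous for previous in granted if previous is not None]
    if already:
        value = max(already, key=lambda p: p[0])[1]
    acks = sum(a.accept(slot, n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
log = {}
log[0] = propose(acceptors, 0, n=1, value="set x=1")
# A competing proposer on slot 0 must adopt the value already chosen there.
competing = propose(acceptors, 0, n=2, value="set x=99")
log[1] = propose(acceptors, 1, n=1, value="set y=2")
# A proposal with a stale number is rejected.
stale = propose(acceptors, 0, n=1, value="set x=7")
print(log, competing, stale)
```

Because each slot has its own promises and accepted values, a decision on one entry never interferes with another, and a later proposer cannot overwrite an entry that has already been chosen.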

Nd4j – Numpy for the JVM

I have spent years programming in Java, and one thing (among others) that I found frustrating is the lack of mathematical libraries (not to mention machine learning frameworks) on the JVM.

In fact, if you’re even a little interested in machine learning, you’ll notice that all the cool stuff is written in C++ (for performance reasons) and most often comes with a Python wrapper (because who wants to program in C++ anyway).

TF-IDF

The idea for this blog post came after finishing the lab on TF-IDF in the edX Spark specialisation courses.

EDX - CS110x - Big data analysis with Spark

In this course the labs follow a step-by-step approach where you need to write a few lines of code at every step. The lab is very detailed and easy to follow. However, I found that by focusing on a single step at a time, I was missing the big picture of what’s happening overall.

The return of the typed Actor

People regularly complain about the lack of type safety within Akka actors. After two rather unsuccessful attempts (using byte-code generation at runtime for the first and Java proxies for the second), the third attempt seems much more promising.

Let’s start with regular non-typed actors to implement a very basic toy example and then move on towards a fully typed actor.