Alfredo Di Napoli's Tech Blog

How to backup and store your GPG private key (semi) securely

2018-12-20T00:00:00Z

Summary: splitting your key into multiple pieces and doing something different with each piece is a not-too-shabby way to secure your GPG private key.

Disclaimer: I am not a security expert, and there is no such thing as “perfect security”, thus the witty title of my blog post. You are encouraged to try out what I am describing here, but I am not responsible for any misuse or data loss occurring while trying out the steps below.

I am guilty of discovering the power and versatility of GPG keys only in my late 30s (I know, I know). It happened by pure chance as it was a required step for my previous job. It’s a really nice way to bind and verify digital identities to some random bits sitting in your computer. Plus, it opens up new exciting possibilities, like securing all your passwords with something like pass. However, this has the (expected) consequence that now your GPG key becomes a single point of failure and if it gets lost or, worse, stolen, it can be quite a disaster.

To mitigate this, saving it somewhere safe is paramount. This raises the question: “how?”. I usually have (enough but not too much) trust in cloud providers like Dropbox, Google Drive & co, but ultimately you are sending your precious data over the network (hoping the connection is secure and the traffic is not spoofed) and if one of these services gets compromised, so are the integrity and security of your data.

To mitigate this, I have been researching ways of securely storing my private GPG key, and I think what I came up with is decent (at least for my use case).

First attempt

My first attempt was, as widely suggested, to use something like qrencode to turn my GPG key into a QR code I could then print and store in a closet. Then I could use something like zbarimg to retrieve my key. However, this approach turned out to be impractical, as my key is too big to fit into a single QR code.

The final solution

I ended up splitting my key into 4 parts, and I have then encoded as a QR code only the first part; the other three have been encrypted with a symmetric key (i.e. gpg -c your-file). By doing that, I can store the encrypted parts inside Dropbox, and print the first part and store it physically in a safe place.
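To make the splitting step concrete, here is a minimal Haskell sketch of what chunking and reassembly amount to. It is only an illustration: the actual workflow in this post uses split, qrencode and gpg, and the names chunksOf, suffixes, nameChunks and reassemble are hypothetical helpers invented for this example.

```haskell
-- Split a list into consecutive chunks of at most n elements,
-- roughly what `split -b n` does with the bytes of a file.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: chunk size must be positive"
  | null xs   = []
  | otherwise = let (chunk, rest) = splitAt n xs
                in chunk : chunksOf n rest

-- Two-letter suffixes in the order split(1) generates them: aa, ab, ac, ...
suffixes :: [String]
suffixes = [[a, b] | a <- ['a' .. 'z'], b <- ['a' .. 'z']]

-- Pair every chunk with the file name it would be written to.
-- (Assumption: fewer than 26 * 26 chunks, otherwise `zip` drops the excess.)
nameChunks :: String -> Int -> String -> [(FilePath, String)]
nameChunks prefix n contents =
  zip (map (prefix ++) suffixes) (chunksOf n contents)

-- Reassembling is plain concatenation in suffix order.
reassemble :: [(FilePath, String)] -> String
reassemble = concatMap snd
```

The point of the sketch is that splitting loses nothing: as long as every chunk is recovered and the chunks are concatenated in suffix order, the original key comes back byte for byte.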

Why?

That’s a fair question; in principle, simply encrypting my GPG key with a symmetric key and uploading it to Dropbox might have been enough, but I feel like this scheme opens up more possibilities. I especially like the fact that I am not dependent on just one cloud provider and that, even if Dropbox is ever compromised, and even if by some quantum-computer-miracle my symmetric key is bruteforced, the attacker would still miss the last piece of the puzzle (no pun intended).

Last but not least, here I have only used a single symmetric key for all three encrypted parts of my key, but nothing prevents me from using two or even three different ones. This way, even if one symmetric key is leaked somehow, the attacker still couldn’t access my full GPG key (even if he also got his hands on the QR code, of course).

The scripts

I have whipped up two scripts: the first one takes my exported GPG key and splits it into chunks:

#!/usr/bin/env bash

# First export your gpg key like so:
#
# gpg --export-secret-keys -a -o mykey.asc
#
# Then this script will generate 4 qr codes for your key. At this point
# it's up to you what to do with these images.

split -b 2800 "$1" mykey-

for file in mykey-??; do
    <"$file" qrencode -o "$file".png
done

Running this script with the filename of your exported key as input first splits the key into multiple parts and then runs qrencode on each of them. At this point you would be left with an .asc file (which you might want to delete now) and a bunch of files like, for example:

mykey-aa
mykey-aa.png
mykey-ab
mykey-ab.png
...

At this point you can delete all the .png files but the first one, and conversely delete the first chunk, mykey-aa, keeping the others around. Now, for each of the “chunks” (except the one and only .png, of course) proceed to the gpg encryption with a symmetric key of your choice. For example:

gpg -c mykey-ab

This will spawn pinentry and allow you to insert the password. At the end of this operation, you should have multiple .gpg files: these are the ones you can store on Dropbox. Your files should look somewhat like this:

mykey-aa.png
mykey-ab.gpg
mykey-ac.gpg
mykey-ad.gpg

Reconstructing the key

When the time comes to reconstruct your key, you can use this other script:

#!/usr/bin/env bash

RESULT=mykey.asc

zbarimg --raw "$1-aa.png" | perl -pe 'chomp if eof' - > "$RESULT"

for f in "$1-ab.gpg" "$1-ac.gpg" "$1-ad.gpg"; do
  echo "$f"
  gpg -dq "$f" >> "$RESULT"
done

# Sanity check
gpg --dearmor "$RESULT" >/dev/null

You have to supply this script with the basename of your files (in our example it would be mykey). The script first reads the QR code, then decrypts the other chunks (in my case there are just 3 and they are hardcoded in the script, your mileage might vary) and, dulcis in fundo, performs a sanity check on the reconstructed key.

Voilà, your key has been reconstructed! You can now import the key again inside your GPG keychain by doing gpg --import .

I have tested this approach myself when migrating the key from a laptop to another and it worked just fine, so hopefully this will be useful to you as well.

(Update 2018-12-23) A possible improvement

My friend Edsko pointed out a few quirks in my original scheme. In particular, assuming that Dropbox can actually be breached and that the last piece of the puzzle standing between the attacker and the full key is the QR code, using the first chunk as our unencrypted QR code is a poor choice. First of all, we are still subject to bruteforce attacks, as now all it takes for the attacker to get knowledge of the key is to bruteforce 1/4 of it. Not only that, but crucially the first part of the key includes a few bytes for the header (the -----BEGIN PGP PRIVATE KEY BLOCK-----), which means there are actually even fewer bytes to bruteforce.

A better scheme here would be to:

Encrypt with a symmetric key all the chunks;
Turn into QR codes the chunks in the middle.

This way, we are not only eliminating the problem of the header, but also never storing something unencrypted. The obvious consequence is that now we need an extra step to recover our key.

Other alternatives

Somebody also suggested using something like paperbak, which is certainly possible, but I was put off by what seemed to be quite a delicate procedure for restoring the data from the printed bitmap. In particular the readme explains how important it is to have a good printer etc etc, so I didn’t want to take the risk.

About MonadBaseControl

2017-05-06T00:00:00Z

This is a literate Haskell post. You can play with the examples in ghci, in a stack playground, calling:

stack ghci --package transformers --package transformers-base --package monad-control --package distributed-process --package distributed-process-monad-control

Let’s start with some imports:

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Base
import Control.Monad.State hiding (StateT, runStateT, execStateT, evalStateT, get)
import Control.Monad.Trans.Control
import Control.Monad.Trans.State.Strict
import Control.Distributed.Process
import Control.Distributed.Process.MonadBaseControl

Consider this monad:

data Ctx = Ctx () -- some kind of env, not important.
newtype RemotePure m a = RemotePure { runRemote :: StateT Ctx m a }
                         deriving (Functor, Applicative, Monad, MonadState Ctx, MonadIO)
type RemoteM = RemotePure Process

instance MonadBase IO (RemotePure Process) where
  liftBase = RemotePure . liftBase

You now would like to write an instance for MonadBaseControl.

This is the definition of MonadBaseControl and RunInBase (at the time of writing, April 2017):

class MonadBase b m => MonadBaseControl b m | m -> b where
    type StM m a :: *
    liftBaseWith :: (RunInBase m b -> b a) -> m a
    restoreM :: StM m a -> m a

type RunInBase m b = forall a. m a -> b (StM m a)

A legitimate question is: how can I write the correct right hand side for StM? As you might know, when the type keyword is used in a type class definition we are not dealing with a type synonym but with a type family. A type family is essentially a function which operates on types, not on values like “normal” functions. But how can I pick the correct RHS? How would I know? There are two approaches: one seems to be the most popular, but it requires the use of UndecidableInstances; the other was found in this small nugget of wisdom ¹ over at Stack Overflow.

The first one is this:

instance MonadBaseControl IO RemoteM where
  type StM RemoteM a = StM (StateT Ctx Process) a
  liftBaseWith f = RemotePure $ liftBaseWith $ \q -> f (q . runRemote)
  restoreM = RemotePure . restoreM

Why does this work? It might not be immediately clear why StM appears on the right hand side. You might ask yourself “Did he just pull the type StM out of thin air and reuse it?”. It’s all right, I have been there myself. Haskell notation is so dense sometimes it’s easy to get lost. The key insight is this: StM is NOT a type, it is a type-level FUNCTION! Here, all we are doing is calling StM on the RHS, effectively offloading the computation of the result and hoping that somebody already defined the final solution further down the stack. So we are effectively applying StM to (StateT Ctx Process) a as an argument. And this is exactly why GHC asks us to enable UndecidableInstances: it cannot guarantee up front that typechecking will terminate. Effectively (if I recall correctly what Andres Löh once told us in a Haskell course in London), as scary as the name might sound, UndecidableInstances simply tells us “hey, it’s probably going to be fine, but there is a chance the typechecking might not terminate”. This is (very loosely speaking) because the RHS doesn’t “reduce”, as it has the same number of terms as the LHS, so GHC gets suspicious.
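If the idea of a “type-level function” still feels abstract, a tiny standalone example may help. The sketch below has nothing to do with monad-control: Elem and headOr are hypothetical names made up for illustration, and the only point is to show GHC evaluating a type family application while typechecking.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- A closed type family: a function from types to types.
-- `Elem` is a made-up name used only for this illustration.
type family Elem (container :: Type) :: Type where
  Elem [a]       = a
  Elem (Maybe a) = a

-- GHC reduces `Elem [a]` to `a` while typechecking, in the same way it
-- reduces an application of StM in the instance discussed above.
headOr :: Elem [a] -> [a] -> Elem [a]
headOr fallback []      = fallback
headOr _        (x : _) = x
```

Asking ghci for `:kind! Elem [Int]` should print the reduced type Int, which is the same trick used later in this post to discover the right hand side of StM.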

The other approach is to simply ask GHC for the result of the type family. How? Let’s fire up ghci, and let’s type this:

ghci> :set -XRankNTypes
ghci> import Control.Monad.Trans.Control
ghci> :kind! forall a. StM RemoteM a
forall a. StM RemoteM a :: *
= (a, Ctx)

Wow, can you believe how easy it was? It’s equally easy to convince ourselves why this result makes sense: this is nothing more than the result of applying StM to StateT Ctx Process. Let’s find out:

ghci> import Control.Monad.Trans.State.Strict
ghci> import Control.Distributed.Process
ghci> :kind! forall a. StM (StateT Ctx Process) a
forall a. StM (StateT Ctx Process) a :: *
= (a, Ctx)

And the best part is that now we don’t need UndecidableInstances, and we are much more confident we are computing StM the right way. Our definition becomes:

instance MonadBaseControl IO RemoteM where
  type StM RemoteM a = (a, Ctx)
  liftBaseWith f = RemotePure $ liftBaseWith $ \q -> f (q . runRemote)
  restoreM = RemotePure . restoreM
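To build some intuition for why the state of a StateT-based monad is (a, Ctx), here is a stripped-down re-creation of the idea using only transformers. It is a sketch, not the monad-control API: the class ControlIO and its members St, liftWith and restore are hypothetical stand-ins for MonadBaseControl IO, StM, liftBaseWith and restoreM.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeFamilies #-}

import Control.Monad.Trans.State.Strict (StateT (..), modify)
import Data.Kind (Type)

-- Hypothetical, simplified stand-in for `MonadBaseControl IO`.
class Monad m => ControlIO m where
  type St m a :: Type
  liftWith :: ((forall x. m x -> IO (St m x)) -> IO a) -> m a
  restore  :: St m a -> m a

-- Running a StateT action in IO yields the result *and* the new state,
-- so that pair is exactly what has to be captured and restored.
instance ControlIO (StateT s IO) where
  type St (StateT s IO) a = (a, s)
  liftWith f = StateT $ \s -> do
    a <- f (\m -> runStateT m s)
    pure (a, s)
  restore (a, s) = StateT $ \_ -> pure (a, s)

-- The state change made "inside IO" survives because we restore it.
remembered :: StateT Int IO String
remembered = do
  saved <- liftWith (\runInIO -> runInIO (modify (+ 1) >> pure "done"))
  restore saved

-- Without `restore`, the change is silently dropped.
forgotten :: StateT Int IO ()
forgotten = do
  _ <- liftWith (\runInIO -> runInIO (modify (+ 1) >> pure "done"))
  pure ()
```

Under these definitions, running remembered from state 0 ends in state 1, while forgotten leaves the state at 0. That is the reason the captured value has to carry the Ctx along with the result.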

¹ Thanks, Daniel Wagner!

The simplest Haskell Priority Queue implementation I know of

2017-04-07T00:00:00Z

Nothing about what I’m going to say is novel or particularly mind-blowing, but it is useful nonetheless, especially on competitive programming websites like HackerRank. This implementation is shamelessly stolen from Okasaki’s book.

A Priority Queue can be easily implemented in an imperative setting, but it is not totally obvious how that could efficiently translate into a functional language, especially a pure one like Haskell.

See this blog post for another excellent implementation, but in OCaml (always based on Okasaki).

This is a Literate Haskell post. Let’s begin with the usual importing fandango:

module PriorityQueue where

import Data.List hiding (insert)

“Leftist heaps always keeps the left branches of all roots being the longer and in worst case, they are as long as the right branches. In other word, all right branches of all roots are shortest. In order to maintain this property, each node has a rank, which indicates the length of the path between the node and the right most leaf.” – ¹

type Rank = Int

data Heap a = Tip | Node {-# UNPACK #-} !Rank a (Heap a) (Heap a) deriving Show

-- Rank: the length of the path between the node and the right most leaf.
rank :: Heap a -> Rank
rank Tip = 0
rank (Node r _ _ _) = r

fromList :: Ord a => [a] -> Heap a
fromList [] = Tip
fromList (x:xs) = foldl' (\hp val -> insert val hp) (singleton x) xs

makeHeap is straightforward: we compare the two ranks, and we preserve the leftist property by storing the child with the smaller rank as the right child and setting the new node’s rank to that smaller rank plus one.

makeHeap :: a -> Heap a -> Heap a -> Heap a
makeHeap x a b = if rank a >= rank b then Node (rank b + 1) x a b
                                     else Node (rank a + 1) x b a

empty :: Heap a
empty = Tip

singleton :: a -> Heap a
singleton x = Node 1 x Tip Tip

insert :: Ord a => a -> Heap a -> Heap a
insert x h = merge (singleton x) h

-- | Merge two heaps together, preserving the leftist property via `makeHeap`.
-- Runs in O(log n).
-- "The key insight behind leftist heaps is that two heaps can be merged by
-- merging their right spines as you would merge two sorted lists, and then
-- swapping the children of nodes along this path as necessary to restore
-- the leftist property." -- Okasaki
merge :: Ord a => Heap a -> Heap a -> Heap a
merge l Tip = l
merge Tip r = r
merge h1@(Node _ x l1 r1) h2@(Node _ y l2 r2) =
  if x <= y then makeHeap x l1 (merge r1 h2)
            else makeHeap y l2 (merge h1 r2)

-- | O(1).
peekMin :: Heap a -> Maybe a
peekMin Tip = Nothing
peekMin (Node _ x _ _) = Just x

-- | O(1), but evaluating the second element of the tuple has the same
-- complexity as `merge`.
extractMin :: Ord a => Heap a -> Maybe (a, Heap a)
extractMin Tip = Nothing
extractMin (Node _ x a b) = Just (x, merge a b)
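As a quick usage example, the heap can be turned into a sorting function, and the leftist property can be checked directly. The block below repeats a compact copy of the definitions so that it can be loaded on its own; heapSort and isLeftist are additions for illustration and are not part of the original module.

```haskell
import Data.List (foldl', unfoldr)

data Heap a = Tip | Node !Int a (Heap a) (Heap a)

rank :: Heap a -> Int
rank Tip            = 0
rank (Node r _ _ _) = r

makeHeap :: a -> Heap a -> Heap a -> Heap a
makeHeap x a b
  | rank a >= rank b = Node (rank b + 1) x a b
  | otherwise        = Node (rank a + 1) x b a

merge :: Ord a => Heap a -> Heap a -> Heap a
merge l Tip = l
merge Tip r = r
merge h1@(Node _ x l1 r1) h2@(Node _ y l2 r2)
  | x <= y    = makeHeap x l1 (merge r1 h2)
  | otherwise = makeHeap y l2 (merge h1 r2)

insert :: Ord a => a -> Heap a -> Heap a
insert x = merge (Node 1 x Tip Tip)

extractMin :: Ord a => Heap a -> Maybe (a, Heap a)
extractMin Tip            = Nothing
extractMin (Node _ x a b) = Just (x, merge a b)

fromList :: Ord a => [a] -> Heap a
fromList = foldl' (flip insert) Tip

-- Heap sort: keep extracting the minimum until the heap is empty.
heapSort :: Ord a => [a] -> [a]
heapSort = unfoldr extractMin . fromList

-- The leftist property: at every node the left child's rank is at least
-- the right child's rank, and the node's rank is the right rank plus one.
isLeftist :: Heap a -> Bool
isLeftist Tip = True
isLeftist (Node r _ l rt) =
  rank l >= rank rt && r == rank rt + 1 && isLeftist l && isLeftist rt
```

Since unfoldr consumes exactly the Maybe (a, Heap a) shape that extractMin returns, the sort falls out in one line. This is typically all that is needed in a programming-competition setting.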

¹ http://typeocaml.com/2015/03/12/heap-leftist-tree/

Deploying Haskell on AWS Lambda

2017-03-16T00:00:00Z

Originally published in IRIS Connect’s Engineering blog.

Background

At work I was put on a new project involving a low-volume application where a customer would upload some CSV files inside an S3 bucket, and this event would trigger some sort of server-side computation to parse and validate the CSV files and produce some output. As soon as I looked at the requirements, AWS Lambda came to mind, especially because we didn’t want to maintain a server infrastructure due to the nature of the project. What tipped the scale was the fact that AWS supports Lambda as one of the notification mediums to respond to an S3 event; upon seeing that my mind was set!

In a nutshell, this is the flow of the app:

Image courtesy of Amazon

AWS Lambda is capable of running any binary, as long as it’s statically linked and compiled for x86 Linux. It doesn’t offer first-class support for Haskell, but it’s possible to spawn an external process via the Node.js API. So the trick is to:

Use Docker to generate a Linux executable
Create a JS Shim to spawn a process calling our executable
Bundle everything into a zip file (including 3rd party dependencies)
Upload to AWS Lambda
Watch out for any pitfalls

Needless to say that we need the Docker step because we are working on an OSX machine, whilst targeting Linux. Cross-compiling GHC for Linux is a bit of a pain, so it’s definitely way quicker to use Docker.

(Optional step) Reduce the size of the output binary

When I was working remotely from my parents’ house in Rome, I only had a humble domestic DSL connection at my disposal, and every byte mattered when it came to transferring data from my laptop into S3. This is why I’ve got into the habit of compressing my executables with UPX. The rest of the post takes that into account, but note that this compression step is not required to deploy on AWS Lambda.

Use Docker to generate a Linux executable

We can use a script from the process outlined in my personal blog. It basically creates a new Docker Image called “ghc-linux-6.5-builder” using Build.plan as the Docker file. Build.plan itself is quite simple:

FROM fpco/stack-build:lts-6.5

ADD .  /usr/lib/haskell
WORKDIR /usr/lib/haskell
USER root
# Pass SSH_KEY as argument.
ARG SSH_KEY

# Specify a specific private key.
RUN echo "    IdentityFile /root/.ssh/id_rsa" >> /etc/ssh/ssh_config

# Skip host verification for GitHub
RUN echo "Host github.com\n\tStrictHostKeyChecking no\n" >> /etc/ssh/ssh_config

# Create SSH_KEY inside container.
RUN echo "$SSH_KEY" >> /root/.ssh/id_rsa

# Give private key correct permissions.
RUN chmod 0600 /root/.ssh/id_rsa

CMD ["stack"]

We are starting from the lts-6.5 Docker image FPComplete is providing us (which already includes all the system libraries that all the LTS 6.5 deps will need to link against) plus adding our own .ssh/id_rsa to make sure we can clone stuff from GitHub (this passage is optional and should only be required if we clone stuff from private repositories). Note that we are using a somewhat outdated LTS revision as with newer ones using GHC 8 I was getting an error whilst installing GHC (seemed a problem related to the toolchain), which unfortunately I didn’t have time to investigate further. Once everything has built correctly, we can simply alias stack-linux to be an invocation of the stack command via this newly built Docker image:


#!/usr/bin/env bash

eval $(docker-machine env)

docker run --rm \
       -v $PWD/linux-dist:/root/.local \
       -v $PWD/.stack-work:/usr/lib/haskell/.stack-work \
       -v $HOME/.stack:/root/.stack \
       ghc-linux-6.5-builder:latest stack --allow-different-user "$@"

You can see we are running on an OSX machine, due to the call to docker-machine. Mapping host directories like $HOME/.stack also ensures that packages built in one invocation are cached for the next, making the process much quicker overall. So, to summarise, when we call build_linux.sh we first create (or update) the Docker image and then call stack-linux install as we would normally do on our local machine. The cool thing here is that the build happens inside the Docker container and, because we mounted the output of the install to linux-dist on our local machine, at the end of the process we have a nice Linux binary ready to be deployed on AWS Lambda!

Here is how build_linux.sh is structured:

#!/usr/bin/env bash

eval $(docker-machine env)

docker build --build-arg SSH_KEY="$(cat ~/.ssh/id_rsa)" -t ghc-linux-6.5-builder -f Build.plan .

./bin/stack-linux install
upx linux-dist/bin/ma-csv-proc

After everything ran successfully, we have our final executable ready to be deployed:

☁  mathematica-csv-processor [issue-4] du -h linux-dist/bin/ma-csv-proc
2.6M    linux-dist/bin/ma-csv-proc

So 2.6MB is not bad at all for a full Haskell app which deserialises JSON from the network and parses CSV files!

Create a JS Shim (the Main Handler)

In order for AWS Lambda to run our code, we need an entrypoint, which we cannot write in Haskell as AWS Lambda doesn’t have first-class support for it. What we can do, though, is to create one using JavaScript and Node, and use the Node.js process API to spawn our executable:

const spawn = require('child_process').spawn;

exports.handler = function(event, context) {
    process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']
    process.env['LD_LIBRARY_PATH'] = process.env['LAMBDA_TASK_ROOT']
    const main = spawn('./ma-csv-proc', { stdio: ['pipe', 'pipe', process.stderr] });

    main.stdout.on('data', function(data) {
        console.log(data.toString());
        context.done(null, data.toString());
    });

    main.on('close', function(code) {
        console.log('child process pipes closed with code '+ code);
        context.done(null, code);
    });

    main.on('exit', function(code){
        console.error('exit: ' + code);
        context.done(null, code);
    });

    main.on('error', function(err) {
        console.error('error: ' + err);
        context.done(null, err);
    });

    main.stdin.write(JSON.stringify({
        'event': event,
        'context': context
    }) + '\n');
}

Note something important: we are passing the event and the context that AWS sends us as JSON into the stdin of our Haskell app. This ensures we can deserialise it on the Haskell side and grab the event AWS sent us, so we can react to it:

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

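import Data.Aeson as JSON
import Data.Aeson.TH (deriveFromJSON)
import Data.Monoid ((<>))
import Data.String.Conv (toS) -- assumption: toS comes from the string-conv package
import qualified Data.Text as T
import qualified Data.Text.IO as T -- hGetLine lives here, not in Data.Text
import System.Exit (exitFailure)
import System.IO (stdin)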

data RawInput = RawInput {
    event   :: JSON.Value
  , context :: JSON.Value
  } deriving Show

deriveFromJSON defaultOptions ''RawInput

type Result = Either MathematicaException ()

main :: IO ()
main = do
  raw <- T.hGetLine stdin
  case JSON.eitherDecode (toS raw) of
    Left e                 -> do
      putStrLn "Error reading Lambda input."
      putStrLn $ "Input was " <> toS raw
      putStrLn $ "Error was " <> e
      exitFailure
    Right (ri :: RawInput) -> -- Do stuff

Bundle everything into a zip file (including 3rd party deps)

Lambdas run (as you would expect) in a sandboxed environment, so it comes as no surprise that they don’t have all the executables you might need during your program’s execution. But not all is lost, as it’s entirely possible to ship them as part of the final .zip we’ll deploy. The only 2 constraints are that they should be self-contained (statically linked or otherwise) and built for Linux x86. Believe it or not, Lambda doesn’t come with the excellent aws-cli installed by default, so I had to package it (running it is not a problem, as AWS Lambda uses IAM roles, so permissioning is taken care of automatically). There is an excellent post about bundling aws-cli for AWS Lambda, so I will simply redirect there for completeness.

Once we have everything we need, we can run a simple script to bundle everything up:

#!/usr/bin/env bash

mkdir -p deploy/aws-lambda/artifacts
rm -rf deploy/aws-lambda/artifacts/*
cd deploy/aws-lambda
zip -r -j artifacts/mathematica-csv-processor.zip Main.js aws ../../linux-dist/bin/ma-csv-proc

# Add aws stuff
zip -ur artifacts/mathematica-csv-processor.zip aws-cli
cd ../..

It’s very important to run the first command with -j which will “squash” relative paths and will ensure the binary and the JS entrypoint will be at the top level of the zip file, which is required by AWS Lambda in order to access our code correctly. Now the fun part, deploying!

Upload to AWS Lambda

Uploading to AWS Lambda is quite simple. All that is needed is for the user to create a new Lambda and then upload the zip file, either from S3 or from a web form.

Pitfalls

After deploying a new revision of the app, I was presented with a quite laconic error in the CloudWatch logs:

c_poll: Permission Denied

There seem to be a few issues mentioning it explicitly, but none of them explains what’s going on. It seems to be a GHC bug, but I’m not 100% sure (as that trac ticket claims it is fixed). What’s sure is that c_poll is coming from GHC. Luckily for me, in order to “fix” the problem, it was sufficient to simply allocate more memory for my lambda, or increase the timeout for the program execution. With 256/512 MB I was able to comfortably run my code in under 10 seconds.

Credits

Most of the information I present here is not novel; I have shamelessly stolen ideas and concepts from these two excellent resources:

iconv-typed: An experiment in API design and type safety

2016-10-23T00:00:00Z

Summary: I'm releasing a type safe version of the iconv library, discussing my
API design choices and asking for feedback from the community.

I’m slowly making progress on a Haskell piece table library which could be used as a high-performance data structure for text manipulation. The typical use case there would be writing a text editor in Haskell, something I have had in the back of my mind (for fun) for a while.

So far the assumption I have made whilst developing it is that user text would be encoded/decoded as UTF-8, but in the real world this is simply not true! That’s where encoding comes into play. I won’t get into too much detail about the piece table library (it is not that interesting in its current shape!), but this should set the scene for why I needed text encoding in the first place.

In Haskell we have a few choices when dealing with text encoding: we can use some functions provided directly by the text library, use the encoding library or use Duncan Coutts’s iconv library. I really like iconv because it has such a simple API and it doesn’t assume anything about the input: the latter is given as a “blob of binary data” and it’s up to me to decide how to interpret it.

Despite its simplicity, I always thought the library also had great potential for things to go wrong: first of all, an EncodingName is simply a String, which the programmer can misspell and then spend hours debugging why his program is producing garbage. Secondly, it requires the manual step of retrieving the list of available encodings from the system, typically piggybacking on the underlying C/GNU library. This is why today I’m releasing iconv-typed, mainly to gather feedback from the community. It’s such a simple abstraction over iconv that I’m surprised nobody thought about something similar, but maybe that’s because it’s so simple people have written it in their own projects without releasing it, or simply because it has shortcomings I haven’t anticipated!

A taste of the API

API-wise, the library should feel familiar to users of the original iconv. Compare this short example using the iconv library:

{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Codec.Text.IConv

main :: IO ()
main = print $ convert "UTF-8" "LATIN1" "hello"

With the equivalent in iconv-typed:

{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Codec.Text.IConv.Typed

main :: IO ()
main = print $ convert @"UTF-8" @"LATIN1" "hello"

As you can see it’s almost identical, except for the fact that we are using the @ syntax from TypeApplications. If we misspelled UTF-8 by accident, we would get a type error. Profit! But how does it work?

Type families to the rescue!

Conceptually, it’s very simple: it fetches all the available encodings in a platform-dependent way (mainly invoking iconv -l under the hood), and then generates a closed type family via Template Haskell to basically constrain the Symbol universe only to the ones matching a valid encoding. A code snippet will demonstrate this much better! We first commit the biggest sin in the whole Haskell universe and we get the encodings via unsafePerformIO [1] (it requires the Shelly library):

getAvailableEncodings :: [EncodingName]
getAvailableEncodings = unsafePerformIO $ shelly $ silently $ escaping False $ do
  map T.unpack . mconcat . map T.words . T.lines . T.strip <$> run "iconv" ["-l"]
{-# NOINLINE getAvailableEncodings #-}

… and then we generate the type family where each instance would look like this:

type family ValidEncoding (k :: Symbol) :: Bool where
  ValidEncoding "RETRIEVED_ENCODING_1" = 'True
  ValidEncoding "RETRIEVED_ENCODING_2" = 'True
  ...

Note two things: we are using a closed type family to avoid “monkey patching” of our encodings (something which could happen if we chose a typeclass as an abstraction mechanism, as someone could have defined an orphan instance) and we “plug” each retrieved encoding in directly as a string literal. So far so good! The “magic” behind the minimal API lies in these few lines of code:

type Enc k1 k2 = ByteString

--------------------------------------------------------------------------------
convert :: forall k1 k2. ( KnownSymbol k1
            , KnownSymbol k2
            , ValidEncoding k1 ~ 'True
            , ValidEncoding k2 ~ 'True
            )
         => Enc (k1 :: Symbol) (k2 :: Symbol) -- ^ Input text
         -> ByteString -- ^ Output text
convert input = I.convert (reifyEncoding (E @k1)) (reifyEncoding (E @k2)) input

First of all, we define a type synonym called Enc with 2 phantom types, which will be “filled” by our encodings. This unfortunately generates ambiguity, and GHC reports this at compile time. We can resolve the ambiguity by using AllowAmbiguousTypes, which basically defers the ambiguity check from the definition to the call site (check this insightful comment on Reddit for the full explanation. Thanks /u/int_index!).

The convert function has a somewhat intimidating signature, so let’s start from the typeclass constraints: what I’m saying here is that for any generic k1 and k2 I want those to:

Be an instance of KnownSymbol (think about a String at the type level) so I can reify them back at the value level with reifyEncoding (which is basically just symbolVal under the hood).
The type-level function ValidEncoding must yield True. Simply put, this will only be possible with the Symbols I have defined an instance for in my closed type family. This is what will prevent you from passing as input a non-existent or misspelled encoding.

The input ByteString is well, just a ByteString in disguise. Remember Enc? That’s basically it, with the only twist of carrying these 2 extra types around, which I’m also saying they are of type Symbol, and this is where I need TypeInType, as Symbol would normally be a Kind.

This is the gist of it! But why am I able to invoke the convert function like this?

main = print $ convert @"UTF-8" @"LATIN1" "hello"

Here is TypeApplications in action! What we are doing is giving a hint to the compiler about which types k1 and k2 are, as the only real input is the Enc k1 k2. Another way to see this is that we are saying “Hey GHC, Enc carries the utterly generic & ambiguous k1 and k2, I’m telling you explicitly what those 2 are”.

That’s pretty much it, really!
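The whole mechanism can be reproduced in miniature without Template Haskell or the real iconv bindings. The following is only a sketch: the type family is written by hand with two example encodings, and describe is a hypothetical function that stands in for convert and merely reports which encodings were selected.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Hand-written stand-in for the generated family; the real library
-- derives these equations from the encodings found on the build machine.
type family ValidEncoding (k :: Symbol) :: Bool where
  ValidEncoding "UTF-8"  = 'True
  ValidEncoding "LATIN1" = 'True

-- `describe` is a made-up substitute for `convert`: k1 and k2 appear only
-- in the constraints, hence AllowAmbiguousTypes and the explicit forall.
describe :: forall k1 k2.
            ( KnownSymbol k1
            , KnownSymbol k2
            , ValidEncoding k1 ~ 'True
            , ValidEncoding k2 ~ 'True
            )
         => String -> String
describe input =
  symbolVal (Proxy :: Proxy k1) ++ " -> "
    ++ symbolVal (Proxy :: Proxy k2) ++ ": " ++ input
```

With these definitions, `describe @"UTF-8" @"LATIN1" "hello"` typechecks, whereas `describe @"UTF8" @"LATIN1" "hello"` should be rejected at compile time, because no equation of ValidEncoding matches "UTF8" and the constraint can never be satisfied.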

Usability: the unknown

Something I still have no clue about is how practical this library will be to use, mostly because those encodings don’t exist at the value level. Not only that, it is also very likely you have some existing code which does some kind of manipulation of the EncodingName, like comparing them, reading them from disk, from user input, etc. After all, they are only Strings. I suspect doing the equivalent with this library will be clunky, although I hope TypeInType can help in this regard.

You said API design?

The current API is the result of multiple iterations. Initially I was going to use a much simpler approach and simply have my TH code generate plain types so that our API could look like:

convert UTF8 LATIN1 "hello"

This was jolly good for the simplest encodings, but I quickly ran into limitations in the characters allowed in a type/type constructor name. For example, - is not allowed, so I could have chosen to mangle UTF-8 into UTF_8, which would have been OK. But what about ISO_646.IRV:1991? I quickly realised this approach had 2 problems:

It would require the user to look up the mangled name on Haddock, as I couldn’t come up with a mnemonic rule for translating encodings into types
Converting original iconv code would have been a bit painful.

In my opinion, if you are releasing a library which is meant to simplify user life, you really want to aim for a low entry barrier!

Second attempt: Use an ancillary `E` type

My second attempt is basically what I ended up releasing as the “GHC 7.x” API version. It does work, but when I first released the library and tweeted the link to GitHub, Anthony Cowley gave valuable suggestions on how to improve it, which made me realise that if I was going to use TypeApplications and therefore tap into GHC 8.x anyway, I had access to TypeInType as well! That yielded a much nicer API.

Final attempt: Perfection?

Exploring the solution space brought me to a place where I feel I have come up with an API which strikes me as a good compromise. The only sour taste in my mouth is the use of AllowAmbiguousTypes, which I wasn’t able to avoid.

Support for GHC 7.x

Although not as slick and elegant as the version which uses TypeInType and TypeApplications, the library also supports older versions of GHC. This is what the API looks like if you compile iconv-typed with GHC 7.x:

{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Codec.Text.IConv.Typed

main :: IO ()
main = print $ convert (E :: E "UTF-8") (E :: E "LATIN1") "hello"
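For readers wondering how a value like E :: E "UTF-8" can carry an encoding, here is a minimal sketch. The definition of E below is my assumption of what such a proxy type looks like, and describe is a made-up helper; neither is copied from the library.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- | Assumed shape of the proxy: one constructor, no payload, and the
-- encoding name living only in the type.
data E (k :: Symbol) = E

-- | Recover the encoding name at the value level.
encodingName :: KnownSymbol k => E k -> String
encodingName = symbolVal

-- | Hypothetical helper showing how two proxies select both encodings.
describe :: (KnownSymbol k1, KnownSymbol k2) => E k1 -> E k2 -> String
describe from to = encodingName from ++ " -> " ++ encodingName to
```

In GHCi, describe (E :: E "UTF-8") (E :: E "LATIN1") evaluates to "UTF-8 -> LATIN1". The proxies are what TypeApplications later makes unnecessary.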

Unexplored territory

As a result of my encoding fetching at compile time, there is something which is subtle and with an impact I cannot anticipate: If you try to build a program which uses iconv-typed on a machine which doesn’t support a particular encoding, your application won’t compile.

Put differently, if you are doomed to produce garbage as part of your encoding process because your underlying iconv library doesn’t support that particular encoding, the library will prevent you from even trying. This could be terrifying or beautiful, depending on your point of view.

I guess we will have to wait and see!

Notes

[1] Using unsafePerformIO here is not necessary: using runIO in the Q monad would have worked as well, and would have avoided unsafe operations.
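As a rough sketch of what the runIO route looks like: the IO action below just returns a fixed list, standing in for whatever actually queries the system for its encodings (that part is an assumption and is not shown). The result is lifted into the program as a plain list at compile time.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH (runIO)
import Language.Haskell.TH.Syntax (lift)

-- | Encodings "discovered" while compiling. The IO action here is a
-- placeholder returning a fixed list; a real one would ask the system.
knownEncodings :: [String]
knownEncodings = $(runIO (return ["UTF-8", "LATIN1"]) >>= lift)
```

The IO action runs inside the Q monad during compilation, so no unsafePerformIO is involved and the compiled program only ever sees the resulting list.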

Paginators are Mealy Machines in disguise

2016-09-10T00:00:00Z

Summary: At work I needed to stream some data out from a service which returned data in paginated chunks.
Using a very simple data type based on Mealy Machines worked surprisingly well.

One of the aspects of programming I enjoy most is having the chance to apply something you have learned to the real world. A couple of weeks ago I needed to create a tool to “garbage collect” old ECR images. Very simply put, ECR stands for EC2 Container Registry and is, no more and no less, a private Docker Registry you can use as part of the impressive AWS (Amazon Web Services) toolkit.

ECR works by having a set of repositories, and for each of them you can upload up to 500 Docker images. If you exceed this limit, you will have either to delete some images to free up space, or contact the Amazon customer service to bump the upper limit up. My team uses ECR to store the images associated with the Haskell micro services we deploy using Elastic Beanstalk, and each time we do a deploy, we create and upload a new versioned image for each micro service, so it comes as no surprise that space in the repositories is going to run out sooner or later. So the problem is simple: “I have some images stored in the cloud and I need to delete them”.

Now, we could have used the quite impressive amazonka-ecr to solve our problem, but historically we have been using the aws-cli even before Amazonka was around, and sometimes it is just super easy to slurp the output of a CLI application and decode it from JSON. Long story short, this is why we didn’t piggyback on this excellent third-party library, but that’s a bit off topic. What’s important is that to solve our problem, we only need two commands from the aws-cli: list-images to retrieve all our images, and batch-delete-image to delete the ones which meet our criteria (in our case that would be deleting anything older than 2 months).

list-images doesn’t return the whole set of images, as this would be a beefy JSON! What it does instead is return a JSON packet with a token indicating whether we have more data to fetch. This is a standard pagination technique: other strategies would be, for example, to return the current page and the total number of pages, so that the user can advance forward or backward. We can easily model an ECRImage as a simple data type by following the specification on the list-images page:


{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell   #-}

-- Useful imports to use in the rest of the post
import qualified Data.Aeson as JSON
import           Data.Aeson.TH
import           Data.Functor.Identity
import qualified Data.List as List
import           Data.String.Conv
import qualified Data.Text as T
import           Shelly


data ECRImage = ECRImage {
    imageDigest :: T.Text
  , imageTag    :: Maybe T.Text
  } deriving (Show, Eq)

deriveFromJSON defaultOptions { omitNothingFields = True } ''ECRImage

type NextToken = T.Text

data ECRListImages = ECRListImages {
    nextToken :: Maybe NextToken
  , imageIds  :: [ECRImage]
  }

deriveFromJSON defaultOptions { omitNothingFields = True } ''ECRListImages

The ECRListImages is an umbrella type we define to parse the raw JSON that AWS gives us, which will include the token AND the data fetched so far. When I approached this problem, I knew two things for sure:

I didn’t want to fetch the whole dataset into memory, but rather process it in chunks
The problem itself was screaming “streaming!”

Although I could have simply written a recursive function which would fetch the current data and the token, process it, and recur in case we still had data to fetch, that struck me as a poor solution. Not because it was intrinsically bad, but only because it felt a bit ad-hoc and didn’t compose very well. What if I wanted to step through the data “one chunk” at a time? What if I wanted to filter each chunk according to a predicate and retain only a subset of it? Sure, I could extend my function with a predicate to filter on, but that felt even more ad-hoc. So I took a step backward and wondered if I could come up with a super tiny abstraction to “step” through the data whilst retaining code reuse and composition. After some failed attempts, I came up with this small data structure, which I’m calling here ForwardPaginator to stress the fact we cannot iterate backward (yet), which is something I didn’t need to support anyway:

data ForwardPaginator m i o =
    PaginatorLeaf o
  | PaginatorFetch (Maybe i -> m (Maybe i, o, ForwardPaginator m i o))

A ForwardPaginator effectively models a tree of computations; we can have a leaf, meaning we have just started our machine and are at step zero, or a fetch step that, given an input i will produce a triple (newInput, output, paginator), doing (or not) some monadic effect in the process (thus the m wrapping). Due to the fact we could have exhausted our input, we encapsulate this possibility in a Maybe, which explains the presence of those Maybe i. We can even define what seems to be a legal Functor instance for it:

instance Functor m => Functor (ForwardPaginator m i) where
  fmap f (PaginatorLeaf  a) = PaginatorLeaf (f a)
  fmap f (PaginatorFetch g) = PaginatorFetch $ \nextToken -> (\(x,y,z) -> (x, f y, fmap f z)) <$> g nextToken

(Note to the reader: I have the intuition we should be able to define a Contravariant instance for our ForwardPaginator, but I’m not 100% sure as i appears both in positive and negative position. A Profunctor even? Please comment below or on Reddit if you think this is possible, I simply haven’t tried yet.)

If you squint hard, you will recognise that what we have in the PaginatorFetch step is essentially a Mealy machine! This is not very surprising; Neil Mitchell used it in Shake only to discover his data structure was indeed a Mealy machine and his definition was almost verbatim the one included in the machines package. What I find cool is that both Neil and I went through the same creative process: we modelled our solution using an abstraction we later found out to be something already present in the literature! I find it both depressing and invigorating to discover that your clever idea is something someone thought about a long time before you! Oh well, at least that gave me the confidence I was on the right track. Incidentally, Ollie blogged in 2013 about FRP and Netwire, and guess what his Auto type looks like ;)

The reader might be thinking by now “Ok, but what can you do with this?” A Mealy machine is something very simple at its heart, and copying its definition from Wikipedia “…is a finite-state machine whose output values are determined both by its current state and the current inputs.[…]”. Simply put, we can use the current state and the current input(s) to decide where to go next (which could be to advance the machine or to stop altogether). To be completely honest with you, whilst writing this blog post, I was on the fence about considering what’s inside a PaginatorFetch a Mealy or a Moore machine, as it resembles a bit of both, but I eventually settled on the former, as effectively it’s the input (the “token”) which determines if we can step further or not.
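For comparison, here is a small sketch of a plain Mealy machine. The newtype is modelled on my recollection of the one in the machines package, so take its exact shape as an assumption; runningSum and feed are made-up helpers.

```haskell
-- | A Mealy machine: given an input, produce an output and the machine to
-- use for the next input.
newtype Mealy a b = Mealy { runMealy :: a -> (b, Mealy a b) }

-- | Running total: the output depends on the current input and on the
-- state (the total so far) captured in the closure.
runningSum :: Int -> Mealy Int Int
runningSum total = Mealy $ \x ->
  let total' = total + x
  in (total', runningSum total')

-- | Feed a list of inputs to a machine, collecting the outputs in order.
feed :: Mealy a b -> [a] -> [b]
feed _ []       = []
feed m (x : xs) = let (y, m') = runMealy m x in y : feed m' xs
```

In GHCi, feed (runningSum 0) [1,2,3,4] gives [1,3,6,10]. The function inside PaginatorFetch has the same shape, except that it runs in a monad m and also hands back the next input (the token) alongside the output.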

Armed with our ForwardPaginator, let’s generalise it to our domain problem:

type NextToken = T.Text -- Same synonym as defined earlier
type Repository = T.Text -- Will use this later
type ECRPaginator m a = ForwardPaginator m NextToken a

Now, let’s get the elephant out of the room and let me give you the (rather) uninteresting definition of our ECR paginator. I personally think that the semantics of the data structure and the operations we can perform on it are much more interesting, but I wanted to post a “real world” paginator just to prove this stuff can also pay the bills ;)

ecrListImagesPaginated :: Repository -> ECRPaginator Sh [ECRImage]
ecrListImagesPaginated repo = PaginatorFetch $ \_ -> do
  initialState <- run "aws" cmd
  case JSON.eitherDecode (toS initialState) of
    Left ex -> yieldZero ex
    Right (ECRListImages Nothing items) -> return (Nothing, items, PaginatorLeaf mempty)
    Right (ECRListImages mbToken items) -> return (mbToken, items, fetch)
  where
    yieldZero ex = do
      echo "aws ecr list-images failed to decode to valid JSON. Error was: "
      echo (toS . show $ ex)
      return (Nothing, mempty, PaginatorLeaf mempty)
    cmd = [ "ecr"
          , "list-images"
          , "--region"
          , "eu-west-1"
          , "--repository-name"
          , repo
          ]
    fetch :: ECRPaginator Sh [ECRImage]
    fetch = PaginatorFetch $ \token -> case token of
      Nothing -> return (Nothing, mempty, PaginatorLeaf mempty)
      Just t  -> do
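        -- Pass the token back so that we fetch the *next* page rather than
        -- the first one again (assuming the CLI accepts --next-token here).
        rawJson <- run "aws" (cmd ++ ["--next-token", t])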
        case JSON.eitherDecode (toS rawJson) of
          Left ex -> yieldZero ex
          Right (ECRListImages mbToken items) -> return (mbToken, items, fetch)

The caveat here is that we need to repeat the call to aws ecr twice as we need to call it at least once to acquire a valid token, so that externally we will be able to pass Nothing to our paginator to start it. I have chosen Sh as my monad of choice (from the shelly package), so that I can run bash commands easily.

Now the fun begins! What can we do with this paginator, and more generally with a ForwardPaginator?

The first operation we can think of is effectively “stepping” the paginator, and implementing this function is not very hard:

next :: Monad m
     => ForwardPaginator m i a
     -> Maybe i
     -- ^ The initial input state to use.
     -> m (Maybe i, a, ForwardPaginator m i a)
next (PaginatorLeaf i) _ = return (Nothing, i, PaginatorLeaf i)
next (PaginatorFetch cont) tkn = cont tkn

Note how this function is completely generic in terms of m, i and a, apart from the Monad constraint, which means I can “step” arbitrary paginators – talk about code reuse! Another thing we might want is to be evil and fold all the data returned from the paginator into a giant collection. Not hard either:

foldPaginator :: (Monad m, Monoid a) => ForwardPaginator m i a -> Maybe i -> a -> m a
foldPaginator (PaginatorLeaf items) _ acc = return (acc `mappend` items)
foldPaginator (PaginatorFetch cont) tkn acc = do
  (t', items, res) <- cont tkn
  -- Append the page we just fetched before moving on, so that no page is
  -- dropped and the pages stay in the order they were fetched.
  foldPaginator res t' (acc `mappend` items)

Again, the only constraint is that our accumulator must be a Monoid, so that we can effectively concatenate all the results together. This would also effectively allow us to return all the ECRImage(s) at once, but beware that this would load them into memory – not recommended for your production services!

ecrListImages :: Repository -> IO [ECRImage]
ecrListImages repo = shelly $ foldPaginator (ecrListImagesPaginated repo) Nothing mempty
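To try folding without touching AWS, here is a self-contained sketch using an in-memory paginator. Both pagesOf (which serves a list in fixed-size pages, using the offset of the next page as its token) and foldPages (a fold written so that each page is appended to the accumulator) are names I made up for this example; the page size is assumed to be positive.

```haskell
import Data.Functor.Identity (Identity, runIdentity)
import Data.Maybe (fromMaybe)

data ForwardPaginator m i o =
    PaginatorLeaf o
  | PaginatorFetch (Maybe i -> m (Maybe i, o, ForwardPaginator m i o))

-- | Hypothetical in-memory paginator. No token means "start from the top".
pagesOf :: Monad m => Int -> [a] -> ForwardPaginator m Int [a]
pagesOf size xs = PaginatorFetch $ \token ->
  let offset       = fromMaybe 0 token
      (page, rest) = splitAt size (drop offset xs)
  in return $ if null rest
       then (Nothing, page, PaginatorLeaf [])              -- last page
       else (Just (offset + size), page, pagesOf size xs)  -- more to come

-- | Fold every page into the accumulator, in the order pages are fetched.
foldPages :: (Monad m, Monoid a) => ForwardPaginator m i a -> Maybe i -> a -> m a
foldPages (PaginatorLeaf items) _ acc = return (acc `mappend` items)
foldPages (PaginatorFetch cont) tkn acc = do
  (tkn', items, rest) <- cont tkn
  foldPages rest tkn' (acc `mappend` items)

-- | Example: five elements served two at a time, folded back together.
allItems :: [Int]
allItems = runIdentity (foldPages (pagesOf 2 [1 .. 5]) Nothing [])
```

Here allItems evaluates to [1,2,3,4,5]: three fetches ([1,2], [3,4] and [5]) followed by the empty leaf.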

Something nice we can do with a ForwardPaginator is find a particular element matching a predicate, short-circuiting our paginator as soon as we find a match, in order to avoid unnecessary work and, more generally, expensive calls to external services:

findPaginator :: Monad m
              => ForwardPaginator m i [a]
              -> Maybe i
              -> (a -> Bool)
              -> m (Maybe a)
findPaginator (PaginatorLeaf v) _ prd = return $ List.find prd v
findPaginator (PaginatorFetch cont) tkn prd = do
  (t',items,cont') <- cont tkn
  case List.find prd items of
    Just i  -> return $ Just i
    Nothing -> findPaginator cont' t' prd
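The short-circuiting can be observed with a paginator that counts how often it is asked for a page. In this sketch countingPages and firstMatchWithFetchCount are hypothetical helpers; page n of the made-up data source simply holds the numbers 10*n to 10*n+9, and the paginator never ends.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.List as List
import Data.Maybe (fromMaybe)

data ForwardPaginator m i o =
    PaginatorLeaf o
  | PaginatorFetch (Maybe i -> m (Maybe i, o, ForwardPaginator m i o))

findPaginator :: Monad m
              => ForwardPaginator m i [a]
              -> Maybe i
              -> (a -> Bool)
              -> m (Maybe a)
findPaginator (PaginatorLeaf v) _ prd = return $ List.find prd v
findPaginator (PaginatorFetch cont) tkn prd = do
  (t', items, cont') <- cont tkn
  case List.find prd items of
    Just i  -> return $ Just i
    Nothing -> findPaginator cont' t' prd

-- | Hypothetical infinite paginator which bumps a counter on every fetch.
countingPages :: IORef Int -> ForwardPaginator IO Int [Int]
countingPages fetches = PaginatorFetch $ \token -> do
  modifyIORef' fetches (+ 1)
  let n = fromMaybe 0 token
  return (Just (n + 1), [10 * n .. 10 * n + 9], countingPages fetches)

-- | The first match, paired with the number of fetches it took to find it.
firstMatchWithFetchCount :: (Int -> Bool) -> IO (Maybe Int, Int)
firstMatchWithFetchCount prd = do
  fetches <- newIORef 0
  found   <- findPaginator (countingPages fetches) Nothing prd
  count   <- readIORef fetches
  return (found, count)
```

Running firstMatchWithFetchCount (> 25) yields (Just 26, 3): the match sits on the third page, so only three fetches happen even though the paginator itself is infinite.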

Pure or impure? Pick your monad!

To wrap up this blog post, I also wanted to show you how we are not bound to use an “impure” monad for our ForwardPaginator: we could use something like Identity, State, Reader and so on and so forth. As an example, we will create a ForwardPaginator which can be built out of a pure function (full disclosure: fib, the classic, hehe) and everything will be pure to please the Haskell gods. Let’s start by defining both our pure function and the associated paginator:

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

fibPaginator :: ForwardPaginator Identity Int Integer
fibPaginator = PaginatorFetch $ \continue -> case continue of
  Nothing -> return (Just 1, fib 0, fibPaginator)
  Just i  -> return (Just $ i + 1, fib i, fibPaginator)

The slight twist is that in case we have no initial input, we return the base case of the recursion, otherwise we iterate in an infinite fashion, exactly like the original fib function. Now we can easily step the paginator using next to get one result at a time, or create a convenient take function to get values out of our infinite stream:

takePaginator :: (Monad m) => ForwardPaginator m i a -> Maybe i -> Int -> m [a]
takePaginator (PaginatorLeaf v) _ _ = return [v]
takePaginator (PaginatorFetch cont) tkn n
  | n <= 0 = return []
  | otherwise = do
    (newToken, o, newPaginator) <- cont tkn
    (o :) <$> takePaginator newPaginator newToken (n - 1)

Using it is simple enough:

ghci> runIdentity $ takePaginator fibPaginator Nothing 10
[0,1,1,2,3,5,8,13,21,34]

As a bonus, as ForwardPaginator is a functor, we can easily map a function on the output values as we stream them:

ghci> runIdentity $ takePaginator ((*2) <$> fibPaginator) Nothing 10
[0,2,2,4,6,10,16,26,42,68]

Stepping backwards

A bit of a pet peeve the reader might have with this paginator is that it lacks the ability to step backward, and that would certainly be a valid concern. I still think though that adding the ability to iterate backward should be possible, provided that we create a function prev which constrains the paginator monad to be a MonadState (Maybe i), so that we can store the previous token and go backward and forward as we please. Something like this, for example:

prev :: MonadState (Maybe i) m
     => ForwardPaginator m i a
     -> Maybe i
     -- ^ The initial input state to use.
     -> m (Maybe i, Maybe a, ForwardPaginator m i a)

I think we need to yield a Maybe a as output in case we want to step backward but we are already at the first “page”: in that case, we should yield no result. Maybe, if the readers are interested, I could explore this possibility in a subsequent blog post, which should effectively give us a Paginator worth its name, to be used in each scenario which requires bidirectional pagination.

Conclusions

The ideas presented here are very simple but at the same time quite effective: they allowed me to solve my original problem in a nice compact way. Using findPaginator and the Functor instance I was able to stop as soon as the current result set contained values I was interested in, and to “zoom” only on pieces of the ECRImage data structure to extract things like the imageDigest. So, next time you need to implement some form of pagination, remember you have the arsenal of Mealy and Moore Machines at your disposal: it’s not surprising they are called stream transducers!

How I deploy Haskell Code

2015-11-03T00:00:00Z

Summary: I have recently switched to build my apps using an intermediate
Docker container and then simply drop the executable on the target machine.
This has worked remarkably well.

Deploying Haskell code seems to be a pretty hot topic nowadays. Chatting with people at the Haskell Exchange last October made it clear that everyone has their own approach to putting Haskell code into production. At work, an approach I used which worked decently was to use Ansible to build my project on an EC2 development machine, then dump an AMI (Amazon Machine Image) and reuse it across different environments. This had the advantage of making provisioning and rollback easy (at the end of the day you only need to deploy a new AMI via the AWS EC2 API), but has the big snag of being quite slow if your development machine is a tiny instance or similar (which is typically the case for this kind of environment), as you need to perform a cabal/stack install remotely on the server.

Since switching to stack as my project builder/manager I have adopted a different approach which uses a mixture of old and new Unix tools and, although quite simple, is effective. It’s important to note that this might not work for you if you want a technique which works on ALL the different Linux distros; this technique exploits FPComplete’s stack-build Docker image, which is based, to the best of my knowledge, on Ubuntu/Debian. That said, I have been able to produce executables which worked on CentOS7 out of the box.

I should also add that the following techniques might be completely moot on Linux environments, where you should be using stack’s builtin docker feature to build your binaries. But being on Mac OS X, and considering the quirks of boot2docker, I was forced to find another solution. This is what I do these days:

I use a Dockerfile for the build phase, called Build.plan
I use Build.plan to provision a stack-linux executable which will act as my local stack but will target the Linux environment
I use stack-linux to install my project, mounting my local $HOME/.stack and $PWD/.stack-work in the container in order to cache builds and produce valid Unix executables.
I use upx to compress the output executables to the bare minimum
I upload the binaries & the files listed in the data-files section of my cabal manifest on S3 in a folder called myproject:version (you can use any persistent key-value store)
I use aws s3 sync (you can use rsync if not targeting the AWS platform) to update my development machine with the newly provided binaries & config files
I crack on with the rest of the deployment (NOTE: I still need to dump an AMI as the image needs to be used in a cluster, so YMMV)

Let’s break down the points in more detail.

Create stack-linux

The Build.plan looks like this:

FROM fpco/stack-build:lts-3.10

ADD .  /var/www/myproject
WORKDIR /var/www/myproject

CMD ["stack"]

I usually tend to invoke docker to tag this image to be my “builder”, like so:

docker build -t myproject-builder -f Build.plan .

Now “creating” stack-linux is as easy as writing the following bash script:

#!/usr/bin/env bash

# You might not need the following.
$({ boot2docker shellinit; } 2>/dev/null)

docker run --rm \
       -v "$HOME/path/to/my/project/myproject-dist:/root/.local" \
       -v "$HOME/path/to/my/project/.stack-work:/var/www/myproject/.stack-work" \
       -v "$HOME/.stack:/root/.stack" \
       myproject-builder:latest stack "$@"

The advantage here is that we are still writing in the host filesystem, but stack correctly installs the libraries in a separate folder:

➜  ~  ls /Users/adinapoli/work/myproject/.stack-work/install
x86_64-linux    x86_64-osx

Building/installing the project

At this point we are ready to call:

stack-linux install

And let it run for a while, depending on how many dependencies your project has. At the end, you should have some Linux binaries in the myproject-dist folder (have a look at the bash script we created for stack-linux). The good news is that future builds will read from your local .stack-work and will be much faster.

Compressing with UPX

If all went well we should have a bunch of Linux binaries in myproject-dist which are already usable on their own. I decided to go a step further (I work remotely and I live in an area with a sub-par internet connection) and compress the executables, to minimise the upload time towards S3. upx is a great tool that “just works”: use it on your Linux binaries and watch the size shrink down! For my work project, which is a medium Haskell app composed of roughly 13K lines of code, I was able to get the final size down to ~9MB. Not bad!

Uploading binaries & data files

Finally we can tie the knot and upload to S3. I tend to use the shelly library as my go-to tool for this kind of glue code:

release :: T.Text -> IO ()
release vr = shelly $ escaping False $ do
  -- If we are trying to release a version older than the current MyProject,
  -- we need to checkout the relevant tag.
  currentVersion <- liftIO extractCabalVersion
  when (currentVersion >= vr) $ do
    echo " * Older MyProject version required, checking out relevant git tag..."
    run_ "git" ["checkout", vr]

  let deployDir = "/var/www/myproject"
  run_ "./build.sh" [] -- build.sh just calls docker build as I have shown you.
  -- Find project specific files and upload them as well.
  (_, shareDir) <- T.breakOn "." . T.init <$> run "scripts/stack-linux" ["path", "--local-install-root"]
  dataFiles <- findDataFiles shareDir
  echo " * Compressing executable(s)..."
  let myExes = ["myexe1", "myexe2"] -- list here all the binaries you want to upload
  forM_ myExes $ \exe ->
    run_ "upx" ["myproject-dist/bin/" <> exe]

  echo " * Transferring compressed files to S3..."

  let releaseS3Prefix = "s3://my-s3-bucket/" <> vr
  run_ "aws" ["s3", "sync", dataFiles, releaseS3Prefix <> deployDir <> "/" <> dataFiles]

  forM_ myExes $ \exe -> do
    run_ "aws" ["s3", "cp", "myproject-dist/bin/" <> exe, releaseS3Prefix <> "/usr/bin/"]

  echo " * Done!"
  where
    findDataFiles shareDir = T.init <$>
      run "ls" ["-d"
               , shareDir <> "share/*/*"
               , "|"
               , "grep"
               , "myproject-" <> vr]

We essentially did the steps I already explained, with this twist:

We searched within .stack-work to find any file listed in the data-files section of the cabal manifest and we copied them over on S3. This is because, in my specific case, I had configuration files my exe needed to run. Again, YMMV!

At this point your binaries (and config files) are on S3, properly versioned (I have used my project version here). Now rolling back is just a matter of transferring a couple of files over!
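One thing to keep in mind when versioning this way: comparing version numbers as text is lexicographic, so "0.10.0" sorts before "0.9.0". If that matters for your release numbering, a possible alternative is to compare parsed versions. The sketch below uses Data.Version from base and works on String rather than Text to stay dependency-free; parseVer and isOlderThan are names I made up.

```haskell
import Data.Version (Version, parseVersion)
import Text.ParserCombinators.ReadP (readP_to_S)

-- | Parse a dotted version, accepting only parses that consume all input.
parseVer :: String -> Maybe Version
parseVer s = case [v | (v, "") <- readP_to_S parseVersion s] of
  (v : _) -> Just v
  []      -> Nothing

-- | Is the first version strictly older than the second?
-- Nothing means at least one of the two strings did not parse.
isOlderThan :: String -> String -> Maybe Bool
isOlderThan requested current = (<) <$> parseVer requested <*> parseVer current
```

In GHCi, "0.9.0" `isOlderThan` "0.10.0" gives Just True, whereas the plain comparison "0.9.0" < "0.10.0" gives False.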

For completeness, this is an excerpt of a section of my Ansible scripts, which copies the files as we discussed:

- name: Install MyProject
  remote_user: service-runner
  sudo: no
  shell: aws s3 sync s3://my-s3-bucket/{{myproject_version}}/usr/bin/ /usr/local/bin/ &&
         aws s3 sync s3://my-s3-bucket/{{myproject_version}}/var/www/myproject/ /var/www/project/

Easy!

Caveats and Elephants in the room

As said, this technique is by no means universal; chances are it might not suit you for various reasons.
It doesn’t aim to provide static executables; as you know this is possible (and not difficult at all) up to a point.
You might (memory here does not help me) need to install whichever C library your executable depends upon. For example at least one of my projects depends on libpq, so I had to yum install that on my target machine. You don’t need to worry about that when building though, thanks to the fact that stack-build provides you with all you need out of the box (did I mention how great this is?)

Conclusions

In this ocean full of DSLs, orchestrators and whatnot, I find this method simple and with these benefits:

Easy versioning & rollbacks
Small executables (You could potentially store them as binary blobs on a K-V store like Redis)
Native binaries, which entails:
- No container overhead (even if minimal)
- Stability (on CentOS7 my experience with Docker was not the best, but this is for another post)
- No need for a private registry
Cached builds (I pay the compilation time only on what’s really changed)

Releasing the threads-supervisor library

2015-02-13T00:00:00Z

I’m happy to announce the first release of threads-supervisor, a small library I have extracted from some code I wrote at work (thanks to Iris Connect for allowing me to release it). The library itself does only one thing: it allows you to fork an IO computation in a supervised fashion, restarting it in case of failure. In a sense, the library is similar in spirit to Erlang’s OTP approach to process supervision and supervision trees. At the moment, we support only one restart strategy, Erlang’s OneForOne, which basically means “please always restart this thread”. Of course, threads-supervisor is not as feature complete as the OTP counterpart, nor does it aim to be.

Why not use `distributed-process`, `immortal`, `async`, `slave-threads`, `yet-another-library`?

The aim of this small paragraph is not to convince you that my library is the best in town (it’s not!), but more to justify my thought process behind deciding to write it. When I looked at distributed-process, it was clear that it was offering exactly this kind of supervision and much, much more. The problem is that the library is squarely geared towards Cloud Haskell and the idea of distributed closures, therefore if you want to use it, you have to buy the full package. What I wanted, instead, was a simple library, with minimal dependencies, which could be used as a replacement for forkIO, with minimal fuss.

immortal is a very nice library indeed, but I also wanted built-in event logging with opt-in subscription, as well as the possibility of composing my supervisors into a nice supervision tree.

The same sort of reasoning can be generalised; the available libraries in the ecosystem were close enough to what I wanted but not exactly what I wanted. Therefore, I decided it was just simpler to whip up my small abstraction on top of the concurrency primitives.

Using the library

Extensive documentation can be found reading the tutorial, but I’m going to report here the relevant passages.

Use threads-supervisor if you want the “poor-man’s Erlang supervisors”. threads-supervisor is an IO-based library with minimal dependencies which does only one thing: It provides you a ‘Supervisor’ entity you can use to monitor your forked computations. If one of the managed threads dies, you can decide if and how to restart it. This gives you:

Protection against silent exceptions which might terminate your workers.
A simple but powerful way to structure your program into a supervision tree, where the leaves are the worker threads, and the nodes can be other supervisors being monitored.
A disaster recovery mechanism.

Anyone who has worked with Haskell’s concurrency primitives will surely be familiar with the forkIO function, which allows us to fork an IO computation in a separate green thread. forkIO is great, but is also very low level, and has a couple of subtleties, as you can read from this passage in the documentation:

The newly created thread has an exception handler that discards the exceptions
`BlockedIndefinitelyOnMVar`,`BlockedIndefinitelyOnSTM`, and `ThreadKilled`,
and passes all other exceptions to the uncaught exception handler.

To mitigate this, we have a couple of libraries available, for example async and slave-threads.

But what if I do not want to take explicit action, and would rather specify upfront how to react to disaster and let the library work out the details? This is what this library aims to do.
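To give a feeling for what “restart on failure” means at the lowest level, here is a deliberately naive sketch built only on base. It is not how threads-supervisor is implemented: forkRestarting and demo are made-up names, there is no event stream, and the job is re-run inside the same thread instead of being forked again.

```haskell
import Control.Concurrent
import Control.Exception (SomeAsyncException (..), SomeException, fromException, throwIO, try)

-- | Fork a job and re-run it whenever it dies with a synchronous exception,
-- up to maxRestarts times. Asynchronous exceptions (e.g. killThread) are
-- re-thrown, so the thread can still be killed from the outside.
forkRestarting :: Int -> IO () -> IO ThreadId
forkRestarting maxRestarts job = forkIO (go maxRestarts)
  where
    go :: Int -> IO ()
    go restartsLeft = do
      outcome <- try job
      case outcome of
        Right () -> return ()
        Left e
          | isAsync e        -> throwIO e
          | restartsLeft > 0 -> go (restartsLeft - 1)
          | otherwise        -> throwIO e

    isAsync :: SomeException -> Bool
    isAsync e = case fromException e of
      Just (SomeAsyncException _) -> True
      Nothing                     -> False

-- | A job which fails on its first two runs and succeeds on the third.
-- Returns the number of attempts that were needed.
demo :: IO Int
demo = do
  attempts <- newMVar (0 :: Int)
  done     <- newEmptyMVar
  _ <- forkRestarting 5 $ do
    n <- modifyMVar attempts (\x -> return (x + 1, x + 1))
    if n < 3
      then ioError (userError "boom")
      else putMVar done ()
  takeMVar done
  readMVar attempts
```

Running demo returns 3. Everything a real supervisor adds on top of this (restart strategies, monitoring other supervisors, reporting what happened) is exactly the part you do not want to rewrite by hand every time.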

In this example, let’s create four different threads:

job1 :: IO ()
job1 = do
  threadDelay 5000000
  fail "Dead"

This job will die after five seconds.

job2 :: ThreadId -> IO ()
job2 tid = do
  threadDelay 3000000
  killThread tid

With this other job instead, we wait three seconds, and then kill a target thread, generating an asynchronous exception.

job3 :: IO ()
job3 = do
  threadDelay 5000000
  error "Oh boy, I'm good as dead"

This guy is very similar to the first one, except for the fact error is used instead of fail.

job4 :: IO ()
job4 = threadDelay 7000000

job4 is what we wish for all our real-world functions: smooth sailing. These jobs represent a significant pool of our everyday computations in the IO monad.

Creating a SupervisorSpec

A ‘SupervisorSpec’ simply holds the state of our supervision, and can be safely shared between supervisors. Under the hood, both the SupervisorSpec and the Supervisor share the same structure; in fact, they are just type synonyms:

type SupervisorSpec = Supervisor_ Uninitialised
type Supervisor = Supervisor_ Initialised

The important difference though, is that the SupervisorSpec does not imply the creation of an asynchronous thread, which the latter does. To keep the initialisation of the data structure separate from the logic of supervising, we use GADTs and type synonyms to force you to create a spec first. Creating a spec is just a matter of calling newSupervisorSpec.
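The idea of using a type parameter to separate “spec” from “running” can be sketched with plain phantom types, a simpler technique than the GADTs the library uses. Everything below other than the names Uninitialised and Initialised is hypothetical, and the “supervisor” is just a list of job names.

```haskell
-- | Type-level tags with no values; they only ever appear as parameters.
data Uninitialised
data Initialised

-- | Hypothetical handle: the state parameter is phantom, it does not
-- appear on the right-hand side.
newtype Handle_ state = Handle_ [String]

type Spec    = Handle_ Uninitialised
type Running = Handle_ Initialised

newSpec :: Spec
newSpec = Handle_ []

-- | The only way to obtain a Running handle is to start from a Spec.
start :: Spec -> Running
start (Handle_ jobs) = Handle_ jobs

-- | Jobs can only be submitted to a Running handle.
submit :: String -> Running -> Running
submit job (Handle_ jobs) = Handle_ (job : jobs)

jobsOf :: Running -> [String]
jobsOf (Handle_ jobs) = jobs
```

With these definitions jobsOf (submit "b" (submit "a" (start newSpec))) is ["b","a"], while submit "a" newSpec is rejected by the type checker because a Spec is not a Running handle.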

Creating a Supervisor

Creating a ‘Supervisor’ from a ‘SupervisorSpec’ is as simple as calling newSupervisor. Immediately after doing so, a new thread will be started, monitoring any subsequent IO actions submitted to it.

Supervising some threads

Let’s wrap everything together into a full blown example:

main :: IO ()
main = bracketOnError (do
  supSpec <- newSupervisorSpec

  sup1 <- newSupervisor supSpec
  sup2 <- newSupervisor supSpec

  sup1 `monitor` sup2

  _ <- forkSupervised sup2 OneForOne job3

  j1 <- forkSupervised sup1 OneForOne job1
  _ <- forkSupervised sup1 OneForOne (job2 j1)
  _ <- forkSupervised sup1 OneForOne job4
  _ <- forkIO (go (eventStream sup1))
  return sup1) shutdownSupervisor (\_ -> threadDelay 10000000000)
  where
   go eS = do
     newE <- atomically $ readTBQueue eS
     print newE
     go eS

What we have done is spawn our supervisors from a spec and use our Swiss Army knife forkSupervised to spawn four supervised IO computations. As you can see, if we partially apply forkSupervised, its type resembles forkIO’s; this is by design, as we want to keep this API as IO-friendly as possible.

In the very same example, we also create another supervisor (from the same spec, but you can create a separate one as well) and we ask the first supervisor to monitor the second one.

Each Supervisor gives you access to its internal event stream, retrievable in the form of a TBQueue by calling eventStream.

If you run this program, hopefully you should see on stdout something like this:

ChildBorn ThreadId 62 2015-02-13 11:51:15.293882 UTC
ChildBorn ThreadId 63 2015-02-13 11:51:15.293897 UTC
ChildBorn ThreadId 64 2015-02-13 11:51:15.293904 UTC
ChildDied ThreadId 61 (MonitoredSupervision ThreadId 61) 2015-02-13 11:51:15.293941 UTC
ChildBorn ThreadId 65 2015-02-13 11:51:15.294014 UTC
ChildFinished ThreadId 64 2015-02-13 11:51:18.294797 UTC
ChildDied ThreadId 63 thread killed 2015-02-13 11:51:18.294909 UTC
ChildDied ThreadId 62 Oh boy, I'm good as dead 2015-02-13 11:51:20.294861 UTC
ChildRestarted ThreadId 62 ThreadId 68 OneForOne 2015-02-13 11:51:20.294861 UTC
ChildFinished ThreadId 65 2015-02-13 11:51:22.296089 UTC
ChildDied ThreadId 68 Oh boy, I'm good as dead 2015-02-13 11:51:25.296189 UTC
ChildRestarted ThreadId 68 ThreadId 69 OneForOne 2015-02-13 11:51:25.296189 UTC
ChildDied ThreadId 69 Oh boy, I'm good as dead 2015-02-13 11:51:30.297464 UTC
ChildRestarted ThreadId 69 ThreadId 70 OneForOne 2015-02-13 11:51:30.297464 UTC
ChildDied ThreadId 70 Oh boy, I'm good as dead 2015-02-13 11:51:35.298123 UTC
ChildRestarted ThreadId 70 ThreadId 71 OneForOne 2015-02-13 11:51:35.298123 UTC

Conclusions

I hope that you are now convinced that this library can be of some use to you! It’s on Hackage, play with it!

Alfredo

Announcing "snaplet-purescript"

2015-01-25T00:00:00Z

Today I’m open sourcing and releasing for public consumption snaplet-purescript, a simple snaplet which brings automatic recompilation of PureScript projects into your Snap application.

It was heavily inspired by snaplet-fay, so some credit is due to Adam Bergmark and Chris Done!

The Github project ships with an example app to help get you started.

As always, feedback, bug reports and PRs are welcome!

Alfredo

Convince me to use Rust

2014-12-17T00:00:00Z

TL;DR I really like Rust, but I feel overwhelmed by its syntax and complexity, so I hope the Rust community will sell me the language and convince me to learn it.

As we approach the new year, it seems quite natural to follow this list of things I should aim to do in 2015. One of them is learning a new language, which I feel is quite an important one. After a love-hate relationship with high- and low-level languages, I’m now in that period of my life where I would like to learn a new systems language. I did a bit of C/C++ back in my university days, so I know what lies in store, even considering the latest available standards (i.e. C11, C++11, C++14 and so on). I would like to learn something different, and I went back and forth in deciding whether I should learn Go or Rust (I know, ideally I should learn both). The real question is: which of the two?

About me

First of all, let me say I’m a Haskell hacker. Not only am I an OSS contributor, but I’m lucky enough to get paid to code in Haskell in my day job. So I am a firm believer that a strong type system and a strong compiler really matter in delivering robust software. In a sense, then, the natural continuation of my skills development would be to learn Rust, which gets a lot of things right (but I’m sure you didn’t need me to discover this): immutability by default, a sophisticated borrow checker, ADTs, pattern matching, a (limited) form of monads and even HKT (only emulated for now, hopefully fully supported in Rust 1.0).

So what?

“So what?”, you might be thinking, which would be a perfectly reasonable reaction. If you feel Rust is the “next big thing”, you should learn it as your next systems language, right? That’s true, but I want to play devil’s advocate here, and I really hope the Rust community will jump on me and completely sell me the language, so I will happily hack in it during 2015 (together with Haskell, of course!).

Zen

If you are not familiar with the Zen philosophy, I definitely suggest you dig deeper into it. Zen can be a lot of things: a religion, a way of living, and a way of coding, too. What I really appreciate about Zen culture is that things like “beauty” and “simplicity” should be sought in everything we do (also “perfection”, but that sounds more like utopia!). Leaving aside other kinds of Zen manifesto, I’m a strong believer that beauty in code leads to simplicity, which leads to beauty, which leads to simplicity, which...

Let’s take Haskell, for example. Don’t you find this is utterly beautiful?

fmap :: (Functor f) => (a -> b) -> f a -> f b

If you are unfamiliar with Haskell, it doesn’t matter: all you need to know is that this is the type signature of the fmap function, which can be specialised to lists, to name one data structure. What I like here is that:

It’s simple, with a minimal syntax
It’s completely generic, where a, b and f are completely parametric
I can see upfront the “contract” of this function: f must be a functor
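As a small illustration of that genericity, the sketch below specialises fmap to lists and to Maybe, then keeps it fully generic over any Functor. The names lengths, incrementMaybe, double and demo are made up for this example.

```haskell
-- fmap specialised to lists: (a -> b) -> [a] -> [b]
lengths :: [String] -> [Int]
lengths = fmap length

-- fmap specialised to Maybe: (a -> b) -> Maybe a -> Maybe b
incrementMaybe :: Maybe Int -> Maybe Int
incrementMaybe = fmap (+ 1)

-- Still generic: works for any f that satisfies the Functor "contract".
double :: Functor f => f Int -> f Int
double = fmap (* 2)

demo :: IO ()
demo = do
  print (lengths ["zen", "haskell"])
  print (incrementMaybe (Just 41))
  print (double [1, 2, 3])
  print (double (Just 5))
```

The same one-line signature covers all three uses; only the choice of f changes.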

Something which puts me off learning Rust is the “eye bleeding” (perhaps I’m exaggerating a bit here!) I get when I look at certain snippets of Rust code. I feel overwhelmed by the variety of operators you can use to annotate your variables, the macro applications, the trait implementations and much more. These are just two examples I copied from two random Rust projects on Github:

/// An abstraction to receive `NetworkStream`s.
pub trait NetworkAcceptor: Acceptor + Clone + Send {
    /// Closes the Acceptor, so no more incoming connections will be handled.
    fn close(&mut self) -> IoResult<()>;
}

/// An abstraction over streams that a Server can utilize.
pub trait NetworkStream: Stream + Any + StreamClone + Send {
    /// Get the remote address of the underlying connection.
    fn peer_name(&mut self) -> IoResult;
}

#[doc(hidden)]
pub trait StreamClone {
    fn clone_box(&self) -> Box;
}

impl StreamClone for T {
    #[inline]
    fn clone_box(&self) -> Box {
        box self.clone()
    }
}

impl<'c> Cursor<'c> {
    /// Create a new cursor instance
    pub fn new(line: &'c mut Line, offset: uint) -> Cursor<'c> {
        let mut cursor = Cursor {
            offset: offset,
            line: line,
        };

        // check that the current offset is longer than the length of the line
        let offset = cursor.get_offset();
        let line_length = cursor.get_line().len();
        if offset > line_length {
            cursor.set_offset(line_length);
        }

        cursor
    }
}

This is obviously very subjective, but I find Rust code very dense; someone could say the same of Haskell, I suppose, so I’m not sure how well my point stands. But when I look at Rust, I basically see C++ in disguise (angle brackets everywhere, very dense and complicated). Having programmed in C++ before, I was really hoping, in a sense, to get a breath of fresh air.

On the contrary, Go seems to be exactly the opposite: I basically call it “C with concurrency”. But it has a strange allure, probably deriving from its simplicity: I like simple things. On the other hand, it goes against my outlook on software development, as it is not very “safe” as far as the compiler and the type checker are concerned. But the visual overhead is much lower.

Conclusions

If you made it this far, you have probably guessed that this is not a flame post. These are just my personal ruminations on what makes me reluctant to spend my time learning Rust. I really hope people will help me see through the syntax and appreciate the true spirit of the language.
