2012-04-22 bindings, git, github

Writing Haskell bindings to the GitHub API

GitHub is a well-known service that exposes a web API. With it you can, for instance, create a new Git repository. This post explains how to write bindings to that API for the Haskell programming language. It is suitable for Haskell beginners.

With this blog post, I hope to provide an example of various Haskell libraries tangled together to produce a small but useful program. Although this text is targeted at Haskell beginners, it is neither an introduction to Haskell nor a complete tutorial. It is more like a walkthrough to help beginners feel more confident about their knowledge. ("Oh yeah, this and this, I know about them. I didn't know that one but still, I could have written this on my own!")

Getting the code

The code demonstrated here is available on GitHub and on Hackage. It is better to open the GitHub link in another tab or download the code to your computer to read this blog post as I don't copy everything verbatim.

You can clone the repository from GitHub with

> git clone git://github.com/noteed/hgithub.git

Or install the cabal package from Hackage with

> cabal update && cabal install hgithub

The hgithub command-line interface

When starting a new program, I like to use cmdargs to create a nice command-line interface. Attaching help messages to a few commands makes it easy to figure out later which features are already implemented and how to use them. I very quickly forget what I was doing with a project, so being able to invoke it later with --help is really great. So let's see what hgithub --help gives us:

> hgithub --help
hgithub 0.0.0 Copyright (c) 2012 Vo Minh Thu.

hgithub [COMMAND] ... [OPTIONS]

Common flags:
  -? --help                Display help message
  -V --version             Print version information

hgithub list-repositories [OPTIONS]
  List repositories you have access to.

hgithub create-repository [OPTIONS]
  Create a new repository.

     --name=NAME           Repository name.
     --description=STRING  Repository description.

hgithub offers two commands: list-repositories and create-repository. The former takes no options while the latter needs a name and, optionally, a description for the new repository.

Without further ado, we can start with the main script, bin/hgithub.hs:

import System.Console.CmdArgs.Implicit

main :: IO ()
main = (processCmd =<<) $ cmdArgs $
  modes [cmdRepositoryList , cmdRepositoryCreate]
  &= summary versionString
  &= program "hgithub"

processCmd is a function we will define ourselves. It takes a value (of type Cmd, which we also define ourselves) representing the chosen sub-command, possibly with its options, and then does whatever it wants (i.e. runs arbitrary IO () code):

processCmd :: Cmd -> IO ()

cmdArgs, modes, &=, summary, and program are functions defined by cmdargs. You can look them up in the documentation, e.g. via the index. The important bits are the variables defining our sub-commands:

cmdRepositoryList , cmdRepositoryCreate :: Cmd

We have two "modes", corresponding to the two sub-commands we want to provide. To define those two sub-commands we need first to define the Cmd data type. Because we have more than one sub-command, we use a variant type:

data Cmd =
    CmdRepositoryList
  | CmdRepositoryCreate
  { cmdRepositoryCreateName :: String
  , cmdRepositoryCreateDescription :: Maybe String
  }
  deriving (Data, Typeable)

The Cmd data type captures precisely what we want from the user: when the user wants to create a new repository, the name is mandatory while the description is optional (as the Maybe indicates); if she wants to list her existing repositories, no additional data is needed. The deriving clause is necessary so that cmdargs can do its magic.
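The definitions of the two mode values are not reproduced in this post. As a rough sketch only, here is how they could be written with standard cmdargs annotations so that the flags and help texts match the --help output shown earlier. The helper parseCmdFrom is hypothetical (it is handy for experimenting in GHCi), and the actual hgithub source may differ.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import System.Console.CmdArgs.Implicit
import System.Environment (withArgs)

data Cmd
  = CmdRepositoryList
  | CmdRepositoryCreate
      { cmdRepositoryCreateName :: String
      , cmdRepositoryCreateDescription :: Maybe String
      }
  deriving (Data, Typeable, Eq, Show)

-- "explicit" disables the automatically derived names, so only the
-- names given with "name" are accepted on the command line.
cmdRepositoryList :: Cmd
cmdRepositoryList = CmdRepositoryList
  &= help "List repositories you have access to."
  &= explicit
  &= name "list-repositories"

cmdRepositoryCreate :: Cmd
cmdRepositoryCreate = CmdRepositoryCreate
  { cmdRepositoryCreateName = def
      &= explicit &= name "name" &= typ "NAME"
      &= help "Repository name."
  , cmdRepositoryCreateDescription = def
      &= explicit &= name "description" &= typ "STRING"
      &= help "Repository description."
  }
  &= help "Create a new repository."
  &= explicit
  &= name "create-repository"

-- Hypothetical helper: parse the given arguments instead of the real ones.
parseCmdFrom :: [String] -> IO Cmd
parseCmdFrom args = withArgs args $ cmdArgs $
  modes [cmdRepositoryList, cmdRepositoryCreate]
  &= program "hgithub"

main :: IO ()
main = parseCmdFrom ["create-repository", "--name=demo"] >>= print
```

Note that def yields an empty String for the name, so cmdargs itself does not enforce that --name is given; a real program would have to check for that.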

Now to define processCmd, we can simply pattern-match against the Cmd constructors and call into the hgithub library, passing whatever options the user has provided. So we have something that looks like:

processCmd CmdRepositoryList{..} = do
  ...
  repos <- repositoryList
  ...

The {..} syntax is provided by the RecordWildCards extension. In the case of CmdRepositoryList, it is useless as there are no fields (but I like to write it nonetheless); for CmdRepositoryCreate, it defines the cmdRepositoryCreateName and cmdRepositoryCreateDescription variables for us, available on the right-hand side of the equation.
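To see RecordWildCards at work without any network code, here is a small self-contained sketch. The function describeCmd is a hypothetical stand-in for processCmd: it only builds a string instead of calling into the library, and the Cmd type is repeated without the cmdargs-specific deriving clause.

```haskell
{-# LANGUAGE RecordWildCards #-}

data Cmd
  = CmdRepositoryList
  | CmdRepositoryCreate
      { cmdRepositoryCreateName :: String
      , cmdRepositoryCreateDescription :: Maybe String
      }

-- Hypothetical helper: describe what a command would do.
describeCmd :: Cmd -> String
-- No fields to bring into scope here, so a plain constructor pattern is enough.
describeCmd CmdRepositoryList = "list repositories"
-- {..} binds one variable per record field, named after the field.
describeCmd CmdRepositoryCreate{..} =
  "create " ++ cmdRepositoryCreateName
    ++ maybe "" (\d -> " (" ++ d ++ ")") cmdRepositoryCreateDescription

main :: IO ()
main = mapM_ (putStrLn . describeCmd)
  [ CmdRepositoryList
  , CmdRepositoryCreate "demo" (Just "A test repository")
  , CmdRepositoryCreate "bare" Nothing
  ]
```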

The hgithub library

The command-line tool described above is really just a little wrapper around the interesting part: the library implementing the bindings to the GitHub API. The library is called hgithub, just like the command-line program. As it is simple and small, everything is defined in a single module: Network.GitHub.

There are roughly two parts to the Network.GitHub module:

  1. Make a GET or a POST request to api.github.com.
  2. Parse the resulting JSON response into some Haskell data structure.

The first part uses the http-enumerator package and is quite straightforward: go read the code, there are only two small functions for the GET case, and two similar functions for the POST case. apiGetRequest and apiPostRequest create the request data. apiGet and apiPost take that request and actually send it to api.github.com. They also call the JSON conversion code.

The second part, parsing the JSON to Haskell data, is less obvious but actually even simpler, once you have understood it. So let's go through it.

Once we have issued a correct GET or POST request to GitHub, http-enumerator will give us back a bytestring. That bytestring is supposed to contain JSON data so we parse it using the aeson and attoparsec packages. attoparsec provides a parse function, which takes a parser and a bytestring. The parser is given by the aeson package and is called json. So parsing the response is

case parse json responseBody of

The result, when the parsing is successful, is a Value data structure (aeson's JSON representation). But this is not enough: we don't want to return some arbitrary JSON, we want to return a more specific data structure. To do that, we call fromJSON and the code becomes:

case parse json responseBody of
    Done _ value -> do
      case fromJSON value of
        Success value' -> do
          return $ Just value'
        _ -> return Nothing
    _ -> return Nothing

fromJSON has the following type:

fromJSON :: FromJSON a => Value -> Result a

It means it can turn any Value into a result of type a, provided a is an instance of the FromJSON class. It is also polymorphic in its return type: the actual implementation is chosen based on what you want to parse the JSON into, in our case a Repository. As the binding is pretty incomplete, the repository representation is short:

data Repository = Repository
  { repositoryName :: Text
  , repositoryDescription :: Text
  }
  deriving Show -- needed to print a parsed Repository
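The return-type polymorphism of fromJSON mentioned above can be seen in miniature with a toy class that uses only the base library. FromField and fromField are made-up names for this illustration; they are not part of aeson or hgithub. The same input string is parsed differently depending on the type requested at the call site.

```haskell
import Text.Read (readMaybe)

-- Hypothetical class playing the role of aeson's FromJSON.
class FromField a where
  fromField :: String -> Maybe a

instance FromField Int where
  fromField = readMaybe

instance FromField Bool where
  fromField "yes" = Just True
  fromField "no"  = Just False
  fromField _     = Nothing

main :: IO ()
main = do
  -- The type annotation selects the instance, just as
  -- ":: Result Repository" does for fromJSON.
  print (fromField "42" :: Maybe Int)   -- Just 42
  print (fromField "yes" :: Maybe Bool) -- Just True
  print (fromField "yes" :: Maybe Int)  -- Nothing
```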

To make fromJSON work for us, we have to make Repository an instance of the FromJSON class:

instance FromJSON Repository where
  parseJSON (Object v) = Repository <$>
    v .: "name" <*>
    v .: "description"
  parseJSON _ = mzero

The instance declaration reads as follows: parsing anything other than a JSON dictionary (called Object in aeson) immediately results in a failure (mzero). Otherwise, try to obtain a name and a description, apply the Repository constructor to them, and return the result. If either field is missing, the whole parse fails.
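The <$> and <*> pattern used in the instance is not specific to aeson. As an analogy only, here is the same shape with Maybe and a plain association list standing in for the JSON object. The Fields type and parseRepository are hypothetical names, and String replaces Text to keep the sketch within the base library.

```haskell
data Repository = Repository
  { repositoryName :: String
  , repositoryDescription :: String
  } deriving (Eq, Show)

-- Hypothetical stand-in for a JSON object.
type Fields = [(String, String)]

-- Each lookup may fail; the constructor is applied only if both succeed.
parseRepository :: Fields -> Maybe Repository
parseRepository v = Repository
  <$> lookup "name" v
  <*> lookup "description" v

main :: IO ()
main = do
  print (parseRepository [("name", "hgithub"), ("description", "GitHub bindings")])
  -- The description is missing, so the result is Nothing.
  print (parseRepository [("name", "hgithub")])
```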

To play around with JSON parsing, maybe this snippet of code can help you:

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Network.GitHub

-- Build the JSON value with aeson's own object and (.=) helpers,
-- which does not depend on how Object is represented internally.
jsonRepo :: Value
jsonRepo = object
  [ "name" .= String "New repo"
  , "description" .= String "My awesome new repository"
  ]

main :: IO ()
main = print (fromJSON jsonRepo :: Result Repository)

You can save it in play.hs and run it with runghc play.hs.
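The snippet above starts from an already built Value. To go all the way from raw bytes, as the library has to, the two steps (parsing the bytes, then converting with fromJSON) can be combined. The sketch below swaps in aeson's eitherDecode for the parse json plus fromJSON pair described earlier; it is not how hgithub does it. The Repo type and decodeRepo are hypothetical names, defined locally so the file does not depend on Network.GitHub.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (FromJSON (..), eitherDecode, withObject, (.:))
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)

-- Local, hypothetical copy of the repository type.
data Repo = Repo
  { repoName :: Text
  , repoDescription :: Text
  } deriving (Eq, Show)

instance FromJSON Repo where
  -- withObject fails with a descriptive message on non-object input.
  parseJSON = withObject "Repo" $ \v ->
    Repo <$> v .: "name" <*> v .: "description"

-- Left carries an error message, Right the parsed value.
decodeRepo :: BL.ByteString -> Either String Repo
decodeRepo = eitherDecode

main :: IO ()
main = do
  print (decodeRepo "{\"name\":\"hgithub\",\"description\":\"GitHub bindings\"}")
  print (decodeRepo "[1, 2, 3]")
```

The first call should print a Right value and the second a Left with an error message, since a JSON array is not an object.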

Wrapping up

That's it. Command-line argument parsing and handling, calling into api.github.com via HTTP, and parsing the returned JSON into a nice Haskell data structure: everything is there. As an exercise, you can browse the GitHub API documentation and choose an API call not yet implemented in hgithub. Add the missing code, together with a new cmdargs sub-command. Update the README.md file accordingly and finally create a pull request on GitHub. If you would prefer bindings to Linode instead, feel free to contribute to hlinode. The code is similar (but even simpler).

Note: if you're interested in a command-line tool to ease working with GitHub, check out octogit and hub.

Update: someone on reddit pointed out the existence of more complete bindings also on Hackage.
