
xi or the fast and furious Haskell

Don't be confused by the title of this post - it's about my experience developing the XMPP client xi. The first version of this client was written in Haskell in the shortest time possible (for me, of course), which explains the second, emotional part of the title =)

First of all - xi was inspired by the ii IRC client, which explains all of its features, design and main idea. In short - after that post I'm a huge fan of this tool and its philosophy.

Second - xi was written in Haskell. I will not explain why =)

Now let's take a look inside. There are a lot of dependencies, of course - xi uses pontarius xmpp for the XMPP interaction. But there is one hidden trick - for now we have to use this library directly from GitHub because of an unpleasant bug. This can be done with the cabal sandbox add-source command:

git clone http://github.com/pontarius/pontarius-xmpp .deps/pontarius-xmpp
cabal sandbox init
cabal sandbox add-source .deps/pontarius-xmpp

Also, if we want to support gmail.com, we must use some extra TLS options:

import Network.TLS
import Network.TLS.Extra

sess <- session
    server
      (Just (\_ -> ( [plain user Nothing password]), Nothing))
    def { sessionStreamConfiguration = def
            { tlsParams = defaultParamsClient
                { pConnectVersion = TLS10
                , pAllowedVersions = [TLS10, TLS11, TLS12]
                , pCiphers = ciphersuite_medium } } }

Another important feature is watching the file that contains the user input. We will use the fsnotify library for this purpose. Michael Snoyman shared the implementation of this feature (he always flies to the rescue when an SO question contains the haskell and conduit keywords =). The main idea is to monitor file changes with fsnotify and keep track of the current position in the file. There are several disadvantages to this approach - e.g. we can't handle file truncation. But for our purposes we can use files that will never be truncated.

sourceFileForever :: MonadResource m => FilePath -> Source m ByteString
sourceFileForever fp' = bracketP startManager stopManager $ \manager -> do
    fp <- liftIO $ canonicalizePath $ decodeString fp'
    baton <- liftIO newEmptyMVar
    liftIO $ watchDir manager (directory fp) (const True) $ \event -> void $ tryIO $ do
        fpE <- canonicalizePath $
            case event of
                Added x _ -> x
                Modified x _ -> x
                Removed x _ -> x
        when (fpE == fp) $ putMVar baton ()
    consumedRef <- liftIO $ newIORef 0
    loop baton consumedRef
  where
    loop :: MonadResource m => MVar () -> IORef Integer -> Source m ByteString
    loop baton consumedRef = forever $ do
        consumed <- liftIO $ readIORef consumedRef
        sourceFileRange fp' (Just consumed) Nothing $= CL.iterM counter
        liftIO $ takeMVar baton
      where
        counter bs = liftIO $ modifyIORef consumedRef (+ fromIntegral (S.length bs))

xi uses the following algorithm:

  • establish connection
  • get a user roster and convert it to the internal representation (the ContactList type)
  • create an appropriate directory structure (a separate directory for each contact with in/out)
  • for each input file start a separate thread to monitor the user input
  • start a thread for monitoring the incoming messages

A little bit about the client details. The Session and ContactList objects are shared through the Reader monad. The yaml-config library is used to parse the configuration file. Also, there is a way to see the entire XMPP data flow - it only requires enabling the debug mode in the configuration.

The client source code is hosted on GitHub, but you should keep in mind that it's more of a prototype than a completed project. So if you want to improve something - welcome =)

Django and PostgreSQL schemas

There are some cases when we prefer to use PostgreSQL schemas for our purposes. The reasons can be different, but how can it be done?

There is a lot of discussion about the implementation of PostgreSQL schemas in Django (for example one, two), and I want to describe several caveats.

First of all - you shouldn't use the options key to choose a schema like this:

    DATABASES['default']['OPTIONS'] = {
        'options': '-c search_path=schema'
    }

It works until you start using pgbouncer. This option isn't supported there because of the connection pool - when you close a connection with a modified search_path, it is returned to the pool and can later be reused with an out-of-date search_path.

So what are we gonna do? The only choice is to use the connection_created signal:

# schema.py
def set_search_path(sender, **kwargs):
    from django.conf import settings

    conn = kwargs.get('connection')
    if conn is not None:
        cursor = conn.cursor()
        cursor.execute("SET search_path={}".format(
            settings.SEARCH_PATH,
        ))

# ?.py
from django.db.backends.signals import connection_created
from schema import set_search_path

connection_created.connect(set_search_path)

But where should we place this code? In the general case, if we want to handle migrations as well, the only place is a settings file (models.py isn't suitable when we want to distribute application models and third-party models over different schemas). And to avoid circular dependencies, we have to use three (OMG!) configuration files - default.py (the main configuration), local.py/staging.py/production.py (depending on the server), and migration.py (used to set the search path). The last one is used only for migration purposes:

python manage.py migrate app --settings=project.migration
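
A migration.py along these lines should do the trick (the module names mirror the layout described above, and SEARCH_PATH is whatever set_search_path reads from the settings; treat this as a sketch rather than the exact file):

# migration.py - a sketch, assuming the settings layout described above
from default import *       # the main configuration
from production import *    # or local / staging, depending on the server

from django.db.backends.signals import connection_created
from schema import set_search_path

# connect the signal at settings import time, so every connection opened
# by the migrate command gets the right search_path
connection_created.connect(set_search_path)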

For normal usage we can connect the set_search_path function to the connection_created signal in the root urls.py and avoid the migration.py configuration, of course.

But that's not all - there is one more problem with different schemas if you use TransactionTestCase for testing. Sometimes you can see an error during the test tear-down:

Error: Database test_store couldn't be flushed. 
DETAIL:  Table "some_table" references "some_other_table".

To avoid this error you can define the available_apps field, which must contain the minimal set of apps required for testing:

class SomeTests(TransactionTestCase):
    available_apps = ('one_app', 'another_app')

So we're done. I hope I have described all the possible issues =)

A lot of Unix philosophy with the ii

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

It seems there is no good IRC plugin for vim - at least I didn't find one. But there is a brilliant geeky alternative - ii. Here is a quote from its site:

ii is a minimalist FIFO and filesystem-based IRC client. It creates an irc directory tree with server, channel and nick name directories. In every directory a FIFO in file and a normal out file is created.

The in file is used to communicate with the servers and the out files contain the server messages. For every channel and every nick name there are related in and out files created. This allows IRC communication from command line and adheres to the Unix philosophy.

To configure the IRC workflow (join, identify, read/write) you can use these posts. Here I want to help you avoid several caveats.

First of all, a quick note about the final result: I'll use tmux + multitail + vim.

First we need to connect to an IRC server (freenode.net in my case):

#!/bin/sh
ii -s irc.freenode.net -n nick -f "UserName" &
sleep 10
echo "identify password"> ~/irc/irc.freenode.net/nickserv/in
echo "/j #channel1"> ~/irc/irc.freenode.net/in
echo "/j #channel2"> ~/irc/irc.freenode.net/in
echo "/j #channel3"> ~/irc/irc.freenode.net/in

The next step is to create a handy console-based environment around it. A couple of small shell scripts can be used for this purpose (I've split the implementation):

#!/bin/sh
# tmux_open.sh
tmux -2 new-session -s session_name "ii_open.sh $1"

#!/bin/sh
# ii_open.sh
tmux splitw -v -p 30 'vim'
multitail -cS ii ~/irc/irc.freenode.net/#$1/out

We should use the -2 option for tmux to force 256 colors, and the -cS ii option for multitail to enable ii syntax highlighting. After all this we can execute ./tmux_open.sh channel to open a two-pane session containing the IRC channel log and vim itself.

To type into the IRC session we will use vim with the following mappings:

map <leader>ii :.w >> ~/irc/irc.freenode.net/in<cr>dd
map <leader>i1 :.w >> ~/irc/irc.freenode.net/\#channel1/in<cr>dd
map <leader>i2 :.w >> ~/irc/irc.freenode.net/\#channel2/in<cr>dd
map <leader>i3 :.w >> ~/irc/irc.freenode.net/\#channel3/in<cr>dd

Also, we can hide the tmux status line globally (I prefer the vim status line) to achieve the ideal look:

# .tmux.conf
set-option -g status off

or hide it only while vim is running:

" .vimrc
autocmd VimEnter,VimLeave * silent !tmux set status

What about sharding in Django?

Some time ago I was faced with the need to implement sharding in Django 1.6. It was an attempt to step beyond the standard features of this framework, and I felt Django's resistance =) I'll talk a bit about this challenge and its results.

Let’s start with definitions. Wikipedia says that:

A database shard is a horizontal partition in a database. Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than being split into columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.

We wanted to split our database entities across different PostgreSQL schemas and used something like this for the id generation. The sharding model was clear, but how do we implement it in a Django application?
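
The linked id-generation scheme isn't reproduced here, but just to make the idea concrete, here is a toy example of ids that carry their shard number (the bit layout and the shard_ naming are made up purely for illustration):

# shards.py - a toy illustration, not the id generator we actually used
SHARD_BITS = 10

def generate_sharded_id(shard, sequence):
    # per-shard sequence value in the high bits, shard number in the low bits
    return (sequence << SHARD_BITS) | shard

def get_shard_name(entity_id):
    # recover the schema name from an id packed by generate_sharded_id
    return 'shard_{}'.format(entity_id & ((1 << SHARD_BITS) - 1))

With ids like these, any piece of code that sees an entity id can tell which schema the row lives in - this is exactly what the custom SQL compiler below relies on.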

My solution to this problem was a custom database backend containing custom SQL compilers. Maybe it was a dirty hack, but I hope it wasn't =)

To create your own custom database backend, you can copy the structure of one of the existing backends from django.db.backends (postgresql_psycopg2 in our case) and override DatabaseOperations:

# operations.py
from django.db.backends.postgresql_psycopg2.operations import *

class CustomDatabaseOperations(DatabaseOperations):
    compiler_module = "path.to.the.compiler.module"

# base.py
from django.db.backends.postgresql_psycopg2.base import *
from operations import CustomDatabaseOperations

class CustomDatabaseWrapper(DatabaseWrapper):
    def __init__(self, *args, **kwargs):
        super(CustomDatabaseWrapper, self).__init__(*args, **kwargs)

        self.ops = CustomDatabaseOperations(self)

DatabaseWrapper = CustomDatabaseWrapper
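
To make Django actually use this backend, the ENGINE of the corresponding connection has to point at the package that contains this base.py (the package path below is a made-up example):

# settings.py - assuming the backend files above live in project/db_backend/
DATABASES['default']['ENGINE'] = 'project.db_backend'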

The custom SQL compiler will add the corresponding schema name to the SQL query based on the entity id:

# compilers.py
from django.db.models.sql.compiler import *  # brings in SQLCompiler and friends

class CustomSQLCompiler(SQLCompiler):
    # sharded_tables and get_shard_name are defined elsewhere in the project:
    # the first lists the tables that live in shard schemas, the second maps
    # an entity id to its schema name
    def as_sql(self):
        table = self.query.get_meta().db_table
        if table not in self.sharded_tables:
            return super(CustomSQLCompiler, self).as_sql()
        else:
            sql, params = super(CustomSQLCompiler, self).as_sql()

            # the first item of the params tuple must be the entity id
            schema = self.get_shard_name(params[0])

            # prepend the schema name to the quoted table name
            old = '"{}"'.format(table)
            new = '{}."{}"'.format(schema, table)
            sql = sql.replace(old, new)

        return sql, params


SQLCompiler = CustomSQLCompiler

That's all! Oh, okay, that's not all =) Now you must create a custom QuerySet (with the two overridden methods - get & create) to provide a correct sharded id for all entities.
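
A minimal sketch of what such a QuerySet might look like (the ShardedQuerySet/ShardedManager names and the next_sharded_id helper are assumptions for illustration, not the exact project code):

# managers.py - an illustrative sketch, not the exact project code
from django.db import models
from django.db.models.query import QuerySet

from shards import generate_sharded_id  # the toy id generator sketched earlier


def next_sharded_id(model):
    # hypothetical helper: pick a shard for the new row, ask that shard's
    # sequence for the next value, then pack both into a single id
    shard, sequence = 0, 1  # placeholder values for the sketch
    return generate_sharded_id(shard, sequence)


class ShardedQuerySet(QuerySet):
    def create(self, **kwargs):
        # generate the sharded id up front instead of relying on a database
        # sequence default, so the compiler can derive the schema from it
        kwargs.setdefault('id', next_sharded_id(self.model))
        return super(ShardedQuerySet, self).create(**kwargs)

    def get(self, *args, **kwargs):
        # keep lookups by id, so the entity id ends up as the first query
        # parameter - that is what the custom compiler inspects
        return super(ShardedQuerySet, self).get(*args, **kwargs)


class ShardedManager(models.Manager):
    def get_queryset(self):
        return ShardedQuerySet(self.model, using=self._db)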

But there is one problem - migrations. You can't migrate your sharded models correctly, and that's sad. To work around this we introduced a somewhat more complex database configuration dictionary and a special helper that converts it into the standard form with a lot of database connections - one for each shard, each with its own search_path option. In settings.py we must take into account the type of action:

# settings.py

def get_shard_settings(shard_migrate=False, shard_sync=False):
    """ Not an all apps must be sharded.
    """
    installed_apps = ('some_sharded_app1', 'some_sharder_app2',)
    databases = DB_CONFIGURATOR(DB_CONFIG, shard_migrate=shard_migrate, shard_sync=shard_sync)
    return installed_apps, databases

""" We must separate
    - normal usage,
    - sharded models synchronization
    - sharded models migration 
"""
if sys.argv[-1] == 'shard_migrate':
    del sys.argv[-1]
    INSTALLED_APPS, DATABASES = get_shard_settings(shard_migrate=True)

elif sys.argv[-1] == 'shard_sync':
    del sys.argv[-1]
    INSTALLED_APPS, DATABASES = get_shard_settings(shard_sync=True)

else:
    DATABASES = DB_CONFIGURATOR(DB_CONFIG)

Now we can manage sharded migrations with the --database option. For convenience you can write a fab script, of course.

And one last caveat - you must create the SOUTH_DATABASE_ADAPTERS variable pointing to the original postgres adapter south.db.postgresql_psycopg2 - otherwise South can't create correct migrations.
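
Something along these lines (the shard aliases are made-up names for the per-shard connections generated by the configurator):

# settings.py - point South at the stock postgres adapter for every alias
SOUTH_DATABASE_ADAPTERS = {
    'default': 'south.db.postgresql_psycopg2',
    'shard_1': 'south.db.postgresql_psycopg2',
    'shard_2': 'south.db.postgresql_psycopg2',
}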