Blog

My thoughts and experiments.


Another strange thing - an endless paginator

A little bit about my new Frankenstein of a program. This time it’s an endless Paginator for Django. Sounds crazy, doesn’t it?

The standard Django Paginator uses the count() function to validate the page number, which of course translates into a SELECT COUNT(*) ... query. But as it was explained to me (I really don’t know, maybe it’s just an exaggeration - you can post your opinion in the comments), this is not such a lightweight query as we would like for a paginated REST API, because of MVCC in PostgreSQL.
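To make this concrete, here is roughly where that query comes from in the standard flow (just a sketch; SomeModel stands for any model, as in the snippets below):

from django.core.paginator import Paginator

# page() validates the page number against num_pages, num_pages needs the
# total count, and count() turns into a SELECT COUNT(*) query
paginator = Paginator(SomeModel.objects.all(), per_page=25)
page = paginator.page(2)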

How can we avoid the extra COUNT(*) query? Don’t panic, we can trick Django.

First of all, we need to remove the count field from the API response. We can do this with a custom pagination serializer:

# serializers.py
from rest_framework.pagination import (
    BasePaginationSerializer, NextPageField, PreviousPageField)

class CustomPaginationSerializer(BasePaginationSerializer):
    # expose only next/previous links, without the `count` field
    next = NextPageField(source='*')
    previous = PreviousPageField(source='*')

# api.py
from rest_framework import generics

class SomeListView(generics.ListAPIView):
    model = SomeModel
    serializer_class = SomeSerializerClass
    pagination_serializer_class = CustomPaginationSerializer

Our next move is to disable the page number validation. This can be done with a custom paginator class:

from django.core.paginator import Paginator


class CustomPaginator(Paginator):
    """ HACK: To avoid an unnecessary `SELECT COUNT(*) ...`
    the paginator reports an infinite number of pages and elements.
    """

    def _get_num_pages(self):
        """
        Returns the total number of pages.
        """
        return float('inf')

    num_pages = property(_get_num_pages)

    def _get_count(self):
        """
        Returns the total number of objects, across all pages.
        """
        return float('inf')

    count = property(_get_count)

    def _get_page(self, *args, **kwargs):
        return CustomPage(*args, **kwargs)


class SomeListView(generics.ListAPIView):
    model = SomeModel
    serializer_class = SomeSerializerClass
    pagination_serializer_class = CustomPaginationSerializer
    paginator_class = CustomPaginator

Oh goodness - we’ve introduced an infinite number of pages and an infinite number of elements… But we also want correct next/prev links, so one more detail:

from django.core.paginator import Page


class CustomPage(Page):
    def has_next(self):
        """ HACK: Select object_list + 1 element
        to verify the next page existence.
        """
        low = self.object_list.query.__dict__['low_mark']
        high = self.object_list.query.__dict__['high_mark']
        self.object_list.query.clear_limits()
        self.object_list.query.set_limits(low=low, high=high + 1)

        try:
            # len() is used only for a small portion of data (one page)
            if len(self.object_list) <= self.paginator.per_page:
                return False
            return True
        finally:
            # restore the initial object_list limits
            self.object_list = self.object_list[:(high - low)]

This solution looks quite questionable, but I find it exciting. If you have something to say about it - you’re welcome! =)

xi or the fast and furious Haskell

Don’t be confused by the title of this post - I’m going to tell you about my experience developing the XMPP client xi. The first version of this client was written in Haskell in the shortest possible time (for me, of course), and this fact explains the second, emotional part of the title =)

First of all - xi was inspired by the ii IRC client, which explains all of its features, design and main idea. In short - after this post I’m a huge fan of this tool and its philosophy.

Second - xi was written in Haskell. I will not explain why =)

Now let’s take a look inside. There are a lot of dependencies, of course - xi uses pontarius xmpp for the XMPP interaction. But there is an interesting hidden trick - for now we have to use this library directly from GitHub, because of an unpleasant bug. This can be done with the cabal sandbox add-source command:

git clone http://github.com/pontarius/pontarius-xmpp .deps/pontarius-xmpp
cabal sandbox init
cabal sandbox add-source .deps/pontarius-xmpp

Also, if we want to support gmail.com, we must use some extra TLS options:

import Network.TLS
import Network.TLS.Extra

sess <- session
    server
    (Just (\_ -> ( [plain user Nothing password]), Nothing))
    def { sessionStreamConfiguration = def
            { tlsParams = defaultParamsClient
                { pConnectVersion = TLS10
                , pAllowedVersions = [TLS10, TLS11, TLS12]
                , pCiphers = ciphersuite_medium } } }

Another important feature is listening to the file which will contain the user input. We will use the fsnotify library for this purpose. Michael Snoyman shared an implementation of this feature (he always flies to the rescue when an SO question contains the haskell and conduit keywords =). The main idea is to monitor file changes with fsnotify and keep track of the current position in the file. There are several disadvantages to this approach - e.g. we can’t handle file truncation. But for our purposes we can use files that will never be truncated.

sourceFileForever :: MonadResource m => FilePath -> Source m ByteString
sourceFileForever fp' = bracketP startManager stopManager $ \manager -> do
    fp <- liftIO $ canonicalizePath $ decodeString fp'
    baton <- liftIO newEmptyMVar
    liftIO $ watchDir manager (directory fp) (const True) $ \event -> void $ tryIO $ do
        fpE <- canonicalizePath $
            case event of
                Added x _ -> x
                Modified x _ -> x
                Removed x _ -> x
        when (fpE == fp) $ putMVar baton ()
    consumedRef <- liftIO $ newIORef 0
    loop baton consumedRef
  where
    loop :: MonadResource m => MVar () -> IORef Integer -> Source m ByteString
    loop baton consumedRef = forever $ do
        consumed <- liftIO $ readIORef consumedRef
        sourceFileRange fp' (Just consumed) Nothing $= CL.iterM counter
        liftIO $ takeMVar baton
      where
        counter bs = liftIO $ modifyIORef consumedRef (+ fromIntegral (S.length bs))

xi uses the following algorithm:

  • establish a connection
  • get the user roster and convert it to the internal representation (the ContactList type)
  • create an appropriate directory structure (a separate directory for each contact with in/out files)
  • for each input file, start a separate thread to monitor the user input
  • start a thread to monitor incoming messages

A little bit about the client internals. The Session and ContactList objects are shared through the Reader monad. The yaml-config library is used to parse the configuration file. Also, there is an ability to see the entire XMPP data flow - this only requires enabling the debug mode in the configuration.

The client source code is hosted on GitHub, but you should keep in mind that it’s more of a prototype than a completed project. So if you want to improve something - you’re welcome =)

Django and PostgreSQL schemas

There are some cases when we prefer to use PostgreSQL schemas for our purposes. The reasons can vary, but how can it be done?

There is a lot of discussion about the implementation of PostgreSQL schemas in Django (for example one, two), and I want to describe several caveats.

First of all - you shouldn’t use the options key to choose a schema like this:

DATABASES['default']['OPTIONS'] = {
    'options': '-c search_path=schema'
}

It may work until you start using pgbouncer. This option isn’t supported there because of the connection pool - when a connection with a modified search_path is closed, it is returned to the pool and can later be reused with an out-of-date search_path.

So what are we going to do? The only choice is to use the connection_created signal:

# schema.py
def set_search_path(sender, **kwargs):
    from django.conf import settings

    conn = kwargs.get('connection')
    if conn is not None:
        cursor = conn.cursor()
        cursor.execute("SET search_path={}".format(
            settings.SEARCH_PATH,
        ))

# ?.py
from django.db.backends.signals import connection_created
from schema import set_search_path

connection_created.connect(set_search_path)

But where should we place this code? In the general case, if we want to handle migrations as well, the only place is a settings file (models.py isn’t suitable for this when we want to distribute the application models and third-party models over different schemas). And to avoid circular dependencies, we should use three (OMG!) configuration files - default.py (the main configuration), local.py/staging.py/production.py (depending on the server), and migration.py (used to set the search path). The last one is used only for migration purposes:

python manage.py migrate app --settings=project.migration
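For illustration, migration.py might look roughly like this (just a sketch; the file names follow the convention above, and the SEARCH_PATH value is only an example):

# migration.py
from django.db.backends.signals import connection_created

from default import *                 # the main configuration
from schema import set_search_path

SEARCH_PATH = 'app_schema'            # example value

# set the search path on every connection created during migrations
connection_created.connect(set_search_path)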

For normal usage we can connect the set_search_path function to the connection_created signal in the root urls.py and avoid the migration.py configuration, of course.
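A minimal sketch of that wiring in the root urls.py (the usual urlpatterns are omitted):

# urls.py
from django.db.backends.signals import connection_created
from schema import set_search_path

connection_created.connect(set_search_path)

# ... the usual urlpatterns follow ...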

But that’s not all - there is one more problem with different schemas if you’re using TransactionTestCase for testing. Sometimes you can see an error during the tests’ tear-down:

Error: Database test_store couldn't be flushed. 
DETAIL: Table "some_table" references "some_other_table".

To avoid this error you can define the available_apps field, which must contain the minimal set of apps required for testing:

class SomeTests(TransactionTestCase):
    available_apps = ('one_app', 'another_app')

So we’re finished. I hope I’ve described all the possible issues =)

A lot of Unix philosophy with the ii

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

It seems like there is no good IRC plugin for vim - at least I found none. But there is a brilliant geeky alternative - ii. Here is a quote from its site:

ii is a minimalist FIFO and filesystem-based IRC client. It creates an irc directory tree with server, channel and nick name directories. In every directory a FIFO in file and a normal out file is created.

The in file is used to communicate with the servers and the out files contain the server messages. For every channel and every nick name there are related in and out files created. This allows IRC communication from command line and adheres to the Unix philosophy.

To configure the IRC workflow (join, identify, read/write) you can use these posts. Here I just want to help you avoid several pitfalls.

First of all, here is the final result you’ll get:

I’ll use tmux + multitail + vim.

First we need to connect to an IRC server (freenode.net in my case):

#!/bin/sh
ii -s irc.freenode.net -n nick -f "UserName" &
sleep 10
echo "identify password"> ~/irc/irc.freenode.net/nickserv/in
echo "/j #channel1"> ~/irc/irc.freenode.net/in
echo "/j #channel2"> ~/irc/irc.freenode.net/in
echo "/j #channel3"> ~/irc/irc.freenode.net/in

The next step is to create a handy console-based environment around it. A couple of small shell scripts can be used for this purpose (I’ve split the implementation):

#!/bin/sh
# tmux_open.sh
tmux -2 new-session -s session_name "ii_open.sh $1"

#!/bin/sh
# ii_open.sh
tmux splitw -v -p 30 'vim'
multitail -cS ii ~/irc/irc.freenode.net/#$1/out

We should use the -2 option for tmux to force 256 colors, and the -cS ii option for multitail to enable ii syntax highlighting. After all this we can execute the ./tmux_open.sh channel command to open a two-pane session containing the IRC channel log and vim itself.

To type into the IRC session we will use vim with the following mappings:

map <leader>ii :.w >> ~/irc/irc.freenode.net/in<cr>dd
map <leader>i1 :.w >> ~/irc/irc.freenode.net/\#channel1/in<cr>dd
map <leader>i2 :.w >> ~/irc/irc.freenode.net/\#channel2/in<cr>dd
map <leader>i3 :.w >> ~/irc/irc.freenode.net/\#channel3/in<cr>dd

Also, we can hide the tmux status line globally (I prefer the vim status line) to achieve the ideal setup:

# .tmux.conf
set-option -g status off

or hide it only while vim is running:

" .vimrc
autocmd VimEnter,VimLeave * silent !tmux set status