Engineering Fantasy

What Heroku Should Do

2018-02-03T21:41:00+06:00

DISCLAIMER: THESE OPINIONS ARE MY OWN AND DO NOT REFLECT THE OPINIONS OF MY FORMER OR PRESENT EMPLOYERS

Table of Contents

Heroku Connect
- Heterogeneous Syncing Types
- Pay per Row
Underpowered Dynos
Deploy Anywhere (but...)
Heroku is an Ecosystem
What Heroku is Doing Right

I've been using Heroku for a while now at my workplace, and so have many of our developers. Heroku is actually a really good idea, and that's why services like convox.io and platform.sh keep popping up. The notion that you can have your servers managed by other people, and not need a full-time devops team, is a total godsend. Iterations are faster and easier, and it allows our teams to focus on application logic. The marketplace also makes for quick integrations with other service providers, which makes adding things like cache and search very easy. It is not without its set of caveats though.

I myself was pretty skeptical at the beginning. Setting my personal projects aside, my perception of Heroku was that it was pretty expensive, and not something that you'd want to use at a real company. By the time I joined, Heroku was already being used for several applications. At the time, I just thought it was a nice skin on top of AWS (which is an accurate description even to the present day).

However, the real reason that the company was using Heroku was because of a product called Heroku Connect (HC).

Heroku Connect

Heroku Connect is basically a syncing tool. It syncs data from SalesForce (SF) to a Heroku-operated Postgresql database. The data in SF was coming from the company's own servers. In other words, we were paying Heroku to gain access to our own data. This initially perplexed me, but I soon found out that the old platform had issues scaling data access out, so we decided to use HC and Postgresql as a temporary solution to create a couple of applications that we could use to test new products and new markets. SF does have an API, but it has limits for reads and writes, and is generally used for bulk updates and reads. Furthermore, it gets quite slow if you're building an API on top of it. HC in conjunction with Postgresql was seen as a really good alternative, because it allowed you to bypass the data access limits by essentially creating a copy of your business data in Postgresql.

However, we ran into a couple of issues really quickly.

Heterogeneous Syncing Types

In SF, data is stored within things called SalesForce Objects (SO). These SOs are basically tables, and through HC they are mapped to Postgresql tables. There are two ways in which syncs occur. One is that there is a set interval after which the data from SF is synced to Postgresql and the changes in Postgresql are synced to SF. The other way is that data is synced in real time. What is perplexing is that you never know in advance which SOs are eligible for real time updates and which ones will only be updated at intervals. You can always see which objects are being updated at intervals, and which ones are being updated in real time, but you never know why a particular SO gets real time updates while another SO gets updates at intervals. Sometimes, SOs that had real time updates will start getting interval updates instead. This is because you might hit what is called the "Streaming API limit". What this limit is, and how it is reached, remains a mystery to me as well as to our SF developers.

If you're building a service that promises quick and unlimited access to SF, then you had better deliver on that promise. Otherwise, people will become disillusioned with your service as a whole, and despite having an awesome product, your reputation is going to suffer as a natural consequence.

Pay per Row

If you want to use HC, you have to pay an amount for a particular number of rows. So basically, you are paying for the data in SF, you are paying for the data in Postgresql, and on top of that you have to pay per row to use HC. This is maddening because it makes so little sense. If you're building a syncing service, you should be charging for throughput, and not for the amount of data. This also makes sense because the higher the throughput, the higher the costs for Heroku as a business, so it'd make sense for both Heroku and the companies buying the HC service.

But the real consequence of this pricing scheme is that if you want to launch a geo-distributed application built on top of HC, you need to pay for rows in each region because Heroku's postgres does not have multi-master replication, only master-slave replication. So, if you had to pay for regions in the US, EU and China, then you'd need to pay for the same number of rows three times.

Underpowered Dynos

Dynos are basic units of computation for Heroku. In a generic sense, they are a virtual machine.

The lowest tier dyno is really quite pathetic. It's just 512 MB of RAM and 1 GHz, and for the equivalent price you could get a few t2.micro instances from AWS, and I'm pretty sure that you could get the same from Azure and Google Cloud Platform. Although with Heroku you get auto-scaling as well as maintenance, the price seems quite steep for the kind of product that they're offering.

Deploy Anywhere (but...)

You can theoretically deploy on pretty much any continent using Heroku. However, there are two Herokus. There is the public cloud, and then there is the private cloud. If this is news to you, then what you've been using all this time is Heroku's public cloud. The public cloud only has a few regions, whereas the private cloud has many regions from Virginia to Tokyo.

However, the two clouds have completely different payment schemes. To use the private cloud at all, you need to pay a certain amount upfront every month, and it's not cheap either. That fee is just for the privilege of being able to use it; you're not getting any dynos or anything. Furthermore, the dynos in the private cloud are more expensive (yes, they have a little more compute power), which makes geo-distribution of your applications quite expensive.

Heroku is an Ecosystem

In Heroku, the app is the center of the universe. You attach add-ons like redis or Postgresql to your application. You can indeed share add-ons between applications, but later on, it becomes difficult to keep track of which add-on is owned by which app. It would be far better to have an interface where the add-ons are separate from the applications, so that you can share resources between applications more easily.

What Heroku is Doing Right

Heroku has probably the best service you can get. If something breaks or something goes wrong with your application, their support mechanism is truly exceptional, because they will ping up an on-call engineer to fix your production problems. Usually, these issues are resolved very quickly.

Secondly, even if you consider that Heroku is expensive, having your own devops team is even more expensive, and further still if you need them to be on-call. This is difficult for small companies.

Furthermore, Heroku also has some of the best account executives you can get. They are really quite helpful, and truly fight for you with upper management to give you the best deals possible if you buy bulk plans. I've only worked with two so far, but they really are fighting for you in the company, and want to see you succeed.

GraphQL in the Python World

2017-06-23T16:53:00+06:00

Table of Contents

Getting Started
- Here Comes Python
- Playing with Graphene
Basic Types
- A Few Opinions
Integration with ORMs
- Django
- SQLAlchemy and Others
Integration with Web Frameworks
Final Thoughts

Not too long ago, Facebook unveiled Graphql. It was designed as a flexible data access layer that works as a simple interface on top of heterogeneous data sources. Not long after Facebook introduced it, GitHub announced that the latest version of its API (version 4) would be a Graphql API. Many other companies have followed, including Meteor, Pinterest and Shopify, to name a few.

At its heart, Graphql is designed to replace REST APIs. It also has a proper specification so that developers aren't constantly fighting over the proper definition of Graphql. It has documentation inbuilt, with its powerful graphiql query application that allows people to quickly check out the results of a particular query, and get code completion on the fly, as they are writing the query.

Getting Started

Python's main competitor in the ring is a nifty little library called graphene. But before we get into graphene, we need to talk about some of the fundamental parts of Graphql.

- The model is an object defined by a Graphql schema.
- The schema defines models and their attributes.
- Each attribute of a model has its own resolver. This is a function that is responsible for getting the data for that particular model attribute.
- A query is basically what you use to get or set data inside Graphql.
- Mutations are particular queries that allow you to change the data for a given model or a set of models.
- Graphiql is the UI that you use to interact with a Graphql server. The IDE-like thing that you see in the above screenshot is the Graphiql interface.
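To make these terms a little more concrete without involving any library, here is a toy sketch in plain Python. It is not how graphene works internally, and the names RESOLVERS and execute are made up for illustration. It only shows the idea that every attribute has its own resolver function, and that a query is a request for a set of attributes.

```python
# A toy, library-free illustration of "one resolver per attribute".
# RESOLVERS and execute() are hypothetical names, not part of graphene.

def resolve_hello():
    return 'World'


def resolve_answer():
    return 42


# The "schema": every attribute name maps to the function that fetches it.
RESOLVERS = {
    'hello': resolve_hello,
    'answer': resolve_answer,
}


def execute(requested_fields):
    """Run the resolver for each requested field and collect the results."""
    data, errors = {}, []
    for field in requested_fields:
        resolver = RESOLVERS.get(field)
        if resolver is None:
            errors.append('Unknown field: {}'.format(field))
        else:
            data[field] = resolver()
    return {'data': data, 'errors': errors}


print(execute(['hello']))
```

Asking for a field that has no resolver ends up in the errors list instead of raising, which loosely mirrors how a Graphql server reports problems.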

Here Comes Python

Python's attempt at Graphql can be broken down into three parts: graphene itself, ORM-specific graphene add-ons, and graphene libraries that add support for web frameworks such as flask or django. Graphene allows you to define your models, define the attributes in your model, and also parses queries to get the data for those particular models.

The ORM-specific graphene libraries basically do the job of turning your SQLAlchemy or Django models into Graphql objects with the help of graphene. Some libraries, like graphene-django, will also come with views that fit in with your web framework.

Playing with Graphene

Installation is quite easy, and graphene supports both python versions 2 and 3.

pip install graphene

Once you've installed it, import it, write a simple model, and then execute it like so:

import graphene  # 1

class Query(graphene.ObjectType):  # 2

    hello = graphene.String(description='A typical hello world')  # 3

    def resolve_hello(self, args, context, info):  # 4
        return 'World'

schema = graphene.Schema(query=Query)  # 5

query = '''
    query {
      hello
    }
'''  # 6
result = schema.execute(query) # 7

In 1 we are importing the package. Note that in 2 we are creating the query class. All query classes inherit from the graphene.ObjectType class. You can also embed queries within queries (this is great when you want to modularize your applications). Furthermore, even complex objects will also inherit from graphene.ObjectType.

This class basically holds all the models. Right now, there is only one model, and that is hello, which happens to be a simple string. Yes, I know it's not terribly exciting, but bear with me.

In 3 we add an object to the schema. In this case it is a simple string. In 4 we write the resolver for that particular model in the schema. We will get into the details of the function signature soon enough.

In 5 we basically declare the schema, and we tell it that the query class object is Query (a query object can be made up of multiple query objects, and we will get into that later).

Then, we declare the simplest query possible, in 6 and in 7 we finally execute it. This is what the result looks like:

In [6]: result = schema.execute(query)

In [7]: type(result)
Out[7]: graphql.execution.base.ExecutionResult

In [8]: result.data
Out[8]: OrderedDict([('hello', 'World')])

The result comes with three main attributes.

In [12]: result.data
Out[12]: OrderedDict([('hello', 'World')])

In [13]: result.errors

In [14]: result.invalid
Out[14]: False

data gives us the return data that we want. errors points to any errors that we might have,[1] and finally, invalid tells us whether the query itself was invalid.
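As a rough sketch of where data and errors typically end up once a result is sent over the wire, the snippet below builds a Graphql-style JSON body using only the standard library. The helper name to_response_body is an assumption made for this example and is not part of graphene. The shape it produces (a data key, plus an errors key only when something went wrong) follows the usual Graphql response convention.

```python
import json


def to_response_body(data, errors=None):
    """Build a Graphql-style JSON body; `errors` is included only when present.

    to_response_body is a hypothetical helper, not a graphene function.
    """
    body = {'data': data}
    if errors:
        body['errors'] = [{'message': str(error)} for error in errors]
    return json.dumps(body, sort_keys=True)


ok_body = to_response_body({'hello': 'World'})
failed_body = to_response_body(None, errors=[ValueError('boom')])

print(ok_body)
print(failed_body)
```

Note that a failed query is still described inside the body itself, which is why a client has to look at the errors entry instead of relying on the HTTP status code.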

Basic Types

So, now that we've gotten the hang of the fundamental parts of the Graphql system, it's time that we explored the other data types, and started playing around with more complex objects. There is already plenty of good documentation on this stuff, so I won't do a copy of the tutorials that are already available. The first thing to take a look at is the type reference.

There are quite a few types to go over, but in basic terms, there is a dichotomy between scalars and non-scalars. Scalars are the base object types: things like strings, integers, booleans, etc. Non-scalars, on the other hand, are more complex objects. Often they are containers of scalars, such as lists (graphene.List). They can also be interfaces that other graphene ObjectTypes inherit from, and of course there are mutation types, which make changes to the data.

In simple terms, there are a lot of types, but what it comes down to are basically types that hold information, types that contain basic scalar types, types that other types inherit from in a concrete sense and types that other objects inherit from in an abstract sense. Pretty much what you'd expect from a simple object oriented system.

A Few Opinions

However, what I have come to notice is that it is almost always better to avoid using abstract types, and just go straight for interfaces. Most of the time, you just don't need abstract objects. In the case of Graphql, composition is almost always better than inheritance. Probably the only place that it makes sense to use the AbstractType class is for Query types that you are later on going to use in your final Query object.

Make sure to keep your names simple, and avoid using graphene.NonNull, because you can always set required=True for scalar types anyway. This is similar to what you get in other serializer libraries, so people looking at the codebase can see commonalities between graphene and other libraries.

Integration with ORMs

If you think about it, at its core, graphene is basically a combination of a serialization library and a Graphql query interpreter, so it would make a lot of sense for it to work with ORMs and it does so quite well. At the time of writing this, there is support for django, sqlalchemy as well as google app engine. Most are really quite simple to integrate. Most of the time, you just have to set the meta model attribute of an object type and you're done.

Django

from django.db import models

from graphene_django import DjangoObjectType

class Account(models.Model):

    birth_date = models.DateField(db_column='personbirthdate', null=True)
    created_date = models.DateTimeField(blank=True, null=True, db_column='createddate')
    is_paying_customer = models.NullBooleanField(db_column='iscustomer')
    country = models.CharField(db_column='country', max_length=3, null=True, blank=True)
    customer_number = models.CharField( db_column='cnumber', unique=True, max_length=255,
        blank=True, null=True, editable=False)

    class Meta:
        managed = False
        db_table ='accountinfo'


class AccountType(DjangoObjectType):
    class Meta:
        model = Account

You can now use AccountType just like you would any other object type. But most of the time, you don't even have to manually add things to the query objects. If you have django-filter installed, you can do something pretty simple, and add graphene.Node to the list of interfaces for a given type. This allows you to set some variables in the Meta class that enable simple filtering, and also allows your type to be easily integrated into the query using DjangoFilterConnectionField. Here's an example of what that would look like from the perspective of the Account model:

class AccountNode(DjangoObjectType):
    class Meta:
        model = Account
        interfaces = (graphene.Node, )
        filter_fields = [
            'customer_number',
            'is_paying_customer',
        ]

And this is what it would look like in the query object:

import graphene
from graphene_django.filter import DjangoFilterConnectionField

class AccountQuery(graphene.AbstractType):
    # Gives you a particular account
    account = graphene.Node.Field(AccountNode)

    # Gives you all the accounts available
    all_accounts = DjangoFilterConnectionField(AccountNode, order_by='-customer_number')

This really simplifies your whole query process, and focuses on fat database models and not fat graphene objects. Note that I use graphene.AbstractType for this AccountQuery object because I'm basically going to use it as a mixin for my final Query object. Here is what that looks like:

from .queries import AccountQuery


class Query(AccountQuery, graphene.ObjectType):
    pass

schema = graphene.Schema(query=Query)

This way, your main query object won't be cluttered, and you can keep adding your queries as you please. Make sure to add graphene.ObjectType as the last argument for inheritance, otherwise your final Query object will not be a concrete class.
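One way to picture the mixin arrangement is through Python's method resolution order. The sketch below uses plain classes as stand-ins: BaseObjectType takes the place of graphene.ObjectType, and OrderQuery is a hypothetical second mixin. It only shows how Python orders the bases when the concrete class comes last, and makes no claim about what graphene does internally.

```python
# Plain-Python stand-ins, not graphene classes.

class BaseObjectType:
    """Plays the role of the concrete base class (graphene.ObjectType)."""

    def describe(self):
        return 'concrete object type'


class AccountQuery:
    """A mixin that only contributes fields."""
    account = 'account field'


class OrderQuery:
    """A second, hypothetical mixin."""
    all_orders = 'orders field'


# Mixins first, concrete base class last.
class Query(AccountQuery, OrderQuery, BaseObjectType):
    pass


mro_names = [cls.__name__ for cls in Query.__mro__]
print(mro_names)
```

The final class picks up the attributes of every mixin while still inheriting its behaviour from the concrete base at the end of the list.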

SQLAlchemy and Others

Other ORMs are similarly supported. With SQLAlchemy, you just use SQLAlchemyObjectType instead of DjangoObjectType. You can also add the Node interface to it, and get a querying experience similar to the one you have with django by using SQLAlchemyConnectionField. The same is true for google app engine. Peewee support is still in the works.

Integration with Web Frameworks

As you might expect, graphene supports a few popular frameworks, with more support coming. Right now there is awesome support for flask and django. Here is what a flask Graphql application looks like:

from flask import Flask
from flask_graphql import GraphQLView

from models import db_session
from schema import schema, Department

app = Flask(__name__)
app.debug = True

app.add_url_rule(
    '/graphql',
    view_func=GraphQLView.as_view(
        'graphql',
        schema=schema,
        graphiql=True
    )
)

@app.teardown_appcontext
def shutdown_session(exception=None):
    db_session.remove()

if __name__ == '__main__':
    app.run()

I would of course suggest that you have separate endpoints for graphiql and for your HTTP requests. You can also subclass GraphQLView if you want to add some custom authentication.

Final Thoughts

I really like Graphql. I've recently launched a production application built on top of Graphql, and graphene-django has really helped me replace my current REST API with a Graphql API (or service or whatever it is that you want to call it). The declarative syntax makes it quite easy to use, and so there really is not much of a learning curve, especially if you've used serializer libraries like Marshmallow or DRF's serializers.

Currently, graphene is basically a python port of the Graphql nodejs package at version 0.6. I'm pretty sure there will be version bumps in the future as the library has a healthy following. It also supports all kinds of clients including the apollo client.

[1]	You will always get back errors in a separate sub-object in a query if you have any. You will not get any errors in HTTP status codes.

Only Real-Time will Remain Standing

2015-11-21T12:14:00+06:00

The Internet has changed tremendously over the past few years. What started off as a platform to share research has become the hub of communication, commerce and entertainment. However, we are at the verge of the next stage in the evolution of web applications: the normalization of real-time applications.

This transition will not only see real-time web applications become ubiquitous but rather, and more fundamentally, become the de-facto standard for a web application. This eventual change is a result of a myriad of things, but mostly as a result of increased computing power, faster and cheaper storage as well as javascript becoming ridiculously more powerful.

There have been dramatic changes in technology that make this change possible. With the advent of SSDs and multi-core processors, as well as an architectural shift towards distributed systems, we suddenly have a lot more processing power at our disposal. These changes did not happen solely because we wanted to build real-time applications but rather, they arose out of a need to scale our sites to monolithic proportions.

As such, consequences will be dire for languages, frameworks and implementations that do not have first-class support for creating asynchronous or concurrent applications. Many will claim that non-real-time web applications will survive, but they will survive the way that static html pages (with little to no styling) do. In other words, they will become artifacts of a bygone era.

The underlying technology has become powerful enough to support this, but the rise of modern javascript libraries such as react, angular and others will only catalyze the eventual rise and dominance of such applications. RESTful web services are merely a precursor to full blown sites running mostly using websockets. In other words, we are in for a lot more single page applications.

Language-wise, golang and scala will become much more popular than they already are.[1] Languages like Python and Ruby will no longer remain in the positions they currently hold unless they get their acts together to support this new paradigm or find another use case entirely.[2] Although both Python and Ruby have their asynchronous frameworks and/or libraries, they are not powerful enough to face off against go, scala and nodejs, as they do not have an implementation fast enough to compete with the likes of go, which has concurrency primitives, and scala, which has many libraries for parallel code execution.

But languages alone will not face the consequences of this change; databases will need to evolve as well. One example of a database built solely for real-time applications is rethinkdb, which already has many adherents. Rethinkdb started off as a MySQL engine, but has evolved to become an independent database in and of itself, touting features like atomic changefeeds in its latest release.

So, when trying to master a language, framework or database, make sure to take into consideration the eventual rise of real-time applications and do your best to avoid languages, frameworks and databases that do not support this new paradigm. Otherwise, the skills that you're building might have very little value in the near future.

[1]	If you're asking why javascript isn't on that list, that's because its not possible for javascript to get more popular than it already is.

[2]	Python has already found a good niche in the realm of education and data-science, so I feel that it will still remain quite relevant or even as popular as it is now in the years ahead simply because data is blowing up in everyone's face.

Setting up Rethinkdb with Lunchy

2015-09-27T14:35:00+06:00

Table of Contents

Installation
Setup
Launching

Rethinkdb is a powerful new NoSQL database that comes with its own web-based admin console. However, it's a bit awkward to set up. The installation itself is not hard: there is a one-line homebrew installation, or, if you prefer to install it as an application, you can simply use the disk image provided as a download on their OSX installation page. However, the Rethinkdb command line tool is a bit awkward in the sense that it creates a directory to store your data in whatever directory you launch it from. This is a simple tutorial on how to set up rethinkdb as a service so that it always stores your data in the same place with a proper set of default configurations.

Installation

I would highly recommend using homebrew to install rethinkdb. It's a painless process and gives you full control as to how you use the rethinkdb CLI.

brew install rethinkdb

Now, we are going to install lunchy, which is a wrapper over launchctl, that basically starts and stops our services in a simple manner.[1] [2]

gem install lunchy

Setup

The first thing that we need to do is install the rethinkdb plist file in our LaunchAgents directory. We can do this by firstly creating a file called com.rethinkdb.server.plist and then running lunchy install on that file. My own configuration is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.rethinkdb.server</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/rethinkdb</string>
    <string>-d</string>
    <string>/Users/quazinafiulislam/Library/Rethinkdb/data</string>
  </array>
  <key>StandardOutPath</key>
  <string>/Users/quazinafiulislam/Library/Rethinkdb/out.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/quazinafiulislam/Library/Rethinkdb/error.log</string>
  <key>RunAtLoad</key>
  <false/>
  <key>KeepAlive</key>
  <true/>
  <key>LowPriorityIO</key>
  <false/>
</dict>
</plist>

Steps

Create a file using the touch command, i.e. touch com.rethinkdb.server.plist, and just paste in the contents.
Replace quazinafiulislam with your username in the file.[3]
Create a new directory, ~/Library/Rethinkdb. This place will house all your rethinkdb stuff i.e. data, logs etc.
Create a new data directory using rethinkdb create -d ~/Library/Rethinkdb/data.
Then install the file into your LaunchAgents directory using lunchy install com.rethinkdb.server.plist.
Once this is done, use the touch command to create the output log and the error log inside ~/Library/Rethinkdb, named out.log and error.log respectively.

This is the express installation, you can of course customize the installation to your needs. For example you can add your own .conf file for rethinkdb.[4] You can then add it as a parameter to your program arguments by adding the following to your ProgramArguments inside of array.

<string>--config-file</string>
<string>/path/to/your/config/file</string>

Feel free to change the plist file to your liking. One other thing that you might want to change is the RunAtLoad key. I've set this to false, but you might want rethinkdb to start up when you log into your system.
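If you would rather not edit the XML by hand, the same job definition can be generated with Python's standard plistlib module. This is only a sketch under a few assumptions: the function name build_rethinkdb_plist and the home directory /Users/yourname are placeholders invented for this example, and the rethinkdb binary is assumed to live at /usr/local/bin/rethinkdb as in my configuration above. It always writes complete paths, and it adds the --config-file argument only when you pass one.

```python
import plistlib
from pathlib import PurePosixPath


def build_rethinkdb_plist(home, config_file=None):
    """Return the launchd job definition as plist XML bytes.

    `home` must be a full path such as /Users/yourname (a placeholder here).
    """
    base = PurePosixPath(home) / 'Library' / 'Rethinkdb'
    arguments = ['/usr/local/bin/rethinkdb', '-d', str(base / 'data')]
    if config_file is not None:
        # Optional: point rethinkdb at your own .conf file.
        arguments += ['--config-file', str(config_file)]
    job = {
        'Label': 'com.rethinkdb.server',
        'ProgramArguments': arguments,
        'StandardOutPath': str(base / 'out.log'),
        'StandardErrorPath': str(base / 'error.log'),
        'RunAtLoad': False,
        'KeepAlive': True,
        'LowPriorityIO': False,
    }
    return plistlib.dumps(job)


# Print the generated XML; redirect it into com.rethinkdb.server.plist to use it.
print(build_rethinkdb_plist('/Users/yourname').decode('utf-8'))
```

Flipping RunAtLoad to True in the dictionary is all it takes if you want rethinkdb to start when you log in.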

Launching

Launching this is quite easy, all you need to do is tell lunchy that you want to start the rethinkdb service.

lunchy start rethinkdb

We can see that the standard output is printed to the out.log file inside of ~/Library/Rethinkdb.[5]

Stopping rethinkdb is also a breeze.

lunchy stop rethinkdb

[1]	The Wikipedia Page will give you a good idea of what both `launchd` and `launchctl` are.

[2]	If you want to know the difference between a launch agent and a daemon, take a look at this article.

[3]	Make sure that when you enter the path of a file or folder in a `.plist` file, you give the complete path without aliases.

[4]	Learn to configure rethinkdb from their configuration documentation page. I would strongly encourage you to create your own `.conf` file.

[5]	You can specify your port as another `ProgramArgument`.

Python 3 May Become Relevant Now

2015-08-03T03:30:00+06:00

Table of Contents

Type Hints
Async/Await
Summary

Python 3 has been mocked for a pretty long time for not having any big-ticket features. For the first four minor releases of Python 3 (3.1 to 3.4), this did not change. There are two features worth noting in Python 3.5: one is the type checker/hints,[1] and the other is the async/await keywords.[2] The question is, how relevant?

Type Hints

This PEP aims to provide a standard syntax for type annotations, opening up Python code to easier static analysis and refactoring, potential runtime type checking, and performance optimizations utilizing type information.

PEP 0484

Python now has a type checker. This does not mean that python is going to magically become a statically typed language. In other words, the type checker is merely a tool for a better development experience. For example, if your IDE/Editor knows what type you're working with, it can provide better code completion. Some IDEs actually provide rudimentary type checking.[3] However, this will standardize the way in which types are declared and that means this will raise the bar for all the python tools available at our disposal.
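A tiny example may help show what "merely a tool" means in practice. The function greet below is made up for illustration. Its annotations are stored on the function so that editors and checkers can read them, but the interpreter does not enforce them when the function is called.

```python
def greet(name: str, times: int = 1) -> str:
    """Repeat a greeting; the annotations are hints, not runtime checks."""
    return ' '.join(['Hello, {}!'.format(name)] * times)


# The hints are stored on the function for tools to read...
print(greet.__annotations__)

# ...but nothing stops a caller from ignoring them at runtime.
print(greet(42))
```

A static checker would complain about the second call, while plain Python happily runs it.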

Think of this as the python equivalent to flow, which is a static type checker for javascript. The type checking will happen through stub files, a lot like DefinitelyTyped's Typescript stubs.[4]

Other than better tooling, this also means that type-induced runtime errors can be thwarted before they happen, which is something that makes developing large code bases with Python a lot easier. You will hear no end of war stories about how a function that sometimes returned None deep in the source brought about pyapocalypse. Having type problems deep within your source code is by no means uncommon.

A friend of mine had to deal with this kind of problem once. The problem was with a function (buried deep in the source code as one of many decorators) that usually returned a list but, under certain circumstances, returned None. This wasn't noticed before because most of the time, the result would be checked in a simple if statement.

In other words bool(None) and bool([]) return the same thing, which is False. The problem was happening because another branch still thought that the returned value was a list and so tried to append to it. As you might expect, the traceback wasn't pretty.
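The sketch below is a made-up reconstruction of that kind of bug, not the actual code; find_tags and add_tag are invented names. Annotating the return value as Optional makes the "sometimes None" case visible, and a checker such as mypy is expected to flag any attempt to call append on it without handling None first.

```python
from typing import List, Optional


def find_tags(post: dict) -> Optional[List[str]]:
    """Usually returns a list, but returns None when the key is missing."""
    return post.get('tags')


def add_tag(post: dict, tag: str) -> List[str]:
    tags = find_tags(post)
    # `if not tags` cannot tell None from []: both are falsy.
    # Checking for None explicitly keeps the append below safe.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


print(bool(None) == bool([]))
print(add_tag({}, 'python'))
print(add_tag({'tags': ['web']}, 'python'))
```

Without the explicit None check, the first add_tag call would fail with an AttributeError, which is the sort of traceback described above.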

I'm not saying that the person who originally wrote the code is a bad programmer. I'll leave that up to you. What I am saying is that python allows you to make such silly mistakes. This is especially problematic when you're working with really large code-bases.

However, while this is all good in terms of development tools, this does not mean a large boost in performance. PyPy's FAQ talks about this issue and addresses some of the key concerns. [5] In a nutshell, the performance will not increase simply because python's types are objects and do not correspond directly to how types are represented in binary. For example, python's int type isn't necessarily a 64-bit number, but has the ability to grow to whatever size you need it to grow.

... annotations are at the wrong level (e.g. a PEP 484 “int” corresponds to Python 3’s int type, which does not necessarily fits inside one machine word; even worse, an “int” annotation allows arbitrary int subclasses).

PyPy FAQ (Would type annotations help PyPy’s performance?)
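Both points from the FAQ can be seen in a few lines. This is just an illustration with invented names (UserId and double): a Python int grows past the size of a machine word, and an int annotation is also satisfied by any subclass of int.

```python
import sys

big = 2 ** 100
print(big.bit_length())    # how many bits the number needs
print(big > sys.maxsize)   # larger than the biggest native index value


class UserId(int):
    """An int subclass; an `int` annotation accepts it too."""


def double(value: int) -> int:
    return value * 2


print(double(UserId(21)))
```

Since an annotated int could be any of these, the annotation alone does not tell a compiler that the value fits in a register.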

Async/Await

This proposal makes coroutines a native Python language feature, and clearly separates them from generators. This removes generator/coroutine ambiguity, and makes it possible to reliably define coroutines without reliance on a specific library. This also enables linters and IDEs to improve static code analysis and refactoring.

PEP 0492

async and await are very good ideas. In fact, these two keywords have been an integral part of C# for quite some time now and make programming with tasks much easier. These two keywords make python feel a lot more abstract, whereas turning generators into coroutines (even through the use of a decorator) and then using them like generators feels rather clumsy. In simple terms, async creates a coroutine for you. For example, the following is a coroutine written in the older, generator-based style:

import asyncio
import datetime

@asyncio.coroutine
def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        yield from asyncio.sleep(1)

loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()

The above will turn into the following under PEP-492:

import asyncio
import datetime

async def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()

However, what is being introduced is syntactic sugar, meaning that this does not change Python on a fundamental level. For example, introducing these native constructs will not magically make Python able to compete with node (V8).
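To make "async creates a coroutine for you" concrete, here is a small sketch. It assumes Python 3.7 or newer, because it uses asyncio.run, which arrived after PEP 492 itself; add_later is a made-up function for illustration.

```python
import asyncio
import inspect


async def add_later(a, b):
    # Yield control to the event loop once, then produce a result.
    await asyncio.sleep(0)
    return a + b


# Calling an "async def" function does not run its body;
# it only builds a coroutine object.
coro = add_later(1, 2)
print(inspect.iscoroutine(coro))

# The coroutine only runs once an event loop drives it.
result = asyncio.run(coro)
print(result)
```

No decorator and no yield from are involved, which is the whole point of the new syntax.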

Summary

The new features being added are subtle but will go a long way toward making Python 3 the abstract language that it needs to be, as well as providing the tooling required to make it a potential choice for huge code-bases. However, this does not mean that Python has become a better choice than other languages such as golang for high performance applications. So, although these new features will definitely persuade people to move from Python 2 to Python 3 (at least for new projects), it does not necessarily mean that fewer people will leave Python seeking abstract yet high performance alternatives.

[1]	PEP-0484. This type checker is based on the mypy type checker created by one of the authors of PEP-0484, Jukka Lehtosalo.

[2]	PEP-0492. This is a pretty radical change since it means Python will get new key words.

[3]	PyCharm provides type checking through its skeletons. These are a lot like the stubs that PEP-0484 is proposing.

[4]	DefinitelyTyped is a collection for stub files for a language called TypeScript which is basically a typed superset of JavaScript. These stub files help IDE and editors in providing better code completion.

[5]	PyPy's FAQ talks about this issue, and its lack of effect on performance under "Would type annotations help PyPy’s performance?". Look under the "PEP-484 Type Hints" sub section.

Potential Pythonic Pitfalls

2015-05-11T15:42:00+06:00

Table of Contents

Not Knowing the Python Version
Obsessing Over One-Liners
Initializing a set the Wrong Way
Misunderstanding the GIL
Using Old Style Classes
Iterating the Wrong Way
Using Mutable Default Arguments
Takeaway
Update

Python is a very expressive language. It provides us with a large standard library and many builtins to get the job done quickly. However, many can get lost in the power that it provides, fail to make full use of the standard library, value one-liners over clarity, and misunderstand its basic constructs. This is a non-exhaustive list of a few of the pitfalls programmers new to Python fall into.

Not Knowing the Python Version

This is a recurring problem in StackOverflow questions. Many write perfectly working code for one version but they have a different version of Python installed on their system.[1] Make sure that you know the Python version you're working with. You can check via the following:

$ python --version
Python 2.7.9
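The same check can be made from inside a script, which helps when it is not obvious which interpreter is actually running the code. This is a minimal sketch; describe_python is a made-up helper name.

```python
import sys


def describe_python():
    """Return the running interpreter's version as a short string."""
    major, minor, micro = sys.version_info[:3]
    return "Python %d.%d.%d" % (major, minor, micro)


print(describe_python())

# Branch on the major version when a script must behave differently.
if sys.version_info[0] < 3:
    print("Running under Python 2")
else:
    print("Running under Python 3 or newer")
```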

Not Using a Version Manager

pyenv is a great tool for managing different Python versions. Unfortunately, it only works on *nix systems. On Mac OS, one can simply install it via brew install pyenv and on Linux, there is an automatic installer. [2]

Obsessing Over One-Liners

Some get a real kick out of one-liners. Many boast about their one-liner solutions even if they are less efficient than a multi-line solution.

What this essentially means in Python is convoluted comprehensions having multiple expressions. For example:

l = [m for a, b in zip(this, that) if b.method(a) != b for m in b if not m.method(a, b) and reduce(lambda x, y: a + y.method(), (m, a, b))]

To be perfectly honest, I made the above example up. But I've seen plenty of people write code like it. Code like this will make no sense in a week's time. If you're trying to do something a little more complex than simply adding an item to a list or set with a condition, then you're probably making a mistake.

One-liners are not achievements. Yes, they can seem very clever, but they are not achievements. It's like thinking that shoving everything into your closet is an actual attempt at cleaning your room. Good code is clean, easy to read and efficient.
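As a small illustration of the trade-off, here is the same task written both ways. The data and the priciest_items helper are made up for this sketch; the two versions agree on this data, but only one of them can be read at a glance.

```python
# Hypothetical data: (customer, item prices) pairs, made up for illustration.
orders = [("ann", [5, 20, 35]), ("bob", [50]), ("cy", [])]

# Dense version: everything crammed into one comprehension.
dense = [(name, p) for name, prices in orders if prices for p in prices if p > 10 and p == max(prices)]


def priciest_items(orders, threshold=10):
    """Return (customer, top price) for customers whose top price beats threshold."""
    result = []
    for name, prices in orders:
        if not prices:          # skip customers with no items
            continue
        top = max(prices)       # computed once, and named
        if top > threshold:
            result.append((name, top))
    return result


print(dense)
print(priciest_items(orders))
```

The loop is longer, but each step has a name and a place for a comment.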

Initializing a set the Wrong Way

This is a more subtle problem that can catch you off guard. set comprehensions are a lot like list comprehensions.

>>> { n for n in range(10) if n % 2 == 0 }
{0, 8, 2, 4, 6}
>>> type({ n for n in range(10) if n % 2 == 0 })
<class 'set'>

The above is one such example of a set comprehension. Sets are like lists in that they are containers. The difference is that a set cannot have any duplicate values and sets are unordered. Seeing set comprehensions, people often make the mistake of thinking that {} initializes an empty set. It does not; it initializes an empty dict.

>>> {}
{}
>>> type({})
<class 'dict'>

If we wish to initialize an empty set, then we simply call set().

>>> set()
set()
>>> type(set())
<class 'set'>

Note how an empty set is denoted as set() but a set containing something is denoted as items surrounded by curly braces.

>>> s = set()
>>> s
set()
>>> s.add(1)
>>> s
{1}
>>> s.add(2)
>>> s
{1, 2}

This is rather counterintuitive, since you'd expect something like set([1, 2]).

Misunderstanding the GIL

The GIL (Global Interpreter Lock) means that only one thread in a Python program can be running at any one time. This implies that when we create a thread and expect it to run in parallel, it doesn't. What the Python interpreter is actually doing is quickly switching between different running threads. But this is an oversimplified version of what is actually happening. There are many instances in which things do run in parallel, like when using libraries that are essentially C extensions. But when running Python code, you don't get parallel execution most of the time. In other words, threads in Python are not like threads in Java or C++.

Many will try to defend Python by saying that these are real threads.[3] This is indeed true, but does not change the fact that how Python handles threads is different from what you'd generally expect. This is the same case for a language like Ruby (which also has an interpreter lock).

The prescribed solution to this is using the multiprocessing module. The multiprocessing module provides you with the Process class which is basically a nice cover over a fork. However, a fork is much more expensive than a thread, so you might not always see the performance benefits since now the different processes have to do a lot of work to co-ordinate with each other.

However, this problem does not exist in every implementation of Python. PyPy-stm, for example, is an implementation of Python that tries to get rid of the GIL (still not stable yet). Implementations built on top of other platforms like the JVM (Jython) or CLR (IronPython) do not have GIL problems.

All in all, be careful when using the Thread class, what you get might not be what you expect.
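Threads are still useful when the work is mostly waiting, because a blocking call such as time.sleep releases the GIL while it waits. The sketch below uses sleep as a stand-in for I/O; fake_io and run_threaded are made-up names, and the timing will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds):
    # Sleeping releases the GIL, much like waiting on a socket or a file.
    time.sleep(seconds)
    return seconds


def run_threaded(n_tasks=4, seconds=0.2):
    """Run n_tasks waits on separate threads and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(fake_io, [seconds] * n_tasks))
    return results, time.perf_counter() - start


results, elapsed = run_threaded()
# Run back to back, the four waits would take about 0.8 seconds in total.
print(len(results), round(elapsed, 2))
```

The waits overlap, so the batch finishes in roughly the time of a single wait. CPU-bound pure Python code would not see the same benefit.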

Using Old Style Classes

In Python 2 there are two types of classes, there's the "old style" classes, and there's the "new style" classes. If you're using Python 3, then you're using the "new style" classes by default. In order to make sure that you're using "new style" classes in Python 2, you need to inherit from object for any new class you create that isn't already inheriting from a builtin like int or list. In other words, your base class, the class that isn't inheriting from anything else, should always inherit from object.

class MyNewObject(object):
    # stuff here
    pass

These "new style" classes fix some very fundamental flaws in the old style classes that we really don't need to get into. However, if anyone is interested they can find the information in the related documentation.

Iterating the Wrong Way

It's very common to see the following code from users who are relatively new to the language:

for name_index in range(len(names)):
    print(names[name_index])

There is no need to call len in the above example, since iterating over the list is actually much simpler:

for name in names:
    print(name)

Furthermore, there are a whole host of other tools at your disposal to make iteration easier. For example, zip can be used to iterate over two lists at once:[4]

for cat, dog in zip(cats, dogs):
    print(cat, dog)

If we want both the index and the value while iterating over a list, we can use enumerate:[5]

for index, cat in enumerate(cats):
    print(cat, index)
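Two details of these builtins are worth seeing in action: zip stops at the shorter input, and enumerate takes a start argument. The lists here are made up for the sketch.

```python
cats = ["Tom", "Felix", "Garfield"]
dogs = ["Rex", "Odie"]

# zip stops as soon as the shorter list runs out, so "Garfield" is dropped.
pairs = list(zip(cats, dogs))
print(pairs)

# enumerate can start counting from any number, not only 0.
numbered = list(enumerate(cats, start=1))
print(numbered)
```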

There are also many useful functions to choose from in itertools. Please note, however, that using itertools functions is not always the right choice. If one of the functions in itertools offers a very convenient solution to the problem you're trying to solve, like flattening a list or getting the permutations of the contents of a given list, then go for it. But don't try to fit it into some part of your code just because you want to.
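The two cases mentioned above, flattening and permutations, are the kind of problem where itertools genuinely fits. A minimal sketch with made-up data:

```python
from itertools import chain, permutations

nested = [[1, 2], [3], [4, 5]]

# Flatten one level of nesting.
flat = list(chain.from_iterable(nested))
print(flat)

# Every ordered pair of distinct letters from "abc".
letter_pairs = list(permutations("abc", 2))
print(letter_pairs)
```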

The problem with itertools abuse happens so often that one highly respected Python contributor on StackOverflow has dedicated a significant part of their profile to it.[6]

Using Mutable Default Arguments

I've seen the following quite a lot:

def foo(a, b, c=[]):
    # append to c
    # do some more stuff
    pass

Never use mutable default arguments, instead use the following:

def foo(a, b, c=None):
    if c is None:
        c = []
    # append to c
    # do some more stuff

Instead of explaining what the problem is, it's better to show the effects of using mutable default arguments:

In[2]: def foo(a, b, c=[]):
...     c.append(a)
...     c.append(b)
...     print(c)
...
In[3]: foo(1, 1)
[1, 1]
In[4]: foo(1, 1)
[1, 1, 1, 1]
In[5]: foo(1, 1)
[1, 1, 1, 1, 1, 1]

The same c is being referenced again and again every time the function is called. This can have some very unwanted consequences.
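The shared list can be inspected directly, because the default is created once at definition time and stored on the function object. The sketch below returns the list instead of printing it so that the identity can be checked; bar is the safe variant under a made-up name.

```python
def foo(a, b, c=[]):
    c.append(a)
    c.append(b)
    return c


first = foo(1, 1)
second = foo(1, 1)

# Both calls handed back the very same list object.
print(first is second)
# The default lives on the function itself and keeps growing.
print(foo.__defaults__)


def bar(a, b, c=None):
    # A fresh list is built on every call that omits c.
    if c is None:
        c = []
    c.append(a)
    c.append(b)
    return c


print(bar(1, 1), bar(1, 1))
```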

Takeaway

These are just some of the problems that one might run into when relatively new at Python. Please note however, that this is far from a comprehensive list of the problems that one might run into. The other pitfalls however are largely to do with people using Python like Java or C++ and trying to use Python in a way that they are familiar with. So, as a continuation of this, try diving into things like Python's super function. Take a look at classmethod, staticmethod and __slots__.

Update

Last Updated on 12 May 2015 4:50 PM (GMT +6)

Made the section on Misunderstanding the GIL better.

[1]	Most people are taught Python using Python 2. However, when they go home and try things out themselves, they install Python 3 (quite a natural thing to install the latest version).

[2]	pythonz is an alternative to pyenv that's worth checking out if pyenv doesn't work for you.

[3]	When people talk about real threads what they essentially mean is that these threads are real CPU threads, which are scheduled by the OS (Operating System).

[4]	https://docs.python.org/3/library/functions.html#zip

[5]	`enumerate` can be further configured to produce the kind of index you want. https://docs.python.org/3/library/functions.html#enumerate

[6]	http://stackoverflow.com/users/908494/abarnert

An Open Letter To StackOverflow Election Winners

2015-04-21T14:21:00+06:00

Almost three years ago, I found a relatively new site on the internet. It kept popping up on the first few search results whenever I would be looking for an answer to a programming problem. You had trouble with Java? StackOverflow. You don't understand how python generators work? StackOverflow. You have a question about how C++ friend classes worked? StackOverflow. Because it was such a great site open to free membership, I decided to join.

My time as a StackOverflow user started terribly. The first question I asked was also one of the first questions I deleted, because I thought it was so terrible!

I was transitioning from Java to Python after taking an online course on Python. I was quite new to computer science too. One of my very first questions was about the unary operator, ~.[1] I didn't even know what it was called back then. Was it a tridle? Was it a twiddle? What was it even called in the context of computer science? I didn't know. That's why I asked the question; I had never used it in writing. I searched Google, and all I got was several links pertaining to the @ symbol. I checked again just now and it still does.

I thought to myself that it's a simple question, and I was sure that it would be a valid one. It was down-voted six times in total, where I thought it would be amicably met. If you're the kind of arse that gets a kick out of making other people feel crap about themselves, then the above example has earned nothing but a condescending sneer from you. If so, then you don't need to keep on reading.

This was a community that I admired, valued and looked up to. When six members of this community conveyed how stupid my question was, I was totally crushed.

Wouldn't you be? If the very community you had come to admire had told you that you had the intelligence of a goat, how would you feel? I felt so ashamed that I deleted the question. Such a stupid question is undeserving of StackOverflow. In a site that is completely transparent like StackOverflow, all your stupidity is neatly documented in the questions that you've asked. All your deficiencies are there for the world to see. So I started deleting my questions.

But then, I got banned from asking questions entirely. No one told me why I was banned either. Did the StackOverflow bot determine that I was a terrible programmer, and that I did not belong here? Did I not make the cut? Was there a limit to how many times you could be down-voted when starting out?

Fast-forward to now, and you still get this kind of behaviour.[2] Trigger happy down-votes and close votes are the norm now. The thing that's changed is that I've answered a few questions and asked a few more too.

I wanted to stand in the elections. I didn't because I don't think I'm anywhere near good enough. You need to have plenty of reputation and badges to stand a chance of winning. I don't, so I voted for the people who I believe will be a little kinder and gentler to new users.

So, whoever you three are, I hope you're kinder to newbies. You might think that StackOverflow is too big and too important to fail. Make no mistake that bars need to be set and examples made. However, that does not give us license to be cruel to new users who value what we do here and want to contribute.

I know I'm no one to tell the highly esteemed winners of the elections what to do. I forfeited that right when I chose not to stand. However, I beg of you to consider the consequences of the chronic levels of intolerance that we show newcomers.

[1]	http://stackoverflow.com/questions/18742078/what-does-the-operator-do-in-python

[2]	You can only see this if you have the ability to see deleted questions. OP asked a simple question, got down-voted 8 times. Eventually deleted the question. This could eventually lead to OP getting question-banned.

The Deceptive Anagram Question

2015-04-16T00:10:00+06:00

Table of Contents

The Question
- An Example
Quadratic Time
- Initial Solution
Linear Time
A Vector Approach to Hashing
The Final Frontier
- But There's An Import For That
Lessons Learnt
Acknowledgments

Last month, I had a programming interview. It hadn't gone as I would've liked, but I did get asked a question that I found interesting. The question is deceptively simple, but has a lot of depth to it. Since I failed to solve the problem correctly in the interview, I decided to explore the ways in which I could optimize my initially O(n²) solution.

After a few attempts at solving the problem from different angles, I've come to appreciate the importance of understanding complexity, as well as its limitations.

The Question

The question was simple:

Given a list of words L, find all the anagrams in L in the order in which they appear in L.

An Example

Given the input

["pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop"]

The desired output would be

["pool", "loco", "cool", "stain", "satin", "loop"]

in that order exactly.

Quadratic Time

The naive solution to this will give you an O(n²) algorithm. Be warned, the following code may burn your eyes.

Initial Solution

from collections import Counter


def anagram_finder(word_list):

    _ret = set()
    for word in word_list:
        c = Counter(word)
        for other_word in [w for w in word_list if w != word]:
            if Counter(other_word) == c:
                _ret.add(word)
                _ret.add(other_word)
    return [w for w in word_list if w in _ret]

if __name__ == '__main__':

    print(anagram_finder(["pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop"]))

This is wrong on so many levels, I don't even know where to begin.

The solution solves the problem in quadratic time, meaning the work grows with the square of the input: double the word list and it takes roughly four times as long. Imagine if we were to put an algorithm like this on a server; we would have serious scaling issues.
collections.Counter is expensive. It needs to create a dictionary, then it needs to add each character to the dictionary, that means hashing each character.
It adds the original word every single time it finds an anagram, and relies on set not to add duplicate values.

I'm probably missing a few points but the point remains; this is seriously bad code.

Linear Time

So, how can we turn this problem from an O(n²) solution into an O(n) solution? Using hash-maps correctly. Unfortunately, I did not come up with the brilliant idea of using a hash-map, but rather my interviewer told me that the way to get O(n) was to use a hash-map.

Hashing The Right Way

In general, whenever you hear the word "hash" you think md5 or SHA. But in reality, a hash is a way to map data in a uniform way. Think of it like this: if you have the words pool and loop, in the eyes of the anagram solver, they are the same. Why? Because both words use the same characters. In other words, there had to be a uniform way of converting these two words into the same thing. If we were to simply sort the characters in the word, we'd get exactly what we're looking for. Here's a demonstration:

>>> sorted("loop")
['l', 'o', 'o', 'p']
>>> sorted("pool")
['l', 'o', 'o', 'p']
>>> "".join(sorted("loop"))
'loop'
>>> "".join(sorted("pool"))
'loop'

With that, I had my hashing function and with it, I had my linear solution.

def hasher(w):  # 1
    return "".join(sorted(w))


def anagram_finder(word_list):

    hash_dict = {}  # 2

    for word in word_list:
        h = hasher(word)  # 3

        if h not in hash_dict:  # 4
            hash_dict[h] = []  # 5

        hash_dict[h].append(word)  # 6

    return [anagram for l in hash_dict.values() for anagram in l if len(l) > 1]  # 7

if __name__ == '__main__':
    print(anagram_finder(["pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop"]))

In 1, we create the hasher function. The hash is simple, it sorts the string alphabetically using sorted, which returns a list, which we then use as an iterable for "".join to create a string. We do this because python lists are not hashable (because they are mutable).

In 2, inside the anagram_finder function, we create a hash_dict, a dictionary for all our hashes. It must be pointed out that the dictionary, when adding new keys, will hash those keys as well.[1] That hashing is O(k) where k is the length of the word in question, and our own hasher is O(k log k) because it sorts. Both depend only on the length of the word, so there is no issue with the size of the list we're given.

In 3 we actually call the hasher to hash the string. In 4, we check to see if this hash exists in the keys of hash_dict. If not, then we create a new list so that we can append words to it in 5.

In the end, we always append the word to the list of that key in 6. This means that every key will always have at least one value stored in its list, and the keys whose lists hold only a single word are the ones we don't want.

The simplified version of 7 is as follows:

_ret = []
for l in hash_dict.values():
    if len(l) > 1:
        _ret += l

The Pythonic Version

The above is great for explanation, but the pythonic version is much smaller:

from collections import defaultdict


def hasher(w):
    return "".join(sorted(w))


def anagram_finder(word_list):

    hash_dict = defaultdict(list)

    for word in word_list:
        hash_dict[hasher(word)].append(word)  # 1

    return [anagram for l in hash_dict.values() for anagram in l if len(l) > 1]

if __name__ == '__main__':
    print(anagram_finder(["pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop"]))

We've made the code significantly smaller by using a defaultdict in 1. defaultdict allows us to give it a factory, in this case list, that will automatically create a list with a key if a key does not exist. If it does exist, then it will return that list, and we can append to it.[3]

But wait, we forgot about the ordering.

Ordering Done Right

We have a quick fix for the ordering, and that is simply to loop through all the words in the initial list and include only those that are in the list of anagrams. The solution is still O(n), but remains highly inefficient. One thought might be to use the collections.OrderedDict class. But although that might seem to work, the ordering will still not match the original in the case where anagrams are not next to each other. For example, the following piece of code prints the output shown right after it:

from collections import OrderedDict


def hasher(w):
    return "".join(sorted(w))


def anagram_finder(word_list):

    hash_dict = OrderedDict()

    for word in word_list:
        hash_dict.setdefault(hasher(word), []).append(word)

    return [anagram for l in hash_dict.values() for anagram in l if len(l) > 1]

if __name__ == '__main__':
    print(anagram_finder(["nala", "pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop", "laan"]))

['nala', 'laan', 'pool', 'loop', 'loco', 'cool', 'stain', 'satin']

nala and laan should not be next to each other. This is because collections.OrderedDict remembers the order in which the keys were added.

So, in the end, I stuck to the following:

from collections import defaultdict


def hasher(w):
    return "".join(sorted(w))


def anagram_finder(word_list):

    hash_dict = defaultdict(list)

    for word in word_list:
        hash_dict[hasher(word)].append(word)

    return [word for word in word_list if len(hash_dict[hasher(word)]) > 1]

if __name__ == '__main__':
    print(anagram_finder(["nala", "pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop", "laan"]))

With that, we solved the ordering problem and still managed to make it linear.

Profiling

We never know how well something will actually work unless we profile it. So I got a big list of words[2] and got to profiling.

from collections import defaultdict


def hasher(w):
    return "".join(sorted(w))


def anagram_finder(word_list):

    hash_dict = defaultdict(list)

    for word in word_list:
        hash_dict[hasher(word)].append(word)

    return [word for word in word_list if len(hash_dict[hasher(word)]) > 1]

if __name__ == '__main__':

    with open('wordsEn.txt') as f:
        w_list = f.read().split()
        anagram_finder(w_list)

I used a library called profiling, and it can be installed with a simple pip install. After profiling the above code, I found that my hasher function was actually making the whole process a lot slower, because it was being called twice.

So, my first attempt (like a good pythonista) was to use functools.lru_cache. That attempt failed spectacularly.

from collections import defaultdict
from functools import lru_cache

@lru_cache()
def hasher(w):
    return "".join(sorted(w))


def anagram_finder(word_list):

    hash_dict = defaultdict(list)

    for word in word_list:
        hash_dict[hasher(word)].append(word)

    return [word for word in word_list if len(hash_dict[hasher(word)]) > 1]

if __name__ == '__main__':

    with open('wordsEn.txt') as f:
        w_list = f.read().split()
        anagram_finder(w_list)

This is because there were excessive calls being made all over the place. Even increasing the size of the cache from lru_cache(), which has a default of 128, to lru_cache(11000), which is roughly the size of the word list I'm using, did not help. In fact, increasing the lru_cache size to such an amount slowed down the program so much that I didn't wait for it to finish, but the root problem was the same; too many calls were being made all over the place.
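One plausible contributor, which is an assumption here rather than something the profiling run established, is that a small LRU cache gets no hits at all when the whole list is scanned twice in order: by the time the second pass reaches a word, that word has already been evicted. The sketch below uses cache_info() to show this on a made-up list of 1000 distinct strings; make_hasher and two_passes are hypothetical helpers.

```python
from functools import lru_cache


def make_hasher(maxsize):
    """Build a fresh cached hasher so each experiment starts with an empty cache."""
    @lru_cache(maxsize=maxsize)
    def hasher(w):
        return "".join(sorted(w))
    return hasher


def two_passes(hasher, words):
    # Mimic anagram_finder: hash every word once, then hash every word again.
    for w in words:
        hasher(w)
    for w in words:
        hasher(w)
    return hasher.cache_info()


words = ["word%d" % i for i in range(1000)]  # 1000 distinct made-up words

small = two_passes(make_hasher(128), words)
unbounded = two_passes(make_hasher(None), words)

print(small.hits, small.misses)
print(unbounded.hits, unbounded.misses)
```

With the small cache every call is a miss, so the program pays the bookkeeping cost of the cache on top of the sorting it was supposed to avoid.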

So, comparing the two programs, the one without lru_cache was significantly faster than the one with it (4.77 seconds versus 15.31 seconds). With lru_cache ruled out as a feasible solution, I decided to just use a normal dictionary to store hashes. In our original attempt, hasher was being called twice, once to add words to the dictionary, and then in the list comprehension. Why not just use a dictionary to store the hash and the word?

from collections import defaultdict


def hasher(word):
    return "".join(sorted(word))


def anagram_finder(word_list):

    hash_dict = defaultdict(list)
    hashes = {}

    for word in word_list:
        hashes[word] = hasher(word)
        hash_dict[hashes[word]].append(word)

    return [word for word in word_list if len(hash_dict[hashes[word]]) > 1]

if __name__ == '__main__':

    with open('wordsEn.txt') as f:
        w_list = f.read().split()
        anagram_finder(w_list)

We just create a new dictionary, hashes, to store all our hash and word pairs. This resulted in a speedup.

A Vector Approach to Hashing

Another approach to hashing would be to use a vector to determine the number of letters that appear in a word. This is better explained through code:

def hasher(word):
    h = [0] * 26                   # 1
    for c in word:
        h[ord(c) - ord('a')] += 1  # 2
    return tuple(h)                # 3

In 1, we create a list of length 26, having all its values initialized to 0. In 2, we calculate the rank of the letter in the English alphabet using ord(c) - ord('a'). We use this rank as the index of our list. We then add 1 to the value at that index. In other words, we are merely counting the frequency of the letters in a word; very similar to a histogram. In 3, we convert the list to a tuple so that it is hashable and can be used as a dictionary key.

The above is O(n) compared to the O(n log(n)) solution that sorting a string gives us (n is the number of characters in the word here). However, considering that the average size of a word in the English language is actually 5 and the average size of a word in my wordsEn.txt file is 8, O(n) actually becomes the smaller problem here, and the constant time that is required to create a list of 26 items is the bigger problem. In other words, it's O(26) for list creation and initialization. It's O(n) for creating the rank and it's O(26) for creating the tuple.

Compare that to O(n log(n)) worst case for using sorted. In this particular use case since n is very low, the better option is to use sorted.
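A rough way to check this on your own machine is timeit. The sketch below is not the measurement from this post; the word sample is made up, the function names are mine, and the numbers will differ between machines and Python versions. Note that the vector version assumes lowercase a to z only.

```python
import timeit


def sorted_hasher(word):
    return "".join(sorted(word))


def vector_hasher(word):
    # Assumes lowercase a-z only; other characters would index the wrong slot.
    counts = [0] * 26
    for c in word:
        counts[ord(c) - ord('a')] += 1
    return tuple(counts)


sample = ["pool", "loop", "stain", "satin", "pretty"]  # short words, as in the post

for name, fn in (("sorted", sorted_hasher), ("vector", vector_hasher)):
    seconds = timeit.timeit(lambda: [fn(w) for w in sample], number=10000)
    print(name, round(seconds, 4))
```

Both functions group anagrams identically, so whichever is faster for your word lengths can be dropped into anagram_finder unchanged.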

The Final Frontier

The best solution to this problem however is not one of my own making, but rather one that a friend of mine (Alexander) provided me with. My solution actually is superfluous in many cases, but the following cuts straight to the point:

from collections import defaultdict


def get_anagrams(words):

    normalized = [''.join(sorted(w)) for w in words]  # 1

    d = defaultdict(int)  # 2

    for n in normalized:  # 3
        d[n] += 1         # 4

    # The built-in zip is lazy on Python 3; on Python 2, itertools.izip is the lazy equivalent.
    return [w for w, n in zip(words, normalized) if d[n] > 1]  # 5

The genius of this approach is to use collections.defaultdict with int as the factory. This basically means that you have a smaller memory footprint. So, in 1, we get a list of hashed values (which he calls normalized). In 2, we create a defaultdict of int, this means that all the keys will have an initial value of 0.

In 3, we loop over all the normalized/hashed values. 4 is where the magic happens. So, there will be duplicates in normalized, and we are going to count the number of times each of those normalized values appear.

In 5, we zip over the initial word list, words, and normalized. Remember that all the words are normalized. All we do then is check how many times the normalized value of each word appears in d. If there's more than one occurrence, we add the word to the final list in this list comprehension.

But There's An Import For That

In the very beginning, I used collections.Counter, to provide the worst possible solution to the problem. But, I think this time, we can actually use collections.Counter correctly, because 2, 3 and 4 can be shortened to just Counter(normalized). So the final final (gosh, that feels javaish) version is:

from collections import Counter


def get_anagrams(words):

    normalized = [''.join(sorted(w)) for w in words]

    d = Counter(normalized)  # replaces steps 2, 3 and 4

    return [w for w, n in zip(words, normalized) if d[n] > 1]
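As a quick sanity check, here is the Counter-based version run against the example from the start of the post and against the nala/laan list that tripped up OrderedDict. This sketch is written for Python 3, so it uses the built-in zip.

```python
from collections import Counter


def get_anagrams(words):
    # Normalize each word so that anagrams share the same key.
    normalized = [''.join(sorted(w)) for w in words]
    # Count how often each normalized form occurs.
    d = Counter(normalized)
    # Walk the original list so the input order is preserved.
    return [w for w, n in zip(words, normalized) if d[n] > 1]


example = ["pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop"]
print(get_anagrams(example))

spread_out = ["nala", "pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop", "laan"]
print(get_anagrams(spread_out))
```

In both cases the words come back in the order in which they appear in the input, which is what the question asked for.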

Lessons Learnt

Hash Maps are very important. Learn to use them properly.
When making an algorithm don't think standard library, think of finding the best algorithmic solution.
After you've found the algorithmic solution try optimizing it using the standard library.
Quadratic time is worse than you can imagine.
Complexity analysis alone is a blunt instrument so always profile.
Just because an algorithm has better worst case complexity doesn't mean that it's the best one for the job.
Sometimes our brains stop working under pressure. Take a deep breath, stop worrying about what will happen if you don't solve the problem and focus on all the good things that will happen if you do.

Acknowledgments

Ashwin (@inspectorG4dget) for helping me answer this question and explaining the vector approach to hashing. Ashwin's help has been paramount to my understanding.

To my interviewer for giving me a very good question, and to Ridwan (@hjr265) for pointing out flaws in my solution and for encouraging me to write a blog post on this problem.

Alexander (@metaprogrammer) for proof reading this post and providing me with the optimal solution to this problem.[4]

[1]	Python Hash Algorithms explains in detail how the different hash functions in Python actually work.

[2]	wordsEn.txt is the list I used.

[3]	Using defaultdict in Python is a great writeup on how to use `defaultdict`. It explains the concept of a factory as well.

[4]	Alexander Kozlovsky is the creator of Pony ORM.

PyCharm: The Good Parts II

2015-01-18T18:06:00+06:00

Table of Contents

Code Completion
Editor
Up Next
Previously
Update

Before we begin, I must talk about what I mean by editing, and what I intend to cover. I believe that the most important thing about editing is code completion. In this section I pay special attention to python although PyCharm offers superb code completion for other languages like SQL and JavaScript.[1]

I start off with code completion, because if you're like me, you have little time and want to dive into the thick of things and code completion is what we want most out of an IDE.

Then, I talk about a couple of editor settings that help keep me sane.

Code Completion

PyCharm's editor is very powerful and extensible. In this section we will go over some useful tools that I use to write less error prone code faster.

Smart Completion

You get suggestions as you type. You can further enhance this by the following:

Collecting runtime information
Adding docstrings to functions and methods
Invoking assert statements

Runtime Information

Let's start with the easiest enhancement. All we need to do is check this box, and PyCharm will give you better code completion as you debug your program more:

If you ever get wrong suggestions for completion clear the caches. Running your tests in debug mode will also give PyCharm a better chance to get an understanding of the types that you're using.

Docstrings

Writing docstrings helps PyCharm, but also helps developers gain a better understanding of the code that you have written. PyCharm understands docstrings and uses the information in them to give you code completion and give you warning signals if it thinks you're being inconsistent:

def foo(a, b, c):
    """

    :param a: A for apple
    :type a: str

    :param b: B for ball
    :type b: str

    :param c: C for cat
    :type c: str

    :return: The answer to everything is 42. Remember ye well!
    :rtype: int
    """
    return 42

if __name__ == '__main__':

    foo(1, 2, 3)

In the example above, I've passed integers where the docstring declares strings, so PyCharm tells me that I'm being inconsistent:

But there's more! If you wrote docstrings, then PyCharm will give you a beautiful documentation popup about your variables (by using quick documentation):[2]

Assert Statements

assert statements can help tell PyCharm what types to expect. Please note that assert statements are not overhead free, unlike docstrings. I would advise using them in tests, so that the debugger can collect type information about your parameters. For example, the following will give code completions for a but not for b:

def foo(a, b, c):

    assert isinstance(a, str)

    return 42

This is because PyCharm knows about a, and that it's a string, but it doesn't know about b:
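To get completion for b as well, one option is to assert on every parameter. The following is only a small sketch; call_is_consistent is a hypothetical helper of mine that shows what happens when the asserts fire:

```python
def foo(a, b, c):
    # Each assert doubles as a type hint that PyCharm can pick up.
    assert isinstance(a, str)
    assert isinstance(b, str)
    assert isinstance(c, int)
    return 42


def call_is_consistent(*args):
    """Return True if foo accepts the arguments, False if an assert fires."""
    try:
        foo(*args)
    except AssertionError:
        return False
    return True
```

Keep in mind that running Python with the -O flag strips assert statements, so in optimized runs they cost nothing but also check nothing.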

Hippie Complete

Hippie Complete or Cyclic completion is best explained with a video:

Basically, it completes the word based on previous words you've typed in the same document.

Spelling Correction

PyCharm also offers spelling correction as a quick fix. If you misspell "massachussets" like so:

Then you can try quick fixing it with ⌥ + return. After you hit return on the first option above, you should see correction options:

This was a request on reddit

Language Injection

Language Injection basically means that you can inject other languages into Python strings, and have PyCharm recognize those languages. This comes in really handy when you want syntax highlighting for regex or html, but it goes further than that, often providing code completion. In order to get language injection, one must invoke the quick fix command inside string delimiters ('').[2]

Here is a video to demonstrate how language injection works:[3]

Editor

This part covers the settings regarding the editor for better workflow. The options I set are based on my own preferences and hence not scientifically tested but they do make my editing a lot faster. Take the following advice with a healthy dose of skepticism since what might work for me, might not work for you.

Font

Consolas is the only font that looks good on Windows, Mac and Linux. Consolas remains monospace regardless of whether it is bold or italic, and is the only font that looks consistently good in PyCharm. I have tested this on Mac OS X Yosemite, Windows 8.1 and Ubuntu (14.04).[4]

If you've had success with other fonts (especially in Linux), please do mention them in the comments.

Braces and Quotes

A lot of the time, I forget to put parentheses around things or quotes around strings. By enabling the following option, you can select text, and when you press " or ' or ( or ), instead of replacing the text with the character entered, it will surround the selection with quotes or parentheses:

Case Sensitive Completion

By default, PyCharm comes with first letter case sensitivity, meaning that if you were to type e, Exception would not be offered as a viable completion. This is why I disable case sensitivity:

Please note however, that on slower machines this does make code completion slower.

Up Next

I will likely be covering how to deal with interpreters.

Previously

I talked about a couple of things in part one.

Update

2015-01-19 11:41 : Added a section on Spelling Correction

[1]	I will cover javascript, html and other stuff in another section that deals with web tools. Don't worry I know that they are indispensable tools, and will make sure to give them the time they deserve.

[2]	(1, 2) Use the keymap to find your corresponding keyboard shortcut discussed in the previous part.

[3]	Don't worry if you don't know what emmet does, I'll cover that when I cover web tools.

[4]	Take a look at this answer if you want to use consolas on Ubuntu: http://askubuntu.com/questions/269757/consolas-not-visible-in-intellij-idea-on-ubuntu-12-04

PyCharm: The Good Parts I

2015-01-17T12:09:00+06:00

Table of Contents

Productivity Guide
Starting Out
Keyboard Shortcuts
Navigation
- Macro
- Micro
  - Structure
  - AceJump
Up Next
Update

tldr: If you don't have the time to read the entire post, just read Productivity Guide, because I assume that you want better productivity as your primary goal.

PyCharm is a very powerful IDE, and has a multitude of features that keep growing at a rapid pace. PyCharm is essentially a plugin built on top of the IntelliJ platform, JetBrains' main IDE. Because PyCharm borrows from a whole host of features available from IntelliJ, it's easy to get lost. [1]

This post shows you some of the best features that PyCharm has to offer, and how to use those features properly. I assume that you've installed plugins and know your way around the IDE. The table of contents should give you an idea of what is covered in this post.

Productivity Guide

You might not know this, but PyCharm tracks what features you use, and tells you how productive you are being in your productivity guide:

The productivity guide gives you detailed descriptions of what features can help you code faster. You can see below that although I don't use all the features, I do use some quite a lot:

Starting Out

PyCharm touts its project creation wizard a lot, and for Windows users it is a godsend because you can easily create a project and associate a virtualenv with it. However, for *nix users, the best option is to start PyCharm from the command line, and you can do this by creating a command line launcher:

There is one thing that you need to know about starting from the command line: PyCharm uses the interpreter that the python command is linked to. So for example, if your python command in the shell links to /usr/bin/python, then that interpreter will become the interpreter for your new project. However, if you want to have the interpreter as a virtualenv, you have to create the virtualenv, activate it and then use the command line launcher to initialize your project. If we do the following:

$ mkdir SimpleProjectDir
$ cd SimpleProjectDir
$ charm .

It would create a PyCharm project in SimpleProjectDir and use the default interpreter in the shell like so:

We can see that it is the same python that was linked to in our shell:

$ which python
/usr/local/bin/python

However, if we were to create a virtualenv beforehand and then create a project using the command line launcher, we would get the virtualenv as the interpreter instead:

$ mkdir SimpleProjectDir
$ cd SimpleProjectDir
$ virtualenv .venv
New python executable in .venv/bin/python2.7
Also creating executable in .venv/bin/python
Installing setuptools, pip...done.

$ source .venv/bin/activate
(.venv)$ charm .

In simple terms, whatever python points to in the shell at the time of using the command line launcher, PyCharm will adopt that interpreter as the project interpreter.
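To double check which interpreter a given shell session (and therefore the command line launcher) would pick up, a quick sketch is to ask Python itself. describe_interpreter is a hypothetical helper name of mine, and the virtualenv detection is a common heuristic rather than anything PyCharm specific:

```python
import sys


def describe_interpreter():
    """Report which interpreter is running and whether it looks like a virtualenv."""
    # Older virtualenv releases set sys.real_prefix; the stdlib venv module
    # (and newer virtualenv) make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or base_prefix != sys.prefix
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("{}: {}".format(key, value))
```

Running this once before and once after source .venv/bin/activate should show the executable path change.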

Keyboard Shortcuts

Keymap

PyCharm allows you to see all your keyboard shortcuts using the Keymap. Keymap has two key features which aid in development. One being the ability to find keyboard shortcuts based on what they're called:

But it also allows you to find shortcuts based on the keys themselves, so that you can talk to other people about them:

So, whenever I talk about an action like "Find Action" or "Search Everywhere", just search it up on your keymap to see the corresponding keyboard shortcut.

Find Action

The best way to learn Keyboard shortcuts is to use Find Action. [2] Every time you use an action using find action, the keyboard shortcut is also denoted on the right:

Whatever you find yourself using over and over again through Find Action, just commit it to memory and you'll be saving yourself some time.

Useful Shortcuts

Use the Keymap to find what shortcuts these commands refer to: [3]

Find Action
Search Everywhere
Find
Structure
Add Selection for Next Occurrence
Switcher
Check In Project
Syntax Aware Selection
New

The first two features are must use shortcuts, and will aid you in learning more shortcuts as you see fit. The others are shortcuts I use regularly, and find helpful.

Navigation

This section covers how to get from place to place. I've divided it into two sections. One being Macro and the other being Micro. Macro is about moving between files, panels and tools. Micro is about moving from place to place within a file.

Macro

Back and Forward

This is one of my favorite features. Back and Forward allows you to go back to the place where your cursor was previously. This is immensely powerful, and allows you to jump between places within files and places between files. Let me demonstrate:

Search Everywhere

We need to find stuff all the time like functions, files, classes and methods. The best way to search for classes, files, and directories is to use Search Everywhere (press SHIFT twice in quick succession):

Switcher

The other way is to use the Switcher, which gives you quick access to your most recently used panels and files:

Micro

Structure

In order to find things within files, it's best to use Find, and for a more structural view, use Structure. The Structure panel gives you a skeletal view of your file, with variables, classes, functions and methods. It allows you to quickly navigate between important structures within your file:

AceJump

AceJump is a plugin that allows you to quickly jump from place to place. It's a clone of Emacs' AceJump mode. There are two superb videos by the creator of AceJump, John Lindquist, on how to use the plugin.

Installing a plugin is simple:

Up Next

In the next part, I will cover editing in PyCharm, with special respect to taking control of its powerful code completion feature-set.

Update

On 2015-01-18 20:40

Part II is out now!

[1]	The creator of PyCharm, IntelliJ, RubyMine and other IDEs.

[2]	`⌘` + `SHIFT` + `A` on Mac `CTRL` + `SHIFT` + `A` on Windows and Linux

[3]	If you can't find any of these, then you're using an older version of PyCharm.

Making a Static Blog with Pelican

2015-01-06T14:51:00+06:00

Table of Contents

Pelican allows you to create a static blog. Most blog sites on the web are dynamic in the sense that the content of the site lives in a database. In order to view a post on a blog, the server has to query the database, get the right content and then convert it into presentable HTML. However, in a static site, every page is pre-rendered by the static blog generator. This means that your entire blog can be uploaded to a server as plain files.
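To make "pre-rendered" concrete, here is a deliberately tiny sketch of what a static generator does at its core: loop over the posts once, at build time, and write one HTML file per post. This is my own illustration using only the standard library, not how Pelican is implemented; render_site and the PAGE template are hypothetical names.

```python
import os
import tempfile
from string import Template

# A stand-in for a theme template.
PAGE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")


def render_site(posts, output_dir):
    """Pre-render every post to its own HTML file and return the paths."""
    written = []
    for slug, post in sorted(posts.items()):
        path = os.path.join(output_dir, slug + ".html")
        with open(path, "w") as handle:
            handle.write(PAGE.substitute(title=post["title"], body=post["body"]))
        written.append(path)
    return written


if __name__ == "__main__":
    demo_posts = {
        "first-post": {"title": "first_post", "body": "Hello World from Pelican!"},
    }
    print(render_site(demo_posts, tempfile.mkdtemp()))
```

Once the files exist, serving a page is just sending a file, which is why no database query is needed per request.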

Here's a non-exhaustive list of why a static blog generator is good:

Cheap Scaling: Pre-rendered static files can withstand the onslaught of traffic very cheaply if your post makes it to the top of hackernews or reddit. No queries are being made.
Any format: You can use reStructuredText, Markdown etc. to write your posts.
Host Anywhere: Static sites can be hosted easily on github pages, Amazon S3 and even dropbox.
Own Everything: You have access to all your posts and all your themes. No company in the middle, just you and your content.
Control: Static site generators give you a lot of control over pretty much every aspect of your site through templates and plugins, allowing you to quickly add complex functionality not found in popular web blogging platforms.
Update Issues: If you are using wordpress, you have to update your software when a new version comes out; otherwise, you have a security risk. However, one can use the same version of a static blog generator indefinitely.

Here's why pelican is a good choice in contrast with other generators: [1]

Plugins: Pelican has a lot of plugins, that allow you to quickly add functionality.
Hackable: Extending pelican as well as altering its behaviour is simple.
Themes: Pelican has a lot of themes, and you can make your own.
Multi-platform: Works well on Windows, OSX and Linux. [2]

I've been using Pelican as my static blog generator for the past year. Over this time, I have come to recognize its strengths and weaknesses. Pelican allows you to quickly get a themed site up and running in a matter of minutes. You can also choose to invest more time and create your own theme. In short, if you want a low hassle setup, it's there, and if you want a more custom site you can have that too.

However, Pelican's initial setup barely scratches the surface of what's possible with this small yet powerful blog generator. It has many plugins too for quickly adding extra functionality, like the support of other document formats; ipynb and asciidoc for example. Plugins also allow you to support things like MathJax in your pages and embed disqus comments. [3]

Getting Started

Before diving in, I expect you to know a few basic things:

How to use git
How to use bash or your shell of choice
How to use Python

Installing Pelican

Installation is very simple:

$ pip install pelican

Feel free to use virtualenv if you want to, but it's not required.

Pelican works with both Python 2 and Python 3. However, I chose to use Python 2 because I use an automation tool called Fabric, which only supports Python 2 (for the time being).

Basic Setup

When first beginning with pelican, one can choose to use the pelican skeleton generator to create a basic structure using pelican-quickstart. [4]

$ pelican-quickstart
Welcome to pelican-quickstart v3.5.0.

This script will help you create a new Pelican-based website.

Please answer the following questions so this script can generate the files
needed by Pelican.


> Where do you want to create your new web site? [.]
> What will be the title of this web site? Test Site
> Who will be the author of this web site? Kevin Kevinson
> What will be the default language of this web site? [en]
> Do you want to specify a URL prefix? e.g., http://example.com   (Y/n) n
> Do you want to enable article pagination? (Y/n)
> How many articles per page do you want? [10] 8
> Do you want to generate a Fabfile/Makefile to automate generation and publishing? (Y/n)
> Do you want an auto-reload & simpleHTTP script to assist with theme and site development? (Y/n)
> Do you want to upload your website using FTP? (y/N)
> Do you want to upload your website using SSH? (y/N)
> Do you want to upload your website using Dropbox? (y/N)
> Do you want to upload your website using S3? (y/N)
> Do you want to upload your website using Rackspace Cloud Files? (y/N)
> Do you want to upload your website using GitHub Pages? (y/N)
Done. Your new project is available at /Users/quazinafiulislam/Desktop/testingzone

There's no need to change much in the default quickstart wizard. Default values are indicated by [default] and (y/N), where the capitalized letter is the default value. Just one thing to note here is URL prefix; don't set it right now, set it when you actually want to take your site online, and we will go over how to do that later in this tutorial.

Your First Post

After pelican-quickstart is done, this is what the directory should look like:

.
├── Makefile
├── content/
├── develop_server.sh
├── fabfile.py
├── output/
├── pelicanconf.py
└── publishconf.py

Pelican supports reStructuredText out of the box. If you want to use another format like markdown then just install it:

$ pip install markdown

More uncommon formats will be discussed later. With this out of the way, we can create our first post inside the content folder. All our writing has to be inside the content folder, so we change our directory to the content folder:

$ cd content

My file is called first_post.rst and it looks like this:

first_post
##########

:date: 2014-12-13 18:32
:category: Test

Hello World from Pelican!

Basically, all one needs is a title, a category and a date. A markdown version of the same post would be:

Title: first_post
Date: 2014-12-13 18:32
Category: Test

Hello world from Pelican!

This is what my project directory looks like now:

.
├── Makefile
├── content
│   └── first_post.rst
├── develop_server.sh
├── fabfile.py
├── pelicanconf.py
├── pelicanconf.pyc
└── publishconf.py

Now changing back to our project directory, we can start the development server, and see the generated blog post:

$ make devserver

Once this is done, we should be able to see our post on localhost:8000:

We can stop the server through:

$ make stopserver

Automation

Although only a title, date and category are required for a post, pelican allows you to add a lot more information like tags, authors and much more. However, creating a new file with the right date and time can be time consuming, so we can use certain tools to automate the task for us.

Vanilla Python

We can create simple scripts to automatically create our entries with the right date and time. Here is an example script that I use called make_entry.py:

import sys
from datetime import datetime

TEMPLATE = """
{title}
{hashes}

:date: {year}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}
:tags:
:category:
:slug: {slug}
:summary:
:status: draft


"""


def make_entry(title):
    today = datetime.today()
    slug = title.lower().strip().replace(' ', '-')
    f_create = "content/{}_{:0>2}_{:0>2}_{}.rst".format(
        today.year, today.month, today.day, slug)
    t = TEMPLATE.strip().format(title=title,
                                hashes='#' * len(title),
                                year=today.year,
                                month=today.month,
                                day=today.day,
                                hour=today.hour,
                                minute=today.minute,
                                slug=slug)
    with open(f_create, 'w') as w:
        w.write(t)
    print("File created -> " + f_create)


if __name__ == '__main__':

    if len(sys.argv) > 1:
        make_entry(sys.argv[1])
    else:
        print("No title given")

With this, a new file can be created like so:

$ python make_entry.py "New Post"
File created -> content/2014_12_13_new-post.rst
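The slug in make_entry only swaps spaces for dashes, so a title with punctuation would leak characters like ? or ' into the filename. A stricter variant could look like the sketch below; slugify is a hypothetical helper of mine, not something Pelican provides here, and it simply drops non-ASCII letters.

```python
import re


def slugify(title):
    """Lowercase the title and collapse anything non-alphanumeric into dashes."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Trim dashes left over from leading or trailing punctuation and spaces.
    return slug.strip("-")
```

Swapping this in for the title.lower().strip().replace(' ', '-') line keeps the rest of the script unchanged.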

Fabric

Jump to the next section if you're not using Python 2 [5]

Using Vanilla Python is great, but imagine trying to make a script for every single task you need done. This would be a nightmare. Of course, one can keep using if-else statements to add more argument parameters, but that is tiring and not very flexible (not to mention error prone). This is why one option for making scripts that can take in different parameters is to use a handy library called Fabric.

I use fabric for most of my tooling, since pelican already generates a fabfile.py. Getting a new command line function in fabric is very easy:

If you don't have a fabfile.py, create one.
Create a new function inside fabfile.py
Run the function like so: fab do_something
If the function has positional arguments, fab do_something:True,False
If the function has optional keyword arguments, fab do_something:kill_p=True,be_happy=False

Installing Fabric is easy:

$ pip install Fabric

The body of the function to make an entry is the same as before:

# TEMPLATE is declared beforehand, and all the necessary imports made
def make_entry(title):
    today = datetime.today()
    slug = title.lower().strip().replace(' ', '-')
    f_create = "content/{}_{:0>2}_{:0>2}_{}.rst".format(
        today.year, today.month, today.day, slug)
    t = TEMPLATE.strip().format(title=title,
                                hashes='#' * len(title),
                                year=today.year,
                                month=today.month,
                                day=today.day,
                                hour=today.hour,
                                minute=today.minute,
                                slug=slug)
    with open(f_create, 'w') as w:
        w.write(t)
    print("File created -> " + f_create)

However, this time we can call it like so:

$ fab make_entry:"New Post"

Fabric alternatives

There are tonnes of alternatives that allow you to make a simple command line interface quickly:

Click : A simple to use CLI by Armin Ronacher.
Clip : Like click, but newer, and has more features
Argparse : This package comes with python by default. It's not the easiest package to use out there, but if you don't want to install a new package, then this should work fine.

All three work with python 3.
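As a rough idea of what the argparse route looks like, here is a minimal sketch with one subcommand per task, mirroring the fab make_entry task. The --category option and the build_parser name are my own additions for illustration.

```python
import argparse


def build_parser():
    """Build a small CLI with one subcommand per task, similar to fab tasks."""
    parser = argparse.ArgumentParser(description="Blog helper tasks")
    subparsers = parser.add_subparsers(dest="command")

    # Equivalent of `fab make_entry:"New Post"`.
    entry = subparsers.add_parser("make_entry", help="create a new draft post")
    entry.add_argument("title", help="title of the new post")
    entry.add_argument("--category", default="", help="optional category")
    return parser
```

Calling build_parser().parse_args() in a script would then let you run something like python tasks.py make_entry "New Post", where tasks.py is whatever you name the file.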

Themes

Using a Theme

Pelican has many themes to choose from in its themes repository. These are the steps:

Clone a theme repo or, if you want, clone the entire themes repository.
Inside the pelicanconf.py file, change the THEME variable to the directory of your theme.

On a side note, if you're looking for a place where you can view a screenshot of all the different themes, this site has nice screen-shots of each.

For example, if you wanted to use the maggner-pelican theme:

Clone it into your project directory, git clone https://github.com/kplaube/maggner-pelican.git
Change the THEME variable to 'maggner-pelican' and you're done :)
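For reference, the resulting line in pelicanconf.py would look something like this, assuming the theme was cloned straight into the project directory as in the git clone command above:

```python
# pelicanconf.py (fragment)
# Path to the theme directory, relative to the project root.
# 'maggner-pelican' is the folder created by the git clone above.
THEME = 'maggner-pelican'
```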

Creating Themes

Creating themes in Pelican is well documented in their documentation about theme creation. Although creating themes is beyond the scope of this post (because there's a lot that could be said), here are a few pointers:

Target one format. If you try to create a theme for multiple formats like both rst and md, you're going to run into a lot of issues.
Using preprocessors can be more effective than using raw css, since they allow you to create a single css file and provide you with a lot more flexibility.
If you wish to embed disqus or google comments or other third party dynamic content, then I suggest that you use separate include files instead of plugins.

Settings

The Two Settings Files

By default, pelican comes with both a pelicanconf.py and a publishconf.py. When you are writing content and wish to see a preview, only the settings in pelicanconf.py are used. When you choose to publish, for example to github or some other place, the settings in both pelicanconf.py and publishconf.py are used, because publishconf.py imports pelicanconf.py at the line marked 1 below:

#!/usr/bin/env python
# -*- coding: utf-8 -*- #
from __future__ import unicode_literals

# This file is only used if you use `make publish` or
# explicitly specify it as your config file.

import os
import sys
sys.path.append(os.curdir)
from pelicanconf import *  # 1

(...)

How Settings Work

By convention all settings are capitalized:

#!/usr/bin/env python
# -*- coding: utf-8 -*- #
from __future__ import unicode_literals

AUTHOR = u'Nafiul Islam'
SITENAME = u'fUNNY bLOG'
SITEURL = ''

If we take a look in the default theme, they are used like so:

{% if DISQUS_SITENAME and SITEURL and article.status != "draft" %}
<div class="comments">
<h2>Comments !</h2>
<div id="disqus_thread"></div>
<script type="text/javascript">
var disqus_shortname = '{{ DISQUS_SITENAME }}';
var disqus_identifier = '{{ article.url }}';
var disqus_url = '{{ SITEURL }}/{{ article.url }}';
(function() {
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
(...)

Here, SITEURL and DISQUS_SITENAME are both settings that are defined in either pelicanconf.py or publishconf.py. They appear as variables inside the jinja template. [6]

Most settings that we put into our settings files are used in the theme, however, some settings like DEFAULT_PAGINATION are used to determine other things.
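As a rough sketch of the capitalization convention (my own illustration, not Pelican's actual loader), one way to turn a settings file into template variables is to run it and keep only the uppercase names. Publish settings are then layered on top of the preview ones. load_settings and the example values are assumptions for the demo.

```python
def load_settings(source):
    """Execute settings source code and keep only the ALL_CAPS names."""
    namespace = {}
    exec(source, namespace)
    return {name: value for name, value in namespace.items() if name.isupper()}


PELICANCONF = """
AUTHOR = 'Nafiul Islam'
SITEURL = ''
helper = 'ignored because it is not capitalized'
"""

PUBLISHCONF = """
SITEURL = 'http://example.com'
"""

settings = load_settings(PELICANCONF)
settings.update(load_settings(PUBLISHCONF))  # publish values win
```

After this runs, settings holds AUTHOR from the first file and the SITEURL from the second, which mirrors how publishconf.py overrides what it imported from pelicanconf.py.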

Writing Content

Pelican supports reStructuredText as its default document format. However, there is also support for markdown, ipython notebook, asciidoc and probably much more. To write in any of these formats, one needs to install the plugin associated with the file extension, such as asciidoc_reader for the asciidoc format and pelican-ipynb for the ipython notebook format. [7]

I would suggest reStructuredText, since it has the most features and can offer many ways to extend the syntax with features such as directives and interpreted roles. For example, reStructuredText supports footnotes out of the box, whereas if one were to use markdown, one would need to install a separate plugin and attach it to pelican. [8]

However, pelican has many third party plugins for markdown documents as well, so if you prefer using markdown, then please do so. I only recommend reStructuredText because it's easy to extend.

Previewing Content

devserver

The default way to see a preview of the generated blog is to use make devserver, which regenerates your blog every time that you change your content folder. This option is a horrible way to preview your blog for a couple of reasons:

You need to hit the refresh button in your browser to see the changes after a save. This can often get tiring.
make devserver runs two processes in the background for no good reason. The processes log data to the console anyway, so why not just have a tool that generates your content and terminates when you hit ctrl + C?
Furthermore, if you hit an error during the transformation from your chosen format to html, the daemon that keeps transforming your files also stops, and the error is often buried deep inside the logs.

livereload

We need a solution that does three things for us:

Allows us to see live previews on each save without having to refresh the browser.
Does not stop unexpectedly from an error.
Has an easy to use CLI

The solution is to use a neat little package called livereload, which can be installed through:

$ pip install livereload

Once this has been installed, we need to install the livereload plugin for our respective browser. For example in chrome:

In firefox, we head over to addons.mozilla.org and search for livereload:

Once this has been installed, we need to run the livereload server. I do this by adding an extra function to my fabfile.py file:

import os

import livereload
from fabric.api import local  # local() runs a shell command (Fabric 1.x API)

def live_build(port=8080):

    local('make clean')  # 1
    local('make html')  # 2
    os.chdir('output')  # 3
    server = livereload.Server()  # 4
    server.watch('../content/*.rst',  # 5
        livereload.shell('pelican -s ../pelicanconf.py -o ../output'))  # 6
    server.watch('../naffy/',  # 7
        livereload.shell('pelican -s ../pelicanconf.py -o ../output'))  # 8
    server.watch('*.html')  # 9
    server.watch('*.css')  # 10
    server.serve(liveport=35729, port=port)  # 11

We have a lot to explain here so let's begin with 1 and 2. In both 1 and 2, we use the local function from fabric. local allows us to run shell commands like cd. In this case, we are using the make command because pelican comes with a Makefile. If we invoke make help from the command line in our project directory, we get the following:

$ make help
Makefile for a pelican Web site

Usage:
   make html                        (re)generate the web site
   make clean                       remove the generated files

So, in 1 we clean out our output folder, and in 2 we generate our html files from our content.

In 3, we change the directory to our newly created output folder and in 4 we create an instance of the livereload server. In 5, we use the server.watch function to watch all the rst files since my content is written in reStructuredText. We can change this to ../content/*.md for markdown files for example.

server.watch has one required argument and two optional arguments. The first argument is filepath, the second is func, a function that you run every time watched files are changed. The last argument is delay, which delays function execution. [9]

In 6, we specify the function to be run as livereload.shell('pelican -s ../pelicanconf.py -o ../output'). The shell function from livereload allows us to run shell commands every time a watched file is changed. In this case, we ask pelican to remake our html files. Why was make html not used here? This is because the make command looks for a Makefile in the current directory, and at this point we are inside our output directory.
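To get a feel for what "watch a path and call func on change" means, here is a stdlib-only sketch of the idea: take a snapshot of modification times, compare it with the previous one, and call the function if anything differs. This is not how livereload is implemented; snapshot, changed_paths and check_once are hypothetical names of mine.

```python
import os


def snapshot(paths):
    """Map each existing path to its last modification time."""
    return {path: os.path.getmtime(path) for path in paths if os.path.exists(path)}


def changed_paths(before, after):
    """Return the paths that were added, removed or modified between snapshots."""
    return sorted(
        path
        for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    )


def check_once(paths, previous, func):
    """Call func when something changed and return the new snapshot."""
    current = snapshot(paths)
    if changed_paths(previous, current):
        func()
    return current
```

A real watcher would call check_once in a loop with a short sleep between rounds, which is roughly where an argument like delay comes into play.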

In 7, the server is directed to watch the "naffy" directory. The "naffy" directory is the directory that houses my theme; if any change is made to my theme, livereload refreshes the server. Replace "naffy" with whatever folder you are housing your theme in. If you're using the default theme, then do not add 7 and 8. 8 is the same as 6, we just rebuild the html.

In 9 and 10, the server is further instructed to watch html and css files that change within the output directory.

Finally in 11, we tell livereload to serve on port 35729. This is the default port for the livereload plugin [10]. The port variable is the port of the server serving the files, the liveport variable is a port that livereload uses to make changes directly to the HTML file.

Plugins

Using Plugins

Most plugins in pelican tell you how to install them. For example, the render_math plugin allows you to embed math inside your posts. Here's the basic gist of what you need to do in order to install them:

Download the plugin package into your plugins folder, which is set inside your pelicanconf.py file in the PLUGIN_PATHS variable.
Then append the name of your plugin package to the PLUGINS list inside your pelicanconf.py file. [11]
Follow any other additional instructions that your plugin might have.

If just adding the name does not work, then one might consider adding <plugin_name>.<plugin_file>. [12]
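Put together, the relevant part of pelicanconf.py might look like the fragment below. The folder name 'plugins' is an assumption on my part; use whatever directory you downloaded the plugin into.

```python
# pelicanconf.py (fragment)
# Folder(s) that Pelican searches for plugins; 'plugins' is an assumed name.
PLUGIN_PATHS = ['plugins']
# Plugins to load, by package name.
PLUGINS = ['render_math']
```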

Creating Plugins

Creating plugins in pelican basically means that you create a function to do your processing. You then register that function as a plugin. Creating plugins is well documented and an example of creating a reader is given.

In my use of pelican, more often than not, I've needed to create plugins that process text in a given document, not add support for new document formats. So, I'll show you how I created a plugin for adding keyboard keys like alt and ⌘.

I like to create my plugins as packages, but you can create them as simple files as well. So, inside our plugins folder, create a new package, and inside the package, we create a new file for the plugin:

$ cd plugins  # Or whatever your plugins folder is called
$ mkdir keyboard
$ cd keyboard
$ touch __init__.py kb.py
$ ls
__init__.py kb.py

Inside kb.py, we need a function that modifies the existing content, and a function that registers it:

from pelican import signals

import logging
import re

logger = logging.getLogger(__name__)


def content_object_init(instance):
    """
    Provides the key plugin, make sure that you have Keys.css, http://michaelhue.com/keyscss/
    imported inside your HTML. How to use:

        So you hit [kb:CTRL] + [kb:ALT] + [kb:DEL] when in doubt

    Note, that light keyboard keys are enabled by default.
    """
    if instance._content is not None:  # 1
        content = instance._content
        new_content = re.sub(r"\[kb:(.+?)\]", r'<kbd class="light">\1</kbd>', content)  # 2
        instance._content = new_content
        return instance


def register():
    signals.content_object_init.connect(content_object_init)  # 3

In 1, we check if the article, page or blog post in question has content; essentially, does it have a body? If it doesn't have a body, then we skip this object. In 2, we replace every [kb:...] marker with the corresponding stylized kbd element, so that typing [kb:alt] will give you a rendered alt key.
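The substitution in 2 is easy to try on its own, outside of Pelican. Here is a small sketch using the same pattern and replacement; render_keys and KEY_PATTERN are names I made up for the demo.

```python
import re

# Same pattern as in the plugin: a non-greedy match of whatever follows "kb:".
KEY_PATTERN = re.compile(r"\[kb:(.+?)\]")


def render_keys(text):
    """Replace [kb:NAME] markers with <kbd> elements, as the plugin does."""
    return KEY_PATTERN.sub(r'<kbd class="light">\1</kbd>', text)
```

The non-greedy (.+?) matters here: with a greedy (.+), two markers on the same line would be swallowed into a single match.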

In 3, we register the function with pelican. Registering this plugin is simple, inside pelicanconf.py, we merely append to the list of PLUGINS:

(...)
PLUGINS += ['keyboard.kb']
(...)

I used Keys.css to style my keys, and I simply added the css file to my theme's article.html.

One can create far more interesting plugins than the one I've shown by looking at the plugin repository for inspiration or even reading the documentation (although it's not exhaustive in its explanations).

Hosting on Github Pages

Github is the best way to host your static blog. Some notable sites which get a substantial amount of traffic, like Bootstrap, Yeoman and Pythonic Perambulations, are hosted on github pages. If you don't have a github account, creating one is simple and quick.

Creating the repo

In order to get a site name like username.github.io, one needs to create a repo with that name on github. Set the visibility to public and do not initialize it with anything; you want an empty repo that you can push to.

ghp-import

ghp-import is a simple tool that allows you to easily push your content to a Github repo. You can install it using pip:

$ pip install ghp-import

There are two steps to doing this:

$ ghp-import output
$ git push git@github.com:<username>/<username>.github.io.git gh-pages:master

This will push the files in your output folder directly to the master branch of the <username>.github.io repository. It may take some time initially, but you will be able to see your fully generated blog on this domain.

Custom domains

In order to use your own domain, like I have with nafiulis.me, you simply need to add a CNAME file containing your domain. The CNAME file must be located inside your output directory and must contain only the domain name. My CNAME file looks like this:

nafiulis.me

The CNAME file must be included when you are pushing to your github repo.

Once this is done, you need to link your domain up with github, and this is wonderfully explained in this article

Automating Publication

I use fabric to automate my publication. I just have two functions to do this:

import os

from fabric.api import local  # Fabric 1.x API


def enter_dns_file():  # 1
    with open('output/CNAME', 'w') as f:
        f.write('nafiulis.me')


def github(publish_drafts=False): # 2

    try:  # 3
        if os.path.exists('output/drafts'):
            if not publish_drafts:
                local('rm -rf output/drafts')
    except Exception:
        pass

    local('ghp-import output')  # 4
    local('git push '
          'git@github.com:<username>/<username>.github.io.git '
          'gh-pages:master')  # 5
    local('rm -rf output')  # 6

In 1, the function enter_dns_file is used to insert the CNAME file into the output folder before pushing to github because in 6, we delete the entire output folder.

In 2, the github function is used to publish the content of the output folder to our Github repo, with the option of publishing drafts, which is False by default. The try-except block in 3 checks whether we have any drafts in the first place; unless publish_drafts is True, the drafts folder is deleted so that drafts are not published.

From 4 to 5, we push the content to the repository. Replace <username> with your username.
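The draft handling in 3 shells out to rm -rf, which will not work on Windows. As a hedged alternative, here is a sketch of the same step in pure Python; remove_drafts is a hypothetical helper of mine, not part of my fabfile, and it assumes the drafts live in a drafts folder directly under the output folder:

```python
import os
import shutil


def remove_drafts(output_dir="output", publish_drafts=False):
    """Delete <output_dir>/drafts unless drafts should be published.

    Returns True if the drafts folder was removed, False otherwise.
    """
    drafts_dir = os.path.join(output_dir, "drafts")
    if publish_drafts or not os.path.isdir(drafts_dir):
        # Nothing to do: either we want the drafts, or there are none.
        return False
    shutil.rmtree(drafts_dir)
    return True
```

Calling remove_drafts() before ghp-import would stand in for the try-except block, with no shell involved.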

Acknowledgements

I would like to thank Ashwini Chaudhury, Karan Sikka, Mahmud Ridwan, Sahib bin Mahboob and Keiron Pizzey for proofreading this article before publication.

[1]	Assuming that you are proficient in Python.

[2]	Don't use virtualenv, just use your base python when using windows. If using Windows, make sure that you have 2.7.9 or 3.4.2, both of which have "ensurepip".

[3]	Pelican has its own plugin repository on github. Please note however, that some plugins are defunct and are not supported by newer versions of pelican. https://github.com/getpelican/pelican-plugins

[4]	If you cannot access this, restart your virtualenv if you are currently in one. Otherwise, restart your shell.

[5]	Check out the wall of superpowers if this has changed http://python3wos.appspot.com/

[6]	This piece of code is taken from the default theme that comes with pelican called "notmyidea" https://github.com/getpelican/pelican/blob/3.5.0/pelican/themes/notmyidea/templates/article.html#L17

[7]	The ipython notebook plugin is one example of a plugin that you can't find on the official plugin repository. This may be because the plugin does not support the newest version, Pelican 3.5; the plugin says that it requires pelican 3.4 to function.

[8]	reStructuredText also allows for more versatile footnotes as well. Check out their quickstart guide.

[9]	Livereload has documentation on the server class on its documentation website

[10]	I cannot find the same options for chrome, hence I use firefox to see live builds of the generated site

[11]	In some documentation regarding plugins, the pelicanconf.py file is called settings.py.

[12]	Sometimes, Pelican packages do not import the necessary functions inside their `__init__.py` file. In this case you import one of the plugin's files directly. In most cases it is the file with a `register` function.

Kotlin Koans I - The Beginning

2014-11-22T21:32:00+06:00

Kotlin is a programming language created by Jetbrains. It runs on the Java Virtual Machine (JVM) and can also be compiled to JavaScript.

I've been meaning to try out Kotlin for quite some time. After my foray into Scala, which brought some very dramatic changes to the Java universe, my interest in JVM based languages was re-invigorated.

Back in 2013, Kotlin seemed like an interesting language because Jetbrains, a heavy investor in the JVM, was promoting it as an industry first programming language. I knew Kotlin also was going to have excellent tooling support, because after all Jetbrains is an IDE company. It wasn't until recently that Kotlin had a tutorial series called the Kotlin Koans.

So why should you learn Kotlin? Well, that is what I will be trying to answer throughout this series. We will go through the Kotlin Koans, and find out the differences between Java and Kotlin as well as other languages, but in short [1]:

No semicolons
First class functions, which can also be declared at the top level, outside of classes
Lambdas
An awesome collections library
Small runtime
data classes, like case classes in Scala
Good interoperability with existing Java code

Getting Started

Installing Kotlin

The simplest way to install Kotlin is to download IntelliJ IDEA. The community edition supports Kotlin so you can go ahead and download it. After this, we can install the Kotlin plugin, which will automatically install the Kotlin runtime as well:

Hello World in IntelliJ IDEA

Once this has been installed, getting started with your first hello world project is simple. Hadi Hariri [2] has created a bunch of videos on getting started. Here, he demonstrates how to create a new Kotlin project:

Dash docsets

Kotlin also has Dash [3] docsets, provided by a third party:

However, this merely is a copy of the online documentation available for Kotlin, and does not actually have a searchable API reference. The code completion in IDEA somewhat mitigates this though. The Jetbrains team is still working on modernizing the reference for Kotlin at the moment.

Cloning the Koans

The next part is just cloning the Koans repository. One can just use the command line like so:

git clone https://github.com/jetbrains/workshop-jb

Or checkout from the IDEA start menu:

Koans Part I: Introduction

I'm going to go through all the exercises in the Koans. I will share how I solved these problems, with the challenges I've had to face as well as examples of solutions themselves. One can merely continue reading to see a list of features that Kotlin has, or one can follow along using the consecutive sections as an answer sheet.

0: Functions

The goal of the Koans is to make all the tests pass. Let's head over to our first exercise:

Inside the file, we have todoTask0 and task0. todoTask0 gives us the information we need to complete the task. We essentially need to make it return "OK". This seems easy enough:

fun task0(): String {
    return "OK"
}

That was easy. We have a function here that returns a String.

1: Collection to String

In this task, we need to convert a given collection into a string. We are allowed to copy and paste the Java code and transform it into Kotlin code:

You can just copy-paste it and agree to automatically convert it to Kotlin - but only in this task :).

It is unfortunate, however, that todoTask1 does not give us any extra information other than telling us to just "Rewrite JavaCode1.task1 to Kotlin". So, I decided to take a look at the test function located here:

We can see that we are asked to turn the collection of numbers into a comma delimited string, with braces at the beginning and at the end:

class _01_Functions() {
    test fun collection() {
        Assert.assertEquals("{1, 2, 3, 42, 555}", task1(listOf(1, 2, 3, 42, 555)))
    }
}

Copying the Java code from JavaCode1.task1 results in this:

val sb = StringBuilder()
sb.append("{")
val iterator = collection.iterator()
while (iterator.hasNext()) {
    val element = iterator.next()
    sb.append(element)
    if (iterator.hasNext()) {
        sb.append(", ")
    }
}
sb.append("}")
return sb.toString()

Not very different from the actual Java code. However, I chose to use the following function:

import java.util.StringJoiner  // requires Java 8

fun task1(collection: Collection<Int>): String {

    val sj = StringJoiner(", ", "{", "}")

    for (item in collection) {
        sj.add(item.toString())
    }

    return sj.toString()
}

I don't know if my version has worse performance than the converted Java Code, but it sure passes the test:

Here I chose to use Kotlin's for loop, which iterates directly over collections.

2.1: Default Parameters

In this exercise, we need to convert an overloaded Java method into a simple Kotlin function with default parameters. The function in question is foo:

fun foo(name: String): String = todoTask2_1()

At first this was confusing because it had been a long time since I had used a language that did not have default parameters. Secondly, foo already seemed to have a function body. Thirdly, there seemed to be no instructions on what the function did; you seemingly had to read the Java code to understand. I decided to go back to the tests again and see what was up.

We have to rewrite the function foo to have three parameters:

name which is a String. Required
toUpperCase which is a Boolean value. Optional, defaults to false
number which is an Int. Optional, defaults to 42

With that out of the way, this is what I ended up with:

fun foo(name: String, number: Int = 42, toUpperCase: Boolean = false): String {
    return if (toUpperCase) name.toUpperCase()+number.toString() else name+number.toString() // 1
}

fun task2_1(): String {
    return (foo("a") +
            foo("b", number = 1) +
            foo("c", toUpperCase = true) +
            foo(name = "d", number = 2, toUpperCase = true))
}

In 1 I use if as an expression, which is Kotlin's counterpart to a ternary operator; the equivalent full if-else statement would be:

if (toUpperCase) {
    return name.toUpperCase() + number.toString()
} else {
    return name + number.toString()
}

Compare the above to the amount of Java code one would have to write:

public class JavaCode2 extends JavaCode {
    private int defaultNumber = 42;

    public String foo(String name, int number, boolean toUpperCase) {
        return (toUpperCase ? name.toUpperCase() : name) + number;
    }

    public String foo(String name, int number) {
        return foo(name, number, false);
    }

    public String foo(String name, boolean toUpperCase) {
        return foo(name, defaultNumber, toUpperCase);
    }

    public String foo(String name) {
        return foo(name, defaultNumber);
    }
}

2.2: Collections functions

In 1: Collection to String, we wrote some Kotlin to turn a collection of numbers into a string, enclosed by braces. In this exercise, we just have to use one line of code to achieve the same thing using joinToString:

fun task2_2(collection: Collection<Int>): String {
    return collection.joinToString(", ", "{", "}")
}

3: Lambdas

This exercise essentially asks us to use lambdas to check whether a list of integers contains a value divisible by 42. I could not find what I was looking for in the exercise file itself. For a person who is not used to Kotlin's lambda syntax, this exercise might seem a little confusing. I happened to find out how to use lambdas in Kotlin from a blog post. In the beginning, I ended up with the following:

fun task3(collection: Collection<Int>): Boolean {
    return collection map { x -> x % 42 == 0} contains true
}

Here, we learn about Kotlin's syntax for lambdas. One does not have to use dots and parentheses for the calls, but they are allowed:

fun task3(collection: Collection<Int>): Boolean {
    return collection.map({ x -> x % 42 == 0}).contains(true)
}

However, a friend of mine pointed out that map is not needed. Instead, any [4] [5] is a better solution:

fun task3(collection: Collection<Int>): Boolean {
    return collection any {i -> i % 42 == 0}
}

But one can go even further. Instead of declaring a lambda that takes in a parameter, one can use the implicit parameter it, shown in 1 below:

fun task3(collection: Collection<Int>): Boolean {
    // ---------------------vv------------------+
    return collection any { it % 42 == 0 }  //  1
}

4: String templates

In this exercise, we need to make a regex to match the patterns in the tests. The regex for months is already made:

val month = "(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"

All that we need to do is use this variable inside the task4() function like so:

//                    |||                                        |||
// -------------------vvv----------------------------------------vvv
fun task4(): String = """(\w*) (\w*) \((\d{2}) ${month} (\d{4})\)"""

Above, we stick in the month variable; we could have also used $month instead of ${month}. The string that we're using is triple quoted, also known as a multi-line string. The difference is that in a multi-line string, special characters don't need to be escaped.

5: Null safety

This exercise is meant to demonstrate Kotlin's safety features. We have to change the body of sendMessageToClient to do the following:

If client is not null, then we get its personalInfo
If personalInfo is not null, then we get the email from it
If both email and message are not null, then we send a message using mailer.sendMessage. Otherwise, we return.

Initially, I ended up with the following:

fun sendMessageToClient(client: Client?, message: String?, mailer: Mailer) {

    val personalInfo = client?.personalInfo  // 1
    val email = personalInfo?.email          // 2

    if (email != null && message != null) {
        mailer.sendMessage(email, message)
    } else {
        return  // 3
    }
}

I use two statements, 1 and 2, to get the data I need. This, however, can be done in a better way:

val email = client?.personalInfo?.email

In 3, we just have an empty return statement. When a function returns nothing, it is actually returning Unit. Unit is a singleton and does not have to be returned explicitly [6]. In other words, this function will work just fine:

fun sendMessageToClient(client: Client?, message: String?, mailer: Mailer) {

    val email = client?.personalInfo?.email

    if (email != null && message != null) {
        mailer.sendMessage(email, message)
    }
}

In the end, we need email and message to use mailer.sendMessage. Even in something like Python, one would need multiple if statements to get this done, and plenty of hasattr calls.
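For comparison, here is a rough Python sketch of the same logic. It is only an illustration under my own assumptions: the attribute names mirror the Kotlin code, mailer.send_message is a hypothetical method, and I use getattr with a default rather than hasattr to keep it short:

```python
def get_email(client):
    """Walk client -> personalInfo -> email, returning None if any link is missing."""
    # getattr with a default also copes with client itself being None.
    personal_info = getattr(client, "personalInfo", None)
    return getattr(personal_info, "email", None)


def send_message_to_client(client, message, mailer):
    email = get_email(client)
    if email is not None and message is not None:
        mailer.send_message(email, message)
```

Each getattr call stands in for one ?. in the Kotlin version, but nothing checks these accesses at compile time, which is the part that Kotlin adds.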

6: Smart Casts & Recursion

In this exercise, we need to write a recursive function using Kotlin's when keyword. The basic gist of the problem is that we have:

An abstract class called Expr
A class called Num that represents a number.
A class called Sum that represents the sum of two Expr values.

With this, we need to create a function that:

Returns the expression of a summation or a number. For example:
- print(Num(2)) should return "2"
- print(Sum(Sum(Num(1), Num(2)), Num(3))) should return "1 + 2 + 3"

Hence, if we are given a Sum object, we need to call the function again, breaking the sum down into two parts, the left and the right, as shown in Sum's class definition:

class Sum(val left: Expr, val right: Expr) : Expr()

With that in mind, we arrive at the following function, which sadly is called print:

fun print(expr: Expr): String {  // 1
    when (expr) {  // 2
        is Num -> return expr.value.toString()  // 3
        is Sum -> return print(expr.left) + " + " + print(expr.right)  // 4
        else -> throw IllegalArgumentException("Unknown expression")  // 5
    }
}

In 1 we declare the function print that takes in an Expr and returns a String. In 2, when feels like a switch statement. If expr is an instance of Num, then we return the value of expr as a String in 3.

However, if expr is an instance of Sum, then we break down the expr, and call print on its left and right properties in 4.

Otherwise in 5, we throw an IllegalArgumentException because expr is neither a Sum nor a Num.

In 4, we could have also used string templates instead, like so:

fun print(expr: Expr): String {
    when (expr) {
        is Num -> return expr.value.toString()
        //                vvvvvvvvvvvvvvvvvvv   vvvvvvvvvvvvvvvvvvvv
        is Sum -> return "${print(expr.left)} + ${print(expr.right)}"
        else -> throw IllegalArgumentException("Unknown expression")
    }
}

In this example, we do not need to explicitly cast expr as Num or Sum, like we have to do in Java:

(...)
if (expr instanceof Num) {
        return "" + ((Num) expr).getValue();
}
(...)

7: Data Classes

Data classes are like case classes in Scala: they're for storing data, and have a lot of useful helper functions baked in, like equals and hashCode and the obligatory toString. In this exercise, we really don't have a lot to do; we just have to read through some of the code samples. One thing I feel the samples missed out on showing is that data classes can also have optional parameters. The following code:

data class Account(val name: String, val account_id: String, val active: Boolean = true)

fun main(args: Array<String>) {
    val ac = Account("Nafiul Islam", "A2231200ODSSDF4%%32123")
    print("${ac.name}, ${ac.account_id}, ${ac.active}")
}

Outputs:

Nafiul Islam, A2231200ODSSDF4%%32123, true

8: Extension functions

The Kotlin team came up with a pretty neat euphemism for what is essentially monkey-patching. Extension functions allow you to patch a class with extra methods. For example:

fun String.lastChar() = this.charAt(this.length - 1)

fun main(args: Array<String>) {
    print("Hello World!".lastChar())  // Outputs: !
}

This is pretty neat, we can actually monkey patch stdlib classes. So, without much further ado, we need to create extension functions on Int and Pair<Int, Int>, to allow two new constructor methods for a RationalNumber. One being Int.r(), which allows us to create a rational number like so:

4.r()  // Outputs RationalNumber(4, 1), in essence 4/1

As well as:

Pair(10, 20).r()  // Outputs RationalNumber(10, 20), in essence 10/20

This is one way of implementing such auxiliary constructors:

data class RationalNumber(val numerator: Int, val denominator: Int)  // 1

fun Int.r(): RationalNumber {
    return RationalNumber(this, 1)  // 2
}

fun Pair<Int, Int>.r(): RationalNumber {
    return RationalNumber(this.first, this.second)  // 3
}

In 1, we have declared our RationalNumber data class. This is already given to us in the exercise.

In 2, we return a RationalNumber; the first parameter is this, and the second is 1. Basically, creating 4/1.

In 3, we extend Pair, this is an inbuilt class from the stdlib. In this case, we're extending a pair of two Int variables. Code completion brought up first and second fields for Pair. However, the documentation did not say it had these fields:

Hitting ⌘ + B [7] brought this up inside Tuples.kt (which appears to be part of the stdlib):

public data class Pair<out A, out B>(
        public val first: A,  // 1
        public val second: B  // 2
) : Serializable {
    override fun toString(): String = "($first, $second)"
}

We can see that in 1 and 2, first and second are declared as properties.

9: Extending Collections

In this exercise, we have to extend collections. Following the convention of the other exercises before, we need to translate some Java code to Kotlin code. In this case, the Java code does the following:

Group strings present in collection by length.
Find the maximum size of all the groups, meaning find the group that has the most number of members. Stored in maximumSizeOfGroup.
Return the members of the group with the highest number of members.

After deciphering the Java version of the program in JavaCode9.java, I managed to figure out what needed to be done. Here are the results:

fun doSomethingStrangeWithCollection(collection: Collection<String>): Collection<String>? {
    val groupsByLength = collection.groupBy { s -> s.length }

    val maximumSizeOfGroup = groupsByLength.values().map { group -> group.size }.max()

    return groupsByLength.values().firstOrNull { group -> group.size == maximumSizeOfGroup }
}
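
For readers who know neither Java nor Kotlin, the same three steps can be sketched in Python. This is my own restatement of the algorithm rather than anything from the Koans, and the function name largest_length_group is made up:

```python
def largest_length_group(strings):
    """Return the first group of equal-length strings that has the most members."""
    # Step 1: group the strings by length (dicts keep insertion order).
    groups = {}
    for s in strings:
        groups.setdefault(len(s), []).append(s)

    if not groups:
        return None

    # Step 2: find the size of the biggest group.
    maximum_size = max(len(group) for group in groups.values())

    # Step 3: return the first group that has that size.
    for group in groups.values():
        if len(group) == maximum_size:
            return group
```

As with firstOrNull in the Kotlin version, an empty collection gives back nothing, here None.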

10: Object Expressions

This exercise shows us how to modify object instances on the fly. In this case, we need to modify MouseAdapter to register the number of clicks that it receives. So, following the examples, we can see that we need to use the following syntax:

val foo: Foo = object : Foo() {
    override fun foo() {
        // Code goes here
    }
}

With that set, let's modify our own instance of MouseAdapter:

fun task10(handleMouse: (MouseListener) -> Unit): Int {
    var mouseClicks = 0

    val customAdapter : MouseAdapter = object : MouseAdapter() {
        override fun mouseClicked(e: MouseEvent) {  // 1
            super<MouseAdapter>.mouseClicked(e)  // 2
            mouseClicks += 1  // 3
        }
    }

    handleMouse(customAdapter)  // 4
    return mouseClicks  // 5
}

In 1 we override the function that handles mouse clicks. In 2, we make a call to the super function. When doing this, I did not know how to make this call and found it well documented in the reference [8].

In 3, we increase the number of mouse clicks in our mouseClicks variable. In 4, we register the handler and finally in 5, we return the number of mouse clicks.

Takeaway

The Kotlin Koans have a lot of material packed in. If you're coming from a language other than Java, it might be a little difficult to understand what is going on. I wish it had more language agnostic instructions, as Kotlin is targeting JavaScript platforms as well.

What I've learnt from the Introduction is that Kotlin is a ridiculously powerful language. Some of its features can even make Python sweat. The parts I liked most were lambdas, null safety features and extension functions.

In total there are six parts to this series. We've just completed the first. I just can't wait for more :)

Acknowledgements

A big thanks to the following people:

Taro Nagasawa
Keiron Pizzey
Ilter Canberk
Sahib Bin Mahboob
Jeong Min-Lee
Ashwini Chaudhury
Hadi Hariri

[1]	If I've made a mistake in this list, or if you have any other suggestions that I've missed out, please do not hesitate to add.

[2]	Hadi Hariri is an evangelist at Jetbrains.

[3]	Dash docsets are also supported by other tools on different platforms. Check out Zeal.

[4]	有難うございます。(Thank you very much) to Taro Nagasawa for pointing this out to me :)

[5]	Implicit iterators talked about towards the end of "Higher Order Functions" http://kotlinlang.org/docs/reference/lambdas.html

[6]	Andrey Breslav, the lead designer of the language commented on Unit https://confluence.jetbrains.com/display/Kotlin/Functions?focusedCommentId=40704501#comment-40704501

[7]	For Win/Linux, one can use `CTRL` + `B`, in essence "Go to Source".

[8]	In the section called "Overriding Rules" http://kotlinlang.org/docs/reference/classes.html#overriding-rules

Making dash docsets using sphinx and doc2dash

2014-10-24T11:16:00+06:00

Dash is an offline documentation search engine. I've been using Dash a lot recently. It has very fast search and supports many libraries out there. Zeal is a free and open source alternative to Dash that supports all platforms and uses Dash docsets for offline documentation. Other platform specific tools also exist, such as Velocity for Windows and LovelyDocs for Android.

Dash supports many libraries; however, libraries like Fabric and requests are not officially supported. Their Dash compatible docsets can be generated, though. So, I thought I'd make a quick tutorial on how to generate docsets for third party libraries:

A Todo app with flask and Pony

2014-10-09T12:30:00+06:00

Table of Contents

Installation
Routes
Using Pony
References
Update
P.S.

Pony is a new ORM that has some very nice features especially when it comes to querying. It's still in the works, missing features like migration and py3 support (although version 0.6 supports py3 and already has a release candidate), but the creators tell me that both these features will be made available by the end of this year. So, I decided to make a simple todo app with pony, flask and flask-restful. You can get the source code from github.

Installation

First, we need to set up our project and install a couple of requirements:

pip install flask flask-restful pony

Routes

We are going to be making an API for our todo list. A todo item will have three things:

id, the primary key
data, what the item is about e.g. write a tutorial on pony
tags, what type of item it is, e.g. work, important, home

In order to do this, let's create our routes:

/ will show you all the todo items on a GET request. One can add new items with a PUT request.
/<int:todo_id> will allow a GET request, which shows you information regarding the todo item with that id, and a DELETE request.
/tags/ will show you all the tags that are available, with links to those tag elements.
/tags/<int:tag_id> will show you the corresponding tag.

Thus, a tag must have an id and a value, which will likely be a string. Let's get to making our routes then:

# Inside app.py
from flask import Flask
import flask_restful as rest  # the flask.ext.* import path no longer exists in newer Flask

app = Flask(__name__)
api = rest.Api(app)


# Resource ######################################################################
class Todos(rest.Resource):

    def get(self):
        """Will give you all the todo items"""
        return {}

    def put(self):
        """Payload contains information to create new todo item"""
        return {}


class TodoItem(rest.Resource):

    def get(self, todo_id):
        """
        Get specific information on a Todo item

        :param todo_id: The Todo Item's ID, which is unique and a primary key
        :type todo_id: int
        """
        return {}


class Tags(rest.Resource):

    def get(self):
        """Will show you all tags"""
        return {}


class TagItem(rest.Resource):

    def get(self, tag_id):
        """
        Will show you information about a specific tag

        :param tag_id: ID for the tag
        :type tag_id: int
        """
        return {}

# Routes #######################################################################
api.add_resource(Todos, '/', endpoint='Home')
api.add_resource(TodoItem, '/<int:todo_id>', endpoint='TodoItem')
api.add_resource(Tags, '/tags/', endpoint='Tags')
api.add_resource(TagItem, '/tags/<int:tag_id>', endpoint='TagItem')


if __name__ == '__main__':
    app.run(debug=True)

The get and put methods inside the rest.Resource objects indicate the HTTP methods that you can use to interact with that route. The api.add_resource(...) function allows you to register the API handler, i.e. a rest.Resource object, together with a route.

The endpoint is an optional parameter that gives the route a name to refer to it by, which is useful for redirection.

Using Pony

Right now, all the API will return are empty JSON objects. Before we actually start returning stuff, let's make our models:

# Inside models.py
from pony import orm  # 1

db = orm.Database()  # 2


class Todo(db.Entity):  # 3

    _table_ = 'Todos' # 4

    data = orm.Required(unicode)  # 5
    tags = orm.Set("Tag")  # 6


class Tag(db.Entity):

    _table_ = 'Tags'

    name = orm.Required(unicode, unique=True)  # 7
    tags = orm.Set("Todo") # 8

The first thing we do is import pony.orm in 1 . In most of the documentation, they import everything via from pony.orm import *, but since I like namespaces, I prefer using orm. pony.orm houses everything you need to create and query objects.

We initialize the database using orm.Database() in 2. This instance of the pony.orm.Database class stores all the data needed to create the tables that your models define.

In 3, we create an Entity object by inheriting from db.Entity. Note that we use the db instance, and not any import from pony.orm. It's similar to SQLAlchemy's sqlalchemy.schema.MetaData object.

In 4, we set the table name to Todos; it's a lot like __tablename__ in SQLAlchemy.

In Pony, you can have either Required columns or Optional columns (described here). In 5, we set data to be a Required field of type unicode. Note that this unicode class is not from the pony.orm namespace.

You can also have many-to-many types in pony, using Set in 6 and 8. In 7, we set unique to be True. You have a couple of optional attributes that you can add to Required or Optional fields. (Required and Optional Explained in more detail)

Notice any id anywhere? No? Well, that's because pony implicitly creates the id column for you. (More about implicit ID creation)

Now, in app.py we need to bind a database and then generate mappings:

# Inside app.py
(...)
from models import *

app = Flask(__name__)
api = rest.Api(app)

db.bind('sqlite', 'todo_api.db', create_db=True)
db.generate_mapping(create_tables=True)


# Resource #####################################################################
(...)

We first bind the objects, in this case Tag and Todo to the sqlite database. (More on binding databases)

We set create_tables and create_db to True because this is a new database, and no tables have been created yet. (More information regarding creating tables)

generate_mapping generates all the SQL needed to create the tables. If you actually set orm.sql_debug(True), then all the SQL that is being generated will be logged into your console.

These are what the generated tables look like:

Querying

Let's get to finding stuff from the database. First, we need to show all the todo items in our database for our Todos.get function:

# Inside app.py
class Todos(rest.Resource):
    def get(self):
        """Will give you all the todo items"""

        with orm.db_session:  # 1
            return {
                item.id: {
                    'task': item.data,
                    'tags': [tag.id for tag in item.tags]  # 3
                }
                for item in Todo.select()  # 2
            }

We are using a context manager, orm.db_session, in 1. This treats everything that pony does within the code block as a transaction, so you don't have to worry about committing. We return a dictionary of all the items keyed by their id; the 'task' value carries the text of the Todo item. In 2, we loop through all the Todo items in the database; Todo.select() is the same as orm.select(tdo for tdo in Todo). As for 'tags', in 3 we loop through the item's tags and collect their ids.

However, the id of a tag alone is not enough information about the tags associated with a certain Todo item; it would be better if we could generate links to the tag pages. We can actually add a property called url inside Tag:

# Inside models.py

(...)
@property
def url(self):
    return "http://localhost:5000/tags/{}".format(self.id)
(...)

We use property because we will be able to access it as if it were any other attribute.
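
To see what that buys us without a database, here is a small sketch with a plain class. PlainTag is a hypothetical stand-in for the Pony entity, and only the url property is copied from the model above:

```python
class PlainTag(object):
    """Plain stand-in for the Tag entity, just to show how @property reads."""

    def __init__(self, id):
        self.id = id

    @property
    def url(self):
        return "http://localhost:5000/tags/{}".format(self.id)


tag = PlainTag(3)
print(tag.url)  # read like an attribute, with no parentheses
```

Because url is computed from id on every access, it never goes stale and needs no column of its own.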

Now that we have added this property, we can call the url property instead of the id:

# Inside app.py
(...)
'tags': [tag.url for tag in item.tags]  # 3
(...)

Now that this is done, let's send a GET request to the / url:

quazinafiulislam@Nafiuls-Mac: ~
 $ http GET http://localhost:5000/
HTTP/1.0 200 OK
Content-Length: 3
Content-Type: application/json
Date: Thu, 09 Oct 2014 10:32:00 GMT
Server: Werkzeug/0.9.6 Python/2.7.6

{}

Creating Objects

So that (probably) works! We won't know until we've added a Todo item. We haven't inserted anything into our database yet, so let's do that. We are going to be accepting PUT requests for our / route. In order to do this, we need to import the json module, as well as flask.request:

# Inside app.py
import json  # New import

from flask import Flask, request  # New import
import flask_restful as rest
(...)

Let's start by working on our Todos.put function. This is how the data is going to come in:

{
    "data": "Buy Milk!",
    "tags": ["work", "high"]
}

In order to make this function work, we need to do a couple of things:

Extract data and tags from the json request.
Create a new Todo item.
Create new tags; if a tag already exists, use it instead of creating a new one.

To cover those steps, this is the function I ended up with:

# Inside app.py
(...)
def put(self):
    """Payload contains information to create new todo item"""

    info = json.loads(request.data)  # 1

    with orm.db_session:  # 2
        item = Todo(data=info['data'])  # 3

        for tag_name in info['tags']:  # 4
            tag = Tag.get(name=tag_name)  # 5
            if tag is None:  # 6
                tag = Tag(name=tag_name)  # 7
            item.tags += tag  # 8

    return {}, 200  # 9
(...)

In 1 we are loading the data from request.data, and turning that json string into a python dictionary. After doing so, we start our database session by using the orm.db_session context manager in 2.

We start off in 3 by creating a new Todo item. Note that our arguments are named/keyword arguments. We now loop over all the values in info['tags'], using tag_name as the loop variable, in 4.

In 5, we try to retrieve the tag if it exists in the database. If it doesn't (6), then we create a new one in 7. We then add this tag to the item's collection of tags in 8.

In 9 we return an empty json object, with a 200 code, to say that everything happened smoothly. Note that in this entire function, it feels like there's no database. I find it amazing that it feels as if I'm working with native python objects stored in memory.
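The get-or-create logic in 5 to 8 can be sketched without a database at all. In the version below, the tags_by_name dictionary and the todos list are hypothetical in-memory stand-ins for the Tag and Todo tables, and put_todo is a made-up helper, not part of the app.

```python
import json

tags_by_name = {}  # hypothetical stand-in for the Tag table
todos = []         # hypothetical stand-in for the Todo table


def put_todo(raw_body):
    """Parse the JSON payload, then reuse existing tags or create new ones."""
    info = json.loads(raw_body)
    item = {"data": info["data"], "tags": []}

    for tag_name in info["tags"]:
        tag = tags_by_name.get(tag_name)  # like Tag.get(name=tag_name)
        if tag is None:
            tag = {"id": len(tags_by_name) + 1, "name": tag_name}
            tags_by_name[tag_name] = tag
        item["tags"].append(tag)

    todos.append(item)
    return item


first = put_todo('{"data": "Buy Milk!", "tags": ["work", "high"]}')
second = put_todo('{"data": "Buy rice", "tags": ["work"]}')
```

After both calls there are still only two tags, and both items share the very same "work" tag object, which is the behaviour we want from the real endpoint.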

quazinafiulislam@Nafiuls-Mac: ~
 $ http PUT http://localhost:5000/ data="Buy rice" tags:='["work", "high"]'
HTTP/1.0 200 OK
Content-Length: 3
Content-Type: application/json
Date: Thu, 09 Oct 2014 11:36:04 GMT
Server: Werkzeug/0.9.6 Python/2.7.6

{}

Yay! So that worked. Or did it? Let's go back to our home page at / and see what's up:

quazinafiulislam@Nafiuls-Mac: ~
 $ http GET http://localhost:5000/
HTTP/1.0 200 OK
Content-Length: 166
Content-Type: application/json
Date: Sat, 11 Oct 2014 02:33:24 GMT
Server: Werkzeug/0.9.6 Python/2.7.6

{
    "1": {
        "tags": [
            "http://localhost:5000/tags/1",
            "http://localhost:5000/tags/2"
        ],
        "task": "Buy rice"
    }
}

Now that we're done with Todos.get and Todos.put, let's do TodoItem.get:

class TodoItem(rest.Resource):
    def get(self, todo_id):
        """
        Get specific information on a Todo item

        :param todo_id: The Todo Item's ID, which is unique and a primary key
        :type todo_id: int
        """

        try:
            with orm.db_session:
                todo = Todo[todo_id]  # 1
                tags = [{tag.name: tag.url} for tag in todo.tags]  # 2

                return {
                    "task": todo.data,
                    "tags": tags
                }

        except orm.ObjectNotFound:  # 3
            return {}, 404

Since we are getting todo_id from the request itself, we can use it to get the Todo item with that id in 1. This is similar to Todo.get(id=todo_id), except that Todo[todo_id] raises ObjectNotFound when there is no such item, whereas get returns None.

In 2, we are creating a list of dictionaries from the tags in todo.tags. We are using tag.name and tag.url. If you just wanted to get the names of the tags, you could do either of these:

tag_names = list(todo.tags.name)

tag_names = [tag.name for tag in todo.tags]

If you think I'm pulling your leg, you can check the source code and run it for yourself. In 3 if we cannot find the object, then we return a 404 error message back.

Now on to Tags.get:

class Tags(rest.Resource):
    def get(self):
        """Will show you all tags"""

        with orm.db_session:
            return {
                tag.name: tag.url
                for tag in Tag.select()
            }

Nothing new here, very similar logic to our / url. Finally in our TagItem.get we have:

class TagItem(rest.Resource):
    def get(self, tag_id):
        """
        Will show you information about a specific tag

        :param tag_id: ID for the tag
        :type tag_id: int
        """

        try:
            with orm.db_session:
                tag = Tag[tag_id]
                todos = list(tag.todos.data)  # 1

                return {
                    "tag": tag.name,
                    "tasks": todos
                }

        except orm.ObjectNotFound:
            return {}, 404

This is similar logic to that of TodoItem.get. The difference here is that in 1, we're getting the data for the todos in a different way, since we only need one attribute. The code in 1 is the same as:

todos = [todo.data for todo in tag.todos]

Deleting Objects

All that's left is to find a way to delete objects from the database.

def delete(self, todo_id):

    try:
        with orm.db_session:
            todo = Todo[todo_id]  # 1

            if todo:
                tags = todo.tags.copy()  # 2
                todo.delete() # 3

                for tag in tags:
                    if not tag.todos:  # 4
                        tag.delete()  # 5
    except orm.ObjectNotFound:
        return {}, 404

    return {}, 200

In 1 we get the todo item like before. In 2, we copy the tags, because once we delete the todo item in 3, we can no longer reach them through todo.tags, since the item is marked for deletion. If you do not do this, then you are going to get an OperationWithDeletedObjectError.

In 4 we check to see if the tag has todo items associated with it, if it doesn't then we delete the tag too in 5.
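The same clean-up idea, written against plain dictionaries rather than Pony entities, looks roughly like this. The two mappings and the delete_todo helper are hypothetical stand-ins for the many-to-many relationship between Todo and Tag.

```python
# Hypothetical in-memory stand-ins for the Todo <-> Tag relationship.
todo_tags = {1: {"work", "high"}, 2: {"work"}}  # todo id -> tag names
tag_todos = {"work": {1, 2}, "high": {1}}       # tag name -> todo ids


def delete_todo(todo_id):
    """Delete a todo and any tag that no longer has todos attached."""
    if todo_id not in todo_tags:
        return False  # the real handler answers with an error code here

    # Keep hold of the tags before the todo disappears (step 2 above).
    tags = todo_tags.pop(todo_id)

    for name in tags:
        tag_todos[name].discard(todo_id)
        if not tag_todos[name]:  # tag is now orphaned (step 4)
            del tag_todos[name]  # so remove it as well (step 5)
    return True


deleted = delete_todo(1)
```

Deleting todo 1 removes the "high" tag, which only it used, while "work" survives because todo 2 still refers to it.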

Now, a demonstration of the API that we've built:

References

Update

Last Updated on 15 Oct 2014 7:00 PM

I'd like to thank the pony ORM authors for their help in making this tutorial.

You can also take a look at the modularized version of the source code

P.S.

If you want to know what command line tool I used to make requests, I used httpie.

Testing out PyCharm's REST Client

2014-10-09T07:55:00+06:00

I've been trying out PyCharm's "Test RESTful Web Service" tool recently, just to check it out. I usually use requests, httpie or curl if I have to use command line tools, but the tool I use most is Advanced REST client.

Wanting to test out a sample API from Flask-Restful, I decided to take PyCharm's tool out for a spin. Here is the sample code from their quickstart page:

from flask import Flask, request
from flask.ext.restful import Resource, Api

app = Flask(__name__)
api = Api(app)

todos = {}

class TodoSimple(Resource):
    def get(self, todo_id):
        return {todo_id: todos[todo_id]}

    def put(self, todo_id):
        todos[todo_id] = request.form['data']
        return {todo_id: todos[todo_id]}

api.add_resource(TodoSimple, '/<string:todo_id>')

if __name__ == '__main__':
    app.run(debug=True)

GET requests are easy enough. You don't need to change anything from the defaults:

If you click the button indicated by the orange arrow, you will not have any query string in your URL. We will leave the request body empty:

After sending the request:

The client supports syntax highlighting for multiple formats, like html, json and xml. In this case, html has been highlighted. You can also check out response headers in the adjacent tab.

PUT, POST and DELETE are just as easy, but they do not have the defaults that you'd expect. For example, in requests, your content type is implicitly set for you to application/x-www-form-urlencoded:

>>> import requests
>>> req = requests.put('http://127.0.0.1:5000/todo1',
            data={'data': 'Make a tutorial on PyCharm\'s REST client' })
>>> req
<Response [200]>
>>> req.request.headers['Content-Type']
'application/x-www-form-urlencoded'
>>> req.content
'{\n    "todo1": "Make a tutorial on PyCharm\'s REST client"\n}\n'

This took me some time to figure out in PyCharm's REST client. I knew I was doing something wrong, but oddly enough it took me a while before I understood that it was because I had not set the content type in my header. I think it's because curl, requests and the chrome extension all set this implicitly for you. The good news is that header names have code completion, but header values do not:

This time, we set the text to the following, since we want to add a new todo list item:

Voilà, our request succeeds:

You can do a lot more with the options that are available, like uploading from files, adding request parameters and setting cookies.
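For reference, this is roughly what the successful request looks like when everything is spelled out by hand. It is a sketch that uses the standard library's urllib instead of requests; it only builds the request object and never sends it, and the URL and body are the ones from the requests session earlier in this post.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form-encode the body ourselves, the way an HTML form would.
body = urlencode({"data": "Make a tutorial on PyCharm's REST client"}).encode("ascii")

# Spell out the method and the content type explicitly, which is what
# the PyCharm client needed before the PUT would go through.
req = Request(
    "http://127.0.0.1:5000/todo1",
    data=body,
    method="PUT",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
```

Passing req to urllib.request.urlopen while the Flask app is running would send it; that step is left out here so the snippet works without a server.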

References

P.S.

This tool is also available in other jetbrains IDEs, like webstorm and intellij.

Understanding Schemas

2014-10-08T15:44:00+06:00

Table of Contents

Schemas are namespaces
Inside a Schema
Creating Schemas
References

I've been brushing up on my SQL lately, and one key concept that I had forgotten about was schemas. I rarely used them since I created most of my tables in the public schema (shame on me), but it was interesting to read the documentation on them again.

Schemas are namespaces

In short, schemas are namespaces that house your functions, tables, triggers, etc. What's a namespace? It's like a container of addresses. The reason why we use them is to prevent cluttering of the main/public namespace.

In fact, every table that you've ever created inside postgres lives inside a schema. For example, in psql you can type \d and then hit tab to see what schemas are available:

The dots indicate that those options are schemas. There is always a schema involved when you create a table or a function. By default, your tables and functions are put into the public schema. So, in the above example, we can see messages as a table, and it looks as though it's not in a schema, but it is. One can also refer to it as public.messages:

test=# \d public.messages
   Table "public.messages"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 data   | text    |
 date   | date    |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (id)

test=# \d messages
   Table "public.messages"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 data   | text    |
 date   | date    |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (id)

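If you don't have a Postgres server at hand, the same "qualified and unqualified names point at the same table" behaviour can be tried with Python's built-in sqlite3 module. Treat this purely as an analogy: SQLite's default schema is called main, not public, and it is not a Postgres schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, data TEXT, date TEXT)")
conn.execute("INSERT INTO messages (data) VALUES ('hello')")

# The bare name and the schema-qualified name resolve to the same table.
unqualified = conn.execute("SELECT data FROM messages").fetchall()
qualified = conn.execute("SELECT data FROM main.messages").fetchall()
```
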
Inside a Schema

To see what tables are inside of a schema, you can keep on using tab completion, use a wildcard or use a query. Tab completion is pretty self-explanatory. If you want to use a wildcard then use it like so:

exercises=# \dt cd.*
               List of relations
 Schema |    Name    | Type  |      Owner
--------+------------+-------+------------------
 cd     | bookings   | table | quazinafiulislam
 cd     | facilities | table | quazinafiulislam
 cd     | members    | table | quazinafiulislam
(3 rows)

We use the extra t in \dt to give us table information only. If you used \d cd.*, then you'd get a lot more information regarding each table. You can try it out for yourself.

Another way to get the same information is to use a query:

SELECT * FROM information_schema.tables WHERE table_schema = 'cd'

Note that information_schema is itself a schema. It does not live under the public schema, though; it sits next to it. Take a look at this:

exercises=# \d information_schema.tables
                       View "information_schema.tables"
            Column            |               Type                | Modifiers
------------------------------+-----------------------------------+-----------
 table_catalog                | information_schema.sql_identifier |
 table_schema                 | information_schema.sql_identifier |
 table_name                   | information_schema.sql_identifier |
 table_type                   | information_schema.character_data |
 self_referencing_column_name | information_schema.sql_identifier |
 reference_generation         | information_schema.character_data |
 user_defined_type_catalog    | information_schema.sql_identifier |
 user_defined_type_schema     | information_schema.sql_identifier |
 user_defined_type_name       | information_schema.sql_identifier |
 is_insertable_into           | information_schema.yes_or_no      |
 is_typed                     | information_schema.yes_or_no      |
 commit_action                | information_schema.character_data |

In essence, every table, view and function lives in exactly one schema; public is simply the default one.

Creating Schemas

Creating a schema is as simple as creating a table. Here's an example:

test=# CREATE SCHEMA happy_schema;
CREATE SCHEMA

When you create a schema, you can also create tables and functions under it in one command:

test=# CREATE SCHEMA ShoppingCenters
test-#     CREATE TABLE Malls (id integer PRIMARY KEY, name VARCHAR(100))
test-#     CREATE TABLE SuperMarkets (id integer PRIMARY KEY, name VARCHAR(100), capacity integer);
CREATE SCHEMA
test=# \dt shoppingcenters.*
                     List of relations
     Schema      |     Name     | Type  |      Owner
-----------------+--------------+-------+------------------
 shoppingcenters | malls        | table | quazinafiulislam
 shoppingcenters | supermarkets | table | quazinafiulislam
(2 rows)

This feels a lot like using python's with statement.

References

Why you should be using Pyenv

2014-07-15T00:48:00+06:00

TL;DR -> Install Pyenv, it's good for you. You'll thank me later.

So, a lot of people have issues with managing their different Python versions. There are plenty of "pythons" out there. There's cpython2, pypy, stackless, anaconda, cpython3 ...

The list could go on forever.

Thus it isn't a surprise to find people struggling with trying to leverage different versions of Python. I mean, things can get messy, when you have two versions of Python installed. I for example have five versions of Python installed.

Luckily, there's a tool out there that allows you to seamlessly change between the different versions of Python on your *nix operating system (sorry Windows folks, no dice). It's called Pyenv (you probably guessed that one).

On a mac, pyenv is pretty easy to install, using homebrew with:

brew install pyenv

For linux, you can use the following installer. Just to top it off here are the benefits:

Manage multiple versions of the Python interpreter, this includes not only the language versions but also different interpreters entirely.
Stay up to date with all the latest versions of Python that are coming out.
Change seamlessly between the different versions of Python, you have complete control over what the python alias means.

Watch out iPython here comes b(i)python!

2014-05-20T20:21:00+06:00

Table of Contents

Getting Started
A Few Quick Tips
Sweet code completion
Don't worry if you mess up
Share your code on the fly
Contacting the team
But what if you wanted both?
Update

So if you've been using Python for some time now, you've probably heard of iPython. This is pretty much the go to shell for most pythonistas who want to get a better command line debug experience. I find ipdb to be a godsend when I'm quickly trying to diagnose the problem.

But when you're trying to explore a library, write some code quickly with plenty of auto-completion, or quickly demonstrate something, you might want to choose bpython. And if you want the best of both worlds, both ipython and bpython, there's bipython. But first, let's take a look at what bpython can offer us:

Getting Started

bpython works on *nix systems with ease; for windows users, all you need to do is use this installer to install the package. For *nix users, it's much simpler:

pip install bpython

In my tests bpython worked with both python 2 and python 3 as well as pypy, because it's written in pure python and the only dependency bpython has is on pygments:

Once you've installed bpython, you can then just invoke it from the command line:

A Few Quick Tips

The first thing I learn to do with any shell, or any new software for that matter, is how to get out of it. The best way to get out of bpython is just to hit CTRL + D, and that will take you out.
bpython will automatically add the contents of your session to your console.
You can save your session with CTRL + S.
You can clear the bpython screen with CTRL + L.

Sweet code completion

I love code completion. How can you not love something that pretty much writes your code for you? As a person who's always looking out for better tools, bpython hits the spot. It's got the best code completion I've seen in any shell, it even puts fish to shame:

What I love about bpython is how fast the code completion pops up and the way documentation just pops up when you want to use a function.

Don't worry if you mess up

When I'm trying to reproduce something on the command line, I often make mistakes (although with bpython's code completion it's a lot harder to do). I'm trying to import asyncio here:

>>> import asycnio
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ImportError: No module named 'asycnio'

Now, when you're trying to send a console session to someone, these mistakes are not only annoying but also can get in the way of the person trying to help you, so you can just redo the last line with CTRL + r:

Please note that this does not negate what you did, it just clears the screen of the last command that you input. So if you imported datetime, but you actually wanted to import requests (yes, I don't know how to spell but bear with me), then datetime will still be usable from the shell:

Contacting the team

These are just some of the awesome things you can do with bpython. There's probably more. I don't know them all since I'm still exploring what you can do with it.

You can find out more by talking to the guys on their IRC channel, #bpython on irc.freenode.net. Find out more about the community around bpython here.

I initially thought that bpython was not properly maintained, but if you take a look at their project page, it seems that it's being kept up to date:

But what if you wanted both?

Well, you'd have to use bipython then. It's still in the works, and there's no proper highlighting for errors right now like in ipython, but it's heading towards being a package that unifies both ipython and bpython. So, you need to install both ipython and bipython:

pip install ipython bipython

And you're done with the installations but running bipython is not as simple. At this time, bipython does not work with python 3, but it does work with python 2. Have a look:

Update

Last Updated on 15 Nov 2014 1:11 PM

Information about bipython added.

Rifts in the Community

2014-04-29T17:54:00+06:00

Table of Contents

Diving Deeper
Challenges
Solutions
Resources

I love Python. It's the first programming language that I really stuck to. At the time when I was first starting, Python had a lot going for it:

MIT taught its Introductory CS Course in Python
Python was widely used in the Scientific community
It was very intuitive and easy to use off the bat
The community on StackOverflow was super nice too

So, my journey began with MITx's Introduction to Computer Science course, and although it was the first time such a course was offered on edx, it was pretty darn awesome. The course was mostly taught by John Guttag. Most of the lectures made sense, and the homework questions were a lot of fun to complete. I made a lot of diverse friends too; one guy was a plumber who was learning to code so that he could help his son learn to code (what an awesome dad!).

I never finished the course, but it gave me enough to get started with Python and programming. I started asking questions on StackOverflow, and soon started answering them too. I soon found one of the hidden gems of the site, its chat-room. I was enthralled by the community that grew around Python; it was and continues to be a super helpful community.

Diving Deeper

However, as I started diving deeper into things I soon came to realize a few of Python's limitations. First and foremost was that Python was very slow in some cases, especially when it came to raw loops or dealing with numbers. I personally was super grateful for Python's Integer promotion (helped me a lot with a couple of assignments and Project Euler too). However, I also realized there were many cases where Python can be optimised. The Python community responded to these problems by dividing the problem into different parts:

To make numerical data crunching (for lack of a better term) faster, the community came up with numpy.
For wrapping third party C libraries, there came swig by David Beazley and cython which is an evolutionary form of the Pyrex project. Please note that Cython is python specific whereas swig can be used with other programming languages as well. If you'd like to know more about these two libraries, I feel that this video does a good job at exploring what they are, and highlights the features that make them different.
For the Python implementation itself, faster alternatives have popped up, like Pypy (a faster implementation of Python thanks to JIT) and Jython/IronPython, both of which take advantage of their respective VMs, bypassing Python's GIL. Most recently, there is Pyston. There are others, but they have not gained enough traction.

Now, I can tell you that numpy (and Python's Data tools in general) was a huge success, and hence Python's scientific community has thrived. I can honestly say this with pride, that Python's data tools and libraries are arguably the best of the breed. PyData, which is a conference dedicated to the discussion of Python's data tools (and boy are there plenty of them) has been growing both in popularity and influence.

Challenges

Now the tools that the scientific community employ are third party libraries, they do not offer changes to the implementation of Python itself. However, the projects that are trying to make headway have two really big challenges, and the two are actually quite interconnected. So, what does making python faster mean? In essence there are two parts to it:

Making Python code itself execute faster
Better support for concurrency in Python

Now in order to meet the first goal, pypy implemented a JIT compiler and it WORKS! You can check it out yourself. Pyston, on the other hand, is trying to achieve JIT through a different means (which has been attempted by pypy, but was not successful; it still remains a theoretical possibility). The first challenge here is to make an implementation successful enough to replace CPython. I feel that if the two camps, Pyston with its Dropbox backing and Pypy with its years of experience, were to combine their efforts, then we might be able to see good results even faster. I know it's just conjecture, but it's something that might divide our community between two sides, one that uses pypy and the other that uses Pyston. Right now, Pyston is working towards supporting only Python 2.

Secondly, trying to get rid of the GIL has been a major problem (this is the reason why threading will often slow down your python programs), and as Alex Gaynor put it, a subtle reason for the division in our community:

I think there's been little uptake because Python 3 is fundamentally unexciting. It doesn't have the super big ticket items people want, such as removal of the GIL or better performance (for which many are using PyPy).

Most attempts at making Python faster work mostly with Python 2.7.*. Pypy 3 is not fully stable yet, and pypy-stm, which is pypy's attempt at removing the GIL (iirc) is aimed at Python 2 and not 3. Thus we come to the second big problem, and that is division between python 2 and 3.
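If you want to see the GIL's effect on threading for yourself, a small experiment in the spirit of the countdown example from Beazley's GIL talk is enough. The sketch below is only a rough probe: the function names and the value of N are made up for illustration, and the exact timings depend on your machine and interpreter.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound busy loop."""
    while n > 0:
        n -= 1


def run_sequential(n, jobs):
    for _ in range(jobs):
        count_down(n)


def run_threaded(n, jobs):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def timed(func, *args):
    """Return how many seconds func(*args) took."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


N = 200000  # small on purpose; raise it to make the difference easier to see
sequential_seconds = timed(run_sequential, N, 2)
threaded_seconds = timed(run_threaded, N, 2)
```

On an interpreter with a GIL, the threaded run is usually no faster than the sequential one for this kind of CPU-bound loop, because only one thread executes Python bytecode at a time.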

Solutions

Pool our resources

We need to get the experts in the Python community together. We also need to get the companies that use python to back up a project that can make a python implementation fast enough to compete with Javascript's v8 engine, and one that supports concurrency well.

Admitting our problems

Everyone knows that Python 3 has issues. Every single time someone says "Unicode", I scream "Python 3" inside my head. Right now the Python community is in a state of denial regarding Python 3:

Guys i get it. I should not say anything bad about Python 3 because clearly I don't see the big picture.
— Armin Ronacher (@mitsuhiko) April 29, 2014

We need to get out of it, and start working towards solving the problem.

Make it easier to contribute

Pythonistas want to help. We need to have more resources describing the problems that we face right now. The Pypy project does an excellent job of that, and the effort they put into their FAQs is laudable, with the exception of describing what rPython is:

RPython is a restricted subset of Python that is amenable to static analysis. Although there are additions to the language and some things might surprisingly work, this is a rough list of restrictions that should be considered. Note that there are tons of special cased restrictions that you’ll encounter as you go. The exact definition is “RPython is everything that our translation tool-chain can accept” :)

Resources

If you want to get started on understanding what these different technologies are, then these resources will help:

The Global Interpreter Lock (GIL)

David Beazley's talk on the GIL was pretty much the only talk I needed to get started, and (sadly) I did not do much more than that.

PyPy

Beazley's Keynote on Pypy was an excellent talk (if you haven't realized already, I adore Beazley's talks). He really dives into the guts on Pypy on this one, and shows you how it works from the inside.

Pypy and Software Transactional Memory is a good talk, but not the best. It's given by members of Pypy's core team, and introduces what STM is. Although this talk leaves a lot to be desired, you can still walk away with some key points regarding what STM is and why it's useful.

Pyston

This is a new attempt at making a faster python implementation, backed by Dropbox. You can take a look at their progress on their github project page.

Python 2 to 3

Armin Ronacher's blog post on porting from 2 to 3 is really good stuff. He's been a strong critic of Python 3, but I feel that he justifies his position well and regularly.

I've joined the dark side

2014-03-23T20:45:00+06:00

Table of Contents

Some developer apps
- Oh-My-ZSH
- Homebrew
- pyenv
- Sublime Text 3
- PyCharm
- Keka
- AppCleaner
The Switch

So, after debating about it for quite some time, I finally got myself a Mac! Now, I got myself a Mac book pro 15 inch and it pretty much cost me a fortune. So far, I've been playing around with it, and I gotta say, that I would prefer a Mac over windows for Python development, any day. But before I tell you why its so great, let me share a couple of tools that I've found immensely useful. So, here's a list that any Python dev would find useful on a new Mac:

Some developer apps

Oh-My-ZSH

This is a must install for me on any *nix system. It gives me some awesome auto-completion, and some very beautiful themes. Frankly, I don't even think I need to justify why one should install this awesome tool; just give it a try, and if you don't like it, you can always change back. Check it out on oh-my-zsh.

Homebrew

This is such a wonderful package manager for Mac. I was initially split between Macports and Homebrew, but most of the people I talked to overwhelmingly favored Homebrew. What really got me to pick Homebrew was that Pyenv supported a direct installation via Homebrew (Pyenv is next on the list). I installed homebrew with a quick:

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

And bam! All installed. I just copied and pasted the text from the homebrew website, so I did not even notice that I was using ruby. You gotta love *nix systems for having python and ruby installed by default. With brew, I made my first install:

brew install pyenv

pyenv

This allows you to manage different versions of Python with ease. Really useful if you want to work with different versions of python on the command line. Check it out here.

Sublime Text 3

Wonderful text editor, I use this for small projects or a quick edit. For more intensive projects, I use pycharm. Although their website says that ST3 is still in beta, it's been pretty stable for me, and the package manager for ST3 works very well. I suggest using ST3 because it starts up instantly, compared to ST2. If you're going to be using this, then I suggest you install the package manager as soon as you've installed ST3, instructions here.

PyCharm

Well, I was bound to install this at some point. In my honest opinion, this is the Python IDE to beat. Just make sure you create the charm Terminal variable, through "Create Command Line Launcher" option:

Keka

This is for unpacking archives. Really simple to use tool. Its available on the Mac Store as well.

AppCleaner

Simple app that completely uninstalls your installed apps. Works like a charm. Get it on their site. I recommend this as a must install since a lot of the times, your apps leave behind settings, that come back to bite you if you reinstall.

And that's a wrap. You can find loads of other tools available, and git comes installed by default. If you want to install mercurial, then it's a simple brew install hg.

The Switch

For me, it was very painful at first. The keyboard was just plain wrong; it had the control key in the wrong place. I say wrong because it differs from the standard keyboard used with windows and Linux. I just wish Apple had followed the general keyboard layout, it would make migration so much simpler. However, this was probably the only problem I had with my new Mac Book pro, and everything else is pretty neat. The best thing about Mac is that the platform has many apps for windows virtualisation. My pick would be Vmware Fusion, it allows you to run Windows apps as if they were mac applications.

The only qualm I have right now is that I haven't been able to find a download manager that comes close to Internet Download Manager (IDM). I'm using the free version of Folx. It sucks. It looks nice, but it sucks.

Other than that, I feel that a Mac was exactly what I needed since although Ubuntu is nice, it lacks some good commercial software for screen recording, which is something that I really need.

If you make the switch to Mac, and you're an open source developer, I assure you that you won't regret it. Funnily enough, three days after I got my mac, my PC just died. I took it to the repair guy, and he told me that the motherboard was broken. I had just spent 50 bucks on repairs a week ago. The motherboard would cost me another $250.

I won't be fixing my PC any time soon.

Updating Pycharm 3.0.2 on Ubuntu

2013-12-12T20:21:00+06:00

Now, I've just come out of an issue with Jetbrains PyCharm 3. The problem was this, updating the software to 3.0.2. I had 3.0.1 installed. Now, when the update initially came around, the IDE told me, that I had to download the entire file all over again from the jetbrains website.

So, I downloaded the newest version, and it turned out that the update was broken: when I went to unzip the software, Archive Manager lashed out saying that it was a broken archive file. So, I had another guy check it out, and he got a broken archive too.

After a couple of days, jetbrains realised that they had messed up, so they put out a patch. I downloaded the patch, and this time I had another problem: it turned out that I did not have the rights to update Pycharm (imagine that!). So, in order to update it, I had to find out where it was installed.

whereis pycharm

Now, the file was installed at /opt/pycharm-3.0.1/bin/pycharm.sh. So, I went in there, and opened it as root:

sudo bash /opt/pycharm-3.0.1/bin/pycharm.sh

After that, I just checked for updates, by clicking the check link to the bottom left of the dialogue box:

The update worked after that. So, if you find yourself in a similar predicament, then I strongly advise you to run the pycharm.sh file as root.

Fabric Entry Creation

2013-12-11T00:58:00+06:00

So, in my attempts to create a functioning static blog generator using pelican, I've been using fabric to create my entries so that I can easily sort them according to their dates.

Here's the code:

import datetime

def make_entry(title='default'):
    today = datetime.datetime.today()
    slug = title.lower().strip().replace(' ', '-')
    f_create = "content/{}_{:0>2}_{:0>2}_{}.rst".format(today.year, today.month, today.day, slug)
    with open(f_create, 'w') as w:
        with open('template.txt', 'r') as r:
            s = r.read().format(title=title,
                                hashes='#' * len(title),
                                year=today.year,
                                month=today.month,
                                day=today.day,
                                hour=today.hour,
                                minute=today.minute,
                                slug=slug)
            w.write(s)
            print("File created -> " + f_create)

Now, when I was doing this, I initially used the datetime.date class, but that does not have the hour and minute attributes. I also did not want to bother with slug creation by hand, so I got that built in too.
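To see what the slug and the file name come out as, here is the same formatting logic pulled into a small helper that takes the date as an argument. The entry_path function is a hypothetical refactoring for illustration, not part of my fabfile.

```python
import datetime


def entry_path(title, when):
    """Build the content path the same way make_entry does."""
    slug = title.lower().strip().replace(' ', '-')
    # {:0>2} right-aligns the value in a field of width 2, padded with zeros.
    return "content/{}_{:0>2}_{:0>2}_{}.rst".format(
        when.year, when.month, when.day, slug)


path = entry_path("My awesome title here", datetime.datetime(2014, 3, 5, 9, 7))
```

Because the month and day are zero-padded, the files sort correctly by name, which is the whole point of the naming scheme.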

I tried to get all the template string inside the function, but later on decided to refactor it out to a template.txt, which I can more easily use and change:

{title}
{hashes}

:date: {year}-{month}-{day} {hour}:{minute}
:tags:
:category:
:slug: {slug}
:author: Nafiul Islam

#Post goes here
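One thing to watch out for with a template like this is that plain {month} or {minute} placeholders are not zero-padded, so 9:07 on March 5th would render as 2014-3-5 9:7. The sketch below is a possible tweak, not what my template.txt actually contains: it trims the template down to a few fields and adds format specs to pad the numbers.

```python
import datetime

# Hypothetical, shortened variant of template.txt with padded date fields.
TEMPLATE = (
    "{title}\n"
    "{hashes}\n"
    "\n"
    ":date: {year}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}\n"
    ":slug: {slug}\n"
)


def render(title, slug, when):
    """Fill the template; the row of hashes matches the title's length."""
    return TEMPLATE.format(
        title=title,
        hashes='#' * len(title),
        year=when.year,
        month=when.month,
        day=when.day,
        hour=when.hour,
        minute=when.minute,
        slug=slug,
    )


text = render("Fabric Entry Creation", "fabric-entry-creation",
              datetime.datetime(2014, 3, 5, 9, 7))
```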

The great thing about fabric is how easy it makes passing command line arguments. So, if I wanted to create a file named {year}_{month}_{day}_{title}, I'd just do this:

fab make_entry:'My awesome title here'

Now that is awesome! :D