EngineeringFantasy

What Heroku Should Do

Saturday, 03 February 2018

DISCLAIMER: THESE OPINIONS ARE MY OWN AND DO NOT REFLECT THE OPINIONS OF MY FORMER OR PRESENT EMPLOYERS

I've been using Heroku for a while now at my workplace, and so have many of our developers. Heroku is actually a really good idea, and that's why services like convox.io and platform.sh keep popping up. The notion that you can have your servers managed by someone else and not need a full-time devops team is a total godsend. Iterations are faster and easier, and it allows our teams to focus on application logic. The marketplace also makes for quick integrations with other service providers, so adding things like a cache or search is very easy. It is not without its caveats though.

I myself was pretty skeptical at the beginning. From my personal projects, my perception of Heroku was that it was pretty expensive, and not something you'd want to use at a real company. By the time I joined, Heroku was already being used for several applications. At the time, I just thought it was a nice skin on top of AWS (which is still an accurate description today).

However, the real reason the company was using Heroku was a product called Heroku Connect (HC).

Heroku Connect

Heroku Connect is basically a syncing tool: it syncs data from Salesforce (SF) into a Heroku-operated Postgres database. The data in SF was coming from the company's own servers, so in effect we were paying Heroku to gain access to our own data. This initially perplexed me, but I soon found out that the old platform had issues scaling out data access, so we decided to use HC and Postgres as a temporary solution to build a couple of applications for testing new products and new markets. SF does have an API, but it has limits on reads and writes, is generally meant for bulk reads and updates, and gets quite slow if you're building an API on top of it. HC in conjunction with Postgres was seen as a really good alternative, because it lets you bypass those data access limits by essentially keeping a copy of your business data in Postgres.
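The practical upshot is that application code can query the synced Postgres copy directly instead of going through the SF API. Here is a minimal sketch of what that looks like, assuming the default Heroku Connect mapping (the salesforce.contact table and the email filter are illustrative; your mapped objects will differ):

```python
import os
import psycopg2  # assumes psycopg2 (or psycopg2-binary) is installed

# Heroku exposes the attached Postgres database through the DATABASE_URL config var.
conn = psycopg2.connect(os.environ["DATABASE_URL"])

with conn, conn.cursor() as cur:
    # Heroku Connect maps Salesforce objects into a dedicated "salesforce" schema,
    # so a Contact object typically ends up as the salesforce.contact table.
    cur.execute(
        "SELECT sfid, name, email FROM salesforce.contact WHERE email = %s",
        ("someone@example.com",),
    )
    print(cur.fetchone())
```

No API call limits apply here; reads are just ordinary SQL against your own database.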

However, we ran into a couple of issues very quickly.

Heterogeneous Syncing Types

In SF, data is stored in Salesforce Objects (SOs). These SOs are basically tables, and HC maps them to Postgres tables. Syncs happen in one of two ways: either the data is synced between SF and Postgres at a set interval (changes flow in both directions), or the data is synced in real time. The perplexing part is that you can always see which SOs are getting interval updates and which are getting real-time updates, but you never know why a particular SO gets real-time updates while another only gets interval updates. Sometimes an SO that had real-time updates would start getting interval updates instead, because you had hit what is called the "Streaming API limit". What this limit is, and how it is reached, remains a mystery to me as well as to our SF developers.

If you're selling a service that promises quick and unlimited access to SF, then you had better deliver on that promise. Otherwise, people will become disillusioned with your service as a whole, and despite having an awesome product, your reputation will suffer as a natural consequence.

Pay per Row

If you want to use HC, you pay for a particular number of rows. So you are paying for the data in SF, you are paying for the data in Postgres, and on top of that you are paying per row to use HC. This is maddening because it makes so little sense. If you're building a syncing service, you should be charging for throughput, not for the amount of data stored. That would also align incentives: the higher the throughput, the higher Heroku's own costs, so throughput-based pricing would make sense for both Heroku and the companies buying HC.

But the real consequence of this pricing scheme shows up when you want to launch a geo-distributed application built on top of HC. Because Heroku Postgres only offers master-slave replication, not multi-master replication, you need a separate synced database (and therefore a separate row allowance) in each region. So if you had to deploy in the US, the EU and China, you would pay for the same number of rows three times.
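As a purely hypothetical illustration (the figures below are placeholders I made up, not Heroku's actual rates), the arithmetic looks like this:

```python
# Placeholder figures for illustration only; these are not Heroku's actual prices.
rows_synced = 5_000_000            # rows your Salesforce data occupies
price_per_million_rows = 1_000     # assumed monthly price per million synced rows
regions = ["us", "eu", "china"]    # one Heroku Connect + Postgres pair per region

# Each region needs its own synced copy, so the row cost multiplies by the region count.
single_region_cost = rows_synced / 1_000_000 * price_per_million_rows
total_monthly_cost = single_region_cost * len(regions)

print(single_region_cost, total_monthly_cost)  # 5000.0 15000.0 -- triple the bill for the same data
```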

Underpowered Dynos

Dynos are Heroku's basic units of computation. In a generic sense, a dyno is a virtual machine.

The lowest-tier dyno is really quite pathetic: just 512 MB of RAM and roughly 1 GHz of CPU. For the equivalent price, you could get a few t2.micro instances from AWS, and I'm pretty sure you could get the same from Azure and Google Cloud Platform. Even accounting for the auto-scaling and maintenance you get with dynos, the price seems quite steep for the kind of product on offer.

Deploy Anywhere (but...)

You can theoretically deploy on pretty much any continent using Heroku. However, there are two Herokus: the public cloud and the private cloud. If this is news to you, then what you've been using all this time is the public cloud. The public cloud only has a few regions, whereas the private cloud has many, from Virginia to Tokyo.

However, the two clouds have completely different payment schemes. To use the private cloud at all, you pay a certain amount every month just for the privilege of access; that fee does not include any dynos, and it is not cheap. Furthermore, dynos in the private cloud are more expensive (yes, they have a little more compute power), which makes geo-distributing your applications quite costly.

Heroku is an Ecosystem

In Heroku, the app is the center of the universe. You attach add-ons like Redis or Postgres to your application. You can share add-ons between applications, but over time it becomes difficult to track which add-on is owned by which app. It would be far better to have an interface where add-ons live separately from applications, so that resources can be shared between applications more easily.
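For context, sharing today means attaching an add-on that one app owns to another app. Here is a rough sketch of doing that through the Heroku Platform API (the app and add-on names are hypothetical):

```python
import os
import requests  # third-party HTTP client

HEROKU_API = "https://api.heroku.com"
headers = {
    "Accept": "application/vnd.heroku+json; version=3",
    "Authorization": f"Bearer {os.environ['HEROKU_API_KEY']}",
    "Content-Type": "application/json",
}

# Attach an existing Postgres add-on (owned by a hypothetical "billing-api" app)
# to a second app, "reporting-api", so both apps share the same database.
resp = requests.post(
    f"{HEROKU_API}/addon-attachments",
    headers=headers,
    json={"addon": "postgresql-curved-12345", "app": "reporting-api"},
)
resp.raise_for_status()
print(resp.json()["name"])  # the attachment name the second app sees as a config var prefix
```

The catch is that ownership still lives with the original app, which is exactly why the add-on picture gets murky as the number of apps grows.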

What Heroku is Doing Right

Heroku probably has the best support you can get. If something breaks or goes wrong with your application, their support mechanism is truly exceptional: they will page an on-call engineer to fix your production problems, and these issues are usually resolved very quickly.

Secondly, even if you consider Heroku expensive, having your own devops team is even more expensive, and more so still if you need them to be on call. That is a hard cost for small companies to bear.

Furthermore, Heroku also has some of the best account executives you can get. They are really quite helpful, and will genuinely fight with upper management to get you the best deals possible if you buy bulk plans. I've only worked with two so far, but they really do fight for you inside the company and want to see you succeed.