If you need to do some computation in your Flask app and you’re using Heroku, you have a few options:
- You do it synchronously and pray that it takes less than 30 secs.
- You configure a worker dyno, for instance with Celery (introduction here).
- You follow this tutorial (Heroku + Python + Redis).
- Or you use my little technique, which is less conventional but works well (we have had it in production for a year now).
One of Heroku’s constraints is that it cuts every request lasting more than 30 seconds (source), and, like everything in life, this has good and bad aspects:
The good aspects are:
- If a request takes long because of an issue in your code (you entered an infinite loop or a very costly operation), it’s good to kill the request in order to keep the queue short.
- You need to keep this constraint in mind when you write code dealing with lots of data (e.g. an administration dashboard that aggregates lots of data), and thinking about performance while coding is always a good thing. 🔥
- You’re forced to put a timeout on every API request you make server-side. That’s also very positive. 👍
However, it also has some drawbacks:
- If your product scales, you may need some very computation-intensive reports (e.g. at JobNinja we create a daily report of all the traffic we buy), and there is no way to compute them in less than 30 secs. 🐢
- If you send emails with attachments (we do it a lot) and you do not have any worker, then you’ll need to pray that your SMTP gateway is reliable (it’s a major issue, believe me). 🙏
- If you have some background jobs to run, hmm… you just can’t…
In short, Heroku couldn’t work properly without a timeout, because a single user discovering a bug could freeze your app; but the timeout can become a problem as you scale up.
What can we do?
To solve this issue, I sat down, thought about what I really needed and came up with these specs:
- We need to be able to run any type of background job.
- These jobs need access to the app context (we use SQLAlchemy).
- They are not time-sensitive (I do not have jobs that must run at a specific time; they just need to be executed somewhere within a large time frame).
And I had some personal restrictions:
- I hate setting things up and I’m bad at it.
- I hate giving money away (and a Redis Queue worker on Heroku is not cheap) 💰.
- I do not know Redis, RabbitMQ or Amazon SQS.
Then I came up with a clunky idea that turned out to work very well; let me introduce the components:
- Flask-Script enables you to run some external tasks on your server while using all the code of your app.
- Heroku Scheduler is a … scheduler running on … Heroku. It starts another dyno (not the one running your web server) every 10 minutes, every hour or every day, and runs a specified command.
Figure: the Task table in the database
Let’s code it!
Create the model
So first, we said that we want a Task object; it should look like this:
with the corresponding SQLAlchemy object (optional if you do not use SQLAlchemy):
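The original model is not reproduced in this text. As a rough stand-in, here is a minimal sketch of the task table using Python’s built-in sqlite3 module instead of SQLAlchemy, so that it runs on its own. The column names (id, type, arguments, created_at) and the create_task helper are assumptions, not the production schema; with Flask-SQLAlchemy the same columns would become fields on a db.Model subclass.

```python
import json
import sqlite3

# Hypothetical task table: one row per job waiting to be run.
#   type       - which job to run (e.g. "sendEmail")
#   arguments  - JSON-encoded dict of keyword arguments for that job
#   created_at - filled in by SQLite when the row is inserted
SCHEMA = """
CREATE TABLE IF NOT EXISTS task (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    arguments TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def create_task(conn, task_type, **arguments):
    """Queue a job by inserting a row; returns the new task id."""
    cur = conn.execute(
        "INSERT INTO task (type, arguments) VALUES (?, ?)",
        (task_type, json.dumps(arguments)),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    create_task(conn, "sendEmail", to="firstname.lastname@example.org")
    print(conn.execute("SELECT type, arguments FROM task").fetchall())
    # → [('sendEmail', '{"to": "firstname.lastname@example.org"}')]
```

Storing the arguments as one JSON text column keeps the table generic: any job type fits without schema changes.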
I think you now see where we are heading: a Flask-Script command, called every 10 minutes, looks for Task rows in the database. It executes every task it finds, calling each one by its type (e.g. sendEmail) and passing it the required arguments (e.g. firstname.lastname@example.org). Once a task has been executed successfully, it is simply removed (you can add an executed_at column to the Task object instead of removing the Task if you think that will give you better logs).
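If you prefer the executed_at option, a minimal sketch could look like this (again with sqlite3 standing in for SQLAlchemy; the helper names pending_tasks and mark_executed are hypothetical). A NULL executed_at means the task is still pending, and finishing a task stamps it instead of deleting it.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical variant of the task table: executed_at is NULL while the task
# is pending and holds a timestamp once it has run, so rows double as a log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    arguments TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    executed_at TEXT
)
"""


def pending_tasks(conn):
    """Return (id, type, kwargs) for every task that has not run yet."""
    rows = conn.execute(
        "SELECT id, type, arguments FROM task WHERE executed_at IS NULL ORDER BY id"
    ).fetchall()
    return [(task_id, task_type, json.loads(args)) for task_id, task_type, args in rows]


def mark_executed(conn, task_id):
    """Stamp a finished task instead of deleting it."""
    conn.execute(
        "UPDATE task SET executed_at = ? WHERE id = ?",
        (datetime.now(timezone.utc).isoformat(), task_id),
    )
    conn.commit()
```

The trade-off is that the table grows forever, so you may eventually want to purge old executed rows.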
Add a command to your app
To code it, we need to add a function to the Flask-Script manager, like this one:
To understand the code, start at the bottom: the @manager.command decorator allows the code to be run with Flask-Script.
The interesting part is execute_pending_task, where you use the power of keyword arguments (kwargs). It does the following: it gets all the tasks in the table (executed tasks have been removed, if you remember correctly), iterates through them and, for each task, tries to find the function associated with the type of the task.
If it finds one, it calls this function, passing a dictionary as kwargs. If, at the end, we didn’t enter the except block, everything went smoothly and we can remove the task.
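The command described above can be sketched as follows. This is not the production code: it uses sqlite3 instead of the SQLAlchemy session, and the send_email handler and TASK_HANDLERS mapping are hypothetical. It shows the same idea: look up a function by the task’s type, call it with the stored JSON as **kwargs, and delete the row only if no exception was raised.

```python
import json
import logging
import sqlite3

logger = logging.getLogger(__name__)

# Same hypothetical task table as in the model sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    arguments TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def send_email(to, subject="Hello"):
    """Hypothetical job; a real one would talk to your SMTP gateway."""
    logger.info("sending %r to %s", subject, to)


# Maps a task's "type" column to the function that performs it.
TASK_HANDLERS = {
    "sendEmail": send_email,
}


def execute_pending_task(conn, handlers=TASK_HANDLERS):
    """Run every queued task and delete the ones that succeed.

    Returns the number of tasks executed successfully.
    """
    rows = conn.execute("SELECT id, type, arguments FROM task ORDER BY id").fetchall()
    executed = 0
    for task_id, task_type, arguments in rows:
        handler = handlers.get(task_type)
        if handler is None:
            logger.warning("no handler for task %s of type %r", task_id, task_type)
            continue
        try:
            # The stored JSON object is unpacked as keyword arguments.
            handler(**json.loads(arguments))
        except Exception:
            # Keep the row so the next scheduled run retries it.
            logger.exception("task %s (%s) failed", task_id, task_type)
            continue
        conn.execute("DELETE FROM task WHERE id = ?", (task_id,))
        conn.commit()
        executed += 1
    return executed


# In a Flask-Script manage.py this would be exposed roughly as:
#
#     @manager.command
#     def execute_pending_task():
#         ...  # same loop, using the SQLAlchemy session instead of conn
```

A failed or unknown task simply stays in the table, so the next scheduled run retries it. That matches the "remove only on success" rule, but a task that always fails will be retried forever; an attempts counter is one possible way to cap that (an assumption, not part of the described approach).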
NB: One limitation of my approach is that all the arguments have to be JSON serialisable because that’s how they’re stored in the database.
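To make this limitation concrete, here is a small sketch with the standard json module (serialise_arguments is a hypothetical helper): strings, numbers, booleans, None, lists and dicts are fine, a date raises TypeError, and tuples come back as lists.

```python
import json
from datetime import date


def serialise_arguments(**arguments):
    """Encode task arguments the way they would be stored in the table."""
    return json.dumps(arguments)


# Fine: str, int, float, bool, None, and lists/dicts of those.
print(serialise_arguments(to="firstname.lastname@example.org", retries=3))

# Not fine: json has no encoding for date objects, so this raises TypeError.
try:
    serialise_arguments(day=date(2020, 1, 31))
except TypeError as exc:
    print("not serialisable:", exc)

# Workaround: store a string and parse it back inside the task.
print(serialise_arguments(day=date(2020, 1, 31).isoformat()))  # → {"day": "2020-01-31"}

# Beware: tuples survive the round trip only as lists.
print(json.loads(serialise_arguments(size=(320, 240))))  # → {'size': [320, 240]}
```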
Add some tasks
🏃 Now, the only thing we need is to create some tasks to make it run: 🏃
That’s a production example that resizes pictures to the size requested by the CSS, in order to reduce the bandwidth our pages consume for our users. If the resized image doesn’t exist yet, it returns the original, but it also creates a Task to create a resized (and resampled) version of this JPEG via the
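The production snippet is not reproduced here. A hypothetical sketch of the same pattern (serve the original now, queue the expensive work for later) could look like this; the file-naming scheme, the resizeImage task type and the image_for_size function are assumptions, and the actual resizing would live in a task handler run by the scheduled command. The duplicate check is an extra safeguard (also an assumption) so that a popular image is not queued once per page view while it waits for the next scheduler run.

```python
import json
import os
import sqlite3

# Same hypothetical task table as in the model sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    arguments TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def resized_path(original_path, width, height):
    """Hypothetical naming scheme: photo.jpg -> photo_320x240.jpg."""
    root, ext = os.path.splitext(original_path)
    return f"{root}_{width}x{height}{ext}"


def image_for_size(conn, original_path, width, height):
    """Return the resized copy if it exists, else queue a resize and return the original."""
    target = resized_path(original_path, width, height)
    if os.path.exists(target):
        return target
    # sort_keys gives a stable JSON string, so identical jobs compare equal.
    arguments = json.dumps(
        {"path": original_path, "width": width, "height": height}, sort_keys=True
    )
    already_queued = conn.execute(
        "SELECT 1 FROM task WHERE type = ? AND arguments = ?",
        ("resizeImage", arguments),
    ).fetchone()
    if already_queued is None:
        conn.execute(
            "INSERT INTO task (type, arguments) VALUES (?, ?)",
            ("resizeImage", arguments),
        )
        conn.commit()
    # Serve the original for now; the scheduled command will do the resizing.
    return original_path
```

The first visitor gets the full-size image, and later visitors get the resized one once the scheduler has run.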
Deploy it to Heroku
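Do not forget to add the Heroku Scheduler and schedule your command there (for instance python manage.py execute_pending_task, assuming your Flask-Script manager lives in manage.py), otherwise your tasks never run and all of this is completely useless 😆.
There you go! You have a background worker for a fraction of the cost (you need neither Redis nor an always-on worker dyno). 💪 🎉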