Most Rails applications start pretty simple. Users enter data; data gets saved in the database.
Then we move on to something a bit more complex. Eventually, we realize we should not do all the work in one request, and we start using some form of job to push work onto a background process.
Great. However, as complexity increases, we realize we do too much work in a single background job. So, the next logical option is to split the background job into multiple jobs. Easy enough, of course, but then we run into some gotchas:
- Do any jobs require other jobs to be completed first? And, of course, do any of those sub-jobs require other sub-jobs (and so on)?
- How do we mentally keep track of what is going on? How do we make it easy for someone to jump into our code base and understand what is happening?
Over the years, KickoffLabs has processed billions of jobs. Breaking tasks down into small chunks has been one way we have managed to scale. One of the things I have found challenging is keeping track of what gets processed in the background jobs, and when.
So, when I started to experiment with a new product idea, I wanted to find a way to tame this problem (and eventually roll it back into KickoffLabs, too).
I had seen Shopify's JobIteration library before but never had a chance to use it. Its README introduces it like this:
"Meet Iteration, an extension for ActiveJob that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs)."
It recently popped up on my radar again, and I noticed it supports iterating over arrays. This gave me an idea. Typically, this library is used for iterating over a large number of items in a job and tracking your place. If the job restarts (or even raises an error), you can safely resume where you left off.
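To make the "tracking your place" idea concrete, here is a minimal plain-Ruby sketch of cursor-based resumption. It is not JobIteration's implementation, and `resumable_each`, `process`, and `stop_at` are hypothetical names used only for illustration. It assumes the cursor is the index of the last item that finished, so a resumed run starts at the item after it.

```ruby
# Yields each item after `cursor` together with its index.
# A nil cursor means "nothing has been processed yet".
def resumable_each(items, cursor: nil)
  start = cursor.nil? ? 0 : cursor + 1
  items.each_with_index.drop(start).each { |item, index| yield item, index }
end

# Processes items until `stop_at` (a simulated interruption) and returns
# the items handled in this run plus the checkpoint to resume from.
def process(items, cursor: nil, stop_at: nil)
  done = []
  resumable_each(items, cursor: cursor) do |item, index|
    break if index == stop_at
    done << item
    cursor = index # checkpoint after every completed item
  end
  [done, cursor]
end

# First run is interrupted before the third item; second run resumes.
first_batch, checkpoint = process(%w[a b c d], stop_at: 2)
second_batch, = process(%w[a b c d], cursor: checkpoint)
```

Between the two runs, every item is handled exactly once: the first run covers `a` and `b`, and the second picks up at `c`.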
With that functionality alone, it is likely quite a helpful library for most projects. But what if we used it to define a series of steps a job needs to take? This way, we can have a single job that handles all of the processing for a necessary task.
If things can be run in parallel, one or more of the steps can create new child jobs as well.
With that in mind, here is "SteppedJob":
class SteppedJob < ApplicationJob
  include JobIteration::Iteration

  queue_as :default

  class_attribute :steps, default: []

  class << self
    # Class-level macro: `steps :first, :second` stores the ordered step names.
    def steps(*args)
      self.steps = args
    end
  end

  def build_enumerator(*, cursor:)
    raise ArgumentError, "No steps were defined" if steps.blank?
    raise ArgumentError, "Steps must be an array" unless steps.is_a?(Array)

    # The cursor is the index of the last completed step (nil on a fresh run),
    # so the step about to run is the one after it.
    next_step = steps[cursor.nil? ? 0 : cursor + 1]
    Rails.logger.info("Starting #{self.class.name} at step #{next_step}")
    enumerator_builder.array(steps, cursor:)
  end

  # Each step is an instance method that receives the job's arguments.
  def each_iteration(step, *)
    Rails.logger.info("Running step #{step} for #{self.class.name}")
    send(step, *)
    Rails.logger.info("Completed step #{step} for #{self.class.name}")
  end
end
This could also be a module, but I have it set up as a base class.
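For the curious, the module form might look roughly like the sketch below. This is a plain-Ruby stand-in so it can run outside Rails: it does not include `JobIteration::Iteration`, and `Stepped`, `define_steps`, `step_names`, `run_steps`, and `ExampleJob` are all hypothetical names. In a real app, the module would still need to provide `build_enumerator` and `each_iteration` as in the base class.

```ruby
module Stepped
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declares the ordered steps, e.g. `define_steps :first_step, :second_step`.
    def define_steps(*names)
      @step_names = names.freeze
    end

    def step_names
      @step_names || []
    end
  end

  # Runs every step after `cursor` (the index of the last finished step),
  # passing the same arguments to each one. Returns the final cursor.
  def run_steps(*args, cursor: nil)
    names = self.class.step_names
    raise ArgumentError, "No steps were defined" if names.empty?

    start = cursor.nil? ? 0 : cursor + 1
    names.each_with_index.drop(start).each do |name, index|
      public_send(name, *args)
      cursor = index
    end
    cursor
  end
end

# Hypothetical job-like class that records which steps ran.
class ExampleJob
  include Stepped
  define_steps :first_step, :second_step

  attr_reader :log

  def initialize
    @log = []
  end

  def first_step(value)
    @log << "first:#{value}"
  end

  def second_step(value)
    @log << "second:#{value}"
  end
end
```

Calling `ExampleJob.new.run_steps("x")` runs both steps in order, while passing `cursor: 0` skips the first step, which mirrors what happens when an interrupted job resumes.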
To use it:
- Create a job that derives from SteppedJob
- Define an array of steps
- Add a method for each step
Here is a sample job. This job is enqueued like any other ActiveJob: ProcessRssContentJob.perform_later(content)
From there, each job step is executed, and the content argument is passed along to each step.
class ProcessRssContentJob < SteppedJob
  queue_as :default

  steps :format_content, :create_content_parts, :enhance_content_parts

  def format_content(content)
    content.text = BlogPostFormatter.call(content:)
    content.processing_status!
  end

  def create_content_parts(content)
    ContentPartsForContentService.call(content:)
  end

  def enhance_content_parts(content)
    EnhanceContentPartsService.call(content:)
  end
end
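As mentioned earlier, a step can also fan out into child jobs when the work can run in parallel. The sketch below shows one way the last step could do that. It uses hypothetical stand-ins so it runs without Rails or a queue backend: `ENQUEUED`, `EnhanceContentPartJob`, `Content`, and `FanOutExample` are not part of the original code, and the fake `perform_later` only records what would have been enqueued.

```ruby
# Records enqueued jobs in memory instead of sending them to a real queue.
ENQUEUED = []

# Hypothetical child job. In a Rails app this would be an ActiveJob class.
class EnhanceContentPartJob
  def self.perform_later(part)
    ENQUEUED << [name, part]
  end
end

# Hypothetical content object that exposes its parts.
Content = Struct.new(:parts)

class FanOutExample
  # A step body: enqueue one child job per part rather than
  # enhancing every part inline.
  def enhance_content_parts(content)
    content.parts.each { |part| EnhanceContentPartJob.perform_later(part) }
  end
end

FanOutExample.new.enhance_content_parts(Content.new(%w[intro body]))
```

Keep in mind that a step written this way only enqueues the work and returns. If a later step depended on the child jobs having finished, it would need some additional coordination.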