Exiting a Node Process and the Node Event Loop

November 11, 2020

I recently ran into an issue with a Node.js service at work that caused me to dive deeper into how Node works.

It turned out to be a complicated issue: a combination of long-running tasks and error reporting to an external API, coupled with how the Node event loop works and how process.exit() and process.exitCode behave.

Background Info

When it encounters an unrecoverable error, this Node service should send the error to an external service, then end the process with an exit code of 1 to indicate an error had occurred. We were seeing the errors in our logs, and the service was terminating, but no errors were reported through our monitoring.

The offending section of code was a fairly simple catch block that looked much like this:

catch (error) {
  // write the error to logs
  logService.error(error);
  // report the error to our monitoring service
  errorService.error(error);
  // exit the process
  process.exit(1);
}

Exiting a Node Process

process.exit()

Above we call process.exit(1), which exits the process as soon as possible. From the Node docs:

Calling process.exit() will force the process to exit as quickly as possible even if there are still asynchronous operations pending that have not yet completed fully

That’s problematic in this scenario, since errorService.error() makes a network request!

Looking into the error service more, we can see that it’s providing a synchronous API to log errors (meaning we can’t wait on it), but it performs an asynchronous task (a network request). This means that the code will schedule a network request on the Node queue, but it won’t ever execute, since Node terminates immediately on process.exit().
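To see this behavior in isolation, here is a minimal sketch. It is not the service's code: the child script and the 10 ms timer standing in for the network request are assumptions made for illustration. It runs a small script in a child process, so the parent can observe the exit code and output.

```javascript
const { spawnSync } = require('node:child_process');

// Child script: schedules asynchronous work, then exits right away.
const childSource = `
  // Stands in for the network request made by the error service.
  setTimeout(() => console.log('report sent'), 10);
  console.log('error logged');
  process.exit(1);
`;

// Run the script in a separate Node process and capture its output.
const exitResult = spawnSync(process.execPath, ['-e', childSource], {
  encoding: 'utf8',
});

console.log(exitResult.status); // 1
console.log(exitResult.stdout.includes('report sent')); // false: the timer never ran
```

The timer callback is still pending when process.exit(1) runs, so it is simply dropped, which mirrors what happened to our error report.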

This is the cause of our problem, where our error service never receives the error. So how do we fix it?

process.exitCode

Reading the Node documentation, the next section mentions setting process.exitCode:

In most situations, it is not actually necessary to call process.exit() explicitly. The Node.js process will exit on its own if there is no additional work pending in the event loop. The process.exitCode property can be set to tell the process which exit code to use when the process exits gracefully.

Rather than calling process.exit() directly, the code should set the process.exitCode and allow the process to exit naturally by avoiding scheduling any additional work for the event loop
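As a small illustration of the documented behavior, the sketch below (again using a hypothetical child script with a timer in place of the real network request) sets process.exitCode and lets the process finish on its own.

```javascript
const { spawnSync } = require('node:child_process');

// Child script: schedules asynchronous work and only *requests* exit code 1.
const childSource = `
  // Stands in for the network request made by the error service.
  setTimeout(() => console.log('report sent'), 10);
  process.exitCode = 1;
`;

const exitCodeResult = spawnSync(process.execPath, ['-e', childSource], {
  encoding: 'utf8',
});

console.log(exitCodeResult.status); // 1
console.log(exitCodeResult.stdout); // report sent
```

Here the pending timer is allowed to run, and the process then exits with code 1 once nothing else is scheduled.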

With this in mind, let’s change our catch block to set exitCode instead of calling .exit().

catch (error) {
  // write the error to logs
  logService.error(error);
  // report the error to our monitoring service
  errorService.error(error);
  // set the exitCode on the process
  process.exitCode = 1;
}

Now we should see this error being reported, and the process should exit…however our process never exits! Why?

The Node.js Event Loop

Before we continue, it’s important to understand the Node Event Loop.

There are two great videos below that explain it well, and are very much worth watching. They explain the JavaScript Event Loop in relation to the browser, but the same concepts apply in Node:

An overly simplified explanation is that Node has a call stack, a job queue, and a message queue. When you call a function, it’s added to the call stack. When a Promise settles, its .then() callbacks (and the code waiting after an await) are added to the job queue. When a timer fires or a network response comes in, its callback is added to the message queue.

The Node Event Loop will first process everything in the call stack. Next it will run through everything on the job queue. Finally it will start to pick tasks off of the message queue.
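The ordering described above can be observed with a short sketch. The labels are my own; a zero-delay timer is used as a stand-in for a callback arriving on the message queue.

```javascript
const order = [];

// Message queue: the timer callback runs on a later turn of the event loop.
setTimeout(() => order.push('timer callback'), 0);

// Job queue: the .then() callback runs once the current call stack is empty.
Promise.resolve().then(() => order.push('promise job'));

// Call stack: plain synchronous code runs first.
order.push('synchronous code');

// Print the recorded order after everything above has had a chance to run.
setTimeout(() => console.log(order), 10);
// [ 'synchronous code', 'promise job', 'timer callback' ]
```

Even though the timer was scheduled before the Promise callback, the Promise callback runs first, because the job queue is drained before the next message is picked up.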

Our Issues Using process.exitCode

So we set process.exitCode = 1, and run the code again. This time, we see the error gets logged properly, but the service never terminates!

Using process.exitCode causes Node to terminate with the error code when there is no additional work pending in the event loop. This is an issue for us, since the event loop will never be empty.

The way this particular service is set up, it processes long-running tasks off of a message queue. It continuously schedules new work (via Promises) to check the queue and to call our onMessage callback when it sees a message. This new work is added to the job queue, meaning there is never a time when there is no pending work in the event loop.

If we could tell the message queue code to stop running on an error within the onMessage callback, then we could safely use process.exitCode = 1. Since this is out of our control here, we need to find another solution.
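To make the difference concrete, here is a rough sketch under stated assumptions: a setInterval timer plays the role of the queue consumer (the real one is a library we don't control), and the 1500 ms timeout is an arbitrary value chosen only so the demo finishes.

```javascript
const { spawnSync } = require('node:child_process');

// Builds a child script whose poller is either left running or stopped.
const buildChildSource = (stopPolling) => `
  const poller = setInterval(() => {}, 20);   // hypothetical queue poller
  process.exitCode = 1;                       // request exit code 1
  if (${stopPolling}) clearInterval(poller);  // only possible if we own the poller
`;

// Give each child up to 1500 ms before the parent kills it.
const runChild = (stopPolling) =>
  spawnSync(process.execPath, ['-e', buildChildSource(stopPolling)], {
    timeout: 1500,
  });

const stillPolling = runChild(false);
const stoppedPolling = runChild(true);

console.log(stillPolling.error?.code); // ETIMEDOUT: the child never exited on its own
console.log(stoppedPolling.status); // 1
```

With the poller left running, the child has to be killed by the timeout. Once the poller is stopped, the event loop empties and the requested exit code is used.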

Promises to the rescue!

So we have to call process.exit(1). This brings us back to our old issue, where we do exit the process, but our errors don’t get logged to our monitoring service.

Luckily, our error service accepts an optional callback that it calls once the report has been sent. We can wrap this in a Promise, and then await our new Promise so that we pause execution until the Promise has resolved. This effectively converts their fire-and-forget API into one we can wait on.

catch (error) {
  // write the error to logs
  logService.error(error);
  // wait for the errorService to report our error
  // (await requires the enclosing function to be async)
  await new Promise(resolve => {
    errorService.error(error, resolve);
  });
  // now it's safe to exit
  process.exit(1);
}

Now Node will not run process.exit(1) until after the Promise has resolved (and our error report was sent). This works because await suspends the catch block until the error service invokes resolve; only then does execution continue to process.exit(1).

When we run this again, we see the error reported to our error service, and then the process exits without processing another task!
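For anyone who wants to try the pattern without a real monitoring service, here is a self-contained sketch. Everything in it is hypothetical: fakeErrorService imitates a client that reports asynchronously and then calls back, and the exit function is passed in so the demo does not terminate the process running it.

```javascript
// Hypothetical stand-in for the monitoring client.
const sentReports = [];
const fakeErrorService = {
  error(err, callback) {
    // Simulate a network request that completes a little later.
    setTimeout(() => {
      sentReports.push(err.message);
      callback();
    }, 10);
  },
};

// In the real service, `exit` would be process.exit.
async function reportThenExit(err, exit) {
  // Pause here until the error service signals that it is done.
  await new Promise((resolve) => fakeErrorService.error(err, resolve));
  exit(1);
}

const exitCalls = [];
const finished = reportThenExit(new Error('boom'), (code) => {
  // Record how many reports had been sent at the moment of "exit".
  exitCalls.push({ code, reportsSentBeforeExit: sentReports.length });
});

finished.then(() => console.log(exitCalls));
// [ { code: 1, reportsSentBeforeExit: 1 } ]
```

The recorded call shows that the report had already been sent by the time the exit function ran.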

A Generalized Approach

This solution can also be generalized to be used elsewhere.

const promisifiedReportError = (...args) =>
  new Promise((resolve, reject) => {
    errorService.error(...args, (internalError, response) => {
      if (internalError) {
        reject(internalError)
      } else {
        resolve(response)
      }
    })
  })

// and called like
await promisifiedReportError(error)
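If the error service's callback follows the Node convention of (error, result), which the wrapper above assumes, Node's built-in util.promisify can likely do the same job. The sketch below uses a hypothetical demoErrorService, not the real client, to show the idea.

```javascript
const { promisify } = require('node:util');

// Hypothetical error service with an error-first callback.
const demoErrorService = {
  received: [],
  error(err, callback) {
    // Simulate asynchronous delivery, then report success.
    setImmediate(() => {
      this.received.push(err.message);
      callback(null, { accepted: true });
    });
  },
};

// bind() keeps `this` pointing at the service once the method is detached.
const reportError = promisify(demoErrorService.error.bind(demoErrorService));

const reportResult = reportError(new Error('boom'));
reportResult.then((response) => console.log(response)); // { accepted: true }
```

The hand-written wrapper is still useful when the callback does not follow the error-first convention, or when the method takes a variable number of arguments before the callback.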

More Info

The following resources really helped me understand this issue:


Written by Mike Guida, who enjoys prolonged outdoor adventures and building stuff with software.