Job queue

When you run cycle job queue, the job enters a queue. Within about a minute, it gets assigned to an available machine and starts running. This page explains what happens between queuing and execution.

How jobs get assigned

The system checks for pending jobs roughly every 60 seconds. When it finds one, it picks the least-loaded machine that meets the job's requirements and assigns the job there. The machine picks it up within a few seconds.

A machine can run at most one job at a time. If all machines are busy, the job waits until one is free.
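The selection rule above can be sketched in a few lines of Python. This is only an illustration of "least-loaded machine that meets the requirements", not the scheduler's actual code: the Machine class, its load field, and the meets_requirements callback are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Machine:
    name: str
    load: float        # hypothetical load metric; lower means less loaded
    running_job: bool  # True if the machine is already running a job


def pick_machine(
    machines: List[Machine],
    meets_requirements: Callable[[Machine], bool],
) -> Optional[Machine]:
    """Return the least-loaded free machine that satisfies the job, or None."""
    candidates = [
        m for m in machines
        if not m.running_job and meets_requirements(m)
    ]
    if not candidates:
        return None  # every machine is taken; the job waits for a later pass
    return min(candidates, key=lambda m: m.load)
```

Returning None models the waiting case: the job stays pending and is considered again on the next check.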

GPU matching

If your manifest sets cuda in the [job] section (e.g. cuda = "13.0"), the job only runs on machines whose GPU driver supports that CUDA version or newer. This is a scheduling constraint, not something that installs a driver in the container. If cuda is omitted (the default), the job can run anywhere.
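As a rough sketch of the "that CUDA version or newer" rule, the comparison can be done on (major, minor) pairs. The function names below are hypothetical, and how the system learns which CUDA version a machine's driver supports is not described here, so it is passed in as a plain string.

```python
from typing import Optional, Tuple


def parse_cuda_version(text: str) -> Tuple[int, int]:
    """Turn a string such as "13.0" into (13, 0); a missing minor part counts as 0."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def cuda_compatible(required: Optional[str], supported: Optional[str]) -> bool:
    """Check whether a machine may run a job with the given cuda setting.

    required:  the manifest's cuda value, or None if it was omitted
    supported: the newest CUDA version the machine's driver supports,
               or None if the machine has no usable GPU (an assumption)
    """
    if required is None:
        return True   # cuda omitted: the job can run anywhere
    if supported is None:
        return False  # job needs CUDA but the machine offers none
    return parse_cuda_version(supported) >= parse_cuda_version(required)
```

Comparing tuples rather than raw strings matters: as strings, "9.2" would sort after "13.0".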

Running your own jobs while busy

When you're actively using your machine, it's marked BUSY. Other people's jobs won't get assigned to it. But your own jobs still can, by default, since you probably don't want your own work to stall just because you're at your desk.

Idle machines are always preferred. Your job only lands on a busy machine if nothing else is available.

To opt out of this behavior entirely, run cycle daemon forbid-busy-assign. See Daemon for details.
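One way to picture these rules is as a two-tier filter: idle machines first, then the submitter's own busy machine as a fallback. The sketch below is an assumption about how that could look, not the real implementation. The Host class and its fields are hypothetical, and it assumes the opt-out is recorded per machine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    owner: str                      # user who works on this machine
    busy: bool                      # True while the owner is actively using it
    allow_busy_assign: bool = True  # assumed to be False after opting out


def eligible_hosts(job_owner: str, hosts: List[Host]) -> List[Host]:
    """Return the machines a job from job_owner may be assigned to."""
    idle = [h for h in hosts if not h.busy]
    if idle:
        return idle  # idle machines are always preferred
    # Fallback: busy machines, but only the submitter's own, and only
    # if the owner has not opted out.
    return [
        h for h in hosts
        if h.busy and h.owner == job_owner and h.allow_busy_assign
    ]
```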

What happens when a machine goes down

Jobs don't get stuck. If a machine goes offline, runs too long, or becomes busy while running someone else's job, the job goes back into the queue and gets reassigned to a different machine on the next pass.

When the new machine picks up the job, it downloads whatever was already in /artifacts from the previous run before starting the container. This means your code can resume from where it left off, as long as you write checkpoints to /artifacts and check for them at startup.

Heads up

Your program should be written to resume from /artifacts. Save checkpoints there periodically, and on startup, look for existing checkpoints to continue from. If you don't do this, a reassigned job restarts from scratch every time it moves to a new machine. See Adapting Your Code for details.
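A minimal sketch of that pattern in Python follows. The file name checkpoint.json, the shape of the saved state, and the checkpoint interval are all assumptions chosen for illustration; only the /artifacts directory comes from this page.

```python
import json
import os
import tempfile


def load_checkpoint(artifacts_dir="/artifacts"):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    path = os.path.join(artifacts_dir, "checkpoint.json")
    if not os.path.exists(path):
        return {"step": 0}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(state, artifacts_dir="/artifacts"):
    """Write to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    path = os.path.join(artifacts_dir, "checkpoint.json")
    fd, tmp_path = tempfile.mkstemp(dir=artifacts_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(total_steps, artifacts_dir="/artifacts", checkpoint_every=10):
    """Do total_steps units of work, resuming from the last checkpoint if there is one."""
    state = load_checkpoint(artifacts_dir)
    for step in range(state["step"], total_steps):
        # ... one unit of real work would go here ...
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, artifacts_dir)
    save_checkpoint(state, artifacts_dir)
    return state
```

With this structure, a job that is reassigned after step 40 of 100 loses at most the work done since its last checkpoint instead of starting over.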

Job states

Status   Meaning
ACTIVE   Queued or currently running
PAUSED   You paused it; the system skips it
SUCCESS  The container exited with code 0
FAILURE  The container exited with a non-zero code
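Because the final status depends only on the container's exit code, it helps to make your program's exit code explicit. The sketch below shows one way to do that in Python; train and status_for_exit_code are hypothetical names for illustration, and the mapping simply restates the two terminal states listed here.

```python
import sys


def status_for_exit_code(exit_code: int) -> str:
    """Map a container exit code to the terminal job status."""
    return "SUCCESS" if exit_code == 0 else "FAILURE"


def train() -> None:
    """Placeholder for the job's real work (hypothetical)."""


def main() -> int:
    try:
        train()
    except Exception as exc:
        print(f"job failed: {exc}", file=sys.stderr)
        return 1  # non-zero exit code: the job is reported as FAILURE
    return 0      # exit code 0: the job is reported as SUCCESS


# In a real entry point, finish with: sys.exit(main())
```

Catching the exception is optional, since an uncaught Python exception already exits with a non-zero code; the explicit form just makes the intent visible.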

Pausing and resuming

You can pause a job from the CLI (cycle job pause) or from the Jobs page. A paused job stays in the system but won't be assigned. Resume it to put it back in the queue.

Admins and Owners can pause or resume anyone's jobs. Regular members can only pause their own.
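The permission rule can be summarized as a small check. This is a sketch with hypothetical role strings and function name, not the system's actual access-control code.

```python
def can_pause(actor: str, actor_role: str, job_owner: str) -> bool:
    """Return True if actor may pause the job owned by job_owner.

    actor_role is assumed to be one of "admin", "owner", or "member".
    """
    if actor_role in {"admin", "owner"}:
        return True            # Admins and Owners can act on anyone's jobs
    return actor == job_owner  # regular members: only their own jobs
```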

Email notifications

If the manifest sets email_on_success or email_on_failure to true, you get an email when the job finishes. The email goes to the address set in Settings, which defaults to the email associated with your login. You can also override these flags per job when queuing from the CLI.

Completed jobs in the dashboard

The Jobs page shows completed jobs from the last 30 days, capped at 100. Older jobs are still accessible via cycle job ls.