
Long-running jobs with a jobqueue

By introducing simulations and predictions, we have created a job type that does not coexist well with the interactive website. These jobs must be outsourced to another machine (or a pool of machines). They are executed as batch jobs, i.e. they don't need user interaction.

Some requirements for this design:

  • Jobs must be self-contained. A job should describe where to get input data (fully qualified URL) and where result data should be published (also fully qualified URL).
  • The lingua franca of the mySmartGrid ecosystem is JSON. Most visualization is done directly using JSON. The (new) data submission API also employs JSON. Therefore, result data (such as a predicted timeseries) must be represented as JSON data and published via the HTTP protocol.
  • The simulations must run on a different system than the webserver.
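
To make the first requirement concrete, here is a minimal Python sketch of a self-contained job description. The field names "input_url" and "result_url" and the helper names are assumptions made for illustration; this page does not fix a schema.

```python
import json
from urllib.parse import urlparse


def is_fully_qualified(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def make_job(input_url: str, result_url: str) -> str:
    """Serialize a self-contained job description as JSON.

    The keys "input_url" and "result_url" are hypothetical names.
    """
    for url in (input_url, result_url):
        if not is_fully_qualified(url):
            raise ValueError(f"not a fully qualified URL: {url!r}")
    return json.dumps({"input_url": input_url, "result_url": result_url})
```

Rejecting relative URLs at submission time keeps the job usable on any worker, since a worker has no notion of the webserver's base URL.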

Beanstalkd

Beanstalkd is a simple job queue designed for minimal overhead: http://kr.github.com/beanstalkd/

The beanstalk protocol is very simple: https://github.com/kr/beanstalkd/blob/v1.3/doc/protocol.txt. This is a very good thing(TM), because it leads to a bunch of client implementations that are ready to use.
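
To give an idea of how simple the protocol is, the following Python sketch frames the three commands a job goes through (put, reserve, delete) as raw bytes, following the protocol document linked above. The function names and the default priority and TTR values are illustrative assumptions; in practice one of the existing client libraries would do this work.

```python
from typing import Tuple


def encode_put(body: bytes, priority: int = 1024, delay: int = 0, ttr: int = 120) -> bytes:
    """Frame a job as: put <pri> <delay> <ttr> <bytes>\\r\\n<data>\\r\\n"""
    header = b"put %d %d %d %d\r\n" % (priority, delay, ttr, len(body))
    return header + body + b"\r\n"


def parse_reserved(response: bytes) -> Tuple[int, bytes]:
    """Parse the reply to "reserve": RESERVED <id> <bytes>\\r\\n<data>\\r\\n"""
    header, _, rest = response.partition(b"\r\n")
    words = header.split()
    if len(words) != 3 or words[0] != b"RESERVED":
        raise ValueError("unexpected response: %r" % header)
    job_id, size = int(words[1]), int(words[2])
    # Use the announced size, because the job body may itself contain \r\n.
    return job_id, rest[:size]


def encode_delete(job_id: int) -> bytes:
    """Frame the command that removes a finished job: delete <id>\\r\\n"""
    return b"delete %d\r\n" % job_id
```

For example, encode_put(b'{"a":1}') yields the bytes `put 1024 0 120 7\r\n{"a":1}\r\n`, which could be written to a TCP connection to the beanstalkd instance as they are.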

The architecture for mySmartGrid will look as follows:

[Figure: super-sketchy sketch of the architecture]

The architecture consists of a webserver, several worker nodes and a beanstalkd instance. The webserver submits jobs to the beanstalkd queue. A job is a JSON-formatted document containing the desired result URL and $foo as input data for the job. A worker node then retrieves the job from the queue and starts a local process to run it. As soon as the process finishes, the job is deleted from the queue. The results of the job are published using the worker node's local HTTP server. Again, the data is JSON-formatted. In order to be able to detect outdated information, a timestamp MUST be included in the result file.
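
The worker side of this flow can be sketched in Python as follows. This is only an illustration under stated assumptions: a plain dict stands in for the beanstalkd queue, another dict stands in for the files served by the HTTP server, and the names "result_url", "timestamp" and "values" are hypothetical, not a decided format.

```python
import json
import time


def build_result(values, now=None):
    """Wrap result data in a JSON document that carries a timestamp."""
    timestamp = int(time.time()) if now is None else int(now)
    return json.dumps({"timestamp": timestamp, "values": values})


def is_outdated(result_json, max_age_seconds, now=None):
    """Return True if the result is older than max_age_seconds."""
    now = time.time() if now is None else now
    return now - json.loads(result_json)["timestamp"] > max_age_seconds


def process_one(queue, published, run, now=None):
    """Take one job, run it, publish the result, then delete the job.

    queue:     dict of job id -> JSON job body (stand-in for beanstalkd)
    published: dict of result URL -> JSON result (stand-in for the HTTP server)
    run:       callable that computes the result values for a job
    """
    job_id = next(iter(queue))      # stand-in for reserving a job
    job = json.loads(queue[job_id])
    published[job["result_url"]] = build_result(run(job), now=now)
    del queue[job_id]               # delete only after the process finished
    return job_id
```

Deleting the job only after the result has been published means that a job whose process crashes stays in the queue in this sketch instead of being lost.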

Right now, there is no redundancy in the system. I assume that the webserver decides which server

jobqueue.1310474558.txt.gz · Last modified: 2012/10/30 10:34 (external edit)