Task model for long-running jobs in OpenCPU?

Question

To my knowledge, there is no task model in OpenCPU, i.e. one has to wait arbitrarily long with an open TCP connection until the request finishes.

One possible task model would be to respond to a POST request that runs a function by immediately returning a dedicated task URI with status 200 OK. The advantage is that the client gets a response right away while the job runs on the server in the background.

The client would then poll the task URI until it returns 201 Created, meaning the job has finished successfully, or an error code for unsuccessful calls. In the success case, the body would contain the same resource list that a POST currently returns directly.

What is the opinion on this model or a similar approach? How does everyone handle this? I think support for long-running jobs without an open TCP connection would be valuable. Optional features, such as supplying progress information while polling a job that is still running, also come to mind.

Answer 1:

You are correct that the current version of OpenCPU does not include a task manager. The client has to keep the connection alive while waiting for the request to finish. This keeps the API nice and simple for the majority of use cases, but it is not optimal for scheduling long-running jobs. However, all time limits are configurable, so there is nothing stopping you from waiting 30 minutes for your job to finish.

As you suggest, an alternative design would be to return 202 Accepted for valid POST requests, and then let the client poll for the status of the result. This would be a cool addition to the API (and perhaps will be added one day), but it introduces considerable complexity in both the client and server implementations.

On the server you would need to write a task manager, and probably worry about functionality to monitor, time out, and manually kill long-running requests. Moreover, there is not that much information that R can give you while a function is still executing. For example, there is really no way to know how far a function call is from finishing.
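The monitoring, timeout, and kill duties mentioned above might look roughly like the following sketch. It assumes each job runs as a separate child process so that it can actually be terminated. The `Job` class and its methods are hypothetical, and a Python child process stands in for the R worker.

```python
import subprocess
import sys
import time


class Job:
    """One background job run as a child process (hypothetical helper, not an OpenCPU API)."""

    def __init__(self, code, timeout):
        # The time limit is measured from submission.
        self.deadline = time.monotonic() + timeout
        self.killed = False
        self.proc = subprocess.Popen([sys.executable, "-c", code])

    def kill(self):
        """Terminate the job manually if it is still running."""
        if self.proc.poll() is None:
            self.proc.kill()
            self.proc.wait()
            self.killed = True

    def status(self):
        """Return 'running', 'finished', 'failed', or 'killed', enforcing the time limit."""
        if self.killed:
            return "killed"
        returncode = self.proc.poll()
        if returncode is None:
            if time.monotonic() > self.deadline:
                self.kill()
                return "killed"
            return "running"
        return "finished" if returncode == 0 else "failed"
```

In this sketch the time limit is only enforced when somebody calls `status()`. A production task manager would more likely run a watchdog loop so that abandoned jobs are also cleaned up.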

One thing that would be possible is to capture intermediate stdout, so that you could implement your own progress indicator in the R function by regularly printing some status. The client could then repeatedly retrieve some URL to read stdout and inquire about the status of the request. However, I doubt this would be very useful. I rarely see progress meters in R functions (unless debug=TRUE or something), so I am not sure this would be any different for R functions that are called remotely.
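As a rough illustration of the stdout idea, the sketch below runs a job as a child process and collects whatever it prints, so that a status endpoint could return the lines seen so far. `ProgressJob` and its methods are hypothetical, and the child here is a Python one-liner standing in for an R function that prints its own status.

```python
import subprocess
import sys
import threading


class ProgressJob:
    """Run a child process and collect its stdout lines as they arrive (hypothetical helper)."""

    def __init__(self, code):
        self.lines = []
        # "-u" makes the child's stdout unbuffered so lines show up promptly.
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", code],
            stdout=subprocess.PIPE,
            text=True,
        )
        self._reader = threading.Thread(target=self._drain, daemon=True)
        self._reader.start()

    def _drain(self):
        # Runs in a background thread until the child closes its stdout.
        for line in self.proc.stdout:
            self.lines.append(line.rstrip("\n"))

    def progress(self):
        """Return a copy of the output lines captured so far."""
        return list(self.lines)

    def wait(self):
        """Block until the job has exited and all of its output has been read."""
        self.proc.wait()
        self._reader.join()
```

A client polling the task could be handed `progress()` as the response body while the job is still running. The output is only as informative as what the function chooses to print, which is the limitation raised above.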

Source: https://stackoverflow.com/questions/25544546/task-model-for-long-running-jobs-in-opencpu

Tags

opencpu