Julia language - State in @async tasks :: Current-Directory

Submitted by 狂风中的少年 on 2019-12-11 02:06:54

Question


I've noticed (read: caught a production bug) that different Tasks in Julia do not have their own working directory; the current directory is shared between them. I realise that at the OS level this is fairly obvious (a process has one working directory).

My first question: is there any other obvious or less obvious global state I should watch out for (environment variables and global variables being the obvious ones)?

Second: should this be better documented, or avoided by the Task abstraction? A Task is an abstraction, so it could (theoretically) have its own semantics, such as moving back to its own working directory.

We've solved the production bug by removing every cd() call from the code. The point is that the cd() with closure abstraction gave us the illusion that this might be safe to use.

i.e.:

cd("some_dir") do
  # stuff
end

We had this sort of code running in Mux endpoints.

My minimal reproduction of the issue is:

function runme(path)
    mkpath(path)
    abs_path = realpath(path)
    return t = @async begin
        cd(abs_path) do
            sleep(1)
            println(path,"::",(pwd()|>splitdir)[2])
        end
    end
end

runme("a")
runme("b")

Output (obviously):

a::b
b::b

Edit (summary): although this is almost not a question, it should be searchable and documented, as it is a possible source of synchronisation bugs.

The difference from a plain global variable is that a variable can be captured in a closure using a let statement, while the current directory (the state set by cd()) cannot. Although this is not specific to any programming language (it is an OS-process issue), I think the syntax does give an illusion of locality (similar to Python 'with' blocks, or many other devices).
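To make the let capture concrete, here is a minimal sketch. The names `shared` and `runme_captured` are made up for illustration. Each task keeps a private snapshot of the global taken when the task was created, while a direct read of the global inside the task sees whatever value is current when the task finally runs.

```julia
shared = 0

# Hypothetical helper: start a task that reports both its snapshot and the live global.
function runme_captured(val)
    global shared = val + 1
    # `let` introduces a fresh binding holding the value at task-creation time.
    let snapshot = shared
        return @async begin
            sleep(0.1)
            # `snapshot` is private to this task; `shared` is the live global.
            (val, snapshot, shared)
        end
    end
end

t1 = runme_captured(1)
t2 = runme_captured(3)

println(fetch(t1))
println(fetch(t2))
```

The first task still reports its own snapshot (2), even though the global had already moved on to 4 before either task ran. There is no equivalent way to take a per-task snapshot of the process working directory.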

Thus the bottom line is that the cd() abstraction should not be used in any production utility, unless one day there is a way to set a handler for 'switching back' into a Task/block/closure (similar to finally blocks, in a way).
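For reference, a sketch of what "removing cd()" can look like for the reproduction above. It is one possible shape, not necessarily the code we ended up with. `runme_nocd` and the file name `marker.txt` are hypothetical. Every path is built from the absolute directory, so the task never depends on pwd().

```julia
# Hypothetical variant of runme: no cd(); every path is built from abs_path.
function runme_nocd(path)
    mkpath(path)
    abs_path = realpath(path)
    return @async begin
        sleep(0.2)
        # Address files explicitly instead of relying on pwd().
        marker = joinpath(abs_path, "marker.txt")   # hypothetical file name
        write(marker, "written by the task for $(basename(path))")
        # An external command can get its own directory through Cmd's `dir`
        # keyword, e.g. run(Cmd(`ls`; dir=abs_path)), again without cd().
        basename(path) => basename(dirname(marker))
    end
end

root = mktempdir()
ta = runme_nocd(joinpath(root, "a"))
tb = runme_nocd(joinpath(root, "b"))

println(fetch(ta))
println(fetch(tb))
```

Each task now reports its own directory, because nothing it does touches process-wide state.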


Answer 1:


I'm not familiar with the internals or the particular implementation, so this is my personal educated guess and I'm happy to be corrected by an actual Julia dev. I think it's not that Tasks share "the current directory" per se, but that they more generally share "state". Your example behaves the same way with global variables instead:

# in testscript.jl
var = 0;

function runme(val)
    global var = val+1;
    return t = @async begin
      sleep(1)
      println(val,"::",var);
    end
end

runme(1) 
runme(3) 

# in the REPL session
julia> include("testscript.jl");
  1::4
  3::4

However, the sharing of (global) state is a feature, not a bug. This is in contrast to processes (which is the way in which Julia achieves true parallelism), which do not share state; all communication between workers therefore has to be done via sockets.

While one does need to be careful with this, it can also be very useful and necessary. Tasks (or coroutines) are not used to achieve parallelism or confinement in that regard. They are "a form of cooperative multitasking", i.e. a way to run multiple operations on the same thread. This is not parallelism: the multiple operations are run "one at a time with appropriate scheduling, as supervised by the CPU". For instance, "try/catch" blocks are (apparently) implemented using Tasks.

So, to answer your first question: yes, you need to be aware of the shared state. To the second question: no. To the extent that your Tasks access the global state in some way (and the current directory is one aspect of it), I'm not convinced each Task should have its own semantics in the way you describe. Instead, you need to design your tasks so that they take the fact that state is shared into account, and act accordingly.

As a further example for the second point, consider two separate tasks that "produce" outputs that need to be "consumed". If what gets consumed from either task depends on a global state, then it may be entirely by design that the tasks respond to that shared global state. Here's a trivial example of this:
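For the working directory specifically, one way to "act accordingly" might be to funnel every directory change through a single lock, so that only one task at a time is inside a cd() block. This is a sketch under assumptions: `CWD_LOCK`, `with_dir` and `runme_locked` are names I made up, and it only protects code that actually goes through the wrapper.

```julia
# Assumption: one process-wide lock that every cd() user agrees to go through.
const CWD_LOCK = ReentrantLock()

# Hypothetical helper: run f() inside dir while holding the lock.
function with_dir(f, dir)
    lock(CWD_LOCK) do
        cd(f, dir)   # cd(f, dir) restores the previous directory when f returns
    end
end

function runme_locked(path)
    mkpath(path)
    abs_path = realpath(path)
    return @async begin
        with_dir(abs_path) do
            sleep(0.2)   # yields to other tasks, but they cannot cd while the lock is held
            basename(path) => basename(pwd())
        end
    end
end

root = mktempdir()
ta = runme_locked(joinpath(root, "a"))
tb = runme_locked(joinpath(root, "b"))

println(fetch(ta))
println(fetch(tb))
```

The price is that the two tasks no longer overlap: the second one waits until the first has left its directory. Any code that calls cd() or reads pwd() outside `with_dir` is still exposed to the original problem.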

d = 0;

function report()
  global d;
  for i in 1:4
    if iseven(d); produce("D is Even\n"); else; produce("D is Odd\n"); end
  end
end

task1 = Task( report );
task2 = Task( report );

for i in 1:4
  d = i;
  consume(task1) |> print;
  consume(task2) |> print;
end

D is Odd
D is Odd
D is Even
D is Even
D is Odd
D is Odd
D is Even
D is Even

PS. The latest Julia build informs me that "produce" and "consume" are deprecated in favour of "Channels", but presumably the point stands.
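A rough Channel-based sketch of the same idea follows. It is not a drop-in translation: a plain put!/take! pair does not guarantee that the producer reads d at the same moment produce/consume did, so this version adds a request channel and reads d only after the consumer has asked. The helpers `start_reporter` and `ask` are hypothetical names.

```julia
# Shared global state, wrapped in a Ref so tasks can read it without `global`.
const d = Ref(0)

# Hypothetical helper: start a reporter task and return its two channels.
function start_reporter()
    requests = Channel{Nothing}(0)   # consumer -> producer: "give me a value"
    replies  = Channel{String}(0)    # producer -> consumer: the message
    @async for _ in requests
        # d is read only after a request arrives, so it reflects the value
        # the consumer set just before asking.
        put!(replies, iseven(d[]) ? "D is Even" : "D is Odd")
    end
    return requests, replies
end

# Hypothetical helper playing the role of the old consume(): ask, then wait.
function ask(reporter)
    requests, replies = reporter
    put!(requests, nothing)
    return take!(replies)
end

reporter1 = start_reporter()
reporter2 = start_reporter()

messages = String[]
for i in 1:4
    d[] = i
    push!(messages, ask(reporter1))
    push!(messages, ask(reporter2))
end
foreach(println, messages)
```

Both reporter tasks answer according to whatever d holds when they are asked, which is the "shared state by design" behaviour of the example above.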



Source: https://stackoverflow.com/questions/44571713/julia-language-state-in-async-tasks-current-directory
