Question
I've noticed (read: caught a production bug) that different Tasks in Julia do not have their own working directory; the current directory is shared. I realise that at the OS level this is fairly obvious (a process has a single working directory).
My first question: is there any other obvious or less obvious global state I should watch out for (beyond environment variables and global variables, obviously)?
Second: should this be better documented, or avoided by the Task abstraction? A Task is an abstraction, so it could (theoretically) have its own semantics, such as switching back to its own working directory.
We solved the production bug by removing every cd() call from the code. The point is that the cd() form that takes a closure gave us the illusion that it might be safe to use,
i.e.:
cd("some_dir") do
    # stuff
end
We had this sort of code running in Mux endpoints.
My minimal reproduction of the issue is:
function runme(path)
    mkpath(path)
    abs_path = realpath(path)
    return t = @async begin
        cd(abs_path) do
            sleep(1)
            println(path,"::",(pwd()|>splitdir)[2])
        end
    end
end
runme("a")
runme("b")
Output (obviously):
a::b
b::b
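A minimal sketch of the cd()-free approach mentioned above, assuming the work inside the task can take explicit paths. The function name `runme_nocd` and the file name `marker.txt` are hypothetical and only serve the illustration. Each task builds its paths from the captured `abs_path` instead of relying on `pwd()`:

```julia
# Hypothetical variant of runme: no cd(), every path is built from abs_path.
function runme_nocd(path)
    mkpath(path)
    abs_path = realpath(path)   # local variable, captured per task by the closure
    return @async begin
        sleep(0.1)              # other tasks get to run here
        outfile = joinpath(abs_path, "marker.txt")   # hypothetical file name
        write(outfile, "written by the task for $(basename(abs_path))")
        # Report the directory this task actually worked in.
        basename(dirname(outfile))
    end
end

root = mktempdir()              # scratch directory so the sketch leaves no files behind
ta = runme_nocd(joinpath(root, "a"))
tb = runme_nocd(joinpath(root, "b"))
println(fetch(ta), " / ", fetch(tb))   # prints "a / b"
```

For external programs, `Cmd` accepts a `dir` keyword (for example `run(Cmd(`ls`, dir=abs_path))`), which sets the working directory of the child process only and leaves the Julia process untouched.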
Edit (summary): though this is almost not a question, it should be searchable and documented, as it is a possible source of synchronisation bugs.
The difference from an ordinary global variable (regarding the state changed by cd()) is that a variable can be captured in a closure using a let statement, while the current directory cannot. Although this is not even specific to a programming language (it is an OS-process issue), I think the syntax does give an illusion of locality (similar to Python 'with' blocks and many other devices).
Thus the bottom line is that the cd abstraction should not be used in any production utility, unless one day there is a way to set a handler for 'switching back' into a Task/block/closure (similar to finally blocks, in a way).
Answer 1:
I'm not familiar with the internals or the particular implementation, so this is my personal educated guess and I'm happy to be corrected by an actual Julia dev. I think it is not so much that Tasks share "the current directory" per se, but that they more generally share "state". Your example behaves the same way with global variables instead:
# in testscript.jl
var = 0;
function runme(val)
    global var = val+1;
    return t = @async begin
        sleep(1)
        println(val,"::",var);
    end
end
runme(1)
runme(3)
# in the REPL session
julia> include("testscript.jl");
1::4
3::4
However, the sharing of (global) state is a feature, not a bug. This is in contrast to processes (the way in which Julia achieves true parallelism), which do not share state, so all communication between workers has to go through sockets.
While one does need to be careful with this, it can also be very useful and even necessary. Tasks (or coroutines) are not used to achieve parallelism or confinement in that regard. They are "a form of cooperative multitasking", i.e. a way to have multiple operations in progress on the same thread; this is not parallelism, as the operations are run "one at a time with appropriate scheduling, as supervised by the CPU". For instance, "try/catch" blocks are (apparently) implemented using Tasks.
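To make the "one at a time" point concrete, here is a small sketch, assuming tasks created with `@async` (which stay on the thread of the task that created them). The names `interleave_demo`, `worker` and `events` are made up for the illustration. Both tasks append to the same vector and hand over control only at the explicit `yield()`:

```julia
# Two tasks share one vector. With @async they never run simultaneously;
# control changes hands only at yield points (yield(), sleep, I/O, ...).
function interleave_demo()
    events = String[]                 # state shared by both tasks
    worker(name) = @async for i in 1:2
        push!(events, "$name$i")      # runs without interruption
        yield()                       # explicit switch point: let other tasks run
    end
    t1 = worker("a")
    t2 = worker("b")
    wait(t1)
    wait(t2)
    return events
end

println(interleave_demo())
```

The entries of the two tasks end up in the same vector, and each task's own entries keep their order. How the two sequences interleave depends on the scheduler, so the sketch makes no claim about it.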
So, to answer your first question: yes, you need to be aware of the shared state. As for the second: no. To the extent that your Tasks access the global state in some way (of which the current directory is one aspect), I'm not convinced that each Task should have its own semantics in the way you describe; instead, you need to design your tasks so that they take the fact that state is shared into account and act accordingly.
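One way to "act accordingly" is to keep per-task values out of globals altogether. The sketch below is an illustration rather than a recommendation from the Julia developers; it uses Base's `task_local_storage`, and the function name `runme_tls` is hypothetical. It mirrors the global-variable example, but each task stores its value in its own storage, so the second call no longer overwrites the first:

```julia
# Same shape as the global-variable example, but the value lives in
# task-local storage instead of a global.
function runme_tls(val)
    return @async begin
        task_local_storage(:var, val + 1)   # visible to this task only
        sleep(0.1)                           # the other task runs in the meantime
        (val, task_local_storage(:var))      # read back this task's own value
    end
end

t1 = runme_tls(1)
t2 = runme_tls(3)
println(fetch(t1))   # (1, 2)
println(fetch(t2))   # (3, 4)
```

This only helps for state that lives inside Julia. The working directory belongs to the OS process, so it cannot be made task-local this way.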
As a further illustration of the second point, consider two separate tasks that "produce" outputs that need to be "consumed". If what either task should produce depends on a global state, then it may be entirely by design that your tasks react to that shared global state. Here's a trivial example of this:
d = 0;
function report()
    global d;
    for i in 1:4
        if iseven(d)
            produce("D is Even\n");
        else
            produce("D is Odd\n");
        end
    end
end
task1 = Task( report );
task2 = Task( report );
for i in 1:4
    d = i;
    consume(task1) |> print;
    consume(task2) |> print;
end
D is Odd
D is Odd
D is Even
D is Even
D is Odd
D is Odd
D is Even
D is Even
PS. The latest Julia build informs me that "produce" and "consume" are being deprecated in favour of "Channels", but presumably the point stands.
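For reference, a rough sketch of the same structure written with Channels, assuming Julia 1.3 or later for the `Channel{String}() do ch ... end` form. The names `SHARED_D`, `report_channel` and `run_reports` are made up for the sketch, and a `Ref` stands in for the global `d`. Note that a channel's producer task may compute its next message before the consumer has updated the shared value, so the exact Even/Odd sequence is not guaranteed to match the produce/consume output above:

```julia
const SHARED_D = Ref(0)   # stand-in for the global `d`

# Each call returns an unbuffered channel fed by its own task.
function report_channel()
    return Channel{String}() do ch
        for i in 1:4
            # Reads the shared state at the moment the message is built.
            put!(ch, iseven(SHARED_D[]) ? "D is Even" : "D is Odd")
        end
    end
end

function run_reports()
    ch1, ch2 = report_channel(), report_channel()
    msgs = String[]
    for i in 1:4
        SHARED_D[] = i               # update the shared state
        push!(msgs, take!(ch1))      # consume one message from each task
        push!(msgs, take!(ch2))
    end
    return msgs
end

foreach(println, run_reports())
```

Both producer tasks still read the same shared value, which is the point of the original example; only the hand-over mechanism differs.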
Source: https://stackoverflow.com/questions/44571713/julia-language-state-in-async-tasks-current-directory