python asyncio gets deadlock if multiple stdin input is needed

Submitted by 不打扰是莪最后的温柔 on 2021-02-07 05:27:19

Question


I wrote a command-line tool that executes git pull for multiple git repos using Python asyncio. It works fine if all repos have password-less ssh login set up. It also works fine if only one repo needs password input. When multiple repos require password input, it seems to deadlock.

My implementation is very simple. The main logic is

utils.exec_async_tasks(
        utils.run_async(path, cmds) for path in repos.values())

where run_async creates and awaits a subprocess call, and exec_async_tasks runs all the tasks.

import asyncio
import platform
from typing import Coroutine, List


async def run_async(path: str, cmds: List[str]):
    """
    Run `cmds` asynchronously in `path` directory
    """
    process = await asyncio.create_subprocess_exec(
        *cmds, stdout=asyncio.subprocess.PIPE, cwd=path)
    stdout, _ = await process.communicate()
    if stdout:
        print(stdout.decode())


def exec_async_tasks(tasks: List[Coroutine]):
    """
    Execute tasks asynchronously
    """
    # TODO: asyncio API is nicer in python 3.7
    if platform.system() == 'Windows':
        loop = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(loop)
    else:
        loop = asyncio.get_event_loop()

    try:
        loop.run_until_complete(asyncio.gather(*tasks))
    finally:
        loop.close()

The full code base is here on github.

I think the problem is something like the following. In run_async, asyncio.create_subprocess_exec does not redirect stdin, so the system's stdin is used for all subprocesses (repos). When the first repo asks for password input, the asyncio scheduler sees a blocking input and switches to the second repo while waiting for the command-line input. But if the second repo asks for password input before the password input for the first repo is finished, the system's stdin will be linked to the second repo, and the first repo will wait for input forever.

I am not sure how to deal with this situation. Do I have to redirect stdin for each subprocess? What if some repos have passwordless login and some don't?

Some ideas are as follows

  1. Detect when password input is needed in create_subprocess_exec. If it is, call input() and pass its result to process.communicate(input). But how can I detect that on the fly?

  2. Detect which repos require password input, and exclude them from async execution. What's the best way to do that?


Answer 1:


In the default configuration, when a username or password is needed git will directly access the /dev/tty synonym for better control over the 'controlling' terminal device, i.e. the device that lets you interact with the user. Since subprocesses by default inherit the controlling terminal from their parent, all the git processes you start are going to access the same TTY device. So yes, they'll hang when trying to read from and write to the same TTY, with processes clobbering each other's expected input.

A simplistic method to prevent this from happening would be to give each subprocess its own session; a new session starts without a controlling TTY, so the subprocesses no longer share the parent's terminal. Do so by setting start_new_session=True:

process = await asyncio.create_subprocess_exec(
    *cmds, stdout=asyncio.subprocess.PIPE, cwd=path, start_new_session=True)
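
To see what this flag changes, here is a minimal sketch for POSIX systems. The helper name probe_child and the CHILD_CODE snippet are hypothetical, not part of the original tool: the child simply reports whether it leads its own session and whether it can open /dev/tty.

```python
import asyncio
import sys

# Code run inside the child: report session leadership and /dev/tty access.
CHILD_CODE = """
import os
leader = os.getsid(0) == os.getpid()
try:
    os.close(os.open('/dev/tty', os.O_RDWR))
    tty = True
except OSError:
    tty = False
print(leader, tty)
"""


async def probe_child(new_session: bool) -> str:
    """Hypothetical helper: run CHILD_CODE and return its one-line report."""
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdout=asyncio.subprocess.PIPE,
        start_new_session=new_session)
    stdout, _ = await process.communicate()
    return stdout.decode().strip()


if __name__ == "__main__":
    # Compare the report with and without a new session.
    print("new session:", asyncio.run(probe_child(True)))
    print("inherited:  ", asyncio.run(probe_child(False)))
```

With start_new_session=True the child should report that it is a session leader and that /dev/tty cannot be opened; the result without the flag depends on the terminal you launch it from.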

You can't really determine up-front what git commands might require user credentials, because git can be configured to get credentials from a whole range of locations, and these are only used if the remote repository actually challenges for authentication.

Even worse, for ssh:// remote URLs, git doesn't handle the authentication at all, but leaves it to the ssh client process it opens. More on that below.

How Git asks for credentials (for anything but ssh) is configurable however; see the gitcredentials documentation. You could make use of this if your code must be able to forward credentials requests to an end-user. I'd not leave it to the git commands to do this via a terminal, because how will the user know what specific git command is going to receive what credentials, let alone the issues you'd have with making sure the prompts arrive in a logical order.

Instead, I'd route all requests for credentials through your script. You have two options to do this with:

  • Set the GIT_ASKPASS environment variable, pointing to an executable that git should run for each prompt.

    This executable is called with a single argument, the prompt to show the user. It is called separately for each piece of information needed for a given credential, so for a username (if not already known), and a password. The prompt text should make it clear to the user what is being asked for (e.g. "Username for 'https://github.com': " or "Password for 'https://someusername@github.com': ").

  • Register a credential helper; this is executed as a shell command (so can have its own pre-configured command-line arguments), and one extra argument telling the helper what kind of operation is expected of it. If it is passed get as the last argument, then it is asked to provide credentials for a given host and protocol, or it can be told that certain credentials were successful with store, or were rejected with erase. In all cases it can read information from stdin to learn what host git is trying to authenticate to, in multi-line key=value format.

    So with a credential helper, you get to prompt for a username and password combination together as a single step, and you also get more information about the process; handling store and erase operations lets you cache credentials more effectively.
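
As a rough illustration of the multi-line key=value format mentioned above, the sketch below parses what a helper would read from stdin and serializes a reply. The function names parse_credential_input and format_credential_output are hypothetical; the sketch assumes the description ends at a blank line or end of input.

```python
from typing import Dict


def parse_credential_input(text: str) -> Dict[str, str]:
    """Parse the key=value description git writes to a helper's stdin."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            break  # assumption: a blank line ends the description
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


def format_credential_output(fields: Dict[str, str]) -> str:
    """Serialize fields in the same format, one key=value pair per line."""
    return "".join(f"{key}={value}\n" for key, value in fields.items())


if __name__ == "__main__":
    request = parse_credential_input("protocol=https\nhost=example.com\n")
    print(request)
    print(format_credential_output({"username": "someuser", "password": "s3cret"}))
```

A real helper would dispatch on its last command-line argument (get, store or erase) and only print a reply for get.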

Git will first ask each configured credential helper, in config order (see the FILES section to understand how the 4 config file locations are processed in order). You can add a new one-off helper configuration on the git command line with the -c credential.helper=... command-line switch, which is added to the end. If no credential helper was able to fill in a missing username or password, then the user is prompted with GIT_ASKPASS or the other prompting options.

For SSH connections, git creates a new ssh child process. SSH will then handle authentication, and could ask the user for credentials or, for ssh keys, ask the user for a passphrase. This again will be done via /dev/tty, and SSH is more stubborn about this. While you can set an SSH_ASKPASS environment variable to a binary to be used for prompting, SSH will only use this if there is no TTY session and DISPLAY is also set.

SSH_ASKPASS must be an executable (so no passing in arguments), and you won't be notified of the success or failure of the prompted credentials.

I'd also make sure to copy the current environment variables to the child processes, because if the user has set up an SSH key agent to cache ssh keys, you'd want the SSH processes that git starts to make use of them; a key agent is discovered through environment variables.

So, to create the connection for a credential helper, and one that also works for SSH_ASKPASS, you can use a simple synchronous script that takes the socket from an environment variable:

#!/path/to/python3
import os, socket, sys
path = os.environ['PROMPTING_SOCKET_PATH']
operation = sys.argv[1]
if operation not in {'get', 'store', 'erase'}:
    operation, params = 'prompt', f'prompt={operation}\n'
else:
    params = sys.stdin.read()
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
    s.connect(path)
    s.sendall(f'''operation={operation}\n{params}'''.encode())
    print(s.recv(2048).decode())

This should have the executable bit set.

This could then be passed to a git command as a temporary file or included pre-built, and you add a Unix domain socket path in the PROMPTING_SOCKET_PATH environment variable. It can double as an SSH_ASKPASS prompter, setting the operation to prompt.

This script then makes both SSH and git ask your UNIX domain socket server for user credentials, in a separate connection per user. I've used a generous receiving buffer size, I don't think you'll ever run into an exchange with this protocol that'll exceed it, nor do I see any reason for it to be under-filled. It keeps the script nice and simple.
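
If you would rather not rely on a single recv() call returning the whole reply, one alternative is to read until the server closes the connection, which the demo server below does after every response. This is only a sketch; recv_all is a hypothetical helper, not part of the script above.

```python
import socket


def recv_all(sock: socket.socket, bufsize: int = 2048) -> bytes:
    """Read from `sock` until the peer closes its end of the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # an empty read means the peer has closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Demonstrate with a connected socket pair instead of a real server.
    left, right = socket.socketpair()
    left.sendall(b"password=example\n")
    left.close()
    print(recv_all(right).decode(), end="")
    right.close()
```

The trade-off is a few more lines in a script that was deliberately kept minimal.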

You could instead use it as the GIT_ASKPASS command, but then you wouldn't get valuable information on the success of credentials for non-ssh connections.

Here is a demo implementation of a UNIX domain socket server that handles git and credential requests from the above credential helper, one that just generates random hex values rather than ask a user:

import asyncio
import os
import secrets
import tempfile

async def handle_git_prompt(reader, writer):
    data = await reader.read(2048)
    info = dict(line.split('=', 1) for line in data.decode().splitlines())
    print(f"Received credentials request: {info!r}")

    response = []
    operation = info.pop('operation', 'get')

    if operation == 'prompt':
        # new prompt for a username or password or pass phrase for SSH
        password = secrets.token_hex(10)
        print(f"Sending prompt response: {password!r}")
        response.append(password)

    elif operation == 'get':
        # new request for credentials, for a username (optional) and password
        if 'username' not in info:
            username = secrets.token_hex(10)
            print(f"Sending username: {username!r}")
            response.append(f'username={username}\n')

        password = secrets.token_hex(10)
        print(f"Sending password: {password!r}")
        response.append(f'password={password}\n')

    elif operation == 'store':
        # credentials were used successfully, perhaps store these for re-use
        print(f"Credentials for {info['username']} were approved")

    elif operation == 'erase':
        # credentials were rejected, if we cached anything, clear this now.
        print(f"Credentials for {info['username']} were rejected")

    writer.write(''.join(response).encode())
    await writer.drain()

    print("Closing the connection")
    writer.close()
    await writer.wait_closed()

async def main():
    with tempfile.TemporaryDirectory() as dirname:
        socket_path = os.path.join(dirname, 'credential.helper.sock')
        server = await asyncio.start_unix_server(handle_git_prompt, socket_path)

        print(f'Starting a domain socket at {server.sockets[0].getsockname()}')

        async with server:
            await server.serve_forever()

asyncio.run(main())

Note that a credential helper could also add quit=true or quit=1 to the output to tell git to not look for any other credential helpers and no further prompting.

You can use the git credential <operation> command to test out that the credential helper works, by passing in the helper script (/full/path/to/credhelper.py) with the git -c credential.helper=... command-line option. git credential can take a url=... string on standard input, it'll parse this out just like git would to contact the credential helpers; see the documentation for the full exchange format specification.

First, start the above demo script in a separate terminal:

$ /usr/local/bin/python3.7 git-credentials-demo.py
Starting a domain socket at /var/folders/vh/80414gbd6p1cs28cfjtql3l80000gn/T/tmprxgyvecj/credential.helper.sock

and then try to get credentials from it; I included a demonstration of the store and erase operations too:

$ export PROMPTING_SOCKET_PATH="/var/folders/vh/80414gbd6p1cs28cfjtql3l80000gn/T/tmprxgyvecj/credential.helper.sock"
$ CREDHELPER="/tmp/credhelper.py"
$ echo "url=https://example.com:4242/some/path.git" | git -c "credential.helper=$CREDHELPER" credential fill
protocol=https
host=example.com:4242
username=5b5b0b9609c1a4f94119
password=e259f5be2c96fed718e6
$ echo "url=https://someuser@example.com/some/path.git" | git -c "credential.helper=$CREDHELPER" credential fill
protocol=https
host=example.com
username=someuser
password=766df0fba1de153c3e99
$ printf "protocol=https\nhost=example.com:4242\nusername=5b5b0b9609c1a4f94119\npassword=e259f5be2c96fed718e6" | git -c "credential.helper=$CREDHELPER" credential approve
$ printf "protocol=https\nhost=example.com\nusername=someuser\npassword=e259f5be2c96fed718e6" | git -c "credential.helper=$CREDHELPER" credential reject

and when you then look at the output from the example script, you'll see:

Received credentials request: {'operation': 'get', 'protocol': 'https', 'host': 'example.com:4242'}
Sending username: '5b5b0b9609c1a4f94119'
Sending password: 'e259f5be2c96fed718e6'
Closing the connection
Received credentials request: {'operation': 'get', 'protocol': 'https', 'host': 'example.com', 'username': 'someuser'}
Sending password: '766df0fba1de153c3e99'
Closing the connection
Received credentials request: {'operation': 'store', 'protocol': 'https', 'host': 'example.com:4242', 'username': '5b5b0b9609c1a4f94119', 'password': 'e259f5be2c96fed718e6'}
Credentials for 5b5b0b9609c1a4f94119 were approved
Closing the connection
Received credentials request: {'operation': 'erase', 'protocol': 'https', 'host': 'example.com', 'username': 'someuser', 'password': 'e259f5be2c96fed718e6'}
Credentials for someuser were rejected
Closing the connection

Note how the helper is given a parsed-out set of fields, for protocol and host, and the path is omitted; if you set the git config option credential.useHttpPath=true (or it has already been set for you) then path=some/path.git will be added to the information being passed in.

For SSH, the executable is simply called with a prompt to display:

$ $CREDHELPER "Please enter a super-secret passphrase: "
30b5978210f46bb968b2

and the demo server has printed:

Received credentials request: {'operation': 'prompt', 'prompt': 'Please enter a super-secret passphrase: '}
Sending prompt response: '30b5978210f46bb968b2'
Closing the connection

Just make sure to still set start_new_session=True when starting the git processes to ensure that SSH is forced to use SSH_ASKPASS.

# copy the current environment, then add the prompting configuration
env = dict(
    os.environ,
    SSH_ASKPASS='../path/to/credhelper.py',
    DISPLAY='dummy value',
    PROMPTING_SOCKET_PATH='../path/to/domain/socket',
)
process = await asyncio.create_subprocess_exec(
    *cmds, stdout=asyncio.subprocess.PIPE, cwd=path,
    start_new_session=True, env=env)

Of course, how you then handle prompting your users is a separate issue, but your script now has full control (each git command will wait patiently for the credential helper to return the requested information) and you can queue up requests for the user to fill in, and you can cache credentials as needed (in case multiple commands are all waiting for credentials for the same host).
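
One possible shape for that queueing and caching, as a sketch only: the CredentialCache class and the ask callback are hypothetical names, and the cache is assumed to be created inside the running event loop. A single lock ensures only one prompt is shown at a time, and a per-host cache lets commands waiting on the same host reuse the first answer.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Credentials = Tuple[str, str]  # (username, password)


class CredentialCache:
    """Hypothetical cache: prompt once per host, one prompt at a time."""

    def __init__(self, ask: Callable[[str], Awaitable[Credentials]]):
        self._ask = ask  # coroutine function that actually prompts the user
        self._cache: Dict[str, Credentials] = {}
        self._lock = asyncio.Lock()  # serializes prompts across all requests

    async def get(self, host: str) -> Credentials:
        """Return cached credentials for `host`, prompting only if needed."""
        async with self._lock:
            if host not in self._cache:
                self._cache[host] = await self._ask(host)
            return self._cache[host]

    def forget(self, host: str) -> None:
        """Drop cached credentials, e.g. after an 'erase' operation."""
        self._cache.pop(host, None)
```

In the demo server, the get branch of handle_git_prompt could await cache.get(info['host']) and the erase branch could call cache.forget(info['host']).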




Answer 2:


Generally speaking, the recommended way to feed a password to git is through "credential helpers" or GIT_ASKPASS, as pointed out in Martijn's answer, but for Git+SSH the situation is complicated (more discussion below), so it would be difficult to set this up correctly across operating systems. If you just want a quick patch to your script, here is code that works on both Linux and Windows:

async def run_async(...):
    ...
    process = await asyncio.create_subprocess_exec( *cmds, 
        stdin=asyncio.subprocess.PIPE, 
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE, 
        start_new_session=True, cwd=path)
    stdout, stderr = await process.communicate(password + b'\n')

The parameter start_new_session=True gives the child process a new session ID (SID), so it runs in a new session that has no controlling TTY by default. SSH is then forced to read the password from the stdin pipe. On Windows, start_new_session seems to have no effect (there is no concept of SID on Windows AFAIK).

Unless you plan to implement a Git credential manager (GCM) in your project "gita", I wouldn't recommend feeding any password to Git at all (the Unix philosophy). Simply set stdin=asyncio.subprocess.DEVNULL and pass None to process.communicate(). This forces Git and SSH to use the existing CM or abort (you can handle the error later). Moreover, I think "gita" shouldn't interfere with the configuration of other CMs, such as GCM for Windows. Thus, don't bother touching the GIT_ASKPASS or SSH_ASKPASS variables, or any credential.* configuration. It's the user's responsibility (and freedom) to set up a proper GCM for each repo. Usually the Git distribution already includes a GCM or an ASKPASS implementation.
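
A minimal sketch of that suggestion, assuming a hypothetical helper named run_without_input: stdin is connected to the null device, nothing is passed to communicate(), and the exit status is returned so the caller can handle failures later.

```python
import asyncio
import sys
from typing import List, Optional, Tuple


async def run_without_input(
        cmds: List[str], cwd: Optional[str] = None) -> Tuple[int, bytes, bytes]:
    """Run `cmds` with no stdin and no controlling TTY; return (code, out, err)."""
    process = await asyncio.create_subprocess_exec(
        *cmds,
        stdin=asyncio.subprocess.DEVNULL,   # reads see end-of-file at once
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        start_new_session=True,             # detach from the controlling TTY
        cwd=cwd)
    stdout, stderr = await process.communicate()  # no input is passed
    return process.returncode, stdout, stderr


if __name__ == "__main__":
    # Stand-in for a git command: a child that tries to read from stdin.
    child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
    print(asyncio.run(run_without_input(child)))
```

A command that needs interactive input then fails or falls back to whatever credential manager is configured, instead of blocking on the terminal.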

Discussion

There is a common misunderstanding of the problem: Git doesn't open the TTY for password input, SSH does! Actually, other ssh-related utilities, such as rsync and scp, share the same behavior (I figured this out the hard way when debugging a SELinux-related problem a few months ago). See the appendix for verification.

Because Git calls SSH as a sub-process, it cannot know whether SSH will open the TTY or not. The Git configurables, such as core.askpass or GIT_ASKPASS, will not prevent SSH from opening /dev/tty, at least not for me when testing with Git 1.8.3 on CentOS 7 (details in the appendix). There are two common cases in which you should expect a password prompt:

  • Server requires password authentication;
  • For public-key authentication, the private key storage (in a local file ~/.ssh/id_rsa or PKCS11 chip) is password protected.

In these cases, ASKPASS or GCM won't help you with the deadlock problem. You have to disable the TTY.

You may also want to read about the environment variable SSH_ASKPASS. It points to an executable that will be called when the following conditions are met:

  • No controlling TTY is available to the current session;
  • Env. variable DISPLAY is set.

On Windows, for example, it defaults to SSH_ASKPASS=/mingw64/libexec/git-core/git-gui--askpass. This program comes with the mainstream distribution and the official Git-GUI package. Therefore, on both Windows and Linux desktop environments, if you disable the TTY with start_new_session=True and leave the other configurables unchanged, SSH will automatically pop up a separate UI window for the password prompt.

Appendix

To verify which process opens the TTY, you can run ps -fo pid,tty,cmd when a Git process is waiting for password.

$ ps -fo pid,tty,cmd
3839452 pts/0         \_ git clone ssh://username@hostname/path/to/repo ./repo
3839453 pts/0             \_ ssh username@hostname git-upload-pack '/path/to/repo'

$ ls -l /proc/3839453/fd /proc/3839452/fd
/proc/3839452/fd:
total 0
lrwx------. 1 xxx xxx 64 Apr  4 21:45 0 -> /dev/pts/0
lrwx------. 1 xxx xxx 64 Apr  4 21:45 1 -> /dev/pts/0
lrwx------. 1 xxx xxx 64 Apr  4 21:43 2 -> /dev/pts/0
l-wx------. 1 xxx xxx 64 Apr  4 21:45 4 -> pipe:[49095162]
lr-x------. 1 xxx xxx 64 Apr  4 21:45 5 -> pipe:[49095163]

/proc/3839453/fd:
total 0
lr-x------. 1 xxx xxx 64 Apr  4 21:42 0 -> pipe:[49095162]
l-wx------. 1 xxx xxx 64 Apr  4 21:42 1 -> pipe:[49095163]
lrwx------. 1 xxx xxx 64 Apr  4 21:42 2 -> /dev/pts/0
lrwx------. 1 xxx xxx 64 Apr  4 21:42 3 -> socket:[49091282]
lrwx------. 1 xxx xxx 64 Apr  4 21:45 4 -> /dev/tty



Answer 3:


I ended up using a simple solution suggested by @vincent, i.e., disable any existing password mechanism by setting the GIT_ASKPASS environment variable, run async on all repos, and re-run the failed ones synchronously.

The main logic changes to

cache = os.environ.get('GIT_ASKPASS')
os.environ['GIT_ASKPASS'] = 'echo'
errors = utils.exec_async_tasks(
    utils.run_async(path, cmds) for path in repos.values())
# Reset context and re-run
if cache:
    os.environ['GIT_ASKPASS'] = cache
else:
    del os.environ['GIT_ASKPASS']
for path in errors:
    if path:
        subprocess.run(cmds, cwd=path)

In run_async and exec_async_tasks, I simply redirect error and return the repo path if subprocess execution fails.

async def run_async(path: str, cmds: List[str]) -> Union[None, str]:
    """
    Run `cmds` asynchronously in `path` directory. Return the `path` if
    execution fails.
    """
    process = await asyncio.create_subprocess_exec(
        *cmds,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        cwd=path)
    stdout, stderr = await process.communicate()
    stdout and print(stdout.decode())
    if stderr:
        return path

You can see this pull request for the complete change.

Further Update

The PR above resolves the problem when an https-type remote requires username/password input, but still has a problem when ssh requires password input for multiple repos. Thanks to @gdlmx's comment below.

In version 0.9.1, I basically followed @gdlmx's suggestion: disable user input completely when running in the async mode, and the failed repos will run the delegated command again using subprocess serially.
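
The approach described above could look roughly like the sketch below. It is not the code from gita itself: the names try_async, run_all and run_with_retry are hypothetical, and failure is judged by exit status rather than by the presence of stderr output, on the assumption that a successful git command may also write to stderr.

```python
import asyncio
import subprocess
from typing import Iterable, List, Optional


async def try_async(path: str, cmds: List[str]) -> Optional[str]:
    """Run `cmds` in `path` with user input disabled; return `path` on failure."""
    process = await asyncio.create_subprocess_exec(
        *cmds,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        start_new_session=True,
        cwd=path)
    stdout, _ = await process.communicate()
    if stdout:
        print(stdout.decode())
    # A non-zero exit status marks the repo for a serial re-run.
    return path if process.returncode else None


async def run_all(paths: Iterable[str], cmds: List[str]) -> List[str]:
    """Run all repos concurrently and collect the paths that failed."""
    results = await asyncio.gather(*(try_async(p, cmds) for p in paths))
    return [path for path in results if path]


def run_with_retry(paths: Iterable[str], cmds: List[str]) -> None:
    """Async pass first, then re-run failures one at a time on the terminal."""
    for path in asyncio.run(run_all(paths, cmds)):
        subprocess.run(cmds, cwd=path)  # inherits the terminal, so prompts work
```

For example, run_with_retry(repos.values(), ['git', 'pull']) would pull every repo concurrently and then prompt for passwords one repo at a time.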



Source: https://stackoverflow.com/questions/55155294/python-asyncio-gets-deadlock-if-multiple-stdin-input-is-needed
