My problem is the following:
My Python script receives data via sys.stdin, but it needs to wait until new data is available on sys.stdin.
As described in th
This actually works flawlessly (i.e. no runaway CPU) when you call the script from the shell, like so:
tail -f input-file | yourscript.py
Obviously, that is not ideal, since whatever produces the data then has to write all its relevant output to that file, but it works without a lot of overhead!
I think that is because it uses readline():
import sys
while True:
    line = sys.stdin.readline()  # blocks until a full line (or EOF) arrives
It will actually stop and wait at that line until it gets more input.
Hope this helps someone!
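A minimal sketch of that readline() loop, wrapped in a function so it can be exercised without a terminal. The names `consume` and `handle` are hypothetical. The loop stops only when readline() returns '', which happens at end of file; with `tail -f` feeding the pipe that never occurs, so the call simply blocks between lines.

```python
def consume(stream, handle):
    """Call handle(line) for every line read from stream until EOF.

    On a pipe or terminal, readline() blocks until a complete line (or EOF)
    is available, so the loop does not spin while the writer is idle.
    It returns '' only at EOF.
    """
    count = 0
    while True:
        line = stream.readline()
        if line == '':  # EOF: the writing end of the stream was closed
            break
        handle(line.rstrip('\n'))
        count += 1
    return count
```

In the script itself you would call it as `consume(sys.stdin, print)` and run `tail -f input-file | yourscript.py` as above.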
The following should just work.
import sys
for line in sys.stdin:
    pass  # whatever: process line here
Rationale:
The code will iterate over lines in stdin as they come in. If the stream is still open but there isn't a complete line yet, the loop will block until either a newline character is encountered (and the whole line is returned) or the stream is closed (and whatever is left in the buffer is returned).
Once the stream has been closed, no more data can be written to or read from stdin. Period.
The reason your code was overloading your CPU is that once stdin has been closed, any subsequent attempt to iterate over it returns immediately without doing anything. In essence, your code was equivalent to the following:
import sys
for line in sys.stdin:
    pass  # do something with line
while True:
    pass  # infinite loop, very CPU intensive
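Both claims can be checked without a terminal. This small sketch uses io.StringIO as a stand-in for a stream whose writer has already finished (an assumption for illustration; a real pipe behaves the same way once EOF is reached). A final line without a trailing newline is still yielded, and a second pass over the exhausted stream yields nothing at all, which is why the loop that follows it in the example above just spins.

```python
import io


def drain(stream):
    """Return every line a for loop over stream yields, in order."""
    return list(stream)


# Hypothetical input: the last line has no trailing newline.
stream = io.StringIO("first\nsecond\npartial")

first_pass = drain(stream)   # ['first\n', 'second\n', 'partial']
second_pass = drain(stream)  # []: already at EOF, so it returns immediately
```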
Maybe it would be useful if you posted how you were writing data to stdin.
EDIT:
Python will (for the purposes of for loops, iterators and readlines()) consider a stream closed when it reaches end of file (EOF). You can ask Python to read more data after this, but you cannot use any of the previous methods. The Python man page recommends using:
import sys
while True:
    line = sys.stdin.readline()
    # do something with line
When EOF is reached, readline() returns an empty string. The next call to readline() will work as normal if the stream is still open. You can test this yourself by running the script in a terminal: pressing Ctrl+D at the start of a line makes the terminal signal EOF to the program reading stdin. This will cause the first program in this post to terminate, but the last program will keep reading data until the stream is actually closed. While the stream is open, the last program should not max out your CPU, because readline() waits until there is data to return rather than returning an empty string.
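One detail worth spelling out: readline() returns '\n' for a blank line and '' only at EOF, so the two cases can be told apart. A minimal sketch, using io.StringIO as a stand-in stream (an assumption for illustration; `read_until_eof` is a hypothetical name). Unlike a terminal after Ctrl+D, an exhausted StringIO keeps returning ''.

```python
import io


def read_until_eof(stream):
    """Collect lines with readline() until it returns '' (EOF).

    A blank line comes back as a lone newline, which is truthy, so it is
    kept; only the empty string signals end of file.
    """
    lines = []
    while True:
        line = stream.readline()
        if not line:  # '' means EOF; a blank line would be '\n'
            break
        lines.append(line)
    return lines


stream = io.StringIO("one\n\nthree\n")
collected = read_until_eof(stream)  # ['one\n', '\n', 'three\n']
after_eof = stream.readline()       # '': an exhausted StringIO stays at EOF
```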
I only have the problem of a busy loop when I call readline() on an actual file (at the end of a regular file it returns '' immediately instead of waiting). When reading from stdin, readline() happily blocks.
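Because readline() on a regular file returns '' at the end instead of blocking, a loop that follows a growing file has to back off on its own. A minimal sketch of a `tail -f`-style generator, assuming simple polling is acceptable; `follow` and `poll_interval` are hypothetical names, and the sketch does not handle file rotation or truncation.

```python
import time


def follow(path, poll_interval=0.5):
    """Yield complete lines as they are appended to the file at path.

    At the end of a regular file readline() returns '' at once rather than
    blocking, so we sleep before retrying to avoid a busy loop. A partial
    line (writer has not written the newline yet) is held until it is done.
    """
    with open(path) as f:
        pending = ''
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_interval)  # at EOF: wait, then retry
                continue
            pending += chunk
            if pending.endswith('\n'):
                yield pending
                pending = ''
```

For example, `for line in follow("input-file"): ...` would replace the `tail -f input-file |` pipe with an in-process poll.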
I know I am bringing old stuff back to life, but this seems to be one of the top hits on the topic. The solution Abalus settled on calls time.sleep for a fixed time on every cycle, regardless of whether stdin is actually empty and the program should be idling, or there are many lines waiting to be processed. A small modification makes the program process all messages rapidly and wait only if the queue is actually empty. That way only a line that arrives during the sleep period has to wait; the others are processed without any lag.
This example simply reverses the input lines. If you submit only one line, it responds within a second (or whatever your sleep period is set to), but it can also process something like "ls -l | reverse.py" very quickly. The CPU load for this approach is minimal, even on embedded systems like OpenWRT.
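import sys
import time

while True:
    raw = sys.stdin.readline()
    if raw:
        # A line (possibly blank) arrived: reverse it and move on at once
        sys.stdout.write(raw.rstrip('\n')[::-1] + '\n')
    else:
        # readline() returned '' (nothing to read): flush output, then back off
        sys.stdout.flush()
        time.sleep(1)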
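The same idea, sleeping only when readline() has nothing to return, can be factored into a function so the back-off is easy to check. This is a sketch, not the poster's code: `reverse_lines`, the injectable `sleep` parameter and the `max_idle` cut-off are hypothetical additions that let the loop end in a test; with `max_idle=None` it keeps polling forever like the script above.

```python
import sys
import time


def reverse_lines(inp=None, out=None, poll_interval=1.0,
                  sleep=time.sleep, max_idle=None):
    """Echo each input line reversed; sleep only when no line is waiting.

    Lines that are already available are processed back to back. Only when
    readline() returns '' do we flush output and sleep for poll_interval.
    If max_idle is set, stop after that many consecutive empty polls.
    """
    inp = sys.stdin if inp is None else inp
    out = sys.stdout if out is None else out
    idle = 0
    while max_idle is None or idle < max_idle:
        raw = inp.readline()
        if raw:
            idle = 0  # work arrived: reset the idle counter
            out.write(raw.rstrip('\n')[::-1] + '\n')
        else:
            idle += 1
            out.flush()
            sleep(poll_interval)
```

Calling `reverse_lines()` with no arguments behaves like the script above; passing a fake `sleep` makes it possible to confirm that no sleeping happens while input is still queued.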