Extracting specific text between strings

微笑、不失礼 submitted on 2021-02-11 15:37:30

Question


I am trying to extract specific lines from a .txt file, corresponding to 7 particular devices (0-6), and then operate on that data.

Here is an example:

From a very large file, I extract an event (here 169139), which contains information from 6 of the 7 devices (here devices 1 to 6, because device 0 has no data). For each event, I don't know a priori how many devices will be active: it can be all of them, none, or only some.

=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 1 ---
Pix 9, 66
--- 2 ---
Pix 11, 31
Pix 12, 31
--- 3 ---
Pix 17, 53
Pix 16, 53
Pix 16, 54
--- 4 ---
Pix 44, 64
--- 5 ---
Pix 49, 133
Pix 48, 133
--- 6 ---
Pix 109, 143
Pix 108, 143
Pix 108, 144 
Pix 109, 144 
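
Because the set of active devices changes from event to event, one alternative is to walk an event once and group each Pix line under the most recent "--- N ---" marker. This is a sketch, not the asker's code: the name group_by_device is hypothetical, and it assumes markers look exactly like "--- 1 ---". Absent devices simply never become keys, so the last device needs no special case. The sample is a trimmed copy of the example event above.

```python
def group_by_device(event):
    """Map device number -> list of stripped data lines, for devices present in event.

    Hypothetical helper; assumes markers are exactly '--- N ---' and event
    headers start with '==='.
    """
    devices = {}
    current = None  # None until the first device marker (skips Start/End lines)
    for line in event:
        line = line.strip()
        if line.startswith("---") and line.endswith("---"):
            current = int(line.strip("- "))  # '--- 2 ---' -> 2
            devices[current] = []
        elif line.startswith("==="):
            current = None  # header of an event: stop collecting for the old device
        elif current is not None and line:
            devices[current].append(line)
    return devices


event = """=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 1 ---
Pix 9, 66
--- 2 ---
Pix 11, 31
Pix 12, 31""".splitlines()

by_dev = group_by_device(event)
print(sorted(by_dev))     # [1, 2]
print(by_dev.get(0, []))  # [] (device 0 is inactive)
```

Note that lines are stripped here, so trailing spaces such as the one in "Pix 108, 144 " are removed.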

Events are easy to iterate over: I can select all the lines of an event up to the start of the next one (here, the next line in the .txt file would be === 169140 ===).
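
A minimal sketch of that event splitting, assuming each event begins with a header line of the form "=== <number> ===" and that lines is any iterable of lines (a list or an open file object). The generator name iter_events is hypothetical. It emits the collected block when it reaches the next header and once more at the end of the input, so the last event is not dropped. The sample is a trimmed copy of the example event followed by the next header mentioned above.

```python
def iter_events(lines):
    """Yield (event_id, block) for each '=== N ===' event; block includes the header.

    Hypothetical helper; assumes headers look exactly like '=== 169139 ==='.
    """
    event_id, block = None, []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("===") and stripped.endswith("==="):
            if event_id is not None:
                yield event_id, block  # finish the previous event
            event_id, block = stripped.strip("= "), [stripped]
        elif event_id is not None:
            block.append(line.rstrip("\n"))  # keep trailing spaces, drop newline
    if event_id is not None:
        yield event_id, block  # last event: no following header is needed


sample = """=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 6 ---
Pix 109, 143
=== 169140 ===""".splitlines()

events = list(iter_events(sample))
for event_id, block in events:
    print(event_id, len(block))  # "169139 5", then "169140 1"
```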

I am able to extract information from a particular device using the following code:

def start_stop_plane (list, dev):
    start_reading = [i for i in range(len(list)) if list[i] == "--- " + str(dev) + " ---"][0]
    stop_reading = [i for i in range(len(list)) if list[i] == "--- " + str(int(dev)+1) + " ---"][0]
    return list[start_reading:stop_reading]

Here, list is the first code block above (the full event) as a list of lines. It is produced in a similar way to the function above, with === (the marker between events) in place of ---.

My problem: start_stop_plane works for devices 0 to 5, but for device 6 it crashes because there is no "--- 7 ---" marker to find (there is no int(dev)+1). I tried adding an or to the stop_reading condition so that it would also match an occurrence of ===, but it did not work.

In this case, how can I detect the end of the list and make sure I don't lose any device?
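
One way to fix start_stop_plane, sketched under the assumption that the event is a list of stripped lines like the one above: rather than searching for the marker of device dev+1, which does not exist after the last device (or after any device followed by an inactive one), scan forward to the next line starting with --- or ===, or to the end of the list. The parameter is renamed from list to event so it no longer shadows the built-in, and an absent device returns an empty list instead of raising IndexError. Like the original, the returned slice includes the "--- dev ---" line itself.

```python
def start_stop_plane(event, dev):
    """Return '--- dev ---' and the lines after it, up to the next marker or end of list."""
    marker = f"--- {dev} ---"
    if marker not in event:
        return []  # device inactive in this event
    start = event.index(marker)
    stop = start + 1
    # Advance until the next device ('---') or event ('===') marker, or the end
    while stop < len(event) and not event[stop].startswith(("---", "===")):
        stop += 1
    return event[start:stop]


event = ["=== 169139 ===", "Start: 4.80374e+19", "End:   4.80374e+19",
         "--- 5 ---", "Pix 49, 133", "Pix 48, 133",
         "--- 6 ---", "Pix 109, 143", "Pix 108, 143"]
print(start_stop_plane(event, 6))  # ['--- 6 ---', 'Pix 109, 143', 'Pix 108, 143']
print(start_stop_plane(event, 0))  # []
```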


Answer 1:


You should build the "--- plane ---" marker string and let Python find it for you with basic operations such as in and .index.
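
For example, with the event as a list of lines, a membership test guards against inactive devices, and list.index returns the position of the marker. list.index raises ValueError when the item is missing, which is why the in check comes first. The short list here is just a trimmed copy of the example event.

```python
lines = ["=== 169139 ===", "--- 5 ---", "Pix 49, 133", "Pix 48, 133"]

dev = 5
marker = f"--- {dev} ---"
print(marker in lines)       # True
print(lines.index(marker))   # 1
print("--- 0 ---" in lines)  # False: device 0 has no data

try:
    lines.index("--- 0 ---")
except ValueError:
    print("no marker for device 0")  # index() raises when the item is missing
```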

To get the subset of data lines up to the next marker, you could use takewhile from itertools:

data="""=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 1 ---
Pix 9, 66
--- 2 ---
Pix 11, 31
Pix 12, 31
--- 3 ---
Pix 17, 53
Pix 16, 53
Pix 16, 54
--- 4 ---
Pix 44, 64
--- 5 ---
Pix 49, 133
Pix 48, 133
--- 6 ---
Pix 109, 143
Pix 108, 143
Pix 108, 144 
Pix 109, 144""".split("\n")

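from itertools import takewhile

def planeData(data, plane):
    marker = f"--- {plane} ---"
    if marker not in data:
        return []  # device not present in this event
    start = data.index(marker) + 1  # first line after the marker
    # Collect lines until the next '---' marker (or the end of the list)
    return list(takewhile(lambda d: not d.startswith("---"), data[start:]))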

output:

for line in planeData(data,0): print(line)
# nothing printed

for line in planeData(data,5): print(line)
# Pix 49, 133
# Pix 48, 133

for line in planeData(data,6): print(line)
# Pix 109, 143
# Pix 108, 143
# Pix 108, 144 
# Pix 109, 144
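
planeData only stops at lines that start with ---. If data held several events (for example, a whole file split into lines), the last active device of one event would run on into the next event's === header and its Start/End lines. A minimal variant, assuming event headers start with ===, passes a tuple of prefixes to str.startswith. The short list below is a trimmed copy of the example with the next event header from the question appended.

```python
from itertools import takewhile

def planeData(data, plane):
    """Lines after '--- plane ---', up to the next device or event marker."""
    marker = f"--- {plane} ---"
    if marker not in data:
        return []
    start = data.index(marker) + 1
    # Stop at either kind of marker: '---' (next device) or '===' (next event)
    return list(takewhile(lambda d: not d.startswith(("---", "===")), data[start:]))


data = ["--- 6 ---", "Pix 109, 143", "Pix 108, 143", "=== 169140 ==="]
print(planeData(data, 6))  # ['Pix 109, 143', 'Pix 108, 143']
```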



Answer 2:


You could use the string index method (str.index).

Code

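def start_stop_dev(lst, dev):
    """Return the block for device dev, marker line included.

    Assume you meant dev rather than plane. Note that lst is one string
    here (the test below does not split it), so str.index returns
    character offsets.
    """
    try:
        start_reading = lst.index("--- " + str(dev) + " ---")
    except ValueError:
        return ""  # No device: its marker is not in this event

    try:
        # Stop before the newline that precedes the next device's marker
        stop_reading = lst.index("--- " + str(dev + 1) + " ---") - 1
    except ValueError:
        # No marker for dev+1 (e.g. the last device): read to the end
        stop_reading = len(lst)

    return lst[start_reading:stop_reading]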

Test

lst= """=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 1 ---
Pix 9, 66
--- 2 ---
Pix 11, 31
Pix 12, 31
--- 3 ---
Pix 17, 53
Pix 16, 53
Pix 16, 54
--- 4 ---
Pix 44, 64
--- 5 ---
Pix 49, 133
Pix 48, 133
--- 6 ---
Pix 109, 143
Pix 108, 143
Pix 108, 144 
Pix 109, 144"""

# Retrieve and print data for each device
print('----------------Individual Device String Info-------------')
for dev in range(7):
  print(f'device {dev}\n{start_stop_dev(lst, dev)}')

print('----------------Splits of String Info----------------------')
for dev in range(7):
  dev_lst = start_stop_dev(lst,dev).split("\n")
  print(f'dev {dev}: {dev_lst}')

Output

----------------Individual Device String Info-------------

device 0

device 1
--- 1 ---
Pix 9, 66
device 2
--- 2 ---
Pix 11, 31
Pix 12, 31
device 3
--- 3 ---
Pix 17, 53
Pix 16, 53
Pix 16, 54
device 4
--- 4 ---
Pix 44, 64
device 5
--- 5 ---
Pix 49, 133
Pix 48, 133
device 6
--- 6 ---
Pix 109, 143
Pix 108, 143
Pix 108, 144 
Pix 109, 144
----------------Splits of String Info----------------------
dev 0: ['']
dev 1: ['--- 1 ---', 'Pix 9, 66']
dev 2: ['--- 2 ---', 'Pix 11, 31', 'Pix 12, 31']
dev 3: ['--- 3 ---', 'Pix 17, 53', 'Pix 16, 53', 'Pix 16, 54']
dev 4: ['--- 4 ---', 'Pix 44, 64']
dev 5: ['--- 5 ---', 'Pix 49, 133', 'Pix 48, 133']
dev 6: ['--- 6 ---', 'Pix 109, 143', 'Pix 108, 143', 'Pix 108, 144 ', 'Pix 109, 144']
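
start_stop_dev still looks for the marker of device dev+1, so if an intermediate device were inactive (say device 3 had no data), the block for device 2 would run on through devices 4 to 6. A minimal string-based sketch that avoids this, assuming each marker begins its own line: search for the next newline followed by --- or === with str.find, which returns -1 instead of raising. The name dev_block is hypothetical, and the test string is the example event with devices 1, 3, 5 and 6 removed.

```python
def dev_block(text, dev):
    """Return the '--- dev ---' block of text (one string), or '' if absent."""
    marker = f"--- {dev} ---"
    start = text.find(marker)
    if start == -1:
        return ""  # no device
    # Positions of the next device marker and the next event marker, if any
    ends = [i for i in (text.find("\n---", start + 1), text.find("\n===", start + 1)) if i != -1]
    stop = min(ends) if ends else len(text)  # last block: read to the end
    return text[start:stop]


text = """=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 2 ---
Pix 11, 31
Pix 12, 31
--- 4 ---
Pix 44, 64"""

for dev in range(7):
    # device 2 now stops at '--- 4 ---' even though device 3 is missing
    print(f"dev {dev}: {dev_block(text, dev).split(chr(10))}")
```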


Source: https://stackoverflow.com/questions/61428166/extracting-specific-text-between-strings
