Parsing a text file into a list in Python

Question

I'm completely new to Python, and I'm trying to read in a txt file that contains a combination of words and numbers. I can read in the txt file just fine, but I'm struggling to get the string into a format I can work with.

import matplotlib.pyplot as plt
import numpy as np
from numpy import loadtxt

f= open("/Users/Jennifer/Desktop/test.txt", "r")

lines=f.readlines()

Data = []

list=lines[3]
i=4
while i<12:
        list=list.append(line[i])
        i=i+1

print list

f.close()

I want a list that contains all the elements in lines 3-12 (starting from 0), which is all numbers. When I do print lines[1], I get the data from that line. When I do print lines, or print lines[3:12], I get each character preceded by \x00. For example, the word "Plate" becomes: ['\x00P\x00l\x00a\x00t\x00e. Using lines = [line.strip() for line in f] gets the same result. When I try to put individual lines together in the while loop above, I get the error "AttributeError: 'str' object has no attribute 'append'."

How can I get a selection of lines from a txt file into a list? Thank you so much!!!

Edit: The txt file looks like this:

BLOCKS= 1 Plate: Phosphate Noisiness Assay 2000x 1.3 PlateFormat Endpoint Absorbance Raw FALSE 1 1 650 1 12 96 1 8
Temperature(¡C) 1 2 3 4 5 6 7 8 9 10 11 12
21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227
0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197
0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451
0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235
0.4474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191
0.4721 0.4794 0.501 0.4467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273
0.4122 0.4454 0.314 0.2747 0.4621 0.4416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038
0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062

~End Original Filename: 2013-08-06 Phosphate Noisiness; Date Last Saved: 8/6/2013 7:00:55 PM

Update: I used this code:

f= open("/Users/Jennifer/Desktop/test.txt", "r")
file_list = f.readlines()

first_twelve = file_list[3:11]

data = [x.replace('\t',' ') for x in first_twelve]
data = [x.replace('\x00','') for x in data]
data = [x.replace(' \r\n','') for x in data]

print data

to get this result: [' 21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227 ', ' 0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197 ', ' 0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451 ', ' 0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235 ', ' 0.4474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191 ', ' 0.4721 0.4794 0.501 0.4467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273 ', ' 0.4122 0.4454 0.314 0.2747 0.4621 0.4416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038 ', ' 0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062 ']

Which is (correct me if I'm wrong, very new to Python!) a list of lists, which I should be able to work with. Thank you so much to everyone who responded!!!

Answer 1:

When you write lines = f.readlines(), a list of lines is returned to you. lines[3] is then a single element of that list (the fourth line, since indexing starts at 0), and that element is a string, not a list. That's why you're ending up with individual characters.

All you need to do is say

files = open("Your File.txt")

file_list =  files.readlines()

first_twelve = file_list[0:12] #returns a list with the first 12 lines

Once you've got the first_twelve list, you can do whatever you want with it.

To print each line you would do:

for each_line in first_twelve:
    print each_line

That should work for you.
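
To make the difference between indexing and slicing concrete, here is a minimal sketch. It uses Python 3 syntax (the code in this thread is Python 2), and the sample lines are made up to stand in for what readlines() would return:

```python
# Hypothetical stand-in for the result of readlines(): 15 short lines.
file_list = ["Line %d\n" % n for n in range(15)]

single = file_list[3]      # indexing returns ONE element: a str
subset = file_list[3:12]   # slicing returns a NEW list with elements 3..11

print(type(single).__name__, repr(single))   # str 'Line 3\n'
print(type(subset).__name__, len(subset))    # list 9
```

The end index of a slice is exclusive, so [3:12] covers nine lines, not ten.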

Answer 2:

You have the line list=lines[3] in your source code.

Two issues here.

1. Don't use list as a variable name. Doing so silently shadows the built-in list constructor.
2. When you take one item from a list with lines[3], you have only that object, in this case a string. You can't append to it because it isn't a list.

You can demonstrate your bug easily in the console:

>>> li=['1']
>>> li.append('2')
>>> li
['1', '2']
>>> st='1'
>>> st.append('2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'append'
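
The first issue, shadowing the built-in, can be reproduced as well. This is a small sketch in Python 3 syntax; the function name shadowing_demo is made up, and the rebinding is kept inside a function so the real built-in stays intact elsewhere:

```python
def shadowing_demo():
    # Rebinding the name `list` hides the built-in inside this function.
    list = "21.4 0.4977"          # `list` is now a str, not the constructor
    try:
        list("abc")               # would normally build ['a', 'b', 'c']
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)
```

Choosing a different name, such as rows or selected, avoids the problem entirely.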

Other comments, in general, on your code.

Assume you have a text file called '/tmp/test.txt' that contains this text:

Line 1
Line 2
...
Line 19

Reading the contents of that file is as simple as this:

with open('/tmp/test.txt', 'r') as fin:
    lines=fin.readlines()

If you want a subset of the lines, you can use a slice:

subset=lines[3:12]

If you want to process each line for something, like strip the carriage return, use the file object as an iterator:

with open('/tmp/test.txt', 'r') as fin:
    lines=[]
    for line in fin:
        lines.append(line.strip())

For your specific problem of having NULs in the data, perhaps you are reading a binary file masquerading as text? You need to post an example of the file.
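
A \x00 next to every character is typical of text saved as UTF-16 and then read as single bytes. That encoding is only an assumption about the file in the question, but if it holds, decoding on read removes the NULs without any replace() calls. The sketch below (Python 3 syntax) writes a made-up two-line sample to a temporary file so that it is self-contained:

```python
import os
import tempfile

# Hypothetical sample saved as UTF-16; the encoding of the real file
# is an assumption, not something confirmed in the question.
sample = "Plate\t1\r\n0.4977\t0.5074\r\n"
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as fout:
    fout.write(sample.encode("utf-16"))

# The raw bytes contain the NULs that showed up in the question.
with open(path, "rb") as fin:
    raw = fin.read()
print(b"\x00" in raw)      # True

# Opening with the matching codec yields clean text lines.
with open(path, "r", encoding="utf-16") as fin:
    lines = [line.strip() for line in fin]
print(lines)
```

In Python 2, io.open accepts the same encoding argument.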

Edit

Your file contains Unicode characters (right after 'Temperature'), which may be some of the odd characters you are seeing. If you are only interested in the lines with numbers, you can ignore them.

You do not YET have a list of lists, but it is easy to get:

data=[]                               # will hold the lines of the file
with open(ur_file,'rU') as fin:       
    for line in fin:                  # for each line of the file
        line=line.strip()             # remove CR/LF
        if line:                      # skip blank lines
            data.append(line)

print data                            # list of STRINGS separated by spaces
matrix=[map(float,line.split()) for line in data[3:10]]  # convert the strings..
print matrix                          # NOW you have a list of list of floats...
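
Note that in Python 3, map returns a lazy iterator rather than a list, so the comprehension above would give a list of map objects. A sketch of an equivalent conversion for Python 3, using two rows from the question's sample data truncated to three values each:

```python
# Two rows taken from the question's sample data, truncated to three
# values each to keep the example short.
data = [
    "0.488 0.4742 0.5011",
    "0.4882 0.4913 0.4941",
]

# An inner list comprehension builds a real list of floats for every row.
matrix = [[float(value) for value in line.split()] for line in data]
print(matrix)   # [[0.488, 0.4742, 0.5011], [0.4882, 0.4913, 0.4941]]
```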

Answer 3:

The tweak below might help you get rid of the \x00 characters embedded in your data:

f = open("/Users/Jennifer/Desktop/test.txt", "r")

lines = f.readlines()
lines = [x.replace('\x00','') for x in lines]

l = []                      # create the list once, outside the loop
for i in range(3,12):
    l.append(lines[i])      # collect lines 3 through 11

I am not sure if your data has other delimiters (say comma or space) to separate the numbers. If so, a simple split will help to convert the line into a list:

line = '123.00,456.00,789.00'

l = line.split(',')  # list will become ['123.00','456.00','789.00']
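
For whitespace-separated data like the rows in the question, calling split() with no argument may be more convenient, since it splits on any run of spaces, tabs, or line endings and drops empty items. A short sketch in Python 3 syntax, with a made-up row in the style of the question's data:

```python
# A row in the style of the question's data, with a tab and repeated spaces.
line = " 21.4\t0.4977  0.5074 \r\n"

by_space = line.split(" ")   # splits on every single space, keeps empties
by_any = line.split()        # splits on any whitespace run, drops empties

print(by_space)
print(by_any)                # ['21.4', '0.4977', '0.5074']

values = [float(item) for item in by_any]
print(values)                # [21.4, 0.4977, 0.5074]
```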

Edit

Continuing from Rachel's updated code:

f= open("/Users/Jennifer/Desktop/test.txt", "r")
file_list = f.readlines()

first_twelve = file_list[3:11]

data = [x.replace('\t',' ') for x in first_twelve]
data = [x.replace('\x00','') for x in data]
data = [x.replace(' \r\n','') for x in data]

items = []
for dataline in data:
    items += dataline.split(' ')
items = [float(x) for x in items if len(x) > 0]  # remove dummy items left in the list

print items

Source: https://stackoverflow.com/questions/18304835/parsing-a-text-file-into-a-list-in-python

Tags: python, string, list, lines