Python Extract data from file


I have a text file, say:

text1 text2 text text
text text text text

I am looking to firstly count the number of strings in the file (all deli

1 Answer

旧巷少年郎

2021-02-15 14:00
To read a file line by line, just loop over the open file object in a for loop:
```
with open(filename) as f:  # closes the file when the block ends
    for line in f:
        pass  # do something with line
```
To split a line by whitespace into a list of separate words, use str.split():
```
words = line.split()
```
To count the number of items in a Python list, use len(yourlist):
```
count = len(words)
```
To select the first two items from a Python list, use slicing:
```
firsttwo = words[:2]
```
I'll leave constructing the complete program to you, but you won't need much more than the above, plus an if statement to see if you already have your two words.
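
Putting those pieces together might look like the sketch below. It assumes the goal is to count every whitespace-delimited string in the file and to keep the first two of them; the function name `count_and_first_two` and the file `sample.txt` are hypothetical and only there for illustration.

```python
import os
import tempfile


def count_and_first_two(filename):
    """Return (total word count, first two words) for a text file."""
    count = 0
    first_two = []
    with open(filename) as f:
        for line in f:
            words = line.split()      # split on any run of whitespace
            count += len(words)
            if len(first_two) < 2:    # still missing one or both words
                first_two.extend(words[:2 - len(first_two)])
    return count, first_two


# Hypothetical sample file with the two lines from the question.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("text1 text2 text text\ntext text text text\n")

total, first_two = count_and_first_two(path)
print(total, first_two)
```

The `if` check is what lets the first two words come from different lines, for example when the first line holds only one word.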

The three extra bytes you see at the start of your file are the UTF-8 BOM (Byte Order Mark). It marks the file as UTF-8 encoded, but UTF-8 has no byte-order ambiguity, so the mark is redundant; it is mostly written by Windows tools.

You can remove it with:
```
import codecs
# line must be a byte string here (on Python 3, open the file in 'rb' mode)
if line.startswith(codecs.BOM_UTF8):
    line = line[len(codecs.BOM_UTF8):]
```
You may also want to decode your byte strings to Unicode using that encoding:
```
line = line.decode('utf-8')
```
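
As a rough end-to-end illustration of the two steps above, the sketch below writes a small file that starts with a BOM, reads it back as bytes, strips the mark and then decodes. It assumes Python 3, where the file has to be opened in binary mode for the lines to be byte strings; the file name `bom_sample.txt` is hypothetical.

```python
import codecs
import os
import tempfile

# Hypothetical sample file that starts with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "bom_sample.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"text1 text2\n")

with open(path, "rb") as f:            # binary mode: lines are bytes
    raw = f.readline()

had_bom = raw.startswith(codecs.BOM_UTF8)
if had_bom:
    raw = raw[len(codecs.BOM_UTF8):]   # drop the three BOM bytes
line = raw.decode("utf-8")             # bytes -> str
words = line.split()
print(had_bom, words)
```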
You could also open the file using codecs.open():
```
f = codecs.open(filename, encoding='utf-8')
```
Note that codecs.open() with encoding='utf-8' will not strip the BOM for you; a simple way to do that is to call .lstrip() on each line:
```
import codecs
BOM = codecs.BOM_UTF8.decode('utf8')
with codecs.open(filename, encoding='utf-8') as f:
    for line in f:
        line = line.lstrip(BOM)
```
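
An alternative that is not part of the answer above: Python also ships a `utf-8-sig` codec, which skips a leading BOM when decoding if one is present. The sketch below assumes Python 3 and the built-in `open()`, and uses a hypothetical `bom_sample.txt` to compare the two encodings.

```python
import codecs
import os
import tempfile

# Hypothetical sample file that starts with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "bom_sample.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"text1 text2\n")

# Plain utf-8 keeps the BOM as the character U+FEFF.
with open(path, encoding="utf-8") as f:
    plain = f.readline()

# utf-8-sig drops the BOM during decoding.
with open(path, encoding="utf-8-sig") as f:
    stripped = f.readline()

print(repr(plain), repr(stripped))
```

With plain `utf-8` the first word would carry the invisible U+FEFF character, which is why string comparisons against it can fail unexpectedly.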