问题
I have a text file of 25GB. so i compressed it to tar.gz and it became 450 MB. now i want to read that file from python and process the text data.for this i referred question . but in my case code doesn't work. the code is as follows :
import tarfile
import numpy as np
tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
f=tar.extractfile(member)
content = f.read()
Data = np.loadtxt(content)
the error is as follows :
Traceback (most recent call last):
File "dataExtPlot.py", line 21, in <module>
content = f.read()
AttributeError: 'NoneType' object has no attribute 'read'
also, Is there any other method to do this task ?
回答1:
The docs tell us that None is returned by extractfile() if the member is a not a regular file or link.
One possible solution is to skip over the None results:
tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
f = tar.extractfile(member)
if f is not None:
content = f.read()
回答2:
tarfile.extractfile() can return None
if the member is neither a file nor a link. For example your tar archive might contain directories or device files. To fix:
import tarfile
import numpy as np
tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
f = tar.extractfile(member)
if f:
content = f.read()
Data = np.loadtxt(content)
回答3:
You may try this one
t = tarfile.open("filename.gz", "r")
for filename in t.getnames():
try:
f = t.extractfile(filename)
Data = f.read()
print filename, ':', Data
except :
print 'ERROR: Did not find %s in tar archive' % filename
回答4:
You cannot "read" the content of some special files such as links yet tar supports them and tarfile will extract them alright. When tarfile
extracts them, it does not return a file-like object but None. And you get an error because your tarball contains such a special file.
One approach is to determine the type of an entry in a tarball you are processing ahead of extracting it: with this information at hand you can decide whether or not you can "read" the file. You can achieve this by calling tarfile.getmembers()
returns tarfile.TarInfo
s that contain detailed information about the type of file contained in the tarball.
The tarfile.TarInfo
class has all the attributes and methods you need to determine the type of tar member such as isfile()
or isdir()
or tinfo.islnk()
or tinfo.issym()
and then accordingly decide what do to with each member (extract or not, etc).
For instance I use these to test the type of file in this patched tarfile to skip extracting special files and process links in a special way:
for tinfo in tar.getmembers():
is_special = not (tinfo.isfile() or tinfo.isdir()
or tinfo.islnk() or tinfo.issym())
...
来源:https://stackoverflow.com/questions/37474767/read-tar-gz-file-in-python