Trying to parse text files in Python for data analysis

I do a lot of data analysis in Perl, and I am trying to replicate this work in Python using pandas, numpy, matplotlib, etc.

The general workflow goes as follows:

3 Answers

鱼传尺愫

2021-01-21 14:18
You are getting the following:
```
NameError: name 'MultiIndex' is not defined
```
because you are not importing MultiIndex directly when you import Series and DataFrame.

You have:
```
from pandas import Series, DataFrame
```
You need:
```
from pandas import Series, DataFrame, MultiIndex
```
Alternatively, since you are importing pandas as pd, you can refer to it as pd.MultiIndex without changing the import.
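
As a minimal sketch of the two options, the snippet below builds the same index both ways. The tuples, level names, and the "value" column are made up for illustration and are not taken from the question.

```python
import pandas as pd
from pandas import DataFrame, MultiIndex

# Hypothetical two-level index: (sample, replicate)
tuples = [("a", 1), ("a", 2), ("b", 1)]

# Option 1: use the name imported directly from pandas
idx_direct = MultiIndex.from_tuples(tuples, names=["sample", "rep"])

# Option 2: reach the same class through the pd namespace
idx_via_pd = pd.MultiIndex.from_tuples(tuples, names=["sample", "rep"])

df = DataFrame({"value": [10, 20, 30]}, index=idx_direct)

# Both names refer to the same class object
same_class = MultiIndex is pd.MultiIndex
```

Either spelling works; the NameError only appears when the bare name is used without importing it.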

甜味超标

2021-01-21 14:31

Hopefully this helps you get started?

```
import os
import sys

def regex_match(line):
    # Stand-in for a real regex: keep only lines that mention LOOPS
    return 'LOOPS' in line

my_hash = {}

for fd in os.listdir(sys.argv[1]):                      # for each file in this directory
    with open(os.path.join(sys.argv[1], fd)) as fh:
        for line in fh:                                 # get each line of the file
            if regex_match(line):                       # if it's a line I want
                fields = line.rstrip('\n').split('\t')  # get the data I want
                my_hash[fields[1]] = float(fields[2])   # store the data (assumes a numeric third column)

for key in my_hash:  # data science can go here; print is just a placeholder
    print(key, my_hash[key] * 12)

# plots
```

P.S. To run the script as an executable, make the first line

#!/usr/bin/python

(or whatever path the command which python returns) and mark the file as executable.
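
If the 'LOOPS' in line test needs to become a real pattern match, the standard re module can take over both the filtering and the field extraction. The sketch below assumes a hypothetical line format (a LOOPS tag, a name, and a number, separated by tabs); the sample lines are invented for the example.

```python
import re

# Assumed line format: LOOPS<TAB>name<TAB>number
pattern = re.compile(r"^LOOPS\t(?P<name>\S+)\t(?P<value>[-+]?\d+(?:\.\d+)?)$")

# Invented sample lines standing in for the contents of a file
lines = ["LOOPS\tinner\t3\n", "SKIP\tother\t9\n", "LOOPS\touter\t1.5\n"]

parsed = {}
for line in lines:
    match = pattern.match(line.rstrip("\n"))
    if match:  # lines that do not fit the pattern are skipped
        parsed[match.group("name")] = float(match.group("value"))
```

Named groups keep the extraction readable when the pattern grows, which is roughly what capture variables do in Perl.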


说谎

2021-01-21 14:41
To glob your files, use the built-in glob module in Python.

To read your CSV files after globbing them, use the read_csv function, which you can import with from pandas.io.parsers import read_csv (it is also available at the top level as pd.read_csv).
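
A minimal sketch of globbing and reading, assuming a directory of CSV files that share the same columns. The file names, column names, and the added "source" column are hypothetical; the example writes two small files to a temporary directory so it can run on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small CSV files so the example is self-contained
# (file names and columns are hypothetical)
data_dir = tempfile.mkdtemp()
for name, rows in [("run1.csv", "loops,time\n1,0.5\n2,0.9\n"),
                   ("run2.csv", "loops,time\n1,0.4\n")]:
    with open(os.path.join(data_dir, name), "w") as fh:
        fh.write(rows)

# glob returns the paths matching the pattern; sort for a stable order
paths = sorted(glob.glob(os.path.join(data_dir, "*.csv")))

# Read each file and tag its rows with the file they came from
frames = []
for path in paths:
    frame = pd.read_csv(path)
    frame["source"] = os.path.basename(path)
    frames.append(frame)

# Stack everything into a single DataFrame
combined = pd.concat(frames, ignore_index=True)
```

Tagging each row with its source file keeps the origin available later, for example as one level of an index.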

As for the MultiIndex feature, once read_csv has given you a DataFrame, you can set a MultiIndex on it to organize your data and slice it any way you want.
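
One way this might look, as a sketch on made-up data: set_index with two columns builds the MultiIndex, and loc or xs then select along either level. The column names and values here are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: one row per (source file, loop count)
df = pd.DataFrame({
    "source": ["run1.csv", "run1.csv", "run2.csv"],
    "loops": [1, 2, 1],
    "time": [0.5, 0.9, 0.4],
})

# Passing two columns to set_index builds a MultiIndex;
# sorting the index keeps label-based slicing well behaved
indexed = df.set_index(["source", "loops"]).sort_index()

# Select every row from one file (outer level)
run1 = indexed.loc["run1.csv"]

# Select the rows with loops == 1 across all files (inner level)
loops_1 = indexed.xs(1, level="loops")
```

Here run1 is indexed by loops alone, and loops_1 is indexed by source alone, because the selected level is dropped from the result.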

Three pertinent links for your reference:
- Understanding MultiIndex dataframes in pandas: "understanding MultiIndex" and "Benefits of panda's multiindex?"
- Using glob in a directory to grab and manipulate your files: "extract values/renaming filename in python"