Trying to parse text files in Python for data analysis

执念已碎 2021-01-21 13:38

I do a lot of data analysis in Perl, and I am trying to replicate this work in Python using pandas, NumPy, matplotlib, etc.

The general workflow goes as follows:

3 Answers
  • 2021-01-21 14:18

    You are getting the following:

    NameError: name 'MultiIndex' is not defined
    

    because your import line brings in Series and DataFrame by name, but not MultiIndex.

    You have:

    from pandas import Series, DataFrame
    

    You need:

    from pandas import Series, DataFrame, MultiIndex
    

    Alternatively, since you are also importing pandas as pd, you can refer to it as pd.MultiIndex without changing the import line.
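
A minimal sketch of both fixes, assuming a reasonably recent pandas. The tuples, level names, and values are made up purely for illustration; the point is that MultiIndex and pd.MultiIndex are the same class, so either spelling works.

```python
import pandas as pd
from pandas import DataFrame, MultiIndex, Series

tuples = [("a", 1), ("a", 2), ("b", 1)]

# Option 1: MultiIndex imported by name, next to Series and DataFrame
idx_named = MultiIndex.from_tuples(tuples, names=["letter", "number"])

# Option 2: the same class reached through the pd namespace
idx_pd = pd.MultiIndex.from_tuples(tuples, names=["letter", "number"])

s = Series([10, 20, 30], index=idx_named)
df = DataFrame({"value": [10, 20, 30]}, index=idx_pd)

print(MultiIndex is pd.MultiIndex)  # True
print(s.loc["a"].tolist())          # [10, 20]
```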

  • 2021-01-21 14:31

    Hopefully this helps you get started:

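    import os
    import re
    import sys
    
    def regex_match(line):
        # True if this is a line I want (adjust the pattern to suit)
        return re.search(r'LOOPS', line) is not None
    
    def do_something(key, value):
        # placeholder for your own analysis; here it just prints
        print(key, value)
    
    my_hash = {}
    data_dir = sys.argv[1]
    
    for fname in os.listdir(data_dir):                     # for each file in this directory
        with open(os.path.join(data_dir, fname)) as fh:    # open it; closed automatically
            for line in fh:                                # get each line of the file
                if regex_match(line):                      # if it's a line I want
                    fields = line.rstrip('\n').split('\t') # keep the result of the split
                    my_hash[fields[1]] = fields[2]         # store 3rd field keyed by 2nd field
    
    for key, value in my_hash.items():  # data science can go here?
        do_something(key, float(value) * 12)  # convert text to a number before arithmetic
    
    # plots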
    

    P.S. To run the script as an executable, make this the first line

    #!/usr/bin/python

    (or whatever path the command "which python" returns) and mark the file executable with chmod +x.
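
If you would rather end up in pandas than in a plain dict, the same loop can collect the matching lines into a DataFrame. This is a sketch under assumptions: the helper name load_matching_rows, the column names file, key and value, and the sample file contents are all hypothetical, and it assumes the third tab-separated field is numeric. Unlike the dict version, it keeps every matching row instead of overwriting duplicate keys.

```python
import os
import re
import tempfile

import pandas as pd


def load_matching_rows(data_dir, pattern=r"LOOPS"):
    """Hypothetical helper: one DataFrame row per matching line in every file of data_dir."""
    rows = []
    for fname in sorted(os.listdir(data_dir)):
        with open(os.path.join(data_dir, fname)) as fh:
            for line in fh:
                if re.search(pattern, line):
                    fields = line.rstrip("\n").split("\t")
                    # assumes the 2nd field is a key and the 3rd field is numeric
                    rows.append({"file": fname, "key": fields[1], "value": float(fields[2])})
    return pd.DataFrame(rows, columns=["file", "key", "value"])


# Usage with a throwaway directory and a made-up tab-separated file
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "run1.txt"), "w") as fh:
        fh.write("LOOPS\tx\t1.5\n")
        fh.write("OTHER\ty\t9\n")
        fh.write("LOOPS\ty\t2\n")
    df = load_matching_rows(tmp)

df["scaled"] = df["value"] * 12  # vectorized: no explicit loop over keys
print(df)
```

From here, df can go straight into groupby, describe, or df.plot for the matplotlib step.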

  • 2021-01-21 14:41

    To glob your files, use the built-in glob module in Python.
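
A minimal glob sketch. The temporary directory and the file names a.csv, b.csv and notes.txt are hypothetical and exist only so the snippet runs anywhere; with real data you would pass your own directory and pattern.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files, created only so the snippet is self-contained
    for name in ["a.csv", "b.csv", "notes.txt"]:
        with open(os.path.join(tmp, name), "w"):
            pass
    # glob returns paths in arbitrary order, so sort them for reproducibility
    csv_paths = sorted(glob.glob(os.path.join(tmp, "*.csv")))
    names = [os.path.basename(p) for p in csv_paths]

print(names)  # ['a.csv', 'b.csv']
```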

    To read your CSV files once glob has given you their paths, use the pandas read_csv function. It is usually called as pd.read_csv, and you can also import it with from pandas.io.parsers import read_csv.
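
One common pattern, sketched here under assumptions, is to glob the files, read each with read_csv, tag every row with its source file, and concatenate the results. The file names run1.csv and run2.csv, their sample and value columns, and the extra run column are hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files; replace tmp with your own directory
    with open(os.path.join(tmp, "run1.csv"), "w") as fh:
        fh.write("sample,value\nA,1\nB,2\n")
    with open(os.path.join(tmp, "run2.csv"), "w") as fh:
        fh.write("sample,value\nA,3\nB,4\n")

    frames = []
    for path in sorted(glob.glob(os.path.join(tmp, "*.csv"))):
        df = pd.read_csv(path)
        # tag each row with the file it came from (file name without extension)
        df["run"] = os.path.splitext(os.path.basename(path))[0]
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Reading everything into one long table like this keeps the source file as an ordinary column, which is convenient to turn into an index level later.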

    Once read_csv has given you a DataFrame, you can give it a MultiIndex (for example by calling set_index with several columns) to organize the data hierarchically and slice it any way you want.
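
A small sketch of that slicing. The run, sample and value columns and their values are hypothetical stand-ins for whatever your files contain; the calls shown (set_index, loc, xs, groupby with level) are standard pandas.

```python
import pandas as pd

# Hypothetical long-format data, one row per (run, sample) measurement
df = pd.DataFrame({
    "run": ["run1", "run1", "run2", "run2"],
    "sample": ["A", "B", "A", "B"],
    "value": [1, 2, 3, 4],
})

# Two-level MultiIndex; sorting it keeps label-based slicing efficient
indexed = df.set_index(["run", "sample"]).sort_index()

run1 = indexed.loc["run1"]                   # every sample from one run
sample_a = indexed.xs("A", level="sample")   # one sample across all runs
per_run_mean = indexed.groupby(level="run")["value"].mean()

print(per_run_mean)
```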

    Three pertinent links for reference:

    • Understanding MultiIndex dataframes in pandas - understanding MultiIndex and Benefits of panda's multiindex?
    • Using glob in a directory to grab and manipulate your files - extract values/renaming filename in python