Creating an ARFF file from Python output

£可爱£侵袭症+ posted on 2019-12-05 04:07:44

There are details on the ARFF file format here and it's very simple to generate. For example, using a cut-down version of your Python dictionary, the following script:

import re

d = { 'gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': 
      {'dail': 1,
       'focus': 1,
       'actions': 1,
       'trade': 2,
       'protest': 1,
       'identify': 1 }}

for original_filename in d:
    # Only keys ending in ".html" are processed; the stem becomes the output name.
    m = re.search(r'^(.*)\.html$', original_filename)
    if not m:
        print("Ignoring the file:", original_filename)
        continue
    output_filename = m.group(1) + '.arff'
    with open(output_filename, "w") as fp:
        fp.write('''@RELATION wordcounts

@ATTRIBUTE word string
@ATTRIBUTE count numeric

@DATA
''')
        # One data row per word: the word, then its count.
        for word, count in d[original_filename].items():
            fp.write("%s,%d\n" % (word, count))

Generates output of the form:

@RELATION wordcounts

@ATTRIBUTE word string
@ATTRIBUTE count numeric

@DATA
dail,1
focus,1
actions,1
trade,2
protest,1
identify,1

... in a file called gardai-plan-crackdown-on-troublemakers-at-protest-2438316.arff. If that's not exactly what you want, I'm sure you can easily alter it. (For example, if the "words" might have spaces or other punctuation in them, you probably want to quote them.)
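
To make the quoting suggestion concrete, here is a minimal sketch of a helper that could wrap each word before it is written. The function name quote_arff_value and the choice of which characters count as "safe" are my own assumptions, not part of the original script. It also assumes that single quotes with backslash escapes are acceptable to whatever reads the file, so check that against your ARFF consumer.

```python
import re


def quote_arff_value(value):
    """Return value unchanged if it looks safe, otherwise wrapped in single quotes."""
    # Assumption: letters, digits, underscore, dot and hyphen are safe unquoted.
    if re.fullmatch(r"[A-Za-z0-9_.\-]+", value):
        return value
    # Escape backslashes first, then single quotes, before wrapping.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def format_row(word, count):
    """Build one @DATA line for a (word, count) pair."""
    return "%s,%d\n" % (quote_arff_value(word), count)
```

In the script above, the fp.write call inside the inner loop would then become fp.write(format_row(word, count)) with the pair unpacked into word and count. Plain words such as trade come out untouched, while something like trade union is written inside single quotes.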

I know it's pretty easy to generate an ARFF file on your own, but I still wanted to make it simpler, so I wrote a Python package:

https://github.com/ubershmekel/arff

It's also on PyPI, so you can install it with easy_install arff

The liac-arff project seems to be a bit more up to date. You can install it via

pip:

$ pip install liac-arff

or easy_install:

$ easy_install liac-arff
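
As a rough sketch of how the word counts from the first answer might be handed to liac-arff: the library works on a plain dictionary with relation, attributes and data keys, and arff.dumps turns that into ARFF text. The helper names to_liac_arff_object and dumps_word_counts are made up for illustration. Note that, as far as I can tell, liac-arff and the arff package mentioned above both install a module named arff, so this assumes only liac-arff is installed.

```python
def to_liac_arff_object(word_counts, relation="wordcounts"):
    """Arrange a {word: count} dict in the dictionary layout liac-arff works with."""
    return {
        "relation": relation,
        "attributes": [("word", "STRING"), ("count", "NUMERIC")],
        # One row per word, in the same column order as the attributes.
        "data": [[word, count] for word, count in word_counts.items()],
    }


def dumps_word_counts(word_counts):
    """Serialise the counts to ARFF text using liac-arff."""
    # Imported here so the helper above stays usable without the package.
    import arff  # the module installed by liac-arff
    return arff.dumps(to_liac_arff_object(word_counts))
```

Calling dumps_word_counts({'dail': 1, 'trade': 2}) should give a string you can write straight to a .arff file, and the library takes care of quoting values that need it.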