Question
Python 2.7, numpy, create levels in the form of a list of factors.
I have a data file that lists independent variables; the last column indicates the class. For example:
2.34,4.23,0.001, ... ,56.44,2.0,"cloudy with a chance of rain"
Using numpy, I read all the numeric columns into a matrix, and the last column into an array which I call "classes". In fact, I don't know the class names in advance, so I do not want to use a dictionary. I also do not want to use Pandas. Here is an example of the problem:
classes = ['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd']
type(classes)
<type 'list'>
classes = numpy.array(classes)
type(classes)
<type 'numpy.ndarray'>
classes
array(['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd'],
dtype='|S1')
# requirements call for a list like this:
# [0, 1, 2, 2, 1, 0, 0, 3]
Note that the target class may be very sparse, for example, a 'z', in perhaps 1 out of 100,000 cases. Also note that the classes may be arbitrary strings of text, for example, scientific names.
I'm using Python 2.7 with numpy, and I'm stuck with my environment. Also, the data has been preprocessed, so it's scaled and all values are valid - I do not want to preprocess the data a second time to extract the unique classes and build a dictionary before I process the data. What I'm really looking for is the Python equivalent of the stringsAsFactors parameter in R, which automatically converts a string vector to a factor vector when the script reads the data.
Don't ask me why I'm using Python instead of R - I do what I'm told.
Thanks, CC.
Answer 1:
You could use np.unique with return_inverse=True to return both the unique class names and a set of corresponding integer indices:
import numpy as np
classes = np.array(['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd'])
classnames, indices = np.unique(classes, return_inverse=True)
print(classnames)
# ['a' 'b' 'c' 'd']
print(indices)
# [0 1 2 2 1 0 0 3]
print(classnames[indices])
# ['a' 'b' 'c' 'c' 'b' 'a' 'a' 'd']
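The class names will be sorted in lexical order, so the integer indices follow that sorted order rather than the order in which the classes first appear in the data.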
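Since the question asks for a plain list, and since lexical ordering may not always be what you want, here is a minimal sketch of two follow-ups. The first converts the indices to a Python list with tolist(). The second is a variant, not part of the original answer, that numbers the classes in order of first appearance by also asking np.unique for return_index. The variable names (shuffled, rank, appearance_levels, appearance_codes) are made up for illustration.

```python
import numpy as np

classes = np.array(['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd'])

# Sorted class names and integer codes, as in the answer above.
levels, codes = np.unique(classes, return_inverse=True)

# The question asks for a plain Python list rather than an ndarray.
codes_list = codes.tolist()
print(codes_list)
# [0, 1, 2, 2, 1, 0, 0, 3]

# Illustrative variant: number the classes in order of first appearance
# instead of lexical order. The input here is a made-up example.
shuffled = np.array(['c', 'a', 'c', 'b', 'a'])
uniq, first_idx, inverse = np.unique(
    shuffled, return_index=True, return_inverse=True)

# order[k] is the position (in the sorted names) of the k-th class to appear.
order = np.argsort(first_idx)

# rank[i] is the new code assigned to sorted class name i.
rank = np.empty_like(order)
rank[order] = np.arange(len(order))

appearance_levels = uniq[order]
appearance_codes = rank[inverse].tolist()
print(appearance_levels.tolist())
# ['c', 'a', 'b']
print(appearance_codes)
# [0, 1, 0, 2, 1]
```

As with classnames[indices] above, indexing appearance_levels with appearance_codes gives back the original strings.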
Source: https://stackoverflow.com/questions/34682420/python-how-to-convert-a-string-array-to-a-factor-list