I\'m getting started on a Tensorflow project, and am in the middle of defining and creating my feature columns. However, I have hundreds and hundreds of features- it\'s a pr
I used your own answer. Just edited a little bit (there should be my_columns
instead of my_column
in for
loop) and posting it the way it worked for me.
import pandas.api.types as ptypes
my_columns = []
for col in df.columns:
if ptypes.is_string_dtype(df[col]): #is_string_dtype is pandas function
my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(col,
hash_bucket_size= len(df[col].unique())))
elif ptypes.is_numeric_dtype(df[col]): #is_numeric_dtype is pandas function
my_columns.append(tf.feature_column.numeric_column(col))
What you have posted in the question makes sense. Small extension based on your own code:
import pandas.api.types as ptypes
my_columns = []
for col in df.columns:
if ptypes.is_string_dtype(df[col]):
my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(col,
hash_bucket_size= len(df[col].unique())))
elif ptypes.is_numeric_dtype(df[col]):
my_columns.append(tf.feature_column.numeric_column(col))
elif ptypes.is_categorical_dtype(df[col]):
my_columns.append(tf.feature_column.categorical_column(col,
hash_bucket_size= len(df[col].unique())))
The above two methods works only if the data is provided in pandas data frame where you have column name for each column. But, in case you have all numeric column and you don't want to name those columns. for e.g. reading several numerical columns from a numpy array, you can use something like this:-
feature_column = [tf.feature_column.numeric_column(key='image',shape=(784,))]
input_fn = tf.estimator.inputs.numpy_input_fn(dict({'image':x_train})
where X_train is your numy array with 784 columns. You can check this post by Vikas Sangwan for more details.