I have this code:
for a in data_X:
    for i in a:
        if not i.isdigit():
            x=hash(i)
            data_X[column,row]=x
        row=row+1
You're trying to use a list comprehension to create a new list, like this:
desired_array = [int(numeric_string) for numeric_string in data_X]
Since data_X is a 2D array, each numeric_string is a 1D array, with one element for each of your columns (at least 7). (The fact that you called it numeric_string doesn't make it a string.) You can't call int on that: int converts a single string or number, not a whole array of them, which is exactly what the error message is complaining about.
If this isn't clear, you should try printing out the values:
for numeric_string in data_X:
    print(numeric_string)
… and it should be pretty clear that numeric_string is not a numeric string.
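To see this concretely, here is a minimal sketch with a small, made-up data_X (a 2D NumPy array of numeric strings; the real data isn't shown, so the values here are hypothetical). Iterating over it yields whole 1D rows, and int() rejects a multi-element row with a TypeError:

```python
import numpy as np

# Hypothetical sample data: a 2D array of numeric strings.
data_X = np.array([["1", "2", "3"],
                   ["4", "5", "6"]])

for numeric_string in data_X:
    # Each iteration yields a whole row (a 1D ndarray), not a single string.
    print(type(numeric_string).__name__, numeric_string.shape, numeric_string)

first_row = data_X[0]
try:
    int(first_row)  # int() needs one string or number, not a 3-element array
    int_failed = False
except TypeError as exc:
    int_failed = True
    print("TypeError:", exc)
```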
You could fix this with a nested loop. If you don't understand comprehensions that well, write it with explicit loop statements first:
desired_array = []
for row in data_X:
    desired_row = []
    for col in row:
        desired_row.append(int(col))
    desired_array.append(desired_row)
… and then you can turn it into a comprehension once you're sure you understand it:
desired_array = [[int(numeric_string) for numeric_string in row] for row in data_X]
However, that still doesn't give you a 2D array of ints; it gives you a list of lists of ints. It's similar, but it's bigger and slower, and you can't call numpy methods on it. (Although you can still pass it to top-level numpy functions like np.sum, at least.)
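A minimal sketch of that difference, again using hypothetical sample data: the comprehension produces a plain list with no array methods, while a top-level function like np.sum still accepts it, and np.array() converts it into a real 2D integer array if you need one.

```python
import numpy as np

data_X = np.array([["1", "2", "3"],
                   ["4", "5", "6"]])  # hypothetical sample data

# The nested comprehension builds a plain Python list of lists of ints.
desired_array = [[int(numeric_string) for numeric_string in row] for row in data_X]

print(type(desired_array).__name__)   # a list, not an ndarray
print(hasattr(desired_array, "sum"))  # lists have no .sum() method

# Top-level NumPy functions accept it and convert it internally.
total = np.sum(desired_array)

# np.array() turns it into a real 2D integer array.
as_array = np.array(desired_array)
print(as_array.shape, as_array.dtype)
```

The exact integer dtype np.array() picks here is platform-dependent, which is one more reason to ask for np.int64 explicitly when it matters.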
If you really wanted to build a 2D array by looping, you could do that too: preallocate an array of the right shape and dtype, then assign each element.
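A minimal sketch of that looping approach, assuming the same hypothetical sample data: np.empty preallocates an int64 array with data_X's shape, and two explicit loops fill it one element at a time.

```python
import numpy as np

data_X = np.array([["1", "2", "3"],
                   ["4", "5", "6"]])  # hypothetical sample data

# Preallocate an int64 array with the same shape as the input...
desired_array = np.empty(data_X.shape, dtype=np.int64)

# ...then fill it element by element with explicit loops.
for i in range(data_X.shape[0]):
    for j in range(data_X.shape[1]):
        desired_array[i, j] = int(data_X[i, j])
```

This gives a genuine ndarray, but every element still passes through the Python interpreter, so it is just as slow as the list version.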
But as always with numpy, what you want to do, if at all possible, is to use vectorized operations instead of loops. It'll be both a lot simpler and a lot faster, with no real downside.
What you probably want is astype:
desired_array = data_X.astype(np.int64)
It's hard to get any simpler than that. And, unless you wanted an array of dtype=object holding Python int values (e.g., because some of your numbers are too big to fit in a native int64), it's exactly what you want.
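A quick sketch of both cases, using made-up sample data since the real data_X isn't shown: astype for ordinary values, and an object array of Python ints for values too big for int64. Building the object array with a nested comprehension is just one way to do it, not the only one.

```python
import numpy as np

# Hypothetical sample data: numeric strings that all fit in int64.
data_X = np.array([["1", "2", "3"],
                   ["4", "5", "6"]])

# One vectorized call converts every element; no Python-level loop.
desired_array = data_X.astype(np.int64)

# astype raises ValueError if any element is not a valid integer literal.
try:
    np.array([["1", "abc"]]).astype(np.int64)
    bad_input_rejected = False
except ValueError:
    bad_input_rejected = True

# Hypothetical data with a value far beyond int64's max (9223372036854775807).
huge_X = np.array([["1", "123456789012345678901234567890"],
                   ["4", "5"]])

# Convert each element to a Python int and store them in an object array,
# so every cell keeps Python's arbitrary-precision int.
object_array = np.array([[int(s) for s in row] for row in huge_X], dtype=object)
big = object_array[0, 1]
```

The trade-off is that arithmetic on an object array dispatches to Python objects element by element, so it is much slower than int64 and only worth it when the values really don't fit.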