I want to substitute missing values (None) with the last previous known value. This is my code. But it doesn't work. Any suggestions for a better algorithm?
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
for line in table:
for value in line:
if value == None:
value = line[line.index(value)-1]
return table
print treat_missing_values(t)
This is probably how I'd do it:
>>> def treat_missing_values(table):
... for line in table:
... prev = None
... for i, value in enumerate(line):
... if value is None:
... line[i] = prev
... else:
... prev = value
... return table
...
>>> treat_missing_values([[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[1, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
>>> treat_missing_values([[None, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[None, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
When you do an assignment in python, you are merely creating a reference on an object in memory. You can't use value to set the object in the list because you're effectively making value reference another object in memory.
To do what you want, you need to set directly in the list at the right index.
As stated, your algorithm won't work if one of the inner lists has None as the first value.
So you can do it like this:
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table, default_value):
last_value = default_value
for line in table:
for index in xrange(len(line)):
if line[index] is None:
line[index] = last_value
else:
last_value = line[index]
return table
print treat_missing_values(t, 0)
That thing about looking up the index from the value won't work if the list start with None or if there's a duplicate value. Try this:
def treat(v):
p = None
r = []
for n in v:
p = p if n == None else n
r.append(p)
return r
def treat_missing_values(table):
return [ treat(v) for v in table ]
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
print treat_missing_values(t)
This better not be your homework, dude.
EDIT A functional version for all you FP fans out there:
def treat(l):
def e(first, remainder):
return [ first ] + ([] if len(remainder) == 0 else e(first if remainder[0] == None else remainder[0], remainder[1:]))
return l if len(l) == 0 else e(l[0], l[1:])
That's because the index
method returns the first occurence of the argument you pass to it. In the first line, for example, line.index(None) will always return 2, because that's the first occurence of None in that list.
Try this instead:
def treat_missing_values(table):
for line in table:
for i in range(len(line)):
if line[i] == None:
if i != 0:
line[i] = line[i - 1]
else:
#This line deals with your other problem: What if your FIRST value is None?
line[i] = 0 #Some default value here
return table
I'd use a global variable to keep track of the most recent valid value. And I'd use map()
for the iteration.
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
prev = 0
def vIfNone(x):
global prev
if x:
prev = x
else:
x = prev
return x
print map( lambda line: map( vIfNone, line ), t )
EDIT: Malvolio, here. Sorry to be writing in your answer, but there were too many mistakes to corrected in a comment.
if x:
will fail for all falsy values (notably 0 and the empty string).- Mutable global values are bad. They aren't thread-safe and produce other peculiar behaviors (in this case, if a list starts with None, it is set to the last value that happened to be processed by your code.
- The re-writing of
x
is unnecessary;prev
always has the right value. - In general, things like this should be wrapped in functions, for naming and for scoping
So:
def treat(n):
prev = [ None ]
def vIfNone(x):
if x is not None:
prev[0] = x
return prev[0]
return map( vIfNone, n )
(Note the weird use of prev as a closed variable. It will be local to each invocation of treat
, and global across all invocations of vIfNone from the same treat
invocation, exactly what you need. For dark and probably disturbing Python reasons I don't understand, it has to be an array.)
EDIT1
# your algorithm won't work if the line start with None
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
for line in table:
for index in range(len(line)):
if line[index] == None:
line[index] = line[index-1]
return table
print treat_missing_values(t)
来源:https://stackoverflow.com/questions/8997279/substituting-missing-values-in-python