Substituting missing values in Python

问题

I want to substitute missing values (None) with the last previous known value. This is my code. But it doesn't work. Any suggestions for a better algorithm?

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
    for line in table:
        for value in line:
            if value == None:
                value = line[line.index(value)-1]
    return table

print treat_missing_values(t)

回答1:

This is probably how I'd do it:

>>> def treat_missing_values(table):
...     for line in table:
...         prev = None
...         for i, value in enumerate(line):
...             if value is None:
...                 line[i] = prev
...             else:
...                 prev = value
...     return table
... 
>>> treat_missing_values([[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[1, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
>>> treat_missing_values([[None, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[None, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]

回答2:

When you do an assignment in python, you are merely creating a reference on an object in memory. You can't use value to set the object in the list because you're effectively making value reference another object in memory.

To do what you want, you need to set directly in the list at the right index.

As stated, your algorithm won't work if one of the inner lists has None as the first value.

So you can do it like this:

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table, default_value):
    last_value = default_value
    for line in table:
        for index in xrange(len(line)):
            if line[index] is None:
                line[index] = last_value
            else:
                last_value = line[index]
    return table

print treat_missing_values(t, 0)

回答3:

That thing about looking up the index from the value won't work if the list start with None or if there's a duplicate value. Try this:

def treat(v):
   p = None
   r = []
   for n in v:
     p = p if n == None else n
     r.append(p)
   return r

def treat_missing_values(table):
   return [ treat(v) for v in table ]

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
print treat_missing_values(t)

This better not be your homework, dude.

EDIT A functional version for all you FP fans out there:

def treat(l):
  def e(first, remainder):
     return [ first ] + ([] if len(remainder) == 0 else e(first if remainder[0] == None else remainder[0], remainder[1:]))
  return l if len(l) == 0 else e(l[0], l[1:])

回答4:

That's because the index method returns the first occurence of the argument you pass to it. In the first line, for example, line.index(None) will always return 2, because that's the first occurence of None in that list.

Try this instead:

    def treat_missing_values(table):
        for line in table:
            for i in range(len(line)):
                if line[i] == None:
                    if i != 0:
                        line[i] = line[i - 1]
                    else:
                        #This line deals with your other problem: What if your FIRST value is None?
                        line[i] = 0 #Some default value here
        return table

回答5:

I'd use a global variable to keep track of the most recent valid value. And I'd use map() for the iteration.

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]

prev = 0
def vIfNone(x):
    global prev
    if x:
       prev = x
    else:
       x = prev
    return x

print map( lambda line: map( vIfNone, line ), t )

EDIT: Malvolio, here. Sorry to be writing in your answer, but there were too many mistakes to corrected in a comment.

if x: will fail for all falsy values (notably 0 and the empty string).
Mutable global values are bad. They aren't thread-safe and produce other peculiar behaviors (in this case, if a list starts with None, it is set to the last value that happened to be processed by your code.
The re-writing of x is unnecessary; prev always has the right value.
In general, things like this should be wrapped in functions, for naming and for scoping

So:

def treat(n):
    prev = [ None ]
    def vIfNone(x):
        if x is not None:
           prev[0] = x
        return prev[0]
    return map( vIfNone, n )

(Note the weird use of prev as a closed variable. It will be local to each invocation of treat, and global across all invocations of vIfNone from the same treat invocation, exactly what you need. For dark and probably disturbing Python reasons I don't understand, it has to be an array.)

回答6:

EDIT1

# your algorithm won't work if the line start with None
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
    for line in table:
        for index in range(len(line)):
            if line[index] == None:
                line[index] = line[index-1]
    return table

print treat_missing_values(t)

来源：https://stackoverflow.com/questions/8997279/substituting-missing-values-in-python

标签

python

missing-data