How to tell if a value changed over dimension(s) in Pandas

Question

Let's say that I have some customer data over some dates and I want to see whether, for example, their address has changed over those dates. Ideally, I'd like to copy the two columns where the changes occurred into a new table, or just get a metric for the total number of changes.

So, if I had a table like

Date, Customer, Address
12/31/14, Cust1, 12 Rocky Hill Rd
12/31/15, Cust1, 12 Rocky Hill Rd
12/31/16, Cust1, 14 Rocky Hill Rd
12/31/14, Cust2, 12 Testing Rd
12/31/15, Cust2, 12 Testing Ln
12/31/16, Cust2, 12 Testing Rd

I would end up with a count of two changes: Cust1's change from 12 Rocky Hill Rd to 14 Rocky Hill Rd between 12/31/15 and 12/31/16, and Cust2's change between 12/31/14 and 12/31/15.

Ideally I could get a table like this

Dates, Customer, Change
12/31/15 to 12/31/16, Cust1, 12 Rocky Hill Rd to 14 Rocky Hill Rd
12/31/14 to 12/31/15, Cust2, 12 Testing Rd to 12 Testing Ln

Or even just a total count of changes would be great. Any ideas? Ideally, I'd have many more dates, possibly multiple changes across those dates, and potentially additional columns that I'd like to check for changes as well. Really, just a summation of changes to a customer record over some date period for each column would suffice.

I'm new to pandas and not really sure where to start on this.

Edit: As I note on the solution below, I'd like to be able to pass a larger dataframe with more than just an address to detect changes. For example, I've accomplished this in R with something like the following:

#How many changes have occurred (unique values - 1)
UniLen <-  function(x){
  x <- length(unique(x))-1
  return(x)
}
#Create a vector of Address Features to check for changes in
Address_Features <- c("AddrLine1", "AddrLine2", "AddrLine3", "CityName", "State", "ZipCodeNum", "County")
#Check for changes in each address 'use this address for description' for each customer
AddressChanges_Detail <- mktData[,c("CustomerNumEID","AddressUniqueRelationDesc",Address_Features)] %>%
  group_by(CustomerNumEID, AddressUniqueRelationDesc) %>%
  summarise_each(funs(UniLen))

#Summarise results (how many changes for each feature)
AddressChanges_Summary <- AddressChanges_Detail[,Address_Features] %>%
  summarise_each(funs(sum))

This allows us to count how many changes occur, but I'm missing the date the change occurred and what the feature was changed from and to. It seems the Python solution you've proposed solves that by using .shift instead of just summarising the unique values per group. Ideally I'd like the best of both worlds :).

Answer 1:

df

Input dataframe:

       Date  Customer           Address
0  12/31/14     Cust1  12 Rocky Hill Rd
1  12/31/15     Cust1  12 Rocky Hill Rd
2  12/31/16     Cust1  14 Rocky Hill Rd
3  12/31/14     Cust2     12 Testing Rd
4  12/31/15     Cust2     12 Testing Ln
5  12/31/16     Cust2     12 Testing Rd
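To reproduce the steps that follow, the input dataframe can be built directly from the sample rows in the question. This is only a convenience sketch: the dates are kept as plain strings because the function below joins them with ' to '.

```python
import pandas as pd

# Sample rows copied from the question. Dates stay as strings here,
# since the answer's function concatenates them into "old to new" labels.
df = pd.DataFrame({
    'Date': ['12/31/14', '12/31/15', '12/31/16',
             '12/31/14', '12/31/15', '12/31/16'],
    'Customer': ['Cust1', 'Cust1', 'Cust1',
                 'Cust2', 'Cust2', 'Cust2'],
    'Address': ['12 Rocky Hill Rd', '12 Rocky Hill Rd', '14 Rocky Hill Rd',
                '12 Testing Rd', '12 Testing Ln', '12 Testing Rd'],
})

print(df)
```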

Address change function:

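import pandas as pd

def changeAdd(x):
    # Keep only the last row of each run of identical consecutive addresses
    x = x[x.Address != x.shift(-1).Address]
    # Pair each kept row with the previous kept row as "old to new"
    # (Date is assumed to hold strings, so '+' concatenates)
    df1 = pd.DataFrame({'Date': x.shift(1).Date + ' to ' + x.Date,
                        'Customer': x.Customer.max(),
                        'Address': x.shift(1).Address + ' to ' + x.Address})
    # The first kept row of each group has no predecessor, so drop it
    return df1[df1.Address.notnull()]


# Apply the function per customer and flatten the result
dm = df.groupby('Customer')\
   .apply(changeAdd)\
   .reset_index(drop=True)[['Date','Customer','Address']]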

dm

Output dataframe:

                   Date  Customer                               Address
0  12/31/15 to 12/31/16     Cust1  12 Rocky Hill Rd to 14 Rocky Hill Rd
1  12/31/14 to 12/31/15     Cust2        12 Testing Rd to 12 Testing Ln
2  12/31/15 to 12/31/16     Cust2        12 Testing Ln to 12 Testing Rd
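The question's edit asks for the same thing across several columns, plus counts. One possible way to generalise the shift idea is sketched below. It is not part of the original answer: the helper name `change_log` and the output column names (`Column`, `DateFrom`, `DateTo`, `OldValue`, `NewValue`) are made up for illustration, and it assumes the date column sorts chronologically (here it is parsed with `pd.to_datetime` first).

```python
import pandas as pd

def change_log(df, key, order, columns):
    """Hypothetical helper: one output row per change in any of `columns`."""
    # Sort so that consecutive rows of a customer are consecutive dates.
    df = df.sort_values([key, order]).reset_index(drop=True)
    # Previous row's date and values within each customer.
    prev = df.groupby(key)[[order] + list(columns)].shift(1)
    pieces = []
    for col in columns:
        # A change needs a previous row and a different value.
        # Caveat: two consecutive missing values also compare as different.
        changed = prev[order].notna() & (df[col] != prev[col])
        pieces.append(df.loc[changed, [key]].assign(
            Column=col,
            DateFrom=prev.loc[changed, order],
            DateTo=df.loc[changed, order],
            OldValue=prev.loc[changed, col],
            NewValue=df.loc[changed, col],
        ))
    return pd.concat(pieces, ignore_index=True)

# Sample data from the question, with dates parsed so they sort correctly.
records = pd.DataFrame({
    'Date': ['12/31/14', '12/31/15', '12/31/16'] * 2,
    'Customer': ['Cust1'] * 3 + ['Cust2'] * 3,
    'Address': ['12 Rocky Hill Rd', '12 Rocky Hill Rd', '14 Rocky Hill Rd',
                '12 Testing Rd', '12 Testing Ln', '12 Testing Rd'],
})
records['Date'] = pd.to_datetime(records['Date'], format='%m/%d/%y')

changes = change_log(records, key='Customer', order='Date', columns=['Address'])
print(changes)

# Total number of changes, and a breakdown per customer and column.
print(len(changes))
print(changes.groupby(['Customer', 'Column']).size())
```

For the sample data this yields three change rows, matching the output above. Counting unique values minus one, as in the R snippet, would report only one change for Cust2, because its address reverts to an earlier value; comparing consecutive rows counts both moves.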

Source: https://stackoverflow.com/questions/42959330/how-to-tell-if-a-value-changed-over-dimensions-in-pandas

Tags

python

pandas

difference

array-difference