Question
This is a follow-up question to "Implementing lists in a for loop in R to produce a table of column names and datatypes from multiple dbfs".
I'm trying to extract the column names and associated datatypes from a number of dbfs and put the results into a table, so I can cross-reference which column names and datatypes appear in which dbfs. The dbfs have different numbers of columns, so I've used rbind and lapply to fill missing values with NULL in the resulting table. The script works to an extent, but the column names are only kept from the first dbf. When new column names appear, the data is added to the table but the columns are named V35, V36, etc. instead of the actual column names.
library(foreign)
files <- list.files("path/", full.names = TRUE, pattern = "\\.dbf$") # List files (pattern is a regex, so the dot is escaped)
#Get column names and datatypes from dbfs and put into list
ColnamesDTList <- list()
for (i in 1:14){
dbfs <- read.dbf(files[i])
ColnamesDT <- lapply(dbfs,class)
ColnamesDTList[[i]] <- ColnamesDT
}
maxLength <- max(lengths(ColnamesDTList)) #Get max length of the lists in ColnamesDTList
#Create a df from the lists in ColnamesDTList, with equal length columns
ColnamesDTDf <- as.data.frame(do.call(rbind, lapply(ColnamesDTList, `length<-`, maxLength)))
#Rename rows
years <- 2005:2018
new.names <- paste0("dbf", years) # paste0() is vectorised, so no loop is needed
row.names(ColnamesDTDf)<-new.names
This produces a table like this:
        cname1  cname2 cname3  V4     V5
dbf2005 factor  factor numeric NULL   NULL
dbf2006 numeric factor NULL    factor numeric
So the columns that first appear in dbf2006 are given the generic "V" plus the column position instead of their actual names. How can I get the table to include the column names from dbf2006?
Answer 1:
I found a much simpler solution using the compare_df_cols() function in the janitor package.
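As a minimal sketch of how that might look, assuming the janitor package is installed: the two small data frames below are hypothetical stand-ins for the output of read.dbf(), and their column names (cname1 to cname5) are invented for illustration. compare_df_cols() returns one row per column name found in any input and one column per data frame holding that column's class, with NA where a data frame lacks the column.

```r
library(janitor)

# Hypothetical stand-ins for two data frames returned by read.dbf()
dbf2005 <- data.frame(cname1 = factor("a"), cname2 = factor("b"), cname3 = 1.5)
dbf2006 <- data.frame(cname1 = 2, cname2 = factor("c"),
                      cname4 = factor("d"), cname5 = 3.1)

# Named arguments become the column headers of the comparison table
colComparison <- compare_df_cols(dbf2005 = dbf2005, dbf2006 = dbf2006)

# One row per column name, one column per dbf, NA where a column is absent
print(colComparison)
```

Note that the result is transposed relative to the table in the question: column names run down the rows and the dbfs run across. According to the janitor documentation the function also accepts a list of data frames, so the 14 files could probably be read into a list with lapply(files, read.dbf) and passed in one go, though that is not tested here.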
Source: https://stackoverflow.com/questions/64589397/how-to-reproduce-all-column-names-when-producing-a-table-to-cross-reference-colu