Question
I am trying to import hundreds of U.S. county .xls files together to form a complete dataset in Stata. The problem is that for every county I have several files for different years, so my list of file names looks like this:
county1-year1970.xls
county1-year1975.xls
county2-year1960.xls
county2-year1990.xls
For each county, I only want the file from the most recent year (which varies across counties).
So far, I have written code that loops through every possible file name and, if the file exists, adds its year to the local macro years (from which I then want the maximum, maxyear):
local years = 0
forvalues i = 1/500 {
    forvalues yr = 1900/2018 {
        capture confirm file county`i'-year`yr'.xls
        if _rc == 0 {
            local years `years' `yr'
        }
    }
    /* [code to extract the max value in `years'] */
    import excel county`i'-year`maxyear'.xls, clear
}
The loop seems to work, but it is still missing code that will extract the maximum value from the local list `years'. I want to use that maximum value to import the Excel sheet.
How can I identify the maximum value in a local macro or is there a simpler way to get what I want?
Answer 1:
As you are looping over years from the earliest possible to the latest possible, all you need to do is keep track of the last valid year:
forval i = 1/500 {
    local maxyear
    forval yr = 1900/2018 {
        capture confirm file county`i'-year`yr'.xls
        if _rc == 0 local maxyear `yr'
    }
    if "`maxyear'" != "" {
        import excel county`i'-year`maxyear'.xls, clear
    }
}
Otherwise put, keeping a record of all the years that were valid, and then finding the maximum over those, is more work than you need to do. (But notice that as you loop over increasing years, the maximum would just be the last item in your list.)
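Should you still want the maximum of a list such as the question's `years', here is a minimal sketch using hypothetical values. When the years were added in increasing order, the last word is the maximum; for a list in arbitrary order, a running maximum with Stata's max() function works instead.

```stata
* hypothetical years found for one county, added in increasing order
local years 1960 1975 1990

* increasing order, so the maximum is simply the last word
local n : word count `years'
local maxyear : word `n' of `years'
display "`maxyear'"

* hypothetical list in arbitrary order: keep a running maximum
* (max() ignores missing values, so starting from . is safe)
local unsorted 1990 1960 1975
local maxyear2 = .
foreach y of local unsorted {
    local maxyear2 = max(`maxyear2', `y')
}
display "`maxyear2'"
```

Both display commands print 1990.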
This answer is close to the question, but @Pearly Spencer's answer is a neater solution in this case.
Answer 2:
The following works for me and is more efficient:
forvalues i = 1 / 2 {
    * the hyphen stops county1 from also matching county10, county11, ...
    local files : dir . files "county`i'-*"
    display "`: word `: word count `files'' of `files''"
}
county1-year1975.xls
county2-year1990.xls
I use the display command here for illustration, but you can use import instead.
The idea here is that if you know how many county prefixes there are (county1, county2, etc.), you can collect the file names for each prefix in a local macro using the macro extended function dir. Then you simply count the words in that macro and take the last one.
Note that in this case the local macro is already sorted alphabetically. More generally, however, you can sort the items in a macro with the macro extended function list sort.
For example:
local files : list sort files
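Putting the dir and list sort steps together with import, a sketch for the question's 500 counties might look like this. It assumes the files sit in the current working directory and are named exactly countyN-yearYYYY.xls. The hyphen in the pattern matters because county1* would also match files for county10, county11 and so on once there are more than nine counties; the word-count check skips counties that have no files.

```stata
forvalues i = 1/500 {
    * the hyphen stops "county1" from also matching county10, county11, ...
    local files : dir . files "county`i'-*.xls"
    local files : list sort files
    local n : word count `files'
    if `n' > 0 {
        * four-digit years sort alphabetically in numeric order,
        * so the last file name carries the most recent year
        local last : word `n' of `files'
        import excel "`last'", clear
    }
}
```

As in the answers, each import excel ..., clear replaces the data in memory, so building the complete dataset still requires saving or appending each sheet.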
The following uses Mata to get around the maximum length limit of Stata's local macros:
forvalues i = 1 / 2 {
    mata: fl = sort(dir(".", "files", "county`i'-*"), 1); st_local("file", fl[rows(fl)])
    display "`file'"
}
This approach will be useful if you have a large number of files, the names of which cannot all fit in a local macro.
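One caveat on the Mata one-liner: if a county has no matching files, dir() returns an empty vector and fl[rows(fl)] then fails. A guarded sketch, assuming the same countyN-yearYYYY.xls naming and a hyphenated pattern so that county1 does not pick up county10 files, passes the number of matches back to Stata before reading the last element:

```stata
forvalues i = 1/500 {
    * collect and sort this county's file names in a Mata string vector
    mata: fl = sort(dir(".", "files", "county`i'-*.xls"), 1)
    * hand the number of matches back to Stata as a local macro
    mata: st_local("nfiles", strofreal(rows(fl)))
    if `nfiles' > 0 {
        * the last element of the sorted vector has the most recent year
        mata: st_local("file", fl[rows(fl)])
        import excel "`file'", clear
    }
}
```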
Answer 3:
May I borrow Nick's code?
forval i = 1/500 {
    foreach yr of numlist 2018(-1)1900 {
        capture confirm file county`i'-year`yr'.xls
        if _rc == 0 {
            import excel county`i'-year`yr'.xls, clear
            continue, break
        }
    }
}
Please let me know if this does not work, as I can't test it on my side. My logic is to start with the largest value of yr, find the first existing file for a county, then break out of the inner loop and move on to the next county.
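The question's goal was a single combined dataset, which the loops here do not yet build. A minimal sketch extending this loop with a tempfile and append, assuming every sheet has variable names in its first row (hence the firstrow option) and compatible variable types across counties; the identifier variables county and year, and the local nsaved, are hypothetical names chosen for illustration and must not clash with existing columns:

```stata
tempfile combined
local nsaved 0
forvalues i = 1/500 {
    foreach yr of numlist 2018(-1)1900 {
        capture confirm file "county`i'-year`yr'.xls"
        if _rc == 0 {
            import excel "county`i'-year`yr'.xls", firstrow clear
            * hypothetical identifiers recording which file each row came from
            generate county = `i'
            generate year = `yr'
            * add the rows collected so far, then save the running total
            if `nsaved' > 0 append using "`combined'"
            save "`combined'", replace
            local ++nsaved
            * newest year found, so move on to the next county
            continue, break
        }
    }
}
use "`combined'", clear
```

If no file is found at all, the tempfile is never saved and the final use fails, which is a quick signal that the working directory or naming assumption is wrong.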
Source: https://stackoverflow.com/questions/58721698/how-can-i-import-specific-files