问题
I have quite a large conflict dataset (71 million observations) with many variables and date (daily).
This is from the GDELT project for which the way the dataset is structured is that for each day, there is a target country and a source country of aggression. Namely, the first of January of 2000, many countries engaged in aggressive behaviour against others or themselves, and this dataset tracks this.
It looks like this:
clear
input long date_01 str18 source_01 str19 target_01 str4 cameocode_01
20000101 "AFG" "AFGGOV" "020"
20000101 "AFG" "AFGGOV" "0841"
20000101 "AFG" "ARE" "036"
20000101 "AFG" "CVL" "043"
20000101 "AFG" "GOV" "010"
20000101 "AFG" "GOV" "043"
20000101 "AFGGOV" "kasUAF" "0353"
20000101 "AFGGOV" "kasUAF" "084"
20000101 "AFG" "IGOUNO" "030"
20000101 "AFG" "IND" "042"
20000101 "AFG" "IND" "043"
end
What I would like to do is to isolate these events per country.
For instance, I would like to create a variable for the US where, for each date, I have all the times that the US was either a target or a source, and their respective cameo code. I have a considerable number of countries but only need a subset of them and I know their names in advance.
As you can see in the example, the first variable is date, which for these cells is always 2000101
but after a couple of hundreds observations it changes to 2000102
, denoting a change in day.
The second variable source_01
is a country attacking another one. In the example, IND
is India, AFG
is Afghanistan and the other codes are other countries.
The third variable target_01
is just the victim of the conflict.
Finally, cameocode_01
is a level of intensity of conflict measured with some algorithm that tracks the news in each language.
What I am after is to create a new (per country) variable that extracts the cameo code of that event if a specific country is involved either as source or target.
For this specific example, below is my desired output for the case of India (code IND
), which is involved in two events on the specific date:
date INDIAcameo
20000101 "042"
20000101 "043"
I have tried this:
replace INDIA cameo=cameocode if "target" ~ "source" ==IND
However, it says type mismatch and I doubt it would give me what I look for anyway.
回答1:
If you know in advance the countries you are interested in, then the following will work:
clear
input long date_01 str18 source_01 str19 target_01 str4 cameocode_01
20000101 "AFG" "AFGGOV" "020"
20000101 "AFG" "IND" "043"
20000101 "AFG" "AFGGOV" "0841"
20000101 "AFG" "ARE" "036"
20000101 "AFG" "CVL" "043"
20000101 "AFG" "GOV" "010"
20000101 "AFG" "GOV" "043"
20000101 "AFGGOV" "kasUAF" "0353"
20000101 "AFGGOV" "kasUAF" "084"
20000101 "AFG" "IGOUNO" "030"
20000102 "AFG" "IND" "042"
end
foreach c in AFG IND ARE {
generate ind_`c' = cameocode_01 if strmatch(source_01, "`c'") | ///
strmatch(target_01, "`c'")
}
Note that I have slightly modified your example for better illustration.
To see the results:
list, sepby(date) abbreviate(15)
+-------------------------------------------------------------------------------+
| date_01 source_01 target_01 cameocode_01 ind_AFG ind_IND ind_ARE |
|-------------------------------------------------------------------------------|
1. | 20000101 AFG AFGGOV 020 020 |
2. | 20000101 AFG IND 043 043 043 |
3. | 20000101 AFG AFGGOV 0841 0841 |
4. | 20000101 AFG ARE 036 036 036 |
5. | 20000101 AFG CVL 043 043 |
6. | 20000101 AFG GOV 010 010 |
7. | 20000101 AFG GOV 043 043 |
8. | 20000101 AFGGOV kasUAF 0353 |
9. | 20000101 AFGGOV kasUAF 084 |
10. | 20000101 AFG IGOUNO 030 030 |
|-------------------------------------------------------------------------------|
11. | 20000102 AFG IND 042 042 042 |
+-------------------------------------------------------------------------------+
or
foreach v of varlist ind* {
sort date `v'
list date `v' if !missing(`v'), sepby(date) abbreviate(15)
}
+--------------------+
| date_01 ind_AFG |
|--------------------|
3. | 20000101 010 |
4. | 20000101 020 |
5. | 20000101 030 |
6. | 20000101 036 |
7. | 20000101 043 |
8. | 20000101 043 |
9. | 20000101 043 |
10. | 20000101 0841 |
|--------------------|
11. | 20000102 042 |
+--------------------+
+--------------------+
| date_01 ind_IND |
|--------------------|
10. | 20000101 043 |
|--------------------|
11. | 20000102 042 |
+--------------------+
+--------------------+
| date_01 ind_ARE |
|--------------------|
10. | 20000101 036 |
+--------------------+
来源:https://stackoverflow.com/questions/55934849/creating-variable-out-of-conditional-values-in-another-one