问题
I got a piece of code on SAS that predicts consumer behavior. So far I did 50 samples with 50 logistic regression by hand, but I'd like to automate this process. Steps are as follows:
- Create a table with all client having value "1"
- Create a table with all client having value "0"
- (below code starts here) Start a loop that:
- Get a sample of 3000 people from client having value "1"
- Get a sample of 3000 people from client having value "0"
- Join those two tables
- Logistic regression which should get as output (ROC value, and Maximum Likelihood Estimates)
- Get a all probabilities in the same file MODELE_RESULTS
You'll find below a piece of the code. Can you advice me on how to loop this logistic regression 50 times please? So far I can't make it work... I'm beginner in SQL
%macro RunReg (DSName, NumVars) ;
%do i=1 %to &NumVars
/* Create a 3000 people sample called TOP_1*/
PROC SURVEYSELECT DATA= TOP_1
OUT= ALEA_1
METHOD=SRS
N=3000;
QUIT;
/* Create a 3000 people sample called TOP_0*/
PROC SURVEYSELECT DATA= TOP_0
OUT= ALEA_0
METHOD=SRS
N=3000;
QUIT;
/*Append both tables */
PROC SQL;
CREATE TABLE BOTH_SAMPLES As
SELECT * FROM TOP_1
OUTER UNION CORR
SELECT * FROM TOP_0;
QUIT;
/* Logistic regression*/
DATA WORK.&DSName noprint
Outset=PE(rename(x&i=Value));
Model Y = x&I;
SET WORK.APPEND_TABLE(IN=__ORIG) WORK.BASE_PREDICT_2;
__FLAG=__ORIG;
__DEP=TOP_CREDIT_HABITAT_2017;
if not __FLAG then TOP_CREDIT_HABITAT_2017=.;
RUN;
PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
SELECT *
FROM WORK.TMP0TempTableAddtnlPredictData
;
QUIT;
TITLE;
TITLE1 "Résultats de la régression logistique";
FOOTNOTE;
FOOTNOTE1 "Généré par le Système SAS (&_SASSERVERNAME, &SYSSCPL) le %TRIM(%QSYSFUNC(DATE(), NLDATE20.)) à %TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";
PROC LOGISTIC DATA=WORK.SORTTempTableSorted
PLOTS(ONLY)=NONE
;
CLASS age_classe (PARAM=EFFECT) Flag_bq_principale (PARAM=EFFECT) flag_univers_detenus (PARAM=EFFECT) csp_1 (PARAM=EFFECT) SGMT_FIDELITE (PARAM=EFFECT) situ_fam_1 (PARAM=EFFECT);
MODEL TOP_CREDIT_HABITAT_2017 (Event = '1')=top_situ_particuliere top_chgt_csp_6M top_produit_monetaire_bloque top_CREDIT top_chgt_contrat_travail_6M top_credit_CONSO top_credit_HABITAT top_produit_monetaire_dispo top_VM_autres top_Sicav top_produit_epargne_logement top_Predica top_ferm_prod_6M top_ouv_prod_6M top_produit_Assurance top_produit_Cartes top_produit_Credit "moy_surface_financière_6M"n moy_surf_financiere_ecart_6M moy_encours_dav_6M moy_encours_dav_ecart_6M moy_monetaire_dispo_6M moy_monetaire_dispo_ecart_6M moy_emprunts_6M moy_emprunts_ecarts_6M moy_sicav_6M moy_sicav_ecart_6M moy_vm_autres_6M moy_vm_autres_ecart_6M moy_predica_6M moy_predica_ecart_6M moy_bgpi_6M moy_bgpi_ecart_6M moy_epargne_logement_6M moy_epargne_logement_ecart_6M "moy.an_mt_flux_cred_norme_B2"n "moy.an_mt_op_cred_ep_a_terme"n "moy.an_mt_op_debit_ep_a_terme"n "moy.an_mt_ope_credit_depot"n "moy.an_mt_ope_credit_ep_a_vue"n "moy.an_mt_ope_debit_depot"n "moy.an_mt_ope_debit_ep_a_vue"n "moy.an_mt_pmts_carte_etr"n "moy.an_mt_remise_chq"n "moy.an_mt_paie_carte"n "moy.an_mt_paie_chq"n "moy.an_nb_paie_carte"n "moy.an_nb_paie_chq"n "moy.an_mt_ret_carte_Aut_bq"n "moy.an_mt_ret_carte_CRCA"n "moy.an_mt_ret_carte_etr"n "moy.an_nb_flux_cred_normeB2"n "moy.an_nb_ope_credit_ep_a_terme"n "moy.an_nb_ope_debit_ep_a_terme"n "moy.an_nb_ope_credit_depot"n "moy.an_nb_ope_credit_ep_a_vue"n "moy.an_nb_ope_debit_depot"n "moy.an_nb_ope_debit_ep_a_vue"n "moy.an_nb_pmts_carte_etr"n "moy.an_nb_remise_chq"n "moy.an_nb_ret_carte_Aut_bq"n "moy.an_nb_ret_carte_CRCA"n "moy.an_nb_ret_carte_etr"n "moy.an_nb_ret_carte"n "moy.an_mt_factu_ttc"n "moy.an_mt_reduc_ttc"n "moy.an_mt_rist_ttc"n "moy.an_mt_mvt_domicilie_mktg"n "moy.an_nb_mvt_M_domicilie_mktg"n top_produit_Epargne top_ouverture_reclam age_classe Flag_bq_principale flag_univers_detenus csp_1 SGMT_FIDELITE situ_fam_1 /
SELECTION=STEPWISE
SLE=0.05
SLS=0.05
INCLUDE=0
LINK=LOGIT
OUTROC=_PROB_
ALPHA=95
EXPEST
PARMLABEL
CORRB
NOPRINT
;
OUTPUT OUT=WORK.PREDLogRegPredictions(LABEL="Statistiques et prédictions de régression logistique pour WORK.APPEND_TABLE" WHERE=(NOT __FLAG))
PREDPROBS=INDIVIDUAL;
RUN;
QUIT;
%end;
%mend;
DATA WORK.PREDLogRegPredictions;
set WORK.PREDLogRegPredictions;
TOP_CREDIT_HABITAT_2017=__DEP;
_FROM_=__DEP;
DROP __DEP;
DROP __FLAG;
RUN ;
QUIT ;
Thank you in advance
回答1:
If you're trying to do a bootstrap algorithm or something similar to that, the seminal paper on the topic is David Cassell's Don't be LOOPy from the 2007 SGF. In broad strokes, this describes the "old" way to do this (involving a loop, where you sample a new sample and then perform an analysis 50 times; and the new way, where you use PROC SURVEYSELECT
with the rep
option.
From the paper, the example:
proc surveyselect data=YourData out=outboot
seed=30459584
method=urs samprate=1 outhits
rep=1000;
run;
This generates a dataset with a Replicate
variable, which you can then use as a by
variable in most analyses. This then performs the analysis separately for each value of the variable, which is presumably what you want. You can use the various options on proc surveyselect
to get the samples you want (sample size/rate, method of sampling, etc.)
If you're trying just to split your dataset up into chunks so you can either do a smaller analysis (as perhaps it might take too long to run the big one) or to do test and validation subsamples, but don't care about how nicely random things are, you can just add a variable in the datastep like so:
data for_regression;
set your_data;
sample_group = mod(_n_,50);
run;
proc sort data=for_regression;
by sample_group;
run;
And then you have 50 groups; you can sort by something random first if you prefer them be more "randomized" and don't think they are now, but PROC SURVEYSELECT
is usually better for that sort of thing ultimately.
来源:https://stackoverflow.com/questions/49976788/how-to-loop-a-logistic-regression-n-number-of-times