Project
1. AML
Labeled data: - downsampling / XGBoost / HQL
Unlabeled data: - Autoencoder
2. CRANE: fix existing features / add new features
3. Branchpiitsstop
- R / R Shiny / XGBoost explainer / SHAP values
4. Spark
- Rewrite in PySpark
- Re-cluster audit reports (LDA)
Differences between HQL and SQL: https://blog.csdn.net/qq_28633249/article/details/77884062
Algorithms used in the projects:
XGBoost (how it works: https://zhuanlan.zhihu.com/p/92229766 / parameter tuning: https://zhuanlan.zhihu.com/p/29649128);
boosting/bagging/stacking https://zhuanlan.zhihu.com/p/41809927; Decision tree; Autoencoder; LDA
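The downsampling step listed under AML for labeled data can be sketched in plain Python. This is only an illustration, not the project's code: the function name `downsample_majority`, the `ratio` parameter, and the toy data are assumptions made for this example. The idea is to randomly drop majority-class rows so that a classifier such as XGBoost is not swamped by the negative class.

```python
import random
from collections import Counter


def downsample_majority(rows, labels, ratio=1.0, seed=0):
    """Randomly drop rows of the larger classes.

    Each non-minority class keeps at most ratio * (minority class size) rows.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_label, minority_count = min(counts.items(), key=lambda kv: kv[1])
    keep_per_class = int(minority_count * ratio)

    kept = []
    for label in counts:
        idx = [i for i, y in enumerate(labels) if y == label]
        if label != minority_label and len(idx) > keep_per_class:
            idx = rng.sample(idx, keep_per_class)  # sample without replacement
        kept.extend(idx)

    kept.sort()  # keep the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (hypothetical): 95 normal rows (label 0) and 5 flagged rows (label 1).
rows = [[i] for i in range(100)]
labels = [1 if i % 20 == 0 else 0 for i in range(100)]

X_bal, y_bal = downsample_majority(rows, labels, ratio=2.0)
print(Counter(y_bal))
```

With `ratio=2.0` the result keeps all 5 minority rows and 10 randomly chosen majority rows. Downsampling should be applied to the training split only, so that validation and test metrics still reflect the real class balance.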
Machine learning algorithms
1. Common algorithms
LR https://zhuanlan.zhihu.com/p/40994642
SVM https://zhuanlan.zhihu.com/p/84796233
GBDT
Decision tree https://blog.csdn.net/sinat_30353259/article/details/80917362 (CART/ID3/C4.5)
random forest
LightGBM https://zhuanlan.zhihu.com/p/99069186
GBDT/DT https://zhuanlan.zhihu.com/p/81368182
https://zhuanlan.zhihu.com/p/34534004
2. Common anomaly detection algorithms
Isolation forest https://zhuanlan.zhihu.com/p/27777266
dbscan https://zhuanlan.zhihu.com/p/88747614
autoencoder https://blog.csdn.net/Jasminexjf/article/details/88720999
3. Common graph concepts https://zhuanlan.zhihu.com/p/28298952
PageRank
authority
hub score
4. Clustering https://zhuanlan.zhihu.com/p/37381630
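As a refresher on the graph concepts in item 3 (PageRank, authority, hub score), here is a small pure-Python sketch. It assumes the graph is given as a dict mapping every node to its list of out-neighbours. The function names, the damping factor of 0.85, and the toy graph are choices made for this example, not taken from the linked articles.

```python
def pagerank(graph, damping=0.85, iters=100):
    """Power iteration for PageRank on {node: [out-neighbours]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank


def hits(graph, iters=100):
    """HITS: returns (hub, authority) scores, L2-normalised."""
    nodes = list(graph)
    hub = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Authority of w = sum of hub scores of the nodes linking to w.
        auth = {v: 0.0 for v in nodes}
        for v in nodes:
            for w in graph[v]:
                auth[w] += hub[v]
        # Hub of v = sum of authority scores of the nodes v links to.
        hub = {v: sum(auth[w] for w in graph[v]) for v in nodes}
        a_norm = sum(x * x for x in auth.values()) ** 0.5 or 1.0
        h_norm = sum(x * x for x in hub.values()) ** 0.5 or 1.0
        auth = {v: x / a_norm for v, x in auth.items()}
        hub = {v: x / h_norm for v, x in hub.items()}
    return hub, auth


# Toy graph (hypothetical): "c" is linked to by three other nodes.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks = pagerank(graph)
hub, auth = hits(graph)
print(max(ranks, key=ranks.get), max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph "c" gets the highest PageRank and authority score because most links point to it, while "a" is the strongest hub because it links to the two best authorities.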
Neural networks
Autoencoder: https://blog.csdn.net/Jasminexjf/article/details/88720999
CNN: https://zhuanlan.zhihu.com/p/44255667
RNN LSTM https://zhuanlan.zhihu.com/p/88892937
Hyperparameters / how to tune them: https://zhuanlan.zhihu.com/p/45091568
Summary of neural network optimization algorithms: https://zhuanlan.zhihu.com/p/89957194
LDA: https://zhuanlan.zhihu.com/p/92229766
Basic sorting algorithms https://blog.csdn.net/weixin_39840982/article/details/100751141
Tree traversal algorithms https://zhuanlan.zhihu.com/p/70720129
Python: https://zhuanlan.zhihu.com/p/54430650
SQL: https://zhuanlan.zhihu.com/p/38354000
PySpark: https://www.jianshu.com/p/7a8fca3838a4
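For the tree traversal entry above, a compact reference sketch of the four standard binary tree traversals may be handy. The `Node` class and the example tree are hypothetical, written only for this illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(node):
    """Root, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node):
    """Left subtree, then root, then right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


def level_order(root):
    """Breadth-first traversal using a queue."""
    out = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        out.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return out


#       4
#      / \
#     2   6
#    / \   \
#   1   3   7
root = Node(4, Node(2, Node(1), Node(3)), Node(6, None, Node(7)))

print(preorder(root))     # [4, 2, 1, 3, 6, 7]
print(inorder(root))      # [1, 2, 3, 4, 6, 7]
print(postorder(root))    # [1, 3, 2, 7, 6, 4]
print(level_order(root))  # [4, 2, 6, 1, 3, 7]
```

Because the example tree is a binary search tree, the in-order traversal returns the values in sorted order.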
General workflow
Requirements/data - build features - feature engineering (PCA / feature selection / creating new features) - data level (downsampling/upsampling) - normalize/scaler - feature selection - train/val/test split - model - metrics (AUC / ROC curve / precision / recall / F1 score) - overfitting/underfitting - explainer
PCA: https://zhuanlan.zhihu.com/p/77151308
ROC curve: https://www.zhihu.com/question/22844912/answer/246037337
SHAP ratio: https://zhuanlan.zhihu.com/p/85791430
Feature selection: https://www.zhihu.com/question/28641663/answer/110165221
Evaluation methods: https://zhuanlan.zhihu.com/p/106649884
https://www.zhihu.com/question/23259302/answer/527513387
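The metrics step of the workflow above (precision / recall / F1 score / AUC) can be worked through by hand on a tiny example. The sketch below uses only the standard library. The labels, scores, and the 0.5 threshold are made up for illustration, and in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as 0.5). O(P * N), fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for 8 samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, f1, auc)  # 0.75 0.75 0.75 0.8125
```

Precision, recall and F1 depend on the chosen threshold, while AUC is computed from the raw scores and summarises the whole ROC curve. That is why the workflow lists them separately.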
Spark MLlib
Source: CSDN
Author: 汪喵行
Link: https://blog.csdn.net/weixin_39840982/article/details/104574457