Sklearn makes a few tweaks in its implementation of the TF-IDF vectorizer, so to replicate its exact results you would need to add the following things to your custom implementation:
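Going by the scikit-learn documentation, the default `TfidfVectorizer` differs from the textbook formula in a few ways: the vocabulary is sorted alphabetically, the IDF is smoothed as `ln((1 + n) / (1 + df)) + 1`, the term frequency is the raw count, and each document vector is L2-normalized. Here is a minimal pure-Python sketch of those steps. The helper names `fit` and `transform` are hypothetical, tokenization is plain whitespace splitting (a simplification of sklearn's default token pattern, which also drops one-character tokens), and the output is dense lists rather than a sparse matrix.

```python
import math
from collections import Counter


def fit(corpus):
    """Learn a sorted vocabulary and smoothed IDF weights from a list of strings."""
    # Assumption: lowercase + whitespace split stands in for sklearn's tokenizer.
    docs = [doc.lower().split() for doc in corpus]

    # Vocabulary indices follow alphabetical order of the unique terms.
    terms = sorted({term for doc in docs for term in doc})
    vocab = {term: index for index, term in enumerate(terms)}

    # Document frequency: number of documents containing each term.
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))

    # Smoothed IDF: add 1 to numerator and denominator, then add 1 to the log.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    return vocab, idf


def transform(corpus, vocab, idf):
    """Return one L2-normalized TF-IDF row (a dense list) per document."""
    rows = []
    for doc in corpus:
        # Raw term counts, ignoring terms that were not seen during fit.
        counts = Counter(t for t in doc.lower().split() if t in vocab)
        row = [0.0] * len(vocab)
        for term, count in counts.items():
            row[vocab[term]] = count * idf[term]

        # L2 normalization; all-zero rows are left untouched.
        norm = math.sqrt(sum(value * value for value in row))
        if norm > 0:
            row = [value / norm for value in row]
        rows.append(row)
    return rows


corpus = [
    "this is the first document",
    "this document is the second document",
]
vocab, idf = fit(corpus)
tfidf = transform(corpus, vocab, idf)
```

If you have scikit-learn installed, you can compare `tfidf` against `TfidfVectorizer().fit_transform(corpus).toarray()`. Any difference on your own data is most likely down to tokenization rather than the weighting.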