Question
I am new to TensorFlow and machine learning. I am having trouble writing TensorFlow code that does text classification similar to what I built with the sklearn libraries. My main problems are vectorising the dataset and feeding the result into the TensorFlow layers.
I remember managing to one-hot encode the labels, but the TensorFlow layer that followed did not accept the resulting array. Please note that I have read most of the answered text classification questions on Stack Overflow, but they are either too specific or deal with complex requirements. My case is narrow and needs only a very basic solution.
It would be a great help if anyone could tell me the steps, or show TensorFlow code, equivalent to my sklearn machine learning algorithm below.
The dataset used is available at: https://www.kaggle.com/virajgala/classifying-text
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Read the CSV dataset and shuffle the rows
df = pd.read_csv('/Classifyimg_text.csv', index_col=False).sample(frac=1)

# Split the dataset into training and test sets (80/20)
train_data, test_data, train_labels, test_labels = train_test_split(df['sentence'], df['label'], test_size=0.2)

# Vectorize with TF-IDF and fit a linear classifier trained with SGD
streamline = Pipeline([('vect', TfidfVectorizer(max_features=int(1e8))),
                       ('clf', SGDClassifier())]).fit(train_data, train_labels)

# Predict the label of a new sentence
Output = streamline.predict(["This is my action to classify the text."])
Answer 1:
This question is a bit broad. Perhaps you can take a look at the tutorial on TensorFlow's website for binary text classification (positive and negative) and try to implement it. If you come across any problems or concepts that need further explanation along the way, search Stack Overflow to see whether someone has already asked a similar question. If not, take the time to write a question that follows the site's guidelines, so that people who are able to answer have all the information they need. I hope this gets you off to a good start, and welcome to Stack Overflow!
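For reference, here is a minimal sketch of what a TensorFlow counterpart to the sklearn pipeline in the question might look like. It assumes TensorFlow 2.6 or later, where `tf.keras.layers.TextVectorization` is available. The toy sentences and label names are made up for illustration; the real data would come from the `sentence` and `label` columns of the CSV. Using integer class ids with `sparse_categorical_crossentropy` sidesteps the one-hot encoding step that caused trouble in the question. The single `Dense` layer is a linear classifier in the spirit of `SGDClassifier`, not an exact equivalent (it uses a softmax cross-entropy loss rather than a hinge loss).

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data; replace with df['sentence'] and df['label'].
texts = [
    "open the door now",
    "close the window please",
    "turn off the light",
    "the sky is blue today",
    "the weather was cold yesterday",
    "it is raining outside",
]
labels = np.array(["command", "command", "command",
                   "statement", "statement", "statement"])

# Map string labels to integer class ids (no one-hot encoding needed).
classes, y = np.unique(labels, return_inverse=True)

# Learn a vocabulary and IDF weights, roughly like TfidfVectorizer.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=20000,
                                               output_mode="tf_idf")
vectorizer.adapt(tf.constant(texts))

# Turn the raw strings into a dense (num_sentences, vocab_size) matrix.
x = vectorizer(tf.constant(texts)).numpy()

# One Dense layer with softmax: a linear classifier over TF-IDF features.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(len(classes), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=30, verbose=0)

# Predict: vectorize the new sentence the same way, then pick the top class.
new_x = vectorizer(tf.constant(["This is my action to classify the text."])).numpy()
probs = model.predict(new_x, verbose=0)
predicted = classes[np.argmax(probs, axis=1)]
print(predicted)
```

The vectorizer is applied outside the model here to keep the sketch simple; new sentences must go through the same adapted `vectorizer` before being passed to `model.predict`.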
Answer 2:
If you want to achieve strong scores, I would use an embedder instead. Natural language is very high-dimensional, and nowadays there are many pretrained architectures available. You simply encode your text into a latent space and then train your model on those features. Once you have numerical feature vectors, it is also much easier to apply resampling techniques.
Personally, I mostly use the LASER embedder from Facebook (see the LASER project page for details). There is an unofficial PyPI package, which works just fine. As a bonus, your model will work on dozens of languages out of the box, which is quite nice.
There is also BERT from Google, but the pretrained model is rather bare, so you need to train it further (fine-tune it) first.
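The "embed first, then train on the features" workflow could be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_embed` is a made-up stand-in that hashes words into a small vector so the example runs without downloading a model, and `train_on_embeddings`, the sentences, and the labels are hypothetical. In practice `embed_fn` would wrap a real pretrained embedder (for example, the unofficial `laserembeddings` package exposes, as far as I know, a `Laser().embed_sentences(sentences, lang="en")` call that returns one vector per sentence).

```python
import zlib

import numpy as np
from sklearn.linear_model import LogisticRegression


def toy_embed(sentences, dim=64):
    """Stand-in embedder (NOT a pretrained model): hashed bag of words."""
    vectors = np.zeros((len(sentences), dim), dtype=np.float32)
    for row, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            # crc32 gives a hash that is stable across Python runs.
            vectors[row, zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    # L2-normalize each row, guarding against empty sentences.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


def train_on_embeddings(embed_fn, sentences, labels):
    """Encode sentences into fixed-size vectors, then fit a classifier."""
    features = embed_fn(sentences)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf


# Hypothetical toy data for illustration only.
sentences = [
    "open the door now",
    "close the window please",
    "the sky is blue today",
    "it is raining outside",
]
labels = ["command", "command", "statement", "statement"]

clf = train_on_embeddings(toy_embed, sentences, labels)
prediction = clf.predict(toy_embed(["This is my action to classify the text."]))
print(prediction)
```

Because the classifier only ever sees fixed-size numeric vectors, swapping the embedder or applying a resampling step to `features` does not require changing the training code.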
Source: https://stackoverflow.com/questions/58856515/how-to-do-text-classification-using-tensorflow