data-cleaning

custom validation error for two fields unique together django

蓝咒 · Submitted on 2020-07-23 06:13:21
Question: I want to write my own validation error for two fields that are unique together.

```python
class MyModel(models.Model):
    name = models.CharField(max_length=20)
    second_field = models.CharField(max_length=10)
    # others

    class Meta:
        unique_together = ('name', 'second_field')
```

and my forms.py:

```python
class MyModelForm(forms.ModelForm):
    class Meta:
        model = MyModel
        fields = '__all__'
        error_messages = {
            # how to write my own validation error whenever
            # `name` and `second_field` are unique together?
        }
```

How do I write my own validation error?
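Django's documented hook for this case is the `NON_FIELD_ERRORS` key inside the ModelForm's `Meta.error_messages`, with a `'unique_together'` entry. A minimal sketch (the model and field names are the ones from the question; the message text is an example):

```python
from django import forms
from django.core.exceptions import NON_FIELD_ERRORS

class MyModelForm(forms.ModelForm):
    class Meta:
        model = MyModel  # the model defined in the question
        fields = '__all__'
        error_messages = {
            NON_FIELD_ERRORS: {
                # Shown when the unique_together constraint on
                # ('name', 'second_field') is violated.
                'unique_together': "A record with this name and second field already exists.",
            }
        }
```

Because `unique_together` is a model-level constraint, the message is raised as a non-field error during form validation rather than being attached to either field.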

removing stop words using spacy

二次信任 · Submitted on 2020-07-05 11:41:05
Question: I am cleaning a column in my data frame, Sumcription, and am trying to do three things: tokenize, lemmatize, and remove stop words.

```python
import spacy
nlp = spacy.load('en_core_web_sm', parser=False, entity=False)
df['Tokens'] = df.Sumcription.apply(lambda x: nlp(x))
spacy_stopwords = spacy.lang.en.stop_words.STOP_WORDS
spacy_stopwords.add('attach')
df['Lema_Token'] = df.Tokens.apply(
    lambda x: " ".join([token.lemma_ for token in x if token not in spacy_stopwords]))
```

However, when I print, for example: df.Lema
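The likely bug (an assumption, since the question is cut off) is that `token not in spacy_stopwords` compares a spaCy `Token` object against a set of strings, so nothing is ever filtered; the fix is to compare `token.text` (or use `token.is_stop`). A minimal stand-in sketch that reproduces the comparison without needing the spaCy model download:

```python
stop_words = {"the", "are", "attach"}

class Token:  # stand-in for spacy.tokens.Token, just for illustration
    def __init__(self, text, lemma):
        self.text, self.lemma_ = text, lemma

doc = [Token("The", "the"), Token("cats", "cat"),
       Token("are", "be"), Token("running", "run")]

# Buggy: Token objects are never members of a set of strings, so nothing is dropped
buggy = [t.lemma_ for t in doc if t not in stop_words]

# Fixed: compare the token's text (lowercased) against the stop-word set
fixed = [t.lemma_ for t in doc if t.text.lower() not in stop_words]
```

With real spaCy the fixed comprehension would read `tok.lemma_ for tok in x if tok.text.lower() not in spacy_stopwords`.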

R - How to delete rows before a certain phrase and after a certain phrase in a single column

谁都会走 · Submitted on 2020-06-28 05:11:37
Question: Hi, I would like to delete rows before a certain phrase and then after the same (almost) phrase which appears later on. Another way to look at it would be: keep only the data between the start and end of a certain section. My data is as follows:

```r
df <- data.frame(
  time = as.factor(c(1,2,3,4,5,6,7,8,9,10,11,12,13)),
  type = c("","","GMT:yyyy-mm-dd_HH:MM:SS_LT:2016-10-18_06:09:53","(K)","","","","(K)","(K)","","(K)","GMT:yyyy-mm-dd_HH:MM:SS_CAM:2016-10-18_06:20:03",""),
  names = c("J","J","J","J
```
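The question is in R, but the underlying logic — find the first and last rows matching the marker phrase, then keep only the rows between them — is language-independent. A pandas sketch of that logic (the `GMT:` prefix and the toy frame are assumptions modeled on the sample above):

```python
import pandas as pd

df = pd.DataFrame({
    "time": range(1, 8),
    "type": ["", "", "GMT:start", "(K)", "", "GMT:end", ""],
})

# Positions of the rows whose `type` carries the marker prefix
marks = df.index[df["type"].str.startswith("GMT:")]

# Keep the section from the first marker row through the last one
section = df.loc[marks[0]:marks[-1]]
```

In R the same shape would be `df[min(idx):max(idx), ]` with `idx <- grep("^GMT:", df$type)`.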

How to remove the index name in pandas dataframe?

浪尽此生 · Submitted on 2020-06-13 11:51:14
Question: In my dataframe, I get a '2' written over my index column's name. When I check the column names it doesn't show up there, but df.columns gives this as output. I don't know how to remove that '2' from my dataset. I have tried removing the index name but it hasn't solved my issue.

```python
df.columns
# Index(['name', 'census 1981', 'census 1998', 'estimate 2000',
#        'calculation 2010', 'annual growth', 'latitude', 'longitude',
#        'parent division', 'name variants'], dtype='object', name=2)
```

I expect
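The stray `2` visible in the output is the name of the columns Index itself (`df.columns.name`), typically left behind when the header was taken from row 2 of a file; clearing that attribute is enough. A minimal sketch (the two-column frame is a stand-in for the census data):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a"], "latitude": [1.0]})
df.columns.name = 2  # reproduce the stray '2' over the index column

# Clear the columns' name attribute
# (df = df.rename_axis(columns=None) is an equivalent non-mutating form)
df.columns.name = None
```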

Pandas | Group by with all the values of the group as comma separated

﹥>﹥吖頭↗ · Submitted on 2020-04-09 19:36:33
Question: As per the application requirements, I need to show all the values in each group in comma-separated format so the admin can take a decision. I am new to Python and not sure how to do it. Sample reproducible data:

```python
import pandas as pd
compnaies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android']
df = pd.DataFrame({'company': compnaies, 'product': products})
```
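One common way to get this (a sketch, not necessarily the answer the thread settled on) is to group by `company` and aggregate the `product` column with `', '.join`:

```python
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android']
df = pd.DataFrame({'company': companies, 'product': products})

# str.join works directly as an aggregator because each group's
# products are strings; within-group order is preserved.
out = df.groupby('company', as_index=False)['product'].agg(', '.join)
```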

R separate words from numbers in string

我只是一个虾纸丫 · Submitted on 2020-03-26 04:53:32
Question: I need to clean up some data strings that have words and numbers, or just numbers. Below is a toy sample:

```r
library(tidyverse)
c("555", "Word 123", "two words 123", "three words here 123") %>%
  sub("(\\w+) (\\d*)", "\\1|\\2", .)
```

The result is this:

```r
[1] "555"                  "Word|123"             "two|words 123"        "three|words here 123"
```

but I want to place the '|' before the last set of numbers, as shown below:

```r
[1] "|555"                 "Word|123"             "two words|123"        "three words here|123"
```

Answer 1: We can use sub to match zero or more spaces (\\s*)
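The same idea translates directly to Python: anchor the pattern on the trailing run of digits and consume the optional space before it (a sketch of the approach the answer describes, not the thread's exact R code):

```python
import re

strings = ["555", "Word 123", "two words 123", "three words here 123"]

# \s* eats the space (if any) before the final digit run; $ anchors at the end,
# so only the *last* set of numbers gets the '|' prefix.
cleaned = [re.sub(r"\s*(\d+)$", r"|\1", s) for s in strings]
```

In R the equivalent pattern would be `sub("\\s*(\\d+)$", "|\\1", x)`.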

How to remove carriage return in a dataframe

喜欢而已 · Submitted on 2020-03-17 11:26:31
Question: I have a dataframe that contains columns named id, country_name, location and total_deaths. While doing the data-cleaning process, I came across a value in a row that has '\r' attached. Once I complete the cleaning process, I store the resulting dataframe in a destination.csv file. Since that row has \r attached, it always creates a new row.

```
id                       29
location    Uttar Pradesh\r
country_name          India
total_deaths             20
```

I want to remove \r. I tried df.replace({'\r': ''}, regex=True). It isn't
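A guess at the likely problem (the question is cut off): `df.replace` returns a new DataFrame rather than modifying in place, so the result must be assigned back. With `regex=True` the dict keys are treated as patterns matched anywhere in the string, which is what lets `\r` be stripped from inside a longer value:

```python
import pandas as pd

df = pd.DataFrame({"id": [29], "location": ["Uttar Pradesh\r"],
                   "country_name": ["India"], "total_deaths": [20]})

# replace() is not in-place: assign the result back
df = df.replace({"\r": ""}, regex=True)
```

Without `regex=True`, a dict replacement only matches whole cell values, so `"Uttar Pradesh\r"` would be left untouched.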

joining on inexact strings in R

≯℡__Kan透↙ · Submitted on 2020-03-05 04:01:50
Question: I am looking to join two tables; however, the data I am joining on does not match exactly. I am joining on NFL player names. Data sets below:

```r
> dput(att75a)
structure(list(rusher_player_name = c("A.Ekeler", "A.Jones", "A.Kamara",
  "A.Mattison", "A.Peterson", "B.Hill"),
  mean_epa = c(-0.110459963350783, 0.0334332018597805, -0.119488111742492,
  -0.155261835310445, -0.123485646124451, -0.0689611296359916),
  success_rate = c(0.357664233576642, 0.40495867768595, 0.401129943502825, 0
```
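The question is in R, but the matching idea carries over to any language: extract the part of the key the two tables share (here, the surname after the initial) and fuzzy-match it against the other table's names before joining. A Python sketch using the standard library's difflib (the full roster names are made up for illustration):

```python
from difflib import get_close_matches

rushers = ["A.Ekeler", "A.Jones", "A.Kamara", "A.Mattison"]
roster = ["Austin Ekeler", "Aaron Jones", "Alvin Kamara", "Alexander Mattison"]

def best_match(abbrev, candidates):
    # "A.Ekeler" -> "Ekeler", then pick the closest full name
    surname = abbrev.split(".")[-1]
    hits = get_close_matches(surname, candidates, n=1, cutoff=0.3)
    return hits[0] if hits else None

# Mapping table that can then drive an exact join
pairs = {name: best_match(name, roster) for name in rushers}
```

In R, packages such as fuzzyjoin's `stringdist_join` implement the same pattern directly on data frames.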