data-processing

Lexicon dictionary for synonym words

别说谁变了你拦得住时间么 submitted on 2019-12-04 09:10:12
There are a few dictionaries available for natural language processing, such as positive- and negative-word dictionaries. Is there any dictionary available that contains a list of synonyms for every dictionary word? For example, synonyms of nice: enjoyable, pleasant, pleasurable, agreeable, delightful, satisfying, gratifying, acceptable, to one's liking, entertaining, amusing, diverting, marvellous, good.

Answer (alvas): Although WordNet is a good resource to start with for finding synonyms, one must note its limitations. Here's an example with the Python API in the NLTK library. Firstly, words have multiple meanings (i.e. senses)
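To make the sense-ambiguity point concrete, here is a minimal sketch using a hypothetical toy lexicon (the structure mirrors WordNet's word → senses → synonym-set layout; real lookups would go through `nltk.corpus.wordnet`, which requires the corpus download):

```python
# Hypothetical mini-lexicon: each word maps to its senses, and each
# sense carries its own synonym set -- mirroring WordNet's structure.
lexicon = {
    "nice": {
        "pleasant": ["enjoyable", "agreeable", "delightful", "satisfying"],
        "precise":  ["fine", "subtle"],   # as in "a nice distinction"
    },
}

def synonyms(word):
    """Union of synonyms across all senses of `word` (sorted, deduplicated)."""
    senses = lexicon.get(word, {})
    return sorted({syn for syns in senses.values() for syn in syns})

print(synonyms("nice"))
```

The point of the two senses is exactly the WordNet caveat: a naive "all synonyms of nice" query mixes "pleasant" words with "precise" words, so sense selection matters before substitution.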

Remove rows from dataframe that contains only 0 or just a single 0

孤街醉人 submitted on 2019-12-04 05:31:34
I am trying to create a function in R that will allow me to filter my data set based on whether a row contains a single column with a zero in it. Furthermore, sometimes I only want to remove rows that are zero in all columns. Also, and this is where it gets fun: not all columns contain numbers, and the number of columns can vary. I have pasted some of my data here with the results I want to obtain. Unfiltered:

    ID GeneName DU145small DU145total PC3small PC3total
    1  MIR22HG  33221.5    1224.55    2156.43  573.315
    2  MIRLET7E 87566.1    7737.99    25039.3  16415.6
    3  MIR612   0          0          530.068  0
    4  MIR218-1 0          0
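The question asks for R, but the two filters ("drop rows with any zero" vs. "drop rows that are zero in every numeric column") are easy to state precisely; as a point of reference, here is a hedged pandas sketch using the sample columns above (selecting numeric columns automatically handles the "not all columns contain numbers" constraint):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "GeneName": ["MIR22HG", "MIRLET7E", "MIR612"],
    "DU145small": [33221.5, 87566.1, 0.0],
    "DU145total": [1224.55, 7737.99, 0.0],
    "PC3small": [2156.43, 25039.3, 530.068],
    "PC3total": [573.315, 16415.6, 0.0],
})

# Only the numeric measurement columns count; ID is numeric but excluded.
num = df.select_dtypes("number").drop(columns=["ID"])

drop_any_zero = df[~(num == 0).any(axis=1)]  # remove rows with at least one 0
drop_all_zero = df[~(num == 0).all(axis=1)]  # remove rows that are 0 everywhere

print(list(drop_any_zero["GeneName"]))
print(list(drop_all_zero["GeneName"]))
```

MIR612 survives the all-zero filter (PC3small is 530.068) but not the any-zero filter; the equivalent R logic would use `rowSums(df[numeric_cols] == 0)` against `0` or `length(numeric_cols)`.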

What is the difference between mini-batch vs real time streaming in practice (not theory)?

只谈情不闲聊 submitted on 2019-12-02 23:31:28
What is the difference between mini-batch and real-time streaming in practice (not theory)? In theory, I understand that mini-batching processes whatever arrives within a given time frame, whereas real-time streaming processes each item as it arrives. But my biggest question is: why not run mini-batches with an epsilon time frame (say one millisecond)? Put differently, I would like to understand why one would be a more effective solution than the other. I recently came across one example where mini-batching (Apache Spark) is used for fraud detection and real-time streaming (Apache Flink) is used for fraud prevention.
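The latency side of the trade-off can be made concrete without any framework: under micro-batching, an event waits until its batch window closes, so the added latency approaches the batch interval in the worst case, while per-event streaming adds (ideally) none. A small sketch with hypothetical arrival times:

```python
import math

# Events arrive at these times (seconds); micro-batch interval = 1.0s.
arrivals = [0.1, 0.4, 0.9, 1.2, 1.7]
interval = 1.0

# Micro-batch: an event is only processed when its window closes,
# i.e. at the next multiple of the interval.
batch_done = [math.ceil(t / interval) * interval for t in arrivals]
batch_latency = [done - t for done, t in zip(batch_done, arrivals)]

# Per-event streaming (ignoring processing cost): latency ~ 0.
stream_latency = [0.0 for _ in arrivals]

print(max(batch_latency))  # worst case approaches the batch interval
```

Shrinking the interval toward "epsilon" shrinks this latency, but each window carries fixed scheduling/coordination overhead, which is one practical reason millisecond-scale micro-batches stop paying off and per-event engines like Flink are used for prevention-style (act-before-commit) workloads.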

Data processing with adding columns dynamically in Python Pandas Dataframe

穿精又带淫゛_ submitted on 2019-12-02 15:39:26
Question: I have the following problem. Let's say this is my CSV:

    id f1 f2 f3
    1  4  5  5
    1  3  1  0
    1  7  4  4
    1  4  3  1
    1  1  4  6
    2  2  6  0
    ..........

So, I have rows which can be grouped by id. I want to create a CSV like the one below as output:

    f1 f2 f3 f1_n f2_n f3_n f1_n_n f2_n_n f3_n_n f1_t f2_t f3_t
    4  5  5  3    1    0    7      4      4      1    4    6

So, I want to be able to choose the number of rows I will grab to convert into columns (always starting from the first row of an id). In this case I grabbed 3 rows. I will also then skip one or
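A sketch of the reshape, reconstructed from the sample output above: the first n rows of a group are flattened with suffixes "", "_n", "_n_n", ..., and a later row supplies the "_t" target columns. The target offset of 4 is an assumption read off the example (1 4 6 is the fifth row of id 1); the excerpt is cut off before the skip logic is fully specified.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2],
    "f1": [4, 3, 7, 4, 1, 2],
    "f2": [5, 1, 4, 4, 4, 6],
    "f3": [5, 0, 4, 3, 6, 0],
})

def widen(g, n=3, target_pos=4):
    """Flatten the first n rows of one id-group into a single wide row,
    appending the row at target_pos (hypothetical offset) as *_t columns."""
    feats = g.drop(columns=["id"]).iloc[:n]
    # suffixes: "", "_n", "_n_n", ... matching the requested header
    suffixes = ["" if i == 0 else "_" + "_".join("n" * i) for i in range(n)]
    parts = [feats.iloc[i].add_suffix(suf) for i, suf in enumerate(suffixes)]
    target = g.drop(columns=["id"]).iloc[target_pos].add_suffix("_t")
    return pd.concat(parts + [target])

row = widen(df[df["id"] == 1])
print(row.tolist())
```

Running this on the id=1 group reproduces the sample output row exactly; a full solution would apply `widen` per group (guarding against groups shorter than the target offset, like id=2 here).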

Model is not learning

巧了我就是萌 submitted on 2019-12-02 07:43:54
Question: I am trying to train a TensorFlow.js model on images coming in from my web cam. Basically, I'm trying to recreate the Pac-Man TensorFlow game. The model isn't converging and is pretty much useless after training. I have a feeling it's how I'm prepping the data.

Grabbing the image from the canvas:

    function takePhoto(label) {
        let canv = document.getElementById("canv")
        let cont = canv.getContext("2d")
        cont.drawImage(video, 0, 0, width, height)
        let data = tf.browser.fromPixels(canv, 3)
        data.toFloat().div(tf.scalar(127)).sub(tf.scalar(1))
        return data
    }

    function addExample(label){
        let data =
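One detail worth noting about the snippet: TensorFlow.js tensor ops such as `div` and `sub` return new tensors rather than modifying in place, so the normalized result is discarded unless reassigned (`data = data.toFloat().div(...)...`). The intended scaling itself, sketched in numpy, maps 8-bit pixels onto [-1, 1]:

```python
import numpy as np

pixels = np.array([0.0, 127.5, 255.0])  # raw 8-bit pixel intensities
normalized = pixels / 127.5 - 1         # maps 0 -> -1, 127.5 -> 0, 255 -> 1
print(normalized)
```

Note that dividing by 127, as the snippet does, gives roughly [-1, 1.008]; 127.5 centers the range exactly, though either way the scaling only takes effect if the returned tensor is actually used.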

how to use pandas filter with IQR?

给你一囗甜甜゛ submitted on 2019-11-30 10:56:54
Is there a built-in way to filter a column by IQR (i.e. keep values between Q1 - 1.5*IQR and Q3 + 1.5*IQR)? Also, any other generalized pandas filtering you can suggest will be appreciated.

Answer: As far as I know, the most compact notation seems to be offered by the query method.

    # Some test data
    np.random.seed(33454)
    df = (
        # A standard distribution
        pd.DataFrame({'nb': np.random.randint(0, 100, 20)})
        # Adding some outliers
        .append(pd.DataFrame({'nb': np.random.randint(100, 200, 2)}))
        # Resetting the index
        .reset_index(drop=True)
    )

    # Computing IQR
    Q1 = df['nb'].quantile(0.25)
    Q3 = df['nb']
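The excerpt is cut off mid-snippet; here is a self-contained version of the same approach. Two caveats: `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is used instead, and the fence multiplier 1.5 follows the question's convention.

```python
import numpy as np
import pandas as pd

np.random.seed(33454)
df = pd.concat([
    pd.DataFrame({'nb': np.random.randint(0, 100, 20)}),   # bulk of the data
    pd.DataFrame({'nb': np.random.randint(100, 200, 2)}),  # two outliers
]).reset_index(drop=True)

q1 = df['nb'].quantile(0.25)
q3 = df['nb'].quantile(0.75)
iqr = q3 - q1

# Keep only rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
filtered = df.query('(@q1 - 1.5 * @iqr) <= nb <= (@q3 + 1.5 * @iqr)')
print(len(df), len(filtered))
```

The `@name` syntax inside `query` refers to local Python variables, which is what keeps the notation compact compared with a boolean-mask expression like `df[df['nb'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]`.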

How to read 4GB file on 32bit system

六眼飞鱼酱① submitted on 2019-11-29 11:44:38
In my case I have various files; let's assume I have a >4GB file with data. I want to read that file line by line and process each line. One of my restrictions is that the software has to run on 32-bit MS Windows, or on 64-bit with a small amount of RAM (min 4GB). You can also assume that processing these lines isn't the bottleneck. In the current solution I read the file with an ifstream and copy each line into a string. Here is a snippet of how it looks:

    std::ifstream file(filename_xml.c_str());
    uintmax_t m_numLines = 0;
    std::string str;
    while (std::getline(file, str)) {
        m_numLines++;
    }

And OK, that's working, but to
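The C++ loop above is already streaming (only one line lives in memory at a time), which is the property that makes >4GB files workable on a 32-bit address space. The same pattern in Python, demonstrated on a small generated file for illustration:

```python
import os
import tempfile

# Write a small sample file; the identical loop works for multi-GB inputs
# because iteration is buffered and holds only one line at a time.
path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "w") as f:
    for i in range(1000):
        f.write(f"line {i}\n")

num_lines = 0
with open(path, "r") as f:
    for line in f:        # buffered, line-by-line: O(1) memory
        num_lines += 1

print(num_lines)
```

What breaks on 32-bit systems is not line-by-line reading but whole-file approaches (`read()` into one buffer, or memory-mapping the entire file), since the process address space simply cannot hold 4GB at once.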

Hibernate out of memory exception while processing large collection of elements

雨燕双飞 submitted on 2019-11-29 07:40:21
I am trying to process a collection of heavyweight elements (images). The size of the collection varies between 8,000 and 50,000 entries. But for some reason, after processing 1800-1900 entries my program crashes with java.lang.OutOfMemoryError: Java heap space. In my understanding, each time I call session.getTransaction().commit() the program should free heap memory, but it looks like that never happens. What am I doing wrong? Here is the code:

    private static void loadImages( LoadStrategy loadStrategy ) throws IOException {
        log.info( "Loading images for: " + loadStrategy.getPageType() );
        Session session =
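For context on the symptom: committing a Hibernate transaction does not empty the session's first-level cache, so every loaded entity typically stays referenced until `session.clear()` (or eviction) is called, which is why memory grows steadily regardless of commits. The general remedy, processing in fixed-size batches and releasing each batch before loading the next, sketched language-neutrally in Python:

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items, so only one
    batch of heavy objects needs to be alive at a time."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

processed = 0
for batch in chunks(range(8000), 500):  # e.g. 8000 images in batches of 500
    processed += len(batch)             # process, flush, then drop the batch
    # in Hibernate terms: session.flush(); session.clear() here

print(processed)
```

The batch size of 500 is an illustrative choice; the essential point is that references from the previous batch (the session cache, in the Hibernate case) are dropped before the next batch is loaded.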