tokenize

Splitting comma separated string in a PL/SQL stored proc

梦想与她 submitted on 2019-11-27 04:04:55
I have a CSV string 100.01,200.02,300.03 which I need to pass to a PL/SQL stored procedure in Oracle. Inside the proc, I need to insert these values into a NUMBER column in a table. For this, I got a working approach from here: How to best split csv strings in oracle 9i [2) Using SQL's connect by level.]. Now I have another requirement: I need to pass two CSV strings [equal in length] as input to the PL/SQL stored proc, break them up, and insert each value from the two CSV strings into two different columns in the table. Could you please let me know how to go about it? Example of CSV inputs
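
A minimal sketch of one way this could look, reusing the CONNECT BY LEVEL trick from the linked answer to index both strings positionally. The target table amounts_tab(col1, col2) and the second input string are made up for illustration, and REGEXP_SUBSTR assumes Oracle 10g or later:

```sql
-- Hypothetical sketch only: amounts_tab(col1, col2) and the literals are invented.
DECLARE
  p_csv1 VARCHAR2(4000) := '100.01,200.02,300.03';
  p_csv2 VARCHAR2(4000) := '400.04,500.05,600.06';
BEGIN
  INSERT INTO amounts_tab (col1, col2)
  SELECT TO_NUMBER(REGEXP_SUBSTR(p_csv1, '[^,]+', 1, LEVEL)),  -- nth value of string 1
         TO_NUMBER(REGEXP_SUBSTR(p_csv2, '[^,]+', 1, LEVEL))   -- nth value of string 2
    FROM dual
  CONNECT BY LEVEL <= LENGTH(p_csv1) - LENGTH(REPLACE(p_csv1, ',')) + 1;
END;
/
```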

How do you parse a filename in bash?

天涯浪子 submitted on 2019-11-27 02:32:02
Question: I have a filename in a format like: system-source-yyyymmdd.dat I'd like to be able to parse out the different bits of the filename using "-" as a delimiter. Answer 1: You can use the cut command to get at each of the 3 'fields', e.g.: $ echo "system-source-yyyymmdd.dat" | cut -d'-' -f2 source "-d" specifies the delimiter, "-f" specifies the number of the field you require. Answer 2: A nice and elegant way (in my mind :-)), using only built-ins, is to put it into an array: var='system-source-yyyymmdd.dat'
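
For the array variant from Answer 2, a small sketch using only bash built-ins (IFS plus read -a) might look like this; stripping the .dat extension first is my own addition:

```bash
#!/usr/bin/env bash
var='system-source-yyyymmdd.dat'

# Split on '-' into an array after removing the extension.
IFS='-' read -r -a parts <<< "${var%.dat}"

echo "system: ${parts[0]}"
echo "source: ${parts[1]}"
echo "date:   ${parts[2]}"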

Tokenizing Error: java.util.regex.PatternSyntaxException, dangling metacharacter '*'

社会主义新天地 submitted on 2019-11-27 01:23:23
I am using split() to tokenize a String separated with * following this format: name*lastName*ID*school*age % name*lastName*ID*school*age % name*lastName*ID*school*age I'm reading this from a file named "entrada.al" using this code: static void leer() { try { String ruta="entrada.al"; File myFile = new File (ruta); FileReader fileReader = new FileReader(myFile); BufferedReader reader = new BufferedReader(fileReader); String line = null; while ((line=reader.readLine())!=null){ if (!(line.equals("%"))){ String [] separado = line.split("*"); //SPLIT CALL names.add(separado[0]); lastNames.add
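
The exception arises because split() takes a regular expression, and a bare * is a dangling quantifier. A small sketch of the two usual fixes, escaping the asterisk or quoting it with Pattern.quote:

```java
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        String line = "name*lastName*ID*school*age";

        // '*' is a regex metacharacter, so escape it for split():
        String[] separado = line.split("\\*");

        // Equivalent, and handy when the delimiter comes from a variable:
        String[] separado2 = line.split(Pattern.quote("*"));

        System.out.println(separado.length + " fields, first = " + separado[0]);
        System.out.println(separado2.length + " fields, last = " + separado2[4]);
    }
}
```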

How to get rid of punctuation using NLTK tokenizer?

蓝咒 submitted on 2019-11-26 22:30:23
Question: I'm just starting to use NLTK and I don't quite understand how to get a list of words from text. If I use nltk.word_tokenize(), I get a list of words and punctuation. I need only the words instead. How can I get rid of the punctuation? Also, word_tokenize doesn't work with multiple sentences: dots are added to the last word. Answer 1: Take a look at the other tokenizing options that nltk provides here. For example, you can define a tokenizer that picks out sequences of alphanumeric characters as
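
As a sketch of the approach mentioned in Answer 1, a RegexpTokenizer that keeps only runs of alphanumeric characters drops the punctuation tokens entirely (the sample sentence is illustrative):

```python
from nltk.tokenize import RegexpTokenizer

text = "Hello there, Mr. Smith! This is a test."

# Keep only sequences of word characters; punctuation never becomes a token.
tokenizer = RegexpTokenizer(r"\w+")
words = tokenizer.tokenize(text)

print(words)  # ['Hello', 'there', 'Mr', 'Smith', 'This', 'is', 'a', 'test']
```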

Using Boost Tokenizer escaped_list_separator with different parameters

拥有回忆 submitted on 2019-11-26 21:08:19
Question: Hello, I have been trying to get a tokenizer to work using the Boost library tokenizer class. I found this tutorial in the Boost documentation: http://www.boost.org/doc/libs/1_36_0/libs/tokenizer/escaped_list_separator.htm The problem is I can't get the arguments to escaped_list_separator("","",""); but if I modify the boost/tokenizer.hpp file it works. That's not an ideal solution, so I was wondering if there's anything I am missing to get different arguments into the escaped_list_separator. I
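
For reference, escaped_list_separator has a constructor that takes its escape, separator, and quote characters as strings, so different arguments can be passed without touching tokenizer.hpp. A minimal sketch with an illustrative input line:

```cpp
#include <boost/tokenizer.hpp>
#include <iostream>
#include <string>

int main() {
    std::string line = "Field 1,\"quoted field, with a comma\",Field 3";

    // escaped_list_separator(escape_chars, separator_chars, quote_chars)
    boost::escaped_list_separator<char> sep("\\", ",", "\"");
    boost::tokenizer<boost::escaped_list_separator<char>> tok(line, sep);

    for (const auto& t : tok)
        std::cout << t << '\n';
    return 0;
}
```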

How to use a Lucene Analyzer to tokenize a String?

﹥>﹥吖頭↗ submitted on 2019-11-26 19:17:55
Question: Is there a simple way I could use any subclass of Lucene's Analyzer to parse/tokenize a String? Something like: String to_be_parsed = "car window seven"; Analyzer analyzer = new StandardAnalyzer(...); List<String> tokenized_string = analyzer.analyze(to_be_parsed); Answer 1: As far as I know, you have to write the loop yourself. Something like this (taken straight from my source tree): public final class LuceneUtils { public static List<String> parseKeywords(Analyzer analyzer, String field, String
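
A sketch of how that truncated loop typically continues (the completion is an assumption on my part, written against a recent Lucene where TokenStream is Closeable and must be reset() before iterating; the field name is only used by analyzers that care about it):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class LuceneUtils {

    public static List<String> parseKeywords(Analyzer analyzer, String field, String text)
            throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream(field, new StringReader(text))) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {   // one token per iteration
                result.add(term.toString());
            }
            stream.end();
        }
        return result;
    }
}
```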

Convert comma separated string to array in PL/SQL

冷暖自知 submitted on 2019-11-26 18:40:49
How do I convert a comma-separated string to an array? I have the input '1,2,3', and I need to convert it into an array. Rob van Wijk: Oracle provides the built-in function DBMS_UTILITY.COMMA_TO_TABLE. Unfortunately, this one doesn't work with numbers: SQL> declare 2 l_input varchar2(4000) := '1,2,3'; 3 l_count binary_integer; 4 l_array dbms_utility.lname_array; 5 begin 6 dbms_utility.comma_to_table 7 ( list => l_input 8 , tablen => l_count 9 , tab => l_array 10 ); 11 dbms_output.put_line(l_count); 12 for i in 1 .. l_count 13 loop 14 dbms_output.put_line 15 ( 'Element ' || to_char(i) || 16 '
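
Since COMMA_TO_TABLE expects valid identifiers rather than numbers, one common alternative is to split the string with a CONNECT BY LEVEL query and BULK COLLECT the values. A sketch, assuming Oracle 10g+ for REGEXP_SUBSTR; the collection type name is made up:

```sql
-- Sketch only: num_tab is a locally declared collection type, not a builtin.
DECLARE
  TYPE num_tab IS TABLE OF NUMBER;
  l_input VARCHAR2(4000) := '1,2,3';
  l_array num_tab;
BEGIN
  SELECT TO_NUMBER(REGEXP_SUBSTR(l_input, '[^,]+', 1, LEVEL))
    BULK COLLECT INTO l_array
    FROM dual
  CONNECT BY LEVEL <= LENGTH(l_input) - LENGTH(REPLACE(l_input, ',')) + 1;

  FOR i IN 1 .. l_array.COUNT LOOP
    dbms_output.put_line('Element ' || i || ': ' || l_array(i));
  END LOOP;
END;
/
```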

Generating a custom Tokenizer for new TokenStream API using JFlex/ Java CC

廉价感情. submitted on 2019-11-26 18:30:29
Question: We are currently using Lucene 2.3.2 and want to migrate to 3.4.0. We have our own custom Tokenizer generated using Java CC, which has been in use ever since we started using Lucene, and we want to continue with the same behavior. I would appreciate pointers to any resources that deal with building a Tokenizer for the new TokenStream API from a grammar. UPDATE: I found the grammar used to generate StandardTokenizer at http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/analysis
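
As a rough illustration of what the attribute-based API expects, a Tokenizer skeleton for Lucene 3.x might look like the sketch below; MyJavaCCLexer is a hypothetical stand-in for the Java CC generated token manager, not a real class:

```java
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Skeleton only: shows the incrementToken()/attribute pattern of the new API.
public final class MyCustomTokenizer extends Tokenizer {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    // private MyJavaCCLexer lexer;  // hypothetical generated lexer

    public MyCustomTokenizer(Reader input) {
        super(input);
        // lexer = new MyJavaCCLexer(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        clearAttributes();
        // Token tok = lexer.getNextToken();              // hypothetical call
        // if (tok == null) return false;                 // end of stream
        // termAtt.append(tok.image);                     // token text
        // offsetAtt.setOffset(tok.beginOffset, tok.endOffset);
        // return true;
        return false; // placeholder so the skeleton compiles
    }
}
```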

Python - RegEx for splitting text into sentences (sentence-tokenizing) [duplicate]

早过忘川 submitted on 2019-11-26 17:38:32
Question: This question already has answers here: Python split text on sentences (10 answers). Closed 9 months ago. I want to make a list of sentences from a string and then print them out. I don't want to use NLTK to do this. So it needs to split on a period at the end of a sentence and not at decimals, abbreviations, a title in a name, or when the sentence contains a .com. This is an attempt at a regex that doesn't work: import re text = """\ Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he
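
One hedged attempt without NLTK: split on sentence-ending punctuation followed by whitespace, with negative lookbehinds for single-letter abbreviations (i.e.) and capitalised titles (Mr., Jr.). The sample text is illustrative and the regex will still miss some edge cases:

```python
import re

text = ("Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot. "
        "Did he mind? Probably not. It cost 3.5 percent of revenue.")

# Split after '.', '!' or '?' plus whitespace, unless preceded by patterns like
# "i.e." (\w.\w.) or "Mr." ([A-Z][a-z].). Decimals and ".com" never match,
# because their period is not followed by whitespace.
sentence_end = re.compile(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=[.!?])\s+")

for sentence in sentence_end.split(text):
    print(sentence.strip())
```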

Looking for a clear definition of what a “tokenizer”, “parser” and “lexers” are and how they are related to each other and used?

ぃ、小莉子 submitted on 2019-11-26 17:29:29
Question: I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer, or vice versa)? I need to create a program that will go through C/H source files to extract data declarations and definitions. I have been looking for examples and can find some info, but I am really struggling to grasp the underlying concepts like grammar rules, parse trees and abstract syntax trees and how they interrelate.
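
In short: the lexer (tokenizer) turns a character stream into tokens, and the parser consumes those tokens according to grammar rules to build a parse tree or abstract syntax tree. A toy sketch (deliberately not C-specific) of that division of labour:

```python
import re

# Toy lexer: characters -> tokens (the "tokenizer"/"lexer" stage).
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(text):
    for number, op in TOKEN_RE.findall(text):
        yield ("NUM", int(number)) if number else ("OP", op)

# Toy parser: tokens -> a result. A real parser would build an AST
# according to grammar rules instead of evaluating on the fly.
def parse_sum(tokens):
    tokens = list(tokens)
    total = tokens[0][1]
    for i in range(1, len(tokens), 2):
        op, (_, value) = tokens[i][1], tokens[i + 1]
        total = total + value if op == "+" else total - value
    return total

print(parse_sum(tokenize("12 + 7 - 3")))  # 16
```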