word | 易学教程

PHP word index, performance and reasonable results

Question: I'm currently working on an indexer for a search feature. The indexer will work over data from "fields". Fields look like:

    Field_id  Field_type  Field_name   Field_Data
    101       text        Name         Intel i7
    102       integer     Cores        4 physical, 4 virtual
    103       select      Vendor       Intel
    104       multitext   Description  The i7 is intel's next gen range of cpus.

The indexer would generate the following results/index:

    Keyword   Occurrences
    intel     101, 103, 104
    i7        101, 104
    physical  102
    virtual   102
    next      104
    gen       104
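A minimal sketch of how such a keyword index could be built, written in Python rather than PHP. The field rows come from the question; the stop-word list, the rule that drops pure numbers, and the name `build_index` are assumptions made for illustration, since the excerpt does not say how words are filtered.

```python
import re
from collections import defaultdict

# Field rows taken from the question: (field_id, field_data).
FIELDS = [
    (101, "Intel i7"),
    (102, "4 physical, 4 virtual"),
    (103, "Intel"),
    (104, "The i7 is intel's next gen range of cpus."),
]

# Hypothetical stop-word list; the question does not specify one.
STOP_WORDS = {"the", "is", "of"}


def build_index(fields):
    """Map each lowercase keyword to the list of field ids containing it."""
    index = defaultdict(list)
    for field_id, data in fields:
        # Drop the possessive 's so "intel's" is indexed as "intel".
        text = re.sub(r"'s\b", "", data.lower())
        for word in re.findall(r"[a-z0-9]+", text):
            # Assumption: stop words and bare numbers are not indexed.
            if word in STOP_WORDS or word.isdigit():
                continue
            if field_id not in index[word]:
                index[word].append(field_id)
    return dict(index)


index = build_index(FIELDS)
print(index["intel"])  # [101, 103, 104]
print(index["i7"])     # [101, 104]
```

Keeping one list of field ids per keyword means a lookup is a single dictionary access; ranking the hits would need extra data, such as occurrence counts per field.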

PHP - Search String for a Specific Word Array and Match with an Optional + or -

Question: I need to search a string for a specific word and have the match be a variable. I have a specific list of words in an array:

    $names = array ("Blue", "Gold", "White", "Purple", "Green", "Teal", "Purple", "Red");
    $drag = "Glowing looks to be +Blue.";
    $match = "+Blue";
    echo $match

Output:

    +Blue

What I need to do is search $drag using $names, find matches with an optional + or - character, and have $match become the result.

Answer 1: Build a regular expression by joining the terms of the array with | , and
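The approach the answer starts to describe (joining the array terms with | into one pattern) can be sketched as follows. This is an illustration in Python, not the answer's own PHP code; the function name `find_match` and the use of word boundaries are assumptions.

```python
import re

names = ["Blue", "Gold", "White", "Purple", "Green", "Teal", "Purple", "Red"]
drag = "Glowing looks to be +Blue."


def find_match(text, words):
    """Return the first listed word found in text, with an optional leading + or -."""
    # dict.fromkeys removes the duplicate "Purple" while keeping order.
    alternation = "|".join(re.escape(w) for w in dict.fromkeys(words))
    pattern = re.compile(r"[+-]?\b(?:" + alternation + r")\b")
    found = pattern.search(text)
    return found.group(0) if found else None


match = find_match(drag, names)
print(match)  # +Blue
```

Escaping each term keeps the pattern safe if a word ever contains a regex metacharacter.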

How to make this random text generator more efficient in Python?

Question: I'm working on a random text generator (without using Markov chains), and currently it works without too many problems. First, here is my code flow:

1. Enter a sentence as input. This is called the trigger string and is assigned to a variable.
2. Get the longest word in the trigger string.
3. Search the whole Project Gutenberg database for sentences that contain this word, regardless of case.
4. Return the longest sentence that contains the word from step 3.
5. Append the sentences from step 1 and step 4
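The flow above can be sketched as below. This is not the asker's code: the three-sentence `CORPUS` stands in for the Project Gutenberg data, and the function names are hypothetical. The pattern is compiled once and the corpus is scanned in a single pass.

```python
import re


def longest_word(text):
    """Step 2: the longest word in the trigger string."""
    words = re.findall(r"[A-Za-z']+", text)
    return max(words, key=len)


def longest_sentence_with(word, sentences):
    """Steps 3 and 4: the longest sentence containing word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    candidates = [s for s in sentences if pattern.search(s)]
    return max(candidates, key=len) if candidates else None


def extend(trigger, sentences):
    """Step 5: append the sentence that was found to the trigger string."""
    found = longest_sentence_with(longest_word(trigger), sentences)
    return trigger if found is None else trigger + " " + found


# Hypothetical stand-in for sentences extracted from the corpus.
CORPUS = [
    "The Whale was seen at dawn.",
    "No one aboard had ever hunted a whale of such enormous size before.",
    "The sea was calm.",
]

print(extend("I saw a whale.", CORPUS))
```

For a large corpus, a precomputed word-to-sentences index would avoid rescanning every sentence for each trigger string; that is one possible direction, not something the excerpt confirms.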

Normalization and analyzers in Elastic Search

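Normalization provides a more complete inverted index for key_words. It covers tense conversion (like | liked), singular/plural conversion (man | men), full and abbreviated forms (china | cn), synonyms (small | little), and so on. For example: if the indexed text is china, can it be found with cn as the search condition? If the indexed text is dogs, can it be found with dog as the search condition? If abbreviations (cn) or singular/plural forms (dog and dogs) return the desired results, the search engine's normalization is said to be user-friendly.

Normalization exists to improve recall, that is, to improve search capability.

Normalization does its work together with the analyzer. The analyzer processes the fields of a Document: it splits the field data while the inverted index is being created. For example, take the sentence "I think dogs is human’s best friend." When the inverted index is created, the analyzer splits this sentence into terms: think dog human best friend. Common search conditions are think, human, best, and friend; words such as is, a, the, and i are rarely used as search conditions.

1 Common analyzers provided by ES by default

Sentence to split: Set the shape to semi-transparent by calling set_trans(5)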
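A toy sketch of the splitting described above, in Python. It is not Elasticsearch's analyzer or its API: the stop-word set and the naive plural-stripping rule are assumptions chosen only to reproduce the example terms.

```python
import re

# Words the text says are rarely used as search conditions.
STOP_WORDS = {"i", "is", "a", "the"}


def analyze(text):
    """Lowercase, tokenize, drop stop words, and crudely singularize."""
    terms = []
    for token in re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower()):
        # Strip a possessive ending: "human’s" -> "human".
        token = re.sub(r"['’]s$", "", token)
        if token in STOP_WORDS:
            continue
        # Naive plural rule (an assumption, not a real stemmer): "dogs" -> "dog".
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        terms.append(token)
    return terms


print(analyze("I think dogs is human’s best friend."))
# ['think', 'dog', 'human', 'best', 'friend']
```

Because the same function would be applied to both indexed text and queries, a search for dog and a document containing dogs end up with the same term, which is how normalization raises recall.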

Complex XSLT split?

Question: Is it possible to split a tag at lower-to-upper case boundaries? For example, the tag 'UserLicenseCode' should be converted to 'User License Code' so that the column headers look a little nicer. I've done something like this in the past using Perl's regular expressions, but XSLT is a whole new ball game for me. Any pointers on creating such a template would be greatly appreciated! Thanks, Krishna

Answer 1: Using recursion, it is possible to walk through a string in XSLT to evaluate every character.
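The recursive character-by-character walk the answer mentions can be illustrated outside XSLT. The following Python sketch shows only the logic, not the XSLT template itself; the function name `split_camel` is hypothetical.

```python
def split_camel(text, previous=""):
    """Insert a space at each lower-to-upper case boundary, one character per call."""
    if not text:
        return ""
    head, tail = text[0], text[1:]
    # A boundary is a lowercase character followed by an uppercase one.
    separator = " " if previous.islower() and head.isupper() else ""
    return separator + head + split_camel(tail, head)


print(split_camel("UserLicenseCode"))  # User License Code
```

An XSLT version would follow the same shape: a named template that emits the first character, decides whether a space is needed, and calls itself on the rest of the string.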

Emacs regular expression: what \< and \> can do that \b cannot do?

Question: The "Regexp Backslash" page of the GNU Emacs Manual says that \< matches at the beginning of a word, \> matches at the end of a word, and \b matches a word boundary. \b works just as it does in other, non-Emacs regular expressions, but \< and \> seem to be particular to Emacs regular expressions. Are there cases where \< and \> are needed instead of \b ? For instance, \bword\b would match the same as \<word\> , the only difference being that the latter is more readable.

Answer 1: You can get unexpected results

Unicode-ready wordsearch - Question

Is this code OK? I don't really have a clue which normalization form I should use (the only thing I noticed is that with NFD I get wrong output).

    #!/usr/local/bin/perl
    use warnings;
    use 5.014;
    use utf8;
    binmode STDOUT, ':encoding(utf-8)';
    use Unicode::Normalize;
    use Unicode::Collate::Locale;
    use Unicode::GCString;

    my $text = "my taxt täxt";
    my %hash;

    while ( $text =~ m/(\p{Alphabetic}+(?:'\p{Alphabetic}+)?)/g ) { #'
        my $word = $1;
        my $NFC_word = NFC( $word );
        $hash{$NFC_word}++;
    }

    my $collator = Unicode::Collate::Locale->new( locale => 'DE' );

    for my $word ( $collator->sort( keys %hash ) ) {
        my
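To illustrate why the normalization form matters here, a small Python sketch (not a translation of the Perl script, and without the locale-aware sorting). In NFD, "ä" is stored as "a" followed by a combining diaeresis, so a letter-only pattern may split the word at the combining mark; normalizing to NFC before matching avoids that. The function name `count_words` and the letter pattern are assumptions.

```python
import re
import unicodedata
from collections import Counter


def count_words(text):
    """Normalize to NFC first, then count letter-only words."""
    nfc_text = unicodedata.normalize("NFC", text)
    # [^\W\d_] is roughly "a letter": a word character that is not a digit or "_".
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", nfc_text)
    return Counter(words)


# Force the decomposed form so the input is NFD whatever the source encoding is.
decomposed = unicodedata.normalize("NFD", "my taxt t\u00e4xt")
counts = count_words(decomposed)

print(len(unicodedata.normalize("NFC", "t\u00e4xt")))  # 4
print(len(unicodedata.normalize("NFD", "t\u00e4xt")))  # 5
print(sorted(counts))
```

The design choice shown is to normalize the whole text once, before the regular expression runs, rather than normalizing each captured word afterwards.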

General-purpose web page similarity detection using the text similarity algorithms provided by the word segmenter

Implementation code: General-purpose web page similarity detection using the text similarity algorithms provided by the word segmenter

Run results:

Number of blog posts checked: 128

1. Blog post checked: 192本软件著作用词分析（五）用词最复杂99级. Similarity scores:
Simple=0.968589
Cosine=0.955598
EditDistance=0.916884
EuclideanDistance=0.00825
ManhattanDistance=0.001209
Jaccard=0.859838
JaroDistance=0.824469
JaroWinklerDistance=0.894682
SørensenDiceCoefficient=0.924638
SimHashPlusHammingDistance=0.976563
Post URL 1: http://my.oschina.net/apdplat/blog/388816
Post URL 2: http://yangshangchuan.iteye.com/blog/2194214

2. Blog post checked: APDPlat的系统启动和关闭流程剖析. Similarity scores:
Simple=0.837996
Cosine=0.711649
EditDistance=0.55001
EuclideanDistance=0.003669
ManhattanDistance=0.000992
Jaccard=0.549422
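Two of the measures named in the results, Jaccard and Cosine, can be sketched on token lists as follows. These are the textbook definitions written in Python, not the word segmenter's own implementation, so the scores they give will not necessarily match the figures above.

```python
import math
from collections import Counter


def jaccard(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(tokens_a, tokens_b):
    """Cosine of the angle between the two term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


page_1 = "a b c".split()
page_2 = "b c d".split()
print(jaccard(page_1, page_2))  # 0.5
print(cosine(page_1, page_2))
```

Jaccard ignores how often a term occurs, while cosine weights terms by frequency, which is one reason the two scores for the same pair of pages can differ.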

SOLR4.2+NUTCH1.6

1. Integrate SOLR4.2 with NUTCH1.6

    wget http://archive.apache.org/dist/lucene/solr/4.2.0/solr-4.2.0.tgz
    tar -xzvf solr-4.2.0.tgz
    cd solr-4.2.0/example

Copy the schema-solr4.xml file from nutch's conf directory into the solr/collection1/conf directory and rename it to schema.xml , overwriting the existing file. Then edit solr/collection1/conf/schema.xml and add the following under <fields> :

    <field name="_version_" type="long" indexed="true" stored="true"/>

2. Configure the Chinese word segmenter "word" for SOLR4.2

See the Solr plugin section of https://github.com/ysc/word

3. Run SOLR4.2

Start the SOLR4.2 server:

    java -jar start.jar &

SOLR4.2 web interface: http://host2:8983

4. Run NUTCH to submit the index

Run the solrindex command:

    bin/nutch solrindex http://host2:8983/solr data/crawldb -linkdb data

Pass multi-word arguments to a bash function

Inside a bash script function, I need to work with the command-line arguments of the script, and also with another list of arguments. So I'm trying to pass two argument lists to a function. The problem is that multi-word arguments get split.

    function params() {
        for PARAM in $1; do
            echo "$PARAM"
        done
        echo .
        for ITEM in $2; do
            echo "$ITEM"
        done
    }

    PARAMS="$@"
    ITEMS="x y 'z t'"
    params "$PARAMS" "$ITEMS"

Calling the script gives me:

    myscript.sh a b 'c d'
    a
    b
    c
    d
    .
    x
    y
    'z
    t'

Since there are two lists they must be passed as a whole to the function, the question is, how to iterate the elements while