R: find largest common substring starting at the beginning

星月不相逢 2021-02-19 18:33

I've got 2 vectors:

word1 <- "bestelling"
word2 <- "bestelbon"

Now I want to find the largest common substring that starts at the beginning of both strings.
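Because the match must start at the first character of both strings, the problem reduces to finding the longest common prefix. Below is a minimal base R sketch of that idea. `common_prefix` is a hypothetical helper, not something from the thread, and it compares the strings character by character up to the length of the shorter one.

```r
# Hypothetical helper (not from the thread): longest common prefix of two strings
common_prefix <- function(a, b) {
  n <- min(nchar(a), nchar(b))
  if (n == 0) return("")
  # Split the first n characters of each string into single characters
  ca <- strsplit(substr(a, 1, n), "")[[1]]
  cb <- strsplit(substr(b, 1, n), "")[[1]]
  # Position of the first differing character (NA if the prefixes agree fully)
  first_diff <- which(ca != cb)[1]
  len <- if (is.na(first_diff)) n else first_diff - 1
  substr(a, 1, len)
}

word1 <- "bestelling"
word2 <- "bestelbon"
common_prefix(word1, word2)
# [1] "bestel"
```

It handles only one pair of strings at a time; for longer vectors it could be wrapped in `mapply()`.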

11 answers
  •  时光取名叫无心
    2021-02-19 18:49

    I realize I'm late to this party, but pairwise alignment is a fundamental problem in biological research, and there is already a package (or a family of packages) that attacks it: the Bioconductor package Biostrings. It is large, at least if you install all the default dependencies, so be patient during the install. Its functions return S4 objects, so you need accessor functions to extract the results. This is perhaps a sledgehammer to crack a nut, but here's code that gives the desired result:

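    # Biostrings is distributed through Bioconductor, not CRAN; current releases install via BiocManager
    # (the original answer pointed at the old Bioconductor 2.14 repository:
    #  http://www.bioconductor.org/packages/2.14/bioc/)
    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::install("Biostrings")
    library(Biostrings)
    # Local alignment: best-scoring common segment anywhere in the two strings
    psa1 <- pairwiseAlignment(pattern = word1, subject = word2, type = "local")
    pattern(psa1)  # accessor for the aligned part of the pattern
    #[1] bestel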
    

    However, by default it does not restrict the match to start at the first character of both strings. Hopefully @MartinMorgan will come along and fix my errors.
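Since a local alignment can return a common segment from the middle of the strings, one hedged way to enforce the "starts at the beginning" requirement is to post-check the result in base R. `is_common_prefix` is a hypothetical helper (not part of Biostrings), and the sketch assumes the aligned segment has already been extracted as a plain character string. `startsWith()` needs R 3.3.0 or later.

```r
# Hypothetical helper (not part of Biostrings): TRUE if s is a prefix of both a and b
is_common_prefix <- function(s, a, b) {
  startsWith(a, s) && startsWith(b, s)
}

word1 <- "bestelling"
word2 <- "bestelbon"
is_common_prefix("bestel", word1, word2)  # TRUE: both words start with "bestel"
is_common_prefix("stel", word1, word2)    # FALSE: shared, but not at position 1
```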
