Is there a string similarity measure available in Python+Sqlite, for example with the sqlite3
module?
Example of use case:
import sqli
Here is a ready-to-use example test.py
:
import sqlite3
db = sqlite3.connect(':memory:')
db.enable_load_extension(True)
db.load_extension('./spellfix') # for Linux
#db.load_extension('./spellfix.dll') # <-- UNCOMMENT HERE FOR WINDOWS
db.enable_load_extension(False)
c = db.cursor()
c.execute('CREATE TABLE mytable (id integer, description text)')
c.execute('INSERT INTO mytable VALUES (1, "hello world, guys")')
c.execute('INSERT INTO mytable VALUES (2, "hello there everybody")')
c.execute('SELECT * FROM mytable WHERE editdist3(description, "hel o wrold guy") < 600')
print c.fetchall()
# Output: [(1, u'hello world, guys')]
Important note: The distance editdist3 is normalized so that
the value of 100 is used for insertion and deletion and 150 is used for substitution
Download https://sqlite.org/2016/sqlite-src-3110100.zip, https://sqlite.org/2016/sqlite-amalgamation-3110100.zip and unzip them
Replace C:\Python27\DLLs\sqlite3.dll
by the new sqlite3.dll from here. If skipping this, you'd get a sqlite3.OperationalError: The specified procedure could not be found
later
Run:
call "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat"
or
call "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x64
cl /I sqlite-amalgamation-3110100/ sqlite-src-3110100/ext/misc/spellfix.c /link /DLL /OUT:spellfix.dll
python test.py
(With MinGW, it would be: gcc -g -shared spellfix.c -I ~/sqlite-amalgation-3230100/ -o spellfix.dll
)
(based on this answer)
apt-get -y install unzip build-essential libsqlite3-dev
wget https://sqlite.org/2016/sqlite-src-3110100.zip
unzip sqlite-src-3110100.zip
gcc -shared -fPIC -Wall -Isqlite-src-3110100 sqlite-src-3110100/ext/misc/spellfix.c -o spellfix.so
python test.py
If your distribution's Python is a bit old, it will require another method. As sqlite3
module is built-in in Python, it seems not straightforward to upgrade it (pip install --upgrade pysqlite
would only upgrade the pysqlite module, not the underlying SQLite library). Thus this method works for example if import sqlite3; print sqlite3.sqlite_version
is 3.8.2:
wget https://www.sqlite.org/src/tarball/27392118/SQLite-27392118.tar.gz
tar xvfz SQLite-27392118.tar.gz
cd SQLite-27392118 ; sh configure ; make sqlite3.c ; cd ..
gcc -g -fPIC -shared SQLite-27392118/ext/misc/spellfix.c -I SQLite-27392118/src/ -o spellfix.so
python test.py # [(1, u'hello world, guys')]