I have an SQLite database with a huge number of URLs, and it's taking a huge amount of disk space; accessing it causes many disk seeks and is slow. Average URL path length is
Abstract:
A common problem for large-scale search engines and web spiders is how to handle the huge number of URLs they encounter. Traditional search engines and web spiders store URLs on disk without any compression, which results in slow performance and higher space requirements. This paper describes a simple URL compression algorithm that allows efficient compression and decompression. The compression algorithm is based on a delta encoding scheme to exploit URLs sharing common prefixes, and on an AVL tree for efficient search speed. Our results show that a 50% size reduction is achieved.
-- Kasom Koht-arsa, Department of Computer Engineering
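To make the abstract's approach concrete, here is a minimal Python sketch of the delta (prefix) encoding idea: sorted URLs tend to share long common prefixes, so each entry can be stored as just a prefix length plus the differing suffix. This is an illustrative sketch only, assuming the URLs are kept in sorted order; it uses a plain sorted list in place of the paper's AVL tree, so it shows the encoding, not the search structure.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the prefix shared by two strings."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i

def compress(urls):
    """Encode each URL as (length of prefix shared with the previous URL, suffix)."""
    encoded = []
    prev = ""
    for url in sorted(urls):
        k = common_prefix_len(prev, url)
        encoded.append((k, url[k:]))
        prev = url
    return encoded

def decompress(encoded):
    """Rebuild the sorted URL list from (prefix length, suffix) pairs."""
    urls = []
    prev = ""
    for k, suffix in encoded:
        url = prev[:k] + suffix  # reuse the first k characters of the previous URL
        urls.append(url)
        prev = url
    return urls

if __name__ == "__main__":
    # Hypothetical example URLs to show the shared prefixes being factored out.
    urls = [
        "http://www.example.com/a/page1.html",
        "http://www.example.com/a/page2.html",
        "http://www.example.com/b/index.html",
    ]
    enc = compress(urls)
    assert decompress(enc) == sorted(urls)
    for k, suffix in enc:
        print(k, suffix)
```

On URL collections like this, most of each entry collapses into the prefix-length integer, which is where the reported size reduction comes from; the paper's AVL tree then lets lookups find an entry without decoding the whole list.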