I know that Google’s search algorithm is mainly based on PageRank. However, it also analyzes the document itself and uses its structure, such as H1 and H2 headings.
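To make the PageRank idea concrete, here is a minimal sketch of the textbook power-iteration version. It is only an illustration, not what Google actually runs: the function name `pagerank`, the `links` dictionary format, the damping factor of 0.85, and the fixed iteration count are all my own assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; the ranks sum to about 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}

        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical four-page web used only for demonstration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

A real engine would combine a link-based score like this with on-page signals, such as the heading structure mentioned above, but how those are weighted is not something this sketch tries to model.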
You can also try searching the 'Computer Science' section of arXiv: http://arxiv.org for "search engine" and the various terms that others have suggested.
It contains many academic papers, all freely available... hopefully some of them will be relevant to your research. (Of course the caveat of validating any paper's content applies.)