I know that Google’s search algorithm is mainly based on pagerank. However, it also does analysis and uses the structure of the document H1, H2,
H1
H2
I have found this paper:
A New Study on Using HTML Structures to Improve Retrieval
however it is an old paper 1999,
still looking for more recent papers.