How to retrieve the exact HTML as in a browser

无人共我 2021-01-22 21:24

I'm using a Python script to render web pages and retrieve their HTML. It works fine with most pages, but for some of them the retrieved HTML is incomplete. And I do

2 answers
  • 2021-01-22 21:40

    For headless browsing you can combine PhantomJS with Selenium. The following gets the full source:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    
    url = "http://www.pullandbear.com/es/es/mujer/vestidos-c29016.html"
    
    # Start a headless PhantomJS browser and load the page.
    dr = webdriver.PhantomJS()
    dr.get(url)
    
    # Wait up to 5 seconds for the JavaScript-generated product links to appear.
    element = WebDriverWait(dr, 5).until(
        EC.presence_of_element_located((By.CLASS_NAME, "grid_itemContainer"))
    )
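
The explicit wait above boils down to polling a condition until it returns something truthy or a timeout expires. Here is a minimal standard-library sketch of that idea. It assumes Python 3, and `wait_until` and `fake_find_element` are hypothetical names used only for illustration; they are not part of Selenium, and the "page" here is simulated.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. (Hypothetical helper, not Selenium API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Simulated page: the "element" only shows up on the third poll,
# the way JavaScript-generated markup appears some time after load.
attempts = {"count": 0}


def fake_find_element():
    attempts["count"] += 1
    return "grid_itemContainer" if attempts["count"] >= 3 else None


found = wait_until(fake_find_element, timeout=2.0, poll_interval=0.01)
print(found)
```

With a real driver, the condition would be a lookup against the live DOM, which is what the `expected_conditions` helpers provide.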
    

    Using Selenium without the WebDriverWait did not always return the full source. Waiting until an a tag with the grid_itemContainer class is present in the DOM makes sure the HTML has been generated. The XPath below returns all your links:

    print([a.get_attribute('href') for a in dr.find_elements_by_xpath("//a[@class='grid_itemContainer']")])
    
    [u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-detalle-crochet-pechera-c29016p100064004.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-bordado-escote-pico-c29016p100123006.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-manga-larga-espalda-abierta-c29016p100147503.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-hombros-descubiertos-beads-c29016p100182001.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-jacquard-capa-c29016p100255505.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-vaquero-eyelets-c29016p100336010.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-liso-oversized-c29016p100289013.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-liso-oversized-c29016p100289013.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-camisero-oversized-c29016p100036616.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-pico-c29016p100166506.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-estampado-rayas-c29016p100234507.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-manga-corta-liso-c29016p100262008.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-largo-cuello-halter-liso-c29016p100036162.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-capa-jacquard-%C3%A9tnico-c29016p100259002.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-largo-cuello-halter-rayas-c29016p100036161.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-capa-jacquard-tri%C3%A1ngulo-c29016p100255506.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-marinero-escote-bardot-c29016p100259003.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-rayas-escote-espalda-c29016p100262007.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cruzado-c29016p100216013.html', 
u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-flores-canes%C3%BA-bordado-c29016p100203011.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-bordados-c29016p100037160.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-flores-volante-c29016p100216014.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-lencero-c29016p100104515.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuadros-detalle-encaje-c29016p100216016.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-drapeado-abertura-bajo-c29016p100129011.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-drapeado-abertura-bajo-c29016p100129011.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-vaquero-bolsillo-plastr%C3%B3n-c29016p100036822.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-rayas-bajo-desigual-c29016p100123010.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-camisero-vaquero-c29016p100036575.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-midi-estampado-rayas-c29016p100189011.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-midi-rayas-manga-3-4-c29016p100149507.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-midi-canal%C3%A9-ajustado-c29016p100149508.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-estampado-bolsillos-c29016p100212503.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-corte-evas%C3%A9-bolsillos-c29016p100189012.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-vaquero-camisero-cuadros-c29016p100036624.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/pichi-vaquero-c29016p100073526.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-estampado-geom%C3%A9trico-cuello-halter-c29016p100037021.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-perkins-manga-larga-c29016p100036882.html', 
u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-perkins-manga-larga-c29016p100036882.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-perkins-manga-larga-c29016p100036882.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-perkins-manga-larga-c29016p100036882.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-jacquard-evas%C3%A9-c29016p100037207.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cr%C3%AApe-evas%C3%A9-estampado-flores-manga-3-4-c29016p100036932.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cr%C3%AApe-evas%C3%A9-estampado-flores-manga-3-4-c29016p100037280.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-perkins-parche-c29016p100037464.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cr%C3%AApe-evas%C3%A9-liso-manga-3-4-c29016p100036930.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cr%C3%AApe-evas%C3%A9-liso-manga-3-4-c29016p100036930.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-alto-liso-c29016p100037156.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-alto-estampado-flores-c29016p100036921.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-cuello-alto-estampado-corbatero-c29016p100037155.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-largo-manga-sisa-c29016p100170011.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-largo-manga-sisa-rayas-c29016p100170012.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-manga-acampanada-c29016p100149506.html', u'http://www.pullandbear.com/es/es/mujer/vestidos/vestido-punto-espalda-abierta-c29016p100195504.html']
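
The list above contains some URLs more than once. If only distinct links are needed, one option is to drop repeats while keeping the original order. This is a small sketch assuming Python 3; `unique_in_order` is a hypothetical helper name, and the sample list reuses three URLs from the output above.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


links = [
    "http://www.pullandbear.com/es/es/mujer/vestidos/vestido-liso-oversized-c29016p100289013.html",
    "http://www.pullandbear.com/es/es/mujer/vestidos/vestido-liso-oversized-c29016p100289013.html",
    "http://www.pullandbear.com/es/es/mujer/vestidos/vestido-camisero-oversized-c29016p100036616.html",
]

unique_links = unique_in_order(links)
print(len(unique_links))
```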
    

    If you want to write the source:

    with open("out.html", "w") as f:
        f.write(dr.page_source)
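
Because the page contains non-ASCII characters (accented product names, for example), it may be safer to state the file encoding explicitly when saving. The sketch below assumes UTF-8 is the desired encoding. `save_html` is a hypothetical helper, and `sample_html` stands in for `dr.page_source` so the example runs without a browser.

```python
import io
import os
import tempfile


def save_html(html, path):
    """Write a text string to `path` as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(html)


# Stand-in for dr.page_source, including a non-ASCII character.
sample_html = u"<html><body><a href='vestido-cr\u00eape'>cr\u00eape</a></body></html>"

out_path = os.path.join(tempfile.mkdtemp(), "out.html")
save_html(sample_html, out_path)

# Read the file back to confirm the text survived the round trip.
with io.open(out_path, encoding="utf-8") as f:
    round_trip = f.read()
```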
    
  • 2021-01-22 21:58

    I think you can use http://ghost-py.readthedocs.org/en/latest/ for this case. It loads a web page like a real browser and runs its JavaScript. You can also try PhantomJS, but it is scripted in JavaScript rather than Python.
