
Creating POST request to scrape website with python where no network form data changes

Asked By: Anonymous

I am scraping a website that renders dynamically with JavaScript. The URL doesn't change when clicking the > button, so I have been looking in the Network tab of the inspector, specifically at the "General" section for the "Request URL" and "Request Method", as well as at the "Form Data" section, for any sort of ID that could uniquely identify each successive page. However, when recording a log of clicking the > button from page to page, the "Form Data" appears to be identical each time (see images):
[screenshots of the Network tab showing the "General" and "Form Data" sections for two successive pages]
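For context, replaying such a request outside the browser would look roughly like the sketch below. This is only an illustration of the approach described above, assuming a server-side DataTables-style endpoint; the URL and parameter names are hypothetical placeholders, not values captured from this site, which is consistent with why this route did not seem helpful here.

import requests

# Hypothetical endpoint -- in practice this would be copied from the
# "Request URL" field shown in the Network tab.
endpoint = "https://example.com/table-data"

# Typical server-side DataTables parameters; "start" is the value that would
# normally change from page to page if the server were doing the pagination.
payload = {
    "draw": 2,
    "start": 50,
    "length": 50,
}

response = requests.post(endpoint, data=payload)
print(response.status_code)
print(response.text[:500])  # first part of the returned payload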

Currently my code doesn't incorporate this method, because I can't see it helping until I find a unique identifier in the "Form Data" section. In essence, it just pulls the first page of data over and over in my while loop, even though I'm using a Selenium driver and calling driver.find_elements_by_xpath("xpath of > button").click() before trying to get the data with BeautifulSoup.
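As an aside, and assuming a recent Selenium 4 install: the find_elements_by_xpath helper is deprecated there in favor of the By locator style, which the updated code below already uses. The equivalent click would look like this (keeping the placeholder XPath from above):

from selenium.webdriver.common.by import By

# same placeholder XPath as above, not a real locator for this page
driver.find_element(By.XPATH, "xpath of > button").click()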

(Updated code; see comments)

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
masters_list = []


def extract_info(html_source):
    # html_source will be the inner HTML of the table
    global lst
    soup = BeautifulSoup(html_source, 'html.parser')
    lst = soup.find('tbody').find_all('tr')[0]
    masters_list.append(lst)

    # note: only the first row of each page's table is captured here


chrome_driver_path = '/Users/Justin/Desktop/Python/chromedriver'
driver = webdriver.Chrome(executable_path=chrome_driver_path)
url = 'https://cryptoli.st/lists/fixed-supply'
driver.get(url)
loop = True

while loop:  # loop for extracting all 120 pages
    crypto_table = driver.find_element(By.ID, 'DataTables_Table_0').get_attribute(
        'innerHTML')  # this is for crypto data table

    extract_info(crypto_table)

    paginate = driver.find_element(
        By.ID, "DataTables_Table_0_paginate")  # all table pagination
    pages_list = paginate.find_elements(By.TAG_NAME, 'li')
    # click the next-arrow link at the end, not the 2, 3, ... page links
    next_page_link = pages_list[-1].find_element(By.TAG_NAME, 'a')

    # check whether another page is available
    if "disabled" in next_page_link.get_attribute('class'):
        loop = False

    pages_list[-1].click()  # if a next page is available, click it
df = pd.DataFrame(masters_list)
print(df)
df.to_csv("crypto_list.csv")
driver.quit()



Solution

Answered By: Anonymous

I am using my own code to show how I am getting the table; I have added explanations as comments on the important lines.

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

def extract_info(html_source):
    soup = BeautifulSoup(html_source, 'html.parser')  # html_source will be the inner HTML of the table
    lst = soup.find('tbody').find_all('tr')
    for i in lst:
        print(i.get('id'))  # printing just the id, since each row's id is set to the crypto name; more scraping is needed for other fields



driver = webdriver.Chrome()
url = 'https://cryptoli.st/lists/fixed-supply'
driver.get(url)
loop = True

while loop:  # loop for extracting all 120 pages
    crypto_table = driver.find_element(By.ID, 'DataTables_Table_0').get_attribute('innerHTML')  # this is the crypto data table

    extract_info(crypto_table)

    paginate = driver.find_element(By.ID, "DataTables_Table_0_paginate")  # the table's pagination bar
    pages_list = paginate.find_elements(By.TAG_NAME, 'li')
    next_page_link = pages_list[-1].find_element(By.TAG_NAME, 'a')  # the next-arrow link at the end, not the 2, 3, ... page links

    if "disabled" in next_page_link.get_attribute('class'):  # check whether another page is available
        loop = False

    pages_list[-1].click()  # if a next page is available, click it

So the main answer to your question is: when you click the button, Selenium updates the page, and you can then use driver.page_source to get the updated HTML. Sometimes (not with this URL) the page makes an AJAX request that can take some time, so you have to wait until Selenium has loaded the full page.
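As a minimal sketch of that waiting step (reusing the table id and pages_list from the code above, and assuming the old rows are detached when the table redraws), one common pattern is to keep a reference to the current first row, click the next arrow, and wait for that old row to go stale before reading driver.page_source:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# remember the current first row before clicking
old_first_row = driver.find_element(By.CSS_SELECTOR, '#DataTables_Table_0 tbody tr')

pages_list[-1].click()  # go to the next page

# wait (up to 10 seconds) until the old row has been detached from the DOM,
# i.e. the table has been redrawn with the new page's rows
WebDriverWait(driver, 10).until(EC.staleness_of(old_first_row))

updated_html = driver.page_source  # now reflects the updated table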
