Hunter's Hodgepodge: Technical Study Notes

2020-05-10

Web scraping with Python 3 and Selenium on CentOS 7.5

Filed under: Technical Topics - hunter @ 10:14 pm

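Do not use google-chrome-stable_current_x86_64.rpm: the chromedriver in the CentOS 7.5 repositories is version 79, while the latest google-chrome is version 80, and a ChromeDriver release only works with the matching Chrome major version. Install chromium from the repositories instead so that the browser and the driver match.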
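
The version mismatch described above can be checked from Python once both packages are installed. The following is a minimal sketch, not part of the original setup. It assumes the browser binary is called `chromium-browser`, that both binaries accept `--version`, and that the output contains a dotted version number. The helper names (`major_version`, `versions_match`, `read_version`, `check_installed_versions`) are made up for this example.

```python
import re
import subprocess


def major_version(version_output):
    """Return the major version found in `--version` output, or None."""
    match = re.search(r"(\d+)(?:\.\d+)+", version_output)
    return int(match.group(1)) if match else None


def versions_match(browser_output, driver_output):
    """True when both outputs report the same major version."""
    browser_major = major_version(browser_output)
    driver_major = major_version(driver_output)
    return browser_major is not None and browser_major == driver_major


def read_version(command):
    """Run `<command> --version`; return its output, or '' if it cannot be run."""
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # text mode; works on Python 3.6
        )
    except OSError:
        return ""
    return result.stdout


def check_installed_versions(browser_cmd="chromium-browser",
                             driver_cmd="chromedriver"):
    """Assumed command names; adjust them to whatever is on your PATH."""
    return versions_match(read_version(browser_cmd), read_version(driver_cmd))
```

Calling `check_installed_versions()` returns `False` when either command is missing or when the major versions differ, which is the situation to resolve before running the test program.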

  1. Install
    1. yum install python3
    2. yum install chromedriver
    3. yum install chromium
    4. pip3 install selenium
  2. Test program

[code language="python"]

# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


class ChromeTest:
    def __init__(self):
        self.chrome_options = webdriver.ChromeOptions()
        # self.chrome_options.binary_location = "/usr/lib64/chromium-browser/headless_shell"
        self.chrome_options.add_argument('--headless')

        # Timeout in seconds, shared by page loads and explicit waits
        self.timeout = 20
        self.browser = webdriver.Chrome(options=self.chrome_options, executable_path="/usr/bin/chromedriver")
        self.browser.set_window_size(1400, 700)
        self.browser.set_page_load_timeout(self.timeout)
        self.wait = WebDriverWait(self.browser, self.timeout)

    def test(self):
        try:
            self.browser.get('https://search.taobao.com/')
            print("[1] %s" % self.browser.current_url)
            # Wait for the flash-sale module to load
            self.wait.until(
                EC.presence_of_element_located((By.CSS_SELECTOR, '#J_seckill > div')))

            # Page source as UTF-8 bytes
            body1 = self.browser.page_source.encode()
            print("[2] %s" % self.browser.current_url)
            next_link = self.wait.until(
                EC.presence_of_element_located((By.CLASS_NAME, 'pn-next')))
            next_link_class = next_link.get_attribute("class")
            # Click the next-page link only if it is not marked as disabled
            if next_link_class.find('disabled') < 0:
                next_click = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'pn-next')))
                next_click.click()

            print("[3] %s" % self.browser.current_url)
        except Exception as e:
            print("[err] %s" % str(e))
        finally:
            # Always shut the browser down, even after an error
            self.browser.quit()


if __name__ == "__main__":
    t = ChromeTest()
    t.test()

[/code]
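
The test program decides whether to click the next-page link by searching the `class` attribute for the substring `disabled`. A substring search also matches longer class names that merely contain that text. A slightly stricter check, sketched here with a hypothetical helper `has_class`, compares whole space-separated class tokens instead and also tolerates a missing attribute:

```python
def has_class(class_attr, name):
    """True when `name` is one of the space-separated tokens in a class attribute.

    `class_attr` may be None, which Selenium's get_attribute returns
    when the attribute is absent.
    """
    return name in (class_attr or "").split()
```

In the test program this would be used as `if not has_class(next_link.get_attribute("class"), 'disabled'):` in place of the `find` call.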
