How do you set up exception handling in a web crawler? (With code examples)

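Exception handling is an indispensable part of writing a crawler. It keeps a single failed request or unexpected page from stopping the whole run, and it provides useful information for debugging when something goes wrong. This article covers several common exception-handling strategies, each with a Python code example.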

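1. Handling network request exceptions

During crawling, network requests can fail in many ways, such as timeouts and connection errors. Here is how to handle these request exceptions: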

import requests
from requests.exceptions import Timeout, HTTPError, RequestException

try:
    response = requests.get('http://www.example.com', timeout=5)
    response.raise_for_status()  # raises HTTPError for 4xx and 5xx status codes
except Timeout:
    print("The request timed out. Please try again later.")
except HTTPError as err:
    print(f"HTTP error occurred: {err}")
except RequestException as e:
    # Most general requests error (connection failures, invalid URLs, ...); keep it last
    print(f"Request failed: {e}")
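In the handler above, `HTTPError` covers every 4xx and 5xx response, but these are not all worth retrying: a 404 will usually stay a 404, while a 503 may succeed a minute later. The helper below is a minimal sketch of one way to decide. The function name `classify_status` and the set of "retryable" codes (429 plus common 5xx server and gateway errors) are assumptions for illustration, not a standard.

```python
# Assumption: status codes that usually signal a temporary problem.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def classify_status(status_code):
    """Return "ok", "retry" or "skip" for an HTTP status code (hypothetical helper)."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in RETRYABLE_STATUS:
        return "retry"  # rate-limited or server-side trouble: worth another try
    return "skip"  # e.g. 403 or 404: retrying is unlikely to help
```

Inside the `except HTTPError as err:` branch, `err.response.status_code` gives the status code to pass in.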

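2. Handling page parsing exceptions

When parsing a page, extraction can fail because an element does not exist or because the page structure has changed. Here is how to handle such parsing exceptions: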

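from bs4 import BeautifulSoup

html = "<div class='product'>Price: $100</div>"
try:
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None when nothing matches, so .text raises AttributeError
    price = soup.find("span", class_="price").text
except AttributeError:
    price = "N/A"
print(price)  # N/A: the sample HTML has no <span class="price">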
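Even when the element is found, a layout change can leave text that the code cannot convert to a number, which is a second point of failure. The sketch below assumes prices are written as a `$` sign followed by a number with optional thousands separators; `parse_price` is a hypothetical helper, not part of BeautifulSoup.

```python
def parse_price(text, default=None):
    """Convert text such as 'Price: $1,299.99' into 1299.99.

    Returns `default` when the text is missing, contains no '$', or the part
    after '$' is not a number (assumed page format, adjust for your site).
    """
    if text is None:
        return default
    _, sep, amount = text.partition("$")
    if not sep:
        return default
    try:
        return float(amount.replace(",", "").strip())
    except ValueError:
        # e.g. "Price: $call us": the layout changed in a way we cannot parse
        return default
```

For the sample HTML above, passing the text of the `<div class='product'>` element ("Price: $100") yields `100.0`.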

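3. Retry mechanism

When a request fails because of a network error or a timeout, a retry mechanism lets the crawler try again instead of giving up on that data. Here is how to set up retries with the third-party retrying library: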

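import requests
from retrying import retry  # third-party: pip install retrying

@retry(stop_max_attempt_number=3, wait_fixed=2000)  # at most 3 attempts, 2000 ms apart
def fetch_data(url):
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    return response.json()

try:
    data = fetch_data('http://www.example.com/api/data')
except Exception as e:
    # Reached only after all 3 attempts have failed
    print('Failed to fetch data:', str(e))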
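The retrying package is no longer actively maintained; tenacity began as a fork of it and is a commonly used maintained alternative. If you would rather avoid a dependency, the standard library is enough for a small decorator with exponential backoff, which spaces retries further apart each time and puts less pressure on a struggling server than a fixed 2-second wait. This is a sketch: the name `retry_with_backoff`, its defaults, and the 10% jitter are illustrative assumptions.

```python
import functools
import random
import time


def retry_with_backoff(max_attempts=3, base_delay=1.0, max_delay=30.0,
                       exceptions=(Exception,)):
    """Retry the decorated function when it raises one of `exceptions`.

    Waits base_delay, 2*base_delay, 4*base_delay, ... (capped at max_delay)
    between attempts, plus up to 10% random jitter, and re-raises the last
    exception once max_attempts calls have failed.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the caller handle it
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    time.sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator
```

With requests, pass the transient error types explicitly, e.g. `@retry_with_backoff(exceptions=(requests.exceptions.Timeout, requests.exceptions.ConnectionError))`, so that a bug in your parsing code fails immediately instead of being retried. Note that `requests.exceptions.ConnectionError` is a different class from Python's built-in `ConnectionError`.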

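4. Logging

Use a logging tool (such as Python's logging module) to record errors and exceptions. This makes it easy to review and analyze how the program ran, and it helps with debugging and troubleshooting.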

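import logging

# Write messages at ERROR level and above to error.log
logging.basicConfig(filename="error.log", level=logging.ERROR)

try:
    # Code that may raise an exception goes here
    pass
except Exception as e:
    logging.error("An exception occurred: %s", str(e))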
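`logging.error` with `str(e)` records only the message. Calling `logger.exception` inside an `except` block also records the traceback, and including the URL makes it easy to find which page failed. The sketch below uses a named logger; `setup_logging` and `parse_page` are hypothetical helpers, and the log format and handler levels are choices, not requirements.

```python
import logging

logger = logging.getLogger("crawler")


def setup_logging(log_file="error.log"):
    """Send ERROR and above (with tracebacks) to a file, INFO and above to the console."""
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.ERROR)
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)


def parse_page(url, parse):
    """Run parse(url); on failure, log the URL plus traceback and return None."""
    try:
        return parse(url)
    except Exception:
        # logger.exception() logs at ERROR level and appends the current traceback
        logger.exception("Failed to parse %s", url)
        return None
```

Call `setup_logging()` once at startup, then route each page through `parse_page` so that one bad page is logged and skipped rather than ending the crawl.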

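5. Dynamically adjusting XPath or CSS selectors

A site may present the same data in different HTML structures, or change its layout over time. Prepare fallback selectors for these variants and use try-except to move on to the next one when a lookup fails.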

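from bs4 import BeautifulSoup

soup = BeautifulSoup("<div class='price'>$100</div>", "html.parser")  # sample page

try:
    price = soup.find("span", class_="price").text
except AttributeError:  # no <span class="price">: try the alternative layout
    try:
        price = soup.find("div", class_="price").text
    except AttributeError:
        price = "N/A"
print(price)  # $100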
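Nested try blocks get deeper with every fallback. An alternative sketch keeps the candidate selectors in a list and tries them in order with BeautifulSoup's `select_one`, which returns None when nothing matches, so no exception handling is needed for the lookup itself. The selector list and the name `extract_first` are assumptions for illustration.

```python
# Hypothetical candidate selectors, most preferred first.
PRICE_SELECTORS = ["span.price", "div.price", "p.price"]


def extract_first(soup, selectors, default="N/A"):
    """Return the stripped text of the first element matching any selector, in order."""
    for selector in selectors:
        node = soup.select_one(selector)  # None when nothing matches
        if node is not None:
            return node.get_text(strip=True)
    return default
```

For the sample page above, `extract_first(soup, PRICE_SELECTORS)` falls through `span.price` and returns the text of the `div.price` element. Supporting another layout then means appending one selector instead of adding another nested try block.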

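This article was submitted by an internet user. The views expressed are the author's own and do not represent the position of this site. This site only provides information storage services, does not own the content, and assumes no related legal liability. If you repost it, please credit the source: http://www.rhkb.cn/news/480367.html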
