Python批量查字典和爬取双语例句

最近，有网友反映，我的批量查字典工具换到其它的网站就不好用了。对此，我想说的是，互联网包罗万象，网站的各种设置也有所不同，并不是所有的在线字典都可以用Python爬取的。事实上，很多网站为了防止被爬取内容，早就提高了网站的安全级别，不会让用户轻意爬取内容的。

由于这名网友想要的是韩语翻译，所以我就不能拿原来的网站来操作了，只好去网上查询网速快、又不对爬虫有限制的网站来操作。终于，探索出了爬取某字典网站上内容的方法。

一、用BeautifulSoup获取翻译

这是一个字典网站，也是一个双语句库网站，对于汉语的韩语翻译，我们可以通过requests来获取网页源文，再用BeautifulSoup进行解析，然后用soup.find()查找想要的标签信息和Class，提取文本信息，然后再写入到xls文件就可以了，代码如下：

import xlwt
import requests
from bs4 import BeautifulSoupheaders = {"User-Agent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36 Edg/114.0.1823.37"}def get_word(word):url=f"https://zh.glosbe.com/zh/ko/{word}"resp = requests.get(url,headers=headers)soup = BeautifulSoup(resp.text, 'html.parser')# 查找查询结果result = soup.find('div', class_="inline leading-10")if result:return result.text.split()[0]else:return "未找到翻译"def process_txt_file(filename):# 创建工作簿wb = xlwt.Workbook()# 创建表单sh = wb.add_sheet("sheet 1")with open(filename, 'r', encoding='utf-8') as file:words = [i.strip() for i in file.readlines()]for index,word in enumerate(words):sh.write(index,0,word)sh.write(index,1,get_word(word))wb.save('translation_results.xls')
#调用函数并传入txt文件路径
process_txt_file('words.txt')

二、用openpyxl来写入xlsx文件

上面的代码中采用的是xlwt来写入到xls文件，我们也可以改用openpyxl，同时，我们还可以通过soup.h3.string来更快地定位所需要的位置信息。这次我们把查询的内容由韩语改为英文，代码优化如下：

import requests
from bs4 import BeautifulSoup
import openpyxl
headers = {"User-Agent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36 Edg/114.0.1823.37"}
def get_word(word):url=f"https://zh.glosbe.com/zh/en/{word}"resp = requests.get(url,headers=headers)soup = BeautifulSoup(resp.text, 'html.parser')# 查找查询结果#results = soup.find_all('div', class_="py-2 flex")results = soup.h3.stringif results:return results.strip()else:return "未找到翻译"
#     if results:
#         for result in results:
#             print(result.replace("\n\n\n","\n").strip()) 
#     else:
#         return "未找到翻译"
def process_txt_file(filename):workbook = openpyxl.Workbook()sheet = workbook.activewith open(filename, 'r', encoding='utf-8') as file:words = [i.strip() for i in file.readlines()]for index, word in enumerate(words):translation = get_word(word)sheet.cell(row=index + 1, column=1).value = wordsheet.cell(row=index + 1, column=2).value = translationworkbook.save('translation_results.xlsx')#调用函数并传入txt文件路径
process_txt_file('words.txt')