20230809 Converting DOCX files to TXT files with Python 3 on Windows 10
2023/8/9 11:38
python docx txt
https://blog.51cto.com/u_16175446/6620474
How to read Word content with Python and convert it to TXT: step-by-step instructions
mob649e81576de1 2023-07-04 14:08:13
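Reading Word content with Python and converting it to TXT
As an experienced developer, I'm happy to show you how to use Python to read the content of a Word document and convert it to txt format. Below are the steps of the whole process and the code each step needs.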
Steps
Step      Description
Step 1    Install the python-docx library
Step 2    Open the Word document
Step 3    Read the document content
Step 4    Save the content as a txt file
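Code explanation
Step 1: Install the python-docx library
python-docx is a Python library for reading, querying, and modifying the .docx files produced by Microsoft Word 2007 and later. It does not open the older binary .doc format. Before you begin, install python-docx by running the following command on the command line: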
pip install python-docx
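The pip package is named python-docx, but the code imports it as docx. The following minimal sketch checks whether the package is installed. It uses only the standard library's importlib.metadata (Python 3.8+), and the helper name installed_version is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Use the distribution name that pip uses, not the import name "docx"
    found = installed_version("python-docx")
    if found is None:
        print("python-docx is not installed; run: pip install python-docx")
    else:
        print("python-docx", found)
```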
Step 2: Open the Word document
To open a Word document, use the Document class from the python-docx library. Here is the code that opens a Word document:
from docx import Document
doc_path = "path_to_your_word_file.docx"
doc = Document(doc_path)
Replace path_to_your_word_file.docx with the full path of the Word document you want to read.
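Step 3: Read the document content
In this step, the paragraphs attribute of the Document object gives access to each paragraph of the document, and each paragraph's text attribute returns its text. Here is the code that reads the document content: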
text_content = ""
for paragraph in doc.paragraphs:
    text_content += paragraph.text
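The code above creates an empty string variable, text_content, and then uses a for loop to visit each paragraph in the document. It reads each paragraph's text through the paragraph.text attribute and appends it to the text_content string. Two limitations apply. First, nothing is added between paragraphs, so the paragraph breaks are lost. Second, text inside tables is missing, because doc.paragraphs returns only top-level paragraphs and skips those nested in tables.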
Step 4: Save the content as a txt file
The last step saves the content as a txt file. Here is the code:
txt_file_path = "path_to_save_txt_file.txt"
with open(txt_file_path, "w") as txt_file:
    txt_file.write(text_content)
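Replace path_to_save_txt_file.txt with the full path where you want to save the txt file.
The code above uses the open function to open a txt file for writing and assigns the file object to the txt_file variable. Mode "w" creates the file, or overwrites it if it already exists. The write method then writes text_content into the file. Because no encoding argument is given, open() uses the system default encoding. On Windows, this is the ANSI code page (for example, GBK on Simplified Chinese Windows) unless Python's UTF-8 mode is enabled.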
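The following variant of step 4 states the encoding explicitly instead of relying on the system default. It uses pathlib.Path.write_text. UTF-8 is an assumed choice, and the helper name save_text is hypothetical. In text mode, Python converts each "\n" to the platform line ending (CRLF on Windows) when writing, and converts it back when reading:

```python
from pathlib import Path


def save_text(text, txt_path, encoding="utf-8"):
    """Write text to txt_path with an explicit encoding and return the Path."""
    path = Path(txt_path)
    path.write_text(text, encoding=encoding)
    return path
```

Usage: `save_text(text_content, "path_to_save_txt_file.txt")`. Reading the file back requires the same encoding: `Path("path_to_save_txt_file.txt").read_text(encoding="utf-8")`.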
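That completes the whole process of converting Word content to txt.
I hope this article helps you use Python to read Word documents and convert their content to txt format. If you have any other questions, feel free to ask.
txt2docx1.py [all line breaks were removed!]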
from docx import Document
doc_path = "path_to_your_word_file.docx"
doc = Document(doc_path)
text_content = ""
for paragraph in doc.paragraphs:
    # Paragraphs are joined with no separator, so every line break is lost
    text_content += paragraph.text
txt_file_path = "path_to_save_txt_file.txt"
with open(txt_file_path, "w") as txt_file:
    txt_file.write(text_content)
txt2docx2.py [line breaks are now preserved]
from docx import Document
doc_path = "path_to_your_word_file.docx"
doc = Document(doc_path)
text_content = ""
for paragraph in doc.paragraphs:
    text_content += paragraph.text
    # End each paragraph with a newline so the paragraph breaks survive
    text_content += '\n'
txt_file_path = "path_to_save_txt_file.txt"
with open(txt_file_path, "w") as txt_file:
    txt_file.write(text_content)
txt2docx3utf8.py [writes UTF-8 encoding]
from docx import Document
doc_path = "path_to_your_word_file.docx"
doc = Document(doc_path)
text_content = ""
for paragraph in doc.paragraphs:
    text_content += paragraph.text
    text_content += '\n'
txt_file_path = "path_to_save_txt+utf8_file.txt"
# encoding="UTF-8" writes UTF-8 instead of the system default (ANSI on Windows)
with open(txt_file_path, "w", encoding="UTF-8") as txt_file:
    txt_file.write(text_content)
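The "UTF-8" codec used above writes no byte order mark (BOM). Python's "utf-8-sig" codec writes UTF-8 preceded by a BOM, the bytes EF BB BF. Switching to "utf-8-sig" is one option if the program you open the TXT file with fails to detect UTF-8 without a BOM. This is a minimal sketch, and the helper names write_with_bom and starts_with_bom are hypothetical:

```python
import codecs


def write_with_bom(text, txt_path):
    """Write text as UTF-8 with a leading BOM."""
    # "utf-8-sig" prepends the BOM bytes EF BB BF when writing
    with open(txt_path, "w", encoding="utf-8-sig") as txt_file:
        txt_file.write(text)


def starts_with_bom(txt_path):
    """Return True if the file begins with the UTF-8 BOM."""
    with open(txt_path, "rb") as raw:
        return raw.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

When reading such a file, open it with encoding="utf-8-sig", which strips the BOM again.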
I saved the TXT files in both ANSI and UTF-8 encoding, and their text contents compared identical!
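"Identical" here refers to the decoded text. On disk, the two files contain different bytes for non-ASCII characters such as Chinese, while both encodings store ASCII characters the same way. The quick check below uses only the standard library. It assumes the ANSI code page is GBK (cp936), as on Simplified Chinese Windows, and the sample string is arbitrary:

```python
sample = "Hello 你好"

ansi_bytes = sample.encode("gbk")    # GBK: 2 bytes per Chinese character here
utf8_bytes = sample.encode("utf-8")  # UTF-8: 3 bytes per Chinese character here

print(len(ansi_bytes), len(utf8_bytes))  # 10 12
print(ansi_bytes == utf8_bytes)          # False: the files differ byte for byte
# Decoding each file with its own encoding gives back the same text
print(ansi_bytes.decode("gbk") == utf8_bytes.decode("utf-8"))  # True
```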
docx2txt2all.py/docx2txt+ansi3all.py [converts every DOCX in the current directory to an ANSI-encoded TXT]
# coding=utf-8
import os
from docx import Document
# Get the current directory
path = os.getcwd()
# List all files in the current directory
files = os.listdir(path)
# Iterate over all files
for file in files:
    # Check whether the file is a .docx file, skipping Word's "~$" temporary owner files
    if file.endswith('.docx') and not file.startswith('~$'):
        # Build the new file name: same base name with a .txt extension
        new_file = os.path.splitext(file)[0] + '.txt'
        doc = Document(os.path.join(path, file))
        text_content = ""
        for paragraph in doc.paragraphs:
            text_content += paragraph.text
            text_content += '\n'
        # No encoding given: open() uses the system default, the ANSI code page on Windows
        with open(os.path.join(path, new_file), "w") as txt_file:
            txt_file.write(text_content)
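Writing with the ANSI encoding can fail. If the document contains a character outside the system code page, such as an emoji, txt_file.write() raises UnicodeEncodeError. The minimal sketch below looks up the default encoding with locale.getpreferredencoding(False), which is the encoding open() falls back to. It passes errors="replace", so such characters are written as "?" instead of stopping the script. Silently replacing characters is a trade-off to weigh against simply writing UTF-8. The helper name write_ansi is hypothetical:

```python
import locale


def write_ansi(text, txt_path, encoding=None):
    """Write text in the default (ANSI) encoding, replacing unencodable characters."""
    if encoding is None:
        # The encoding open() uses when none is given (the ANSI code page on Windows)
        encoding = locale.getpreferredencoding(False)
    with open(txt_path, "w", encoding=encoding, errors="replace") as txt_file:
        txt_file.write(text)
    return encoding
```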
utf8docx2tx4all.py [converts every DOCX in the current directory to a UTF-8-encoded TXT]
# coding=utf-8
import os
from docx import Document
# Get the current directory
path = os.getcwd()
# List all files in the current directory
files = os.listdir(path)
# Iterate over all files
for file in files:
    # Check whether the file is a .docx file, skipping Word's "~$" temporary owner files
    if file.endswith('.docx') and not file.startswith('~$'):
        # Build the new file name: same base name with a .txt extension
        new_file = os.path.splitext(file)[0] + '.txt'
        doc = Document(os.path.join(path, file))
        text_content = ""
        for paragraph in doc.paragraphs:
            text_content += paragraph.text
            text_content += '\n'
        # Write the text as UTF-8 regardless of the system code page
        with open(os.path.join(path, new_file), "w", encoding="UTF-8") as txt_file:
            txt_file.write(text_content)