关于自动编号的知识可以参考《在 Open XML WordprocessingML 中使用编号列表》
链接:https://learn.microsoft.com/zh-cn/previous-versions/office/ee922775(v=office.14)
python-docx库并不能直接解析出Word文档的自动编号,因为原理较为复杂,但我们希望python能够读取自动编号对应的文本。
基本解析原理
为了测试验证,我们创建一个带有编号的文档进行测试,例如:
然后我们先看看主文档中,对应的xml存储:
from docx import Documentdoc = Document(r"编号测试1.docx")
for paragraph in doc.paragraphs:print(paragraph._element.xml)break
结果:
<w:p ...><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr><w:bidi w:val="0"/><w:ind w:left="0" w:leftChars="0" w:firstLine="0" w:firstLineChars="0"/><w:rPr><w:rFonts w:hint="eastAsia"/><w:lang w:val="en-US" w:eastAsia="zh-CN"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:hint="eastAsia"/><w:lang w:val="en-US" w:eastAsia="zh-CN"/></w:rPr><w:t>第一章</w:t></w:r>
</w:p>
在微软的文档中,说明了最重要的部分:
w:numPr 元素包含自动编号元素。w:ilvl 元素从零开始表示编号等级,w:numId 元素是编号部件的索引。
w:numId 为 0 值时 ,表示编号已经被删除段落不含列表项。
所以我们可以根据段落是否存在w:numPr
并且w:numId
的值不为0判断段落是否存在自动编号。
然后我们需要获取每个w:numId
对应的自动编号状态,这个信息存储在zip压缩包的\word\numbering.xml
文件中,可以参考微软文档的示例:
w:numbering
同时包含w:num
和w:abstractNum
两种节点,其中w:num
记录了 每个numId
对应的abstractNumId
,而w:abstractNum
记录了每个abstractNumId
对应的编号格式,包含了每个级别的编号样式信息。对于w:num
,python-docx库已经帮我们解析好,可以直接读取,但w:abstractNum
节点python-docx库却并未进行解析,只能我们自己进行xml解析。
可以通过如下代码获取每个numId
对应的abstractNumId
:
from docx import Documentdoc = Document(r"编号测试1.docx")
numbering_part = doc.part.numbering_part._element
numId2abstractId = {num.numId: num.abstractNumId.val for num in numbering_part.num_lst
}
接下来我们需要解析w:abstractNum
节点,查阅python-docx库的源码可以知道,它使用lxml的etree进行xml解析。
初步解析代码为:
from docx.oxml.ns import qnabstractNumId2style = {}
for abstractNumIdTag in numbering_part.findall(qn("w:abstractNum")):abstractNumId = abstractNumIdTag.get(qn("w:abstractNumId"))for lvlTag in abstractNumIdTag.findall(qn("w:lvl")):ilvl = lvlTag.get(qn("w:ilvl"))style = {tag.tag[tag.tag.rfind("}") + 1:]: tag.get(qn("w:val"))for tag in lvlTag.xpath("./*[@w:val]", namespaces=numbering_part.nsmap)}abstractNumId2style[(int(abstractNumId), int(ilvl))] = style
print(abstractNumId2style)
注意:docx.oxml.ns的qn函数可以将w:转换为对应的命名空间名称,但对于xpath表达式却无法正确处理,所以对于xpath表达式使用namespaces传入对应的命名空间。
除了上面的解析方法以外,还可以事先将节点的所有命名空间清除后再解析,清除代码如下:
def remove_namespace(node):node_tag = node.tagif '}' in node_tag:node.tag = node_tag[node_tag.rfind("}") + 1:]for attr_key in list(node.attrib):if '}' in attr_key:new_attr_key = attr_key[attr_key.rfind("}") + 1:]node.attrib[new_attr_key] = node.attrib.pop(attr_key)for child in node:remove_namespace(child)return node
这样可以递归消除目标节点所有子节点的命名空间。
可以每个类别每个级别的自动编号的属性信息:
{(0, 0): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.', 'lvlJc': 'left'}, (0, 1): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.', 'lvlJc': 'left'}, (0, 2): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.', 'lvlJc': 'left'}, (0, 3): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.%4.', 'lvlJc': 'left'}, (0, 4): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.%4.%5.', 'lvlJc': 'left'}, (0, 5): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.%4.%5.%6.', 'lvlJc': 'left'}, (0, 6): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.%4.%5.%6.%7.', 'lvlJc': 'left'}, (0, 7): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.%4.%5.%6.%7.%8.', 'lvlJc': 'left'}, (0, 8): {'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.%4.%5.%6.%7.%8.%9.', 'lvlJc': 'left'}}
当然我们只测试了最基本的数值型自动编号,有些自动编号对应的节点没有直接的w:numFmt
节点,解析代码还需针对性调整。
微软的文档中提到,对多级列表的某一级列表进行特殊设定时,w:num
内会出现w:lvlOverride
节点,但本人使用wps反复测试过后并没有出现。估计这种格式的xml只会在老版的office中出现,而且我们也不会故意在多级列表的某一级进行特殊设定,所以我们不考虑这种情况。
还需要考虑 w:suff
元素控制的列表后缀,即列表项与段落之间的空白内容,有可能为制表符和空格,也可以什么都没有。处理代码为:
{"space": " ", "nothing": ""}.get(style.get("suff"), "\t")
多级编号处理
首先尝试读取每个段落对应的自动编号样式:
for paragraph in doc.paragraphs:numpr = paragraph._element.pPr.numPrif numpr is not None and numpr.numId.val != 0:numId = numpr.numId.valilvl = numpr.ilvl.valabstractId = numId2abstractId[numId]style = abstractNumId2style[(abstractId, ilvl)]print(style)print(paragraph.text)
结果:
{'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.', 'lvlJc': 'left'}
第一章
{'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.', 'lvlJc': 'left'}
第一节
{'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.', 'lvlJc': 'left'}
第二节
{'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.', 'lvlJc': 'left'}
第一条
{'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.%2.%3.', 'lvlJc': 'left'}
第二条
{'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.', 'lvlJc': 'left'}
第二章
{'start': '1', 'numFmt': 'decimal', 'lvlText': '%1.', 'lvlJc': 'left'}
第三章
我们需要一个计数器来记录每个样式出现的次数,从而生成其对应的编号。
cache = {}
for paragraph in doc.paragraphs:numpr = paragraph._element.pPr.numPrlvlText = ""if numpr is not None and numpr.numId.val != 0:numId = numpr.numId.valilvl = numpr.ilvl.valabstractId = numId2abstractId[numId]style = abstractNumId2style[(abstractId, ilvl)]if (abstractId, ilvl) in cache:cache[(abstractId, ilvl)] += 1else:cache[(abstractId, ilvl)] = int(style["start"])lvlText = style.get("lvlText")for i in range(0, ilvl + 1):lvlText = lvlText.replace(f'%{i + 1}', str(cache[(abstractId, i)]))suff_text = {"space": " ", "nothing": ""}.get(style.get("suff"), "\t")lvlText += suff_textprint(lvlText + paragraph.text)
结果:
1. 第一章
1.1. 第一节
1.2. 第二节
1.2.1. 第一条
1.2.2. 第二条
2. 第二章
3. 第三章
各种其他类型的编号生成
为了尽量多的支持更多类型的编号,我创建了如下测试文件:
我们没有必要获取对应的圆圈数字,圆圈就获取对应的整数。
除了三种日文编号,上面的示例几乎包含所有的编号类型。需要注意三位数以上的数字格式,其xml有些特殊,例如:
<w:lvl><w:start w:val="1"/><mc:AlternateContent><mc:Choice Requires="w14"><w:numFmt w:val="custom" w:format="001, 002, 003, ..."/></mc:Choice><mc:Fallback><w:numFmt w:val="decimal"/></mc:Fallback></mc:AlternateContent><w:suff w:val="space"/><w:lvlText w:val="%1"/><w:lvlJc w:val="left"/><w:pPr><w:tabs><w:tab w:val="left" w:pos="0"/></w:tabs></w:pPr><w:rPr><w:rFonts w:hint="default"/></w:rPr>
</w:lvl>
基于此,解析格式的代码也作出如下调整:
abstractNumId2style = {}
for abstractNumIdTag in numbering_part.findall(qn("w:abstractNum")):abstractNumId = abstractNumIdTag.get(qn("w:abstractNumId"))for lvlTag in abstractNumIdTag.findall(qn("w:lvl")):ilvl = lvlTag.get(qn("w:ilvl"))style = {tag.tag[tag.tag.rfind("}") + 1:]: tag.get(qn("w:val"))for tag in lvlTag.xpath("./*[@w:val]", namespaces=numbering_part.nsmap)}if "numFmt" not in style:numFmtVal = lvlTag.xpath("./mc:AlternateContent/mc:Fallback/w:numFmt/@w:val",namespaces=numbering_part.nsmap)if numFmtVal and numFmtVal[0] == "decimal":numFmt_format = lvlTag.xpath("./mc:AlternateContent/mc:Choice/w:numFmt/@w:format",namespaces=numbering_part.nsmap)if numFmt_format:style["numFmt"] = "decimal" + numFmt_format[0].split(",")[0]if style.get("numFmt") == "decimalZero":style["numFmt"] = "decimal01"abstractNumId2style[(int(abstractNumId), int(ilvl))] = style
目前只发现这种基于decimal
的格式,所以只针对这种自定义格式处理,其他类型的统一认为是没有自动编号。另外既然三位数的整数格式已经被我们命名为decimal001
,那么也将二位数的decimalZero
修改为decimal01
。
目前测试出这个文件有以下这些numFmt
:
bullet,cardinalText,chineseCounting,chineseLegalSimplified,decimal,decimalEnclosedCircleChinese,ideographTraditional,ideographZodiac,lowerLetter,lowerRoman,ordinal,ordinalText,upperLetter,upperRoman
下面我们预先选择一些可能比较复杂的转换编写相应的函数:
正整数转换为大写字母
代码如下:
def int2upperLetter(num):result = []while num > 0:num -= 1remainder = num % 26result.append(chr(remainder + ord('A')))num //= 26return "".join(reversed(result))
正整数转换为罗马数字
def int2upperRoman(num):t = [(1000, 'M'), (900, 'CM'), (500, 'D'),(400, 'CD'), (100, 'C'), (90, 'XC'),(50, 'L'), (40, 'XL'), (10, 'X'),(9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]roman_num = ''i = 0while num > 0:val, syb = t[i]for _ in range(num // val):roman_num += sybnum -= vali += 1return roman_num
正整数转换为英文基数字
def int2cardinalText(num):if not isinstance(num, int) or num < 0 or num > 999999999999:raise ValueError("Invalid number: must be a positive integer within four digits")base = ["Zero", "One", "Two", "Three", "Four", "Five", "Six","Seven", "Eight", "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen","Fifteen", "Sixteen", "Seventeen", "Eighteen", "Nineteen"]tens = ["", "", "Twenty", "Thirty", "Fourty","Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]thousands = ["", "Thousand", "Million", "Billion"]def two_digits(n):if n < 20:return base[n]ten, unit = divmod(n, 10)if unit == 0:return f"{tens[ten]}"else:return f"{tens[ten]}-{base[unit]}"def three_digits(n):hundred, rest = divmod(n, 100)if hundred == 0:return two_digits(rest)result = f"{base[hundred]} hundred "if rest > 0:result += two_digits(rest)return result.strip()if num < 99:return two_digits(num)chunks = []while num > 0:num, remainder = divmod(num, 1000)chunks.append(remainder)words = []for i in range(len(chunks) - 1, -1, -1):if chunks[i] == 0:continuechunk_word = three_digits(chunks[i])if thousands[i]:chunk_word += f" {thousands[i]}"words.append(chunk_word)words = " ".join(words).lower()return words[0].upper()+words[1:]
正整数转换为英文序数字
def int2ordinalText(num):if not isinstance(num, int) or num < 0 or num > 999999:raise ValueError("Invalid number: must be a positive integer within four digits")base = ["Zero", "One", "Two", "Three", "Four", "Five", "Six","Seven", "Eight", "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen","Fifteen", "Sixteen", "Seventeen", "Eighteen", "Nineteen"]baseth = ['Zeroth', 'First', 'Second', 'Third', 'Fourth', 'Fifth', 'Sixth', 'Seventh','Eighth', 'Ninth', 'Tenth', 'Eleventh', 'Twelfth', 'Thirteenth', 'Fourteenth','Fifteenth', 'Sixteenth', 'Seventeenth', 'Eighteenth', 'Nineteenth', 'Twentieth']tens = ["", "", "Twenty", "Thirty", "Fourty","Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]tensth = ["", "", "Twentieth", "Thirtieth", "Fortieth","Fiftieth", "Sixtieth", "Seventieth", "Eightieth", "Ninetieth"]def two_digits(n):if n <= 20:return baseth[n]ten, unit = divmod(n, 10)result = tensth[ten]if unit != 0:result = f"{tens[ten]}-{baseth[unit]}"return resultthousand, num = divmod(num, 1000)result = []if thousand > 0:if num == 0:return f"{int2cardinalText(thousand)} thousandth"result.append(f"{int2cardinalText(thousand)} thousand")hundred, num = divmod(num, 100)if hundred > 0:if num == 0:result.append(f"{base[hundred]} hundredth")return " ".join(result)result.append(f"{base[hundred]} hundred")result.append(two_digits(num))result = " ".join(result).lower()return result[0].upper() + result[1:]
会复用前面的基数字转换规则。
正整数转换为中文数字
import redef int2Chinese(num, ch_num, units):if not (0 <= num <= 99999999):raise ValueError("仅支持小于一亿以内的正整数")def int2Chinese_in(num, ch_num, units):if not (0 <= num <= 9999):raise ValueError("仅支持小于一万以内的正整数")result = [ch_num[int(i)] + unit for i, unit in zip(reversed(str(num).zfill(4)), units)]result = "".join(reversed(result))zero_char = ch_num[0]result = re.sub(f"(?:{zero_char}[{units}])+", zero_char, result)result = result.rstrip(units[0])if result != zero_char:result = result.rstrip(zero_char)if result.lstrip(zero_char).startswith("一十"):result = result.replace("一", "")return resultif num < 10000:result = int2Chinese_in(num, ch_num, units)else:left = num // 10000right = num % 10000result = int2Chinese_in(left, ch_num, units) + "万" + int2Chinese_in(right, ch_num, units)if result != ch_num[0]:result = result.strip(ch_num[0])return resultdef int2ChineseCounting(num):return int2Chinese(num, ch_num='〇一二三四五六七八九', units='个十百千')def int2ChineseLegalSimplified(num):return int2Chinese(num, ch_num='零壹贰叁肆伍陆柒捌玖', units='个拾佰仟')
整体封装并改进
最终封装成为一个类:
import refrom docx import Document
from docx.oxml.ns import qnclass WithNumberDocxReader:ideographTraditional = "甲乙丙丁戊己庚辛壬癸"ideographZodiac = "子丑寅卯辰巳午未申酉戌亥"def __init__(self, docx, gap_text="\t"):self.docx = Document(docx)self.numId2style = self.get_style_data()self.gap_text = gap_textself.cnt = {}self.cache = {}self.result = []@propertydef texts(self):if self.result:return self.result.copy()self.cnt.clear()self.cache.clear()for paragraph in self.docx.paragraphs:number_text = self.get_number_text(paragraph._element.pPr.numPr)self.result.append(number_text + paragraph.text)return self.result.copy()def get_style_data(self):numbering_part = self.docx.part.numbering_part._elementabstractId2numId = {num.abstractNumId.val: num.numId for num in numbering_part.num_lst}numId2style = {}for abstractNumIdTag in numbering_part.findall(qn("w:abstractNum")):abstractNumId = abstractNumIdTag.get(qn("w:abstractNumId"))numId = abstractId2numId[int(abstractNumId)]for lvlTag in abstractNumIdTag.findall(qn("w:lvl")):ilvl = lvlTag.get(qn("w:ilvl"))style = {tag.tag[tag.tag.rfind("}") + 1:]: tag.get(qn("w:val"))for tag in lvlTag.xpath("./*[@w:val]", namespaces=numbering_part.nsmap)}if "numFmt" not in style:numFmtVal = lvlTag.xpath("./mc:AlternateContent/mc:Fallback/w:numFmt/@w:val",namespaces=numbering_part.nsmap)if numFmtVal and numFmtVal[0] == "decimal":numFmt_format = lvlTag.xpath("./mc:AlternateContent/mc:Choice/w:numFmt/@w:format",namespaces=numbering_part.nsmap)if numFmt_format:style["numFmt"] = "decimal" + numFmt_format[0].split(",")[0]if style.get("numFmt") == "decimalZero":style["numFmt"] = "decimal01"numId2style[(numId, int(ilvl))] = stylereturn numId2style@staticmethoddef int2upperLetter(num):result = []while num > 0:num -= 1remainder = num % 26result.append(chr(remainder + ord('A')))num //= 26return "".join(reversed(result))@staticmethoddef int2upperRoman(num):t = [(1000, 'M'), (900, 'CM'), (500, 'D'),(400, 'CD'), (100, 'C'), (90, 'XC'),(50, 'L'), (40, 'XL'), (10, 'X'),(9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]roman_num = ''i = 0while num > 0:val, syb = t[i]for _ in range(num // val):roman_num += sybnum -= vali += 1return roman_num@staticmethoddef int2cardinalText(num):if not isinstance(num, int) or num < 0 or num > 999999999:raise ValueError("Invalid number: must be a positive integer within four digits")base = ["Zero", "One", "Two", "Three", "Four", "Five", "Six","Seven", "Eight", "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen","Fifteen", "Sixteen", "Seventeen", "Eighteen", "Nineteen"]tens = ["", "", "Twenty", "Thirty", "Fourty","Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]thousands = ["", "Thousand", "Million", "Billion"]def two_digits(n):if n < 20:return base[n]ten, unit = divmod(n, 10)if unit == 0:return f"{tens[ten]}"else:return f"{tens[ten]}-{base[unit]}"def three_digits(n):hundred, rest = divmod(n, 100)if hundred == 0:return two_digits(rest)result = f"{base[hundred]} hundred "if rest > 0:result += two_digits(rest)return result.strip()if num < 99:return two_digits(num)chunks = []while num > 0:num, remainder = divmod(num, 1000)chunks.append(remainder)words = []for i in range(len(chunks) - 1, -1, -1):if chunks[i] == 0:continuechunk_word = three_digits(chunks[i])if thousands[i]:chunk_word += f" {thousands[i]}"words.append(chunk_word)words = " ".join(words).lower()return words[0].upper() + words[1:]@staticmethoddef int2ordinalText(num):if not isinstance(num, int) or num < 0 or num > 999999:raise ValueError("Invalid number: must be a positive integer within four digits")base = ["Zero", "One", "Two", "Three", "Four", "Five", "Six","Seven", "Eight", "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen","Fifteen", "Sixteen", "Seventeen", "Eighteen", "Nineteen"]baseth = ['Zeroth', 'First', 'Second', 'Third', 'Fourth', 'Fifth', 'Sixth', 'Seventh','Eighth', 'Ninth', 'Tenth', 'Eleventh', 'Twelfth', 'Thirteenth', 'Fourteenth','Fifteenth', 'Sixteenth', 'Seventeenth', 'Eighteenth', 'Nineteenth', 'Twentieth']tens = ["", "", "Twenty", "Thirty", "Fourty","Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]tensth = ["", "", "Twentieth", "Thirtieth", "Fortieth","Fiftieth", "Sixtieth", "Seventieth", "Eightieth", "Ninetieth"]def two_digits(n):if n <= 20:return baseth[n]ten, unit = divmod(n, 10)result = tensth[ten]if unit != 0:result = f"{tens[ten]}-{baseth[unit]}"return resultthousand, num = divmod(num, 1000)result = []if thousand > 0:if num == 0:return f"{WithNumberDocxReader.int2cardinalText(thousand)} thousandth"result.append(f"{WithNumberDocxReader.int2cardinalText(thousand)} thousand")hundred, num = divmod(num, 100)if hundred > 0:if num == 0:result.append(f"{base[hundred]} hundredth")return " ".join(result)result.append(f"{base[hundred]} hundred")result.append(two_digits(num))result = " ".join(result).lower()return result[0].upper() + result[1:]@staticmethoddef int2Chinese(num, ch_num, units):if not (0 <= num <= 99999999):raise ValueError("仅支持小于一亿以内的正整数")def int2Chinese_in(num, ch_num, units):if not (0 <= num <= 9999):raise ValueError("仅支持小于一万以内的正整数")result = [ch_num[int(i)] + unit for i, unit in zip(reversed(str(num).zfill(4)), units)]result = "".join(reversed(result))zero_char = ch_num[0]result = re.sub(f"(?:{zero_char}[{units}])+", zero_char, result)result = result.rstrip(units[0])if result != zero_char:result = result.rstrip(zero_char)if result.lstrip(zero_char).startswith("一十"):result = result.replace("一", "")return resultif num < 10000:result = int2Chinese_in(num, ch_num, units)else:left = num // 10000right = num % 10000result = int2Chinese_in(left, ch_num, units) + "万" + int2Chinese_in(right, ch_num, units)if result != ch_num[0]:result = result.strip(ch_num[0])return result@staticmethoddef int2ChineseCounting(num):return WithNumberDocxReader.int2Chinese(num, ch_num='〇一二三四五六七八九', units='个十百千')@staticmethoddef int2ChineseLegalSimplified(num):return WithNumberDocxReader.int2Chinese(num, ch_num='零壹贰叁肆伍陆柒捌玖', units='个拾佰仟')def get_number_text(self, numpr):if numpr is None or numpr.numId.val == 0:return ""numId = numpr.numId.valilvl = numpr.ilvl.valstyle = self.numId2style[(numId, ilvl)]numFmt: str = style.get("numFmt")lvlText = style.get("lvlText")if (numId, ilvl) in self.cnt:self.cnt[(numId, ilvl)] += 1else:self.cnt[(numId, ilvl)] = int(style["start"])pos = self.cnt[(numId, ilvl)]num_text = str(pos)if numFmt.startswith('decimal'):num_text = num_text.zfill(numFmt.count("0") + 1)elif numFmt == 'upperRoman':num_text = self.int2upperRoman(pos)elif numFmt == 'lowerRoman':num_text = self.int2upperRoman(pos).lower()elif numFmt == 'upperLetter':num_text = self.int2upperLetter(pos)elif numFmt == 'lowerLetter':num_text = self.int2upperLetter(pos).lower()elif numFmt == 'ordinal':num_text = f"{pos}{'th' if 11 <= pos <= 13 else {1: 'st', 2: 'nd', 3: 'rd'}.get(pos % 10, 'th')}"elif numFmt == 'cardinalText':num_text = self.int2cardinalText(pos)elif numFmt == 'ordinalText':num_text = self.int2ordinalText(pos)elif numFmt == 'ideographTraditional':if 1 <= pos <= 10:num_text = self.ideographTraditional[pos - 1]elif numFmt == 'ideographZodiac':if 1 <= pos <= 12:num_text = self.ideographZodiac[pos - 1]elif numFmt == 'chineseCounting':num_text = self.int2ChineseCounting(pos)elif numFmt == 'chineseLegalSimplified':num_text = self.int2ChineseLegalSimplified(pos)elif numFmt == 'decimalEnclosedCircleChinese':passself.cache[(numId, ilvl)] = num_textfor i in range(0, ilvl + 1):lvlText = lvlText.replace(f'%{i + 1}', self.cache.get((numId, i), ""))suff_text = {"space": " ", "nothing": ""}.get(style.get("suff"), self.gap_text)lvlText += suff_textreturn lvlText
调用测试:
if __name__ == '__main__':doc = WithNumberDocxReader(r"编号测试2.docx", "")for text in doc.texts:print(text)
顺利达到打印出对应的字符:
点符
1.十进制数
01.零加十进制数
001 零零加十进制数
0001 零零零加十进制数
I 大写罗马数字 (I)
II 大写罗马数字 (II)
i 小写罗马数字
A.大写字母A
a 小写字母 (a)
0th 序数 (1st, 2nd, 3rd)
Twelve 基数字 (One, Two Three)
First 序数字 (First, Second, Third)癸 甲乙丙丁戊己庚辛壬癸
壹 中文大写数字
10 圆圈数字
子 子丑寅卯辰巳午未申酉戌亥第一章 中文数字