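A Complete Tutorial and Code Walkthrough: Two-Way Conversion Between Word Documents and JSON in Python
Office automation and data-processing workflows increasingly need to convert between Word documents and JSON. This article explains how to convert between .docx files and JSON in both directions with Python and provides a complete solution.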
1. Overview and Use Cases
Converting between Word documents and JSON is useful in several scenarios:
- Content extraction and analysis: convert the content of a Word document into structured JSON for further processing and analysis
- Automated report generation: fill JSON data into a predefined Word template
- Format conversion: use JSON as an intermediate step between Word and other formats such as Markdown or HTML
- Content management systems: version-control document content and store it in structured form
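2. Core Technologies and Library Choice
The conversion relies mainly on the following Python libraries:
- python-docx: the mainstream library for reading and writing Word .docx files
- json: the Python standard-library module for handling JSON data
Compared with alternatives such as Simplify-Docx or the FastMCP framework, using python-docx directly offers more flexibility and control, which suits scenarios that need fine-grained handling of document styles.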
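As a rough illustration of the target format, the sketch below builds a small dictionary with the three top-level keys used in this article (paragraphs, styles, tables) and serializes it with the standard json module. The sample values are invented for illustration; only the key names come from the converter described here.

```python
import json

# Hypothetical sample data; the key names mirror the converter's output.
doc_data = {
    "paragraphs": [
        {"text": "Quarterly report", "style": "Heading 1",
         "runs": [{"text": "Quarterly report", "bold": True}]},
    ],
    "styles": [{"name": "Normal", "type": "paragraph", "font_size": 11.0}],
    "tables": [{"rows": [{"cells": [{"text": "A1"}, {"text": "B1"}]}]}],
}

# ensure_ascii=False keeps non-ASCII text readable in the output.
serialized = json.dumps(doc_data, ensure_ascii=False, indent=2)
# Parsing the string back yields an equal dictionary.
restored = json.loads(serialized)
```

Because the structure is plain dictionaries and lists, it can be stored, diffed, or edited with any JSON tooling before being turned back into a document.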
3. Code Walkthrough
3.1 Extracting JSON Data from a Word Document
The docx_to_json function converts a Word document into structured JSON data. Its core logic is as follows:
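from docx import Document
from docx.enum.style import WD_STYLE_TYPE

def docx_to_json(docx_path):
    document = Document(docx_path)
    doc_data = {
        "paragraphs": [],
        "styles": [],
        "tables": []
    }
    # Extract the document's style information
    styles = document.styles
    for style in styles:
        if style.type == WD_STYLE_TYPE.PARAGRAPH:
            style_info = {}
            # Only extract attributes that are set
            if style.name:
                style_info["name"] = style.name
            if style.font.name:
                style_info["font_name"] = style.font.name
            # More style attributes are extracted here...
            if style_info:
                doc_data["styles"].append(style_info)
    # Paragraphs and tables are extracted in the full listing
    return doc_data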
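This approach extracts the text together with the style attributes that are set explicitly, so the resulting JSON carries the information needed to rebuild the document's formatting. Attributes that python-docx reports as None, meaning they are inherited rather than set, are omitted.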
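The "only keep attributes that are set" pattern can be factored into a small helper. This is a sketch, not part of the original code: compact_attrs is a hypothetical name, and unlike the truthiness checks used for some attributes in the article's code, it keeps explicit False and 0 values and drops only None.

```python
def compact_attrs(**attrs):
    """Return only the attributes whose value is not None.

    python-docx reports inherited (unset) properties as None, so dropping
    None keeps the JSON small while preserving explicit False or 0 values.
    """
    return {name: value for name, value in attrs.items() if value is not None}


# Example: bold is explicitly off, italic is inherited, the size is set.
style_info = compact_attrs(name="Normal", bold=False, italic=None, font_size=11.0)
```

Keeping explicit False matters for round trips: a run that turns bold off inside a bold heading should not be confused with a run that simply inherits its style.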
3.2 Restoring a Word Document from JSON Data
The json_to_docx function performs the reverse conversion. Its key points are:
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH

def json_to_docx(json_data, output_path):
    document = Document()
    # Paragraphs and text styles
    for para_data in json_data.get("paragraphs", []):
        style_name = para_data.get("style", "Normal")
        try:
            paragraph = document.add_paragraph(style=style_name)
        except (KeyError, ValueError):
            # Unknown or unsuitable style name: fall back to Normal
            paragraph = document.add_paragraph(style="Normal")
        # Paragraph alignment (docx_to_json stores it under "paragraph_format")
        alignment_str = para_data.get("paragraph_format", {}).get("alignment")
        if alignment_str:
            if "CENTER" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
            # Other alignment values are handled here...
        # Text runs and their styles
        runs_data = para_data.get("runs", [])
        if runs_data:
            for run_data in runs_data:
                text = run_data.get("text", "")
                run = paragraph.add_run(text)
                # Bold, italic, underline and similar attributes
                run.bold = run_data.get("bold", False)
                run.italic = run_data.get("italic", False)
                # More style settings...
    document.save(output_path)
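This implementation fills in defaults for attributes missing from the JSON so that the generated document stays readable. Note that the defaults are written explicitly (for example, bold is set to False), which overrides any value the run would otherwise inherit from its paragraph style.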
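Matching the alignment by substring is order-sensitive, because "JUSTIFY" is also a substring of "JUSTIFY_MED". One way around this, sketched here with a hypothetical helper alignment_name, is to extract the leading member name from the stored string (python-docx renders these enum values in a form like "CENTER (1)") and compare it exactly. The returned name could then be resolved with getattr(WD_ALIGN_PARAGRAPH, name).

```python
import re

# Member names of python-docx's paragraph alignment enumeration.
KNOWN_ALIGNMENTS = {
    "LEFT", "CENTER", "RIGHT", "JUSTIFY", "DISTRIBUTE",
    "JUSTIFY_MED", "JUSTIFY_HI", "JUSTIFY_LOW", "THAI_JUSTIFY",
}


def alignment_name(alignment_str):
    """Extract the enum member name from a string such as 'CENTER (1)'.

    Also accepts a dotted form such as 'WD_ALIGN_PARAGRAPH.CENTER'.
    Returns None when the string does not name a known alignment.
    """
    match = re.match(r"\s*(?:\w+\.)?([A-Z_]+)", alignment_str or "")
    if match and match.group(1) in KNOWN_ALIGNMENTS:
        return match.group(1)
    return None
```

Comparing the whole member name means the order of the checks no longer matters, and unknown values are reported as None instead of being matched by accident.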
3.3 Table Handling
The code also handles Word tables:
# Process the table data
for table_data in json_data.get("tables", []):
    if table_data.get("rows"):
        # Create the table dynamically
        first_row = table_data["rows"][0]
        num_rows = len(table_data["rows"])
        num_cols = len(first_row["cells"]) if first_row.get("cells") else 1
        table = document.add_table(rows=num_rows, cols=num_cols)
        # Fill in the table content
        for i, row_data in enumerate(table_data["rows"]):
            row = table.rows[i]
            for j, cell_data in enumerate(row_data.get("cells", [])):
                if j < len(row.cells):
                    cell = row.cells[j]
                    cell.text = cell_data.get("text", "")
The table is created dynamically: the row count comes from the number of rows in the JSON data, and the column count from the first row.
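Taking the column count from the first row alone silently drops cells from any longer row. A slightly more defensive sketch, using a hypothetical helper table_shape, sizes the grid by the widest row instead; the result can be passed to document.add_table(rows=..., cols=...).

```python
def table_shape(table_data):
    """Return (num_rows, num_cols) for a table dict, using the widest row."""
    rows = table_data.get("rows", [])
    num_cols = max((len(row.get("cells", [])) for row in rows), default=0)
    # Mirror the original fallback of one column when no cells are present.
    return len(rows), max(num_cols, 1)


table_data = {"rows": [
    {"cells": [{"text": "A1"}]},
    {"cells": [{"text": "A2"}, {"text": "B2"}, {"text": "C2"}]},
]}
shape = table_shape(table_data)  # (2, 3)
```

Shorter rows then simply leave their trailing cells empty, which matches how the fill loop above already skips missing cells.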
4. Usage
4.1 Environment Setup
First install the required dependency:
pip install python-docx
4.2 Basic Usage
Converting a Word document to JSON:
import json

from docx_to_json_converter import docx_to_json

# Convert the Word document to JSON
json_data = docx_to_json("example.docx")
# Save the JSON file
with open("document_data.json", "w", encoding="utf-8") as f:
    json.dump(json_data, f, ensure_ascii=False, indent=2)
Restoring a Word document from JSON data:
import json

from docx_to_json_converter import json_to_docx

# Read the JSON data
with open("document_data.json", "r", encoding="utf-8") as f:
    json_data = json.load(f)
# Convert it to a Word document
json_to_docx(json_data, "restored.docx")
4.3 Advanced Usage
The code also provides an interactive command-line interface. Run the script directly to choose the conversion direction:
python docx_to_json_converter.py
Choose the operation at the prompt (1 or 2), then enter the file path to run the conversion.
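For scripted use, the same choice could be expressed without prompts. The sketch below is an assumption, not part of the original script: it shows an argparse parser with hypothetical subcommand names to-json and to-docx, plus a helper that derives the output path with pathlib, following the script's naming (a .json file, or a file ending in _restored.docx).

```python
import argparse
from pathlib import Path


def default_output_path(input_path, direction):
    """Derive the output file name next to the input file."""
    path = Path(input_path)
    if direction == "to-json":
        return path.with_suffix(".json")
    return path.with_name(path.stem + "_restored.docx")


def build_parser():
    """Build a parser with a direction and an input path (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Docx <-> JSON converter")
    parser.add_argument("direction", choices=["to-json", "to-docx"])
    parser.add_argument("input_path")
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(["to-json", "reports/q1.docx"])
output_path = default_output_path(args.input_path, args.direction)
```

Deriving the name from the path's suffix avoids a pitfall of plain string replacement, where an input without the expected extension would map to itself and be overwritten.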
5. Extensions and Advanced Techniques
5.1 Reusing Style Templates
In practice, reusing style templates can improve efficiency:
# Create a style template
def create_style_template(docx_path):
    json_data = docx_to_json(docx_path)
    # Extract and keep the style information
    template = {
        "styles": json_data["styles"],
        "metadata": {"created_time": "2023-01-01", "type": "report"}
    }
    return template
This approach is particularly suited to generating standardized documents in bulk, such as reports and contracts.
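A template built this way is plain data, so it can be saved once and reused. The following sketch assumes the template layout described in this section and adds three hypothetical helpers: make_template stamps the metadata with the current date, save_template writes the template to disk, and load_template reads it back through functools.lru_cache so that repeated loads of the same path are served from memory.

```python
import json
from datetime import date
from functools import lru_cache


def make_template(styles, doc_type="report"):
    """Wrap extracted style dicts with metadata stamped with today's date."""
    return {
        "styles": styles,
        "metadata": {"created_time": date.today().isoformat(), "type": doc_type},
    }


def save_template(template, path):
    """Write the template to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(template, f, ensure_ascii=False, indent=2)


@lru_cache(maxsize=32)
def load_template(path):
    """Load a template; the result is cached per path.

    The cached dict is shared between callers, so treat it as read-only.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

If a template file changes on disk while the process is running, load_template.cache_clear() discards the cached copies.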
5.2 Integrating with LangChain
The tool can be combined with AI frameworks such as LangChain for intelligent document processing:
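# Note: recent LangChain releases expose this loader from
# langchain_community.document_loaders instead.
from langchain.document_loaders import Docx2txtLoader
# Load the generated Word document
loader = Docx2txtLoader("restored.docx")
documents = loader.load()
# Continue with text analysis, question answering, or other AI processing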
This combination opens up further document-processing possibilities, such as automatic summarization and content classification.
6. Performance Suggestions
- Large files: process large Word documents in chunks to avoid running out of memory
- Caching: cache frequently used style templates to speed up conversion
- Batch processing: convert large numbers of documents in parallel
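The batch-processing suggestion can be sketched with the standard library's concurrent.futures. convert_many is a hypothetical helper that applies any single-file converter (for example, a function that calls docx_to_json and writes the result) to many paths and collects errors per file instead of aborting the whole batch. Threads are used here for simplicity; whether processes would be faster for CPU-heavy parsing is something to measure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def convert_many(paths, convert_one, max_workers=4):
    """Run convert_one(path) for each path in parallel.

    Returns (results, errors): two dicts keyed by path.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one file fails
                errors[path] = exc
    return results, errors


def fake_convert(path):
    """Stand-in converter used only for this demonstration."""
    if not path.endswith(".docx"):
        raise ValueError(f"not a .docx file: {path}")
    return {"paragraphs": [], "styles": [], "tables": [], "source": path}


results, errors = convert_many(["a.docx", "b.docx", "notes.txt"], fake_convert)
```

After the call, results holds the two successful conversions and errors holds the exception raised for notes.txt, so a failed file can be logged or retried without losing the rest of the batch.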
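7. Summary
The two-way Word/JSON conversion described in this article has the following strengths:
- Coverage: it converts the core elements of a Word document, including text, styles, and tables
- Flexibility: it offers both an API and a command-line interface to suit different scenarios
- Practicality: the code can be used as-is and is easy to extend
A converter like this is valuable in document automation, content management systems, and data migration. Integrating further tools, such as pandoc or OCR, can extend its document-processing capabilities.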
The complete code is listed below. Copy it as-is or adapt it to your needs.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Docx to JSON and JSON to Docx converter
Extracts the styles of a docx file into a JSON object and restores a docx file from that JSON object
"""
import json
import os

from docx import Document
from docx.enum.style import WD_STYLE_TYPE
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml.ns import qn
from docx.shared import Pt, RGBColor


def docx_to_json(docx_path):
    """
    Convert a docx file to a JSON-serializable dict.
    Style attributes whose value is None are omitted.
    """
    document = Document(docx_path)
    # Dictionary holding all extracted content
    doc_data = {
        "paragraphs": [],
        "styles": [],
        "tables": []
    }
    # Collect all paragraph styles
    styles = document.styles
    for style in styles:
        if style.type == WD_STYLE_TYPE.PARAGRAPH:
            style_info = {}
            # Only add attributes that are set
            if style.name:
                style_info["name"] = style.name
            if style.type:
                style_info["type"] = "paragraph"
            if style.font.name:
                style_info["font_name"] = style.font.name
            if style.font.size:
                style_info["font_size"] = style.font.size.pt
            if style.font.bold is not None:
                style_info["bold"] = style.font.bold
            if style.font.italic is not None:
                style_info["italic"] = style.font.italic
            if style.font.underline is not None:
                style_info["underline"] = style.font.underline
            if style.font.color.rgb:
                style_info["color"] = str(style.font.color.rgb)
            # Add paragraph format information
            if style.paragraph_format:
                paragraph_format = {}
                if style.paragraph_format.alignment is not None:
                    paragraph_format["alignment"] = str(style.paragraph_format.alignment)
                if style.paragraph_format.left_indent:
                    paragraph_format["left_indent"] = style.paragraph_format.left_indent.pt
                if style.paragraph_format.right_indent:
                    paragraph_format["right_indent"] = style.paragraph_format.right_indent.pt
                if style.paragraph_format.first_line_indent:
                    paragraph_format["first_line_indent"] = style.paragraph_format.first_line_indent.pt
                if style.paragraph_format.space_before:
                    paragraph_format["space_before"] = style.paragraph_format.space_before.pt
                if style.paragraph_format.space_after:
                    paragraph_format["space_after"] = style.paragraph_format.space_after.pt
                # Cap line_spacing to avoid overflow; larger values (fixed heights) are skipped
                if style.paragraph_format.line_spacing and style.paragraph_format.line_spacing <= 100:
                    paragraph_format["line_spacing"] = style.paragraph_format.line_spacing
                if style.paragraph_format.keep_with_next is not None:
                    paragraph_format["keep_with_next"] = style.paragraph_format.keep_with_next
                if style.paragraph_format.keep_together is not None:
                    paragraph_format["keep_together"] = style.paragraph_format.keep_together
                if style.paragraph_format.page_break_before is not None:
                    paragraph_format["page_break_before"] = style.paragraph_format.page_break_before
                if style.paragraph_format.widow_control is not None:
                    paragraph_format["widow_control"] = style.paragraph_format.widow_control
                if paragraph_format:
                    style_info["paragraph_format"] = paragraph_format
            # Only add style_info when it is not empty
            if style_info:
                doc_data["styles"].append(style_info)
    # Collect all paragraphs
    for para in document.paragraphs:
        para_info = {}
        # Only add attributes that are set
        if para.text:
            para_info["text"] = para.text
        if para.style and para.style.name:
            para_info["style"] = para.style.name
        # Add paragraph format information
        if para.paragraph_format:
            paragraph_format = {}
            if para.paragraph_format.alignment is not None:
                paragraph_format["alignment"] = str(para.paragraph_format.alignment)
            if para.paragraph_format.left_indent:
                paragraph_format["left_indent"] = para.paragraph_format.left_indent.pt
            if para.paragraph_format.right_indent:
                paragraph_format["right_indent"] = para.paragraph_format.right_indent.pt
            if para.paragraph_format.first_line_indent:
                paragraph_format["first_line_indent"] = para.paragraph_format.first_line_indent.pt
            if para.paragraph_format.space_before:
                paragraph_format["space_before"] = para.paragraph_format.space_before.pt
            if para.paragraph_format.space_after:
                paragraph_format["space_after"] = para.paragraph_format.space_after.pt
            # Cap line_spacing to avoid overflow; larger values (fixed heights) are skipped
            if para.paragraph_format.line_spacing and para.paragraph_format.line_spacing <= 100:
                paragraph_format["line_spacing"] = para.paragraph_format.line_spacing
            if para.paragraph_format.keep_with_next is not None:
                paragraph_format["keep_with_next"] = para.paragraph_format.keep_with_next
            if para.paragraph_format.keep_together is not None:
                paragraph_format["keep_together"] = para.paragraph_format.keep_together
            if para.paragraph_format.page_break_before is not None:
                paragraph_format["page_break_before"] = para.paragraph_format.page_break_before
            if para.paragraph_format.widow_control is not None:
                paragraph_format["widow_control"] = para.paragraph_format.widow_control
            if paragraph_format:
                para_info["paragraph_format"] = paragraph_format
        # Process the runs
        runs_list = []
        for run in para.runs:
            run_info = {}
            # Only add attributes that are set
            if run.text:
                run_info["text"] = run.text
            if run.bold is not None:
                run_info["bold"] = run.bold
            if run.italic is not None:
                run_info["italic"] = run.italic
            if run.underline is not None:
                run_info["underline"] = run.underline
            if run.font.name:
                run_info["font_name"] = run.font.name
            if run.font.size:
                run_info["font_size"] = run.font.size.pt
            if run.font.color.rgb:
                run_info["color"] = str(run.font.color.rgb)
            if run.font.highlight_color:
                run_info["highlight_color"] = str(run.font.highlight_color)
            if run.font.strike is not None:
                run_info["strike"] = run.font.strike
            if run.font.superscript is not None:
                run_info["superscript"] = run.font.superscript
            if run.font.subscript is not None:
                run_info["subscript"] = run.font.subscript
            if run.font.all_caps is not None:
                run_info["all_caps"] = run.font.all_caps
            if run.font.small_caps is not None:
                run_info["small_caps"] = run.font.small_caps
            # Only add run_info when it is not empty
            if run_info:
                runs_list.append(run_info)
        if runs_list:
            para_info["runs"] = runs_list
        # Only add para_info when it is not empty
        if para_info:
            doc_data["paragraphs"].append(para_info)
    # Collect all tables
    for table in document.tables:
        table_info = {
            "rows": []
        }
        # Add table attributes
        if hasattr(table, 'style') and table.style:
            table_info["style"] = table.style.name
        for row in table.rows:
            row_info = {
                "cells": []
            }
            for cell in row.cells:
                cell_info = {}
                # Only add attributes that are set
                if cell.text:
                    cell_info["text"] = cell.text
                paragraphs_list = []
                # Collect the paragraphs inside the cell
                for para in cell.paragraphs:
                    para_dict = {}
                    if para.text:
                        para_dict["text"] = para.text
                    if para.style and para.style.name:
                        para_dict["style"] = para.style.name
                    # Add paragraph format information
                    if para.paragraph_format:
                        paragraph_format = {}
                        if para.paragraph_format.alignment is not None:
                            paragraph_format["alignment"] = str(para.paragraph_format.alignment)
                        if para.paragraph_format.left_indent:
                            paragraph_format["left_indent"] = para.paragraph_format.left_indent.pt
                        if para.paragraph_format.right_indent:
                            paragraph_format["right_indent"] = para.paragraph_format.right_indent.pt
                        if para.paragraph_format.first_line_indent:
                            paragraph_format["first_line_indent"] = para.paragraph_format.first_line_indent.pt
                        if para.paragraph_format.space_before:
                            paragraph_format["space_before"] = para.paragraph_format.space_before.pt
                        if para.paragraph_format.space_after:
                            paragraph_format["space_after"] = para.paragraph_format.space_after.pt
                        # Cap line_spacing to avoid overflow; larger values (fixed heights) are skipped
                        if para.paragraph_format.line_spacing and para.paragraph_format.line_spacing <= 100:
                            paragraph_format["line_spacing"] = para.paragraph_format.line_spacing
                        if paragraph_format:
                            para_dict["paragraph_format"] = paragraph_format
                    if para_dict:
                        paragraphs_list.append(para_dict)
                if paragraphs_list:
                    cell_info["paragraphs"] = paragraphs_list
                if cell_info:
                    row_info["cells"].append(cell_info)
            if row_info["cells"]:
                table_info["rows"].append(row_info)
        if table_info["rows"]:
            doc_data["tables"].append(table_info)
    return doc_data


def json_to_docx(json_data, output_path):
    """
    Convert JSON data to a docx file.
    Missing style attributes are given default values.
    """
    document = Document()
    # Add the paragraphs
    for para_data in json_data.get("paragraphs", []):
        # Use the Normal style by default
        style_name = para_data.get("style", "Normal")
        try:
            paragraph = document.add_paragraph(style=style_name)
        except (KeyError, ValueError):
            # Unknown or unsuitable style name: fall back to Normal
            paragraph = document.add_paragraph(style="Normal")
        # Apply the paragraph format
        paragraph_format_data = para_data.get("paragraph_format", {})
        if paragraph_format_data:
            # Paragraph alignment
            alignment_str = paragraph_format_data.get("alignment")
            if alignment_str:
                # Parse the alignment string; JUSTIFY_MED is tested before
                # JUSTIFY because "JUSTIFY" is a substring of it
                if "LEFT" in alignment_str:
                    paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
                elif "CENTER" in alignment_str:
                    paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
                elif "RIGHT" in alignment_str:
                    paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT
                elif "JUSTIFY_MED" in alignment_str:
                    paragraph.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY_MED
                elif "JUSTIFY" in alignment_str:
                    paragraph.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
                elif "DISTRIBUTE" in alignment_str:
                    paragraph.alignment = WD_ALIGN_PARAGRAPH.DISTRIBUTE
            # Paragraph spacing and indentation
            if "left_indent" in paragraph_format_data:
                paragraph.paragraph_format.left_indent = Pt(paragraph_format_data["left_indent"])
            if "right_indent" in paragraph_format_data:
                paragraph.paragraph_format.right_indent = Pt(paragraph_format_data["right_indent"])
            if "first_line_indent" in paragraph_format_data:
                paragraph.paragraph_format.first_line_indent = Pt(paragraph_format_data["first_line_indent"])
            if "space_before" in paragraph_format_data:
                paragraph.paragraph_format.space_before = Pt(paragraph_format_data["space_before"])
            if "space_after" in paragraph_format_data:
                paragraph.paragraph_format.space_after = Pt(paragraph_format_data["space_after"])
            # Cap line_spacing to avoid overflow
            if "line_spacing" in paragraph_format_data and paragraph_format_data["line_spacing"] <= 100:
                paragraph.paragraph_format.line_spacing = paragraph_format_data["line_spacing"]
            if "keep_with_next" in paragraph_format_data:
                paragraph.paragraph_format.keep_with_next = paragraph_format_data["keep_with_next"]
            if "keep_together" in paragraph_format_data:
                paragraph.paragraph_format.keep_together = paragraph_format_data["keep_together"]
            if "page_break_before" in paragraph_format_data:
                paragraph.paragraph_format.page_break_before = paragraph_format_data["page_break_before"]
            if "widow_control" in paragraph_format_data:
                paragraph.paragraph_format.widow_control = paragraph_format_data["widow_control"]
        # Clear any default text before adding the runs
        paragraph.clear()
        # Process the runs
        runs_data = para_data.get("runs", [])
        if runs_data:
            for run_data in runs_data:
                text = run_data.get("text", "")
                run = paragraph.add_run(text)
                # Run attributes default to False when missing; this writes an
                # explicit value that overrides the paragraph style
                run.bold = run_data.get("bold", False)
                run.italic = run_data.get("italic", False)
                run.underline = run_data.get("underline", False)
                run.font.strike = run_data.get("strike", False)
                run.font.superscript = run_data.get("superscript", False)
                run.font.subscript = run_data.get("subscript", False)
                run.font.all_caps = run_data.get("all_caps", False)
                run.font.small_caps = run_data.get("small_caps", False)
                # Font size, defaulting to Pt(12)
                font_size = run_data.get("font_size")
                if font_size:
                    run.font.size = Pt(font_size)
                else:
                    run.font.size = Pt(12)
                # Font name; when missing, the default font is used
                font_name = run_data.get("font_name")
                if font_name:
                    run.font.name = font_name
                    # Also set the East Asian font so CJK text uses it
                    run._element.rPr.rFonts.set(qn('w:eastAsia'), font_name)
                # Font color; when missing or invalid, the default color is kept
                color = run_data.get("color")
                if color and color != "None":
                    try:
                        run.font.color.rgb = RGBColor.from_string(color)
                    except ValueError:
                        # Malformed color string: keep the default color
                        pass
                # Highlight color
                highlight_color = run_data.get("highlight_color")
                if highlight_color and highlight_color != "None":
                    # Simplified: a real implementation must map the stored
                    # string to the matching WD_COLOR_INDEX value
                    pass
        else:
            # No runs data: add the paragraph text directly
            text = para_data.get("text", "")
            run = paragraph.add_run(text)
            # Apply the default size
            run.font.size = Pt(12)
    # Add the tables
    for table_data in json_data.get("tables", []):
        if table_data.get("rows"):
            # Create the table; the column count is taken from the first row
            first_row = table_data["rows"][0]
            num_rows = len(table_data["rows"])
            num_cols = len(first_row["cells"]) if first_row.get("cells") else 1
            table = document.add_table(rows=num_rows, cols=num_cols)
            # Apply the table style
            table_style = table_data.get("style")
            if table_style:
                try:
                    table.style = table_style
                except (KeyError, ValueError):
                    # The style does not exist: keep the default style
                    pass
            # Fill in the table content
            for i, row_data in enumerate(table_data["rows"]):
                row = table.rows[i]
                for j, cell_data in enumerate(row_data.get("cells", [])):
                    if j < len(row.cells):
                        cell = row.cells[j]
                        cell.text = cell_data.get("text", "")
                        # Process the paragraphs inside the cell
                        cell_paragraphs = cell_data.get("paragraphs", [])
                        if cell_paragraphs:
                            for k, para_data in enumerate(cell_paragraphs):
                                # Reuse the cell's default paragraph for the first
                                # entry so no empty leading paragraph is left behind
                                para = cell.paragraphs[0] if k == 0 else cell.add_paragraph()
                                para.text = para_data.get("text", "")
                                # Apply the paragraph style
                                para_style = para_data.get("style")
                                if para_style:
                                    try:
                                        para.style = para_style
                                    except (KeyError, ValueError):
                                        pass
                                # Apply the paragraph format
                                paragraph_format_data = para_data.get("paragraph_format", {})
                                if paragraph_format_data:
                                    # Paragraph alignment
                                    alignment_str = paragraph_format_data.get("alignment")
                                    if alignment_str:
                                        # JUSTIFY_MED is tested before JUSTIFY because
                                        # "JUSTIFY" is a substring of it
                                        if "LEFT" in alignment_str:
                                            para.alignment = WD_ALIGN_PARAGRAPH.LEFT
                                        elif "CENTER" in alignment_str:
                                            para.alignment = WD_ALIGN_PARAGRAPH.CENTER
                                        elif "RIGHT" in alignment_str:
                                            para.alignment = WD_ALIGN_PARAGRAPH.RIGHT
                                        elif "JUSTIFY_MED" in alignment_str:
                                            para.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY_MED
                                        elif "JUSTIFY" in alignment_str:
                                            para.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
                                        elif "DISTRIBUTE" in alignment_str:
                                            para.alignment = WD_ALIGN_PARAGRAPH.DISTRIBUTE
                                    # Paragraph spacing and indentation
                                    if "left_indent" in paragraph_format_data:
                                        para.paragraph_format.left_indent = Pt(paragraph_format_data["left_indent"])
                                    if "right_indent" in paragraph_format_data:
                                        para.paragraph_format.right_indent = Pt(paragraph_format_data["right_indent"])
                                    if "first_line_indent" in paragraph_format_data:
                                        para.paragraph_format.first_line_indent = Pt(paragraph_format_data["first_line_indent"])
                                    if "space_before" in paragraph_format_data:
                                        para.paragraph_format.space_before = Pt(paragraph_format_data["space_before"])
                                    if "space_after" in paragraph_format_data:
                                        para.paragraph_format.space_after = Pt(paragraph_format_data["space_after"])
                                    # Cap line_spacing to avoid overflow
                                    if "line_spacing" in paragraph_format_data and paragraph_format_data["line_spacing"] <= 100:
                                        para.paragraph_format.line_spacing = paragraph_format_data["line_spacing"]
    # Save the document
    document.save(output_path)


def main():
    """
    Entry point that demonstrates the two conversions.
    """
    print("Docx Converter")
    print("1. Convert docx to json")
    print("2. Convert json to docx")
    choice = input("Choose an operation (1 or 2): ")
    if choice == "1":
        docx_path = input("Enter the path of the docx file: ")
        if not os.path.exists(docx_path):
            print("File not found!")
            return
        json_data = docx_to_json(docx_path)
        # splitext avoids overwriting the input when it lacks a .docx suffix
        json_path = os.path.splitext(docx_path)[0] + ".json"
        with open(json_path, "w", encoding="utf-8") as f:
            json.dump(json_data, f, ensure_ascii=False, indent=2)
        print(f"Done! JSON file saved as: {json_path}")
    elif choice == "2":
        json_path = input("Enter the path of the json file: ")
        if not os.path.exists(json_path):
            print("File not found!")
            return
        with open(json_path, "r", encoding="utf-8") as f:
            json_data = json.load(f)
        output_path = os.path.splitext(json_path)[0] + "_restored.docx"
        json_to_docx(json_data, output_path)
        print(f"Done! Docx file saved as: {output_path}")
    else:
        print("Invalid choice!")


if __name__ == "__main__":
    main()
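Since json_to_docx reads nested keys without checking their types, hand-edited JSON can fail deep inside the conversion. A small pre-flight check, sketched below as a hypothetical helper validate_doc_json, reports structural problems up front. It only covers the parts of the layout used in this article (paragraphs with optional runs, tables with rows) and is not a full schema.

```python
def validate_doc_json(data):
    """Return a list of structural problems; an empty list means it looks usable."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    paragraphs = data.get("paragraphs", [])
    tables = data.get("tables", [])
    if not isinstance(paragraphs, list):
        problems.append("'paragraphs' must be a list")
        paragraphs = []
    if not isinstance(tables, list):
        problems.append("'tables' must be a list")
        tables = []
    for i, para in enumerate(paragraphs):
        if not isinstance(para, dict):
            problems.append(f"paragraphs[{i}] must be an object")
        elif not isinstance(para.get("runs", []), list):
            problems.append(f"paragraphs[{i}].runs must be a list")
    for i, table in enumerate(tables):
        rows = table.get("rows") if isinstance(table, dict) else None
        if not isinstance(rows, list):
            problems.append(f"tables[{i}].rows must be a list")
    return problems
```

A caller could run this right after json.load and print the returned messages instead of starting the conversion when the list is not empty.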