A Complete Guide to Enhanced Bidirectional Docx-JSON Conversion in Python, with Code Walkthrough

Updated: December 19, 2025, 08:19:37  Author: 東方佑


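Introduction

In everyday office work and software development, we often need to convert documents between formats. Conversion between Word documents (Docx) and JSON data is especially important for automated report generation, content management systems, and data migration. This article walks through an enhanced Python tool that converts between Docx and JSON in both directions while preserving styles, lists, tables, images, and other complex elements.

Unlike plain text extraction, the tool aims to preserve a document's structure and formatting, including paragraph formatting, font styles, table layout, and even form controls such as checkboxes. This matters in enterprise settings where documents must keep a professional appearance.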

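Core Features Overview

The enhanced converter provides the following core features:

Bidirectional conversion: converts Docx documents to structured JSON data, and restores JSON data to formatted Docx documents
Style preservation: extracts and restores paragraph styles, character styles, and table styles
Complex element handling: handles lists, tables, images, checkboxes, and other complex document elements
Batch processing: converts whole folders of files at once
Document metadata preservation: retains section information, page setup, and other metadata

Compared with online conversion tools, this approach offers better data security and more room for customization: all processing happens locally, so sensitive documents never need to be uploaded to a third-party server.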

Core Code Walkthrough

1. The Document-to-JSON Conversion Mechanism

The docx_to_json function is the heart of the conversion. It parses each part of the Docx document in turn and assembles structured JSON data:

from docx import Document


def docx_to_json(docx_path):
    document = Document(docx_path)
    doc_data = {
        "metadata": {"created_by": "docx_converter", "version": "2.0"},
        "styles": {"paragraph_styles": [], "character_styles": [], "table_styles": []},
        "paragraphs": [],
        "tables": [],
        "images": [],
        "sections": []
    }
    # Extract each kind of element
    extract_styles(document, doc_data)
    extract_paragraphs(document, doc_data)
    extract_tables(document, doc_data)
    extract_images(document, doc_data)
    extract_sections(document, doc_data)
    return doc_data

This modular design keeps the code easy to maintain and extend: each extraction function handles one type of document element.
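
To make the target structure concrete, here is a minimal sketch that builds a small hand-written dictionary in the same shape as doc_data and round-trips it through JSON. The paragraph text and field values are illustrative assumptions, not output from a real document.

```python
import json

# Hand-written stand-in for what docx_to_json might return for a
# one-paragraph document. The keys mirror doc_data; the values are made up.
sample_doc_data = {
    "metadata": {"created_by": "docx_converter", "version": "2.0"},
    "styles": {"paragraph_styles": [], "character_styles": [], "table_styles": []},
    "paragraphs": [
        {
            "text": "Quarterly report",
            "style": "Heading 1",
            "runs": [{"text": "Quarterly report", "bold": True}],
        }
    ],
    "tables": [],
    "images": [],
    "sections": [],
}

# ensure_ascii=False keeps non-ASCII text readable in the serialized output.
serialized = json.dumps(sample_doc_data, ensure_ascii=False, indent=2)
restored = json.loads(serialized)
```

Because every value is a plain dict, list, string, number, or boolean, the structure survives serialization unchanged.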

2. Style Extraction

Style extraction is the key to preserving document formatting. The extract_styles function walks through the style definitions in the document:

def extract_styles(document, doc_data):
    styles = document.styles
    for style in styles:
        style_info = {
            "name": style.name,
            "type": str(style.type),
            "builtin": style.builtin,
            # other attributes...
        }
        # Extract font properties
        if hasattr(style, 'font') and style.font:
            font_info = {}
            if style.font.name: font_info["name"] = style.font.name
            if style.font.size: font_info["size"] = style.font.size.pt
            # more font attributes...

Capturing style definitions at this level of detail lays the groundwork for restoring the document faithfully.
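
In python-docx, style.font.size is a Length value, an integer count of English Metric Units (EMU), and its .pt property converts that to points. The arithmetic behind the conversion can be sketched as follows; the helper names emu_to_pt and pt_to_emu are hypothetical and exist only for illustration.

```python
EMU_PER_INCH = 914400
POINTS_PER_INCH = 72
EMU_PER_PT = EMU_PER_INCH // POINTS_PER_INCH  # 12700 EMU per point


def emu_to_pt(emu: int) -> float:
    """Convert an EMU length to points, like reading a Length's .pt value."""
    return emu / EMU_PER_PT


def pt_to_emu(points: float) -> int:
    """Convert points back to whole EMU, truncating any fraction."""
    return int(points * EMU_PER_PT)


size_in_pt = emu_to_pt(152400)   # a 12 pt font size stored as EMU
size_in_emu = pt_to_emu(10.5)    # 10.5 pt expressed in EMU
```

Storing sizes in points keeps the JSON readable, and converting back to EMU on restore recovers the original value for common point sizes.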

3. Paragraph and Text Handling

Paragraph handling covers not only the text content but also formatting, list properties, and inline elements:

def extract_paragraphs(document, doc_data):
    for para_idx, paragraph in enumerate(document.paragraphs):
        para_info = {}
        # Text content
        if paragraph.text.strip():
            para_info["text"] = paragraph.text
        # Paragraph style
        if paragraph.style and paragraph.style.name:
            para_info["style"] = paragraph.style.name
        # List detection
        list_info = detect_list_properties(paragraph)
        if list_info:
            para_info["list_info"] = list_info
        # Process text runs
        runs_list = []
        for run in paragraph.runs:
            run_info = extract_run_properties(run)
            if run_info: runs_list.append(run_info)
        if runs_list: para_info["runs"] = runs_list
        doc_data["paragraphs"].append(para_info)

This fine-grained approach captures formatting changes precisely: even style differences between runs within the same paragraph are preserved.
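
List membership in a Docx file is recorded in the paragraph's XML as a w:numPr element, with w:ilvl giving the nesting level and w:numId pointing at the list definition. As a minimal sketch of detecting it directly, assuming the paragraph's XML is available as a string (python-docx exposes it through paragraph._element.xml, which the checkbox code in the full listing also relies on), the hypothetical helper below reads those two values. It does not cover numbering that is inherited from a style.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
W_VAL = f"{{{W_NS}}}val"  # fully qualified name of the w:val attribute


def list_info_from_paragraph_xml(p_xml: str):
    """Return {'num_id', 'level'} if the paragraph has w:numPr, else None."""
    paragraph = ET.fromstring(p_xml)
    num_pr = paragraph.find("w:pPr/w:numPr", NS)
    if num_pr is None:
        return None
    ilvl = num_pr.find("w:ilvl", NS)
    num_id = num_pr.find("w:numId", NS)
    return {
        "num_id": int(num_id.get(W_VAL)) if num_id is not None else None,
        "level": int(ilvl.get(W_VAL)) if ilvl is not None else 0,
    }


# Hand-written paragraph XML used only to exercise the helper.
list_paragraph = (
    f'<w:p xmlns:w="{W_NS}"><w:pPr><w:numPr>'
    '<w:ilvl w:val="1"/><w:numId w:val="3"/>'
    "</w:numPr></w:pPr><w:r><w:t>item</w:t></w:r></w:p>"
)
plain_paragraph = f'<w:p xmlns:w="{W_NS}"><w:r><w:t>text</w:t></w:r></w:p>'

list_result = list_info_from_paragraph_xml(list_paragraph)
plain_result = list_info_from_paragraph_xml(plain_paragraph)
```

Whether a given numId is a bulleted or a numbered list is defined in the document's numbering part, so this sketch reports only membership and level.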

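4. Table Extraction

Tables are one of the harder parts of document processing. The tool extracts them layer by layer (table, row, cell) to keep the table structure intact: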

def extract_tables(document, doc_data):
    for table_idx, table in enumerate(document.tables):
        table_info = {"index": table_idx, "rows": []}
        # Table style
        if hasattr(table, 'style') and table.style:
            table_info["style"] = table.style.name
        # Process rows and cells
        for row_idx, row in enumerate(table.rows):
            row_info = {"index": row_idx, "cells": []}
            for cell_idx, cell in enumerate(row.cells):
                cell_info = extract_cell_content(cell, row_idx, cell_idx)
                if cell_info: row_info["cells"].append(cell_info)
            table_info["rows"].append(row_info)
        doc_data["tables"].append(table_info)

Each cell is parsed further into its paragraphs and runs, so nested content is preserved as well.
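
One detail worth handling is merged cells: as far as I know, python-docx reports a horizontally merged cell once per grid column it spans, so iterating row.cells can record the same content several times. A minimal sketch of one way to skip the repeats, assuming each cell exposes the underlying XML element as _tc (the attribute the full listing uses for borders); unique_cells is a hypothetical helper and the SimpleNamespace objects merely stand in for real cells.

```python
from types import SimpleNamespace


def unique_cells(cells):
    """Yield (first_column_index, cell) once per underlying cell element."""
    seen = []
    for col_idx, cell in enumerate(cells):
        tc = cell._tc
        # Identity comparison: repeated entries share the same element object.
        if any(tc is earlier for earlier in seen):
            continue
        seen.append(tc)
        yield col_idx, cell


# Stand-ins for python-docx cells: the first two entries share one element,
# as a cell merged across two columns would.
tc_a, tc_b = object(), object()
row_cells = [
    SimpleNamespace(_tc=tc_a, text="merged"),
    SimpleNamespace(_tc=tc_a, text="merged"),
    SimpleNamespace(_tc=tc_b, text="single"),
]
deduplicated = [(idx, cell.text) for idx, cell in unique_cells(row_cells)]
```

Keeping the first column index alongside each cell preserves enough information to rebuild the span on restore.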

5. Image and Media Handling

Images are Base64-encoded, which turns the binary image data into text that can be stored in JSON:

def extract_images(document, doc_data):
    for rel in document.part.rels.values():
        if "image" in rel.reltype:
            image_part = rel.target_part
            image_info = {
                "content_type": image_part.content_type,
                "data": base64.b64encode(image_part.blob).decode('utf-8'),
                "filename": getattr(image_part, 'filename', 'image.png')
            }
            doc_data["images"].append(image_info)

Because Base64 is a lossless encoding, the original image bytes can be recovered exactly when the document is restored.
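
The round trip and its size cost can be checked with the standard library alone. In this sketch the raw bytes are an arbitrary stand-in for image_part.blob, not real image data.

```python
import base64

# 768 arbitrary bytes standing in for an image blob.
raw = bytes(range(256)) * 3

encoded = base64.b64encode(raw).decode("ascii")  # text that is safe to put in JSON
decoded = base64.b64decode(encoded)              # what the restore step would do

# Base64 emits 4 characters for every 3 input bytes.
overhead = len(encoded) / len(raw)
```

The encoded form is about one third larger than the original bytes, which is worth keeping in mind for image-heavy documents.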

Use Cases and Practical Examples

1. Automated Report Generation

The tool works well for automated report generation: business data in JSON form can be filled into a predefined Docx template to produce reports with consistent formatting.

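# Example: turn business data into a formatted report
business_data = {
    "title": "Quarterly Sales Report",
    "period": "2023 Q1",
    "metrics": ["Sales", "Growth rate", "Market share"],
    "values": [1500000, 0.15, 0.23]
}

# Generate the report from a template. The json_to_docx in the full listing takes
# only (json_data, output_path), so this calls create_from_template (defined later
# in this article). business_data must first be mapped onto the converter's JSON
# schema (paragraphs, runs, ...) for its content to appear in the output.
create_from_template(business_data, "report_template.docx", "quarterly_sales_report.docx")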

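2. Content Management System Integration

For a content management system (CMS), the tool enables structured storage and flexible publishing. Editors can write content comfortably in Word, convert it to JSON for storage in a database, and convert it to HTML, PDF, or other formats at publication time.

3. Legal and Compliance Documents

In the legal field, contracts and agreements require strict formatting control. The tool helps documents keep their formatting across repeated conversions, which helps avoid disputes over legal validity caused by formatting errors.

4. Education and Research

In academic research, the tool can batch-process lab reports to extract structured data for analysis, or fill analysis results automatically into paper templates.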

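Comparison with Other Tools

Compared with other document conversion tools on the market, this approach has some distinct advantages:

Feature	This tool	Online converters	Commercial software
Data privacy	Processed locally, fully private	Documents must be uploaded to a server	Depends on deployment
Customizability	High; the code can be modified freely	Low; fixed feature set	Medium; depends on the software's interfaces
Format support	Focused on Docx-JSON conversion	Multiple formats	Multiple formats
Cost	Free and open source	Free or paid	Usually paid

Compared with simple text extraction tools, it does far better at preserving styles; compared with complex commercial software, it has the advantage of being open and transparent.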

Tutorial

Environment Setup

First install the required Python dependency:

pip install python-docx

python-docx is the core library for working with Word documents and provides a rich API for manipulating Docx files.

Basic Usage

Converting Docx to JSON:

import json

from docx_converter import docx_to_json

# Convert a single document
json_data = docx_to_json("my_document.docx")

# Save the JSON result
with open("document_data.json", "w", encoding="utf-8") as f:
    json.dump(json_data, f, ensure_ascii=False, indent=2)

Restoring JSON to Docx:

import json

from docx_converter import json_to_docx

# Load the JSON data
with open("document_data.json", "r", encoding="utf-8") as f:
    json_data = json.load(f)

# Restore it as a Word document
json_to_docx(json_data, "restored_document.docx")

Batch conversion:

import json
import os

from docx_converter import docx_to_json

def batch_convert(folder_path):
    for filename in os.listdir(folder_path):
        if filename.endswith(".docx"):
            docx_path = os.path.join(folder_path, filename)
            json_data = docx_to_json(docx_path)
            # splitext swaps only the extension, even if ".docx" also appears earlier in the name
            json_path = os.path.splitext(docx_path)[0] + ".json"
            with open(json_path, "w", encoding="utf-8") as f:
                json.dump(json_data, f, ensure_ascii=False, indent=2)
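
A variation on the batch loop, sketched with pathlib: it skips Word's temporary lock files (names starting with "~$") and keeps going when a single file fails. The converter is passed in as a parameter so the sketch can run without a real document; batch_convert_with and fake_convert are hypothetical names, and in practice you would pass docx_to_json.

```python
import json
import tempfile
from pathlib import Path


def batch_convert_with(convert, folder):
    """Convert every .docx in folder with convert(path); return written file names."""
    written = []
    for docx_path in sorted(Path(folder).glob("*.docx")):
        if docx_path.name.startswith("~$"):  # Word lock file, not a document
            continue
        try:
            data = convert(str(docx_path))
        except Exception as exc:  # one bad file should not stop the batch
            print(f"skipped {docx_path.name}: {exc}")
            continue
        json_path = docx_path.with_suffix(".json")
        json_path.write_text(
            json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(json_path.name)
    return written


def fake_convert(path):
    """Stand-in converter used only for this demonstration."""
    return {"source": Path(path).name}


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.docx", "b.docx", "~$a.docx", "notes.txt"):
        Path(tmp, name).write_bytes(b"")
    converted = batch_convert_with(fake_convert, tmp)
```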

Advanced Features

Checkbox detection:

from docx_converter import find_all_checkboxes

# Detect the checkboxes in a document
results = find_all_checkboxes("form_document.docx")
print(f"Found {len(results['checked'])} checked checkboxes")
print(f"Found {len(results['unchecked'])} unchecked checkboxes")

Style customization:

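# Custom style mapping applied during conversion
def custom_style_mapper(style_info):
    # Modify or filter specific styles
    if style_info.get('name') == 'Heading 1':  # python-docx names the built-in style "Heading 1"
        # extract_styles stores font properties under the "font" key
        style_info.setdefault('font', {})['size'] = 16  # change the Heading 1 font size
    return style_info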
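
The article does not show where such a mapper gets called. One possible way to apply it, as a sketch: run it over every style entry in the extracted data before saving or restoring. apply_style_mapper and enlarge_heading are hypothetical names, and the sketch assumes the layout produced by extract_styles, where font properties live under a "font" key.

```python
def apply_style_mapper(doc_data, mapper):
    """Return a copy of doc_data with mapper applied to each style entry.

    A mapper may return None to drop a style altogether.
    """
    mapped = dict(doc_data)
    mapped["styles"] = {
        group: [
            result
            for result in (mapper(dict(style)) for style in styles)
            if result is not None
        ]
        for group, styles in doc_data.get("styles", {}).items()
    }
    return mapped


def enlarge_heading(style_info):
    """Example mapper: set the 'Heading 1' font size to 16 pt."""
    if style_info.get("name") == "Heading 1":
        # Build a new font dict so the original data is left untouched.
        style_info["font"] = {**style_info.get("font", {}), "size": 16}
    return style_info


doc_data = {
    "styles": {
        "paragraph_styles": [
            {"name": "Heading 1", "font": {"size": 14}},
            {"name": "Normal"},
        ],
        "character_styles": [],
        "table_styles": [],
    }
}
result = apply_style_mapper(doc_data, enlarge_heading)
```

Returning a copy rather than editing in place makes it easy to compare the data before and after mapping.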

Caveats and Best Practices

1. File Path Handling

When handling file paths, prefer absolute paths and add appropriate error handling:

import os

def safe_convert(docx_path):
    docx_path = os.path.abspath(docx_path)  # resolve to an absolute path
    if not os.path.exists(docx_path):
        raise FileNotFoundError(f"Document not found: {docx_path}")

    if not docx_path.lower().endswith('.docx'):
        raise ValueError("Only .docx files are supported")

    try:
        return docx_to_json(docx_path)
    except Exception as e:
        print(f"Conversion failed: {e}")
        return None
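
The extension check says nothing about the file's contents. A .docx file is a zip archive whose main document part is conventionally stored at word/document.xml, so a cheap extra check is possible before handing the file to the converter. This is a sketch under that assumption; looks_like_docx is a hypothetical helper and the in-memory archives exist only to exercise it.

```python
import io
import zipfile


def looks_like_docx(source) -> bool:
    """True if source (a path or binary file object) is a zip with word/document.xml."""
    try:
        with zipfile.ZipFile(source) as archive:
            return "word/document.xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False


# A minimal archive with the expected member, and some bytes that are not a zip.
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
good.seek(0)
bad = io.BytesIO(b"not a zip archive")

good_result = looks_like_docx(good)
bad_result = looks_like_docx(bad)
```

This catches renamed or truncated files early and with a clearer error than a failure deep inside the parser.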

2. Optimizing for Large Files

When processing large documents, keep memory usage in mind:

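from docx import Document

def process_large_document(docx_path, chunk_size=10):
    """Process a large document in chunks."""
    # Document() still parses the whole file; chunking limits how much
    # extracted data is held at once.
    document = Document(docx_path)
    total_paragraphs = len(document.paragraphs)

    for i in range(0, total_paragraphs, chunk_size):
        # process_paragraph_chunk and save_chunk are placeholders to be supplied by the caller
        chunk_data = process_paragraph_chunk(document, i, i+chunk_size)
        save_chunk(chunk_data, i)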
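
The two helpers called above are not defined in the article. One possible shape for the saving side, as a sketch: write each chunk as a single line of JSON (the JSON Lines convention), so chunks can be appended and read back one at a time. iter_chunks and write_chunks_jsonl are hypothetical names, and the paragraph dictionaries are made-up stand-ins for extracted data.

```python
import io
import json


def iter_chunks(items, chunk_size):
    """Yield (start_index, slice) pairs that cover items in order."""
    for start in range(0, len(items), chunk_size):
        yield start, items[start:start + chunk_size]


def write_chunks_jsonl(paragraph_dicts, stream, chunk_size=10):
    """Write one JSON object per chunk, one per line; return the chunk count."""
    count = 0
    for start, chunk in iter_chunks(paragraph_dicts, chunk_size):
        record = {"start": start, "paragraphs": chunk}
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


paragraphs = [{"text": f"paragraph {i}"} for i in range(25)]
buffer = io.StringIO()  # a real run would use an open file instead
chunk_count = write_chunks_jsonl(paragraphs, buffer, chunk_size=10)
lines = buffer.getvalue().splitlines()
```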

3. Maintaining Style Consistency

To keep styles consistent, base new documents on a template document:

def create_from_template(json_data, template_path, output_path):
    """Create a document based on a template."""
    template_data = docx_to_json(template_path)
    # Apply the template's styles to the data
    # (merge_data_with_template is a helper you need to provide)
    merged_data = merge_data_with_template(json_data, template_data)
    json_to_docx(merged_data, output_path)
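
merge_data_with_template is not defined in the article. A minimal sketch of one plausible policy, assuming both arguments follow the converter's JSON layout: take the content from the data and the styles and page setup from the template. The sample dictionaries are made up for illustration.

```python
import copy


def merge_data_with_template(json_data, template_data):
    """Keep content from json_data; take styles and sections from template_data."""
    merged = copy.deepcopy(json_data)
    merged["styles"] = copy.deepcopy(template_data.get("styles", {}))
    merged["sections"] = copy.deepcopy(template_data.get("sections", []))
    return merged


template = {
    "styles": {"paragraph_styles": [{"name": "Normal"}]},
    "sections": [{"index": 0, "page_width": 595.3}],
    "paragraphs": [{"text": "template placeholder"}],
}
content = {"paragraphs": [{"text": "Actual report body"}], "sections": []}
merged = merge_data_with_template(content, template)
```

Note that in the part of the listing shown here, setup_document_properties applies only the first section's page settings, so whether the merged styles take effect depends on the rest of json_to_docx.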

Extension and Customization

The tool is designed to be easy to extend with new features:

1. Adding Support for New Elements

def extract_custom_elements(document, doc_data):
    """Extract custom elements."""
    # Add extraction logic for charts, math formulas, and other special elements
    pass

def create_custom_elements(document, element_data):
    """Create custom elements."""
    pass

2. Integrating Other Formats

Combined with tools such as pandoc, the converter can be extended to more formats:

def convert_via_markdown(json_data):
    """Convert through Markdown as an intermediate format."""
    # JSON -> Markdown -> target format
    markdown_content = json_to_markdown(json_data)
    # Use pandoc to convert to other formats
    return markdown_content
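
json_to_markdown is likewise left undefined. As a rough sketch of what it could look like, assuming the paragraph layout produced by extract_paragraphs (a "style" name, optional "list_info", and "runs" carrying "bold" and "italic" flags), the version below handles headings, simple list items, and bold or italic runs. Tables, images, and nested lists are ignored.

```python
def _run_to_markdown(run):
    """Wrap a run's text in Markdown emphasis markers where flagged."""
    text = run.get("text", "")
    if not text.strip():
        return text
    if run.get("bold"):
        text = f"**{text}**"
    if run.get("italic"):
        text = f"*{text}*"
    return text


def json_to_markdown(json_data):
    """Convert the 'paragraphs' list of the converter's JSON to Markdown."""
    blocks = []
    for para in json_data.get("paragraphs", []):
        runs = para.get("runs") or [{"text": para.get("text", "")}]
        text = "".join(_run_to_markdown(run) for run in runs)
        if not text.strip():
            continue
        style = para.get("style", "")
        level = style[len("Heading "):]
        if style.startswith("Heading ") and level.isdigit():
            text = "#" * int(level) + " " + text
        elif para.get("list_info"):
            text = "- " + text
        blocks.append(text)
    return "\n\n".join(blocks)


sample = {
    "paragraphs": [
        {"style": "Heading 1", "runs": [{"text": "Title"}]},
        {"style": "Normal", "runs": [{"text": "Hello "}, {"text": "world", "bold": True}]},
    ]
}
markdown = json_to_markdown(sample)
```

The resulting string can then be handed to pandoc or any other Markdown-aware tool.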

3. Cloud Service Integration

Deploy the tool as a web service that exposes an API:

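from flask import Flask, jsonify, request

from docx_converter import docx_to_json

app = Flask(__name__)

@app.route('/convert/docx-to-json', methods=['POST'])
def convert_docx_to_json_api():
    file = request.files['file']
    # Document() accepts a file-like object, so pass the upload's stream
    json_data = docx_to_json(file.stream)
    return jsonify(json_data)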

This architecture makes it easy to integrate with other systems.

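Summary

This article has described how a feature-rich Docx-JSON bidirectional conversion tool works and how to use it. With it, document content can be extracted into structured form and restored accurately, covering a range of document automation needs.

Compared with existing solutions, its main advantages are:

Format fidelity: accurate conversion of styles, tables, images, and other complex elements
Extensibility: a modular design that makes new features easy to add
Free and open source: MIT-licensed, free to use and modify
Local processing: sensitive data never leaves the local environment

As digitalization accelerates, demand for automated document processing will keep growing. The tool gives developers a solid foundation on which to build more complex document pipelines, for example by integrating it with AI tools such as LangChain for intelligent document processing.

Going forward, we will continue to optimize performance, add support for more elements, and explore deeper integration with AI to make document processing smarter and more automated.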

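Recommended Resources

Full code: the complete code for this article is open source on GitHub
Sample documents: a range of test documents that demonstrate conversion results in different scenarios
Extension modules: community-contributed extensions such as PDF support and OCR integration

We hope this article helps you understand and apply document conversion techniques and improve your productivity and level of automation.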

Full Code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Docx to JSON and JSON to Docx converter
Extracts all styles of a docx file into a JSON object, and restores a JSON object to a docx file
Enhanced version: supports more styles, lists, images, table styles, and more
"""

import json
import base64
import os
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH, WD_BREAK
from docx.enum.style import WD_STYLE_TYPE
from docx.enum.table import WD_TABLE_ALIGNMENT
from docx.shared import RGBColor, Pt, Inches
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
import io


def docx_to_json(docx_path):
    """
    Convert a docx file to JSON.
    Enhanced version: supports more style properties, lists, images, and more.
    """
    document = Document(docx_path)

    # Dictionary holding all extracted content
    doc_data = {
        "metadata": {
            "created_by": "docx_converter",
            "version": "2.0"
        },
        "styles": {
            "paragraph_styles": [],
            "character_styles": [],
            "table_styles": []
        },
        "paragraphs": [],
        "tables": [],
        "images": [],
        "sections": []
    }

    # 1. Extract all styles
    extract_styles(document, doc_data)

    # 2. Extract paragraph content
    extract_paragraphs(document, doc_data)

    # 3. Extract table content
    extract_tables(document, doc_data)

    # 4. Extract images
    extract_images(document, doc_data)

    # 5. Extract section information
    extract_sections(document, doc_data)

    return doc_data


def extract_styles(document, doc_data):
    """Extract all styles in the document."""
    styles = document.styles
    for style in styles:
        style_info = {
            "name": style.name,
            "type": str(style.type),
            "builtin": style.builtin,
            "hidden": style.hidden,
            "priority": getattr(style, 'priority', None)
        }

        # Font properties - extracted only when the style has a font attribute
        if hasattr(style, 'font') and style.font:
            font_info = {}
            if style.font.name:
                font_info["name"] = style.font.name
            if style.font.size:
                font_info["size"] = style.font.size.pt
            if style.font.bold is not None:
                font_info["bold"] = style.font.bold
            if style.font.italic is not None:
                font_info["italic"] = style.font.italic
            if style.font.underline is not None:
                font_info["underline"] = str(style.font.underline)
            if style.font.color.rgb:
                font_info["color"] = str(style.font.color.rgb)
            if style.font.all_caps is not None:
                font_info["all_caps"] = style.font.all_caps
            if style.font.small_caps is not None:
                font_info["small_caps"] = style.font.small_caps
            if style.font.superscript is not None:
                font_info["superscript"] = style.font.superscript
            if style.font.subscript is not None:
                font_info["subscript"] = style.font.subscript
            if style.font.strike is not None:
                font_info["strike"] = style.font.strike

            if font_info:
                style_info["font"] = font_info

        # Paragraph format - extracted for paragraph styles only
        if style.type == WD_STYLE_TYPE.PARAGRAPH and hasattr(style, 'paragraph_format') and style.paragraph_format:
            pf_info = extract_paragraph_format(style.paragraph_format)
            if pf_info:
                style_info["paragraph_format"] = pf_info

        # Store by style type
        if style.type == WD_STYLE_TYPE.PARAGRAPH:
            doc_data["styles"]["paragraph_styles"].append(style_info)
        elif style.type == WD_STYLE_TYPE.CHARACTER:
            doc_data["styles"]["character_styles"].append(style_info)
        elif style.type == WD_STYLE_TYPE.TABLE:
            doc_data["styles"]["table_styles"].append(style_info)


def extract_paragraph_format(paragraph_format):
    """Extract paragraph format information."""
    pf_info = {}

    if paragraph_format.alignment is not None:
        pf_info["alignment"] = str(paragraph_format.alignment)
    if paragraph_format.left_indent:
        pf_info["left_indent"] = paragraph_format.left_indent.pt
    if paragraph_format.right_indent:
        pf_info["right_indent"] = paragraph_format.right_indent.pt
    if paragraph_format.first_line_indent:
        pf_info["first_line_indent"] = paragraph_format.first_line_indent.pt
    if paragraph_format.space_before:
        pf_info["space_before"] = paragraph_format.space_before.pt
    if paragraph_format.space_after:
        pf_info["space_after"] = paragraph_format.space_after.pt
    if paragraph_format.line_spacing and paragraph_format.line_spacing <= 100:
        pf_info["line_spacing"] = paragraph_format.line_spacing
    if paragraph_format.keep_with_next is not None:
        pf_info["keep_with_next"] = paragraph_format.keep_with_next
    if paragraph_format.keep_together is not None:
        pf_info["keep_together"] = paragraph_format.keep_together
    if paragraph_format.page_break_before is not None:
        pf_info["page_break_before"] = paragraph_format.page_break_before
    if paragraph_format.widow_control is not None:
        pf_info["widow_control"] = paragraph_format.widow_control
    if paragraph_format.line_spacing_rule is not None:
        pf_info["line_spacing_rule"] = str(paragraph_format.line_spacing_rule)

    # Extract tab stop information
    try:
        if paragraph_format.tab_stops:
            tab_stops_info = []
            for tab_stop in paragraph_format.tab_stops:
                # Compare with None: the first enum members (LEFT, SPACES) have
                # value 0 and would be dropped by a truthiness test
                tab_info = {
                    "position": tab_stop.position.pt if tab_stop.position is not None else None,
                    "alignment": str(tab_stop.alignment) if tab_stop.alignment is not None else None,
                    "leader": str(tab_stop.leader) if tab_stop.leader is not None else None
                }
                tab_stops_info.append(tab_info)

            if tab_stops_info:
                pf_info["tab_stops"] = tab_stops_info
    except Exception:
        pass

    return pf_info if pf_info else None


def extract_paragraphs(document, doc_data):
    """Extract all paragraph content."""
    for para_idx, paragraph in enumerate(document.paragraphs):
        para_info = {}

        # Basic text and style
        if paragraph.text.strip():
            para_info["text"] = paragraph.text
        if paragraph.style and paragraph.style.name:
            para_info["style"] = paragraph.style.name

        # Paragraph format
        if paragraph.paragraph_format:
            pf_info = extract_paragraph_format(paragraph.paragraph_format)
            if pf_info:
                para_info["paragraph_format"] = pf_info

        # Detect list properties
        list_info = detect_list_properties(paragraph)
        if list_info:
            para_info["list_info"] = list_info

        # Process runs
        runs_list = []
        for run in paragraph.runs:
            run_info = extract_run_properties(run)
            if run_info:
                runs_list.append(run_info)

        # Process checkboxes
        checkbox_info = extract_checkboxes(paragraph)
        if checkbox_info:
            runs_list.append(checkbox_info)

        if runs_list:
            para_info["runs"] = runs_list

        # Add the paragraph only if it has content
        if para_info:
            doc_data["paragraphs"].append(para_info)


def detect_list_properties(paragraph):
    """Detect list properties of a paragraph."""
    list_info = {}

    try:
        pf = paragraph.paragraph_format

        # Detect bulleted lists (hasattr guards against python-docx versions without bullet_char)
        if hasattr(pf, 'bullet_char') and pf.bullet_char is not None:
            list_info['type'] = 'bullet'
            list_info['bullet_char'] = pf.bullet_char
            list_info['level'] = getattr(pf, 'level', 0)

        # Detect numbered lists
        elif hasattr(pf, 'number_format') and pf.number_format is not None:
            list_info['type'] = 'number'
            list_info['number_format'] = str(pf.number_format)
            list_info['level'] = getattr(pf, 'level', 0)
            list_info['start_value'] = getattr(pf, 'start_value', 1)

        # Detect lists by style name
        elif paragraph.style and paragraph.style.name:
            style_name = paragraph.style.name.lower()
            if 'list' in style_name or 'bullet' in style_name:
                list_info['type'] = 'style_based'
                list_info['style_name'] = paragraph.style.name

    except Exception:
        # If detection fails, ignore list properties
        pass

    return list_info if list_info else None


def extract_run_properties(run):
    """Extract the style properties of a run."""
    run_info = {}

    if run.text.strip():
        run_info["text"] = run.text

    # Font properties
    font_props = [
        ("bold", run.bold),
        ("italic", run.italic),
        ("underline", run.underline),
        ("strike", run.font.strike),
        ("superscript", run.font.superscript),
        ("subscript", run.font.subscript),
        ("all_caps", run.font.all_caps),
        ("small_caps", run.font.small_caps)
    ]

    for prop_name, prop_value in font_props:
        if prop_value is not None:
            run_info[prop_name] = prop_value

    # Font name and size
    if run.font.name:
        run_info["font_name"] = run.font.name
    if run.font.size:
        run_info["font_size"] = run.font.size.pt

    # Color
    if run.font.color.rgb:
        run_info["color"] = str(run.font.color.rgb)

    # Highlight color
    try:
        if run.font.highlight_color and str(run.font.highlight_color) != 'none':
            run_info["highlight_color"] = str(run.font.highlight_color)
    except:
        pass

    # Underline color
    try:
        if run.font.underline_color and run.font.underline_color.rgb:
            run_info["underline_color"] = str(run.font.underline_color.rgb)
    except:
        pass

    # Character spacing
    try:
        if run.font.spacing:
            run_info["character_spacing"] = run.font.spacing
    except:
        pass

    # Font background color (character shading)
    try:
        rPr = run._element.rPr
        if rPr is not None:
            shd_elements = rPr.xpath('.//w:shd')
            if shd_elements:
                shd_element = shd_elements[0]
                fill_color = shd_element.get('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}fill')
                if fill_color:
                    run_info["background_color"] = fill_color
    except:
        pass

    return run_info if run_info else None


def extract_checkboxes(paragraph):
    """Extract checkbox information."""
    try:
        p_element = paragraph._element
        xml_str = p_element.xml

        # Detect legacy checkboxes
        if 'w:checkBox' in xml_str:
            if 'w:checked="1"' in xml_str or 'w:checked w:val="true"' in xml_str:
                return {"text": "[?]", "is_checkbox": True, "checked": True}
            else:
                return {"text": "[□]", "is_checkbox": True, "checked": False}

        # Detect new-style checkboxes
        checkboxes = p_element.xpath('.//*[local-name()="checkbox"]')
        for checkbox in checkboxes:
            checked_elements = checkbox.xpath('.//*[local-name()="checked"]')
            if checked_elements:
                checked_element = checked_elements[0]
                checked_value = "false"
                for attr_name in ['{http://schemas.microsoft.com/office/word/2010/wordml}val',
                                  qn('w14:val'), 'w14:val']:
                    val = checked_element.get(attr_name)
                    if val is not None:
                        checked_value = val
                        break

                is_checked = checked_value.lower() == "true" or checked_value == "1"
                return {
                    "text": "[?]" if is_checked else "[□]",
                    "is_checkbox": True,
                    "checked": is_checked
                }

    except Exception as e:
        pass

    return None


def extract_tables(document, doc_data):
    """Extract table content and styles."""
    for table_idx, table in enumerate(document.tables):
        table_info = {
            "index": table_idx,
            "rows": []
        }

        # Table style
        if hasattr(table, 'style') and table.style:
            table_info["style"] = table.style.name

        # Table alignment
        if hasattr(table, 'alignment'):
            table_info["alignment"] = str(table.alignment)

        # Process rows and columns
        for row_idx, row in enumerate(table.rows):
            row_info = {
                "index": row_idx,
                "cells": [],
                "height": getattr(row, 'height', None)
            }

            for cell_idx, cell in enumerate(row.cells):
                cell_info = extract_cell_content(cell, row_idx, cell_idx)
                if cell_info:
                    row_info["cells"].append(cell_info)

            if row_info["cells"]:
                table_info["rows"].append(row_info)

        if table_info["rows"]:
            doc_data["tables"].append(table_info)


def extract_cell_content(cell, row_idx, cell_idx):
    """Extract cell content."""
    cell_info = {
        "row": row_idx,
        "column": cell_idx,
        "text": cell.text
    }

    # Cell style
    try:
        # Shading
        if hasattr(cell, 'shading'):
            shading = cell.shading
            if hasattr(shading, 'background_pattern_color'):
                cell_info["shading"] = str(shading.background_pattern_color)
        
        # Vertical alignment
        if hasattr(cell, 'vertical_alignment') and cell.vertical_alignment is not None:
            cell_info["vertical_alignment"] = str(cell.vertical_alignment)
            
        # Margins
        if hasattr(cell, 'top_margin') and cell.top_margin is not None:
            cell_info["top_margin"] = cell.top_margin.pt
        if hasattr(cell, 'bottom_margin') and cell.bottom_margin is not None:
            cell_info["bottom_margin"] = cell.bottom_margin.pt
        if hasattr(cell, 'left_margin') and cell.left_margin is not None:
            cell_info["left_margin"] = cell.left_margin.pt
        if hasattr(cell, 'right_margin') and cell.right_margin is not None:
            cell_info["right_margin"] = cell.right_margin.pt
            
        # Cell borders
        tc = cell._tc
        tcPr = tc.tcPr
        if tcPr is not None:
            tcBorders = tcPr.xpath('./w:tcBorders')
            if tcBorders:
                borders_info = {}
                border_elements = tcBorders[0].xpath('./*')
                for border_elem in border_elements:
                    border_tag = border_elem.tag.split('}')[1]  # get the tag name
                    border_attrs = {}
                    for attr, value in border_elem.attrib.items():
                        attr_name = attr.split('}')[1] if '}' in attr else attr
                        border_attrs[attr_name] = value
                    borders_info[border_tag] = border_attrs
                if borders_info:
                    cell_info["borders"] = borders_info
                    
    except:
        pass

    # Process the paragraphs in the cell
    paragraphs_list = []
    for para in cell.paragraphs:
        if para.text.strip():
            para_dict = {
                "text": para.text
            }

            if para.style and para.style.name:
                para_dict["style"] = para.style.name

            # Paragraph format
            if para.paragraph_format:
                pf_info = extract_paragraph_format(para.paragraph_format)
                if pf_info:
                    para_dict["paragraph_format"] = pf_info

            # Process runs
            runs_list = []
            for run in para.runs:
                run_info = extract_run_properties(run)
                if run_info:
                    runs_list.append(run_info)

            if runs_list:
                para_dict["runs"] = runs_list

            paragraphs_list.append(para_dict)

    if paragraphs_list:
        cell_info["paragraphs"] = paragraphs_list

    return cell_info


def extract_images(document, doc_data):
    """Extract the images in the document."""
    try:
        # Extract images from the document relationships
        for rel in document.part.rels.values():
            if "image" in rel.reltype:
                image_part = rel.target_part
                image_info = {
                    "content_type": image_part.content_type,
                    "data": base64.b64encode(image_part.blob).decode('utf-8'),
                    "filename": getattr(image_part, 'filename', 'image.png')
                }
                doc_data["images"].append(image_info)
    except Exception as e:
        print(f"Error extracting images: {e}")


def extract_sections(document, doc_data):
    """Extract section information."""
    for section_idx, section in enumerate(document.sections):
        section_info = {
            "index": section_idx,
            "page_width": section.page_width.pt if section.page_width else None,
            "page_height": section.page_height.pt if section.page_height else None,
            "left_margin": section.left_margin.pt if section.left_margin else None,
            "right_margin": section.right_margin.pt if section.right_margin else None,
            "top_margin": section.top_margin.pt if section.top_margin else None,
            "bottom_margin": section.bottom_margin.pt if section.bottom_margin else None
        }
        doc_data["sections"].append(section_info)


def json_to_docx(json_data, output_path):
    """
    Convert JSON data to a docx file.
    Enhanced version: supports more styles and elements.
    """
    document = Document()

    # 1. Set document properties
    setup_document_properties(document, json_data)

    # 2. Add paragraphs
    create_paragraphs(document, json_data)

    # 3. Add tables
    create_tables(document, json_data)

    # 4. Add images
    create_images(document, json_data)

    # Save the document
    document.save(output_path)


def setup_document_properties(document, json_data):
    """Set document properties."""
    # Set the page layout
    if json_data.get("sections"):
        section = document.sections[0]
        first_section = json_data["sections"][0]

        if first_section.get("page_width"):
            section.page_width = Pt(first_section["page_width"])
        if first_section.get("page_height"):
            section.page_height = Pt(first_section["page_height"])
        if first_section.get("left_margin"):
            section.left_margin = Pt(first_section["left_margin"])
        if first_section.get("right_margin"):
            section.right_margin = Pt(first_section["right_margin"])
        if first_section.get("top_margin"):
            section.top_margin = Pt(first_section["top_margin"])
        if first_section.get("bottom_margin"):
            section.bottom_margin = Pt(first_section["bottom_margin"])


def create_paragraphs(document, json_data):
    """Create paragraphs."""
    for para_data in json_data.get("paragraphs", []):
        # Create the paragraph
        style_name = para_data.get("style", "Normal")
        try:
            paragraph = document.add_paragraph(style=style_name)
        except:
            paragraph = document.add_paragraph(style="Normal")

        # Apply paragraph formatting
        apply_paragraph_formatting(paragraph, para_data)

        # Handle lists
        apply_list_formatting(paragraph, para_data)

        # Clear the default text
        paragraph.clear()

        # Add runs
        create_runs(paragraph, para_data)


def apply_paragraph_formatting(paragraph, para_data):
    """Apply paragraph formatting."""
    paragraph_format_data = para_data.get("paragraph_format", {})
    if paragraph_format_data:
        pf = paragraph.paragraph_format

        # Alignment
        alignment_str = paragraph_format_data.get("alignment")
        if alignment_str:
            if "LEFT" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
            elif "CENTER" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
            elif "RIGHT" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT
            elif "JUSTIFY" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
            elif "DISTRIBUTE" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.DISTRIBUTE

        # Indentation and spacing
        indent_props = [
            ("left_indent", "left_indent"),
            ("right_indent", "right_indent"),
            ("first_line_indent", "first_line_indent"),
            ("space_before", "space_before"),
            ("space_after", "space_after")
        ]

        for json_prop, pf_prop in indent_props:
            if json_prop in paragraph_format_data:
                setattr(pf, pf_prop, Pt(paragraph_format_data[json_prop]))

        if "line_spacing" in paragraph_format_data and paragraph_format_data["line_spacing"] <= 100:
            pf.line_spacing = paragraph_format_data["line_spacing"]

        # Apply tab stop settings
        if "tab_stops" in paragraph_format_data:
            try:
                from docx.enum.text import WD_TAB_ALIGNMENT, WD_TAB_LEADER

                tab_stops = pf.tab_stops
                # Remove existing tab stops (TabStops provides clear_all(), not pop())
                tab_stops.clear_all()

                # Add the new tab stops
                for tab_info in paragraph_format_data["tab_stops"]:
                    position = Pt(tab_info["position"]) if tab_info["position"] else None
                    if position:
                        # Start from the add_tab_stop defaults rather than None
                        alignment = WD_TAB_ALIGNMENT.LEFT
                        leader = WD_TAB_LEADER.SPACES

                        # Parse the alignment
                        if tab_info.get("alignment"):
                            if "RIGHT" in tab_info["alignment"]:
                                alignment = WD_TAB_ALIGNMENT.RIGHT
                            elif "CENTER" in tab_info["alignment"]:
                                alignment = WD_TAB_ALIGNMENT.CENTER

                        # Parse the leader character (WD_TAB_LEADER members are
                        # DOTS, DASHES, and LINES, not HYPHENS or UNDERSCORE)
                        if tab_info.get("leader"):
                            if "DOTS" in tab_info["leader"]:
                                leader = WD_TAB_LEADER.DOTS
                            elif "DASHES" in tab_info["leader"]:
                                leader = WD_TAB_LEADER.DASHES
                            elif "LINES" in tab_info["leader"]:
                                leader = WD_TAB_LEADER.LINES

                        tab_stops.add_tab_stop(position, alignment, leader)
            except Exception:
                pass


def apply_list_formatting(paragraph, para_data):
    """Apply list formatting (indentation only)."""
    list_info = para_data.get("list_info")
    if list_info:
        try:
            pf = paragraph.paragraph_format

            # Bullet and numbered lists are both approximated by indenting
            # 36 pt (0.5 inch) per nesting level. No numbering definition is
            # written, so the list markers themselves are not restored here.
            if list_info.get("type") in ("bullet", "number") and list_info.get("level") is not None:
                pf.left_indent = Pt(list_info["level"] * 36)

        except Exception as e:
            print(f"Error applying list formatting: {e}")


def create_runs(paragraph, para_data):
    """Create the runs of a paragraph."""
    runs_data = para_data.get("runs", [])

    if runs_data:
        for run_data in runs_data:
            text = run_data.get("text", "")

            # Skip runs that carry neither text nor any formatting
            has_content = any([
                text,
                run_data.get("bold") is not None,
                run_data.get("italic") is not None,
                run_data.get("underline") is not None,
                run_data.get("font_name"),
                run_data.get("font_size"),
                run_data.get("color"),
                run_data.get("highlight_color")
            ])

            if has_content:
                run = paragraph.add_run(text)
                apply_run_formatting(run, run_data)
    else:
        # No runs data: add the paragraph text as a single run
        text = para_data.get("text", "")
        if text:
            run = paragraph.add_run(text)
            run.font.size = Pt(12)


def apply_run_formatting(run, run_data):
    """Apply character formatting to a run."""
    from docx.oxml import OxmlElement
    from docx.oxml.ns import qn

    # Basic on/off properties. They are set on run.font because Run itself
    # only exposes bold, italic and underline; strike, superscript, subscript,
    # all_caps and small_caps exist only on the Font object.
    format_props = [
        ("bold", "bold"),
        ("italic", "italic"),
        ("underline", "underline"),
        ("strike", "strike"),
        ("superscript", "superscript"),
        ("subscript", "subscript"),
        ("all_caps", "all_caps"),
        ("small_caps", "small_caps")
    ]

    for json_prop, font_prop in format_props:
        if json_prop in run_data:
            setattr(run.font, font_prop, run_data[json_prop])

    # Font size (defaults to 12 pt when the JSON gives none)
    if "font_size" in run_data:
        run.font.size = Pt(run_data["font_size"])
    else:
        run.font.size = Pt(12)

    # Font name
    if "font_name" in run_data:
        run.font.name = run_data["font_name"]
        try:
            # Also set the East Asian font so CJK text uses the same typeface
            run._element.rPr.rFonts.set(qn('w:eastAsia'), run_data["font_name"])
        except Exception:
            pass

    # Font color: either "RGB(r, g, b)" or a six-digit hex string
    if "color" in run_data and run_data["color"] != "None":
        try:
            if run_data["color"].startswith("RGB"):
                color_str = run_data["color"][4:-1]  # strip "RGB(" and ")"
                r, g, b = map(int, color_str.split(","))
                run.font.color.rgb = RGBColor(r, g, b)
            else:
                run.font.color.rgb = RGBColor.from_string(run_data["color"])
        except Exception:
            pass

    # Character spacing
    # NOTE: python-docx's Font does not appear to expose a `spacing` property,
    # so this assignment is likely a no-op; restoring spacing would probably
    # require writing a w:spacing element into rPr directly.
    if "character_spacing" in run_data:
        try:
            run.font.spacing = run_data["character_spacing"]
        except Exception:
            pass

    # Font background color (character shading)
    if "background_color" in run_data:
        try:
            # Get the rPr element, creating it if needed
            rPr = run._element.get_or_add_rPr()

            # Build the shd element
            shd = OxmlElement('w:shd')
            shd.set(qn('w:val'), 'clear')
            shd.set(qn('w:color'), 'auto')
            shd.set(qn('w:fill'), run_data["background_color"])

            # Attach it to rPr
            rPr.append(shd)
        except Exception:
            pass


def create_tables(document, json_data):
    """Create the tables."""
    for table_data in json_data.get("tables", []):
        if not table_data.get("rows"):
            continue

        # Work out the table size; the widest row decides the column count
        num_rows = len(table_data["rows"])
        num_cols = max(len(row.get("cells", [])) for row in table_data["rows"]) if table_data["rows"] else 1

        if num_rows > 0 and num_cols > 0:
            table = document.add_table(rows=num_rows, cols=num_cols)

            # Apply the table style; ignore names missing from the target document
            if "style" in table_data:
                try:
                    table.style = table_data["style"]
                except Exception:
                    pass

            # Fill in the table content
            for i, row_data in enumerate(table_data["rows"]):
                if i >= num_rows:
                    break

                for j, cell_data in enumerate(row_data.get("cells", [])):
                    if j >= num_cols:
                        break

                    cell = table.cell(i, j)
                    populate_cell_content(cell, cell_data)


def populate_cell_content(cell, cell_data):
    """Fill a table cell with content."""
    # Remove the default empty paragraph
    for paragraph in cell.paragraphs:
        p = paragraph._element
        p.getparent().remove(p)

    # Add paragraph content
    if "paragraphs" in cell_data:
        for para_data in cell_data["paragraphs"]:
            para = cell.add_paragraph()

            # Set the paragraph style; ignore names missing from the document
            if "style" in para_data:
                try:
                    para.style = para_data["style"]
                except Exception:
                    pass

            # Add runs
            if "runs" in para_data:
                for run_data in para_data["runs"]:
                    text = run_data.get("text", "")
                    run = para.add_run(text)
                    apply_run_formatting(run, run_data)
            else:
                # Add the text directly
                text = para_data.get("text", "")
                if text:
                    run = para.add_run(text)
                    run.font.size = Pt(12)
    else:
        # Add the text directly
        text = cell_data.get("text", "")
        if text:
            para = cell.add_paragraph()
            run = para.add_run(text)
            run.font.size = Pt(12)

    # A table cell must contain at least one paragraph, otherwise Word may
    # report the file as damaged
    if not cell.paragraphs:
        cell.add_paragraph()

    # Apply cell styling
    try:
        # Vertical alignment
        if "vertical_alignment" in cell_data:
            from docx.enum.table import WD_ALIGN_VERTICAL
            alignment_str = cell_data["vertical_alignment"]
            if "TOP" in alignment_str:
                cell.vertical_alignment = WD_ALIGN_VERTICAL.TOP
            elif "CENTER" in alignment_str:
                cell.vertical_alignment = WD_ALIGN_VERTICAL.CENTER
            elif "BOTTOM" in alignment_str:
                cell.vertical_alignment = WD_ALIGN_VERTICAL.BOTTOM

        # Margins
        # NOTE: python-docx does not document margin properties on table
        # cells, so these assignments may not reach the saved file unless a
        # w:tcMar element is written to the cell properties.
        if "top_margin" in cell_data:
            cell.top_margin = Pt(cell_data["top_margin"])
        if "bottom_margin" in cell_data:
            cell.bottom_margin = Pt(cell_data["bottom_margin"])
        if "left_margin" in cell_data:
            cell.left_margin = Pt(cell_data["left_margin"])
        if "right_margin" in cell_data:
            cell.right_margin = Pt(cell_data["right_margin"])

        # Cell borders
        if "borders" in cell_data:
            set_cell_border(cell, cell_data["borders"])

    except Exception as e:
        print(f"Error applying cell styling: {e}")


def create_images(document, json_data):
    """Create the images."""
    for image_data in json_data.get("images", []):
        try:
            # Images are stored in the JSON as base64 text
            image_bytes = base64.b64decode(image_data["data"])
            image_io = io.BytesIO(image_bytes)

            # Append the image in its own paragraph at a fixed 2-inch width
            paragraph = document.add_paragraph()
            run = paragraph.add_run()
            run.add_picture(image_io, width=Inches(2.0))

        except Exception as e:
            print(f"Error adding image: {e}")


def find_all_checkboxes(docx_path):
    """Find all checkboxes in a document (enhanced version)."""
    doc = Document(docx_path)

    results = {
        'unchecked': [],
        'checked': [],
        'locations': [],
        'form_controls': []
    }

    print("=== Searching for checkboxes ===")

    # Search body paragraphs
    for para_idx, paragraph in enumerate(doc.paragraphs):
        find_checkboxes_in_paragraph(paragraph, f"paragraph {para_idx + 1}", results)

    # Search tables
    for table_idx, table in enumerate(doc.tables):
        for row_idx, row in enumerate(table.rows):
            for cell_idx, cell in enumerate(row.cells):
                for para_idx, paragraph in enumerate(cell.paragraphs):
                    location = (f"table {table_idx + 1} row {row_idx + 1} "
                                f"column {cell_idx + 1} paragraph {para_idx + 1}")
                    find_checkboxes_in_paragraph(paragraph, location, results)

    # Search headers and footers
    for section_idx, section in enumerate(doc.sections):
        for para_idx, paragraph in enumerate(section.header.paragraphs):
            find_checkboxes_in_paragraph(paragraph, f"section {section_idx + 1} header paragraph {para_idx + 1}", results)
        for para_idx, paragraph in enumerate(section.footer.paragraphs):
            find_checkboxes_in_paragraph(paragraph, f"section {section_idx + 1} footer paragraph {para_idx + 1}", results)

    # Print a summary
    print("\n=== Summary ===")
    print(f"Unchecked checkboxes: {len(results['unchecked'])}")
    print(f"Checked checkboxes: {len(results['checked'])}")
    print(f"Form controls: {len(results['form_controls'])}")

    return results


def find_checkboxes_in_paragraph(paragraph, location, results):
    """Find checkboxes in a single paragraph."""
    try:
        p_element = paragraph._element
        xml_str = p_element.xml

        # Look for form checkboxes: legacy form fields (w:checkBox) and
        # content-control checkboxes (w14:checkbox)
        if 'w:checkBox' in xml_str or 'w14:checkbox' in xml_str:
            is_checked = any(marker in xml_str for marker in [
                '<w:checked/>', 'w:checked="1"',
                'w:checked w:val="true"', 'w:checked w:val="1"',
                'w14:checked w14:val="1"', 'w14:checked w14:val="true"'
            ])

            checkbox_info = {
                'location': location,
                'text': paragraph.text,
                'checked': is_checked,
                'type': 'form_control'
            }

            if is_checked:
                results['checked'].append(checkbox_info)
            else:
                results['unchecked'].append(checkbox_info)

            results['form_controls'].append(checkbox_info)
            print(f"[form control] {location}: {'checked' if is_checked else 'unchecked'}")

        # Look for simulated checkboxes (text symbols).
        # The ballot-box characters below are an assumed restoration: the
        # published listing shows them as '?', which would match any
        # question mark in the text.
        checkbox_symbols = {
            'unchecked': ['□', '☐', '[ ]', '()', '○'],
            'checked': ['☑', '☒', '[x]', '[X]', '[√]', '(x)', '(X)']
        }

        for symbol in checkbox_symbols['unchecked']:
            if symbol in paragraph.text:
                results['unchecked'].append({
                    'location': location,
                    'text': paragraph.text,
                    'symbol': symbol,
                    'type': 'text_symbol'
                })
                print(f"[text symbol] {location}: unchecked '{symbol}'")

        for symbol in checkbox_symbols['checked']:
            if symbol in paragraph.text:
                results['checked'].append({
                    'location': location,
                    'text': paragraph.text,
                    'symbol': symbol,
                    'type': 'text_symbol'
                })
                print(f"[text symbol] {location}: checked '{symbol}'")

    except Exception as e:
        print(f"Error checking {location}: {e}")


def set_cell_border(cell, borders_data):
    """Set the borders of a table cell."""
    try:
        from docx.oxml import OxmlElement
        from docx.oxml.ns import qn

        tc = cell._tc
        tcPr = tc.get_or_add_tcPr()

        # Get the tcBorders element, creating it if needed
        tcBorders = tcPr.first_child_found_in("w:tcBorders")
        if tcBorders is None:
            tcBorders = OxmlElement('w:tcBorders')
            tcPr.append(tcBorders)

        # Set each border from the data, e.g. {"top": {"val": "single", "sz": 4}}
        for border_name, border_attrs in borders_data.items():
            # Reuse the border element if it exists, otherwise create it
            element = tcBorders.find(qn('w:%s' % border_name))
            if element is None:
                element = OxmlElement('w:%s' % border_name)
                tcBorders.append(element)

            # Set the border attributes
            for attr_name, attr_value in border_attrs.items():
                element.set(qn('w:%s' % attr_name), str(attr_value))

    except Exception as e:
        print(f"Error setting cell borders: {e}")


def main():
    """Entry point: a simple interactive menu."""
    print("Docx Converter Enhanced v2.0")
    print("1. Convert docx to json")
    print("2. Convert json to docx")
    print("3. Find checkboxes in docx")
    print("4. Batch convert folder")

    choice = input("Choose an operation (1/2/3/4): ")

    if choice == "1":
        docx_path = input("Enter the docx file path: ")
        if not os.path.exists(docx_path):
            print("File not found!")
            return

        json_data = docx_to_json(docx_path)
        # splitext avoids overwriting the input when the name lacks ".docx"
        json_path = os.path.splitext(docx_path)[0] + "_enhanced.json"

        with open(json_path, "w", encoding="utf-8") as f:
            json.dump(json_data, f, ensure_ascii=False, indent=2)

        print(f"Conversion complete! JSON file saved as: {json_path}")

    elif choice == "2":
        json_path = input("Enter the json file path: ")
        if not os.path.exists(json_path):
            print("File not found!")
            return

        with open(json_path, "r", encoding="utf-8") as f:
            json_data = json.load(f)

        output_path = os.path.splitext(json_path)[0] + "_restored.docx"
        json_to_docx(json_data, output_path)
        print(f"Conversion complete! Docx file saved as: {output_path}")

    elif choice == "3":
        docx_path = input("Enter the docx file path: ")
        if not os.path.exists(docx_path):
            print("File not found!")
            return

        results = find_all_checkboxes(docx_path)
        print("\nCheckbox search complete!")

    elif choice == "4":
        folder_path = input("Enter the folder path: ")
        if not os.path.isdir(folder_path):
            print("Folder not found!")
            return

        # Batch conversion: every .docx in the folder, skipping Word's
        # temporary lock files (names starting with "~$")
        for filename in os.listdir(folder_path):
            if filename.lower().endswith(".docx") and not filename.startswith("~$"):
                docx_path = os.path.join(folder_path, filename)
                print(f"Processing: {filename}")

                try:
                    json_data = docx_to_json(docx_path)
                    json_path = os.path.join(folder_path, os.path.splitext(filename)[0] + ".json")

                    with open(json_path, "w", encoding="utf-8") as f:
                        json.dump(json_data, f, ensure_ascii=False, indent=2)

                    print(f"Converted: {filename}")
                except Exception as e:
                    print(f"Conversion failed for {filename}: {e}")

        print("Batch conversion complete!")

    else:
        print("Invalid choice!")


if __name__ == "__main__":
    main()
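
The color branch of apply_run_formatting accepts two string shapes, "RGB(r, g, b)" and a six-digit hex value, and silently ignores anything it cannot parse. As a minimal sketch, assuming those are the only two shapes that appear in the JSON, the parsing step could be pulled into a small standalone helper that is easy to test without python-docx. The name parse_color is hypothetical and is not part of the original tool.

```python
import re


def parse_color(value):
    """Return an (r, g, b) tuple for 'RGB(r, g, b)' or 'RRGGBB' strings, else None."""
    # Hypothetical helper: the original listing parses colors inline instead.
    if not isinstance(value, str) or value == "None":
        return None
    text = value.strip()

    match = re.fullmatch(
        r"RGB\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)", text
    )
    if match:
        channels = tuple(int(part) for part in match.groups())
    elif re.fullmatch(r"[0-9A-Fa-f]{6}", text):
        # Split the hex string into three two-digit channels
        channels = tuple(int(text[i:i + 2], 16) for i in (0, 2, 4))
    else:
        return None

    # Reject out-of-range channels such as RGB(300, 0, 0)
    return channels if all(0 <= c <= 255 for c in channels) else None


if __name__ == "__main__":
    print(parse_color("RGB(255, 0, 128)"))  # (255, 0, 128)
    print(parse_color("1F4E79"))            # (31, 78, 121)
    print(parse_color("not a color"))       # None
```

The resulting tuple could then be passed on as RGBColor(*channels), and a None result makes the "skip this color" case explicit rather than relying on a swallowed exception.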
