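A Complete Guide to Enhanced Bidirectional Docx-JSON Conversion in Python, with Code Walkthrough
Introduction
In everyday office work and software development, we often need to convert between document formats. Conversion between Word documents (Docx) and JSON data is especially useful in automated report generation, content management systems, and data migration. This article describes an enhanced Python tool that converts between Docx and JSON in both directions while preserving styles, lists, tables, images, and other complex elements.
Unlike plain text extraction, this tool aims to keep the document's structure and formatting, including paragraph formatting, font styles, table layout, and even form controls such as checkboxes. This matters in business settings where documents must keep a professional appearance.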
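Core Features
The enhanced converter provides the following core features:
- Bidirectional conversion: converts Docx documents into structured JSON data and restores JSON data to formatted Docx documents
- Style preservation: extracts and restores paragraph styles, character styles, and table styles
- Complex element handling: handles lists, tables, images, checkboxes, and other complex document elements
- Batch processing: converts whole folders at once
- Document metadata: keeps section information such as page size and margins
Compared with online conversion tools, this approach offers better data security and more room for customization: all processing happens locally, so sensitive documents never have to be uploaded to a third-party server.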
Core Code Walkthrough
1. Converting a Document to JSON
The docx_to_json function is the heart of the conversion. It parses each part of the Docx document in turn and builds a structured, JSON-serializable dictionary:
from docx import Document

def docx_to_json(docx_path):
    document = Document(docx_path)
    doc_data = {
        "metadata": {"created_by": "docx_converter", "version": "2.0"},
        "styles": {"paragraph_styles": [], "character_styles": [], "table_styles": []},
        "paragraphs": [],
        "tables": [],
        "images": [],
        "sections": []
    }
    # Extract each kind of element into doc_data
    extract_styles(document, doc_data)
    extract_paragraphs(document, doc_data)
    extract_tables(document, doc_data)
    extract_images(document, doc_data)
    extract_sections(document, doc_data)
    return doc_data
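This modular design keeps the code easy to maintain and extend: each extraction function handles one type of document element.
2. Style Extraction
Style extraction is key to preserving document formatting. The extract_styles function walks through the style definitions in the document: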
def extract_styles(document, doc_data):
    styles = document.styles
    for style in styles:
        style_info = {
            "name": style.name,
            "type": str(style.type),
            "builtin": style.builtin,
            # other attributes...
        }
        # Extract font properties (only when the style has a font attribute)
        if hasattr(style, 'font') and style.font:
            font_info = {}
            if style.font.name:
                font_info["name"] = style.font.name
            if style.font.size:
                font_info["size"] = style.font.size.pt
            # more font attributes...
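This captures the style definitions, including their font attributes, and lays the groundwork for faithful restoration.
3. Paragraph and Text Handling
Paragraph handling covers not only the text but also formatting, list properties, and inline elements: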
def extract_paragraphs(document, doc_data):
    for para_idx, paragraph in enumerate(document.paragraphs):
        para_info = {}
        # Text content
        if paragraph.text.strip():
            para_info["text"] = paragraph.text
        # Paragraph style
        if paragraph.style and paragraph.style.name:
            para_info["style"] = paragraph.style.name
        # List detection
        list_info = detect_list_properties(paragraph)
        if list_info:
            para_info["list_info"] = list_info
        # Text runs
        runs_list = []
        for run in paragraph.runs:
            run_info = extract_run_properties(run)
            if run_info:
                runs_list.append(run_info)
        if runs_list:
            para_info["runs"] = runs_list
        doc_data["paragraphs"].append(para_info)
This fine-grained approach captures formatting changes precisely: even style differences between text runs within a single paragraph are preserved.
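List detection deserves a closer look. python-docx's ParagraphFormat exposes no bullet or numbering attributes, so list membership is usually read from the paragraph's underlying XML, where a `w:numPr` element inside `w:pPr` carries the list level (`w:ilvl`) and the numbering definition id (`w:numId`). The following is a minimal sketch that works on a raw paragraph XML string using only the standard library; the function name `list_info_from_xml` is hypothetical and not part of the tool.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
VAL = f"{{{W_NS}}}val"  # Clark-notation name of the w:val attribute


def list_info_from_xml(paragraph_xml):
    """Return {'level', 'num_id'} if the paragraph carries w:numPr, else None."""
    p = ET.fromstring(paragraph_xml)
    num_pr = p.find("w:pPr/w:numPr", NS)
    if num_pr is None:
        return None
    ilvl = num_pr.find("w:ilvl", NS)
    num_id = num_pr.find("w:numId", NS)
    return {
        # A missing w:ilvl is treated here as the top level (assumption of this sketch)
        "level": int(ilvl.get(VAL)) if ilvl is not None else 0,
        "num_id": int(num_id.get(VAL)) if num_id is not None else None,
    }


sample = (
    f'<w:p xmlns:w="{W_NS}"><w:pPr><w:numPr>'
    '<w:ilvl w:val="1"/><w:numId w:val="3"/>'
    '</w:numPr></w:pPr><w:r><w:t>item</w:t></w:r></w:p>'
)
plain = f'<w:p xmlns:w="{W_NS}"><w:r><w:t>text</w:t></w:r></w:p>'

print(list_info_from_xml(sample))  # {'level': 1, 'num_id': 3}
print(list_info_from_xml(plain))   # None
```

With python-docx, the XML string for a paragraph is available as `paragraph._element.xml`, the same attribute the tool's checkbox detection relies on. Resolving whether a given `num_id` is a bullet or a numbered list requires the document's numbering part and is outside this sketch.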
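4. Table Extraction
Table extraction is one of the harder parts of document processing. The tool extracts tables layer by layer (table, row, cell) to keep the structure intact: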
def extract_tables(document, doc_data):
    for table_idx, table in enumerate(document.tables):
        table_info = {"index": table_idx, "rows": []}
        # Table style
        if hasattr(table, 'style') and table.style:
            table_info["style"] = table.style.name
        # Rows and cells
        for row_idx, row in enumerate(table.rows):
            row_info = {"index": row_idx, "cells": []}
            for cell_idx, cell in enumerate(row.cells):
                cell_info = extract_cell_content(cell, row_idx, cell_idx)
                if cell_info:
                    row_info["cells"].append(cell_info)
            table_info["rows"].append(row_info)
        doc_data["tables"].append(table_info)
Each cell is parsed further into its paragraphs and runs, so nested content is kept.
5. Image and Media Handling
Images are Base64-encoded, which turns the binary image data into text that can be stored in JSON:
import base64

def extract_images(document, doc_data):
    # Image parts are reachable through the document part's relationships
    for rel in document.part.rels.values():
        if "image" in rel.reltype:
            image_part = rel.target_part
            image_info = {
                "content_type": image_part.content_type,
                "data": base64.b64encode(image_part.blob).decode('utf-8'),
                "filename": getattr(image_part, 'filename', 'image.png')
            }
            doc_data["images"].append(image_info)
Because Base64 is a lossless encoding, the image bytes are stored unchanged and the original image quality is fully recovered when the document is restored.
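The round trip can be checked in isolation. The sketch below uses stand-in bytes instead of a real image and two hypothetical helper names (`encode_image`, `decode_image`); it also shows the cost of the approach, since Base64 text is roughly one third larger than the binary it encodes.

```python
import base64


def encode_image(blob: bytes) -> str:
    """Encode raw image bytes as Base64 text suitable for a JSON string."""
    return base64.b64encode(blob).decode("ascii")


def decode_image(data: str) -> bytes:
    """Decode the Base64 text back to the original bytes."""
    return base64.b64decode(data)


blob = bytes(range(256)) * 4          # 1024 stand-in bytes, not a real image
encoded = encode_image(blob)
restored = decode_image(encoded)

print(restored == blob)               # True: the encoding is lossless
print(len(blob), len(encoded))        # 1024 1368
```

For documents with many large images, the size overhead may be a reason to store image files next to the JSON and keep only their paths in it.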
Use Cases and Practical Examples
1. Automated Report Generation
The tool works well for automated report generation. For example, JSON-formatted business data can be filled into a predefined document layout to produce business reports with consistent formatting.
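# Example: turn business data into a formatted report
business_data = {
    "title": "Quarterly Sales Report",
    "period": "2023 Q1",
    "metrics": ["Sales", "Growth rate", "Market share"],
    "values": [1500000, 0.15, 0.23]
}
# json_to_docx in the complete code takes (json_data, output_path) and expects the
# converter's own schema, so the business data is first mapped onto "paragraphs".
report_data = {
    "paragraphs": [
        {"text": business_data["title"], "style": "Title"},
        {"text": business_data["period"]},
    ] + [
        {"text": f"{metric}: {value}"}
        for metric, value in zip(business_data["metrics"], business_data["values"])
    ]
}
json_to_docx(report_data, "quarterly_sales_report.docx")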
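2. Content Management System Integration
In a content management system (CMS), the tool supports structured storage and flexible publishing. Editors can write content comfortably in Word, convert it to JSON for storage in a database, and convert it to HTML, PDF, or other formats at publishing time.
3. Legal and Compliance Documents
In the legal sector, contracts and agreements require strict control over formatting. The tool helps documents keep their formatting across repeated conversions, avoiding problems caused by formatting errors.
4. Education and Research
In academic research, the tool can batch-process lab reports, extract structured data for analysis, or fill analysis results into a paper template automatically.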
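Comparison with Other Tools
Compared with other document conversion tools on the market, this approach has distinct advantages:
| Feature | This tool | Online converters | Professional software |
|---|---|---|---|
| Data privacy | Processed locally, fully private | Documents must be uploaded to a server | Depends on deployment |
| Customizability | High, the code can be modified freely | Low, fixed functionality | Medium, depends on the software's interfaces |
| Format support | Focused on Docx-JSON conversion | Many formats | Many formats |
| Cost | Free and open source | Free or paid | Usually paid |
Compared with simple text extraction tools, it does much better at preserving styles; compared with complex commercial software, it has the advantage of being open source and transparent.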
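Usage Tutorial
Environment Setup
First install the required Python dependency:
pip install python-docx
python-docx is the core library for working with Word documents; it provides a rich API for manipulating many aspects of Docx files.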
Basic Usage Examples
Converting Docx to JSON:
import json
from docx_converter import docx_to_json
# Convert a single document
json_data = docx_to_json("my_document.docx")
# Save the JSON result
with open("document_data.json", "w", encoding="utf-8") as f:
    json.dump(json_data, f, ensure_ascii=False, indent=2)
Restoring Docx from JSON:
import json
from docx_converter import json_to_docx
# Load the JSON data
with open("document_data.json", "r", encoding="utf-8") as f:
    json_data = json.load(f)
# Restore the Word document
json_to_docx(json_data, "restored_document.docx")
Batch conversion:
import json
import os
from docx_converter import docx_to_json
def batch_convert(folder_path):
    for filename in os.listdir(folder_path):
        if filename.endswith(".docx"):
            docx_path = os.path.join(folder_path, filename)
            json_data = docx_to_json(docx_path)
            # Swap only the extension; str.replace would also hit ".docx" inside the name
            json_path = os.path.splitext(docx_path)[0] + ".json"
            with open(json_path, "w", encoding="utf-8") as f:
                json.dump(json_data, f, ensure_ascii=False, indent=2)
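In practice a folder often contains Word lock files (names starting with `~$`), upper-case extensions, and the occasional corrupt document. A slightly more defensive variant might look like the sketch below. The name `batch_convert_safe` and its `convert` and `overwrite` parameters are assumptions made for illustration; `convert` stands for any callable that maps a path to a dictionary, such as `docx_to_json`.

```python
import json
from pathlib import Path


def batch_convert_safe(folder, convert, overwrite=False):
    """Convert every .docx file in `folder` and return the JSON paths written."""
    written = []
    for docx_path in sorted(Path(folder).iterdir()):
        if not docx_path.is_file() or docx_path.suffix.lower() != ".docx":
            continue
        if docx_path.name.startswith("~$"):   # Word's temporary lock file
            continue
        json_path = docx_path.with_suffix(".json")
        if json_path.exists() and not overwrite:
            continue
        try:
            data = convert(str(docx_path))
        except Exception as exc:              # one bad file should not stop the batch
            print(f"skipped {docx_path.name}: {exc}")
            continue
        json_path.write_text(
            json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(json_path)
    return written
```

A call such as `batch_convert_safe("reports", docx_to_json)` would then convert a folder while leaving existing JSON files untouched unless `overwrite=True` is passed.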
Advanced Features
Checkbox detection:
from docx_converter import find_all_checkboxes
# Detect the checkboxes in a document
results = find_all_checkboxes("form_document.docx")
print(f"Found {len(results['checked'])} checked checkboxes")
print(f"Found {len(results['unchecked'])} unchecked checkboxes")
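Word stores checkboxes in two different XML forms: legacy form fields use a `w:checkBox` element, while content controls (Word 2010 and later) use `w14:checkbox` with a `w14:checked` child. The sketch below reads both from a paragraph's XML with the standard library only. The function name `checkbox_states` is hypothetical, and the handling of a missing `w:val` (treated as true for legacy on/off elements) is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"
TRUE_VALUES = {"1", "true", "on"}


def checkbox_states(paragraph_xml):
    """Return one boolean per checkbox found, in document order."""
    root = ET.fromstring(paragraph_xml)
    states = []
    for elem in root.iter():
        if elem.tag == f"{{{W14}}}checkbox":
            # Content control: <w14:checked w14:val="1"/>
            checked = elem.find(f"{{{W14}}}checked")
            val = checked.get(f"{{{W14}}}val", "0") if checked is not None else "0"
            states.append(val.lower() in TRUE_VALUES)
        elif elem.tag == f"{{{W}}}checkBox":
            # Legacy form field: w:checked overrides w:default when present
            checked = elem.find(f"{{{W}}}checked")
            if checked is None:
                checked = elem.find(f"{{{W}}}default")
            val = checked.get(f"{{{W}}}val", "1") if checked is not None else "0"
            states.append(val.lower() in TRUE_VALUES)
    return states


sample = (
    f'<w:p xmlns:w="{W}" xmlns:w14="{W14}">'
    '<w:sdt><w:sdtPr><w14:checkbox><w14:checked w14:val="1"/></w14:checkbox>'
    '</w:sdtPr></w:sdt>'
    '<w:r><w:fldChar w:fldCharType="begin"><w:ffData>'
    '<w:checkBox><w:default w:val="0"/></w:checkBox>'
    '</w:ffData></w:fldChar></w:r></w:p>'
)
print(checkbox_states(sample))  # [True, False]
```

Parsing the elements avoids the false matches that plain substring checks on the XML text can produce when a paragraph holds more than one checkbox.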
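Style customization:
# Custom mapping applied to extracted style entries
def custom_style_mapper(style_info):
    # Modify or filter specific styles
    if style_info.get('name') == 'Heading 1':  # the built-in style name contains a space
        # extract_styles stores font attributes under "font", with the size in points
        style_info.setdefault('font', {})['size'] = 16
    return style_info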
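A mapper like this still has to be applied to the extracted data. One way to do that, sketched under the assumption that the data follows the `styles` layout produced by `docx_to_json`, is a small helper that runs the mapper over every style group and drops entries for which the mapper returns None. The names `apply_style_mapper` and `bigger_heading` are hypothetical.

```python
import copy


def apply_style_mapper(doc_data, mapper):
    """Return a copy of doc_data with `mapper` applied to every style entry."""
    result = copy.deepcopy(doc_data)          # leave the caller's data untouched
    styles = result.get("styles", {})
    for group in list(styles):
        mapped = [mapper(style) for style in styles[group]]
        styles[group] = [s for s in mapped if s is not None]
    return result


def bigger_heading(style_info):
    """Example mapper: set the 'Heading 1' font size to 16 pt."""
    if style_info.get("name") == "Heading 1":
        style_info.setdefault("font", {})["size"] = 16
    return style_info


doc_data = {
    "styles": {
        "paragraph_styles": [{"name": "Heading 1"}, {"name": "Normal"}],
        "character_styles": [],
        "table_styles": [],
    }
}
mapped = apply_style_mapper(doc_data, bigger_heading)
print(mapped["styles"]["paragraph_styles"][0])  # {'name': 'Heading 1', 'font': {'size': 16}}
```

Note that in the code listed in this article, json_to_docx reads the sections, paragraphs, tables, and images but not the styles list, so a mapped style changes the restored document only once the restoration code consumes it.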
Notes and Best Practices
1. File Path Handling
When handling file paths, resolve them to absolute paths and add appropriate error handling:
import os
def safe_convert(docx_path):
    docx_path = os.path.abspath(docx_path)
    if not os.path.exists(docx_path):
        raise FileNotFoundError(f"Document not found: {docx_path}")
    if not docx_path.lower().endswith('.docx'):
        raise ValueError("Only .docx files are supported")
    try:
        return docx_to_json(docx_path)
    except Exception as e:
        print(f"Conversion failed: {e}")
        return None
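2. Large File Handling
When processing large documents, keep memory usage in mind:
from docx import Document
def process_large_document(docx_path, chunk_size=10):
    """Process a large document in chunks."""
    # python-docx still loads the whole file here; chunking bounds the size of each output
    document = Document(docx_path)
    total_paragraphs = len(document.paragraphs)
    for i in range(0, total_paragraphs, chunk_size):
        # process_paragraph_chunk and save_chunk are placeholders to be supplied by the caller
        chunk_data = process_paragraph_chunk(document, i, i+chunk_size)
        save_chunk(chunk_data, i)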
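The two helpers used above are not defined in the article. One possible shape for them, sketched here on a plain list of paragraph dictionaries so that it runs without a Word file, writes each chunk to its own JSON file. The names `iter_chunks` and `save_chunks` and the `chunk_NNNNNN.json` naming scheme are assumptions.

```python
import json
from pathlib import Path


def iter_chunks(items, chunk_size):
    """Yield (start_index, slice) pairs covering `items` in order."""
    for start in range(0, len(items), chunk_size):
        yield start, items[start:start + chunk_size]


def save_chunks(paragraph_dicts, out_dir, chunk_size=10):
    """Write each chunk of paragraph dictionaries to its own JSON file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start, chunk in iter_chunks(paragraph_dicts, chunk_size):
        path = out_dir / f"chunk_{start:06d}.json"
        path.write_text(json.dumps(chunk, ensure_ascii=False), encoding="utf-8")
        paths.append(path)
    return paths
```

Zero-padded start indices keep the files in document order when sorted by name, which makes it straightforward to reassemble the paragraphs later.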
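3. Keeping Styles Consistent
To keep styles consistent, base new documents on a template document:
def create_from_template(json_data, template_path, output_path):
    """Create a document based on a template."""
    template_data = docx_to_json(template_path)
    # Apply the data to the template's styles
    # (merge_data_with_template is a placeholder; it is not defined in the complete code)
    merged_data = merge_data_with_template(json_data, template_data)
    json_to_docx(merged_data, output_path)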
Extension and Customization
The tool is designed to be extended with more features:
1. Supporting New Elements
def extract_custom_elements(document, doc_data):
    """Extract custom elements."""
    # Add extraction logic for charts, math formulas, and other special elements
    pass
def create_custom_elements(document, element_data):
    """Create custom elements."""
    pass
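2. Supporting Other Formats
Combined with tools such as pandoc, more formats can be supported:
def convert_via_markdown(json_data):
    """Convert through Markdown as an intermediate format."""
    # JSON -> Markdown -> target format
    # (json_to_markdown is a placeholder; it is not defined in the complete code)
    markdown_content = json_to_markdown(json_data)
    # Use pandoc to convert the Markdown to other formats
    return markdown_content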
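The json_to_markdown step is left open above. A minimal sketch of what it might do, assuming the paragraph dictionaries carry the `text`, `style`, and `list_info` keys produced by extract_paragraphs and that headings use python-docx's built-in English style names ("Heading 1" and so on), is shown below. It ignores runs, tables, and images.

```python
import re


def json_to_markdown(doc_data):
    """Render the paragraphs of doc_data as simple Markdown text."""
    blocks = []
    for para in doc_data.get("paragraphs", []):
        text = para.get("text", "").strip()
        if not text:
            continue
        heading = re.fullmatch(r"Heading (\d)", para.get("style", ""))
        if heading:
            blocks.append("#" * int(heading.group(1)) + " " + text)
        elif para.get("list_info"):
            blocks.append("- " + text)   # numbering is not reconstructed here
        else:
            blocks.append(text)
    return "\n\n".join(blocks) + "\n"


doc_data = {
    "paragraphs": [
        {"text": "Title", "style": "Heading 1"},
        {"text": "Body text", "style": "Normal"},
        {"text": "First item", "style": "List Bullet", "list_info": {"type": "style_based"}},
    ]
}
print(json_to_markdown(doc_data))
```

The resulting text can be written to a file and handed to pandoc, for example with `pandoc report.md -o report.html`.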
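3. Cloud Service Integration
The tool can be deployed as a web service that exposes an API:
import io
from flask import Flask, request, jsonify
app = Flask(__name__)
@app.route('/convert/docx-to-json', methods=['POST'])
def convert_docx_to_json_api():
    file = request.files['file']
    # python-docx accepts a file-like object, so the upload is wrapped in a seekable buffer
    json_data = docx_to_json(io.BytesIO(file.read()))
    return jsonify(json_data)
This architecture makes integration with other systems straightforward.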
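Summary
This article has described how a feature-rich bidirectional Docx-JSON conversion tool works and how to use it. With it, document content can be extracted into a structured form and restored again, which covers a wide range of document automation needs.
Compared with existing solutions, the main advantages of this tool are:
- Format preservation: converts styles, tables, images, and other complex elements
- Extensibility: the modular design makes it easy to add new features
- Free and open source: MIT-licensed, free to use and modify
- Local processing: sensitive data never leaves the local environment
As digitalization accelerates, demand for automated document processing will keep growing. This tool gives developers a solid base on which to build more complex document workflows, such as integrating with AI tools like LangChain for intelligent document processing.
Going forward, we will keep improving performance, add support for more elements, and explore deeper integration with AI techniques to make document processing more intelligent and automated.
Resources
- Complete code: the full code discussed in this article is open source on GitHub
- Sample documents: several test documents that demonstrate conversion results in different scenarios
- Extension modules: community-contributed extensions such as PDF support and OCR integration
We hope this article helps you understand and apply document conversion techniques and improves your productivity and level of automation.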
Complete Code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Docx to JSON and JSON to Docx converter
Extracts the styles of a docx file into a JSON object, and restores a docx file from that JSON object.
Enhanced version: supports more styles, lists, images, table styles, etc.
"""
import json
import base64
import os
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH, WD_BREAK
from docx.enum.style import WD_STYLE_TYPE
from docx.enum.table import WD_TABLE_ALIGNMENT
from docx.shared import RGBColor, Pt, Inches
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
import io
def docx_to_json(docx_path):
    """
    Convert a docx file to a JSON-serializable dictionary.
    Enhanced version: supports more style attributes, lists, images, etc.
    """
    document = Document(docx_path)
    # Dictionary holding all extracted content
    doc_data = {
        "metadata": {
            "created_by": "docx_converter",
            "version": "2.0"
        },
        "styles": {
            "paragraph_styles": [],
            "character_styles": [],
            "table_styles": []
        },
        "paragraphs": [],
        "tables": [],
        "images": [],
        "sections": []
    }
    # 1. Extract all styles
    extract_styles(document, doc_data)
    # 2. Extract paragraph content
    extract_paragraphs(document, doc_data)
    # 3. Extract table content
    extract_tables(document, doc_data)
    # 4. Extract images
    extract_images(document, doc_data)
    # 5. Extract section information
    extract_sections(document, doc_data)
    return doc_data
def extract_styles(document, doc_data):
    """Extract every style defined in the document."""
    styles = document.styles
    for style in styles:
        style_info = {
            "name": style.name,
            "type": str(style.type),
            "builtin": style.builtin,
            "hidden": style.hidden,
            "priority": getattr(style, 'priority', None)
        }
        # Font properties - extracted only when the style has a font attribute
        if hasattr(style, 'font') and style.font:
            font_info = {}
            if style.font.name:
                font_info["name"] = style.font.name
            if style.font.size:
                font_info["size"] = style.font.size.pt
            if style.font.bold is not None:
                font_info["bold"] = style.font.bold
            if style.font.italic is not None:
                font_info["italic"] = style.font.italic
            if style.font.underline is not None:
                font_info["underline"] = str(style.font.underline)
            if style.font.color.rgb:
                font_info["color"] = str(style.font.color.rgb)
            if style.font.all_caps is not None:
                font_info["all_caps"] = style.font.all_caps
            if style.font.small_caps is not None:
                font_info["small_caps"] = style.font.small_caps
            if style.font.superscript is not None:
                font_info["superscript"] = style.font.superscript
            if style.font.subscript is not None:
                font_info["subscript"] = style.font.subscript
            if style.font.strike is not None:
                font_info["strike"] = style.font.strike
            if font_info:
                style_info["font"] = font_info
        # Paragraph format - extracted for paragraph styles only
        if style.type == WD_STYLE_TYPE.PARAGRAPH and hasattr(style, 'paragraph_format') and style.paragraph_format:
            pf_info = extract_paragraph_format(style.paragraph_format)
            if pf_info:
                style_info["paragraph_format"] = pf_info
        # Store the style under its type
        if style.type == WD_STYLE_TYPE.PARAGRAPH:
            doc_data["styles"]["paragraph_styles"].append(style_info)
        elif style.type == WD_STYLE_TYPE.CHARACTER:
            doc_data["styles"]["character_styles"].append(style_info)
        elif style.type == WD_STYLE_TYPE.TABLE:
            doc_data["styles"]["table_styles"].append(style_info)
def extract_paragraph_format(paragraph_format):
    """Extract paragraph format information."""
    pf_info = {}
    if paragraph_format.alignment is not None:
        pf_info["alignment"] = str(paragraph_format.alignment)
    if paragraph_format.left_indent:
        pf_info["left_indent"] = paragraph_format.left_indent.pt
    if paragraph_format.right_indent:
        pf_info["right_indent"] = paragraph_format.right_indent.pt
    if paragraph_format.first_line_indent:
        pf_info["first_line_indent"] = paragraph_format.first_line_indent.pt
    if paragraph_format.space_before:
        pf_info["space_before"] = paragraph_format.space_before.pt
    if paragraph_format.space_after:
        pf_info["space_after"] = paragraph_format.space_after.pt
    # Only line-spacing multiples (floats such as 1.5) are kept; absolute spacings are
    # Length values in EMU, far above 100, and are skipped by this check
    if paragraph_format.line_spacing and paragraph_format.line_spacing <= 100:
        pf_info["line_spacing"] = paragraph_format.line_spacing
    if paragraph_format.keep_with_next is not None:
        pf_info["keep_with_next"] = paragraph_format.keep_with_next
    if paragraph_format.keep_together is not None:
        pf_info["keep_together"] = paragraph_format.keep_together
    if paragraph_format.page_break_before is not None:
        pf_info["page_break_before"] = paragraph_format.page_break_before
    if paragraph_format.widow_control is not None:
        pf_info["widow_control"] = paragraph_format.widow_control
    if paragraph_format.line_spacing_rule is not None:
        pf_info["line_spacing_rule"] = str(paragraph_format.line_spacing_rule)
    # Extract tab stop information
    try:
        if paragraph_format.tab_stops:
            tab_stops_info = []
            for tab_stop in paragraph_format.tab_stops:
                # Compare against None: the LEFT alignment and SPACES leader have value 0
                tab_info = {
                    "position": tab_stop.position.pt if tab_stop.position is not None else None,
                    "alignment": str(tab_stop.alignment) if tab_stop.alignment is not None else None,
                    "leader": str(tab_stop.leader) if tab_stop.leader is not None else None
                }
                tab_stops_info.append(tab_info)
            if tab_stops_info:
                pf_info["tab_stops"] = tab_stops_info
    except Exception:
        pass
    return pf_info if pf_info else None
def extract_paragraphs(document, doc_data):
    """Extract the content of all body paragraphs."""
    for para_idx, paragraph in enumerate(document.paragraphs):
        para_info = {}
        # Basic text and style
        if paragraph.text.strip():
            para_info["text"] = paragraph.text
        if paragraph.style and paragraph.style.name:
            para_info["style"] = paragraph.style.name
        # Paragraph format
        if paragraph.paragraph_format:
            pf_info = extract_paragraph_format(paragraph.paragraph_format)
            if pf_info:
                para_info["paragraph_format"] = pf_info
        # Detect list properties
        list_info = detect_list_properties(paragraph)
        if list_info:
            para_info["list_info"] = list_info
        # Process runs
        runs_list = []
        for run in paragraph.runs:
            run_info = extract_run_properties(run)
            if run_info:
                runs_list.append(run_info)
        # Process checkboxes (stored as an extra pseudo-run)
        checkbox_info = extract_checkboxes(paragraph)
        if checkbox_info:
            runs_list.append(checkbox_info)
        if runs_list:
            para_info["runs"] = runs_list
        # Only paragraphs with some content are added
        if para_info:
            doc_data["paragraphs"].append(para_info)
def detect_list_properties(paragraph):
    """Detect list properties of a paragraph."""
    list_info = {}
    try:
        pf = paragraph.paragraph_format
        # Bullet list (python-docx's ParagraphFormat does not define bullet_char,
        # so this branch only fires if such an attribute is provided elsewhere)
        if hasattr(pf, 'bullet_char') and pf.bullet_char is not None:
            list_info['type'] = 'bullet'
            list_info['bullet_char'] = pf.bullet_char
            list_info['level'] = getattr(pf, 'level', 0)
        # Numbered list (same caveat for number_format)
        elif hasattr(pf, 'number_format') and pf.number_format is not None:
            list_info['type'] = 'number'
            list_info['number_format'] = str(pf.number_format)
            list_info['level'] = getattr(pf, 'level', 0)
            list_info['start_value'] = getattr(pf, 'start_value', 1)
        # Detect lists by style name
        elif paragraph.style and paragraph.style.name:
            style_name = paragraph.style.name.lower()
            if 'list' in style_name or 'bullet' in style_name:
                list_info['type'] = 'style_based'
                list_info['style_name'] = paragraph.style.name
    except Exception:
        # If detection fails, ignore list properties
        pass
    return list_info if list_info else None
def extract_run_properties(run):
    """Extract the style properties of a run."""
    run_info = {}
    # Keep whitespace-only runs too, otherwise spaces between runs are lost on restore
    if run.text:
        run_info["text"] = run.text
    # Font properties
    font_props = [
        ("bold", run.bold),
        ("italic", run.italic),
        ("underline", run.underline),
        ("strike", run.font.strike),
        ("superscript", run.font.superscript),
        ("subscript", run.font.subscript),
        ("all_caps", run.font.all_caps),
        ("small_caps", run.font.small_caps)
    ]
    for prop_name, prop_value in font_props:
        if prop_value is not None:
            run_info[prop_name] = prop_value
    # Font name and size
    if run.font.name:
        run_info["font_name"] = run.font.name
    if run.font.size:
        run_info["font_size"] = run.font.size.pt
    # Color
    if run.font.color.rgb:
        run_info["color"] = str(run.font.color.rgb)
    # Highlight color
    try:
        if run.font.highlight_color and str(run.font.highlight_color) != 'none':
            run_info["highlight_color"] = str(run.font.highlight_color)
    except Exception:
        pass
    # Underline color (skipped silently if the attribute is unavailable)
    try:
        if run.font.underline_color and run.font.underline_color.rgb:
            run_info["underline_color"] = str(run.font.underline_color.rgb)
    except Exception:
        pass
    # Character spacing (skipped silently if the attribute is unavailable)
    try:
        if run.font.spacing:
            run_info["character_spacing"] = run.font.spacing
    except Exception:
        pass
    # Font background color (character shading)
    try:
        rPr = run._element.rPr
        if rPr is not None:
            shd_elements = rPr.xpath('.//w:shd')
            if shd_elements:
                shd_element = shd_elements[0]
                fill_color = shd_element.get(qn('w:fill'))
                if fill_color:
                    run_info["background_color"] = fill_color
    except Exception:
        pass
    return run_info if run_info else None
def extract_checkboxes(paragraph):
    """Extract checkbox information from a paragraph."""
    try:
        p_element = paragraph._element
        xml_str = p_element.xml
        # Detect legacy form-field checkboxes
        if 'w:checkBox' in xml_str:
            if 'w:checked="1"' in xml_str or 'w:checked w:val="true"' in xml_str:
                return {"text": "[☑]", "is_checkbox": True, "checked": True}
            else:
                return {"text": "[□]", "is_checkbox": True, "checked": False}
        # Detect content-control checkboxes (newer format)
        checkboxes = p_element.xpath('.//*[local-name()="checkbox"]')
        for checkbox in checkboxes:
            checked_elements = checkbox.xpath('.//*[local-name()="checked"]')
            if checked_elements:
                checked_element = checked_elements[0]
                checked_value = "false"
                # Try the fully qualified w14:val attribute first, then the bare prefixed name
                for attr_name in ['{http://schemas.microsoft.com/office/word/2010/wordml}val',
                                  'w14:val']:
                    val = checked_element.get(attr_name)
                    if val is not None:
                        checked_value = val
                        break
                is_checked = checked_value.lower() == "true" or checked_value == "1"
                return {
                    "text": "[☑]" if is_checked else "[□]",
                    "is_checkbox": True,
                    "checked": is_checked
                }
    except Exception:
        pass
    return None
def extract_tables(document, doc_data):
    """Extract table content and styles."""
    for table_idx, table in enumerate(document.tables):
        table_info = {
            "index": table_idx,
            "rows": []
        }
        # Table style
        if hasattr(table, 'style') and table.style:
            table_info["style"] = table.style.name
        # Table alignment
        if hasattr(table, 'alignment'):
            table_info["alignment"] = str(table.alignment)
        # Process rows and cells
        for row_idx, row in enumerate(table.rows):
            row_info = {
                "index": row_idx,
                "cells": [],
                "height": getattr(row, 'height', None)
            }
            for cell_idx, cell in enumerate(row.cells):
                cell_info = extract_cell_content(cell, row_idx, cell_idx)
                if cell_info:
                    row_info["cells"].append(cell_info)
            if row_info["cells"]:
                table_info["rows"].append(row_info)
        if table_info["rows"]:
            doc_data["tables"].append(table_info)
def extract_cell_content(cell, row_idx, cell_idx):
    """Extract the content of a table cell."""
    cell_info = {
        "row": row_idx,
        "column": cell_idx,
        "text": cell.text
    }
    # Cell styling
    try:
        # Shading
        if hasattr(cell, 'shading'):
            shading = cell.shading
            if hasattr(shading, 'background_pattern_color'):
                cell_info["shading"] = str(shading.background_pattern_color)
        # Vertical alignment
        if hasattr(cell, 'vertical_alignment') and cell.vertical_alignment is not None:
            cell_info["vertical_alignment"] = str(cell.vertical_alignment)
        # Margins
        if hasattr(cell, 'top_margin') and cell.top_margin is not None:
            cell_info["top_margin"] = cell.top_margin.pt
        if hasattr(cell, 'bottom_margin') and cell.bottom_margin is not None:
            cell_info["bottom_margin"] = cell.bottom_margin.pt
        if hasattr(cell, 'left_margin') and cell.left_margin is not None:
            cell_info["left_margin"] = cell.left_margin.pt
        if hasattr(cell, 'right_margin') and cell.right_margin is not None:
            cell_info["right_margin"] = cell.right_margin.pt
        # Cell borders
        tc = cell._tc
        tcPr = tc.tcPr
        if tcPr is not None:
            tcBorders = tcPr.xpath('./w:tcBorders')
            if tcBorders:
                borders_info = {}
                border_elements = tcBorders[0].xpath('./*')
                for border_elem in border_elements:
                    border_tag = border_elem.tag.split('}')[1]  # local tag name
                    border_attrs = {}
                    for attr, value in border_elem.attrib.items():
                        attr_name = attr.split('}')[1] if '}' in attr else attr
                        border_attrs[attr_name] = value
                    borders_info[border_tag] = border_attrs
                if borders_info:
                    cell_info["borders"] = borders_info
    except Exception:
        pass
    # Process the paragraphs inside the cell
    paragraphs_list = []
    for para in cell.paragraphs:
        if para.text.strip():
            para_dict = {
                "text": para.text
            }
            if para.style and para.style.name:
                para_dict["style"] = para.style.name
            # Paragraph format
            if para.paragraph_format:
                pf_info = extract_paragraph_format(para.paragraph_format)
                if pf_info:
                    para_dict["paragraph_format"] = pf_info
            # Process runs
            runs_list = []
            for run in para.runs:
                run_info = extract_run_properties(run)
                if run_info:
                    runs_list.append(run_info)
            if runs_list:
                para_dict["runs"] = runs_list
            paragraphs_list.append(para_dict)
    if paragraphs_list:
        cell_info["paragraphs"] = paragraphs_list
    return cell_info
def extract_images(document, doc_data):
    """Extract the images in the document."""
    try:
        # Images are found through the document part's relationships
        for rel in document.part.rels.values():
            if "image" in rel.reltype:
                image_part = rel.target_part
                image_info = {
                    "content_type": image_part.content_type,
                    "data": base64.b64encode(image_part.blob).decode('utf-8'),
                    "filename": getattr(image_part, 'filename', 'image.png')
                }
                doc_data["images"].append(image_info)
    except Exception as e:
        print(f"Error while extracting images: {e}")
def extract_sections(document, doc_data):
    """Extract section information (page size and margins, in points)."""
    for section_idx, section in enumerate(document.sections):
        section_info = {
            "index": section_idx,
            "page_width": section.page_width.pt if section.page_width else None,
            "page_height": section.page_height.pt if section.page_height else None,
            "left_margin": section.left_margin.pt if section.left_margin else None,
            "right_margin": section.right_margin.pt if section.right_margin else None,
            "top_margin": section.top_margin.pt if section.top_margin else None,
            "bottom_margin": section.bottom_margin.pt if section.bottom_margin else None
        }
        doc_data["sections"].append(section_info)
def json_to_docx(json_data, output_path):
    """
    Convert JSON data to a docx file.
    Enhanced version: supports more styles and elements.
    """
    document = Document()
    # 1. Set document properties
    setup_document_properties(document, json_data)
    # 2. Add paragraphs
    create_paragraphs(document, json_data)
    # 3. Add tables
    create_tables(document, json_data)
    # 4. Add images
    create_images(document, json_data)
    # Save the document
    document.save(output_path)
def setup_document_properties(document, json_data):
    """Set document properties."""
    # Page layout (only the first stored section is applied)
    if json_data.get("sections"):
        section = document.sections[0]
        first_section = json_data["sections"][0]
        if first_section.get("page_width"):
            section.page_width = Pt(first_section["page_width"])
        if first_section.get("page_height"):
            section.page_height = Pt(first_section["page_height"])
        if first_section.get("left_margin"):
            section.left_margin = Pt(first_section["left_margin"])
        if first_section.get("right_margin"):
            section.right_margin = Pt(first_section["right_margin"])
        if first_section.get("top_margin"):
            section.top_margin = Pt(first_section["top_margin"])
        if first_section.get("bottom_margin"):
            section.bottom_margin = Pt(first_section["bottom_margin"])
def create_paragraphs(document, json_data):
    """Create paragraphs."""
    for para_data in json_data.get("paragraphs", []):
        # Create the paragraph, falling back to Normal if the style is unknown
        style_name = para_data.get("style", "Normal")
        try:
            paragraph = document.add_paragraph(style=style_name)
        except Exception:
            paragraph = document.add_paragraph(style="Normal")
        # Apply paragraph formatting
        apply_paragraph_formatting(paragraph, para_data)
        # Apply list formatting
        apply_list_formatting(paragraph, para_data)
        # Clear any default text
        paragraph.clear()
        # Add runs
        create_runs(paragraph, para_data)
def apply_paragraph_formatting(paragraph, para_data):
    """Apply paragraph formatting."""
    paragraph_format_data = para_data.get("paragraph_format", {})
    if paragraph_format_data:
        pf = paragraph.paragraph_format
        # Alignment (matched against the stored string form of the enum)
        alignment_str = paragraph_format_data.get("alignment")
        if alignment_str:
            if "LEFT" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
            elif "CENTER" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
            elif "RIGHT" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT
            elif "JUSTIFY" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
            elif "DISTRIBUTE" in alignment_str:
                paragraph.alignment = WD_ALIGN_PARAGRAPH.DISTRIBUTE
        # Indentation and spacing
        indent_props = [
            ("left_indent", "left_indent"),
            ("right_indent", "right_indent"),
            ("first_line_indent", "first_line_indent"),
            ("space_before", "space_before"),
            ("space_after", "space_after")
        ]
        for json_prop, pf_prop in indent_props:
            if json_prop in paragraph_format_data:
                setattr(pf, pf_prop, Pt(paragraph_format_data[json_prop]))
        if "line_spacing" in paragraph_format_data and paragraph_format_data["line_spacing"] <= 100:
            pf.line_spacing = paragraph_format_data["line_spacing"]
        # Apply tab stop settings
        if "tab_stops" in paragraph_format_data:
            try:
                from docx.enum.text import WD_TAB_ALIGNMENT, WD_TAB_LEADER
                tab_stops = pf.tab_stops
                # Remove existing custom tab stops
                tab_stops.clear_all()
                # Add the stored tab stops
                for tab_info in paragraph_format_data["tab_stops"]:
                    position = Pt(tab_info["position"]) if tab_info["position"] else None
                    if position:
                        # Defaults match add_tab_stop's own defaults
                        alignment = WD_TAB_ALIGNMENT.LEFT
                        leader = WD_TAB_LEADER.SPACES
                        # Parse the alignment
                        if tab_info.get("alignment"):
                            if "RIGHT" in tab_info["alignment"]:
                                alignment = WD_TAB_ALIGNMENT.RIGHT
                            elif "CENTER" in tab_info["alignment"]:
                                alignment = WD_TAB_ALIGNMENT.CENTER
                        # Parse the leader character
                        if tab_info.get("leader"):
                            if "DOTS" in tab_info["leader"]:
                                leader = WD_TAB_LEADER.DOTS
                            elif "DASHES" in tab_info["leader"]:
                                leader = WD_TAB_LEADER.DASHES
                            elif "LINES" in tab_info["leader"]:
                                leader = WD_TAB_LEADER.LINES
                        tab_stops.add_tab_stop(position, alignment, leader)
            except Exception:
                pass
def apply_list_formatting(paragraph, para_data):
    """Apply list formatting (indentation by level only; no numbering is created)."""
    list_info = para_data.get("list_info")
    if list_info:
        try:
            pf = paragraph.paragraph_format
            if list_info.get("type") == "bullet" and list_info.get("level") is not None:
                # Bullet list: indent 36 pt per level
                pf.left_indent = Pt(list_info.get("level", 0) * 36)
            elif list_info.get("type") == "number" and list_info.get("level") is not None:
                # Numbered list: indent 36 pt per level
                pf.left_indent = Pt(list_info.get("level", 0) * 36)
        except Exception as e:
            print(f"Error while applying list formatting: {e}")
def create_runs(paragraph, para_data):
    """Create runs."""
    runs_data = para_data.get("runs", [])
    if runs_data:
        for run_data in runs_data:
            text = run_data.get("text", "")
            # Check whether the run carries any meaningful content
            has_content = any([
                text,
                run_data.get("bold") is not None,
                run_data.get("italic") is not None,
                run_data.get("underline") is not None,
                run_data.get("font_name"),
                run_data.get("font_size"),
                run_data.get("color"),
                run_data.get("highlight_color")
            ])
            if has_content:
                run = paragraph.add_run(text)
                apply_run_formatting(run, run_data)
    else:
        # No runs data: add the paragraph text directly
        text = para_data.get("text", "")
        if text:
            run = paragraph.add_run(text)
            run.font.size = Pt(12)
def apply_run_formatting(run, run_data):
    """Apply run formatting."""
    # Basic formatting
    format_props = [
        ("bold", "bold"),
        ("italic", "italic"),
        ("underline", "underline"),
        ("strike", "strike"),
        ("superscript", "superscript"),
        ("subscript", "subscript"),
        ("all_caps", "all_caps"),
        ("small_caps", "small_caps")
    ]
    for json_prop, font_prop in format_props:
        if json_prop in run_data:
            # Set on run.font: strike, superscript, etc. exist on Font, not on Run
            setattr(run.font, font_prop, run_data[json_prop])
    # Font size (falls back to 12 pt, which overrides any size inherited from the style)
    if "font_size" in run_data:
        run.font.size = Pt(run_data["font_size"])
    else:
        run.font.size = Pt(12)
    # Font name
    if "font_name" in run_data:
        run.font.name = run_data["font_name"]
        try:
            # Also set the East Asian font so CJK text uses the same typeface
            run._element.rPr.rFonts.set(qn('w:eastAsia'), run_data["font_name"])
        except Exception:
            pass
    # Font color
    if "color" in run_data and run_data["color"] != "None":
        try:
            if run_data["color"].startswith("RGB"):
                color_str = run_data["color"][4:-1]  # strip "RGB(" and ")"
                r, g, b = map(int, color_str.split(","))
                run.font.color.rgb = RGBColor(r, g, b)
            else:
                run.font.color.rgb = RGBColor.from_string(run_data["color"])
        except Exception:
            pass
    # Character spacing
    if "character_spacing" in run_data:
        try:
            run.font.spacing = run_data["character_spacing"]
        except Exception:
            pass
    # Font background color (character shading)
    if "background_color" in run_data:
        try:
            # Create or fetch the rPr element
            rPr = run._element.get_or_add_rPr()
            # Create the shd element
            shd = OxmlElement('w:shd')
            shd.set(qn('w:val'), 'clear')
            shd.set(qn('w:color'), 'auto')
            shd.set(qn('w:fill'), run_data["background_color"])
            # Append it to rPr
            rPr.append(shd)
        except Exception:
            pass
def create_tables(document, json_data):
    """Create tables."""
    for table_data in json_data.get("tables", []):
        if not table_data.get("rows"):
            continue
        # Determine the table size
        num_rows = len(table_data["rows"])
        num_cols = max(len(row.get("cells", [])) for row in table_data["rows"]) if table_data["rows"] else 1
        if num_rows > 0 and num_cols > 0:
            table = document.add_table(rows=num_rows, cols=num_cols)
            # Apply the table style
            if "style" in table_data:
                try:
                    table.style = table_data["style"]
                except Exception:
                    pass
            # Fill in the table content
            for i, row_data in enumerate(table_data["rows"]):
                if i >= num_rows:
                    break
                for j, cell_data in enumerate(row_data.get("cells", [])):
                    if j >= num_cols:
                        break
                    cell = table.cell(i, j)
                    populate_cell_content(cell, cell_data)
def populate_cell_content(cell, cell_data):
    """Fill in the content of a table cell."""
    # Remove the default content
    for paragraph in cell.paragraphs:
        p = paragraph._element
        p.getparent().remove(p)
    # Add paragraph content
    if "paragraphs" in cell_data:
        for para_data in cell_data["paragraphs"]:
            para = cell.add_paragraph()
            # Set the paragraph style
            if "style" in para_data:
                try:
                    para.style = para_data["style"]
                except Exception:
                    pass
            # Add runs
            if "runs" in para_data:
                for run_data in para_data["runs"]:
                    text = run_data.get("text", "")
                    run = para.add_run(text)
                    apply_run_formatting(run, run_data)
            else:
                # Add the text directly
                text = para_data.get("text", "")
                if text:
                    run = para.add_run(text)
                    run.font.size = Pt(12)
    else:
        # Add the text directly
        text = cell_data.get("text", "")
        if text:
            para = cell.add_paragraph()
            run = para.add_run(text)
            run.font.size = Pt(12)
    # A table cell must contain at least one paragraph to be valid
    if not cell.paragraphs:
        cell.add_paragraph()
    # Apply cell styling
    try:
        # Vertical alignment
        if "vertical_alignment" in cell_data:
            from docx.enum.table import WD_ALIGN_VERTICAL
            alignment_str = cell_data["vertical_alignment"]
            if "TOP" in alignment_str:
                cell.vertical_alignment = WD_ALIGN_VERTICAL.TOP
            elif "CENTER" in alignment_str:
                cell.vertical_alignment = WD_ALIGN_VERTICAL.CENTER
            elif "BOTTOM" in alignment_str:
                cell.vertical_alignment = WD_ALIGN_VERTICAL.BOTTOM
        # Margins
        if "top_margin" in cell_data:
            cell.top_margin = Pt(cell_data["top_margin"])
        if "bottom_margin" in cell_data:
            cell.bottom_margin = Pt(cell_data["bottom_margin"])
        if "left_margin" in cell_data:
            cell.left_margin = Pt(cell_data["left_margin"])
        if "right_margin" in cell_data:
            cell.right_margin = Pt(cell_data["right_margin"])
        # Cell borders
        if "borders" in cell_data:
            set_cell_border(cell, cell_data["borders"])
    except Exception as e:
        print(f"Error while applying cell styling: {e}")
def create_images(document, json_data):
    """Create images (appended after the other content at a fixed 2-inch width)."""
    for image_data in json_data.get("images", []):
        try:
            image_bytes = base64.b64decode(image_data["data"])
            image_io = io.BytesIO(image_bytes)
            # Add the image to the document
            paragraph = document.add_paragraph()
            run = paragraph.add_run()
            run.add_picture(image_io, width=Inches(2.0))
        except Exception as e:
            print(f"Error while adding image: {e}")
def find_all_checkboxes(docx_path):
    """Find all checkboxes in a document (enhanced version)."""
    doc = Document(docx_path)
    results = {
        'unchecked': [],
        'checked': [],
        'locations': [],
        'form_controls': []
    }
    print("=== Searching for checkboxes ===")
    # Search the body paragraphs
    for para_idx, paragraph in enumerate(doc.paragraphs):
        find_checkboxes_in_paragraph(paragraph, f"Paragraph {para_idx + 1}", results)
    # Search the tables
    for table_idx, table in enumerate(doc.tables):
        for row_idx, row in enumerate(table.rows):
            for cell_idx, cell in enumerate(row.cells):
                for para_idx, paragraph in enumerate(cell.paragraphs):
                    location = f"Table {table_idx + 1} row {row_idx + 1} column {cell_idx + 1} paragraph {para_idx + 1}"
                    find_checkboxes_in_paragraph(paragraph, location, results)
    # Search the headers and footers
    for section_idx, section in enumerate(doc.sections):
        for para_idx, paragraph in enumerate(section.header.paragraphs):
            find_checkboxes_in_paragraph(paragraph, f"Section {section_idx + 1} header paragraph {para_idx + 1}", results)
        for para_idx, paragraph in enumerate(section.footer.paragraphs):
            find_checkboxes_in_paragraph(paragraph, f"Section {section_idx + 1} footer paragraph {para_idx + 1}", results)
    # Print the results
    print("\n=== Summary ===")
    print(f"Unchecked checkboxes: {len(results['unchecked'])}")
    print(f"Checked checkboxes: {len(results['checked'])}")
    print(f"Form controls: {len(results['form_controls'])}")
    return results
def find_checkboxes_in_paragraph(paragraph, location, results):
    """Find checkboxes in a single paragraph."""
    try:
        p_element = paragraph._element
        xml_str = p_element.xml
        # Look for form-control checkboxes (legacy w:checkBox and w14:checkbox)
        if 'w:checkBox' in xml_str or 'w14:checkbox' in xml_str:
            # A bare <w:checked/> means "on"; w14 controls use w14:val
            is_checked = any(marker in xml_str for marker in
                             ['<w:checked/>', 'w:checked w:val="1"', 'w:checked w:val="true"',
                              'w14:checked w14:val="1"', 'w14:checked w14:val="true"'])
            checkbox_info = {
                'location': location,
                'text': paragraph.text,
                'checked': is_checked,
                'type': 'form_control'
            }
            if is_checked:
                results['checked'].append(checkbox_info)
            else:
                results['unchecked'].append(checkbox_info)
            results['form_controls'].append(checkbox_info)
            print(f"[form control] {location}: {'checked' if is_checked else 'unchecked'}")
        # Look for simulated checkboxes (text symbols).
        # Assumption: the ballot-box characters below stand in for symbols
        # that were lost to encoding damage in the published listing.
        checkbox_symbols = {
            'unchecked': ['□', '☐', '[ ]', '()', '○'],
            'checked': ['☑', '☒', '[x]', '[X]', '[√]', '(x)', '(X)']
        }
        for symbol in checkbox_symbols['unchecked']:
            if symbol in paragraph.text:
                results['unchecked'].append({
                    'location': location,
                    'text': paragraph.text,
                    'symbol': symbol,
                    'type': 'text_symbol'
                })
                print(f"[text symbol] {location}: unchecked '{symbol}'")
        for symbol in checkbox_symbols['checked']:
            if symbol in paragraph.text:
                results['checked'].append({
                    'location': location,
                    'text': paragraph.text,
                    'symbol': symbol,
                    'type': 'text_symbol'
                })
                print(f"[text symbol] {location}: checked '{symbol}'")
    except Exception as e:
        print(f"Error checking {location}: {e}")
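Substring matching on serialized XML yields one answer per paragraph and depends on exact prefixes and attribute spelling. As an alternative, the sketch below parses the paragraph XML and reports each checkbox separately. It assumes `paragraph_xml` is a string like the one the listing reads from `paragraph._element.xml`, with namespace declarations included; `checkbox_states` and `_is_on` are hypothetical names, and the sample XML is hand-written for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"


def _is_on(elem, ns):
    """Treat a missing val attribute as 'on', as OOXML on/off elements do."""
    val = elem.get(f"{{{ns}}}val")
    return val is None or val.lower() in ("1", "true", "on")


def checkbox_states(paragraph_xml):
    """Return a list of (kind, checked) tuples for one paragraph's XML."""
    root = ET.fromstring(paragraph_xml)
    states = []
    # Legacy form fields: w:checked wins, otherwise fall back to w:default.
    for box in root.iter(f"{{{W}}}checkBox"):
        flag = box.find(f"{{{W}}}checked")
        if flag is None:
            flag = box.find(f"{{{W}}}default")
        states.append(("legacy", flag is not None and _is_on(flag, W)))
    # Content-control checkboxes store their state in w14:checked.
    for box in root.iter(f"{{{W14}}}checkbox"):
        flag = box.find(f"{{{W14}}}checked")
        states.append(("content_control", flag is not None and _is_on(flag, W14)))
    return states


SAMPLE_XML = (
    f'<w:p xmlns:w="{W}" xmlns:w14="{W14}">'
    '<w:r><w:fldChar w:fldCharType="begin"><w:ffData>'
    '<w:checkBox><w:default w:val="0"/><w:checked/></w:checkBox>'
    '</w:ffData></w:fldChar></w:r>'
    '<w:sdt><w:sdtPr><w14:checkbox><w14:checked w14:val="0"/></w14:checkbox></w:sdtPr></w:sdt>'
    '</w:p>'
)
sample_states = checkbox_states(SAMPLE_XML)
```

With the sample above, the legacy field is reported as checked and the content control as unchecked, even though both sit in the same paragraph.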
def set_cell_border(cell, borders_data):
    """Set the borders of a table cell."""
    try:
        from docx.oxml import OxmlElement
        from docx.oxml.ns import qn
        tc = cell._tc
        tcPr = tc.get_or_add_tcPr()
        # Get or create the tcBorders element
        tcBorders = tcPr.first_child_found_in("w:tcBorders")
        if tcBorders is None:
            tcBorders = OxmlElement('w:tcBorders')
            tcPr.append(tcBorders)
        # Apply each border described in the data
        for border_name, border_attrs in borders_data.items():
            # Reuse the border element if it exists, otherwise create it
            element = tcBorders.find(qn('w:%s' % border_name))
            if element is None:
                element = OxmlElement('w:%s' % border_name)
                tcBorders.append(element)
            # Set the border attributes
            for attr_name, attr_value in border_attrs.items():
                element.set(qn('w:%s' % attr_name), str(attr_value))
    except Exception as e:
        print(f"Error setting cell borders: {e}")
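`set_cell_border` writes whatever keys it finds in `borders_data`, so the shape of that dictionary matters. A plausible shape, assumed here rather than confirmed by the article, maps an edge name to WordprocessingML border attributes such as `val` (line style), `sz` (width in eighths of a point), and `color` (hex RGB). The sketch below filters a dictionary down to common edge names and emits them in a fixed order before it is handed on; `ALLOWED_EDGES` and `normalize_borders` are hypothetical.

```python
# Edge names commonly found under w:tcBorders, in the order they are emitted.
ALLOWED_EDGES = ("top", "left", "bottom", "right", "insideH", "insideV")


def normalize_borders(borders_data):
    """Keep known edges with non-empty attributes; stringify attribute values."""
    cleaned = {}
    for edge in ALLOWED_EDGES:
        attrs = borders_data.get(edge)
        if isinstance(attrs, dict) and attrs:
            cleaned[edge] = {name: str(value) for name, value in attrs.items()}
    return cleaned


sample_borders = {
    "bottom": {"val": "double", "sz": 12, "color": "FF0000"},
    "top": {"val": "single", "sz": 8, "color": "000000"},
    "diagonal": {"val": "single"},  # unknown edge name, dropped
    "left": {},                     # nothing to set, dropped
}
normalized = normalize_borders(sample_borders)
```

Filtering up front means a typo in the JSON is dropped quietly instead of producing an element Word may not recognize.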
def main():
    """Entry point: interactive menu."""
    print("Docx Converter Enhanced v2.0")
    print("1. Convert docx to json")
    print("2. Convert json to docx")
    print("3. Find checkboxes in docx")
    print("4. Batch convert folder")
    choice = input("Choose an operation (1/2/3/4): ")
    if choice == "1":
        docx_path = input("Enter the docx file path: ")
        if not os.path.exists(docx_path):
            print("File not found!")
            return
        json_data = docx_to_json(docx_path)
        # splitext only touches the extension, unlike str.replace on the whole path
        json_path = os.path.splitext(docx_path)[0] + "_enhanced.json"
        with open(json_path, "w", encoding="utf-8") as f:
            json.dump(json_data, f, ensure_ascii=False, indent=2)
        print(f"Conversion complete! JSON file saved as: {json_path}")
    elif choice == "2":
        json_path = input("Enter the json file path: ")
        if not os.path.exists(json_path):
            print("File not found!")
            return
        with open(json_path, "r", encoding="utf-8") as f:
            json_data = json.load(f)
        output_path = os.path.splitext(json_path)[0] + "_restored.docx"
        json_to_docx(json_data, output_path)
        print(f"Conversion complete! Docx file saved as: {output_path}")
    elif choice == "3":
        docx_path = input("Enter the docx file path: ")
        if not os.path.exists(docx_path):
            print("File not found!")
            return
        results = find_all_checkboxes(docx_path)
        print("\nCheckbox search complete!")
    elif choice == "4":
        folder_path = input("Enter the folder path: ")
        if not os.path.exists(folder_path):
            print("Folder not found!")
            return
        # Batch conversion
        for filename in os.listdir(folder_path):
            if filename.endswith(".docx"):
                docx_path = os.path.join(folder_path, filename)
                print(f"Processing: {filename}")
                try:
                    json_data = docx_to_json(docx_path)
                    json_path = os.path.join(folder_path, os.path.splitext(filename)[0] + ".json")
                    with open(json_path, "w", encoding="utf-8") as f:
                        json.dump(json_data, f, ensure_ascii=False, indent=2)
                    print(f"Converted: {filename}")
                except Exception as e:
                    print(f"Failed to convert {filename}: {e}")
        print("Batch conversion complete!")
    else:
        print("Invalid choice!")

if __name__ == "__main__":
    main()
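`main` derives an output name from an input name in three places and decides which files belong in a batch in a fourth. A small pair of helpers could keep those rules in one spot. The sketch below uses `pathlib`; `derive_output_path` and `is_source_docx` are hypothetical names, and skipping names that start with `~$` assumes the folder may contain the lock files Word creates while a document is open.

```python
from pathlib import Path


def derive_output_path(src, new_suffix, tag=""):
    """Swap the extension of src and optionally append a tag to the stem.

    Only the final path component changes, so a directory whose name
    happens to contain ".docx" is left alone.
    """
    src = Path(src)
    return src.with_name(f"{src.stem}{tag}{new_suffix}")


def is_source_docx(filename):
    """True for .docx files (any letter case) that are not Word lock files."""
    return filename.lower().endswith(".docx") and not filename.startswith("~$")


enhanced = derive_output_path("reports/summary.docx", ".json", "_enhanced")
restored = derive_output_path("reports/summary_enhanced.json", ".docx", "_restored")
batch = [name for name in ("a.DOCX", "~$a.docx", "b.docx", "c.txt") if is_source_docx(name)]
```

Here `enhanced` is `reports/summary_enhanced.json`, `restored` is `reports/summary_enhanced_restored.docx`, and `batch` keeps only `a.DOCX` and `b.docx`.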

