Python Data Processing: JSON Data Conversion and Processing Explained

Updated: 2025-12-05 09:26:40  Author: 零日失眠者

This article walks through a Python tool that parses, validates, and converts JSON data. Read on if you need something similar.

Features

This is a tool for processing, validating, and converting JSON data. Its core features are:

Parsing:

Multiple input formats (standard JSON objects and arrays, JSON Lines)
Automatic encoding detection
Nested and complex JSON structures
Streaming mode for large files
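The hard part of supporting several input formats is telling them apart: a JSON Lines file also begins with `{`, so the first character is not enough. A minimal sketch, assuming the whole text fits in memory, parses the input as one document first and falls back to line-by-line parsing only when `json.loads` stops with the message `Extra data`, which is what a file holding one record per line produces. The function name `parse_json_or_jsonl` is hypothetical.

```python
import json
from typing import Any


def parse_json_or_jsonl(text: str) -> Any:
    """Parse standard JSON, or fall back to JSON Lines (one value per line)."""
    text = text.lstrip("\ufeff")  # tolerate a UTF-8 byte-order mark
    try:
        return json.loads(text)  # a single object or array
    except json.JSONDecodeError as exc:
        # Several top-level values on separate lines make json.loads stop
        # after the first one with "Extra data"; any other error is real.
        if exc.msg != "Extra data":
            raise
    # JSON Lines: every non-blank line is an independent JSON value
    return [json.loads(line) for line in text.splitlines() if line.strip()]


print(parse_json_or_jsonl('[{"id": 1}, {"id": 2}]'))  # [{'id': 1}, {'id': 2}]
print(parse_json_or_jsonl('{"id": 1}\n{"id": 2}\n'))  # [{'id': 1}, {'id': 2}]
```

Genuinely malformed input still raises `json.JSONDecodeError`, so syntax errors are not silently reinterpreted as JSON Lines.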

Validation:

JSON syntax validation
Custom JSON Schema validation
Data type validation
Required and optional field validation
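Required-field and type checks need nothing more than dictionary lookups when no schema file is available. A minimal sketch, assuming a hypothetical spec that maps field names to Python types plus a set of required names; the tool itself delegates these checks to jsonschema.

```python
from typing import Any, Dict, List, Set


def check_record(record: Dict[str, Any],
                 field_types: Dict[str, type],
                 required: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for name in sorted(required):
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, expected in field_types.items():
        if name not in record:
            continue  # optional fields may be absent
        value = record[name]
        # bool is a subclass of int, so reject it explicitly for int fields
        if expected is int and isinstance(value, bool):
            problems.append(f"{name}: expected int, got bool")
        elif not isinstance(value, expected):
            problems.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return problems


spec = {"id": int, "name": str, "age": int, "active": bool}
print(check_record({"id": 1, "name": "Ann", "age": "30"}, spec, {"id", "name", "email"}))
# ['missing required field: email', 'age: expected int, got str']
```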

Conversion:

Conversion from JSON to other formats (CSV, Excel, XML)
Field renaming and mapping
Flattening of nested data
Expansion of nested structures
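Flattening turns nested objects into single-level records with dotted keys. A minimal recursive sketch, assuming `max_depth` counts how many nesting levels are expanded (`None` meaning no limit) and that list elements are addressed as `key[i]`, the key style the tool's `flatten_dict` also produces. Passing the depth as an explicit counter keeps the limit correct even when a key itself contains the separator.

```python
from typing import Any, Dict, Optional


def flatten(obj: Dict[str, Any], max_depth: Optional[int] = None, sep: str = ".",
            prefix: str = "", depth: int = 0) -> Dict[str, Any]:
    """Flatten nested dicts and lists into dotted keys; max_depth=None expands everything."""
    out: Dict[str, Any] = {}
    can_expand = max_depth is None or depth < max_depth
    for key, value in obj.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if can_expand and isinstance(value, dict) and value:
            out.update(flatten(value, max_depth, sep, full_key, depth + 1))
        elif can_expand and isinstance(value, list) and value:
            for i, item in enumerate(value):
                item_key = f"{full_key}[{i}]"
                if isinstance(item, dict):
                    out.update(flatten(item, max_depth, sep, item_key, depth + 1))
                else:
                    out[item_key] = item
        else:
            out[full_key] = value  # a leaf, an empty container, or the depth limit
    return out


doc = {"user": {"name": "Ann", "address": {"city": "Paris"}}, "tags": ["a", "b"]}
print(flatten(doc))
# {'user.name': 'Ann', 'user.address.city': 'Paris', 'tags[0]': 'a', 'tags[1]': 'b'}
print(flatten(doc, max_depth=1))
# {'user.name': 'Ann', 'user.address': {'city': 'Paris'}, 'tags[0]': 'a', 'tags[1]': 'b'}
```

Empty dicts and lists are kept as values so that flattening never silently drops a key.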

Cleaning:

Removal of empty and invalid values
Normalization of data formats
Handling of duplicate records
Data type conversion
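A minimal cleaning pass, assuming records are flat dicts: drop `None` and empty-string values, turn common textual booleans into real booleans (plain `bool("false")` returns `True`, because any non-empty string is truthy), and drop exact duplicate records while keeping the first occurrence. The helper names and the accepted true/false words are hypothetical choices.

```python
import json
from typing import Any, Dict, Iterable, List

TRUE_WORDS = {"true", "1", "yes", "y", "on"}
FALSE_WORDS = {"false", "0", "no", "n", "off"}


def to_bool(value: Any) -> bool:
    """Interpret common text forms of booleans; bool("false") alone would be True."""
    if isinstance(value, str):
        word = value.strip().lower()
        if word in TRUE_WORDS:
            return True
        if word in FALSE_WORDS:
            return False
        raise ValueError(f"not a boolean: {value!r}")
    return bool(value)


def clean(records: Iterable[Dict[str, Any]],
          bool_fields: Iterable[str] = ()) -> List[Dict[str, Any]]:
    """Drop empty values, normalize booleans, and remove exact duplicate records."""
    bool_fields = list(bool_fields)
    seen = set()
    result = []
    for record in records:
        record = {k: v for k, v in record.items() if v is not None and v != ""}
        for field in bool_fields:
            if field in record:
                record[field] = to_bool(record[field])
        # Canonical JSON text (sorted keys) identifies duplicates regardless of key order
        fingerprint = json.dumps(record, sort_keys=True, default=str)
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(record)
    return result


rows = [{"id": 1, "active": "false", "note": ""},
        {"id": 1, "active": "no", "note": None},
        {"id": 2, "active": "Yes"}]
print(clean(rows, bool_fields=["active"]))
# [{'id': 1, 'active': False}, {'id': 2, 'active': True}]
```

The second row becomes a duplicate of the first only after cleaning, which is why deduplication runs last.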

Batch processing:

Processing of multiple JSON files in one run
Progress display and logging
Error handling and recovery
Summary report of the results
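For batch runs, error recovery mostly means keeping one bad file from stopping the whole run. A minimal sketch, assuming a caller-supplied `process_file` callable that returns a record count (the names `run_batch` and `count_records` are hypothetical): each failure is logged and recorded, the loop moves on, and a summary dict is returned at the end.

```python
import json
import logging
from pathlib import Path
from typing import Any, Callable, Dict, Iterable

log = logging.getLogger("batch")


def run_batch(paths: Iterable[Path], process_file: Callable[[Path], int]) -> Dict[str, Any]:
    """Process each file independently; one failure does not stop the batch."""
    paths = list(paths)
    summary: Dict[str, Any] = {"ok": [], "failed": {}, "records": 0}
    for index, path in enumerate(paths, 1):
        try:
            summary["records"] += process_file(path)
            summary["ok"].append(str(path))
        except (OSError, ValueError) as exc:  # json.JSONDecodeError subclasses ValueError
            summary["failed"][str(path)] = str(exc)
            log.warning("skipping %s: %s", path, exc)
        log.info("progress: %d/%d files", index, len(paths))
    return summary


def count_records(path: Path) -> int:
    """Example per-file step: load one JSON file and count its top-level records."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return len(data) if isinstance(data, list) else 1
```

A typical call would be `run_batch(sorted(Path("data").glob("*.json")), count_records)`; the returned `failed` mapping doubles as the error section of a summary report.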

Use Cases

1. API data processing

Processing JSON returned by REST APIs
Converting API data into other formats for analysis
Validating the completeness and correctness of API responses
Batch processing of saved API response files

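2. Configuration file management

Processing and validating application configuration files
Converting configuration file formats
Updating multiple configuration files in batch
Versioning and comparing configuration files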
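Comparing two versions of a configuration file comes down to a recursive dictionary diff. A minimal sketch, assuming both versions have already been loaded with `json.load`; the function name and the dotted-path output format are hypothetical.

```python
from typing import Any, Dict, List


def diff_config(old: Dict[str, Any], new: Dict[str, Any], path: str = "") -> List[str]:
    """List added, removed, and changed keys between two parsed JSON configs."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        where = f"{path}.{key}" if path else key
        if key not in new:
            changes.append(f"removed {where}")
        elif key not in old:
            changes.append(f"added {where} = {new[key]!r}")
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            changes.extend(diff_config(old[key], new[key], where))  # recurse into sections
        elif old[key] != new[key]:
            changes.append(f"changed {where}: {old[key]!r} -> {new[key]!r}")
    return changes


v1 = {"db": {"host": "localhost", "port": 5432}, "debug": True}
v2 = {"db": {"host": "db.internal", "port": 5432}, "timeout": 30}
for line in diff_config(v1, v2):
    print(line)
# changed db.host: 'localhost' -> 'db.internal'
# removed debug
# added timeout = 30
```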

3. Log analysis

Processing JSON-formatted log files
Extracting key log information
Turning log data into analysis reports
Computing key metrics from logs
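For JSON-formatted logs, extracting key metrics is typically a single pass over a JSON Lines file with a counter. A minimal sketch, assuming each line is an object with `level` and `message` fields (hypothetical names; real log schemas vary):

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_logs(lines: Iterable[str]) -> Tuple[Counter, List[str], int]:
    """Count entries per level, collect ERROR messages, and count unparseable lines."""
    levels: Counter = Counter()
    errors: List[str] = []
    bad_lines = 0
    for line in lines:
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            bad_lines += 1
            continue
        if not isinstance(entry, dict):
            bad_lines += 1  # valid JSON, but not a log object
            continue
        level = str(entry.get("level", "UNKNOWN")).upper()
        levels[level] += 1
        if level == "ERROR":
            errors.append(str(entry.get("message", "")))
    return levels, errors, bad_lines


sample = [
    '{"level": "info", "message": "started"}',
    '{"level": "ERROR", "message": "db timeout"}',
    'not json',
    '{"level": "error", "message": "retry failed"}',
]
levels, errors, bad = summarize_logs(sample)
print(dict(levels), errors, bad)
# {'INFO': 1, 'ERROR': 2} ['db timeout', 'retry failed'] 1
```

Because a text file object yields one line at a time, passing an open log file (`with open("app.log", encoding="utf-8") as f: summarize_logs(f)`) avoids loading the whole log into memory.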

4. Data integration projects

Combining JSON data from different systems
Converting data formats to fit the target system
Validating data quality and completeness
Generating data processing reports
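Integrating data from different systems usually means mapping each source onto shared field names and then merging on a key. A minimal sketch, assuming every record carries the key field after renaming (a missing key raises `KeyError`) and that values from the second source win on conflicts; the field mapping and sample data are hypothetical.

```python
from typing import Any, Dict, List


def rename(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename keys according to mapping; unmapped keys are kept unchanged."""
    return {mapping.get(k, k): v for k, v in record.items()}


def merge_by_key(primary: List[Dict[str, Any]], secondary: List[Dict[str, Any]],
                 key: str = "id") -> List[Dict[str, Any]]:
    """Merge two record lists on `key`; fields from `secondary` overwrite `primary`."""
    merged: Dict[Any, Dict[str, Any]] = {r[key]: dict(r) for r in primary}
    for record in secondary:
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


crm = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
billing = [rename(r, {"customer_id": "id"}) for r in
           [{"customer_id": 2, "plan": "pro"}, {"customer_id": 3, "plan": "free"}]]
print(merge_by_key(crm, billing))
# [{'id': 1, 'name': 'Ann'}, {'id': 2, 'name': 'Bob', 'plan': 'pro'}, {'id': 3, 'plan': 'free'}]
```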

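Error Handling

1. JSON parsing errors

# logger and JSONProcessingError are defined in the full script below
import json

try:
    data = json.loads(json_string)
except json.JSONDecodeError as e:
    logger.error(f"JSON parse error: {e}")
    raise JSONProcessingError(f"Invalid JSON: {e}") from e
except UnicodeDecodeError as e:
    # Only possible when json_string is bytes that cannot be decoded
    logger.error(f"Encoding error: {e}")
    raise JSONProcessingError(f"File encoding error: {e}") from e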
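`json.JSONDecodeError` also carries `msg`, `lineno`, `colno`, and `pos`, which make error messages far easier to act on than `str(e)` alone. A minimal sketch that reports the position together with the offending line; the function name `loads_with_context` is hypothetical.

```python
import json
from typing import Any


def loads_with_context(text: str) -> Any:
    """Parse JSON; on failure raise ValueError naming the line, column, and offending line."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        lines = text.splitlines()
        bad_line = lines[exc.lineno - 1].strip() if exc.lineno <= len(lines) else ""
        raise ValueError(
            f"{exc.msg} at line {exc.lineno}, column {exc.colno}: {bad_line!r}"
        ) from exc
```

For example, feeding it two objects on separate lines produces a message starting with `Extra data at line 2, column 1`, which points straight at the second record.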

2. Data validation errors

import jsonschema

try:
    jsonschema.validate(instance=data, schema=schema)
except jsonschema.ValidationError as e:
    # e.message is the short reason; str(e) also dumps the schema and the instance
    logger.error(f"Validation failed: {e.message}")
    raise JSONProcessingError(f"Data does not match the schema: {e.message}") from e
except jsonschema.SchemaError as e:
    logger.error(f"Invalid schema definition: {e.message}")
    raise JSONProcessingError(f"Malformed schema: {e.message}") from e

3. File access errors

try:
    with open(input_file, 'r', encoding=encoding) as f:
        data = json.load(f)
except FileNotFoundError as e:
    logger.error(f"Input file not found: {input_file}")
    raise JSONProcessingError(f"File not found: {input_file}") from e
except PermissionError as e:
    logger.error(f"No permission to read file: {input_file}")
    raise JSONProcessingError(f"Insufficient permissions for file: {input_file}") from e

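4. Out-of-memory errors

# processor is a JSONProcessor instance from the full script below
import os

MAX_FILE_SIZE = 100 * 1024 * 1024  # 100 MB, the size at which the script starts warning

try:
    # Send large files to streaming mode instead of loading them in one piece
    if os.path.getsize(input_file) > MAX_FILE_SIZE:
        processor.process_large_file_streaming()
    else:
        processor.run_processing()
except MemoryError as e:
    logger.error("Out of memory while processing a large file")
    raise JSONProcessingError("Out of memory: reduce the file size or add system memory") from e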
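The dependable way to avoid `MemoryError` is never to hold the whole data set: read one record, transform it, write it out. A minimal sketch for JSON Lines input and output, assuming a caller-supplied `transform` function (the helper names are hypothetical). A single large JSON array cannot be consumed this way with the standard library alone; that case needs an incremental parser such as ijson.

```python
import json
from typing import Any, Callable, Dict, Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-blank line; only one line is in memory at a time."""
    for line_num, line in enumerate(stream, 1):
        if line.strip():
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_num}: {exc.msg}") from exc


def transform_jsonl(src: TextIO, dst: TextIO,
                    transform: Callable[[Dict[str, Any]], Dict[str, Any]]) -> int:
    """Stream records from src to dst, applying transform; return the record count."""
    count = 0
    for record in iter_jsonl(src):
        dst.write(json.dumps(transform(record), ensure_ascii=False) + "\n")
        count += 1
    return count
```

With real files this becomes `with open("big.jsonl", encoding="utf-8") as src, open("out.jsonl", "w", encoding="utf-8") as dst: transform_jsonl(src, dst, my_transform)`, where `my_transform` is whatever per-record step you need.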

Implementation

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
JSON data processing and conversion tool
Purpose: process, validate, and convert JSON data
Author: Cline
Version: 1.0

Third-party dependencies: jsonschema, pandas, chardet (plus openpyxl for Excel output)
"""

import json
import argparse
import sys
import logging
import os
from datetime import datetime
from typing import Dict, List, Any, Union

import jsonschema
import pandas as pd
import xml.etree.ElementTree as ET
import chardet

# Logging: write to a log file and to stdout
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('json_processor.log', encoding='utf-8'),
        logging.StreamHandler(sys.stdout)
    ]
)
logger = logging.getLogger(__name__)

class JSONProcessingError(Exception):
    """Raised when JSON processing fails"""
    pass

class JSONProcessor:
    def __init__(self, config: Dict[str, Any]):
        self.input_file = config.get('input_file')
        self.output_file = config.get('output_file', 'processed_data.json')
        self.encoding = config.get('encoding', 'auto')
        self.schema_file = config.get('schema_file')
        self.output_format = config.get('output_format', 'json')
        self.flatten_depth = config.get('flatten_depth', 0)
        # Stored as backup_enabled: an instance attribute named backup_original would
        # shadow the backup_original() method, and calling it would fail with TypeError
        self.backup_enabled = config.get('backup_original', True)
        
        # Processing options (field mapping, type conversions, cleaning)
        self.options = config.get('options', {})
        
        # Processing statistics
        self.stats = {
            'total_records': 0,
            'processed_records': 0,
            'errors': [],
            'warnings': []
        }
        
    def detect_encoding(self, file_path: str) -> str:
        """Detect the file encoding automatically"""
        try:
            with open(file_path, 'rb') as f:
                raw_data = f.read(10000)  # sample the first 10 KB
            result = chardet.detect(raw_data)
            encoding = result['encoding']
            confidence = result['confidence']
            logger.info(f"Detected encoding: {encoding} (confidence: {confidence:.2f})")
            if not encoding or confidence <= 0.7 or encoding.lower() in ('ascii', 'utf-8'):
                # ASCII is a subset of UTF-8 (non-ASCII text may appear after the sample);
                # 'utf-8-sig' decodes plain UTF-8 and also strips an optional BOM
                return 'utf-8-sig'
            return encoding
        except Exception as e:
            logger.warning(f"Encoding detection failed, falling back to UTF-8: {e}")
            return 'utf-8-sig'
            
    def load_json_data(self) -> Union[Dict, List]:
        """Load JSON data (standard JSON or JSON Lines)"""
        logger.info(f"Loading JSON data file: {self.input_file}")
        
        try:
            # Detect the encoding automatically
            if self.encoding == 'auto':
                self.encoding = self.detect_encoding(self.input_file)
                
            # Check the file size
            file_size = os.path.getsize(self.input_file)
            if file_size > 100 * 1024 * 1024:  # 100 MB
                logger.warning(f"Large file ({file_size / (1024*1024):.2f} MB); consider --streaming")
                
            with open(self.input_file, 'r', encoding=self.encoding) as f:
                text = f.read().lstrip('\ufeff')  # drop a BOM if the chosen encoding kept it
                
            try:
                # Standard JSON: a single object or array
                data = json.loads(text)
            except json.JSONDecodeError as e:
                # JSON Lines records also start with '{', so the first character cannot
                # identify the format. A JSON Lines file makes json.loads stop after the
                # first record with "Extra data"; only then is the text split into lines
                if e.msg != 'Extra data':
                    raise
                data = []
                for line_num, line in enumerate(text.splitlines(), 1):
                    if line.strip():
                        try:
                            data.append(json.loads(line))
                        except json.JSONDecodeError as line_error:
                            logger.warning(f"Failed to parse line {line_num}: {line_error}")
                                
            self.stats['total_records'] = len(data) if isinstance(data, list) else 1
            logger.info(f"Loaded JSON data: {self.stats['total_records']} record(s)")
            return data
            
        except FileNotFoundError as e:
            logger.error(f"Input file not found: {self.input_file}")
            raise JSONProcessingError(f"File not found: {self.input_file}") from e
        except json.JSONDecodeError as e:
            logger.error(f"JSON parse error: {e}")
            raise JSONProcessingError(f"Invalid JSON: {e}") from e
        except UnicodeDecodeError as e:
            logger.error(f"File encoding error: {e}")
            raise JSONProcessingError("Encoding error; try a different encoding with -e") from e
        except Exception as e:
            logger.error(f"Error while loading JSON data: {e}")
            raise JSONProcessingError(f"Failed to load data: {e}") from e
            
    def backup_original(self):
        """Back up the original input file"""
        if not self.backup_enabled:
            return
            
        try:
            import shutil
            backup_name = f"{self.input_file}.backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
            shutil.copy2(self.input_file, backup_name)
            logger.info(f"Original file backed up to: {backup_name}")
        except Exception as e:
            logger.warning(f"Failed to back up the original file: {e}")
            
    def validate_schema(self, data: Union[Dict, List]) -> bool:
        """Validate the data against a JSON Schema"""
        if not self.schema_file:
            logger.info("No schema file specified; skipping schema validation")
            return True
        if not os.path.exists(self.schema_file):
            logger.warning(f"Schema file not found, skipping validation: {self.schema_file}")
            return True
            
        try:
            with open(self.schema_file, 'r', encoding='utf-8') as f:
                schema = json.load(f)
                
            # FormatChecker makes keywords such as "format": "email" actually get checked;
            # without it, jsonschema treats "format" as an annotation only
            jsonschema.validate(instance=data, schema=schema,
                                format_checker=jsonschema.FormatChecker())
            logger.info("JSON data passed schema validation")
            return True
            
        except jsonschema.ValidationError as e:
            logger.error(f"Validation failed: {e.message}")
            self.stats['errors'].append({
                'type': 'schema_validation',
                'message': e.message
            })
            return False
        except jsonschema.SchemaError as e:
            logger.error(f"Invalid schema definition: {e.message}")
            raise JSONProcessingError(f"Malformed schema: {e.message}") from e
        except Exception as e:
            logger.error(f"Error during schema validation: {e}")
            raise JSONProcessingError(f"Schema validation failed: {e}") from e
            
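    def flatten_dict(self, d: Dict, parent_key: str = '', sep: str = '.', depth: int = 0) -> Dict:
        """Flatten a nested dict, expanding at most flatten_depth levels (<= 0 means unlimited)"""
        items = []
        # Track depth with an explicit counter: counting separators in parent_key
        # expands one level too many and miscounts keys that contain the separator
        can_expand = self.flatten_depth <= 0 or depth < self.flatten_depth
        for k, v in d.items():
            new_key = f"{parent_key}{sep}{k}" if parent_key else str(k)
            if isinstance(v, dict) and v and can_expand:
                items.extend(self.flatten_dict(v, new_key, sep, depth + 1).items())
            elif isinstance(v, list) and v and isinstance(v[0], dict) and can_expand:
                # List of dicts: address each element as key[i]
                for i, item in enumerate(v):
                    if isinstance(item, dict):
                        items.extend(self.flatten_dict(item, f"{new_key}[{i}]", sep, depth + 1).items())
                    else:
                        items.append((f"{new_key}[{i}]", item))
            else:
                items.append((new_key, v))
        return dict(items)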
        
    @staticmethod
    def _to_bool(value: Any) -> bool:
        """Convert a value to bool; strings are parsed because bool("false") is True"""
        if isinstance(value, str):
            text = value.strip().lower()
            if text in ('true', '1', 'yes', 'y', 'on'):
                return True
            if text in ('false', '0', 'no', 'n', 'off', ''):
                return False
            raise ValueError(f"cannot interpret {value!r} as a boolean")
        return bool(value)
        
    def process_record(self, record: Dict) -> Dict:
        """Process a single record"""
        try:
            processed_record = record.copy()
            
            # Rename fields (the later steps refer to the new names)
            field_mapping = self.options.get('field_mapping', {})
            for old_name, new_name in field_mapping.items():
                if old_name in processed_record:
                    processed_record[new_name] = processed_record.pop(old_name)
                    
            # Convert data types
            type_conversions = self.options.get('type_conversions', {})
            converters = {'int': int, 'float': float, 'str': str, 'bool': self._to_bool}
            for field, target_type in type_conversions.items():
                if field in processed_record and target_type in converters:
                    try:
                        processed_record[field] = converters[target_type](processed_record[field])
                    except (ValueError, TypeError) as e:
                        logger.warning(f"Type conversion failed for field {field}: {e}")
                        self.stats['warnings'].append({
                            'type': 'type_conversion',
                            'field': field,
                            'message': str(e)
                        })
                        
            # Clean: drop top-level fields whose value is None or an empty string
            cleaning_options = self.options.get('cleaning', {})
            if cleaning_options.get('remove_empty', False):
                processed_record = {k: v for k, v in processed_record.items() if v is not None and v != ''}
                
            # Flatten nested structures
            if self.flatten_depth > 0:
                processed_record = self.flatten_dict(processed_record)
                
            return processed_record
            
        except Exception as e:
            logger.error(f"Error while processing a record: {e}")
            self.stats['errors'].append({
                'type': 'record_processing',
                'message': str(e)
            })
            return record  # fall back to the original record
            
    def process_data(self, data: Union[Dict, List]) -> Union[Dict, List]:
        """Process the loaded JSON data"""
        logger.info("Processing JSON data...")
        
        try:
            if isinstance(data, list):
                # A list of records
                processed_data = []
                for i, record in enumerate(data):
                    if isinstance(record, dict):
                        processed_data.append(self.process_record(record))
                    else:
                        # Non-dict items (numbers, strings, nested lists) are kept as-is
                        processed_data.append(record)
                    self.stats['processed_records'] += 1
                        
                    # Report progress
                    if (i + 1) % 1000 == 0:
                        logger.info(f"Processed {i + 1} records")
                        
            elif isinstance(data, dict):
                # A single object
                processed_data = self.process_record(data)
                self.stats['processed_records'] = 1
                
            else:
                # Scalars are returned unchanged
                processed_data = data
                self.stats['processed_records'] = 1
                
            logger.info(f"Processing finished: {self.stats['processed_records']} record(s)")
            return processed_data
            
        except Exception as e:
            logger.error(f"Error while processing JSON data: {e}")
            raise JSONProcessingError(f"Data processing failed: {e}") from e
            
    def convert_to_csv(self, data: Union[Dict, List], output_file: str):
        """Convert to CSV"""
        try:
            if isinstance(data, list) and all(isinstance(item, dict) for item in data):
                # Each dict becomes a row; the union of keys becomes the columns
                df = pd.DataFrame(data)
            elif isinstance(data, dict):
                # A single dict becomes a one-row table
                df = pd.DataFrame([data])
            else:
                raise JSONProcessingError("CSV needs an object or a list of objects")
            # utf-8-sig writes a BOM so that Excel recognizes the file as UTF-8
            df.to_csv(output_file, index=False, encoding='utf-8-sig')
            logger.info(f"Data converted to CSV and saved to: {output_file}")
            
        except Exception as e:
            logger.error(f"Error converting to CSV: {e}")
            raise JSONProcessingError(f"CSV conversion failed: {e}") from e
            
    def convert_to_excel(self, data: Union[Dict, List], output_file: str):
        """Convert to Excel (.xlsx output requires the openpyxl package)"""
        try:
            if isinstance(data, list) and all(isinstance(item, dict) for item in data):
                # Each dict becomes a row; the union of keys becomes the columns
                df = pd.DataFrame(data)
            elif isinstance(data, dict):
                # A single dict becomes a one-row table
                df = pd.DataFrame([data])
            else:
                raise JSONProcessingError("Excel needs an object or a list of objects")
            # pandas picks the Excel writer from the file extension (e.g. .xlsx)
            df.to_excel(output_file, index=False)
            logger.info(f"Data converted to Excel and saved to: {output_file}")
            
        except Exception as e:
            logger.error(f"Error converting to Excel: {e}")
            raise JSONProcessingError(f"Excel conversion failed: {e}") from e
            
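    def convert_to_xml(self, data: Union[Dict, List], output_file: str):
        """Convert to XML"""
        try:
            def safe_tag(name) -> str:
                # JSON keys may contain characters that are illegal in XML tag names
                # (spaces, '[', a leading digit); ElementTree writes them unchecked,
                # so replace them to keep the output well-formed
                tag = ''.join(ch if ch.isalnum() or ch in '_-.' else '_' for ch in str(name))
                if not tag or not (tag[0].isalpha() or tag[0] == '_'):
                    tag = '_' + tag
                return tag

            def to_text(value) -> str:
                # None becomes an empty element instead of the string "None"
                return '' if value is None else str(value)

            def dict_to_xml(d, root):
                for k, v in d.items():
                    tag = safe_tag(k)
                    if isinstance(v, dict):
                        dict_to_xml(v, ET.SubElement(root, tag))
                    elif isinstance(v, list):
                        # Each list element becomes a repeated element with the same tag
                        for item in v:
                            if isinstance(item, dict):
                                dict_to_xml(item, ET.SubElement(root, tag))
                            else:
                                ET.SubElement(root, tag).text = to_text(item)
                    else:
                        ET.SubElement(root, tag).text = to_text(v)
                        
            # Create the root element
            root = ET.Element("root")
            
            if isinstance(data, list):
                # Each list element becomes an <item>; non-dict elements are written as text
                for item in data:
                    if isinstance(item, dict):
                        dict_to_xml(item, ET.SubElement(root, "item"))
                    else:
                        ET.SubElement(root, "item").text = to_text(item)
            elif isinstance(data, dict):
                dict_to_xml(data, root)
                
            # Write the XML file
            tree = ET.ElementTree(root)
            tree.write(output_file, encoding='utf-8', xml_declaration=True)
            logger.info(f"Data converted to XML and saved to: {output_file}")
            
        except Exception as e:
            logger.error(f"Error converting to XML: {e}")
            raise JSONProcessingError(f"XML conversion failed: {e}") from e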
            
    def save_results(self, data: Union[Dict, List]):
        """Save the results in the configured output format"""
        try:
            # Make sure the output directory exists
            output_dir = os.path.dirname(self.output_file) or '.'
            os.makedirs(output_dir, exist_ok=True)
            
            if self.output_format == 'json':
                with open(self.output_file, 'w', encoding='utf-8') as f:
                    json.dump(data, f, indent=2, ensure_ascii=False)
            elif self.output_format == 'csv':
                self.convert_to_csv(data, self.output_file)
            elif self.output_format == 'excel':
                self.convert_to_excel(data, self.output_file)
            elif self.output_format == 'xml':
                self.convert_to_xml(data, self.output_file)
            else:
                logger.error(f"Unsupported output format: {self.output_format}")
                raise JSONProcessingError(f"Unsupported output format: {self.output_format}")
                
            logger.info(f"Results saved to {self.output_file}")
            
        except Exception as e:
            logger.error(f"Error while saving results: {e}")
            raise JSONProcessingError(f"Save failed: {e}") from e
            
    def generate_report(self):
        """Generate the processing report"""
        try:
            report = {
                'timestamp': datetime.now().isoformat(),
                'input_file': self.input_file,
                'output_file': self.output_file,
                'processing_stats': self.stats,
                'options': self.options
            }
            
            report_file = f"{self.output_file}.report.json"
            with open(report_file, 'w', encoding='utf-8') as f:
                json.dump(report, f, indent=2, ensure_ascii=False)
                
            logger.info(f"Processing report saved to {report_file}")
            
            # Print a short summary
            print("\n" + "="*50)
            print("JSON Processing Report")
            print("="*50)
            print(f"Processed at: {report['timestamp']}")
            print(f"Input file: {self.input_file}")
            print(f"Output file: {self.output_file}")
            print("-"*50)
            print(f"Total records: {self.stats['total_records']}")
            print(f"Processed records: {self.stats['processed_records']}")
            print(f"Errors: {len(self.stats['errors'])}")
            print(f"Warnings: {len(self.stats['warnings'])}")
            print("="*50)
            
        except Exception as e:
            logger.error(f"Error while generating the report: {e}")
            
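    def run_processing(self):
        """Run the full JSON processing pipeline"""
        logger.info("Starting JSON processing...")
        
        try:
            # 1. Back up the original file
            self.backup_original()
            
            # 2. Load the JSON data
            data = self.load_json_data()
            
            # 3. Validate against the schema (runs on the raw data, before field mapping;
            #    a failure is recorded in the report and processing continues)
            if not self.validate_schema(data):
                logger.warning("Schema validation failed; continuing (see the report for details)")
            
            # 4. Process the data
            processed_data = self.process_data(data)
            
            # 5. Save the results
            self.save_results(processed_data)
            
            # 6. Generate the report
            self.generate_report()
            
            logger.info("JSON processing finished")
            return processed_data
            
        except JSONProcessingError:
            raise  # already logged and wrapped by the step that failed
        except Exception as e:
            logger.error(f"Error during JSON processing: {e}")
            raise JSONProcessingError(f"Processing failed: {e}") from e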
            
    def process_large_file_streaming(self):
        """Process a large file record by record"""
        logger.info("Starting streaming processing of a large JSON file...")
        
        try:
            # Detect the encoding automatically
            if self.encoding == 'auto':
                self.encoding = self.detect_encoding(self.input_file)
                
            processed_records = []
            record_count = 0
            
            with open(self.input_file, 'r', encoding=self.encoding) as f:
                # Detect the format from the first non-whitespace character, ignoring a BOM
                head = f.read(1024).lstrip('\ufeff \t\r\n')
                f.seek(0)
                
                if head.startswith('['):
                    # JSON array: this still parses the whole array in memory; truly incremental
                    # parsing of one large array needs a streaming parser such as ijson
                    data = json.loads(f.read().lstrip('\ufeff'))
                    for record in data:
                        processed_records.append(
                            self.process_record(record) if isinstance(record, dict) else record)
                        record_count += 1
                        # Progress log only; results are still collected in memory
                        if record_count % 10000 == 0:
                            logger.info(f"Processed {record_count} records")
                else:
                    # JSON Lines: input is parsed one line at a time
                    # (a single pretty-printed object is not supported here; use normal mode)
                    for line_num, line in enumerate(f, 1):
                        line = line.lstrip('\ufeff').strip()
                        if not line:
                            continue
                        try:
                            record = json.loads(line)
                        except json.JSONDecodeError as e:
                            logger.warning(f"Failed to parse line {line_num}: {e}")
                            continue
                        processed_records.append(
                            self.process_record(record) if isinstance(record, dict) else record)
                        record_count += 1
                        if record_count % 10000 == 0:
                            logger.info(f"Processed {record_count} records")
                                
            self.stats['total_records'] = record_count
            self.stats['processed_records'] = record_count
            logger.info(f"Streaming processing finished: {record_count} record(s)")
            
            # The CSV/Excel/XML writers need the complete list, so results are saved once at the end
            self.save_results(processed_records)
            self.generate_report()
            
            return processed_records
            
        except JSONProcessingError:
            raise
        except Exception as e:
            logger.error(f"Error during streaming processing: {e}")
            raise JSONProcessingError(f"Streaming processing failed: {e}") from e

def create_sample_schema():
    """Create a sample schema file"""
    # The top level is an array because the input is usually a list of records;
    # use the inner "items" schema on its own to validate a single object
    sample_schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "integer"
                },
                "name": {
                    "type": "string"
                },
                "email": {
                    "type": "string",
                    "format": "email"
                },
                "age": {
                    "type": "integer",
                    "minimum": 0,
                    "maximum": 120
                },
                "active": {
                    "type": "boolean"
                }
            },
            "required": ["id", "name", "email"]
        }
    }
    
    with open('sample_schema.json', 'w', encoding='utf-8') as f:
        json.dump(sample_schema, f, indent=2, ensure_ascii=False)
    logger.info("Sample schema file created: sample_schema.json")

def create_sample_config():
    """Create a sample configuration file (the options passed with -c)"""
    sample_config = {
        "field_mapping": {
            "user_id": "id",
            "full_name": "name",
            "is_active": "active"
        },
        "type_conversions": {
            "age": "int",
            "salary": "float",
            "active": "bool"
        },
        "cleaning": {
            "remove_empty": True
        }
    }
    
    with open('sample_config.json', 'w', encoding='utf-8') as f:
        json.dump(sample_config, f, indent=2, ensure_ascii=False)
    logger.info("Sample configuration file created: sample_config.json")

def main():
    parser = argparse.ArgumentParser(description='JSON data processing and conversion tool')
    # nargs='?' so that --sample-schema / --sample-config work without an input file
    parser.add_argument('input_file', nargs='?', help='Path to the input JSON file')
    parser.add_argument('-o', '--output', help='Output file path')
    parser.add_argument('-e', '--encoding', default='auto', help='File encoding (auto/utf-8/gbk, etc.)')
    parser.add_argument('-s', '--schema', help='Path to a JSON Schema file for validation')
    parser.add_argument('-f', '--format', choices=['json', 'csv', 'excel', 'xml'], default='json', help='Output format')
    parser.add_argument('-c', '--config', help='Path to a processing configuration file')
    parser.add_argument('--flatten-depth', type=int, default=0, help='Nesting levels to flatten (0 = no flattening)')
    parser.add_argument('--streaming', action='store_true', help='Process a large file in streaming mode')
    parser.add_argument('--no-backup', action='store_true', help='Do not back up the original file')
    parser.add_argument('--sample-schema', action='store_true', help='Create a sample schema file')
    parser.add_argument('--sample-config', action='store_true', help='Create a sample configuration file')
    
    args = parser.parse_args()
    
    if args.sample_schema:
        create_sample_schema()
        return
        
    if args.sample_config:
        create_sample_config()
        return
        
    if not args.input_file:
        parser.error('input_file is required unless --sample-schema or --sample-config is used')
        
    # Load the processing configuration file
    options = {}
    if args.config:
        if not os.path.exists(args.config):
            logger.warning(f"Configuration file not found, continuing without options: {args.config}")
        else:
            try:
                with open(args.config, 'r', encoding='utf-8') as f:
                    options = json.load(f)
            except Exception as e:
                logger.error(f"Failed to load the configuration file: {e}")
            
    # The default output name gets an extension that matches the output format,
    # because pandas selects the Excel writer from the file extension
    extensions = {'json': '.json', 'csv': '.csv', 'excel': '.xlsx', 'xml': '.xml'}
    stem = os.path.splitext(os.path.basename(args.input_file))[0]
    default_output = f"processed_{stem}{extensions[args.format]}"
    
    # Processing parameters
    config = {
        'input_file': args.input_file,
        'output_file': args.output or default_output,
        'encoding': args.encoding,
        'schema_file': args.schema,
        'output_format': args.format,
        'flatten_depth': args.flatten_depth,
        'backup_original': not args.no_backup,
        'options': options
    }
    
    # Create the processor
    processor = JSONProcessor(config)
    
    try:
        # Run the processing
        if args.streaming:
            processor.process_large_file_streaming()
        else:
            processor.run_processing()
            
    except KeyboardInterrupt:
        logger.info("JSON processing interrupted by user")
        sys.exit(1)
    except JSONProcessingError as e:
        logger.error(f"JSON processing error: {e}")
        sys.exit(1)
    except Exception as e:
        logger.error(f"Unexpected error during JSON processing: {e}")
        sys.exit(1)

if __name__ == '__main__':
    main()

Usage

1. Basic usage

# Basic JSON processing
python json_processor.py data.json

# Specify the output file
python json_processor.py data.json -o processed_data.json

# Specify the file encoding
python json_processor.py data.json -e utf-8

# Convert to CSV
python json_processor.py data.json -f csv -o result.csv

2. Data validation

# Validate against a schema
python json_processor.py data.json -s schema.json

# Create a sample schema file
python json_processor.py --sample-schema

3. Data conversion

# Convert to Excel (requires openpyxl)
python json_processor.py data.json -f excel -o result.xlsx

# Convert to XML
python json_processor.py data.json -f xml -o result.xml

# Flatten nested structures (expand two levels)
python json_processor.py data.json --flatten-depth 2

4. Using a configuration file

# Use a configuration file
python json_processor.py data.json -c config.json

# Create a sample configuration file
python json_processor.py --sample-config

5. Large files

# Process a large file in streaming mode
python json_processor.py large_data.json --streaming

# Skip the backup of the original file
python json_processor.py data.json --no-backup

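Example Configuration File

Create a configuration file named config.json. Fields are renamed first, so type_conversions and the later steps use the new names; for example, is_active is renamed to active and then converted to a boolean: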

{
  "field_mapping": {
    "user_id": "id",
    "full_name": "name",
    "email_address": "email",
    "is_active": "active"
  },
  "type_conversions": {
    "age": "int",
    "salary": "float",
    "active": "bool"
  },
  "cleaning": {
    "remove_empty": true
  }
}

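Example Schema File

Create a schema file named schema.json. It describes an array of records, and "additionalProperties": false rejects any record with fields not listed under properties. Validation runs on the data as loaded, before field_mapping is applied, so the schema must use the original field names: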

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id": {
        "type": "integer",
        "minimum": 1
      },
      "name": {
        "type": "string",
        "minLength": 1
      },
      "email": {
        "type": "string",
        "format": "email"
      },
      "age": {
        "type": "integer",
        "minimum": 0,
        "maximum": 120
      },
      "active": {
        "type": "boolean"
      }
    },
    "required": ["id", "name", "email"],
    "additionalProperties": false
  }
}

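Advanced Features

1. Batch processing

A short script can process several files in one run:

import glob
import os

# Assumes the script above is saved as json_processor.py in the same directory
from json_processor import JSONProcessor, JSONProcessingError

# Process every JSON file in the data directory
json_files = glob.glob("data/*.json")

for json_file in json_files:
    config = {
        'input_file': json_file,
        'output_file': f"processed_{os.path.basename(json_file)}",
        'schema_file': 'schema.json',
        'options': {
            'field_mapping': {'old_field': 'new_field'},
            'type_conversions': {'age': 'int'}
        }
    }
    
    processor = JSONProcessor(config)
    try:
        processor.run_processing()
    except JSONProcessingError as e:
        # Report and continue so that one bad file does not stop the whole batch
        print(f"Skipped {json_file}: {e}")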

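2. Scheduled automation

Combine the tool with cron to process data on a schedule. Cron usually starts jobs in the user's home directory, and both json_processor.log and the default output file are written relative to the working directory, so pass -o with an absolute path:

# Process the JSON data every day at 3:00 a.m.
0 3 * * * /usr/bin/python3 /path/to/json_processor.py /path/to/daily_data.json -s /path/to/schema.json -o /path/to/processed_daily_data.json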

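3. API data processing

The tool can also process JSON returned by an API:

import json

import requests

# Assumes the script above is saved as json_processor.py in the same directory
from json_processor import JSONProcessor

# Fetch the API data
response = requests.get('https://api.example.com/data', timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of processing an error body
data = response.json()

# Save to a temporary file
with open('temp_data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)

# Process the data
config = {'input_file': 'temp_data.json'}
processor = JSONProcessor(config)
processor.run_processing()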

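Performance Optimization

1. Memory management

Use streaming mode for large files to avoid running out of memory
Release data structures as soon as they are no longer needed
Choose compact data types to reduce memory usage

2. Processing speed

Batch work to reduce the number of I/O operations
Replace per-record loops with vectorized operations where possible (for example, in pandas)
Choose sensible buffer sizes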
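Batching writes and choosing a buffer size can be combined in a few lines. A minimal sketch, assuming the output is JSON Lines; the batch size of 1000 and the helper names are arbitrary choices, not values taken from the tool.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, TextIO


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield lists of up to `size` items (Python 3.12+ also offers itertools.batched)."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def write_jsonl_batched(records: Iterable[Dict[str, Any]], dst: TextIO,
                        batch_size: int = 1000) -> int:
    """Write records as JSON Lines with one write() call per batch; return the count."""
    written = 0
    for chunk in batched(records, batch_size):
        dst.write("".join(json.dumps(r, ensure_ascii=False) + "\n" for r in chunk))
        written += len(chunk)
    return written
```

In Python the buffer size is set through the `buffering` argument of `open()`, for example `open("out.jsonl", "w", encoding="utf-8", buffering=1024 * 1024)` for a 1 MB buffer; whether a larger buffer helps depends on the storage, so measure before tuning.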

3. Error handling

Recover gracefully from errors
Keep detailed processing logs for troubleshooting
Give clear, actionable error messages

Security Considerations

1. Data security

Back up the original data automatically before processing
Mask sensitive data
Set appropriate permissions on output files
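Masking can run as one more per-record step before the output is written. A minimal sketch, assuming the names of the sensitive fields are known in advance (the field names below are hypothetical): e-mail addresses keep their first character and domain, other strings keep only their last four characters, and very short strings are masked completely.

```python
from typing import Any, Dict, Iterable


def mask_value(field: str, value: Any) -> Any:
    """Mask one string value; other types are returned unchanged."""
    if not isinstance(value, str) or not value:
        return value
    if field == "email" and "@" in value:
        local, domain = value.split("@", 1)
        return local[:1] + "***@" + domain
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]  # keep the last four characters


def mask_record(record: Dict[str, Any], sensitive: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the record with the listed fields masked."""
    masked = dict(record)
    for field in sensitive:
        if field in masked:
            masked[field] = mask_value(field, masked[field])
    return masked


print(mask_record({"name": "Ann", "email": "ann@example.com", "phone": "13800138000"},
                  sensitive={"email", "phone"}))
# {'name': 'Ann', 'email': 'a***@example.com', 'phone': '*******8000'}
```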

2. File security

Validate input files
Restrict output paths to prevent arbitrary file writes
Check file sizes to avoid processing abnormally large files
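Restricting output paths usually means resolving the requested path and refusing anything that ends up outside an allowed base directory, whether through `..` segments or an absolute path. A minimal sketch using `os.path.realpath` and `os.path.commonpath`; the base directory is an assumption you would configure.

```python
import os


def safe_output_path(base_dir: str, requested: str) -> str:
    """Resolve `requested` under base_dir; raise ValueError if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    # os.path.join discards base when `requested` is absolute; realpath resolves ".." and symlinks
    target = os.path.realpath(os.path.join(base, requested))
    # commonpath equals base only if target is base itself or lies inside it
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"output path escapes {base}: {requested}")
    return target
```

Calling `safe_output_path("/srv/exports", args.output)` before writing would reject values such as `../escape.json` or `/etc/passwd`.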

3. System security

Limit the number and size of files processed
Enforce a processing timeout
Log all operations for auditing

This tool covers the path from loading and validating JSON to cleaning, converting, and exporting it, giving downstream analysis and applications cleaner input to work with.
