Exporting HTML Tables to Excel in One Step with Python

Updated: November 27, 2025 09:04:04  Author: weixin_46244623

In data-processing and web-scraping projects, we often need to extract tables from HTML pages. Copying and pasting by hand is slow and error-prone. This article uses Python with BeautifulSoup and pandas to export multiple tables from an HTML document to an Excel file (.xlsx) in one step, writing each table to its own sheet automatically with only a small amount of code.

1. Installing dependencies

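pip install beautifulsoup4 pandas openpyxl lxml

lxml is included because pandas.read_html needs a parser backend (lxml, or html5lib together with beautifulsoup4).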

2. Implementation

from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd


def html_table_to_xlsx(html_content, output_file):
    """
    Extract the tables in an HTML document and export them to an xlsx file.

    :param html_content: HTML text
    :param output_file: path of the xlsx file to write
    """
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find every table in the HTML
    tables = soup.find_all('table')
    if not tables:
        print("No tables found in the HTML!")
        return

    # Parse the tables one by one and export them to Excel
    with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
        for i, table in enumerate(tables):
            # Convert the table to a DataFrame; wrapping the markup in
            # StringIO avoids the deprecation warning that newer pandas
            # versions emit for literal HTML strings
            df = pd.read_html(StringIO(str(table)))[0]
            # Write each table to a different sheet
            sheet_name = f"Sheet{i + 1}"
            df.to_excel(writer, index=False, sheet_name=sheet_name)

    print(f"Tables exported to {output_file}")


# Sample HTML content
html_content = """
<html>
<head><title>Test table</title></head>
<body>
    <table border="1">
        <tr>
            <th>Name</th>
            <th>Age</th>
            <th>City</th>
        </tr>
        <tr>
            <td>Zhang San</td>
            <td>28</td>
            <td>Beijing</td>
        </tr>
        <tr>
            <td>Li Si</td>
            <td>34</td>
            <td>Shanghai</td>
        </tr>
    </table>
</body>
</html>
"""


# Call the function to export the tables in the HTML to an Excel file
html_table_to_xlsx(html_content, "output.xlsx")
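
The sample above contains only one table, so the multi-sheet behaviour is not visible. The following is a minimal sketch, not the author's code, that checks it with two tables. It assumes lxml (or html5lib) is installed, and it calls pandas.read_html on the whole document, which returns one DataFrame per table, instead of looping over BeautifulSoup tags. The function name export_all_tables, the sample data, and the temporary output path are illustrative assumptions.

```python
from io import StringIO
import os
import tempfile

import pandas as pd

# Two small tables in one document (illustrative data only).
two_tables_html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Zhang San</td><td>28</td></tr>
</table>
<table>
  <tr><th>City</th><th>Region</th></tr>
  <tr><td>Beijing</td><td>North</td></tr>
  <tr><td>Shanghai</td><td>East</td></tr>
</table>
"""


def export_all_tables(html, output_file):
    """Write every <table> in `html` to its own sheet; return the sheet count."""
    # read_html returns a list with one DataFrame per table it finds
    # and raises ValueError when the document contains no table.
    frames = pd.read_html(StringIO(html))
    with pd.ExcelWriter(output_file, engine="openpyxl") as writer:
        for i, df in enumerate(frames, start=1):
            df.to_excel(writer, index=False, sheet_name=f"Sheet{i}")
    return len(frames)


# Hypothetical output location in a temporary directory.
output_path = os.path.join(tempfile.mkdtemp(), "two_tables.xlsx")
sheet_count = export_all_tables(two_tables_html, output_path)

# sheet_name=None loads every sheet into a dict keyed by sheet name.
sheets = pd.read_excel(output_path, sheet_name=None)
for name, df in sheets.items():
    print(name, df.shape)
```

Reading the workbook back with sheet_name=None is a quick way to confirm that each table landed on its own sheet without opening Excel.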

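3. Final result

Running the script creates output.xlsx with the sample table on Sheet1; any further tables in the HTML would be written to Sheet2, Sheet3, and so on.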

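4. Supplementary method

Parsing an HTML table with Python and converting it to an Excel sheet

When working with data, we often need to extract information from an HTML table and convert it to an Excel file. This section shows how to parse an HTML table with Python and convert it to an Excel sheet.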

Preparation

Before starting, make sure the required libraries are installed in your environment:

beautifulsoup4: parses the HTML.
pandas: processes and converts the data.
openpyxl: generates the Excel file.

They can be installed with the following command:

pip install beautifulsoup4 pandas openpyxl

Parsing the HTML table

First, we need to extract the table data from an HTML file. Suppose we have a simple HTML file, example.html, that contains one table.

Sample HTML:

<!DOCTYPE html>
<html>
<body>

<table border="1">
  <tr>
    <th>Name</th>
    <th>Age</th>
    <th>Occupation</th>
  </tr>
  <tr>
    <td>Zhang San</td>
    <td>25</td>
    <td>Engineer</td>
  </tr>
  <tr>
    <td>Li Si</td>
    <td>30</td>
    <td>Doctor</td>
  </tr>
</table>

</body>
</html>

Next, we use BeautifulSoup to parse this HTML file and extract the table data.

Parsing code example

from bs4 import BeautifulSoup

# Read the HTML file
with open('example.html', 'r', encoding='utf-8') as f:
    html_content = f.read()

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Find all tables
tables = soup.find_all('table')

# Iterate over each table
for table in tables:
    # Extract the header cells
    headers = [header.text for header in table.find_all('th')]

    # Extract the table data
    data = []
    for row in table.find_all('tr')[1:]:  # skip the header row
        cells = row.find_all('td')
        row_data = [cell.text for cell in cells]
        data.append(row_data)

    print("Headers:", headers)
    print("Data:", data)
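
The loop above assumes that every th element belongs to the first row and that cell text carries no stray whitespace. As a hedged sketch of a slightly more defensive variant, the hypothetical helper table_to_rows below takes the header from the first row only, accepts th or td cells in any row, strips whitespace with get_text(strip=True), and skips empty rows. It still does not handle nested tables, rowspan, or colspan.

```python
from bs4 import BeautifulSoup


def table_to_rows(table):
    """Return (headers, rows) for one <table> tag, with whitespace stripped."""
    rows = table.find_all("tr")
    if not rows:
        return [], []

    # Use only the first row as the header, whether it uses <th> or <td>.
    headers = [cell.get_text(strip=True)
               for cell in rows[0].find_all(["th", "td"])]

    data = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True)
                 for cell in row.find_all(["td", "th"])]
        if cells:  # skip rows that contain no cells
            data.append(cells)
    return headers, data


# Illustrative markup: a <td> header row, a row header in <th>,
# padded cell text, and an empty row.
sample_html = """
<table>
  <tr><td> Name </td><td> Age </td></tr>
  <tr><th>Zhang San</th><td> 25 </td></tr>
  <tr></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
headers, data = table_to_rows(soup.find("table"))
print("Headers:", headers)
print("Data:", data)
```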

Converting the data to Excel

Now that the table data has been extracted, we use pandas to convert it to an Excel file.

Conversion code example

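import pandas as pd

# Create a DataFrame; headers and data come from the parsing loop above
# (after the loop they hold the values of the last table processed)
df = pd.DataFrame(data, columns=headers)

# Write the DataFrame to an Excel file
df.to_excel('output.xlsx', index=False)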

The code above saves the extracted table data to an Excel file named output.xlsx.
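
One consequence of extracting cell text by hand is that every value is a string, so a column such as Age would be written to Excel as text rather than as numbers. The sketch below shows one possible way to convert fully numeric columns before export. The helper name convert_numeric_columns is an assumption for illustration, and the sample rows mirror the HTML example above.

```python
import pandas as pd

# Values as they come out of manual extraction: all strings.
headers = ["Name", "Age", "Occupation"]
data = [["Zhang San", "25", "Engineer"], ["Li Si", "30", "Doctor"]]
df = pd.DataFrame(data, columns=headers)


def convert_numeric_columns(frame):
    """Return a copy in which fully numeric text columns become numbers."""
    result = frame.copy()
    for column in result.columns:
        try:
            result[column] = pd.to_numeric(result[column])
        except (ValueError, TypeError):
            # At least one value is not numeric: keep the column as text.
            pass
    return result


converted = convert_numeric_columns(df)
print(converted.dtypes)
```

Calling converted.to_excel('output.xlsx', index=False) afterwards would then store the ages as numeric cells.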

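Complete code example

The complete example below combines the HTML parsing and the Excel conversion. It writes each table to its own file (table_1.xlsx, table_2.xlsx, and so on).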

from bs4 import BeautifulSoup
import pandas as pd

# Read the HTML file
with open('example.html', 'r', encoding='utf-8') as f:
    html_content = f.read()

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Find all tables
tables = soup.find_all('table')

# Iterate over each table
for i, table in enumerate(tables):
    # Extract the header cells
    headers = [header.text for header in table.find_all('th')]

    # Extract the table data
    data = []
    for row in table.find_all('tr')[1:]:  # skip the header row
        cells = row.find_all('td')
        row_data = [cell.text for cell in cells]
        data.append(row_data)

    # Create a DataFrame
    df = pd.DataFrame(data, columns=headers)

    # Write the DataFrame to an Excel file, one file per table
    df.to_excel(f'table_{i+1}.xlsx', index=False)
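
If the tables were collected into a single workbook instead, as in section 2, each sheet could be given a more descriptive name, for example one taken from the table's caption. Excel restricts sheet names: at most 31 characters, none of the characters : \ / ? * [ ], no leading or trailing apostrophe, and not blank. The helper below is a hypothetical sketch of one way to enforce those rules. It does not guarantee that names are unique across sheets.

```python
import re

# Characters Excel does not allow in a sheet name.
INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
MAX_SHEET_NAME_LENGTH = 31


def safe_sheet_name(raw, index):
    """Turn arbitrary text into a valid sheet name, or fall back to Sheet<index>."""
    cleaned = INVALID_SHEET_CHARS.sub("_", (raw or "").strip())
    # Truncate first, then drop apostrophes left at either end.
    cleaned = cleaned[:MAX_SHEET_NAME_LENGTH].strip("'")
    if not cleaned:
        cleaned = f"Sheet{index}"
    return cleaned


print(safe_sheet_name("Sales: Q1/Q2", 1))
print(safe_sheet_name("", 3))
print(len(safe_sheet_name("x" * 50, 1)))
```

The result can be passed as sheet_name to DataFrame.to_excel in place of the fixed Sheet1, Sheet2 pattern.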
