A Detailed Tutorial on the Basics of Python's pandas Library
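pandas is a core Python library for data processing and analysis. It provides fast, flexible, and expressive data structures, chiefly the one-dimensional Series and the two-dimensional DataFrame. It can import data from many sources, including CSV, Excel, and SQL, and it supports data cleaning, merging, reshaping, grouped statistics, and time series analysis. pandas also integrates easily with other Python data analysis libraries, which makes it a powerful tool for data analysis and processing in finance, statistics, the social sciences, and engineering.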
1. Setting Up the Environment
To check whether pandas is already installed, run the following command in a terminal:
pip show pandas

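If pip reports that the package was not found (for example, "WARNING: Package(s) not found: pandas"), pandas is not installed.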

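To install pandas, use pip, Python's package manager. In a command-line interface (a terminal, Command Prompt, or Anaconda Prompt, depending on your operating system and how Python was installed), enter the following command: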
pip install pandas

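When the installation succeeds, pip prints a line beginning with "Successfully installed" that lists pandas.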

2. Series and DataFrames
2.1 Initialization
A Series can store any data type, such as integers, floats, strings, and Python objects, and every element has an index label.
import pandas as pd
A = pd.Series(data = [1, 2, 3, 4, 5], index = ["A", "B", "C", "D", "E"], name = "A1")
print(A)
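As a small illustration of the statement that a Series can hold different data types, the sketch below builds three Series and inspects the dtype of each. The variable names and values are made up for this example. When no index is passed, pandas falls back to a default integer index that starts at 0.

```python
import pandas as pd

# Hypothetical example values, not taken from the tutorial itself.
ints = pd.Series([1, 2, 3])            # integer dtype
floats = pd.Series([1.5, 2.5, 3.5])    # floating-point dtype
mixed = pd.Series([1, "two", 3.0])     # mixed Python objects -> object dtype

print(ints.dtype, floats.dtype, mixed.dtype)
print(list(ints.index))  # default index when none is given
```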

2.2 Getting the Values
import pandas as pd
A = pd.Series(data = [1, 2, 3, 4, 5], index = ["A", "B", "C", "D", "E"], name = "A1")
print(A)
print("Values:", A.values)
2.3 Getting the Index
import pandas as pd
A = pd.Series(data = [1, 2, 3, 4, 5], index = ["A", "B", "C", "D", "E"], name = "A1")
print(A)
print("Index:", A.index)
2.4 Selecting Values by Index
import pandas as pd
A = pd.Series(data = [1, 2, 3, 4, 5], index = ["A", "B", "C", "D", "E"], name = "A1")
print(A)
print(A[["A", "C"]])
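Besides passing a list of labels, a Series can also be read through the .loc (label-based) and .iloc (position-based) accessors. The following is a minimal sketch on the same Series; the variable names single, label_slice, and by_position are only illustrative.

```python
import pandas as pd

A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")

single = A.loc["B"]            # one label -> a single value
label_slice = A.loc["B":"D"]   # a label slice includes both endpoints
by_position = A.iloc[0:2]      # a position slice excludes the end position

print(single)
print(label_slice.tolist())
print(by_position.tolist())
```

The difference in how the two slices treat their end point is worth keeping in mind, because it applies to DataFrames as well.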

2.5 Changing Values by Index
import pandas as pd
A = pd.Series(data = [1, 2, 3, 4, 5], index = ["A", "B", "C", "D", "E"], name = "A1")
print(A)
A[["A", "C"]] = [11, 12]
print(A)

2.6 Creating a Series from a Dictionary
import pandas as pd
A = pd.Series({"A":1, "B":2, "C":3, "D":4})
print(A)
2.7 Counting How Often Each Value Occurs
import pandas as pd
A = pd.Series({"A":1, "B":2, "C":3, "D":4, "E":2, "F":3})
print(A.value_counts())
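value_counts() also accepts a normalize argument, and its result is itself a Series that can be re-sorted. The sketch below, on the same data, shows both; the names counts, shares, and by_value are chosen only for this example.

```python
import pandas as pd

A = pd.Series({"A": 1, "B": 2, "C": 3, "D": 4, "E": 2, "F": 3})

counts = A.value_counts()                 # absolute counts, most frequent first
shares = A.value_counts(normalize=True)   # relative frequencies that sum to 1
by_value = counts.sort_index()            # reorder by the counted value itself

print(by_value)
print(shares)
```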
2.8 DataFrames
import pandas as pd
# In the "sex" column, "男" means male and "女" means female
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"]}
B = pd.DataFrame(A)
print(B)
2.9 Adding a New Column to a DataFrame
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"]}
B = pd.DataFrame(A)
print(B)
B["high"] = ["180", "183", "160", "178", "158"]
print(B)
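The new column above is stored as strings. One possible follow-up, sketched here under the assumption that "high" is a height in centimetres (the tutorial does not say so), is to convert it to integers and derive a further column from it. The column name high_m is hypothetical.

```python
import pandas as pd

B = pd.DataFrame({"name": ["小米", "小華", "小魅"],
                  "high": ["180", "183", "160"]})

# Convert the string column to integers so that arithmetic works.
B["high"] = B["high"].astype(int)

# assign() returns a new DataFrame and leaves B itself unchanged.
C = B.assign(high_m=B["high"] / 100)

print(C)
```

Direct assignment (B["col"] = ...) modifies the DataFrame in place, while assign() is convenient when the original should stay untouched.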
2.10 Getting the Column Names
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B)
print("Column names:", B.columns)
2.11 Selecting Data by Column Name
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B)
print(B[["name", "sex"]])
2.12 Selecting a Single Row
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B.loc[2])
2.13 Selecting Multiple Rows
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B.loc[2 : 4])
2.14 Selecting Specific Rows and Columns
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B.loc[2 : 4, ["name", "high"]])
2.15 Selecting the Rows Where sex Is "男" (Male), with Specific Columns
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B.loc[B.sex == "男", ["name", "sex"]])
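Conditions like the one above can be combined. The following sketch assumes, unlike the tutorial's data, that age is stored as integers so that a numeric comparison is meaningful; the names adult_men and picked are illustrative.

```python
import pandas as pd

B = pd.DataFrame({"name": ["小米", "小華", "小魅", "小破", "小領"],
                  "age": [20, 18, 16, 23, 19],   # assumption: integers, not strings
                  "sex": ["男", "男", "女", "男", "女"]})

# Each condition needs its own parentheses; use & for "and", | for "or".
adult_men = B.loc[(B["sex"] == "男") & (B["age"] >= 18), ["name", "age"]]

# isin() tests membership in a list of allowed values.
picked = B.loc[B["name"].isin(["小米", "小魅"]), "name"]

print(adult_men)
print(picked.tolist())
```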
2.16 Selecting Rows by Position
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B.iloc[0 : 2])
2.17 Selecting Columns by Position
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B.iloc[ : , 0 : 2])
2.18 Selecting Data at Specific Positions
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
print(B.iloc[0 : 2, 0 : 2])
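The loc and iloc examples in the preceding sections look alike because the default row labels are 0, 1, 2, and so on. The sketch below makes the difference visible: loc slices by label and includes the end label, whereas iloc slices by position and excludes the end position. Using "name" as the row index is an assumption made only for this illustration.

```python
import pandas as pd

B = pd.DataFrame({"name": ["小米", "小華", "小魅", "小破", "小領"],
                  "sex": ["男", "男", "女", "男", "女"]})

by_label = B.loc[2:4]      # labels 2, 3 and 4 -> three rows
by_position = B.iloc[2:4]  # positions 2 and 3 -> two rows
print(len(by_label), len(by_position))

# After re-labelling the rows, loc follows the labels and iloc the positions.
C = B.set_index("name")
print(C.loc["小華":"小破"])   # label slice, both ends included
print(C.iloc[0:2])           # first two rows, whatever their labels are
```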
2.19 Converting a Boolean Index
import numpy as np
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
# iloc does not accept a boolean Series, so convert the mask first
# Convert the mask to a list
print(B.iloc[list(B.sex == "男"), 0 : 3])
# Convert the mask to a NumPy array
print(B.iloc[np.array(B.sex == "男"), 0 : 3])
2.20 Evaluating a Condition
import numpy as np
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
# age is stored as strings; convert to integers so the comparison is numeric
print(list(B.age.astype(int) >= 18))
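A caveat when comparing a column that holds numbers as text: strings are compared character by character, not by numeric value. The sketch below uses hypothetical ages, including a one-digit value, to show where the two kinds of comparison disagree.

```python
import pandas as pd

ages = pd.Series(["20", "18", "16", "9"])   # hypothetical values stored as text

as_text = (ages >= "18").tolist()              # character-by-character comparison
as_number = (ages.astype(int) >= 18).tolist()  # numeric comparison

print(as_text)
print(as_number)
```

The text comparison treats "9" as greater than "18" because "9" sorts after "1", so converting with astype(int) before comparing is the safer habit.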
2.21 Reassigning a Column
import numpy as np
import pandas as pd
A = {"name": ["小米", "小華", "小魅", "小破", "小領"],
"age": ["20", "18", "16", "23", "19"],
"sex": ["男", "男", "女", "男", "女"],
"high": ["180", "183", "160", "178", "158"]}
B = pd.DataFrame(A)
B.high = ["179", "186", "168", "183", "160"]
print(B)
3. Data Aggregation and Group Operations
3.1 Obtaining the Dataset
iris.csv (the Iris dataset)
3.2 Reading the Dataset
The Iris dataset, also known as Anderson's Iris data set, is one of the best-known classic datasets in data science and machine learning.
It is available from several sources, such as the built-in datasets of scikit-learn and the UCI Machine Learning Repository. Once obtained, it can be loaded, preprocessed, and used for model training in Python or another programming language.
With its simple structure and wide range of uses, the Iris dataset is a popular first example for machine learning beginners, who can use it to build up basic knowledge and skills step by step.
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
print(iris.head())
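If iris.csv is not at hand, read_csv can be tried on in-memory text. The sketch below uses a hypothetical three-row sample that only mimics the column names used in this tutorial; the values are illustrative and the real file is much longer.

```python
import io
import pandas as pd

# Hypothetical three-row sample with the column layout used in this tutorial.
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,setosa
2,7.0,3.2,4.7,1.4,versicolor
3,6.3,3.3,6.0,2.5,virginica
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # inferred type of each column
print(sample.head(2))  # first two rows
```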
3.3 Computing the Mean of Each Column
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
print(iris.iloc[ : , 1 : 5].apply(func = np.mean, axis = 0))
3.4 Computing the Minimum of Each Column
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
# Named col_min so that the built-in min() is not shadowed
col_min = iris.iloc[ : , 1 : 5].apply(func = np.min , axis = 0)
print(col_min)
3.5 Computing the Maximum of Each Column
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
# Named col_max so that the built-in max() is not shadowed
col_max = iris.iloc[ : , 1 : 5].apply(func = np.max , axis = 0)
print(col_max)
3.6 Counting the Samples in Each Column
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
size = iris.iloc[ : , 1 : 5].apply(func = np.size , axis = 0)
print(size)
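The column statistics from sections 3.3 to 3.6 can also be requested in one call by passing a list of function names to agg(). The sketch below runs on a small hypothetical sample rather than on iris.csv, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical sample values; these are not rows from iris.csv.
df = pd.DataFrame({"SepalLengthCm": [5.1, 7.0, 6.3, 4.9],
                   "SepalWidthCm": [3.5, 3.2, 3.3, 3.0]})

stats = df.agg(["mean", "min", "max", "count"])  # one row per statistic
print(stats)
print(df.describe())  # count, mean, std, min, quartiles and max in one call
```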
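3.7 Row-wise Calculations
Only the first five rows are shown.
Compared with the column-wise examples, axis = 0 is changed to axis = 1 in this code.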
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
data = iris.iloc[0 : 5, 1 : 5].apply(func = (np.min, np.max, np.mean, np.std, np.var) , axis = 1)
print(data)
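To make the meaning of the axis argument concrete, here is a minimal sketch on a tiny hypothetical table: axis=0 produces one result per column, and axis=1 produces one result per row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0],
                   "b": [3.0, 4.0, 5.0]})   # hypothetical values

col_means = df.mean(axis=0)   # collapse the rows -> one value per column
row_means = df.mean(axis=1)   # collapse the columns -> one value per row

print(col_means.tolist())
print(row_means.tolist())
```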
3.8 Computing Grouped Means
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
res = iris.drop("Id", axis = 1).groupby(by = "Species").mean()
print(res)
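What groupby does can be checked by hand on a small example. The sketch below uses hypothetical values and compares the grouped mean for one species with the same mean computed from a manually filtered subset.

```python
import pandas as pd

df = pd.DataFrame({"Species": ["setosa", "setosa", "virginica", "virginica"],
                   "SepalLengthCm": [5.0, 5.2, 6.4, 6.6],
                   "PetalLengthCm": [1.4, 1.6, 5.5, 5.7]})  # hypothetical values

grouped = df.groupby("Species").mean()

# The same number computed by hand for one group, to show what groupby does.
manual = df.loc[df["Species"] == "setosa", "SepalLengthCm"].mean()

print(grouped)
print(manual)
```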
3.9 Computing Grouped Skewness
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
res = iris.drop("Id", axis = 1).groupby(by = "Species").skew()
print(res)
3.10 Aggregation
3.10.1 Before Grouping
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
res = iris.drop("Id", axis = 1).agg({"SepalLengthCm" : ["min", "max", "mean"],
"SepalWidthCm" : ["min", "max", "mean"],
"PetalLengthCm" : ["min", "max", "mean"]})
print(res)
3.10.2 After Grouping
import numpy as np
import pandas as pd
iris = pd.read_csv("D:/iris.csv")
res = (iris.drop("Id", axis = 1).groupby(by = "Species")
.agg({"SepalLengthCm" : ["min", "max", "mean"],
"SepalWidthCm" : ["min"],
"PetalLengthCm" : ["skew"]}))
print(res)
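The dictionary form of agg() yields two-level column labels. An alternative, sketched here on hypothetical data, is named aggregation, in which each output column is given its own name. The output names (sepal_length_mean and so on) are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({"Species": ["setosa", "setosa", "virginica", "virginica"],
                   "SepalLengthCm": [5.0, 5.2, 6.4, 6.6],
                   "SepalWidthCm": [3.4, 3.6, 3.0, 3.2]})  # hypothetical values

# Named aggregation: output_column=(input_column, function_name).
summary = df.groupby("Species").agg(
    sepal_length_mean=("SepalLengthCm", "mean"),
    sepal_length_max=("SepalLengthCm", "max"),
    sepal_width_min=("SepalWidthCm", "min"),
)

print(summary)
```

Flat column names of this kind are often easier to work with when the result is exported or plotted.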
4. Data Visualization
Matplotlib is a widely used Python plotting library that provides a MATLAB-like plotting framework. It can produce high-quality figures for data visualization, scientific research, education, and publishing.
4.1 Installing matplotlib
pip install matplotlib

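When the installation succeeds, pip prints a line beginning with "Successfully installed" that lists matplotlib.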

4.2 Verifying the matplotlib Installation
pip show matplotlib

4.3 Box Plots
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
iris = pd.read_csv("D:/iris.csv")
iris.iloc[ : , 1 : 6].boxplot(column = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"], by = "Species", figsize=(10,10))
plt.show()
4.4 Scatter Plots
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
iris = pd.read_csv("D:/iris.csv")
color = iris.Species.map({"setosa" : "blue", "versicolor" : "green", "virginica" : "red"})
iris.plot(kind = "scatter" , x = "SepalLengthCm", y = "SepalWidthCm", s = 30, c = color, figsize = (10,10))
plt.show()
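A variation on the scatter plot above is to draw each species separately with matplotlib so that a legend can be added. This is only a sketch on a few hypothetical rows. It selects the non-interactive "Agg" backend and saves the figure to a file (the name iris_scatter.png is arbitrary) instead of opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
                   "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
                   "Species": ["setosa", "setosa", "versicolor",
                               "versicolor", "virginica", "virginica"]})  # hypothetical rows

colors = {"setosa": "blue", "versicolor": "green", "virginica": "red"}

fig, ax = plt.subplots(figsize=(6, 6))
for species, group in df.groupby("Species"):
    # One scatter call per species so that each gets its own legend entry.
    ax.scatter(group["SepalLengthCm"], group["SepalWidthCm"],
               s=30, color=colors[species], label=species)
ax.set_xlabel("SepalLengthCm")
ax.set_ylabel("SepalWidthCm")
ax.legend()
fig.savefig("iris_scatter.png")  # hypothetical output file name
```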
4.5 Hexbin Heat Maps
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
iris = pd.read_csv("D:/iris.csv")
iris.plot(kind = "hexbin" , x = "SepalLengthCm", y = "SepalWidthCm", gridsize = 15, figsize = (10,7), sharex = False)
plt.show()
4.6 Line Plots
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
iris = pd.read_csv("D:/iris.csv")
iris.iloc[ : , 0 : 5].plot(kind = "line", x = "Id", figsize = (12, 8))
plt.show()

