[Python Fennel Beans Series] pandas: How to Iterate over All Rows of a DataFrame

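In Python there is often more than one way to reach the same goal, and comparing them can be fun. It reminds me of Kong Yiji, the character in Lu Xun's story who took pride in knowing the four ways to write the character 茴 in 茴香豆 (fennel beans). I would not dare compare myself to Kong Yiji, but here is a collection of Python "fennel beans" for fellow programmers to enjoy.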

First, build a small DataFrame for testing. It has 3 columns, named a, b, and c.

[1]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6], 'c':[7,8,9]})
df

[1]:

   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9

The boss wants the following printed:

1 4
2 5
3 6

[2]:

df['a'] + df['b']

[2]:

0    5
1    7
2    9
dtype: int64

Fennel bean no. 1: iterrows

[3]:

for index, row in df.iterrows():
    print(row['a'], row['b'])

1 4
2 5
3 6
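
One caveat worth knowing: iterrows hands back each row as a Series, so a row that mixes column types is upcast to a single common dtype. The sketch below uses a hypothetical two-column frame named mixed (it is not part of the example above) to show an integer column coming back as a float.

```python
import pandas as pd

# Hypothetical frame for illustration: one int column, one float column.
mixed = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})

# Take just the first (index, row) pair produced by iterrows.
_, first_row = next(mixed.iterrows())

print(mixed['a'].dtype)   # dtype of column a inside the frame
print(first_row.dtype)    # dtype of the row Series after upcasting
print(first_row['a'])     # the value from column a, now a float
```

If the original dtypes matter, itertuples is usually the safer choice, since it does not squeeze a row into one Series.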

Fennel bean no. 2: itertuples

[4]:

for row in df.itertuples(index=True, name='hxd'):
    print(row.a, row.b)

1 4
2 5
3 6

[5]:

for row in df.itertuples(name='hxd'):
    print(row)

hxd(Index=0, a=1, b=4, c=7)
hxd(Index=1, a=2, b=5, c=8)
hxd(Index=2, a=3, b=6, c=9)
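
When the named fields are not needed, itertuples can also return plain tuples by passing name=None, and index=False drops the index from each tuple. A minimal sketch on the same test frame, rebuilt here so the snippet runs on its own (the variable plain_rows is my own name):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# name=None yields regular tuples; index=False leaves the index out,
# so each tuple holds exactly the three column values.
plain_rows = list(df.itertuples(index=False, name=None))

for a, b, c in plain_rows:
    print(a, b)
```

Unpacking directly in the for statement keeps the loop body short, at the cost of having to list every column in order.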

Fennel bean no. 3: iloc

[6]:

for i in range(len(df)):
    print(df.iloc[i]['a'], df.iloc[i]['b'])

1 4
2 5
3 6
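
Note that df.iloc[i]['a'] first builds a whole row Series and then picks one value out of it. If only single cells are needed, a label-based scalar lookup with df.at is one alternative. This is a sketch on the same test frame, not a claim about which variant is fastest; the list pairs is only there to collect the results.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

pairs = []
for label in df.index:
    # df.at looks up one scalar by (row label, column label).
    pair = (df.at[label, 'a'], df.at[label, 'b'])
    pairs.append(pair)
    print(*pair)
```

Because df.at works with index labels rather than positions, the loop runs over df.index instead of range(len(df)).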

Fennel bean no. 4: to_dict

[7]:

for row in df.to_dict(orient='records'):
    print(row['a'], row['b'])

1 4
2 5
3 6

Of course, DataFrame has plenty of other methods whose names start with to_. Interested readers are welcome to try them out.
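
As one example of such a to_ method, to_numpy converts the selected columns into a 2-D NumPy array, which can then be iterated row by row. A small sketch on the same test frame (the variable rows is my own name):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Select the two columns of interest and convert them to a 2-D array.
rows = df[['a', 'b']].to_numpy()

# Iterating over a 2-D array yields one row (a 1-D array) at a time.
for a, b in rows:
    print(a, b)
```

Keep in mind that the array has a single dtype, so columns of different types would be converted to a common one.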

[8]:

df = pd.DataFrame(np.random.randn(100000, 4), columns=list('abcd'))

[9]:

%timeit [row.a + row.b for row in df.itertuples(index=False)]

62.4 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

[10]:

%timeit df['a'] + df['b']

201 µs ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

[11]:

%timeit [row for row in df.itertuples(index=False)]

60.1 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

[12]:

%timeit [x+y for x,y in zip(df['a'], df['b'])]

15.7 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
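
The timing cells above rely on IPython's timeit magic, so they only run inside IPython or Jupyter. As a rough sketch of the same comparison in a plain script, the standard-library timeit module can be used. The frame size, the repeat counts, and the names bench, candidates, and timings are my own choices, iterrows is added for comparison, and the numbers will vary from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than in the cells above so that the slow variants finish quickly.
rng = np.random.default_rng(0)
bench = pd.DataFrame(rng.standard_normal((2_000, 4)), columns=list('abcd'))

# Each candidate computes a + b for every row in a different way.
candidates = {
    'iterrows': lambda: [row['a'] + row['b'] for _, row in bench.iterrows()],
    'itertuples': lambda: [row.a + row.b for row in bench.itertuples(index=False)],
    'zip': lambda: [x + y for x, y in zip(bench['a'], bench['b'])],
    'vectorized': lambda: bench['a'] + bench['b'],
}

timings = {}
for label, fn in candidates.items():
    # Best of 3 repeats, 3 calls per repeat, reported as seconds per call.
    timings[label] = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{label:>10}: {timings[label] * 1e3:.3f} ms per call')
```

The measurements above already point the same way: when the work can be expressed as a column operation such as df['a'] + df['b'], that is far quicker than any explicit row loop.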
