Vectors | Working with vectors in NumPy / Pandas

When analyzing data in Python, representing a vector as a one-dimensional NumPy array or as a Pandas Series makes computations on it straightforward.

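NumPy arrays

Creating vectors

A NumPy array is created with the numpy.array function. Treated as a vector, a NumPy array supports element-wise operations such as addition and subtraction directly, and individual elements are accessed by a zero-based index.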

import numpy as np
a = np.array([1, 1, 2, 3, 5, 8, 13])
b = np.array([1, 4, 9, 16, 25, 36, 49])

a + b
## array([ 2,  5, 11, 19, 30, 44, 62])

a[0] + b[2]
## 10
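
The paragraph above mentions subtraction as well as addition. A minimal sketch of the other element-wise operators, reusing the same a and b, might look like the following. The names diff, prod and scaled are introduced here for illustration only.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 5, 8, 13])
b = np.array([1, 4, 9, 16, 25, 36, 49])

diff = b - a      # element-wise subtraction
prod = a * b      # element-wise multiplication (not a dot product)
scaled = 2 * a    # a scalar is applied to every element

print(diff)
print(prod)
print(scaled)
```

Both operands of an element-wise operation need the same length, or one of them must be a scalar as in the last line.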

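Arrays filled with zeros, arrays that repeat values, and arrays that follow a regular sequence can be created with functions and methods such as zeros, repeat, and arange. Note that arange excludes the stop value.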

import numpy as np
x = np.array([1, 2, 3, 4])

x.repeat(3)
## array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

x.repeat(x)
## array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

np.zeros(5)
## array([0., 0., 0., 0., 0.])

np.arange(1,5)
## array([1, 2, 3, 4])

np.arange(1, 10, 3)
## array([1, 4, 7])

np.arange(10, 1, -1)
## array([10,  9,  8,  7,  6,  5,  4,  3,  2])
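
As a complement to the examples above, the sketch below contrasts repeating each element with repeating the whole vector, and shows two other constant-filled constructors. It assumes only standard NumPy functions (np.repeat, np.tile, np.ones, np.full); the variable names are chosen for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])

each_twice = np.repeat(x, 2)    # repeats each element in turn
whole_twice = np.tile(x, 2)     # repeats the whole vector
ones = np.ones(3)               # float array filled with 1.0
sevens = np.full(3, 7)          # array filled with a given value

print(each_twice)
print(whole_twice)
print(ones)
print(sevens)
```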

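The length of a vector and the data type of its elements can be checked with the size and dtype attributes. The default integer dtype shown here, int64, may differ depending on the platform and NumPy version.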

import numpy as np
a = np.array([1, 1, 2, 3, 5, 8, 13])
a
## array([ 1,  1,  2,  3,  5,  8, 13])

a.size
## 7

a.dtype
## dtype('int64')
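
A few related attributes are often useful alongside size and dtype. The following is a small sketch, not an exhaustive list; a_float is a name introduced here to show an explicit type conversion with astype.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 5, 8, 13])

n = len(a)                        # number of elements along the first axis
shape = a.shape                   # a one-element tuple for a 1-D array
ndim = a.ndim                     # number of dimensions
a_float = a.astype(np.float64)    # returns a converted copy; a is unchanged

print(n, shape, ndim, a_float.dtype)
```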

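Concatenating vectors

Two NumPy vectors can be joined end to end with the np.append function. It returns a new array and leaves the input arrays unchanged.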

import numpy as np
a = np.array([1, 1, 2, 3, 5, 8, 13])
b = np.array([1, 4, 9, 16, 25, 36, 49])

c = np.append(a, b)
c
## array([ 1,  1,  2,  3,  5,  8, 13,  1,  4,  9, 16, 25, 36, 49])
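
When more than two vectors need to be joined, np.concatenate accepts a sequence of arrays in one call. The sketch below is an alternative to np.append rather than a replacement for the example above; the name three_parts is illustrative.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 5, 8, 13])
b = np.array([1, 4, 9, 16, 25, 36, 49])

c = np.concatenate((a, b))                  # same result as np.append(a, b)
three_parts = np.concatenate((a, b, a[:2])) # any number of 1-D arrays

print(c)
print(three_parts.size)
```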

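To combine two vectors into a two-dimensional array, use NumPy's column_stack, which places each vector in its own column, or row_stack, which places each vector in its own row. row_stack is an alias of vstack and is deprecated as of NumPy 2.0, so vstack is the safer choice in new code.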

import numpy as np
a = np.array([1, 1, 2, 3, 5, 8, 13])
b = np.array([1, 4, 9, 16, 25, 36, 49])

d = np.column_stack((a, b))
d
## array([[ 1,  1],
##        [ 1,  4],
##        [ 2,  9],
##        [ 3, 16],
##        [ 5, 25],
##        [ 8, 36],
##        [13, 49]])

d[4, 0]
## 5

e = np.vstack((a, b))    # np.row_stack is a deprecated alias of np.vstack
e
## array([[ 1,  1,  2,  3,  5,  8, 13],
##        [ 1,  4,  9, 16, 25, 36, 49]])

e[1, 5]
## 36
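
The two stacked arrays above hold the same data in transposed layouts. A short sketch that checks this through the shape attribute, using np.vstack for the row-wise case:

```python
import numpy as np

a = np.array([1, 1, 2, 3, 5, 8, 13])
b = np.array([1, 4, 9, 16, 25, 36, 49])

d = np.column_stack((a, b))   # one vector per column
e = np.vstack((a, b))         # one vector per row

print(d.shape)                # (rows, columns)
print(e.shape)
print(np.array_equal(d, e.T)) # d is the transpose of e
```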

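Computing with vectors

NumPy arrays provide methods for computing basic statistics such as the mean. Each method also has an equivalent NumPy function, shown in the comments.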

import numpy as np
a = np.array([1, 1, 2, 3, 5, 8, 13])

a.mean()     # np.mean(a)
## 4.7142857142857144

a.var()      # np.var(a)
## 16.77551020408163

a.sum()      # np.sum(a)
## 33

a.min()      # np.min(a)
## 1

a.max()      # np.max(a)
## 13
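
One detail worth noting about the variance above: NumPy's var and std divide by n by default (ddof=0), which gives the population variance. A minimal sketch of requesting the sample variance instead, using the ddof parameter:

```python
import numpy as np

a = np.array([1, 1, 2, 3, 5, 8, 13])

population_var = a.var()          # divides by n (ddof=0, the default)
sample_var = a.var(ddof=1)        # divides by n - 1
sample_std = a.std(ddof=1)        # square root of the sample variance

print(population_var)
print(sample_var)
print(sample_std)
```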

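Sorting vector elements

A vector can be reordered with the sort and argsort functions. sort returns the elements in ascending order, whereas argsort returns the indices that would put the vector in that order. Slicing with [::-1] simply reverses the current order without sorting.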

import numpy as np
y = np.array([1, 43, 23, 32, 55, 11])

y[::-1]
## array([11, 55, 32, 23, 43,  1])

np.sort(y)
## array([ 1, 11, 23, 32, 43, 55])

np.argsort(y)
## array([0, 5, 2, 3, 1, 4])

y[np.argsort(y)]
## array([ 1, 11, 23, 32, 43, 55])
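
A common use of argsort is to reorder a second vector in the same way as the first. The sketch below assumes a hypothetical array of labels, names, that is paired element by element with y; it also shows one way to obtain a descending sort.

```python
import numpy as np

y = np.array([1, 43, 23, 32, 55, 11])
names = np.array(['a', 'b', 'c', 'd', 'e', 'f'])  # hypothetical labels paired with y

order = np.argsort(y)            # indices that sort y in ascending order
names_sorted = names[order]      # labels reordered to follow sorted y
descending = np.sort(y)[::-1]    # sort ascending, then reverse

print(order)
print(names_sorted)
print(descending)
```

np.sort returns a sorted copy, so y itself keeps its original order here.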

Pandas Series

Creating vectors

A vector can also be created as an object of the Series class defined in Pandas, a library specialized for data analysis.

import pandas as pd

a = pd.Series([1, 1, 2, 3, 4])
a
## 0    1
## 1    1
## 2    2
## 3    3
## 4    4
## dtype: int64

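When creating a Series, the index argument can be used to give each element a name. The names can also be assigned to the index attribute afterwards, or taken from the keys of a dictionary.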

import pandas as pd

b = pd.Series([23, 11, 14, 24], index = ['gene1', 'gene2', 'gene3', 'gene4'])
b
## gene1    23
## gene2    11
## gene3    14
## gene4    24
## dtype: int64

c = pd.Series([23, 11, 14, 24])
c.index = ['gene1', 'gene2', 'gene3', 'gene4']
c
## gene1    23
## gene2    11
## gene3    14
## gene4    24
## dtype: int64

d = pd.Series({'gene1': 23, 'gene2': 11, 'gene3': 14, 'gene4': 24})
d
## gene1    23
## gene2    11
## gene3    14
## gene4    24
## dtype: int64

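Elements of a Series can be retrieved either by position or by name. For positional access, iloc is the explicit form; a plain integer key such as x[1] on a label-indexed Series is deprecated in recent Pandas versions. An integer slice is positional and excludes its end.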

import pandas as pd

x = pd.Series([23, 11, 14, 24], index = ['gene1', 'gene2', 'gene3', 'gene4'])
x.iloc[1]    # positional access; x[1] is deprecated on a label-indexed Series
## 11

x['gene3']
## 14

x[2:3]
## gene3    14
## dtype: int64
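
To make the distinction between positions and names explicit, Pandas offers the iloc and loc accessors. The following sketch uses the same x as above; the result names are illustrative. One point that is easy to miss is that a positional slice excludes its end, while a label slice includes it.

```python
import pandas as pd

x = pd.Series([23, 11, 14, 24], index=['gene1', 'gene2', 'gene3', 'gene4'])

second = x.iloc[1]                     # by position
by_label = x.loc['gene3']              # by name
pos_slice = x.iloc[0:2]                # positions 0 and 1; end is excluded
label_slice = x.loc['gene1':'gene2']   # both end labels are included

print(second, by_label)
print(pos_slice)
print(label_slice)
```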

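Computing with vectors

A Series defines methods for computing basic statistics such as the sum, mean, and variance. Note that Series.var returns the sample variance (ddof=1) by default, unlike NumPy's var, which uses ddof=0.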

import pandas as pd

x = pd.Series([23, 11, 14, 24], index = ['gene1', 'gene2', 'gene3', 'gene4'])
x.sum()
## 72

x.mean()
## 18.0

x.var()
## 42.0
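
Because a Series carries names, some of its statistics can report which element an extreme value belongs to. A brief sketch using idxmax and idxmin on the same x, together with median:

```python
import pandas as pd

x = pd.Series([23, 11, 14, 24], index=['gene1', 'gene2', 'gene3', 'gene4'])

largest = x.max()            # largest value
largest_name = x.idxmax()    # index label of the largest value
smallest_name = x.idxmin()   # index label of the smallest value
middle = x.median()          # median of the four values

print(largest, largest_name, smallest_name, middle)
```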

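Arithmetic such as addition and multiplication between two Series is performed element by element, with elements matched by their index labels.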

import pandas as pd

x = pd.Series([23, 11, 14, 24], index = ['gene1', 'gene2', 'gene3', 'gene4'])
y = pd.Series([31, 12, 10, 17], index = ['gene1', 'gene2', 'gene3', 'gene4'])

x + y
## gene1    54
## gene2    23
## gene3    24
## gene4    41
## dtype: int64

x * y
## gene1    713
## gene2    132
## gene3    140
## gene4    408
## dtype: int64
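
The example above uses two Series with identical indexes. When the indexes differ, Pandas aligns elements by label rather than by position, and labels present in only one operand produce NaN. The sketch below illustrates this with a hypothetical second Series z; the add method with fill_value is one way to treat missing labels as zero.

```python
import pandas as pd

x = pd.Series([23, 11, 14], index=['gene1', 'gene2', 'gene3'])
z = pd.Series([1, 2, 3], index=['gene3', 'gene2', 'gene5'])  # hypothetical data

total = x + z                      # aligned by label; unmatched labels give NaN
filled = x.add(z, fill_value=0)    # a missing label counts as 0 instead

print(total)
print(filled)
```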