Sometimes, we wanna couple multiple dataframes together. In this note, I use df as DataFrame, s as Series.

Libraries

import pandas as pd
import numpy as np

Coupling dfs with merge()

There are 4 types of merging, like in SQL.

Untitled

On the same column name,

# left
df_left = pd.merge(left=df1, right=df2, how='left', on='Col_1', suffixes=('_df1', '_df2'))
# right
df_right = pd.merge(left=df1, right=df2, how='right', on='Col_1', suffixes=('_df1', '_df2'))
Col_1 Col_2
0 A 1
1 E 3
2 C NaN
3 D NaN
4 B 2
Col_1 Col_2
0 A 1
1 B 2
2 C -3
3 F -4
4 E NaN
Col_1 Col_2_df1 Col_2_df2
0 A 1 1
1 E 3 NaN
2 C NaN -3
3 D NaN NaN
4 B 2 2
Col_1 Col_2_df1 Col_2_df2
0 A 1 1
1 E 3 NaN
2 C NaN -3
3 B 2 2
4 F NaN -4
# inner (defaut)
df_inner = pd.merge(left=df1, right=df2, on='Col_1', suffixes=('_df1', '_df2'))
# outer
df_outer = pd.merge(left=df1, right=df2, how='outer', on='Col_1', suffixes=('_df1', '_df2'))

display_side_by_side(df1, df2, df_inner, df_outer)
Col_1 Col_2
0 A 1
1 E 3
2 C NaN
3 D NaN
4 B 2
Col_1 Col_2
0 A 1
1 B 2
2 C -3
3 F -4
4 E NaN
Col_1 Col_2_df1 Col_2_df2
0 A 1 1
1 E 3 NaN
2 C NaN -3
3 B 2 2
Col_1 Col_2_df1 Col_2_df2
0 A 1 1
1 E 3 NaN
2 C NaN -3
3 D NaN NaN
4 B 2 2
5 F NaN -4

On the different column names,

# left
df_left = pd.merge(left=df1, right=df2, how='left', left_on='Col_1', right_on='Col_X', suffixes=('_df1', '_df2'))

display_side_by_side(df1, df2, df_left)
Col_1 Col_2
0 A 1
1 E 3
2 C NaN
3 D NaN
4 B 2
Col_X Col_2
0 A 1
1 B 2
2 C -3
3 F -4
4 E NaN