In this note, I use `df` for a DataFrame and `s` for a Series.
import pandas as pd # import pandas package
import numpy as np
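Each check in the list that follows takes only a line or two of pandas. Below is a minimal sketch on a hypothetical three-row, semicolon-separated sample. The names `raw`, `nan_counts`, `n_dups`, `binary_cols` and `cleaned` are illustrative and do not come from this note. It passes `sep=None` with `engine="python"`, which makes `pd.read_csv` sniff the separator. The KDE plot is left as a comment because it needs matplotlib and scipy.

```python
import io

import pandas as pd

# Hypothetical sample data, deliberately separated by ';' instead of ','.
raw = io.StringIO(
    "a;b;flag\n"
    "1;2.5;0\n"
    "1;2.5;0\n"
    "3;;1\n"
)

# Separator: sep=None with the Python engine lets pandas sniff ',' vs ';'.
df = pd.read_csv(raw, sep=None, engine="python")

# NaN values: count them per column before deciding whether to drop them.
nan_counts = {col: int(n) for col, n in df.isna().sum().items()}

# Duplicates: number of rows that .drop_duplicates() would remove.
n_dups = int(df.duplicated().sum())

# 0/1 features: columns whose only unique values are exactly 0 and 1.
binary_cols = [c for c in df.columns if set(df[c].dropna().unique()) == {0, 1}]

# Dropping duplicates and then rows containing NaN leaves one row here.
cleaned = df.drop_duplicates().dropna()

print(list(df.columns))  # ['a', 'b', 'flag']
print(nan_counts)        # {'a': 0, 'b': 1, 'flag': 0}
print(n_dups)            # 1
print(binary_cols)       # ['flag']
print(len(cleaned))      # 1

# Distribution: KDE plot of one column (needs matplotlib and scipy).
# df["b"].plot.kde()
```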
Things to check first:

- Separator of the csv file: `,` or `;`?
- `.dropna()` or `.drop_duplicates()`?
- `NaN` values? Drop them?
- `0/1` features: do they have only 2 unique values (`0` and `1`)?
- Use a KDE plot to check the distribution of values.

# REMOVING COLUMNS
df.drop('New', axis=1, inplace=True) # drop column 'New'
df.drop(['col1', 'col2'], axis=1, inplace=True)  # drop several columns at once
# ONLY KEEP SOME
kept_cols = ['col1', 'col2', ...]
df = df[kept_cols]
# ALL EXCEPT SOME
df = df[df.columns.difference(['b'])]  # all columns except 'b' (result columns are sorted)
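The three ways of removing columns, shown side by side on a small hypothetical frame. This sketch returns new frames instead of using `inplace=True`. It also passes `columns=` to `drop`, which is equivalent to `axis=1`. The last two lines contrast `Index.difference`, which sorts the remaining labels, with a boolean mask built from `Index.isin`, which keeps the original column order.

```python
import pandas as pd

# Small hypothetical frame; column order is deliberately not alphabetical.
df = pd.DataFrame({"c": [1, 2], "a": [3, 4], "b": [5, 6], "New": [7, 8]})

# Drop by name; columns=[...] does the same as passing axis=1.
dropped = df.drop(columns=["New"])

# Keep only some columns; the list order sets the result order.
kept = df[["a", "c"]]

# All except some: Index.difference sorts the remaining labels...
except_sorted = df[df.columns.difference(["b"])]
# ...while a boolean mask keeps the original column order.
except_ordered = df.loc[:, ~df.columns.isin(["b"])]

print(list(dropped.columns))         # ['c', 'a', 'b']
print(list(kept.columns))            # ['a', 'c']
print(list(except_sorted.columns))   # ['New', 'a', 'c']
print(list(except_ordered.columns))  # ['c', 'a', 'New']
```

Uppercase letters sort before lowercase ones, so `'New'` comes first in the sorted result.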