In this note, df denotes a DataFrame and s denotes a Series.
import pandas as pd # import pandas package
import numpy as np
csv file:
- Separator: , or ;?
- .dropna() or .drop_duplicates()?
- NaN values? Drop them?
- 0/1 features: do they have only 2 unique values (0 and 1)?
- KDE plot to check the distribution of values.
# REMOVING COLUMNS
df.drop('New', axis=1, inplace=True) # drop column 'New'
df.drop(['col1', 'col2'], axis=1, inplace=True)
# ONLY KEEP SOME
kept_cols = ['col1', 'col2', ...]
df = df[kept_cols]
# ALL EXCEPT SOME
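df[df.columns.difference(['b'])] # returns a new DataFrame; remaining columns come back sorted (pass sort=False to keep the original order)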
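
A quick way to see how these column operations behave is to run them on a tiny frame. The sketch below is only an illustration: the toy DataFrame and its column names (c, a, b, missing) are made up, and it uses the non-mutating `columns=` form of `drop` instead of `inplace=True`, so the original df stays intact for comparison.

```python
import pandas as pd

# Hypothetical toy frame; column names are invented for illustration only.
df = pd.DataFrame({"c": [1, 2], "a": [3, 4], "b": [5, 6]})

# Non-mutating drop: returns a new frame and leaves df unchanged.
dropped = df.drop(columns=["b"])

# Index.difference sorts the remaining labels by default...
sorted_cols = df[df.columns.difference(["b"])]

# ...while sort=False keeps the original column order.
ordered_cols = df[df.columns.difference(["b"], sort=False)]

# errors="ignore" skips labels that do not exist instead of raising KeyError.
safe = df.drop(columns=["b", "missing"], errors="ignore")

print(list(dropped.columns))
print(list(sorted_cols.columns))
print(list(ordered_cols.columns))
```

If column order matters downstream (for example when exporting back to csv), the `sort=False` variant or a plain `drop` is probably the safer choice.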