In this note, df stands for a DataFrame and s for a Series.

Libraries

import pandas as pd # import pandas package
import numpy as np

Things to check

  1. csv file:
    1. Are values separated by , or ;?
    2. Encoding.
    3. Timestamp type.
  2. Are the indexes sorted?
  3. Are the indexes continuous with step 1 (especially after using .dropna() or .drop_duplicates())?
  4. Are there NaN values? Drop them?
  5. Are there duplicates? Drop them?
  6. How many unique values?
  7. Do 0/1 features have only 2 unique values (0 and 1)?
  8. KDE plot to check the values distribution.
  9. The number of columns?
  10. Unique labels?
  11. Time series:
    1. Time range.
    2. Time step.
    3. Timestamp's type.
    4. Timezone.
    5. Timestamps are monotonic?
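
A minimal sketch of how several of the checks above could be run. The CSV content and the column names (time, flag, value) are made up for illustration; they are assumptions, not part of any real dataset.

```python
import io
import pandas as pd

# Hypothetical CSV content: ';' separator, one NaN, one duplicated row.
raw = io.StringIO(
    "time;flag;value\n"
    "2021-01-01 00:00;0;1.5\n"
    "2021-01-01 01:00;1;\n"
    "2021-01-01 02:00;1;3.0\n"
    "2021-01-01 02:00;1;3.0\n"
)
df = pd.read_csv(raw, sep=";", parse_dates=["time"])

n_nan = int(df["value"].isna().sum())                # NaN values in one column
n_dup = int(df.duplicated().sum())                   # fully duplicated rows
flag_is_binary = set(df["flag"].unique()) <= {0, 1}  # 0/1 feature check

df = df.dropna().drop_duplicates()
# Dropping rows leaves gaps in the index, so it is no longer 0, 1, 2, ...
index_is_continuous = df.index.equals(pd.RangeIndex(len(df)))
df = df.reset_index(drop=True)                       # back to step-1 index

time_is_monotonic = df["time"].is_monotonic_increasing
time_steps = df["time"].diff().dropna().unique()     # distinct time steps
time_zone = df["time"].dt.tz                         # None means tz-naive
```

The reset_index(drop=True) call is one way to restore a continuous index after rows are dropped; whether to drop NaN values and duplicates at all depends on the data.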

Deal with columns

Remove or Keep some

# REMOVING COLUMNS
df.drop('New', axis=1, inplace=True) # drop column 'New'
df.drop(['col1', 'col2'], axis=1, inplace=True)
# ONLY KEEP SOME
kept_cols = ['col1', 'col2', ...]
df = df[kept_cols]
# ALL EXCEPT SOME (note: difference() returns the column names sorted)
df[df.columns.difference(['b'])]
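
A small sketch comparing the options above on a toy DataFrame (the column names c, a, b are made up). It avoids inplace=True and shows that difference() changes the column order unless sort=False is passed.

```python
import pandas as pd

df = pd.DataFrame({"c": [1], "a": [2], "b": [3]})  # toy data, hypothetical columns

dropped = df.drop(columns=["b"])                # same effect as drop('b', axis=1)
kept = df[["c", "a"]]                           # keep only the listed columns
sorted_rest = df[df.columns.difference(["b"])]  # remaining columns, sorted by name
ordered_rest = df[df.columns.difference(["b"], sort=False)]  # original order kept

print(list(sorted_rest.columns))   # ['a', 'c']
print(list(ordered_rest.columns))  # ['c', 'a']
```

Each of these returns a new DataFrame and leaves df unchanged, which makes the steps easier to chain and to debug than inplace=True.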

Rename columns