clean_columns()

Clean messy column names in a pandas or polars dataframe.

Usage

clean_columns(
    df,
    case="snake",
    replace=None,
    remove_accents=True,
)

Parameters

df: pd.DataFrame | pl.DataFrame

Dataframe from which column names are to be cleaned.

case: str = "snake"

The desired case style of the column names. Defaults to “snake”.

- 'snake' produces 'column_name';
- 'kebab' produces 'column-name';
- 'camel' produces 'columnName';
- 'pascal' produces 'ColumnName';
- 'const' produces 'COLUMN_NAME';
- 'sentence' produces 'Column name';
- 'title' produces 'Column Name';
- 'lower' produces 'column name';
- 'upper' produces 'COLUMN NAME'.

replace: dict[str, str] | None = None

Mapping of substrings to replace in the column names, keyed by the old value. Defaults to None.

- {'old_value': 'new_value'}

remove_accents: bool = True

If True, strip accents from the column names. Defaults to True.
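
As a rough illustration of what stripping accents means for a single name, the sketch below uses only the standard library. The helper `strip_accents` is hypothetical and is not taken from the library's source; it assumes accents are removed by Unicode decomposition followed by dropping the combining marks.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Hypothetical helper: remove accents from one column name."""
    # NFD splits an accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Téléphone"))  # Telephone
```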

Raises

ValueError

If case is not valid.
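
The check described here can be pictured with a small sketch. The names `SUPPORTED_CASES` and `check_case` are hypothetical and only mirror the case styles listed under Parameters; the library's actual validation and error message may differ.

```python
# Case styles listed in the Parameters section (assumed to be the full set).
SUPPORTED_CASES = {
    "snake", "kebab", "camel", "pascal", "const",
    "sentence", "title", "lower", "upper",
}


def check_case(case: str) -> str:
    """Hypothetical validator: return `case` or raise ValueError."""
    if case not in SUPPORTED_CASES:
        raise ValueError(
            f"case {case!r} is not valid; expected one of {sorted(SUPPORTED_CASES)}"
        )
    return case


try:
    check_case("shouty")
except ValueError as err:
    print(f"Rejected: {err}")
```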

Returns

pd.DataFrame | pl.DataFrame

Dataframe with cleaned column names.

Examples

Clean column names by converting them to camel case, removing accents, and correcting a misspelling.

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         'FirstNom': ['Philip', 'Turanga'],
...         'lastName': ['Fry', 'Leela'],
...         'Téléphone': ['555-234-5678', '(604) 111-2335'],
...     }
... )
>>> clean_columns(df, case='camel', replace={'Nom': 'Name'})
  firstName lastName       telephone
0    Philip      Fry    555-234-5678
1   Turanga    Leela  (604) 111-2335