clean_columns

clean_columns(df, case='snake', replace=None, remove_accents=True)

Clean messy column names in a pandas or polars dataframe.

Parameters

Name | Type | Description | Default
df | typing.Union[pandas.DataFrame, polars.DataFrame] | Dataframe whose column names are to be cleaned. | required
case | str | The desired case style of the column names. One of: 'snake' produces 'column_name'; 'kebab' produces 'column-name'; 'camel' produces 'columnName'; 'pascal' produces 'ColumnName'; 'const' produces 'COLUMN_NAME'; 'sentence' produces 'Column name'; 'title' produces 'Column Name'; 'lower' produces 'column name'; 'upper' produces 'COLUMN NAME'. | 'snake'
replace | typing.Optional[typing.Dict[str, str]] | Mapping of substrings to replace in the column names, given as {'old_value': 'new_value'}. | None
remove_accents | bool | If True, strip accents from the column names. | True
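The case styles in the table above can be illustrated with a small, self-contained sketch. It assumes a column name is first split into lowercase words (on spaces, underscores, hyphens, and camelCase boundaries) and then re-joined according to the chosen style. The helpers `split_words` and `convert_case` are hypothetical and not part of this API, and the library's actual tokenization rules may differ. The tokenizer only recognizes ASCII letters and digits, which is one reason to strip accents before converting case.

```python
import re
from typing import Callable, Dict, List

# Hypothetical tokenizer (an assumption, not the library's rules), ASCII-only.
# Alternatives tried in order at each position:
#   an acronym followed by a capitalized word ("HTTP" in "HTTPServer"),
#   an optionally capitalized lowercase run ("First", "name"),
#   a run of capitals ("ID"), or a run of digits.
# Anything else (spaces, "_", "-", punctuation) acts as a separator.
_WORD_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_words(name: str) -> List[str]:
    """Split a column name into lowercase words."""
    return [word.lower() for word in _WORD_RE.findall(name)]


# One renderer per documented case style; each receives lowercase words.
_RENDERERS: Dict[str, Callable[[List[str]], str]] = {
    "snake": lambda ws: "_".join(ws),
    "kebab": lambda ws: "-".join(ws),
    "camel": lambda ws: "".join(ws[:1] + [w.capitalize() for w in ws[1:]]),
    "pascal": lambda ws: "".join(w.capitalize() for w in ws),
    "const": lambda ws: "_".join(ws).upper(),
    "sentence": lambda ws: " ".join(ws).capitalize(),
    "title": lambda ws: " ".join(w.capitalize() for w in ws),
    "lower": lambda ws: " ".join(ws),
    "upper": lambda ws: " ".join(ws).upper(),
}


def convert_case(name: str, case: str = "snake") -> str:
    """Render a column name in one of the documented case styles."""
    if case not in _RENDERERS:
        # Mirrors the documented behavior: an invalid case raises ValueError.
        raise ValueError(f"case must be one of {list(_RENDERERS)}, got {case!r}")
    return _RENDERERS[case](split_words(name))


# Print every style for the same input, e.g. "   camel: columnName".
for style in _RENDERERS:
    print(f"{style:>8}: {convert_case('Column name', style)}")
```

Keeping tokenization separate from rendering means each additional style is a single entry in the mapping.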

Raises

Type | Description
ValueError | If case is not one of the supported styles.

Returns

Type | Description
typing.Union[pandas.DataFrame, polars.DataFrame] | Dataframe with cleaned column names.

Examples

Clean column names by converting them to camel case, removing accents, and correcting a misspelling.

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         'FirstNom': ['Philip', 'Turanga'],
...         'lastName': ['Fry', 'Leela'],
...         'Téléphone': ['555-234-5678', '(604) 111-2335'],
...     }
... )

>>> clean_columns(df, case='camel', replace={'Nom': 'Name'})
   firstName  lastName       telephone
0     Philip       Fry    555-234-5678
1    Turanga     Leela  (604) 111-2335
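The two non-case steps in this example, correcting 'Nom' to 'Name' and dropping the accents from 'Téléphone', can be sketched with the standard library alone. This sketch assumes that `replace` performs plain substring replacement (consistent with 'FirstNom' becoming 'firstName' above) and that accents are removed via Unicode NFKD decomposition. The helper names `apply_replacements` and `strip_accents` are hypothetical, and the order of the steps is an assumption that does not affect this particular example.

```python
import unicodedata
from typing import Dict, Optional


def apply_replacements(name: str, replace: Optional[Dict[str, str]]) -> str:
    """Replace each 'old_value' substring with its 'new_value' (hypothetical helper)."""
    for old, new in (replace or {}).items():
        name = name.replace(old, new)
    return name


def strip_accents(name: str) -> str:
    """Drop diacritics: NFKD splits 'é' into 'e' plus a combining acute accent."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Combining marks have a non-zero combining class; keep only base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


columns = ["FirstNom", "lastName", "Téléphone"]
intermediate = [strip_accents(apply_replacements(c, {"Nom": "Name"})) for c in columns]
print(intermediate)  # ['FirstName', 'lastName', 'Telephone']
```

Converting these intermediate names to camel case then yields the firstName, lastName, and telephone columns shown in the output above.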