clean_columns
clean_columns(df, case='snake', replace=None, remove_accents=True)
Clean messy column names in a pandas or polars dataframe.
Parameters
Name | Type | Description | Default |
---|---|---|---|
df | typing.Union[pandas.pandas.DataFrame, polars.polars.DataFrame] | Dataframe from which column names are to be cleaned. | required |
case | str | The desired case style of the column name. Defaults to 'snake'. 'snake' produces 'column_name'; 'kebab' produces 'column-name'; 'camel' produces 'columnName'; 'pascal' produces 'ColumnName'; 'const' produces 'COLUMN_NAME'; 'sentence' produces 'Column name'; 'title' produces 'Column Name'; 'lower' produces 'column name'; 'upper' produces 'COLUMN NAME'. | 'snake' |
replace | typing.Optional[typing.Dict[str, str]] | Values to replace in the column names, given as {'old_value': 'new_value'}. Defaults to None. | None |
remove_accents | bool | If True, strip accents from the column names. Defaults to True. | True |
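
The case styles and accent stripping described in the table can be illustrated with a small standalone sketch. This is not the library's implementation: the helper names `strip_accents`, `split_words`, and `to_case` are hypothetical, and the word-splitting rule (split on non-alphanumeric characters and on lower-to-upper boundaries) is an assumption made for illustration.

```python
import re
import unicodedata

def strip_accents(name: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def split_words(name: str) -> list:
    """Return the lowercase words of a name (assumed splitting rule)."""
    # Insert a space at lower-to-upper boundaries, e.g. 'lastName' -> 'last Name'.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    # Split on any run of non-alphanumeric characters.
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]

# One joining rule per case style listed in the parameter table.
STYLES = {
    "snake": lambda w: "_".join(w),
    "kebab": lambda w: "-".join(w),
    "camel": lambda w: w[0] + "".join(x.capitalize() for x in w[1:]),
    "pascal": lambda w: "".join(x.capitalize() for x in w),
    "const": lambda w: "_".join(w).upper(),
    "sentence": lambda w: " ".join(w).capitalize(),
    "title": lambda w: " ".join(x.capitalize() for x in w),
    "lower": lambda w: " ".join(w),
    "upper": lambda w: " ".join(w).upper(),
}

def to_case(name: str, case: str = "snake") -> str:
    """Convert a single column name to the requested case style."""
    if case not in STYLES:
        raise ValueError(f"case {case!r} is not valid")
    words = split_words(name)
    return STYLES[case](words) if words else ""

print(to_case("Column Name", "kebab"))   # column-name
print(to_case("column_name", "camel"))   # columnName
print(strip_accents("Téléphone"))        # Telephone
```

Keeping each style as a rule over a list of lowercase words means every style shares the same splitting step, so adding a style only requires one more entry in the dictionary.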
Raises
Type | Description |
---|---|
ValueError | If case is not valid. |
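
As a rough illustration of the check behind this error, the sketch below validates a `case` argument against the styles listed under Parameters. The names `VALID_CASES` and `validate_case` are hypothetical, the match is assumed to be case-sensitive, and the actual error message may differ.

```python
# Hypothetical set of accepted values, taken from the Parameters table.
VALID_CASES = {
    "snake", "kebab", "camel", "pascal", "const",
    "sentence", "title", "lower", "upper",
}

def validate_case(case: str) -> str:
    """Return `case` unchanged if it is a known style, else raise ValueError."""
    if case not in VALID_CASES:
        raise ValueError(
            f"case {case!r} is not valid; expected one of {sorted(VALID_CASES)}"
        )
    return case

try:
    validate_case("train")
except ValueError as err:
    print(f"Rejected: {err}")
```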
Returns
Type | Description |
---|---|
typing.Union[pandas.pandas.DataFrame, polars.polars.DataFrame] | Dataframe with cleaned column names. |
Examples
Clean column names by converting the names to camel case style, removing accents, and correcting a misspelling.
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         'FirstNom': ['Philip', 'Turanga'],
...         'lastName': ['Fry', 'Leela'],
...         'Téléphone': ['555-234-5678', '(604) 111-2335']
...     })
>>> clean_columns(df, case='camel', replace={'Nom': 'Name'})
  firstName lastName       telephone
0    Philip      Fry    555-234-5678
1   Turanga    Leela  (604) 111-2335
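
The example above combines three steps: a substring replacement, accent removal, and conversion to camel case. A minimal standalone sketch of that pipeline on plain strings follows. It is an illustration only, not the library's code: `clean_name` is a hypothetical helper, and the order of the steps (replace, then strip accents, then change case) is an assumption.

```python
import re
import unicodedata

def clean_name(name, replace=None):
    """Apply replacements, strip accents, and return the name in camel case."""
    # Step 1: substring replacements, e.g. {'Nom': 'Name'}.
    for old, new in (replace or {}).items():
        name = name.replace(old, new)
    # Step 2: decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Step 3: split into words (assumed rule) and join them in camel case.
    words = [w.lower() for w in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+", name)]
    if not words:
        return ""
    return words[0] + "".join(w.capitalize() for w in words[1:])

names = ["FirstNom", "lastName", "Téléphone"]
cleaned = [clean_name(n, replace={"Nom": "Name"}) for n in names]
print(cleaned)  # ['firstName', 'lastName', 'telephone']
```

With a pandas dataframe, a list like `cleaned` could be assigned back with `df.columns = cleaned`, although `clean_columns` already returns a dataframe with the cleaned names.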