----------------------------------------------------------------------
This is the API documentation for the skimpy library.
----------------------------------------------------------------------

## Functions

Utility functions

clean_columns(df: 'pd.DataFrame | pl.DataFrame', case: 'str' = 'snake', replace: 'dict[str, str] | None' = None, remove_accents: 'bool' = True) -> 'pd.DataFrame | pl.DataFrame'

    Clean messy column names in a pandas or polars dataframe.

    Args:
        df (pd.DataFrame | pl.DataFrame): Dataframe from which column names are to be cleaned.
        case (str, optional): The desired case style of the column name. Defaults to "snake".

            - 'snake' produces 'column_name';
            - 'kebab' produces 'column-name';
            - 'camel' produces 'columnName';
            - 'pascal' produces 'ColumnName';
            - 'const' produces 'COLUMN_NAME';
            - 'sentence' produces 'Column name';
            - 'title' produces 'Column Name';
            - 'lower' produces 'column name';
            - 'upper' produces 'COLUMN NAME';
        replace (dict[str, str] | None, optional): Values to replace in the column names. Defaults to None.

            - {'old_value': 'new_value'}
        remove_accents (bool, optional): If True, strip accents from the column names. Defaults to True.

    Raises:
        ValueError: If case is not valid.

    Returns:
        pd.DataFrame | pl.DataFrame: Dataframe with cleaned column names.

    Examples:
        Clean column names by converting the names to camel case style, removing accents, and correcting a misspelling.

        >>> df = pd.DataFrame(
        ...     {
        ...         'FirstNom': ['Philip', 'Turanga'],
        ...         'lastName': ['Fry', 'Leela'],
        ...         'Téléphone': ['555-234-5678', '(604) 111-2335']
        ...     })
        >>> clean_columns(df, case='camel', replace={'Nom': 'Name'})
          firstName lastName       telephone
        0    Philip      Fry    555-234-5678
        1   Turanga    Leela  (604) 111-2335

generate_test_data() -> 'pd.DataFrame'

    Generate a pandas dataframe with several different datatypes.

    For testing skimpy, it's convenient to have a dataset with many different data types. This function creates that dataframe.

    Returns:
        pd.DataFrame: dataframe with columns spanning several data types.
    Examples:
        Generate test data to demonstrate how skimpy works.

        >>> df = generate_test_data()

skim(df_in: 'pd.DataFrame | pl.DataFrame') -> 'None'

    Skim a pandas or polars dataframe and display visual summary statistics on it.

    skim is an alternative to pandas.DataFrame.describe(), quickly providing an overview of a data frame via a table displayed in the console. It produces a different set of summary functions based on the types of columns in the dataframe. You may get better results if you set the datatypes you want in your dataframe before running skim. Note that any unknown column types, or mixed column types, will not be processed.

    Args:
        df_in (pd.DataFrame | pl.DataFrame): Dataframe to skim.

    Raises:
        NotImplementedError: If the dataframe has a MultiIndex column structure.

    Examples:
        Skim a dataframe

        >>> df = pd.DataFrame(
        ...     {
        ...         'col1': ['Philip', 'Turanga', 'bob'],
        ...         'col2': [50, 100, 70],
        ...         'col3': [False, True, True]
        ...     })
        >>> df["col1"] = df["col1"].astype("string")
        >>> skim(df)

skim_get_data(df_in: 'pd.DataFrame | pl.DataFrame') -> 'JSON | str'

    Skim a pandas or polars dataframe and return summary statistics as a dictionary, without printing to the console.

    skim is an alternative to pandas.DataFrame.describe(), quickly providing an overview of a data frame via a table of summary statistics. It produces a different set of summary functions based on the types of columns in the dataframe. You may get better results if you set the datatypes you want in your dataframe before running skim. Note that any unknown column types, or mixed column types, will not be processed.

    Args:
        df_in (pd.DataFrame | pl.DataFrame): Dataframe to get summary statistics on.

    Returns:
        JSON | str: Dictionary of summary statistics.

skim_get_figure(df_in: 'pd.DataFrame | pl.DataFrame', save_path: 'os.PathLike | str', format: 'str' = 'svg') -> 'None'

    Skim a pandas or polars dataframe, print the stats to the console, and save a version of the table as an SVG, HTML, or text file.

    skim is an alternative to pandas.DataFrame.describe(), quickly providing an overview of a data frame via a table of summary statistics. It produces a different set of summary functions based on the types of columns in the dataframe. You may get better results if you set the datatypes you want in your dataframe before running skim. Note that any unknown column types, or mixed column types, will not be processed.

    Args:
        df_in (pd.DataFrame | pl.DataFrame): Dataframe to skim.
        save_path (os.PathLike | str): Path to save figure to (include extension).
        format (str, optional): svg, html, or text. Defaults to "svg".

    Raises:
        ValueError: If the format is not one of svg, html, or text.

## Constants

Module-level constants and data

CASE_STYLES
    set() -> new empty set object
    set(iterable) -> new set object

    Build an unordered collection of unique elements.

COMPLETE_COL
    str(object='') -> str
    str(bytes_or_buffer[, encoding[, errors]]) -> str

    Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.

DATE_COL_FIRST
    str: same built-in description as COMPLETE_COL.

DATE_COL_LAST
    str: same built-in description as COMPLETE_COL.

HIST_BINS
    int([x]) -> integer
    int(x, base=10) -> integer

    Convert a number or string to an integer, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating point numbers, this truncates towards zero.

    If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in the given base. The literal can be preceded by '+' or '-' and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal.

    >>> int('0b100', base=0)
    4

MAX_COL_WIDTH
    int: same built-in description as HIST_BINS.

MIN_COL_WIDTH
    int: same built-in description as HIST_BINS.

MISSING_COL
    str: same built-in description as COMPLETE_COL.

NULL_VALUES
    set: same built-in description as CASE_STYLES.

NUM_COL_MEAN
    str: same built-in description as COMPLETE_COL.

QUANTILES
    Built-in mutable sequence.

    If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.

----------------------------------------------------------------------
This is the CLI documentation for the package.
----------------------------------------------------------------------

## CLI: skimpy

```
Usage: skimpy [OPTIONS] INPUT

  The skimpy command line interface.

  Usage refers only to command line.
  Args:
    input (str): Path of data file (csv, parquet, or sqlite)
    table (str | None): Table name for sqlite files; shows available tables if not provided

Options:
  --version         Show the version and exit.
  -t, --table TEXT  Table name (required for sqlite files). If not provided,
                    shows available tables.
  --help            Show this message and exit.
```
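
The usage line and the `-t, --table` option above can also be driven from a script. The following is a minimal sketch, assuming the `skimpy` executable is on `PATH`. `build_skimpy_command` and `run_skimpy` are hypothetical helper names, not part of skimpy, and the file and table names are placeholders.

```python
import shutil
import subprocess


def build_skimpy_command(path, table=None):
    """Build the argument list for `skimpy [OPTIONS] INPUT`.

    Hypothetical helper for illustration; not part of skimpy.
    """
    cmd = ["skimpy"]
    if table is not None:
        # -t/--table names the table to read from a sqlite file.
        cmd += ["--table", table]
    cmd.append(path)
    return cmd


def run_skimpy(path, table=None):
    """Run the CLI if it is installed; return its exit code, else None."""
    if shutil.which("skimpy") is None:
        return None
    result = subprocess.run(build_skimpy_command(path, table), check=False)
    return result.returncode


print(build_skimpy_command("data.csv"))
# prints ['skimpy', 'data.csv']
print(build_skimpy_command("data.sqlite", table="orders"))
# prints ['skimpy', '--table', 'orders', 'data.sqlite']
```

Building the command as a list rather than a single string avoids shell quoting problems when the path or table name contains spaces.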