Schema

class Schema

Bases: Generic[tybles._RowSpec]

Describes the structure of a Pandas dataframe

In Tybles, a schema is derived from a dataclass describing one row of the dataframe.
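
As a rough illustration of what "derived from a dataclass" can mean, the sketch below uses only the standard library to read field names and annotated types off a row dataclass. The `Trade` class is a hypothetical row specification, and this is not Tybles' own code; it only suggests where information such as `field_names` and `annotated_types` could come from.

```python
import dataclasses
from typing import get_type_hints


@dataclasses.dataclass
class Trade:
    """Hypothetical row specification: one instance describes one row."""

    symbol: str
    quantity: int
    price: float


# Field names, in order of definition
field_names = [f.name for f in dataclasses.fields(Trade)]

# Mapping of field name to annotated type
annotated_types = get_type_hints(Trade)

print(field_names)
print(annotated_types)
```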

Attributes

Note: attributes inherited from parent classes, if any, are not shown here

row_spec

Row specification

order_columns

Whether to order columns in the dataframe as in the row specification

missing_columns

What to do with missing columns

extra_columns

What to do with extra columns that are not part of the row specification

validate

Whether to run validation on every row of the data

field_names

Names of the fields in the schema, in order of definition

dtypes

Mapping of field names to their associated dtypes

annotated_types

Mapping of field names to their associated annotated types

Methods

from_rows

Returns a pandas DataFrame (possibly as an enriched Tyble) from row instances

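process_raw_data_frame

Processes a raw pandas DataFrame against the schema, applying the missing_columns and extra_columns checks and row validation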

read_csv

Reads a pandas DataFrame from a CSV file, shaping up and validating the data on demand

validate_row

Validates the given row and raises an exception if validation fails

List of members of Schema

row_spec: Type[tybles._RowSpec]

Row specification

order_columns: bool

Whether to order columns in the dataframe as in the row specification

missing_columns: Union[Literal['error'], Literal['missing'], Literal['fill']]

What to do with missing columns

This occurs when reading/creating a dataframe and when writing/exporting a dataframe.

The possible values are:

  • “error”: raise an error (default)

  • “missing”: leave the missing columns absent (in that case, set validate to False)

  • “fill”: fill the columns with the dtype default value
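
The three policies can be sketched on a plain mapping of column names to value lists. This is a minimal illustration, not Tybles' implementation: `handle_missing` and `ASSUMED_DEFAULTS` are hypothetical names, and the default values chosen for each type are an assumption, since the actual "dtype default value" is not spelled out here.

```python
# Assumed per-type defaults, for illustration only; the real
# "dtype default value" used by Tybles is not specified here.
ASSUMED_DEFAULTS = {int: 0, float: 0.0, str: ""}


def handle_missing(columns, expected, policy="error"):
    """Apply a missing-columns policy to a column-name -> values mapping.

    `expected` maps each required column name to its Python type.
    """
    n_rows = len(next(iter(columns.values()), []))
    missing = [name for name in expected if name not in columns]
    if not missing:
        return dict(columns)
    if policy == "error":
        raise ValueError(f"missing columns: {missing}")
    if policy == "missing":
        return dict(columns)  # leave the missing columns absent
    if policy == "fill":
        filled = dict(columns)
        for name in missing:
            filled[name] = [ASSUMED_DEFAULTS[expected[name]]] * n_rows
        return filled
    raise ValueError(f"unknown policy: {policy!r}")


table = {"symbol": ["A", "B"]}
expected = {"symbol": str, "quantity": int}
filled = handle_missing(table, expected, policy="fill")
print(filled)
```

With "missing", downstream row validation would see rows without the absent fields, which is presumably why validate should be turned off in that case.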

extra_columns: Union[Literal['drop'], Literal['keep'], Literal['error']]

What to do with extra columns that are not part of the row specification

  • “drop”: remove the extra columns from the dataframe (default)

  • “keep”: keep the extra columns in the dataframe (note that the dtype is autodetected)

  • “error”: raise an error
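
A comparable sketch for extra columns, again on a plain mapping of column names to value lists. `handle_extra` is a hypothetical helper written for this illustration and does not reflect how Tybles implements the option internally.

```python
def handle_extra(columns, expected_names, policy="drop"):
    """Apply an extra-columns policy to a column-name -> values mapping."""
    extra = [name for name in columns if name not in expected_names]
    if policy == "drop":
        # Keep only the columns named in the row specification
        return {k: v for k, v in columns.items() if k in expected_names}
    if policy == "keep":
        return dict(columns)
    if policy == "error":
        if extra:
            raise ValueError(f"unexpected columns: {extra}")
        return dict(columns)
    raise ValueError(f"unknown policy: {policy!r}")


table = {"symbol": ["A"], "quantity": [1], "comment": ["n/a"]}
dropped = handle_extra(table, ["symbol", "quantity"], policy="drop")
kept = handle_extra(table, ["symbol", "quantity"], policy="keep")
print(dropped)
print(kept)
```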

validate: bool

Whether to run validation on every row of the data

If the typeguard library is present, this will use typeguard.check_type(), otherwise a simple isinstance() check will be done.
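
The simple isinstance() fallback could look roughly like the sketch below. It is written for illustration with a hypothetical `Trade` row class and a hypothetical `validate_row_simple` function, and is not taken from the Tybles source.

```python
import dataclasses
from typing import get_type_hints


@dataclasses.dataclass
class Trade:
    """Hypothetical row specification."""

    symbol: str
    quantity: int


def validate_row_simple(row):
    """Raise TypeError if a field value is not an instance of its annotation."""
    for name, expected in get_type_hints(type(row)).items():
        value = getattr(row, name)
        if not isinstance(value, expected):
            raise TypeError(
                f"{name}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )


validate_row_simple(Trade("ABC", 10))  # passes silently

try:
    validate_row_simple(Trade("ABC", "10"))  # quantity is a str, not an int
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

A check of this kind only works for plain classes: isinstance() does not accept subscripted generics such as List[int], which is one reason a dedicated library like typeguard is preferable when available.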

field_names: Sequence[str]

Names of the fields in the schema, in order of definition

dtypes: Mapping[str, numpy.dtype]

Mapping of field names to their associated dtypes

Can also serve as a dtype= argument for various Pandas functions
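
For instance, a mapping of this shape can be handed to pandas.read_csv() through its dtype= parameter. The `dtypes` dictionary below is built by hand as a stand-in for the attribute of a real schema, and the column names are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hand-written stand-in, shaped like Schema.dtypes (field name -> numpy dtype)
dtypes = {
    "symbol": np.dtype("object"),
    "quantity": np.dtype("int64"),
    "price": np.dtype("float64"),
}

csv_text = "symbol,quantity,price\nABC,10,1.5\nXYZ,20,2\n"

# pandas.read_csv accepts a column -> dtype mapping through dtype=
df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(df.dtypes)
```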

annotated_types: Mapping[str, type]

Mapping of field names to their associated annotated types

validate_row(row)

Validates the given row and raises an exception if validation fails

Parameters

row (TypeVar(_RowSpec)) – Row to validate

Raises
  • TypeError – If typeguard or standard validation failed

  • BeartypeException – If beartype failed

Return type

None

from_rows(rows: Sequence[tybles._RowSpec], return_type: Literal['DataFrame'] = 'DataFrame', **kwargs) → pandas.core.frame.DataFrame
from_rows(rows: Sequence[tybles._RowSpec], return_type: Literal['Tyble'], **kwargs) → pandas.core.frame.DataFrame

Returns a pandas DataFrame (possibly as an enriched Tyble) from row instances

Parameters

rows (Sequence[TypeVar(_RowSpec)]) – Rows as a sequence of dataclass instances

Keyword Arguments
  • return_type – Whether to return a pandas DataFrame (default) or a Tyble instance

Return type

Union[DataFrame, Tyble[TypeVar(_RowSpec)]]

Returns

A pandas DataFrame, possibly wrapped in a Tyble
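
To give an idea of the conversion involved, the sketch below builds a plain pandas DataFrame from dataclass instances with standard pandas calls only. It does not call from_rows() itself and skips the schema checks and the Tyble wrapper; the `Trade` class is a hypothetical row specification.

```python
import dataclasses

import pandas as pd


@dataclasses.dataclass
class Trade:
    """Hypothetical row specification."""

    symbol: str
    quantity: int
    price: float


rows = [Trade("ABC", 10, 1.5), Trade("XYZ", 20, 2.0)]

# Column order follows the order of field definition in the dataclass
columns = [f.name for f in dataclasses.fields(Trade)]

# One dict per row; passing columns= keeps the layout even if rows is empty
df = pd.DataFrame([dataclasses.asdict(r) for r in rows], columns=columns)
print(df)
```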

read_csv(filepath_or_buffer: Union[TextIO, str, bytes, os.PathLike], return_type: Literal['DataFrame'] = 'DataFrame', **kw_args) → pandas.core.frame.DataFrame
read_csv(filepath_or_buffer: Union[TextIO, str, bytes, os.PathLike], return_type: Literal['Tyble'], **kw_args) → Tyble[_RowSpec]

Reads a pandas DataFrame from a CSV file, shaping up and validating the data on demand

Parameters

filepath_or_buffer (Union[TextIO, str, bytes, PathLike]) – Path or open file to read from

Keyword Arguments
  • return_type – Whether to return a pandas DataFrame (default) or a Tyble instance

  • kw_args – Additional keyword arguments not listed above are passed to pandas.read_csv()

Return type

Union[DataFrame, Tyble[TypeVar(_RowSpec)]]

Returns

A pandas dataframe, possibly wrapped in a Tyble

process_raw_data_frame(df: pandas.core.frame.DataFrame, return_type: Literal['DataFrame'] = 'DataFrame') → pandas.core.frame.DataFrame
process_raw_data_frame(df: pandas.core.frame.DataFrame, return_type: Literal['Tyble']) → Tyble[_RowSpec]

Parameters

df (DataFrame) –

Dataframe to process. It may or may not be mutated in place.

In any case, one should use the dataframe returned by this function rather than the one passed in.

Raises
  • ValueError – If the dataframe fails the missing_columns or extra_columns checks

  • TypeError – If typeguard validation failed

  • BeartypeException – If beartype failed

Return type

Union[DataFrame, Tyble[TypeVar(_RowSpec)]]

Returns

The processed dataframe or a dataframe wrapped in a Tyble instance

__init__(row_spec, order_columns, missing_columns, extra_columns, validate, field_names, dtypes, annotated_types)
static __new__(cls, *args, **kwds)