Schema

Bases: Generic[tybles._RowSpec]

Describes the structure of a Pandas dataframe

In Tybles, a schema is derived from a dataclass describing one row of the dataframe.
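
As a rough illustration of what "derived from a dataclass" can mean, the standard library alone is enough to recover field names and annotated types from a row dataclass. This is a sketch only, not Tybles' implementation; the `Person` class is a hypothetical row specification.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Person:  # hypothetical row specification, one instance per dataframe row
    name: str
    age: int
    height: float


# Field names in order of definition (compare with the field_names attribute).
field_names = [f.name for f in fields(Person)]

# Field name -> annotated type (compare with the annotated_types attribute).
annotated_types = get_type_hints(Person)

print(field_names)
print(annotated_types)
```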

Attributes

Note: attributes inherited from parent classes, if any, are not shown here.

`row_spec`	Row specification
`order_columns`	Whether to order columns in the dataframe as in the row specification
`missing_columns`	What to do with missing columns
`extra_columns`	What to do with extra columns that are not part of the row specification
`validate`	Whether to run validation on every row of the data
`field_names`	Names of the fields in the schema, in order of definition
`dtypes`	Mapping of field names to their associated dtypes
`annotated_types`	Mapping of field names to their associated annotated types

Methods

`from_rows`	Returns a pandas DataFrame (possibly as an enriched Tyble) from row instances
`process_raw_data_frame`	Processes a raw dataframe against the schema and returns it, possibly wrapped in a Tyble
`read_csv`	Reads a pandas DataFrame from a CSV file, shaping up and validating the data on demand
`validate_row`	Validates the given row and raises an exception if validation fails

List of members of Schema

row_spec: Type[tybles._RowSpec]

Row specification

order_columns: bool

Whether to order columns in the dataframe as in the row specification

missing_columns: Union[Literal['error'], Literal['missing'], Literal['fill']]

What to do with missing columns

This occurs when reading/creating a dataframe and when writing/exporting a dataframe.

The possible values are:

“error”: raise an error (default)
“missing”: leave the missing columns absent (in that case, set validate to False)
“fill”: fill the columns with the dtype default value

extra_columns: Union[Literal['drop'], Literal['keep'], Literal['error']]

What to do with extra columns that are not part of the row specification

“drop”: remove the extra columns from the dataframe (default)
“keep”: keep the extra columns in the dataframe (note that the dtype is autodetected)
“error”: raise an error
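
The missing_columns and extra_columns policies described above can be pictured with a small, pandas-free sketch. It is not Tybles' code: the table is a plain dict of column lists standing in for a dataframe, `apply_column_policies` is a hypothetical helper, and the `defaults` mapping is an assumption about what a "dtype default value" might be.

```python
from typing import Any, Dict, List, Mapping, Sequence

Table = Dict[str, List[Any]]  # column name -> values; a stand-in for a dataframe


def apply_column_policies(
    table: Table,
    field_names: Sequence[str],
    defaults: Mapping[str, Any],  # assumed per-column fill values
    missing_columns: str = "error",
    extra_columns: str = "drop",
) -> Table:
    """Return a new table shaped according to the two column policies."""
    n_rows = len(next(iter(table.values()), []))
    missing = [name for name in field_names if name not in table]
    extra = [name for name in table if name not in field_names]

    if missing and missing_columns == "error":
        raise ValueError(f"missing columns: {missing}")
    if extra and extra_columns == "error":
        raise ValueError(f"extra columns: {extra}")

    result: Table = {}
    for name in field_names:  # schema columns first, in schema order
        if name in table:
            result[name] = list(table[name])
        elif missing_columns == "fill":
            result[name] = [defaults[name]] * n_rows
        # with "missing", the column is simply left out
    if extra_columns == "keep":
        for name in extra:
            result[name] = list(table[name])
    return result


raw = {"name": ["Ada", "Alan"], "nickname": ["A", "T"]}
filled = apply_column_policies(
    raw, ["name", "age"], {"name": "", "age": 0}, missing_columns="fill"
)
print(filled)
```

With the default policies the same call raises ValueError, because the `age` column is missing from `raw`.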

validate: bool

Whether to run validation on every row of the data

If the typeguard library is installed, validation uses typeguard.check_type(); otherwise, a simple isinstance() check is performed.

field_names: Sequence[str]

Names of the fields in the schema, in order of definition

dtypes: Mapping[str, numpy.dtype]

Mapping of field names to their associated dtypes

Can also serve as a dtype= argument for various Pandas functions
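
To show what passing such a mapping as a dtype= argument looks like, here is a minimal sketch using pandas.read_csv() directly. The mapping is written out by hand rather than taken from a real Schema instance, and the column names and dtypes are assumptions chosen for the example.

```python
import io

import numpy as np
import pandas as pd

# Hand-written stand-in for a schema's dtypes mapping (hypothetical columns).
dtypes = {
    "name": np.dtype("object"),
    "age": np.dtype("int64"),
    "height": np.dtype("float64"),
}

csv_text = "name,age,height\nAda,36,1.7\nAlan,41,1.8\n"

# A Mapping[str, numpy.dtype] is accepted as-is by the dtype= parameter.
df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(df.dtypes)
```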

annotated_types: Mapping[str, type]

Mapping of field names to their associated annotated types

validate_row(row)

Validates the given row and raises an exception if validation fails

Parameters

row (TypeVar(_RowSpec)) – Row to validate

Raises

TypeError – If typeguard or standard validation failed
BeartypeException – If beartype failed

Return type

None
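
The isinstance() fallback mentioned for the validate attribute might look roughly like the sketch below. It is an illustration under assumptions, not the library's code: `validate_row_sketch` and `Person` are hypothetical names, and the check only handles plain classes as annotations (generic annotations such as List[int] are not supported by isinstance()).

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Person:  # hypothetical row specification
    name: str
    age: int


def validate_row_sketch(row) -> None:
    """Raise TypeError if any field value does not match its annotation."""
    hints = get_type_hints(type(row))
    for f in fields(row):
        value = getattr(row, f.name)
        expected = hints[f.name]
        if not isinstance(value, expected):
            raise TypeError(
                f"field {f.name!r}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )


validate_row_sketch(Person(name="Ada", age=36))  # passes silently

try:
    validate_row_sketch(Person(name="Ada", age="36"))
except TypeError as exc:
    print(f"validation failed: {exc}")
```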

from_rows(rows: Sequence[tybles._RowSpec], return_type: Literal['DataFrame'] = 'DataFrame', **kwargs) → pandas.core.frame.DataFrame

from_rows(rows: Sequence[tybles._RowSpec], return_type: Literal['Tyble'], **kwargs) → Tyble[_RowSpec]

Returns a pandas DataFrame (possibly as an enriched Tyble) from row instances

Parameters

rows (Sequence[TypeVar(_RowSpec)]) – Rows as a sequence of dataclass instances

Keyword Arguments

return_type – Whether to return a pandas DataFrame (default) or a Tyble instance
kwargs – Extra keyword arguments are passed to pandas.DataFrame.from_records()

Return type

Union[DataFrame, Tyble[TypeVar(_RowSpec)]]

Returns

A pandas DataFrame, possibly wrapped in a Tyble
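
Since extra keyword arguments are forwarded to pandas.DataFrame.from_records(), the core of building a dataframe from dataclass rows can be sketched with pandas alone. This is an assumption about the general approach, not a copy of what from_rows does internally; `Person` is a hypothetical row specification.

```python
from dataclasses import astuple, dataclass, fields

import pandas as pd


@dataclass
class Person:  # hypothetical row specification
    name: str
    age: int


rows = [Person("Ada", 36), Person("Alan", 41)]

# Column names come from the dataclass, in order of definition.
field_names = [f.name for f in fields(Person)]

# Each dataclass instance becomes one record (a tuple of field values).
df = pd.DataFrame.from_records([astuple(row) for row in rows], columns=field_names)
print(df)
```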

read_csv(filepath_or_buffer: Union[TextIO, str, bytes, os.PathLike], return_type: Literal['DataFrame'] = 'DataFrame', **kw_args) → pandas.core.frame.DataFrame

read_csv(filepath_or_buffer: Union[TextIO, str, bytes, os.PathLike], return_type: Literal['Tyble'], **kw_args) → Tyble[_RowSpec]

Reads a pandas DataFrame from a CSV file, shaping up and validating the data on demand

Parameters

filepath_or_buffer (Union[TextIO, str, bytes, PathLike]) – Path or open file to read from

Keyword Arguments

return_type – Whether to return a pandas DataFrame (default) or a Tyble instance
kw_args – Additional keyword arguments not listed above are passed to pandas.read_csv()

Return type

Union[DataFrame, Tyble[TypeVar(_RowSpec)]]

Returns

A pandas dataframe, possibly wrapped in a Tyble

process_raw_data_frame(df: pandas.core.frame.DataFrame, return_type: Literal['DataFrame'] = 'DataFrame') → pandas.core.frame.DataFrame

process_raw_data_frame(df: pandas.core.frame.DataFrame, return_type: Literal['Tyble']) → Tyble[_RowSpec]

Processes a raw dataframe against the schema and returns it, possibly wrapped in a Tyble

Parameters

df (DataFrame) –

Dataframe to process. It may or may not be mutated in place, so always use the dataframe returned by this function.

Raises

ValueError – If the dataframe fails the missing_columns or extra_columns checks
TypeError – If typeguard validation failed
BeartypeException – If beartype failed

Return type

Union[DataFrame, Tyble[TypeVar(_RowSpec)]]

Returns

The processed dataframe or a dataframe wrapped in a Tyble instance

__init__(row_spec, order_columns, missing_columns, extra_columns, validate, field_names, dtypes, annotated_types)

static __new__(cls, *args, **kwds)