Schema
- class Schema
Bases: Generic[tybles._RowSpec]
Describes the structure of a pandas DataFrame
In Tybles, a schema is derived from a dataclass describing one row of the dataframe.
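To make this concrete, the sketch below shows how field names and annotated types can be read off a dataclass using only the standard library. `Person` is a hypothetical row specification, and this is not Tybles' own implementation, only an illustration of the kind of information a schema can derive from a dataclass.

```python
import dataclasses


@dataclasses.dataclass
class Person:  # hypothetical row specification: one instance per dataframe row
    name: str
    age: int
    height: float


# Field names, in order of definition
field_names = [f.name for f in dataclasses.fields(Person)]

# Mapping of field names to their annotated types
annotated_types = {f.name: f.type for f in dataclasses.fields(Person)}
```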
Attributes
Note: attributes inherited from parent classes are not shown here, if any
- row_spec: Row specification
- order_columns: Whether to order columns in the dataframe as in the row specification
- missing_columns: What to do with missing columns
- extra_columns: What to do with extra columns that are present but not part of the row specification
- validate: Whether to run validation on every row of the data
- field_names: Names of the fields in the schema, in order of definition
- dtypes: Mapping of field names to their associated dtypes
- annotated_types: Mapping of field names to their associated annotated types
Methods
Returns a pandas DataFrame (possibly as an enriched Tyble) from row instances
Reads a pandas DataFrame from a CSV file, shaping up and validating the data on demand
Validates the given row and raises an exception if validation fails
List of members of Schema
- row_spec: Type[tybles._RowSpec]
Row specification
- missing_columns: Union[Literal['error'], Literal['missing'], Literal['fill']]
What to do with missing columns
This applies when reading or creating a dataframe and when writing or exporting a dataframe.
The possible values are:
“error”: raise an error (default)
“missing”: leave the missing columns missing (set validate to False in that case)
“fill”: fill the columns with the dtype default value
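The three policies can be sketched in plain Python on a dict of columns. The helper `handle_missing` is hypothetical and not part of Tybles; in particular, it assumes that the "default value" is whatever the type's constructor returns with no arguments, which may differ from what Tybles does for a given dtype.

```python
from typing import Dict, List, Literal

MissingPolicy = Literal["error", "missing", "fill"]


def handle_missing(
    columns: Dict[str, List[object]],
    expected: Dict[str, type],
    n_rows: int,
    policy: MissingPolicy = "error",
) -> Dict[str, List[object]]:
    """Return a copy of `columns` with missing expected columns handled."""
    missing = [name for name in expected if name not in columns]
    result = dict(columns)
    if not missing or policy == "missing":
        return result  # nothing to do, or leave the gaps as they are
    if policy == "error":
        raise ValueError(f"Missing columns: {missing}")
    if policy == "fill":
        for name in missing:
            # Assumption: the default value is the type's zero-argument constructor
            result[name] = [expected[name]() for _ in range(n_rows)]
        return result
    raise ValueError(f"Unknown policy: {policy!r}")


expected = {"name": str, "age": int}
data = {"name": ["Ada", "Alan"]}
filled = handle_missing(data, expected, n_rows=2, policy="fill")
```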
- extra_columns: Union[Literal['drop'], Literal['keep'], Literal['error']]
What to do with extra columns present, that are not part of the row specification
“drop”: remove the extra columns from the dataframe (default)
“keep”: keep the extra columns in the dataframe (note that the dtype is autodetected)
“error”: raise an error
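As an illustration of these three values, here is a minimal sketch operating on a dict of columns. `handle_extra` is a hypothetical helper written for this example, not a Tybles function.

```python
from typing import Dict, List, Literal, Sequence

ExtraPolicy = Literal["drop", "keep", "error"]


def handle_extra(
    columns: Dict[str, List[object]],
    field_names: Sequence[str],
    policy: ExtraPolicy = "drop",
) -> Dict[str, List[object]]:
    """Return a copy of `columns` with columns outside `field_names` handled."""
    extra = [name for name in columns if name not in field_names]
    if not extra or policy == "keep":
        return dict(columns)
    if policy == "error":
        raise ValueError(f"Unexpected columns: {extra}")
    if policy == "drop":
        return {name: values for name, values in columns.items() if name in field_names}
    raise ValueError(f"Unknown policy: {policy!r}")


data = {"name": ["Ada"], "age": [36], "comment": ["n/a"]}
dropped = handle_extra(data, ["name", "age"])
kept = handle_extra(data, ["name", "age"], policy="keep")
```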
- validate: bool
Whether to run validation on every row of the data
If the typeguard library is present, this will use typeguard.check_type(); otherwise a simple isinstance() check will be done.
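The isinstance() fallback can be sketched as follows. `Person` and `validate_row` are hypothetical names chosen for this example, and the sketch only handles annotations that are plain classes; generic annotations such as List[int] cannot be passed to isinstance(), which is one reason a dedicated library such as typeguard is preferable when available.

```python
import dataclasses


@dataclasses.dataclass
class Person:  # hypothetical row specification
    name: str
    age: int


def validate_row(row: object) -> None:
    """Raise TypeError if a field value is not an instance of its annotation."""
    for field in dataclasses.fields(row):
        value = getattr(row, field.name)
        # Only valid when the annotation is a plain class (str, int, ...)
        if not isinstance(value, field.type):
            raise TypeError(
                f"{field.name}: expected {field.type.__name__}, "
                f"got {type(value).__name__}"
            )


validate_row(Person("Ada", 36))  # a well-typed row passes silently
```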
- dtypes: Mapping[str, numpy.dtype]
Mapping of field names to their associated dtypes
Can also serve as the dtype= argument for various pandas functions
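For example, a mapping of this shape can be passed directly to pandas.read_csv(). In the sketch below the `dtypes` dict is a hand-written stand-in for the attribute of a real schema, and the CSV content is made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hand-written stand-in for Schema.dtypes: field name -> numpy dtype
dtypes = {
    "name": np.dtype("object"),
    "age": np.dtype("int64"),
    "score": np.dtype("float64"),
}

csv_text = "name,age,score\nAda,36,9.5\nAlan,41,8\n"

# pandas applies the requested dtype to each listed column while parsing
df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
```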
- from_rows(rows: Sequence[tybles._RowSpec], return_type: Literal['DataFrame'] = 'DataFrame', **kwargs) pandas.core.frame.DataFrame
- from_rows(rows: Sequence[tybles._RowSpec], return_type: Literal['Tyble'], **kwargs) pandas.core.frame.DataFrame
Returns a pandas DataFrame (possibly as an enriched Tyble) from row instances
- Parameters
rows (Sequence[TypeVar(_RowSpec)]) – Rows as a sequence of dataclass instances
- Keyword Arguments
return_type – Whether to return a pandas DataFrame (default) or a Tyble instance
kwargs – Extra keyword arguments are passed to pandas.DataFrame.from_records()
- Return type
- Returns
A pandas DataFrame, possibly wrapped in a Tyble
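The conversion from dataclass instances to a dataframe can be approximated with plain pandas as shown below. This is a sketch of the general idea, not the code Tybles runs: `Person` is a hypothetical row specification, and the real method additionally applies the schema's column and validation settings.

```python
import dataclasses

import pandas as pd


@dataclasses.dataclass
class Person:  # hypothetical row specification
    name: str
    age: int


rows = [Person("Ada", 36), Person("Alan", 41)]

# Column names come from the dataclass fields, in order of definition
field_names = [f.name for f in dataclasses.fields(Person)]

# Each dataclass instance becomes one record (a tuple of field values)
records = [dataclasses.astuple(row) for row in rows]
df = pd.DataFrame.from_records(records, columns=field_names)
```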
- read_csv(filepath_or_buffer: Union[TextIO, str, bytes, os.PathLike], return_type: Literal['DataFrame'] = 'DataFrame', **kw_args) pandas.core.frame.DataFrame
- read_csv(filepath_or_buffer: Union[TextIO, str, bytes, os.PathLike], return_type: Literal['Tyble'], **kw_args) Tyble[_RowSpec]
Reads a pandas DataFrame from a CSV file, shaping up and validating the data on demand
- Parameters
filepath_or_buffer (Union[TextIO, str, bytes, PathLike]) – Path or open file to read from
- Keyword Arguments
return_type – Whether to return a pandas DataFrame (default) or a Tyble instance
kw_args – Additional keyword arguments not listed above are passed to pandas.read_csv()
- Return type
- Returns
A pandas dataframe, possibly wrapped in a Tyble
- process_raw_data_frame(df: pandas.core.frame.DataFrame, return_type: Literal['DataFrame'] = 'DataFrame') pandas.core.frame.DataFrame
- process_raw_data_frame(df: pandas.core.frame.DataFrame, return_type: Literal['Tyble']) Tyble[_RowSpec]
- Parameters
df (DataFrame) – Dataframe to process. It may or may not be mutated in place, so always use the dataframe returned by this function rather than the one passed in.
- Raises
ValueError – If the dataframe fails the missing_columns or extra_columns checks
TypeError – If typeguard validation failed
BeartypeException – If beartype failed
- Return type
- Returns
The processed dataframe or a dataframe wrapped in a Tyble instance
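A rough idea of what shaping a raw dataframe involves, written with plain pandas: check for missing columns, select the schema's columns in order, and cast them to the expected dtypes. `shape_frame` and the `dtypes` dict are hypothetical, and this sketch hard-codes one combination of settings (error on missing columns, drop extra columns, order columns); it always returns a new frame, whereas the real method may mutate its input.

```python
from typing import Dict

import numpy as np
import pandas as pd


def shape_frame(df: pd.DataFrame, dtypes: Dict[str, np.dtype]) -> pd.DataFrame:
    """Return a new frame with the schema's columns only, ordered and cast."""
    missing = [name for name in dtypes if name not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Selecting by the schema's keys both orders the columns and drops extras
    shaped = df[list(dtypes)]
    return shaped.astype(dtypes)


raw = pd.DataFrame(
    {"extra": [True, False], "age": [36.0, 41.0], "name": ["Ada", "Alan"]}
)
dtypes = {"name": np.dtype("object"), "age": np.dtype("int64")}

# Use the returned frame, not the one passed in
clean = shape_frame(raw, dtypes)
```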
- __init__(row_spec, order_columns, missing_columns, extra_columns, validate, field_names, dtypes, annotated_types)
- static __new__(cls, *args, **kwds)