Schema
- class Schema
- Bases: Generic[tybles._RowSpec]
- Describes the structure of a pandas dataframe.
- In Tybles, a schema is derived from a dataclass describing one row of the dataframe.
- Attributes
- Note: attributes inherited from parent classes, if any, are not shown here.
- row_spec: Row specification
- order_columns: Whether to order columns in the dataframe as in the row specification
- missing_columns: What to do with missing columns
- extra_columns: What to do with extra columns that are present but not part of the row specification
- validate: Whether to run validation on every row of the data
- field_names: Names of the fields in the schema, in order of definition
- dtypes: Mapping of field names to their associated dtypes
- annotated_types: Mapping of field names to their associated annotated types
- Methods
- from_rows: Returns a pandas DataFrame (possibly as an enriched Tyble) from row instances
- process_raw_data_frame: Processes the given dataframe df
- read_csv: Reads a pandas DataFrame from a CSV file, shaping up and validating the data on demand
- Validates the given row and raises an exception if validation fails
- List of members of Schema
 - row_spec: Type[tybles._RowSpec]
- Row specification 
 - missing_columns: Union[Literal['error'], Literal['missing'], Literal['fill']]
- What to do with missing columns
- This occurs when reading/creating a dataframe and when writing/exporting a dataframe.
- The possible values are:
- “error”: raise an error (default)
- “missing”: leave the missing columns missing (set validate to False in that case)
- “fill”: fill the columns with the dtype default value 
 
 - extra_columns: Union[Literal['drop'], Literal['keep'], Literal['error']]
- What to do with extra columns that are present but not part of the row specification
- The possible values are:
- “drop”: remove the extra columns from the dataframe (default)
- “keep”: keep the extra columns in the dataframe (note that the dtype is autodetected) 
- “error”: raise an error 
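The missing_columns and extra_columns policies can be illustrated without pandas. The following is a minimal sketch on a plain dict of columns, not the Tybles implementation: the helper apply_column_policies and the field_defaults mapping (standing in for "dtype default values") are hypothetical names introduced only for this example.

```python
from typing import Any, Dict, List


def apply_column_policies(
    columns: Dict[str, List[Any]],
    field_defaults: Dict[str, Any],
    missing_columns: str = "error",
    extra_columns: str = "drop",
) -> Dict[str, List[Any]]:
    """Hypothetical helper: apply the documented policies to a dict of columns."""
    if missing_columns not in ("error", "missing", "fill"):
        raise ValueError(f"unknown missing_columns policy: {missing_columns!r}")
    if extra_columns not in ("drop", "keep", "error"):
        raise ValueError(f"unknown extra_columns policy: {extra_columns!r}")

    # Number of rows, taken from the longest column that is present.
    n_rows = max((len(values) for values in columns.values()), default=0)

    result: Dict[str, List[Any]] = {}
    for name, default in field_defaults.items():
        if name in columns:
            result[name] = columns[name]
        elif missing_columns == "error":
            raise ValueError(f"missing column: {name!r}")
        elif missing_columns == "fill":
            # Assumption: the "dtype default value" is modeled by field_defaults.
            result[name] = [default] * n_rows
        # "missing": leave the column out of the result.

    extras = [name for name in columns if name not in field_defaults]
    if extras and extra_columns == "error":
        raise ValueError(f"unexpected columns: {extras!r}")
    if extra_columns == "keep":
        for name in extras:
            result[name] = columns[name]
    # "drop": extra columns are simply not copied.
    return result


raw = {"name": ["Ada", "Bob"], "nickname": ["a", "b"]}
defaults = {"name": "", "age": 0}
filled = apply_column_policies(raw, defaults, missing_columns="fill")
kept = apply_column_policies(
    raw, defaults, missing_columns="missing", extra_columns="keep"
)
```

With the default policies ("error" and "drop"), the same input raises ValueError because the age column is absent.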
 
 - validate: bool
- Whether to run validation on every row of the data
- If the typeguard library is present, this uses typeguard.check_type(); otherwise a simple isinstance() check is done.
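As an illustration of the simple isinstance() fallback described above, here is a minimal sketch that checks each field of a dataclass row against its annotation. The Person dataclass and the validate_row_simple helper are hypothetical, and raising TypeError is an assumption; the sketch only handles plain classes as annotations, since isinstance() does not accept subscripted generics such as List[int].

```python
import dataclasses


@dataclasses.dataclass
class Person:
    # Hypothetical row specification used only for this example.
    name: str
    age: int


def validate_row_simple(row) -> None:
    """Check every field of a dataclass instance with isinstance().

    Assumes annotations are plain classes (not strings or generics).
    """
    for field in dataclasses.fields(row):
        value = getattr(row, field.name)
        if not isinstance(value, field.type):
            raise TypeError(
                f"field {field.name!r}: expected {field.type.__name__}, "
                f"got {type(value).__name__}"
            )


validate_row_simple(Person(name="Ada", age=36))  # passes silently

try:
    validate_row_simple(Person(name="Bob", age="41"))  # age is a str
    failed = False
except TypeError:
    failed = True
```

Dataclasses do not enforce annotations at construction time, which is why Person(name="Bob", age="41") can be created at all and a separate validation step is useful.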
 - dtypes: Mapping[str, numpy.dtype]
- Mapping of field names to their associated dtypes
- Can also serve as the dtype= argument for various pandas functions
 - from_rows(rows: Sequence[tybles._RowSpec], return_type: Literal['DataFrame'] = 'DataFrame', **kwargs) -> pandas.core.frame.DataFrame
- from_rows(rows: Sequence[tybles._RowSpec], return_type: Literal['Tyble'], **kwargs) -> pandas.core.frame.DataFrame
- Returns a pandas DataFrame (possibly as an enriched Tyble) from row instances
- Parameters
- rows (Sequence[TypeVar(_RowSpec)]) – Rows as a sequence of dataclass instances
- Keyword Arguments
- return_type – Whether to return a pandas DataFrame (default) or a Tyble instance
- kwargs – Extra keyword arguments are passed to pandas.DataFrame.from_records()
 
- Returns
- A pandas DataFrame, possibly wrapped in a Tyble 
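Since extra keyword arguments are forwarded to pandas.DataFrame.from_records(), it may help to see what record-shaped data built from dataclass rows looks like. The sketch below uses only the standard library and is not how Tybles necessarily performs the conversion; the Person dataclass and the rows_to_records helper are hypothetical.

```python
import dataclasses
from typing import Any, List, Sequence, Tuple


@dataclasses.dataclass
class Person:
    # Hypothetical row specification used only for this example.
    name: str
    age: int


def rows_to_records(rows: Sequence[Any]) -> Tuple[List[str], List[tuple]]:
    """Turn dataclass instances into (column names, list of tuples)."""
    if not rows:
        return [], []
    # Column names follow the order in which the fields are defined.
    columns = [field.name for field in dataclasses.fields(rows[0])]
    records = [dataclasses.astuple(row) for row in rows]
    return columns, records


columns, records = rows_to_records([Person("Ada", 36), Person("Bob", 41)])
# With pandas installed, these could be passed on as:
#   pandas.DataFrame.from_records(records, columns=columns)
```

Taking the column names from the dataclass fields keeps them in definition order, matching the documented behavior of field_names.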
 
 - read_csv(filepath_or_buffer: Union[TextIO, str, bytes, os.PathLike], return_type: Literal['DataFrame'] = 'DataFrame', **kw_args) -> pandas.core.frame.DataFrame
- read_csv(filepath_or_buffer: Union[TextIO, str, bytes, os.PathLike], return_type: Literal['Tyble'], **kw_args) -> Tyble[_RowSpec]
- Reads a pandas DataFrame from a CSV file, shaping up and validating the data on demand
- Parameters
- filepath_or_buffer (Union[TextIO, str, bytes, PathLike]) – Path or open file to read from
- Keyword Arguments
- return_type – Whether to return a pandas DataFrame (default) or a Tyble instance
- kw_args – Additional keyword arguments not listed above are passed to pandas.read_csv()
 
- Returns
- A pandas dataframe, possibly wrapped in a Tyble 
 
 - process_raw_data_frame(df: pandas.core.frame.DataFrame, return_type: Literal['DataFrame'] = 'DataFrame') -> pandas.core.frame.DataFrame
- process_raw_data_frame(df: pandas.core.frame.DataFrame, return_type: Literal['Tyble']) -> Tyble[_RowSpec]
- Parameters
- df (DataFrame) – Dataframe to process; it may be mutated in place. Always use the dataframe returned by this function, since the code may or may not mutate the given dataframe in place.
- Raises
- ValueError – If the dataframe fails the missing_columns or extra_columns checks
- TypeError – If typeguard validation failed 
- BeartypeException – If beartype failed 
 
- Returns
- The processed dataframe, or a dataframe wrapped in a Tyble instance
 
 - __init__(row_spec, order_columns, missing_columns, extra_columns, validate, field_names, dtypes, annotated_types)
 - static __new__(cls, *args, **kwds)