dfchain.backends.pandas package
Submodules
dfchain.backends.pandas.executor_impl module
Pandas backend executor implementation.
This module provides PandasExecutor, an in‑memory implementation of
dfchain.core.executor.Executor that wraps a single
pandas.DataFrame instance and exposes grouping and chunking hooks
used by higher‑level APIs.
- class dfchain.backends.pandas.executor_impl.PandasExecutor(_groupkey: Hashable | None = None, _df: DataFrameLike | None = None, is_eager: bool = False, is_inplace: bool = False, chunksize: int | None = None)
Bases: Executor
Executor implementation backed by an in‑memory pandas.DataFrame.
PandasExecutor is a lightweight in‑memory executor that wraps a pandas.DataFrame and implements the grouping and chunking hooks defined by dfchain.core.executor.PartitionAble.
- _df
The wrapped dataframe. It can be provided at construction time or later via df().
- Type:
pandas.DataFrame or None, default None
- is_eager
Hint for task execution mode. When True, tasks may execute eagerly rather than building a deferred plan. The exact semantics are defined by higher‑level APIs.
- Type:
bool, default False
- is_inplace
When True, task functions are expected to mutate _df in place. When False, tasks should treat _df as immutable and reassign a new dataframe instead.
- Type:
bool, default False
- chunksize
Optional hint used by higher‑level code to determine how many rows to process per chunk when streaming or partitioning the data.
- Type:
int or None, default None
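The is_inplace convention can be illustrated with two plain pandas task functions. This is a minimal sketch, not dfchain code: the names add_total_inplace and add_total_copy are hypothetical, and how an executor actually invokes task functions is defined by the higher‑level APIs, not shown here.

```python
import pandas as pd


def add_total_inplace(df: pd.DataFrame) -> None:
    """Hypothetical task for is_inplace=True: mutates the frame it is given."""
    df["total"] = df["x"] + df["y"]


def add_total_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical task for is_inplace=False: leaves the input untouched."""
    return df.assign(total=df["x"] + df["y"])


original = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Immutable style: the caller reassigns the returned dataframe.
result = add_total_copy(original)

# In-place style: the caller keeps the same object, which gains a new column.
mutated = original.copy()
add_total_inplace(mutated)
```

In the immutable style the input frame keeps its original columns, so the caller has to store the returned dataframe; in the in‑place style there is nothing to reassign.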
Note
The pandas backend is designed for in‑memory use and does not maintain an index by group key. As a result, methods that would write changes back to specific groups (update_group, clear_groups, rebuild_groups) raise NotImplementedError.
- clear_groups() → None
Clear any cached grouping state.
The pandas executor does not cache grouped state keyed by a group index, so this method raises NotImplementedError.
- df(df)
Set the wrapped dataframe and return self for fluent construction.
- Parameters:
df (pandas.DataFrame) – Dataframe to wrap.
- Returns:
The executor instance.
- Return type:
self (PandasExecutor)
- iter_chunks() → Iterable[DataFrame]
Iterate dataframe chunks.
The default in‑memory strategy yields a single chunk containing the entire dataframe. Callers that require more advanced chunking behaviour should subclass PandasExecutor or use a different Executor implementation.
Note
The default implementation yields a (key, chunk) pair where the key is 0 and chunk is the full dataframe. Higher‑level code should account for this convention when consuming the iterator.
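For callers who need more than one chunk, row‑based chunking could be sketched in plain pandas as follows. This is an illustration under stated assumptions, not the library's implementation: iter_row_chunks is a hypothetical helper, and using the chunk position as the key is an assumption that extends the (0, full dataframe) convention from the note.

```python
from typing import Hashable, Iterator, Optional, Tuple

import pandas as pd


def iter_row_chunks(
    df: pd.DataFrame, chunksize: Optional[int]
) -> Iterator[Tuple[Hashable, pd.DataFrame]]:
    """Yield (key, chunk) pairs of at most `chunksize` rows each.

    Hypothetical helper: with chunksize=None it falls back to a single
    (0, df) pair containing the whole dataframe.
    """
    if chunksize is None:
        yield 0, df
        return
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    for key, start in enumerate(range(0, len(df), chunksize)):
        # iloc slicing is positional, so it works for any index type.
        yield key, df.iloc[start : start + chunksize]


df = pd.DataFrame({"x": range(5)})
chunks = list(iter_row_chunks(df, chunksize=2))
```

With five rows and a chunksize of 2, the last chunk is shorter than the others; consumers should not assume uniform chunk lengths.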
- iter_groups() → Iterable[tuple[Hashable, DataFrame]]
Iterate grouped data as (key, group_df) pairs.
If _groupkey is None, yield a single pair (None, self._df) containing the whole dataframe. Otherwise, perform self._df.groupby(self._groupkey) and yield the resulting (key, group) pairs produced by pandas.
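The behaviour described for iter_groups can be approximated with a standalone function. This is a sketch that runs without dfchain; iter_groups_sketch is a hypothetical name, and the real method reads self._df and self._groupkey rather than taking arguments.

```python
from typing import Hashable, Iterator, Optional, Tuple

import pandas as pd


def iter_groups_sketch(
    df: pd.DataFrame, groupkey: Optional[Hashable]
) -> Iterator[Tuple[Hashable, pd.DataFrame]]:
    """Standalone approximation of the behaviour documented for iter_groups."""
    if groupkey is None:
        # No group key: the whole dataframe is the only "group".
        yield None, df
        return
    # Iterating a pandas groupby object produces (key, sub-dataframe) pairs.
    yield from df.groupby(groupkey)


df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"], "x": [1, 2, 3]})
groups = {key: group["x"].tolist() for key, group in iter_groups_sketch(df, "city")}
```

Because the ungrouped case yields the key None, consumers can use the same loop for grouped and ungrouped data.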
- rebuild_groups(flush_every: int = 1)
Rebuild or re‑materialize groups.
- Parameters:
flush_every (int, optional) – Hint controlling how often to flush intermediate state. Not implemented for the in‑memory pandas backend.
- update_group(df: DataFrame) → None
Update the current group with the provided dataframe.
The pandas backend does not maintain an index by group key, so there is no safe default way to update a single group in place. This method therefore raises NotImplementedError. Backends that support indexed group updates (for example, a database backend) should provide an implementation.
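Code that has to run against several backends may want to tolerate this NotImplementedError. One possible pattern is sketched below. ReadOnlyGroupsExecutor is a hypothetical stand‑in that mimics the documented behaviour so the example runs without dfchain, and try_update_group is an illustrative helper, not part of the library.

```python
import pandas as pd


class ReadOnlyGroupsExecutor:
    """Hypothetical stand-in that mimics the documented pandas behaviour."""

    def update_group(self, df: pd.DataFrame) -> None:
        raise NotImplementedError("no index by group key")


def try_update_group(executor, df: pd.DataFrame) -> bool:
    """Return True if the backend accepted the update, False if unsupported."""
    try:
        executor.update_group(df)
    except NotImplementedError:
        return False
    return True


supported = try_update_group(ReadOnlyGroupsExecutor(), pd.DataFrame({"x": [1]}))
```

A caller that receives False could, for example, collect the modified groups and build a new dataframe instead of writing them back one by one.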
Module contents
Pandas backend for dfchain.
This subpackage provides an Executor
implementation backed by an in‑memory pandas.DataFrame.
The main entry point is PandasExecutor, which wraps a single
dataframe and exposes the generic executor interface used throughout
dfchain.
Typical usage:
import pandas as pd
from dfchain.backends.pandas import PandasExecutor

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
ex = PandasExecutor().df(df).build()

# iterate as a single group
for key, group in ex.iter_groups():
    ...