model
Data model classes.
- class s3manifesto.model.Base
Base class providing common functionality for all data model classes.
Enables efficient serialization and deserialization for distributed processing where task definitions need to be passed between workers and coordinators.
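As a rough illustration of the round trip described above, the sketch below assumes the model classes behave like plain dataclasses. The `Base`, `to_dict`, and `from_dict` definitions here are hypothetical local stand-ins, not the library's confirmed API.

```python
import dataclasses
import json
from typing import Any, Dict


@dataclasses.dataclass
class Base:
    """Hypothetical stand-in for a serializable base class (assumption)."""

    def to_dict(self) -> Dict[str, Any]:
        # Convert the dataclass fields into a plain, JSON-friendly dict.
        return dataclasses.asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]):
        # Rebuild an instance from a dict produced by to_dict().
        return cls(**data)


@dataclasses.dataclass
class FileSpec(Base):
    uri: str
    value: int


# A coordinator serializes the task definition ...
spec = FileSpec(uri="s3://bucket/data/part-0.json", value=1024)
payload = json.dumps(spec.to_dict())

# ... and a worker restores it on the other side.
restored = FileSpec.from_dict(json.loads(payload))
```

Keeping the payload as plain JSON means it can travel through any queue or message bus without pickling.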
- class s3manifesto.model.FileSpec(uri: str, value: int)
Lightweight file specification containing URI and a numeric value for grouping.
Essential for divide-and-conquer algorithms that need to partition files by size or record count without loading full metadata, enabling efficient task distribution.
- Parameters:
uri – Unique identifier for the file location
value – Numeric value used for grouping (size in bytes or record count)
- class s3manifesto.model.GroupSpec(file_specs: List[FileSpec], value: int)
Represents a balanced group of files with their collective value for optimal task sizing.
Critical for divide-and-conquer processing where work must be distributed evenly across parallel workers, ensuring consistent resource utilization and predictable execution times.
- Parameters:
file_specs – List of FileSpec grouped together
value – Total combined value of all files in this group
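To show how `FileSpec` values could be combined into balanced `GroupSpec` objects, here is a sketch using a simple greedy heuristic: the largest file goes first, always into the currently lightest group. The dataclasses are local stand-ins mirroring the signatures above, and `group_files` is a hypothetical helper. The library's own grouping algorithm may differ.

```python
import dataclasses
import heapq
from typing import List


@dataclasses.dataclass
class FileSpec:
    uri: str
    value: int


@dataclasses.dataclass
class GroupSpec:
    file_specs: List[FileSpec]
    value: int


def group_files(file_specs: List[FileSpec], n_group: int) -> List[GroupSpec]:
    """Hypothetical helper: spread files over n_group groups greedily."""
    groups = [GroupSpec(file_specs=[], value=0) for _ in range(n_group)]
    # Min-heap of (current total, group index), so the lightest group pops first.
    heap = [(0, i) for i in range(n_group)]
    heapq.heapify(heap)
    for spec in sorted(file_specs, key=lambda s: s.value, reverse=True):
        total, i = heapq.heappop(heap)
        groups[i].file_specs.append(spec)
        groups[i].value = total + spec.value
        heapq.heappush(heap, (groups[i].value, i))
    return groups


specs = [
    FileSpec(uri=f"s3://bucket/f{i}", value=v)
    for i, v in enumerate([70, 50, 40, 30, 10])
]
groups = group_files(specs, n_group=2)
print([g.value for g in groups])  # [100, 100]
```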
- class s3manifesto.model.DataFile(uri: str, etag: str | None = None, size: int | None = None, n_record: int | None = None)
Complete metadata specification for a data file including integrity and size information.
Enables divide-and-conquer workflows to make informed decisions about task partitioning while providing data integrity verification through ETags for reliable distributed processing.
- Parameters:
uri – Unique S3 URI or file path identifier
etag – AWS S3 ETag for data integrity verification
size – File size in bytes for resource planning
n_record – Number of records for workload estimation
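The sketch below illustrates one way an ETag could be used for an integrity check. It relies on the fact that, for single-part S3 uploads that are not encrypted with SSE-KMS or SSE-C, the ETag is the MD5 hex digest of the object body. Multipart uploads use a different scheme, so this check does not apply to them. `DataFile` is a local stand-in mirroring the signature above, and `matches_etag` is a hypothetical helper, not part of the library.

```python
import dataclasses
import hashlib
from typing import Optional


@dataclasses.dataclass
class DataFile:
    uri: str
    etag: Optional[str] = None
    size: Optional[int] = None
    n_record: Optional[int] = None


def matches_etag(data_file: DataFile, content: bytes) -> bool:
    """Hypothetical helper: compare content against the recorded ETag.

    Assumes a single-part, non-KMS upload, where ETag == MD5 of the body.
    """
    if data_file.etag is None:
        # No ETag recorded, so integrity cannot be confirmed.
        return False
    # S3 returns ETags wrapped in double quotes; strip them before comparing.
    return hashlib.md5(content).hexdigest() == data_file.etag.strip('"')


content = b"id,name\n1,alice\n"
data_file = DataFile(
    uri="s3://bucket/data/part-0.csv",
    etag=hashlib.md5(content).hexdigest(),
    size=len(content),
    n_record=1,
)
```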
- class s3manifesto.model.DataFileGroup(data_files: List[DataFile], attr_name: str, value: int)
A collection of DataFile grouped together for optimal parallel processing.
Facilitates divide-and-conquer strategies by providing ready-to-execute task units where each group represents a balanced workload for distributed worker nodes.
- Parameters:
data_files – List of DataFile objects that should be processed together
value – Total aggregated value (size or record count) for the entire group
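A sketch of how such groups might be built follows. It assumes that `attr_name` names the `DataFile` attribute being aggregated (`"size"` or `"n_record"`). That reading is an inference from the signature, not something the documentation above states. The classes are local stand-ins and `group_by_target` is a hypothetical helper that packs files in order until a target total would be exceeded.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class DataFile:
    uri: str
    etag: Optional[str] = None
    size: Optional[int] = None
    n_record: Optional[int] = None


@dataclasses.dataclass
class DataFileGroup:
    data_files: List[DataFile]
    attr_name: str
    value: int


def group_by_target(
    data_files: List[DataFile], attr_name: str, target: int
) -> List[DataFileGroup]:
    """Hypothetical helper: sequentially pack files up to a target total."""
    groups: List[DataFileGroup] = []
    current: List[DataFile] = []
    total = 0
    for data_file in data_files:
        value = getattr(data_file, attr_name)
        if value is None:
            raise ValueError(f"{data_file.uri} has no {attr_name!r} value")
        # Close the current group if adding this file would overshoot.
        if current and total + value > target:
            groups.append(DataFileGroup(current, attr_name, total))
            current, total = [], 0
        current.append(data_file)
        total += value
    if current:
        groups.append(DataFileGroup(current, attr_name, total))
    return groups


files = [
    DataFile(uri=f"s3://bucket/f{i}", size=s)
    for i, s in enumerate([60, 50, 30, 80, 10])
]
groups = group_by_target(files, attr_name="size", target=100)
print([g.value for g in groups])  # [60, 80, 90]
```

Sequential packing preserves file order, which can matter when files are time-partitioned. A greedy largest-first strategy would give tighter balance at the cost of that ordering.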
- class s3manifesto.model.ManifestSummary(manifest: str, size: int | None = None, n_record: int | None = None, fingerprint: str | None = None, details: dict[str, Any] = <factory>)
Compact summary metadata for a manifest file providing quick access to aggregate statistics.
Enables divide-and-conquer coordinators to make informed decisions about task distribution without loading the full manifest data, optimizing planning overhead in large-scale processing.
- Parameters:
manifest – URI reference to the associated manifest data file
size – Total aggregate size in bytes of all files in the manifest
n_record – Total aggregate record count across all files in the manifest
fingerprint – Unique hash for detecting data changes and cache invalidation
details – Additional metadata for workflow-specific information
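The following sketch shows one way the aggregate fields could be derived from a list of files. The classes are local stand-ins mirroring the signatures above. `summarize` and its fingerprint scheme (SHA-256 over sorted URI and ETag pairs) are assumptions made for illustration, not the library's documented behavior.

```python
import dataclasses
import hashlib
from typing import Any, Dict, List, Optional


@dataclasses.dataclass
class DataFile:
    uri: str
    etag: Optional[str] = None
    size: Optional[int] = None
    n_record: Optional[int] = None


@dataclasses.dataclass
class ManifestSummary:
    manifest: str
    size: Optional[int] = None
    n_record: Optional[int] = None
    fingerprint: Optional[str] = None
    details: Dict[str, Any] = dataclasses.field(default_factory=dict)


def summarize(manifest_uri: str, data_files: List[DataFile]) -> ManifestSummary:
    """Hypothetical helper: aggregate totals and an order-independent hash."""
    hasher = hashlib.sha256()
    # Sort by URI so the fingerprint does not depend on listing order.
    for data_file in sorted(data_files, key=lambda f: f.uri):
        hasher.update(f"{data_file.uri}\t{data_file.etag}\n".encode("utf-8"))
    return ManifestSummary(
        manifest=manifest_uri,
        # Missing sizes or record counts are counted as zero in this sketch.
        size=sum(f.size or 0 for f in data_files),
        n_record=sum(f.n_record or 0 for f in data_files),
        fingerprint=hasher.hexdigest(),
    )


files = [
    DataFile(uri="s3://bucket/a.json", etag="e1", size=100, n_record=10),
    DataFile(uri="s3://bucket/b.json", etag="e2", size=200, n_record=20),
]
summary = summarize("s3://bucket/manifest-data.parquet", files)
print(summary.size, summary.n_record)  # 300 30
```

A coordinator could compare `summary.fingerprint` against a cached value to decide whether the full manifest needs to be reloaded.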