Release and Version History
x.y.z (Backlog)
Features and Improvements
Minor Improvements
Bugfixes
Miscellaneous
1.0.0 (2025-06-23)
🎉 Major Release - Complete Architectural Rewrite
This release represents a fundamental transformation of s3manifesto from a basic file grouping utility into a comprehensive enterprise-grade manifest system for big data ETL orchestration.
💥 Breaking Changes
Complete API redesign with immutable dataclasses (frozen=True, slots=True)
Method signatures changed: partition_files_by_size() and partition_files_by_n_record() replace the previous grouping methods
All data structures now use modern type hints with typing_extensions
Drop support for Python 3.8 and earlier; the minimum requirement is now Python 3.9
Features and Improvements
- 🚀 Revolutionary Performance Improvements
Heap-optimized Best Fit Decreasing (BFD) algorithm: O(n log k) complexity (n files, k groups) vs O(n×k) for large datasets
~100x speedup: partitioning 10K files into 1K groups takes ~0.1s vs ~10s
Scalability breakthrough: Handles 1000+ groups with minimal performance degradation
Memory optimization: Single-pass calculation for size, n_record, and fingerprint
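The O(n log k) bound comes from keeping the open groups in a heap, so placing each file costs O(log k) instead of a scan over all k groups. The changelog does not spell out the heap scheme, so the sketch below swaps in a plainly named variant: a single min-heap keyed on group totals directly supports worst-fit decreasing (put each file into the least-loaded group that still has room), whereas a strict best-fit choice needs an ordered structure over remaining capacity. The function partition_by_size and its target parameter are hypothetical, not s3manifesto's partition_files_by_size(). It returns (files, total) pairs.

```python
import heapq
from typing import List, Sequence, Tuple


def partition_by_size(sizes: Sequence[int], target: int) -> List[Tuple[List[int], int]]:
    """Hypothetical worst-fit-decreasing packer using a min-heap of group totals."""
    groups: List[List[int]] = []
    # Min-heap of (current_total, group_index); the root is the least-loaded group.
    heap: List[Tuple[int, int]] = []
    for size in sorted(sizes, reverse=True):  # "decreasing": largest files first
        if heap and heap[0][0] + size <= target:
            # If a file fits in any group, it fits in the least-loaded one.
            total, idx = heapq.heappop(heap)
            groups[idx].append(size)
            heapq.heappush(heap, (total + size, idx))
        else:
            # No group has room (or the file alone exceeds target): open a new group.
            groups.append([size])
            heapq.heappush(heap, (size, len(groups) - 1))
    totals = {idx: total for total, idx in heap}
    return [(group, totals[idx]) for idx, group in enumerate(groups)]


result = partition_by_size([5, 4, 3, 2, 1], target=6)
print(result)  # [([5], 5), ([4, 1], 5), ([3, 2], 5)]
```

Including the initial sort, the total cost is O(n log n + n log k); the heap is what removes the per-file O(k) scan.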
- 🏗️ New Architectural Features
Dual-file Manifest System: Manifest Summary (JSON) + Manifest Data (Parquet) for optimal storage and retrieval
Fingerprinting: MD5-based fingerprints for change detection and cache invalidation
Hierarchical Grouping: Support for million+ file scenarios with bounded memory usage
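The fingerprinting and single-pass bullets can be illustrated together. This minimal sketch assumes each file is described by a hypothetical (uri, etag, size, n_record) tuple and that the fingerprint is an MD5 digest over the URI/ETag pairs sorted by URI, consistent with the 0.2.1 note that fingerprints are derived from each data file's URI and ETag; s3manifesto's exact byte layout may differ. One loop accumulates the totals, and the pairs are sorted afterward so listing order does not affect the hash.

```python
import hashlib
from typing import Iterable, List, Tuple


def summarize(files: Iterable[Tuple[str, str, int, int]]) -> Tuple[int, int, str]:
    """Return (total_size, total_n_record, fingerprint) for hypothetical file tuples."""
    total_size = 0
    total_n_record = 0
    pairs: List[Tuple[str, str]] = []
    for uri, etag, size, n_record in files:  # single loop over the file list
        total_size += size
        total_n_record += n_record
        pairs.append((uri, etag))
    md5 = hashlib.md5()
    for uri, etag in sorted(pairs):  # sort so listing order does not change the fingerprint
        md5.update(f"{uri}\t{etag}\n".encode("utf-8"))
    return total_size, total_n_record, md5.hexdigest()


files = [
    ("s3://bucket/a.parquet", "etag-a", 100, 10),
    ("s3://bucket/b.parquet", "etag-b", 200, 20),
]
size, n_record, fp1 = summarize(files)
_, _, fp2 = summarize(reversed(files))  # same files, different order
_, _, fp3 = summarize([files[0], ("s3://bucket/b.parquet", "etag-b2", 200, 20)])  # b changed
print(size, n_record, fp1 == fp2, fp1 == fp3)  # 300 30 True False
```

Because a rewritten S3 object gets a new ETag, comparing a stored fingerprint with a freshly computed one is a cheap way to tell whether the underlying files changed.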
- 🔧 Enhanced API and Integrations
Polars Integration: Native DataFrame operations with dump_many_to_dataframe() and load_many_from_dataframe()
S3-native Operations: Direct AWS S3 integration with automatic compression
Human-readable Properties: size_for_human across all data classes
Comprehensive Type Safety: 100% type coverage with modern Python type hints
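A size_for_human property can be sketched as a read-only property that formats a byte count. The format_size helper, the SizedRecord class, and the 1024-based units with two decimals are assumptions for illustration; s3manifesto's exact output format may differ.

```python
from dataclasses import dataclass


def format_size(n_bytes: int) -> str:
    """Format a byte count using 1024-based units (B, KB, MB, GB, TB, PB)."""
    if n_bytes < 1024:
        return f"{n_bytes} B"
    size = float(n_bytes)
    for unit in ("KB", "MB", "GB", "TB"):
        size /= 1024
        if size < 1024:
            return f"{size:.2f} {unit}"
    return f"{size / 1024:.2f} PB"


@dataclass(frozen=True)
class SizedRecord:
    """Hypothetical data class exposing a size_for_human property."""

    size: int  # size in bytes

    @property
    def size_for_human(self) -> str:
        return format_size(self.size)


print(SizedRecord(size=1536).size_for_human)  # 1.50 KB
print(SizedRecord(size=5 * 1024**3).size_for_human)  # 5.00 GB
```

A property keeps the raw integer as the single source of truth, so the human-readable string never drifts out of sync with the stored size.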
- 🎯 Divide-and-Conquer Philosophy
Separation of Concerns: Clean separation between metadata collection, partitioning algorithms, and execution
Composable Abstractions: Works as both in-memory calculator and persistent storage
Miscellaneous
- 📚 Documentation and Developer Experience
Comprehensive Documentation: Philosophy guide, performance patterns, and real-world examples
Jupyter-style Tutorials: Progressive learning with 8 hands-on sections
Research Documentation: Algorithm comparisons and performance benchmarks translated to English
🔄 Migration Guide
Users upgrading from 0.x versions should:
Update method calls: group_files_into_tasks_by_* → partition_files_by_*
Handle new return types: methods now return DataFileGroup objects with metadata
Leverage fingerprints: use manifest.fingerprint for change detection workflows
This 1.0.0 release establishes s3manifesto as the foundational tool for big data divide-and-conquer workflows, providing the missing link between file discovery and distributed execution in modern data lake architectures.
0.4.1 (2024-08-10)
Features and Improvements
Add the ManifestFile.details attribute: a dictionary containing additional information about the manifest file.
0.3.1 (2024-08-10)
💥 Breaking Changes
group_files, ManifestFile.group_files_into_tasks_by_size, and ManifestFile.group_files_into_tasks_by_n_record now return List[Tuple[List[T_DATA_FILE], int]] instead of List[List[T_DATA_FILE]].
0.2.1 (2024-08-10)
💥 Breaking Changes
Remove the md5 key from the Data File typed dictionary; add an etag key instead.
Remove ManifestFile.group_files_into_tasks from the public API.
Features and Improvements
- Add the following public APIs:
ManifestFile.group_files_into_tasks_by_size
ManifestFile.group_files_into_tasks_by_n_record
Add a fingerprint attribute to the ManifestFile class: a unique fingerprint for the manifest file, calculated from the URI and ETag of its data files.
Minor Improvements
Bugfixes
Miscellaneous
0.1.1 (2024-08-08)
First release