Adds TransformNode to FuzzyFind Author Names

2025-12-23 17:53:37 +01:00 · 2025-12-23 17:53:37 +01:00 · 72765532d3
commit 72765532d3
parent 64df8fb328
11 changed files with 696 additions and 58 deletions
--- a/transform/README.md
+++ b/transform/README.md
@@ -6,10 +6,15 @@ Data transformation pipeline for the Knack scraper project.

 This folder contains the transformation logic that processes data from the SQLite database. It runs on a scheduled basis (every weekend) via cron.

+The pipeline supports **parallel execution** of independent transform nodes: nodes that do not depend on each other's output can run concurrently on multiple CPU cores, which shortens the overall transformation run.
+
 ## Structure

 - `base.py` - Abstract base class for transform nodes
-- `main.py` - Main entry point and pipeline orchestration
+- `pipeline.py` - Parallel pipeline orchestration system
+- `main.py` - Main entry point and pipeline execution
+- `author_node.py` - NER-based author classification node
+- `example_node.py` - Template for creating new nodes
 - `Dockerfile` - Docker image configuration with cron setup
 - `requirements.txt` - Python dependencies
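The diff does not show `pipeline.py` itself, so the following is only a minimal sketch of how "parallel execution of independent transform nodes" is commonly done, not the project's actual implementation. It assumes a hypothetical node interface (a `name`, a tuple of upstream `dependencies`, and a `run()` method); the helpers `dependency_levels`, `run_pipeline`, and the toy `EchoNode` are likewise illustrative. Nodes are grouped into dependency levels (a level-by-level topological sort), every node in a level runs concurrently, and a level starts only after the previous one has finished.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class TransformNode(ABC):
    """Hypothetical node interface: a name, upstream dependencies, and run()."""

    def __init__(self, name: str, dependencies: tuple[str, ...] = ()):
        self.name = name
        self.dependencies = tuple(dependencies)

    @abstractmethod
    def run(self) -> object:
        """Perform this node's transformation and return its result."""


def dependency_levels(nodes: list[TransformNode]) -> list[list[TransformNode]]:
    """Group nodes so that every node's dependencies sit in earlier levels."""
    by_name = {node.name: node for node in nodes}
    if len(by_name) != len(nodes):
        raise ValueError("duplicate node names")
    for node in nodes:
        for dep in node.dependencies:
            if dep not in by_name:
                raise ValueError(f"{node.name!r} depends on unknown node {dep!r}")

    remaining = {node.name: set(node.dependencies) for node in nodes}
    levels = []
    while remaining:
        # Nodes whose dependencies have all been scheduled are independent.
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        levels.append([by_name[name] for name in ready])
        for name in ready:
            del remaining[name]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels


def run_pipeline(nodes: list[TransformNode], max_workers: int = 4) -> dict[str, object]:
    """Run each level's nodes concurrently; levels run one after another."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in dependency_levels(nodes):
            futures = {node.name: pool.submit(node.run) for node in level}
            for name, future in futures.items():
                results[name] = future.result()  # re-raises a failed node's exception
    return results


class EchoNode(TransformNode):
    """Toy node for illustration: returns its own name."""

    def run(self) -> object:
        return self.name


nodes = [
    EchoNode("clean_text"),
    EchoNode("classify_authors", ("clean_text",)),
    EchoNode("tag_topics", ("clean_text",)),
    EchoNode("summary", ("classify_authors", "tag_topics")),
]
print([[n.name for n in level] for level in dependency_levels(nodes)])
# [['clean_text'], ['classify_authors', 'tag_topics'], ['summary']]
print(run_pipeline(nodes))
```

A thread pool is used here because it needs no pickling; threads only run truly in parallel when a node's work releases the GIL (I/O and many C extensions do). For CPU-bound pure-Python nodes, swapping in `concurrent.futures.ProcessPoolExecutor` is the usual way to use several cores, at the cost of requiring picklable nodes and results.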