Adds TransformNode to FuzzyFind Author Names

This commit is contained in:
quorploop 2025-12-23 17:53:37 +01:00
parent 64df8fb328
commit 72765532d3
11 changed files with 696 additions and 58 deletions

View file

@ -6,10 +6,15 @@ Data transformation pipeline for the Knack scraper project.
This folder contains the transformation logic that processes data from the SQLite database. It runs on a scheduled basis (every weekend) via cron.
The pipeline supports **parallel execution** of independent transform nodes, allowing you to leverage multi-core processors for faster data transformation.
## Structure
- `base.py` - Abstract base class for transform nodes
- `main.py` - Main entry point and pipeline orchestration
- `pipeline.py` - Parallel pipeline orchestration system
- `main.py` - Main entry point and pipeline execution
- `author_node.py` - NER-based author classification node
- `example_node.py` - Template for creating new nodes
- `Dockerfile` - Docker image configuration with cron setup
- `requirements.txt` - Python dependencies