Adds TransformNode to FuzzyFind Author Names
parent 64df8fb328
commit 72765532d3
11 changed files with 696 additions and 58 deletions
@@ -6,10 +6,15 @@ Data transformation pipeline for the Knack scraper project.

This folder contains the transformation logic that processes data from the SQLite database. It runs on a scheduled basis (every weekend) via cron.

The pipeline supports **parallel execution** of independent transform nodes, allowing you to leverage multi-core processors for faster data transformation.
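
As a rough illustration of running independent nodes in parallel, here is a minimal sketch using the standard library's `concurrent.futures`. The function `run_independent_nodes`, the node names, and the sample rows are hypothetical and are not taken from `pipeline.py`. The sketch uses threads for simplicity; CPU-bound nodes would more likely need `ProcessPoolExecutor` to actually use multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent_nodes(nodes, data, max_workers=4):
    """Run nodes that do not depend on each other and collect results by name.

    `nodes` maps a (hypothetical) node name to a callable that takes `data`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every node up front so they can run concurrently.
        futures = {name: pool.submit(fn, data) for name, fn in nodes.items()}
        # .result() blocks until the node finishes and re-raises its exceptions.
        return {name: fut.result() for name, fut in futures.items()}


# Two illustrative, independent transforms over the same input rows.
rows = [{"author": " Jane Doe "}, {"author": "john smith"}]
nodes = {
    "strip": lambda rs: [r["author"].strip() for r in rs],
    "count": lambda rs: len(rs),
}
results = run_independent_nodes(nodes, rows)
```

Because neither node reads the other's output, the order in which they finish does not matter; nodes with dependencies would have to be scheduled in stages instead.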

## Structure

- `base.py` - Abstract base class for transform nodes
- `main.py` - Main entry point and pipeline orchestration
- `pipeline.py` - Parallel pipeline orchestration system
- `main.py` - Main entry point and pipeline execution
- `author_node.py` - NER-based author classification node
- `example_node.py` - Template for creating new nodes
- `Dockerfile` - Docker image configuration with cron setup
- `requirements.txt` - Python dependencies
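
To make the relationship between the base class and a concrete node more tangible, the following is a minimal sketch under stated assumptions. The `run` method, the `FuzzyAuthorNode` class, and the sample names are hypothetical; the actual interface in `base.py` may differ. Fuzzy matching here uses the standard library's `difflib.get_close_matches`, which is not necessarily what the commit uses.

```python
from abc import ABC, abstractmethod
from difflib import get_close_matches


class TransformNode(ABC):
    """Hypothetical base class; the real interface in base.py may differ."""

    @abstractmethod
    def run(self, rows):
        """Transform a list of row dicts and return a new list."""


class FuzzyAuthorNode(TransformNode):
    """Illustrative node that snaps author names to a known list."""

    def __init__(self, known_authors, cutoff=0.8):
        self.known_authors = known_authors
        self.cutoff = cutoff  # minimum similarity ratio for a match

    def run(self, rows):
        out = []
        for row in rows:
            matches = get_close_matches(
                row["author"], self.known_authors, n=1, cutoff=self.cutoff
            )
            # Keep the original name when nothing is similar enough.
            out.append({**row, "author": matches[0] if matches else row["author"]})
        return out


node = FuzzyAuthorNode(["Jane Doe", "John Smith"])
cleaned = node.run([{"author": "Jane Do"}, {"author": "Unknown Person"}])
```

Keeping the contract to a single abstract method means any new node can be dropped into the pipeline as long as it accepts and returns rows in the same shape.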