# Understanding Krawler
krawler is powered by Feathers and relies on two of its main abstractions: services and hooks. We assume you are familiar with this technology.
# Main concepts
krawler manipulates three kinds of entities:
- a store defines where the extracted/processed data will reside,
- a task defines what data to extract and how to query it,
- a job defines which tasks to run to fulfill a request (i.e. sequencing).
On top of this, hooks provide a set of functions that can typically be run before/after a task/job, such as a conversion after a download or task generation before a job run. Together, these building blocks let you create a processing pipeline, as sketched below.
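To make these concepts concrete, here is a minimal sketch of what a job declaration could look like. The property and hook names used here (e.g. `createStores`, `readJson`, `writeJson`) are illustrative assumptions, not a definitive jobfile; refer to the krawler documentation for the actual options.

```js
// jobfile.js - a hypothetical job declaration (property and hook names are assumptions)
module.exports = {
  id: 'stations',                      // the job identifier
  store: 'job-store',                  // where the produced data will reside
  tasks: [{                            // what data to extract and how to query it
    id: 'stations.json',
    type: 'http',                      // download over HTTP
    options: { url: 'https://example.com/api/stations' }
  }],
  hooks: {                             // the processing pipeline around tasks/jobs
    tasks: {
      after: {                         // run after each task, e.g. a conversion after a download
        readJson: {},
        writeJson: {}
      }
    },
    jobs: {
      before: {                        // run before the job, e.g. store or task generation
        createStores: [{ id: 'job-store', type: 'fs', options: { path: './output' } }]
      },
      after: {
        removeStores: ['job-store']    // clean up once the job has completed
      }
    }
  }
}
```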
Regarding store management, we rely on abstract-blob-store, which abstracts a number of different storage backends (local file system, AWS S3, Google Drive, etc.) and is already used by feathers-blob.
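As an illustration of what this abstraction buys us, the following sketch writes a blob through the abstract-blob-store interface using a local file system backend (the fs-blob-store module); swapping in another compliant backend (e.g. an S3-backed store) would leave the rest of the code unchanged. The file name and content are made up for the example.

```js
// A minimal sketch of the abstract-blob-store interface with a local file system backend.
// The same createWriteStream/createReadStream calls work with any compliant backend.
const createStore = require('fs-blob-store')

const store = createStore('./output')

// Write a blob identified by its key
const writer = store.createWriteStream({ key: 'stations.json' }, (error, metadata) => {
  if (error) throw error
  console.log('Blob written:', metadata.key)
})
writer.write(JSON.stringify({ stations: [] }))
writer.end()
```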
# Global overview
The following figure depicts the global architecture and all concepts at play:
# What is inside?
krawler is made possible and mainly powered by the following stack:
- Feathers, the core framework
- Lodash, a JavaScript utility library
- node-gdal, the Node.js binding of GDAL/OGR, used to process rasters and vectors
- js-yaml, used to process YAML files
- xml2js, used to process XML files
- json2csv, used to process CSV files
- fast-csv, used to stream CSV files
- abstract-blob-store, used to abstract storage
- request, used to manage HTTP requests
- node-postgres, used to manage PostgreSQL databases
- node-mongodb, used to manage MongoDB databases