Hooks
TIP
Although not all of them are applicable, common FeathersJS hooks are exposed in addition to krawler hooks and can be used in jobs, e.g. you can add disallow: 'external' to avoid exposing some services when deploying as a web app.
Common options
All hooks can have the following options:
- match: a match filter to be applied on the input hook data using any option supported by sift, fields can be templates, learn more about templating; if the data is filtered out the hook will not be applied
- predicate: an additional predicate function taking the hook item as input and returning true if matching occurs (can be async)
- faultTolerant: will catch any error raised in the hook so that the hook chain will continue anyway
TIP
Because templating is restricted to string output, any ISO date string or comparison operator value in the match filter will be converted back to native types so that matching works as expected in JS.
Matching is for instance useful when you'd like to apply a hook to only a subset of your tasks, e.g. all the CSV files but not the JSON files.
Fault tolerance is for instance useful when you use unreliable data sources and you don't want the job to stop when some requests fail.
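As an illustration, the following sketch (the hook name and the id pattern are purely illustrative) applies a writeJson hook only to items whose id ends with .csv and keeps the hook chain running if it fails:
writeJson: {
  // Only apply the hook to items whose id ends with .csv
  match: { id: { $regex: '\\.csv$' } },
  // Optional additional predicate, can be async
  predicate: (item) => item.type !== 'json',
  // Do not stop the hook chain if this hook raises an error
  faultTolerant: true
}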
All input/output hooks and store hooks manipulating items, i.e. reading/writing/transforming/removing data in a store like readJson, writeJson or gzipToStore, can have the following options:
- storePath: property path where to read the store to be used on the hook object or params, defaults to data.store
- store: property containing the ID of the store to be used, not defined by default
- key: input/output key for the file in store, can be a template with item as context, learn more about templating
- storageOptions: write options for the underlying store
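The following sketch illustrates these store options on a write hook; the store ID, templated key and storage option are illustrative and depend on your underlying store:
writeJson: {
  // ID of the store to write to
  store: 'output-store',
  // Output key templated with the item as context
  key: '<%= id %>.json',
  // Options passed to the underlying store when writing
  storageOptions: { ContentType: 'application/json' }
}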
Authentication
basicAuth(options)
Add a header to HTTP requests for basic authorization, hook options are the following:
- type: type of authorization used as the key in the header, defaults to Authorization but could be changed to Proxy-Authorization for instance
- optionsPath: the property path to the request options that contains the authorization options, defaults to options
The authorization options have to be structured like this, e.g. on a task (or similarly on a task template in a job):
httpTask: {
  type: 'http',
  options: {
    // Target request URL
    url: 'xxx',
    auth: {
      // Your user identity
      user: 'yyy',
      password: 'zzz'
    }
  }
}
You can also send authentication information as form data like this:
auth: {
  // The login URL receiving form data
  url: 'xxx',
  // Your user identity to be sent as form data
  form: {
    user: 'yyy',
    password: 'zzz'
  },
  // Set this to enable cookies
  jar: true
}
OAuth(options)
Add a header with a token retrieved from an OAuth authorization server to HTTP requests, hook options are the following:
- type: type of authorization used as the key in the header, defaults to Authorization but could be changed to Proxy-Authorization for instance
- optionsPath: the property path to the request options that contains the authorization options, defaults to options
The authorization options have to be structured like this, e.g. on a task (or similarly on a task template in a job):
httpTask: {
  type: 'http',
  options: {
    // Target request URL
    url: 'www',
    oauth: {
      // Token endpoint
      url: 'xxx',
      // Your client identity
      client_id: 'yyy',
      client_secret: 'zzz',
      // Client authentication method to be used to get access token
      method: 'client_secret_post' // Or 'client_secret_basic'
    }
  }
}
Clearing
clearOutputs(options)
Clear output files generated by tasks and hooks, hook options are the following:
- storePath: see description in common options
- store: see description in common options
- type: the type of output to be cleared by this hook, defaults to intermediate
clearData(options)
Clear output data generated by hooks, hook options are the following:
- dataPath: property path to clean on the hook object, defaults to result.data
TIP
Use this hook if you load large datasets (e.g. JSON files) because all hook data are still referenced in memory until the job is finished
CSV
readCSV(options)
Read a CSV from an input stream/store and convert it to in-memory JSON values, hook options are the following:
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- any option supported by Papaparse parse config object
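The following sketch illustrates a CSV read hook; the key template and the Papaparse options shown (header, dynamicTyping) are illustrative:
readCSV: {
  key: '<%= id %>.csv',
  // Papaparse options: produce one object per row and convert numeric strings
  header: true,
  dynamicTyping: true,
  dataPath: 'result.data'
}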
writeCSV(options)
Generate a CSV file from in-memory JSON values, hook options are the following:
- dataPath: property path where to read the input JSON object on the hook object, defaults to result
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- outputType: the type of output produced by this hook, defaults to intermediate
- any option supported by Papaparse unparse config object
mergeCSV(options)
Generate a CSV file from a set of input CSV files, hook options are the following:
- storePath: see description in common options
- store: see description in common options
- mergeKey: input key for the CSV files to be merged in store, must be a template with item as context, learn more about templating
- key: see description in common options
- outputType: the type of output produced by this hook, defaults to intermediate
- parse: any option supported by Papaparse parse config object
- unparse: any option supported by Papaparse unparse config object
The input hook result is expected to be an array of tasks whose output will be read back from the store.
Docker
The Docker hooks allow you to interact with a Docker daemon. They are based on dockerode, a Node.js module for the Docker Remote API.
connectDocker(options)
Connect to the Docker daemon. The connection options of the client are defined in the hook options plus:
- clientPath: property path where to store the client object to be used by the Docker hooks, defaults to client
disconnectDocker(options)
Disconnect from the Docker daemon. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
pullDockerImage(options)
Pull a docker image. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- any options supported by dockerode for image pulling
TIP
The options can contain an auth object to pull the image from a private repository.
createDockerContainer(options)
Run a docker container. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- any options supported by dockerode for container creation
TIP
Cmd and Env options can be templates, learn more about templating.
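As an illustration, a container creation hook could look like the following sketch; the image, command and bind mount are placeholders:
createDockerContainer: {
  Image: 'v4tech/imagemagick',
  // Cmd entries can be templates using the item as context
  Cmd: ['convert', '/tmp/<%= id %>.png', '/tmp/<%= id %>.jpg'],
  HostConfig: { Binds: ['/tmp:/tmp'] },
  clientPath: 'taskTemplate.client'
}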
createDockerService(options)
Create a docker service on a Swarm cluster. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- any options supported by dockerode for service creation
TIP
Options can be templates, learn more about templating
runDockerContainerCommand(options)
Run a command against a docker container. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- command: the name of the command to be run
- arguments: the arguments of the command to be run
- any command/option supported by dockerode on containers
When the getArchive command is used, additional hook options are the following:
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- outputType: the type of output produced by this hook, defaults to intermediate
TIP
Cmd, Env and path options can be templates, learn more about templating.
TIP
The hook takes care of waiting for exec to finish and automatically writes the tar archive in the hook store when getArchive is used.
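The following sketch illustrates running a command against a previously created container; the exec arguments simply mirror the dockerode container API and the file names are placeholders:
runDockerContainerCommand: {
  command: 'exec',
  // Arguments passed to the dockerode exec call, Cmd entries can be templated
  arguments: { Cmd: ['rm', '-f', '/tmp/<%= id %>.png'] },
  clientPath: 'taskTemplate.container'
}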
FTP
TIP
FTP hooks rely on lftp. Consequently, you need to have the lftp executable installed on your computer.
listFTP(options)
List the files from a remote directory on the FTP server, hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- remoteDir: the remote directory to list
- key: see description in common options
globFTP(options)
List the files from a remote directory with names matching a pattern, hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- remoteDir: the remote directory to list
- key: see description in common options
- pattern: the pattern used to match the file names, defaults to *
getFTP(options)
Get a remote file from the FTP server, hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- remoteFile: the file to be copied on the FTP server
- localFile: the destination file on the local host
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
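The following sketch illustrates a download hook; the remote and local paths are placeholders:
getFTP: {
  // Remote file path can be templated with the item as context
  remoteFile: '/pub/data/<%= id %>.csv',
  localFile: '/tmp/<%= id %>.csv',
  clientPath: 'taskTemplate.client'
}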
putFTP(options)
Put a local file on the FTP server, hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
Geographic grid
generateGrid(options)
Generate geographic grid parameters from an input location (grid center), area width and resolution. Required input hook data are the following:
- longitude: grid center longitude (in degrees)
- latitude: grid center latitude (in degrees)
- resolution: grid cell resolution (in meters)
- halfWidth: grid half-width (in meters)
- blockResolution: grid block resolution (in meters)
Output hook data are the following:
- origin: grid bounding box minimum longitude and latitude (in degrees)
- size: number of grid cells in longitude and latitude
- resolution: grid cell resolution (in degrees) for longitude and latitude
- nbBlocks: number of grid blocks in longitude and latitude
- blockSize: number of grid cells in longitude and latitude within each block
- blockResolution: grid block resolution (in degrees) for longitude and latitude
WARNING
This hook works only for EPSG 4326
generateGridTasks(options)
Generate WMS/WCS request tasks to download data for each cell of a geographic grid (see the previous hook for grid definition). It is intended to be used as a job hook and the required input data on the task template are the following:
- type: task type (either wms or wcs)
- options.version: WMS/WCS service version
- options.longitudeLabel: name of the longitude axis in WCS service
- options.latitudeLabel: name of the latitude axis in WCS service
WARNING
This hook works only for EPSG 4326
resampleGrid(options)
A lot of geographical data (e.g. weather data) are distributed as gridded data, which is two-dimensional data representing an element value along an evenly spaced matrix of geographical positions. Usually, the grid has a longitude (x-axis or width) and a latitude (y-axis or height) dimension and is computed using the Equirectangular projection with a constant spacing called the resolution of the grid. The gridded data is assumed to be internally stored as a Javascript array (1D).
You can use this hook to compute element value at any location from input gridded data (a process called interpolation) with the following options:
- input: input grid specification
  - bounds: the geographical bounds covered by the input grid as an array of decimal values [min longitude, min latitude, max longitude, max latitude]
  - origin: the geographical origin of the input data grid as an array of decimal values [longitude origin, latitude origin]
  - size: the size of the input data grid as an array of integer values [width, height]
  - resolution: the geographical resolution of the input data grid as an array of decimal values [longitude resolution, latitude resolution]
- output: output/resampled grid specification
  - origin: the geographical origin of the data grid as an array of decimal values [longitude origin, latitude origin]
  - size: the size of the data grid as an array of integer values [width, height]
  - resolution: the geographical resolution of the data grid as an array of decimal values [longitude resolution, latitude resolution]
WARNING
The element values are assumed to be the ones measured at the grid vertices.
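The following sketch illustrates the hook options; all numeric values are purely illustrative:
resampleGrid: {
  input: {
    origin: [-10, 40],        // Longitude/latitude origin of the input grid
    size: [100, 100],         // Input grid width/height in cells
    resolution: [0.5, 0.5]    // Input cell resolution in degrees
  },
  output: {
    origin: [-10, 40],
    size: [50, 50],
    resolution: [1, 1]        // Resample to a coarser grid
  }
}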
IMAP
TIP
IMAP hooks rely on ImapFlow project.
connectIMAP
Connect to an IMAP server. The connection options of the client are defined in the hook options plus:
- clientPath: property path where to store the client object created when getting connected, defaults to client
disconnectIMAP
Disconnect from an IMAP server. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
listIMAPMailboxes
List the available mailboxes from an IMAP server. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- any option supported by ImapFlow list function
fetchIMAPMessages
Fetch messages from an IMAP server. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- mailbox: the mailbox where to fetch the messages
- range: the ImapFlow range parameter
- query: the ImapFlow query parameter
- any option supported by ImapFlow fetch function
The following example fetches unseen messages:
fetchIMAPMessages: {
  mailbox: 'INBOX',
  range: { seen: false },
  query: { uid: true },
  clientPath: 'taskTemplate.imapClient',
  dataPath: 'data.messages'
}
downloadIMAPAttachments
Download attachments from messages. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- mailbox: the mailbox where to download the attachments
- range: the ImapFlow range parameter
- type: the attachment content type.
- any option supported by ImapFlow download function
flagIMAPMessages
Add flags to messages. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- mailbox: the mailbox where to flag the messages
- range: the ImapFlow range parameter
- flags: the array of flags to be added
- any option supported by ImapFlow messageFlagsAdd function
unflagIMAPMessages
Remove flags from messages. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- mailbox: the mailbox where to unflag the messages
- range: the ImapFlow range parameter
- flags: the array of flags to be removed
- any option supported by ImapFlow messageFlagsRemove function
deleteIMAPMessages
Delete messages from an IMAP server. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
- mailbox: the mailbox where to delete the messages
- range: the ImapFlow range parameter
- any option supported by ImapFlow messageDelete function
JSON
readJson(options)
Read a JSON from an input stream/store and convert it to in-memory JSON values, hook options are the following:
- objectPath: property path where to read the JSON object in the JSON coming from the store, not defined by default so that the whole JSON is retrieved
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- transform: perform transformation using these options after read, see description in transformJson
- features: this boolean indicates if only the features are extracted when reading a GeoJson collection, defaults to false
writeJson(options)
Generate a JSON file from in-memory JSON values, hook options are the following:
- dataPath: property path where to read the input JSON object on the hook object, defaults to result
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- outputType: the type of output produced by this hook, defaults to intermediate
- transform: perform transformation using these options before write, see description in transformJson
transformJson(options)
Restructure in-memory JSON values, hook options are the following:
- dataPath: property path where to read the input JSON object on the hook object, defaults to result.data
- transformPath: property path where to read/write the JSON part to be transformed in the input JSON object
- inputPath: property path where to read the JSON part to be transformed in the input JSON object
- outputPath: property path where to write the JSON part to be transformed in the input JSON object
- toArray: boolean indicating if the JSON object will be transformed into an array using Lodash, defaults to false
- toObjects: if your input JSON objects are flat arrays it will be transformed into objects according to the given indexed list of property names to be used as keys, not defined by default
- filter: a filter to be applied on the JSON object using any option supported by sift
- mapping: a map between input key path and output key path supporting dot notation, the values of the map can also be a structure like this:
  - path: output key path
  - values: a map between input values and output values
  - delete: boolean indicating if the input key path should be deleted or not after mapping
- unitMapping: a map between input key path supporting dot notation and from/to units to convert using math.js for numbers or moment.js for dates, a value of the map is a structure like this:
  - from: the unit or date format to convert from, e.g. feet or YYYY-MM-DD HH:mm:ss.SSS
  - to: the unit or date format to convert to, e.g. m or MM-DD-YYYY HH:mm:ss.SSS, if given for a date the date object will be converted back to string
  - asDate: mandatory to indicate if the value is a date, could be utc or local to interpret it as UTC or Local Time
  - asString: mandatory to convert numbers to strings, indicates the radix to be used if any
  - asNumber: mandatory to convert strings to numbers
  - asCase: target case to be used as the name of a Lodash (e.g. lowerCase) or JS string (e.g. toUpperCase) case conversion function
  - empty: value to be set if the input value is empty
- pick: an array of properties to be picked using Lodash
- omit: an array of properties to be omitted using Lodash
- merge: an object to be merged with each JSON object using Lodash
- asObject: this boolean indicates if the output should be transformed into an object if the array contains a single object, defaults to false
- asArray: this boolean indicates if the output should be transformed into an array containing the object, defaults to false
- inPlace: this boolean indicates if the input data is transformed in place or simply before writing it when part of a write hook, defaults to true
Example:
toArray: true, // The following input object { 1: { property: 'a' }, 2: { property: 'b' } } will be transformed into [{ property: 'a' }, { property: 'b' }]
toObjects: ['1', '2'], // The following input object ['a', 'b'] will be transformed into { 1: 'a', 2: 'b' }
mapping: {
  sourceProperty: 'targetProperty',
  sourceProperty: {
    path: 'targetProperty',
    values: {
      'a': 'c' // Will map { sourceProperty: 'a' } to { targetProperty: 'c' }
    }
  },
  'source.property': 'target.property',
  sourceProperty: 'targetArrayProperty[0]'
},
unitMapping: {
  property: { from: 'feet', to: 'm' } // This one will be converted from feet to meters
},
pick: ['onlyThisPropertyWillBeKept'],
omit: ['onlyThisPropertyWillBeRemoved'],
merge: { newProperty: 'will be added to the final objects' }
TIP
The transformations are applied in the order of the documentation, e.g. filtering occurs before mapping.
mergeJson(options)
Generate an in-memory JSON object from a set of input in-memory JSON objects, hook options are the following:
- mergeBy: property name to be used as a unique identifier to perform merging using Lodash, it can also be a function returning a unique identifier
- deep: this boolean indicates if properties from multiple objects with the same identifier are merged, otherwise only the first object matching the merge condition will be kept, defaults to false
- sortBy: property name to be used as value for sorting items prior merging using Lodash, it can also be a function returning a unique identifier
- transform: perform transformation using these options before deep merging objects with the same identifier, see description in transformJson
- dataPath: property path where to read the input JSON object on the result hook objects, defaults to data
The input hook result is expected to be an array of tasks whose output will be read in-memory.
writeTemplate(options)
Generate a file from an input template and injected in-memory JSON values, hook options are the following:
- dataPath: property path where to read the input JSON object on the hook object, defaults to result.data
- storePath: see description in common options
- store: see description in common options
- templateStorePath: property path where to read the store to be used for reading template on the hook object or params, defaults to data.templateStore
- templateStore: property containing the ID of the store to be used for reading template, not defined by default
- templateFile: file name of the template file to be used
- outputType: the type of output produced by this hook, defaults to intermediate
TIP
Learn more about templating
GeoJSON
readSequentialGeoJson(options)
Read a JSON from an input stream/store and convert it to in-memory JSON values, hook options are the following:
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- transform: perform transformation using these options after read, see description in transformJson
- asFeatureCollection: this boolean indicates if the resulting JSON object is converted to a GeoJson collection, otherwise it will be an array of GeoJSON features, defaults to false
convertToGeoJson(options)
Convert in-memory JSON values to a GeoJSON collection. For each in-memory object, the hook generates a corresponding GeoJSON feature using specific properties to build the geometry property. For now, it only generates features of type Point. Moreover, the coordinate reference system is a geographic coordinate reference system, using the World Geodetic System 1984 (WGS 84), with longitude and latitude expressed in decimal degrees. The entire object is stored under the properties property of the feature. Hook options are the following:
- longitude: property path where to read the longitude value, defaults to longitude
- latitude: property path where to read the latitude value, defaults to latitude
- altitude: property path where to read the altitude value, defaults to altitude
- keepGeometryProperties: boolean indicating if longitude, latitude and altitude values are also kept as properties, defaults to true
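The following sketch assumes the source objects hold their coordinates under lon/lat properties (these property paths are illustrative):
convertToGeoJson: {
  longitude: 'lon',
  latitude: 'lat',
  // Drop lon/lat from the feature properties once the geometry is built
  keepGeometryProperties: false
}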
convertOSMToGeoJson(options)
Convert in-memory OSM JSON values to a GeoJSON collection. It relies on osmtogeojson. Hook options are the following:
- any option supported by the osmtogeojson API
- dataPath: property path where to read the OSM object, defaults to result.data
reprojectGeoJson(options)
Reproject a GeoJSON from a given projection system to another one, hook options are the following:
- from: EPSG code of the input projection, defaults to EPSG:4326
- to: EPSG code of the output projection, defaults to EPSG:4326
- dataPath: property path where to store the resulting GeoJSON object on the hook object, defaults to result.data
KML
readKML(options)
Read a KML from an input stream/store and convert it to in-memory JSON values, hook options are the following:
- objectPath: property path where to read the KML object in the KML coming from the store, not defined by default so that the whole KML is retrieved
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- transform: perform transformation using these options after read, see description in transformJson
- features: this boolean indicates if only the features are extracted when reading a GeoJson collection, defaults to false
MongoDB
connectMongo(options)
Connect to a MongoDB database. The connection options of the client are defined in the hook options plus:
- url: MongoDB URI connection string
- dbName: the name of the DB to connect to
- clientPath: property path where to store the client object to be used by the MongoDB hooks, defaults to client
TIP
Since Krawler relies on version 3.1.13 of the MongoDB driver, it automatically adds the option useNewUrlParser: true when connecting to the database.
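The following sketch illustrates a connection; the URI, database name and client path are placeholders:
connectMongo: {
  url: 'mongodb://127.0.0.1:27017',
  dbName: 'krawler-test',
  // Store the client on the task template so that task hooks can reuse it
  clientPath: 'taskTemplate.client'
}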
disconnectMongo(options)
Disconnect from a MongoDB database. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
dropMongoCollection(options)
Drop if exists a collection in a MongoDB database. Hook options are the following:
- collection: the name of the collection to be removed, defaults to the hook object ID
- clientPath: property path where to retrieve the client object, defaults to client
createMongoCollection(options)
Create a collection in a MongoDB database. Hook options are the following:
- collection: the name of the collection to be created, defaults to the hook object ID
- index/indices: the specification of the index associated to the collection, use an array as indices if multiple indices are provided
- clientPath: property path where to retrieve the client object, defaults to client
dropMongoIndex(options)
Drop if exists a collection index in a MongoDB database. Hook options are the following:
- collection: the name of the collection holding the index to be removed, defaults to the hook object ID
- index: the specification of the index associated to the collection
- clientPath: property path where to retrieve the client object, defaults to client
createMongoIndex(options)
Create a collection index in a MongoDB database. Hook options are the following:
- collection: the name of the collection holding the index to be created, defaults to the hook object ID
- index: the specification of the index associated to the collection
- clientPath: property path where to retrieve the client object, defaults to client
readMongoCollection(options)
Read JSON documents from an existing collection. Hook options are the following:
- collection: the name of the collection to be read, defaults to the hook object ID
- dataPath: property path where to write the output JSON objects on the hook object, defaults to data.result
- clientPath: property path where to retrieve the client object, defaults to client
- transform: perform transformation using these options after read, see description in transformJson
- query: find query to be performed, fields can be templates, learn more about templating
- excludedProperties: array of property names to be excluded from automated type conversion after templating, useful if you have a number-like string (eg '81') that you don't want to convert automatically into a number
- project: project options for cursor
- sort: sort options for cursor
- skip: skip options for cursor
- limit: limit options for cursor
WARNING
Because templating is restricted to string output, any ISO date string or comparison operator value in the query object will be automatically converted back to native types so that matching works as expected in JS.
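The following sketch illustrates a templated read with readMongoCollection; the collection name, query fields and paths are illustrative:
readMongoCollection: {
  collection: 'observations',
  // Query fields can be templates using the item as context
  query: { station: '<%= id %>' },
  sort: { time: -1 },
  limit: 100,
  clientPath: 'taskTemplate.client',
  dataPath: 'data.observations'
}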
writeMongoCollection(options)
Inserts JSON into an existing collection (uses insertOne operations under-the-hood). Hook options are the following:
- collection: the name of the collection to be written, defaults to the hook object ID
- dataPath: property path where to read the input JSON object on the hook object, defaults to data.result
- chunkSize: number of GeoJson features for the batch insert
- clientPath: property path where to retrieve the client object, defaults to client
- transform: perform transformation using these options before write, see description in transformJson
- any option supported by the options argument of the bulkWrite function
TIP
If the input data is a GeoJSON collection the array of features will be pushed into the collection, not the root object; this is to conform with MongoDB geospatial capabilities, which cannot handle recursive collections.
updateMongoCollection(options)
Updates JSON into an existing collection (uses updateOne operations under-the-hood). Hook options are the following:
- collection: the name of the collection to be written, defaults to the hook object ID
- dataPath: property path where to read the input JSON object on the hook object, defaults to data.result
- chunkSize: number of GeoJson features for the batch update
- clientPath: property path where to retrieve the client object, defaults to client
- transform: perform transformation using these options before update, see description in transformJson
- filter/upsert/hint: corresponding option for the updateOne operation, filter fields can be templates, learn more about templating
- excludedProperties: array of property names to be excluded from automated type conversion after templating, useful if you have a number-like string (e.g. '81') that you don't want to convert automatically into a number
- any option supported by the options argument of the bulkWrite function
TIP
If the input data is a GeoJSON collection the array of features will be updated into the collection, not the root object; this is to conform with MongoDB geospatial capabilities, which cannot handle recursive collections.
deleteMongoCollection(options)
Removes documents from an existing collection (uses deleteMany operations under-the-hood). Hook options are the following:
- collection: the name of the collection to remove documents from, defaults to the hook object ID
- filter: deletion criteria for the deleteMany operation, fields can be templates, learn more about templating
- excludedProperties: array of property names to be excluded from automated type conversion after templating, useful if you have a number-like string (e.g. '81') that you don't want to convert automatically into a number
createMongoAggregation(options)
Creates an aggregation pipeline on an existing collection. Hook options are the following:
- collection: the name of the collection to be used, defaults to the hook object ID
- dataPath: property path where to store the result of the aggregation, defaults to data.result
- clientPath: property path where to retrieve the client object, defaults to client
- transform: perform transformation using these options before write, see description in transformJson
- pipeline: the aggregation pipeline to be executed
- any option supported by the options argument of the aggregate function
TIP
If the input data is a GeoJSON collection the array of features will be pushed into the collection, not the root object; this is to conform with MongoDB geospatial capabilities, which cannot handle recursive collections.
dropMongoBucket(options)
Drop if exists a bucket in a MongoDB database. Hook options are the following:
- bucket: the name of the bucket to be removed, defaults to the hook object ID
- clientPath: property path where to retrieve the client object, defaults to client
createMongoBucket(options)
Create a bucket in a MongoDB database. Hook options are the following:
- bucket: the name of the bucket to be created, defaults to the hook object ID
- clientPath: property path where to retrieve the client object, defaults to client
readMongoBucket(options)
Read file from an existing bucket. Hook options are the following:
- bucket: the name of the bucket to be read, defaults to the hook object ID
- storePath: see description in common options, specify store to write file to
- store: see description in common options, specify store to write file to
- key: see description in common options, defaults to the hook object ID
writeMongoBucket(options)
Insert file into an existing bucket. Hook options are the following:
- bucket: the name of the bucket to be written, defaults to the hook object ID
- storePath: see description in common options, specify store to read file from
- store: see description in common options, specify store to read file from
- key: see description in common options, defaults to the hook object ID
Feathers
connectFeathers(options)
Connect to a Feathers API. The connection options of the client are defined in the hook options plus:
- distributed: boolean indicating if the target service is retrieved using distribution (you will need to set the distribution job options and the CLI api option), in this case you don't need the other properties
- origin: Feathers connection URL
- path: the Feathers API path prefix if any
- authentication: the Feathers API authentication options if any (including service path)
- clientPath: property path where to store the client object to be used by the Feathers hooks, defaults to client
TIP
Krawler relies on version 5 of the Feathers client.
disconnectFeathers(options)
Disconnect from a Feathers API. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
callFeathersServiceMethod(options)
Performs a service operation using the API. Hook options are the following:
- service: the name of the service to be used, defaults to the hook object ID
- method: the name of the method to be called, defaults to find
- id: the ID of the item to read/write, defaults to item ID
- data: the data payload of the operation, if not given will be hook item data
- dataPath: property path where to read/write the input/output JSON objects on the hook object, defaults to
data.result
- chunkSize: number of items for a multi operation
- clientPath: property path where to retrieve the client object, defaults to client
- transform: perform transformation using these options after/before read/write, see description in transformJson
- query: operation query to be performed (use only if not giving the whole params object), fields can be templates, learn more about templating
- excludedProperties: array of property names to be excluded from automated type conversion after templating, useful if you have a number-like string (eg '81') that you don't want to convert automatically into a number
- params: operation params to be used, fields can be templates, learn more about templating
- updateResult: if true, service operation results will not replace item data (default for read operations)
WARNING
Because templating is restricted to string output, any ISO date string or comparison operator value in the query object will be automatically converted back to native types so that matching works as expected in JS.
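The following sketch illustrates pushing items to a remote service in batches with callFeathersServiceMethod; the service name, paths and chunk size are illustrative:
callFeathersServiceMethod: {
  service: 'observations',
  method: 'create',
  clientPath: 'taskTemplate.client',
  dataPath: 'data.result',
  // Split multi create operations into batches
  chunkSize: 500
}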
Numerical Weather Prediction
Numerical Weather Prediction (NWP) data are now available from the major meteorological agencies and institutions on a day-to-day basis. These hooks aim at gathering weather forecast data generated by forecast models easily.
Each forecast model outputs hundreds of forecast elements (a.k.a. meteorological elements) such as temperature, wind direction, etc. The production of a set of forecast data is called a run of the model and occurs on a regular daily basis, e.g. every 6 hours. The spatial properties of a model are completely defined by a longitude/latitude grid and a set of altitude levels (meter or pressure scale). The temporal properties are defined by interval values describing at which frequency/time the forecast data are produced (a.k.a. run interval) and which time steps are available (a.k.a. forecast interval).
generateNwpTasks(options)
Generate tasks to download data for each variable. It is intended to be used as a job hook and the required hook options (which can be overridden by input data) are the following:
- elements: the array of meteorological elements to be retrieved
- runInterval: the run interval in seconds
- runIndex: the index of the run to be retrieved, 0 means nearest from current time, -1 the previous one, etc.
- interval: the forecast interval in seconds
- lowerLimit: the lowest offset in seconds from which forecast data are retrieved (e.g. 3600 means we start gathering at T0 + 1h)
- upperLimit: the highest offset in seconds at which forecast data are not retrieved (e.g. 10800 means we stop gathering at T0 + 3h)
A task will be generated for each element, level and gathered forecast time with the following properties: level, runTime, forecastTime, timeOffset.
TIP
This hook is intended to work with task templating to generate the actual download tasks (e.g. HTTP or WCS request)
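The following sketch illustrates the hook options; all values are illustrative and the structure of the elements entries is an assumption to be adapted to your model and task template:
generateNwpTasks: {
  runInterval: 6 * 3600,    // A new run every 6 hours
  runIndex: 0,              // Use the most recent run
  interval: 3 * 3600,       // Forecast data produced every 3 hours
  lowerLimit: 0,            // Start gathering at T0
  upperLimit: 24 * 3600,    // Stop gathering at T0 + 24h
  elements: [{ element: 'temperature' }] // Assumed element descriptor, adapt to your model
}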
OGC
getCapabilities(options)
Execute a GetCapabilities request to get the general information about an OGC service such as WMS, WCS, WPS... Hook options are the following:
- url: the base URL of the request to be executed
- service: the service to request
- token: an access token if required by the server
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
The following example illustrates how to use this hook:
getCapabilities: {
  url: 'http://geoserver.kalisio.xyz/geoserver/Kalisio/wms',
  service: 'WMS'
}
PostgreSQL
connectPG(options)
Connect to a PostgreSQL database. The connection options of the client are defined in the hook options plus:
- clientPath: property path where to store the client object to be used by the PostgreSQL hooks, defaults to client
Also, this hook allows you to use the same environment variables as node-postgres to store the connection information:
PGUSER=dbuser
PGPASSWORD=secretpassword
PGHOST=database.server.com
PGPORT=5432
PGDATABASE=database
Finally, for security reasons, it is highly recommended to combine both approaches, as in the following example:
connectPG: {
  user: process.env.PG_USER,
  password: process.env.PG_PASSWORD,
  host: 'localhost',
  database: 'test',
  port: 5432,
  clientPath: 'taskTemplate.client'
}
disconnectPG(options)
Disconnect from a PostgreSQL database. Hook options are the following:
- clientPath: property path where to retrieve the client object, defaults to client
dropPGTable(options)
Drop if exists a table in a PostgreSQL database. Hook options are the following:
- table: the name of the table to be removed, defaults to the hook object ID
- clientPath: property path where to retrieve the client object, defaults to client
createPGTable(options)
Create a table in a PostgreSQL database with the following structure:
- id: a SERIAL (primary key)
- geom: a PostGIS geometry of type POINTZ expressed in a geodetic reference system
- properties: an object of type JSON
For now the structure has been defined to store GeoJSON collections. Hook options are the following:
- table: the name of the table to be created, defaults to the hook object ID
- clientPath: property path where to retrieve the client object, defaults to client
writePGTable(options)
Inserts a GeoJSON collection or an array of features into an existing table. The table must have the same structure as a table created using the createPGTable hook. Hook options are the following:
- dataPath: property path where to read the input JSON object on the hook object, defaults to data.result
- chunkSize: number of GeoJson features for the batch insert
- clientPath: property path where to retrieve the client object, defaults to client
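The following sketch illustrates a batch insert into a table created beforehand with createPGTable; the chunk size and paths are illustrative:
writePGTable: {
  // Insert features in batches of 50
  chunkSize: 50,
  clientPath: 'taskTemplate.client',
  dataPath: 'result.data'
}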
Raster
readGeoTiff(options)
Read a GeoTiff from an input stream/store and convert it to in-memory JSON values, hook options are the following:
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
- fields: set of fields to be exported for each cell, if empty values will be directly exported as a JSON array, otherwise fields among the following can be selected
- x: pixel x-coordinate
- y: pixel y-coordinate
- bbox: pixel bounding box
- value: pixel value
computeStatistics(options)
Computes minimum and maximum values on a GeoTiff file, hook options are the following:
- min: boolean indicating if minimum value should be computed
- max: boolean indicating if maximum value should be computed
- statisticsPath: property path where to write the output statistics on the hook object, defaults to result
Store
createStores(options)
Create (a set of) store(s), hook options are (an array of) the following:
- any option supported by the stores service
- storePath: property path where to set the created store on the hook object, if not given the store will be created through the service but not stored on the hook
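The following sketch creates a memory store and a file-system store, following the patterns used in krawler jobfiles; the path and store IDs are illustrative:
createStores: [{
  id: 'memory'
}, {
  id: 'fs',
  options: { path: '/tmp/krawler' },
  // Make the created store available on the task template for subsequent hooks
  storePath: 'taskTemplate.store'
}]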
removeStores(options)
Remove (a set of) store(s), hook options are (array of) the following:
- id: the store ID
- storePath: property path where to unset the removed store on the hook object, if not given the store will be removed through service but not on the hook
TIP
As a shortcut, the provided options can simply be store IDs when storePath is not used.
discardIfExistsInStore(options)
Discard the task if a target file already exists in an output store, hook options are the following:
- output: the output store options, see description in common options
copyToStore(options)
Copy the item(s) from an input store to an output store, hook options are the following:
- input: the input store options, see description in common options
- output: the output store options, see description in common options
gzipToStore(options)
Gzip the item(s) from an input store to an output store, hook options are the following:
- input: the input store options, see description in common options
- output: the output store options, see description in common options
gunzipFromStore(options)
Gunzip the item(s) from an input store to an output store, hook options are the following:
- input: the input store options, see description in common options
- output: the output store options, see description in common options
unzipFromStore(options)
Unzip the item(s) from an input store to an output store, hook options are the following:
- input: the input store options, see description in common options
- output: the output store options, see description in common options
- path: the output path in output store
System
tar(options)
Tar files or directories using node-tar, hook options are the following:
- files: array of paths to add to the tarball
- any option supported by node-tar for packing
TIP
file, files and cwd options can be templates, learn more about templating.
untar(options)
Untar files or directories using node-tar, hook options are the following:
- files: array of paths to extract from the tarball
- any option supported by node-tar for unpacking
TIP
file, files and cwd options can be templates, learn more about templating.
runCommand(options)
Run a system command. Hook options are the following:
- command: the template of the command to be run with the hook object as context (could be an array of commands for a sequence)
- spawn: true to use child_process.spawn instead of child_process.exec (default) to run the command(s), in that case a command is given as an array of args instead of a single string
- stdout: boolean indicating if stdout is logged and stored in the hook object
- stderr: boolean indicating if stderr is logged and stored in the hook object
TIP
Learn more about templating
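The following sketch illustrates a command hook; gdal_translate is only an example command and the templated file names are placeholders:
runCommand: {
  // The command is a template with the hook object as context
  command: 'gdal_translate -of GTiff <%= id %>.grb <%= id %>.tif',
  // Log and store the command output on the hook object
  stdout: true,
  stderr: true
}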
envsubst(options)
Provides file-level environment variable substitution. Hook options are the following:
- templateFile: the file to apply the substitution
- outputFile: the resulting file
- any option supported by envsub for substituting
TXT
readTXT(options)
Read a TXT file from an input stream/store and convert it to in-memory JSON values, hook options are the following:
- objectPath: property path where to read the object in the content coming from the store, not defined by default so that the whole content is retrieved
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- transform: perform transformation using these options after read, see description in transformJson
Utils
generateId(options)
Generate a UUID (V1) for the item using node-uuid.
template(options)
Perform templating of the options using the item as context and merge it with item.
discardIf(options)
Discard all subsequent hooks and task if the input data passes the given match filter options, filter options are similar to the match filter described in common options.
apply(options)
Apply a given function to the hook item(s), hook options are the following:
- function: a function taking the hook item(s) as input and updating it (can be async)
healthcheck(options)
Apply a given function to the hook item(s) and healthcheck structure, hook options are the following:
- function: a function taking the hook item(s) and healthcheck structure as input and updating it
addOutputs(outputs)
Declare new outputs for the job/task, the hook options are an array of objects with the following properties:
- name: the name of the output
- type: the type of the output (defaults to intermediate so that it will be cleaned)
Tasks and write hooks automatically track generated outputs, but sometimes outputs are generated by an external process (e.g. a command hook), so you need to declare them in order to properly clean them with the clearOutputs hook.
runTask(options)
Run a given task, hook options are those of a task.
emitEvent(options)
Emit a 'krawler' event on the underlying service, hook options are the following:
- type: the custom type of the event to be emitted
- any transformation option, see description in transformJson, the transformed object will be used as event payload in the data field
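The following sketch emits a custom event with a trimmed payload; the event type and the picked properties are illustrative (pick being one of the transformJson options):
emitEvent: {
  type: 'task-done',
  // Only keep a few properties of the item as event payload
  pick: ['id', 'status']
}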
XML
readXML(options)
Read an XML file from a store and convert it to in-memory JSON values, hook options are the following:
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
- storePath: see description in common options
- store: see description in common options
- parser: the parser options
- key: see description in common options
YAML
readYAML(options)
Read a YAML file from a store and convert it to in-memory JSON values, hook options are the following:
- dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
writeYAML(options)
Generate a YAML file from in-memory JSON values, hook options are the following:
- dataPath: property path where to read the input JSON object on the hook object, defaults to result
- storePath: see description in common options
- store: see description in common options
- key: see description in common options
- outputType: the type of output produced by this hook, defaults to intermediate
- any option supported by js-yaml