Hooks

TIP

Although not all of them are applicable, common FeathersJS hooks are exposed in addition to krawler hooks and can be used in jobs. For example, you can add disallow: 'external' to avoid exposing some services when deploying as a web app.

Common options

All hooks can have the following options:

  • match: a match filter applied to the input hook data using any option supported by sift. Fields can be templates (learn more about templating). If the data is filtered out, the hook is not applied
  • predicate: an additional predicate function taking the hook item as input and returning true if matching occurs (can be async)
  • faultTolerant: will catch any error raised in the hook so that the hook chain will continue anyway

TIP

Because templating is restricted to string output, any ISO date string or comparison operator value in the match filter is converted back to its native type so that matching works as expected in JS.

Matching is for instance useful when you'd like to apply a hook to only a subset of your tasks, e.g. all the CSV files but not the JSON files.

Fault tolerance is for instance useful when you use unreliable data sources and you don't want the job to stop when some requests fail.
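
As a rough illustration of these common options, the fragment below restricts a hook to CSV items and lets the hook chain continue on errors. Only match, predicate and faultTolerant come from the list above; the hooks.tasks.after layout, the choice of readCSV, and the id and type properties on the item are assumptions made for the example.

```javascript
// Hypothetical predicate: keep only items whose id ends with .csv
const isCsv = (item) => typeof item.id === 'string' && item.id.endsWith('.csv')

// Assumed job layout; adapt to wherever your hooks are declared
const hooks = {
  tasks: {
    after: {
      readCSV: {
        // sift-style filter on the hook data (assumed task property)
        match: { type: 'http' },
        // Additional check run on the hook item
        predicate: isCsv,
        // Errors raised by this hook will not stop the hook chain
        faultTolerant: true
      }
    }
  }
}
```

With this sketch an item with id 'data.csv' passes the predicate while 'data.json' does not.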

All input/output hooks and store hooks manipulating items, i.e. reading/writing/transforming/removing data in a store like readJson, writeJson or gzipToStore, can have the following options:

  • storePath: property path where to read the store to be used on the hook object or params, defaults to data.store
  • store: property containing the ID of the store to be used, not defined by default
  • key: input/output key for the file in store, can be a template with item as context, learn more about templating
  • storageOptions: write options for the underlying store
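
A minimal sketch of these store options on a write hook could look like the following. The store ID 'fs' is a placeholder, and the `<%= id %>` syntax assumes lodash-style templates with the item as context; check the templating documentation for the exact syntax.

```javascript
// Options for a hypothetical writeJson hook
const writeJson = {
  // ID of the store to write to (placeholder value)
  store: 'fs',
  // Output key built from the item, assuming lodash-style templating
  key: '<%= id %>.json'
}
```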

Authentication

source

basicAuth(options)

Add header to HTTP requests for basic authorization, hook options are the following:

  • type: type of authorization used as the key in the header, defaults to Authorization but could be changed to Proxy-Authorization for instance
  • optionsPath: the property path to the request options that contains the authorization options, defaults to options

The authorization options have to be structured like this, e.g. on a task (or similarly on a task template in a job):

js
httpTask: {
  type: 'http',
  options: {
    // Target request URL
    url: 'xxx',
    auth: {
      // Your user identity
      user: 'yyy',
      password: 'zzz'
    }
  }
}
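
To make the effect of the hook more concrete, here is a sketch of how a header following the standard HTTP Basic scheme (RFC 7617) is built from such options. It is not the hook's actual code; the basicAuthHeader function name is hypothetical.

```javascript
// Build the header defined by the HTTP Basic scheme: "Basic " + base64(user:password)
function basicAuthHeader (auth, type = 'Authorization') {
  const credentials = Buffer.from(`${auth.user}:${auth.password}`).toString('base64')
  return { [type]: `Basic ${credentials}` }
}

// Default header key
const headers = basicAuthHeader({ user: 'yyy', password: 'zzz' })
// Same credentials sent to a proxy instead
const proxyHeaders = basicAuthHeader({ user: 'yyy', password: 'zzz' }, 'Proxy-Authorization')
```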

You can also send authentication information as form data like this:

js
auth: {
  // The login URL receiving form data
  url: 'xxx',
  // Your user identity to be sent as form data
  form: {
    user: 'yyy',
    password: 'zzz'
  },
  // Set this to enable cookie
  jar: true
}

OAuth(options)

Add header with a token retrieved from an OAuth authorization server to HTTP requests, hook options are the following:

  • type: type of authorization used as the key in the header, defaults to Authorization but could be changed to Proxy-Authorization for instance
  • optionsPath: the property path to the request options that contains the authorization options, defaults to options

The authorization options have to be structured like this, e.g. on a task (or similarly on a task template in a job):

js
httpTask: {
  type: 'http',
  options: {
    // Target request URL
    url: 'www',
    oauth: {
      // Token endpoint
      url: 'xxx',
      // Your client identity
      client_id: 'yyy',
      client_secret: 'zzz',
      // Client authentication method to be used to get access token
      method: 'client_secret_post' // Or 'client_secret_basic'
    }
  }
}
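
The two client authentication methods differ in where the client credentials travel. The sketch below builds the token request for each one. It is an illustration rather than the hook's implementation: the buildTokenRequest name and the client_credentials grant type are assumptions.

```javascript
// Build a token request description from the oauth options (hypothetical helper)
function buildTokenRequest (oauth) {
  // Assumed grant type, the hook may use a different one
  const body = new URLSearchParams({ grant_type: 'client_credentials' })
  const headers = { 'Content-Type': 'application/x-www-form-urlencoded' }
  if (oauth.method === 'client_secret_basic') {
    // Credentials go into a Basic authorization header
    const credentials = Buffer.from(`${oauth.client_id}:${oauth.client_secret}`).toString('base64')
    headers.Authorization = `Basic ${credentials}`
  } else {
    // client_secret_post: credentials go into the form body
    body.set('client_id', oauth.client_id)
    body.set('client_secret', oauth.client_secret)
  }
  return { url: oauth.url, method: 'POST', headers, body: body.toString() }
}

const postRequest = buildTokenRequest({ url: 'xxx', client_id: 'yyy', client_secret: 'zzz', method: 'client_secret_post' })
const basicRequest = buildTokenRequest({ url: 'xxx', client_id: 'yyy', client_secret: 'zzz', method: 'client_secret_basic' })
```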

Clearing

source

clearOutputs(options)

Clear output files generated by tasks and hooks, hook options are the following:

  • storePath: see description in common options
  • store: see description in common options
  • type: the type of output to be cleared by this hook, defaults to intermediate

clearData(options)

Clear output data generated by hooks, hook options are the following:

  • dataPath: property path to clean on the hook object, defaults to result.data

TIP

Use this hook if you load large datasets (e.g. JSON files) because all hook data are still referenced in memory until the job is finished

CSV

source

readCSV(options)

Read a CSV from an input stream/store and convert it to in-memory JSON values, hook options are the following:

writeCSV(options)

Generate a CSV file from in-memory JSON values, hook options are the following:

mergeCSV(options)

Generate a CSV file from a set of input CSV files, hook options are the following:

The input hook result is expected to be an array of tasks whose output will be read back from the store.

Docker

source

The Docker hooks allow you to interact with a Docker daemon. They are based on dockerode, a Node.js client for the Docker remote API.

connectDocker(options)

Connect to the Docker daemon. The connection options of the client are defined in the hook options plus:

  • clientPath: property path where to store the client object to be used by the Docker hooks, defaults to client

disconnectDocker(options)

Disconnect from the Docker daemon. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client

pullDockerImage(options)

Pull a docker image. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client
  • any options supported by dockerode for image pulling

TIP

options can contain an auth object to pull the image from a private repository.

createDockerContainer(options)

Run a docker container. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client
  • any options supported by dockerode for container creation

TIP

Cmd and Env options can be templates, learn more about templating

createDockerService(options)

Create a docker service on a Swarm cluster. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client
  • any options supported by dockerode for service creation

TIP

Options can be templates, learn more about templating

runDockerContainerCommand(options)

Run a command against a docker container. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client
  • command: the name of the command to be run
  • arguments: the arguments of the command to be run
  • any command/option supported by dockerode on containers

When the getArchive command is used, additional hook options are the following:

TIP

Cmd, Env and path options can be templates, learn more about templating

TIP

The hook takes care of waiting for exec to finish and automatically writes the tar to the hook store for getArchive
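
As a sketch of how the options of these hooks fit together, the fragment below lists one option object per hook. The socket path, the clientPath value, the image name and the archive path are placeholders, and the template syntax assumes lodash-style templates; how the hooks are wired into a job is not shown.

```javascript
// Hypothetical property path shared by all Docker hooks
const clientPath = 'taskTemplate.client'

// connectDocker: dockerode connection options plus clientPath
const connectDocker = { socketPath: '/var/run/docker.sock', clientPath }

// createDockerContainer: dockerode container creation options
const createDockerContainer = {
  clientPath,
  Image: 'xxx',
  Cmd: ['echo', '<%= id %>'], // Cmd can be a template
  Env: ['TASK_ID=<%= id %>'] // Env can be a template
}

// runDockerContainerCommand: first start the container...
const startContainer = { clientPath, command: 'start' }
// ...then retrieve a tar archive of a path inside it
const fetchArchive = { clientPath, command: 'getArchive', arguments: { path: '/tmp' } }

// disconnectDocker: only needs to find the client again
const disconnectDocker = { clientPath }
```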

FTP

source

TIP

FTP hooks rely on lftp. Consequently, you need to have the lftp executable installed on your computer.

listFTP(options)

List the files from a remote directory on the FTP server, hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client
  • remoteDir: the remote directory to list
  • key: see description in common options

globFTP(options)

List the files from a remote directory with names matching a pattern, hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client
  • remoteDir: the remote directory to list
  • key: see description in common options
  • pattern: the pattern used to match the file names, defaults to *

getFTP(options)

Get a remote file from the FTP server, hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client
  • remoteFile: the file to be copied on the FTP server
  • localFile: the destination file on the local host
  • storePath: see description in common options
  • store: see description in common options
  • key: see description in common options

putFTP(options)

Put a local file on the FTP server, hook options are the following:

Geographic grid

source

generateGrid(options)

Generate geographic grid parameters from an input location (grid center), area width and resolution. Required input hook data are the following:

  • longitude: grid center longitude (in degrees)
  • latitude: grid center latitude (in degrees)
  • resolution: grid cell resolution (in meters)
  • halfWidth: grid half-width (in meters)
  • blockResolution: grid block resolution (in meters)

Output hook data are the following:

  • origin: grid bounding box minimum longitude and latitude (in degrees)
  • size: number of grid cells in longitude and latitude
  • resolution: grid cell resolution (in degrees) for longitude and latitude
  • nbBlocks: number of grid blocks in longitude and latitude
  • blockSize: number of grid cells in longitude and latitude within each block
  • blockResolution: grid block resolution (in degrees) for longitude and latitude

WARNING

This hook works only for EPSG 4326
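
To show how metric inputs can map to the degree-based outputs listed above, here is a rough sketch using a spherical Earth approximation. The hook's real computation may differ: the Earth radius, the conversion formula and the sketchGrid name are all assumptions, and block-related outputs are left out.

```javascript
// Mean Earth radius in meters (assumption, the hook may use another value)
const EARTH_RADIUS = 6371008.8

// Convert a distance in meters to [longitude, latitude] spans in degrees at a given latitude
function metersToDegrees (meters, latitude) {
  const latitudeSpan = (meters / EARTH_RADIUS) * (180 / Math.PI)
  // Meridians converge towards the poles, so a longitude degree gets shorter
  const longitudeSpan = latitudeSpan / Math.cos(latitude * Math.PI / 180)
  return [longitudeSpan, latitudeSpan]
}

function sketchGrid ({ longitude, latitude, resolution, halfWidth }) {
  const cell = metersToDegrees(resolution, latitude)
  const half = metersToDegrees(halfWidth, latitude)
  // Number of cells covering the full width in each direction
  const cells = Math.round((2 * halfWidth) / resolution)
  return {
    origin: [longitude - half[0], latitude - half[1]],
    size: [cells, cells],
    resolution: cell
  }
}

const grid = sketchGrid({ longitude: 1.5, latitude: 45, resolution: 1000, halfWidth: 10000 })
```

In this sketch a 10 km half-width with a 1 km resolution yields a 20 by 20 grid, and the longitude resolution in degrees is larger than the latitude one away from the equator.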

generateGridTasks(options)

Generate WMS/WCS request tasks to download data for each cell of a geographic grid (see previous hook for grid definition). It is intended to be used as a job hook and the required input data on the task template are the following:

  • type: task type (either wms or wcs)
  • options.version: WMS/WCS service version
  • options.longitudeLabel: name of the longitude axis in WCS service
  • options.latitudeLabel: name of the latitude axis in WCS service

WARNING

This hook works only for EPSG 4326

resampleGrid(options)

A lot of geographical data (e.g. weather data) are distributed as gridded data, which is two-dimensional data representing an element value along an evenly spaced matrix of geographical positions. Usually, the grid has a longitude (x-axis or width) and a latitude (y-axis or height) dimension and is computed using the Equirectangular projection with a constant spacing called the resolution of the grid. The gridded data is assumed to be internally stored as a JavaScript array (1D).

You can use this hook to compute element value at any location from input gridded data (a process called interpolation) with the following options:

  • input: input grid specification
    • bounds: the geographical bounds covered by the input grid as an array of decimal values [min longitude, min latitude, max longitude, max latitude],
    • origin: the geographical origin of the input data grid as an array of decimal values [longitude origin, latitude origin],
    • size: the size of the input data grid as an array of integer values [width, height],
    • resolution: the geographical resolution of the input data grid as an array of decimal values [longitude resolution, latitude resolution]
  • output: output/resampled grid specification
    • origin: the geographical origin of the data grid as an array of decimal values [longitude origin, latitude origin],
    • size: the size of the data grid as an array of integer values [width, height],
    • resolution: the geographical resolution of the data grid as an array of decimal values [longitude resolution, latitude resolution]

WARNING

The values of the element are assumed to be the ones measured at the grid vertices
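
The interpolation idea can be sketched with a bilinear scheme over a 1D array. This is only an illustration: the hook's actual interpolation method, the row-major layout (longitude varying fastest) and the interpolate function name are assumptions.

```javascript
// Bilinear interpolation of a value at (lon, lat) from a grid stored as a 1D array
function interpolate (grid, lon, lat) {
  const { data, origin, size, resolution } = grid
  const [width, height] = size
  // Fractional position of the location in grid coordinates
  const x = (lon - origin[0]) / resolution[0]
  const y = (lat - origin[1]) / resolution[1]
  // Outside of the area covered by the grid vertices
  if (x < 0 || y < 0 || x > width - 1 || y > height - 1) return undefined
  // Lower-left vertex of the enclosing cell (clamped so the last row/column works)
  const i = Math.min(Math.floor(x), width - 2)
  const j = Math.min(Math.floor(y), height - 2)
  const tx = x - i
  const ty = y - j
  // Assumed row-major layout: index = row * width + column
  const at = (column, row) => data[row * width + column]
  const bottom = at(i, j) * (1 - tx) + at(i + 1, j) * tx
  const top = at(i, j + 1) * (1 - tx) + at(i + 1, j + 1) * tx
  return bottom * (1 - ty) + top * ty
}

// 2x2 grid with one value per vertex
const grid = { data: [0, 10, 20, 30], origin: [0, 0], size: [2, 2], resolution: [1, 1] }
const center = interpolate(grid, 0.5, 0.5)
```

Because values sit on the vertices, a location that coincides with a vertex returns that vertex value unchanged.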

IMAP

source

TIP

IMAP hooks rely on the ImapFlow project.

connectIMAP

Connect to an IMAP server. The connection options of the client are defined in the hook options plus:

  • clientPath: property path where to store the client object created when getting connected, defaults to client.

disconnectIMAP

Disconnect from an IMAP server. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client.

listIMAPMailboxes

List the available mailboxes from an IMAP server. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client.
  • any option supported by ImapFlow list function

fetchIMAPMessages

Fetch messages from an IMAP server. Hook options are the following:

The following example fetches unseen messages:

js
fetchIMAPMessages: {
  mailbox: 'INBOX',
  range: { seen: false },
  query: { uid: true },
  clientPath: 'taskTemplate.imapClient',     
  dataPath: 'data.messages' 
}

downloadIMAPAttachments

Download attachments from messages. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client.
  • mailbox: the mailbox from which to download the attachments.
  • range: the ImapFlow range parameter
  • type: the attachment content type.
  • any option supported by ImapFlow download function

flagIMAPMessages

Add flags to messages. Hook options are the following:

unflagIMAPMessages

Remove flags from messages. Hook options are the following:

deleteIMAPMessages

Delete messages from an IMAP server. Hook options are the following:

JSON

source

readJson(options)

Read a JSON from an input stream/store and convert it to in-memory JSON values, hook options are the following:

  • objectPath: property path where to read the JSON object in the JSON coming from the store, not defined by default so that the whole JSON is retrieved
  • dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
  • storePath: see description in common options
  • store: see description in common options
  • key: see description in common options
  • transform: perform transformation using these options after read, see description in transformJson
  • features: this boolean indicates if only the features are extracted when reading a GeoJson collection, defaults to false

writeJson(options)

Generate a JSON file from in-memory JSON values, hook options are the following:

  • dataPath: property path where to read the input JSON object on the hook object, defaults to result
  • storePath: see description in common options
  • store: see description in common options
  • key: see description in common options
  • outputType: the type of output produced by this hook, defaults to intermediate
  • transform: perform transformation using these options before write, see description in transformJson

transformJson(options)

Restructure in-memory JSON values, hook options are the following:

  • dataPath: property path where to read the input JSON object on the hook object, defaults to result.data
  • transformPath: property path where to read/write the JSON part to be transformed in the input JSON object
  • inputPath: property path where to read the JSON part to be transformed in the input JSON object
  • outputPath: property path where to write the JSON part to be transformed in the input JSON object
  • toArray: boolean indicating if the JSON object will be transformed into an array using Lodash, defaults to false
  • toObjects: if your input JSON objects are flat arrays it will be transformed into objects according to the given indexed list of property names to be used as keys, not defined by default
  • filter: a filter to be applied on the JSON object using any option supported by sift
  • mapping: a map between input key path and output key path supporting dot notation, the values of the map can also be a structure like this:
    • path: output key path
    • value: a map between input values and output values
    • delete: boolean indicating if the input key path should be deleted or not after mapping
  • unitMapping: a map between input key path supporting dot notation and from/to units to convert using math.js for numbers or moment.js for dates, a value of the map is a structure like this:
    • from: the unit or date format to convert from, e.g. feet or YYYY-MM-DD HH:mm:ss.SSS
    • to: the unit or date format to convert to, e.g. m or MM-DD-YYYY HH:mm:ss.SSS, if given for a date the date object will be converted back to string
    • asDate: mandatory to indicate if the value is a date, could be utc or local to interpret it as UTC or Local Time
    • asString: mandatory to convert numbers to strings, indicates the radix to be used if any
    • asNumber: mandatory to convert strings to numbers
    • asCase: target case to be used as the name of a Lodash (e.g. lowerCase) or JS string (e.g. toUpperCase) case conversion function
    • empty: value to be set if the input value is empty
  • pick: an array of properties to be picked using Lodash
  • omit: an array of properties to be omitted using Lodash
  • merge: an object to be merged with each JSON objects using Lodash
  • asObject: this boolean indicates if the output should be transformed into an object if the array contains a single object, defaults to false
  • asArray: this boolean indicates if the output should be transformed into an array containing the object, defaults to false
  • inPlace: this boolean indicates if the input data is transformed in place or simply before writing it when part of a write hook, defaults to true

Example:

js
toArray: true, // The following input object { 1: { property: 'a' }, 2: { property: 'b' } } will be transformed into [{ property: 'a' }, { property: 'b' }]
toObjects: ['1', '2'], // The following input object ['a', 'b'] will be transformed into { 1: 'a', 2: 'b' }
mapping: {
  sourceProperty: 'targetProperty',
  sourceProperty: {
    path: 'targetProperty',
    values: {
      'a': 'c' // Will map { xxx: 'a' } to { yyy: 'c' }
    }
  },
  'source.property': 'target.property',
  sourceProperty: 'targetArrayProperty[0]'
},
unitMapping: {
  property: { from: 'feet', to: 'm' } // This one will be converted from feet to meters
},
pick: ['onlyThisPropertyWillBeKept'],
omit: ['onlyThisPropertyWillBeRemoved'],
merge: { newProperty: 'will be added to the final objects' }

TIP

The transformations are applied in the order of the documentation, e.g. filtering occurs before mapping.
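
The toArray and toObjects conversions described in the example comments can be reproduced in plain JavaScript as follows. This is a stand-in to show the expected input and output shapes, not the hook's Lodash-based code; the function names are hypothetical.

```javascript
// toArray: keep the values of an object as an array
function toArray (object) {
  return Object.values(object)
}

// toObjects: turn a flat array into an object using the given list of keys
function toObject (values, keys) {
  return Object.fromEntries(keys.map((key, index) => [key, values[index]]))
}

const asArray = toArray({ 1: { property: 'a' }, 2: { property: 'b' } })
const asObject = toObject(['a', 'b'], ['1', '2'])
```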

mergeJson(options)

Generate an in-memory JSON object from a set of input in-memory JSON objects, hook options are the following:

  • mergeBy: property name to be used as a unique identifier to perform merging using Lodash, it can also be a function returning a unique identifier
  • deep: this boolean indicates if properties from multiple objects with the same identifier are merged, otherwise only the first object matching the merge condition will be kept, defaults to false
  • sortBy: property name to be used as value for sorting items prior merging using Lodash, it can also be a function returning a unique identifier
  • transform: perform transformation using these options before deep merging objects with the same identifier, see description in transformJson
  • dataPath: property path where to read the input JSON object on the result hook objects, defaults to data

The input hook result is expected to be an array of tasks whose output will be read in-memory.
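
The difference between the default behaviour and deep merging can be sketched in plain JavaScript. The hook relies on Lodash, so this stand-in uses a shallow Object.assign instead and the mergeObjects name is hypothetical.

```javascript
// Merge objects sharing the same identifier, given as a property name or a function
function mergeObjects (objects, mergeBy, deep = false) {
  const merged = new Map()
  for (const object of objects) {
    const id = (typeof mergeBy === 'function') ? mergeBy(object) : object[mergeBy]
    if (!merged.has(id)) {
      // First object with this identifier is always kept
      merged.set(id, { ...object })
    } else if (deep) {
      // Later objects only contribute when deep is set (shallow copy in this sketch)
      Object.assign(merged.get(id), object)
    }
  }
  return Array.from(merged.values())
}

const input = [{ id: 1, a: 1 }, { id: 1, b: 2 }, { id: 2, a: 3 }]
const firstOnly = mergeObjects(input, 'id')
const deepMerged = mergeObjects(input, 'id', true)
```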

writeTemplate(options)

Generate a file from an input template and injected in-memory JSON values, hook options are the following:

  • dataPath: property path where to read the input JSON object on the hook object, defaults to result.data
  • storePath: see description in common options
  • store: see description in common options
  • templateStorePath: property path where to read the store to be used for reading template on the hook object or params, defaults to data.templateStore
  • templateStore: property containing the ID of the store to be used for reading template, not defined by default
  • templateFile: file name of the template file to be used
  • outputType: the type of output produced by this hook, defaults to intermediate

TIP

Learn more about templating

GeoJSON

source

readSequentialGeoJson(options)

Read a JSON from an input stream/store and convert it to in-memory JSON values, hook options are the following:

  • dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
  • storePath: see description in common options
  • store: see description in common options
  • key: see description in common options
  • transform: perform transformation using these options after read, see description in transformJson
  • asFeatureCollection: this boolean indicates if the resulting JSON object is converted to a GeoJson collection, otherwise it will be an array of GeoJSON features, defaults to false

convertToGeoJson(options)

Convert in-memory JSON values to a GeoJSON collection. For each in-memory object, the hook generates a corresponding GeoJSON feature using specific properties to build the geometry property. For now, it can only generate features of type Point. Moreover, the coordinate reference system is a geographic coordinate reference system, using the World Geodetic System 1984 (WGS 84), with longitude and latitude expressed in decimal degrees. The entire object is stored under the properties property of the feature. Hook options are the following:

  • longitude: property path where to read the longitude value, defaults to longitude
  • latitude: property path where to read the latitude value, defaults to latitude
  • altitude: property path where to read the altitude value, defaults to altitude
  • keepGeometryProperties: boolean indicating if longitude, latitude and altitude values are also kept as properties, defaults to true
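
The conversion described above can be sketched as follows. It is a simplified stand-in for the hook: it only handles top-level property names rather than full property paths, and the toFeatureCollection name is hypothetical.

```javascript
// Convert plain objects to a GeoJSON collection of Point features
function toFeatureCollection (objects, options = {}) {
  const {
    longitude = 'longitude',
    latitude = 'latitude',
    altitude = 'altitude',
    keepGeometryProperties = true
  } = options
  const features = objects.map(object => {
    // GeoJSON positions are ordered longitude, latitude, then optional altitude
    const coordinates = [object[longitude], object[latitude]]
    if (object[altitude] !== undefined) coordinates.push(object[altitude])
    // The entire object is stored under the feature properties
    const properties = { ...object }
    if (!keepGeometryProperties) {
      delete properties[longitude]
      delete properties[latitude]
      delete properties[altitude]
    }
    return { type: 'Feature', geometry: { type: 'Point', coordinates }, properties }
  })
  return { type: 'FeatureCollection', features }
}

const stations = [{ id: 'x', longitude: 1.5, latitude: 43.5, altitude: 150 }]
const collection = toFeatureCollection(stations)
const stripped = toFeatureCollection(stations, { keepGeometryProperties: false })
```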

convertOSMToGeoJson(options)

Convert in-memory OSM JSON values to a GeoJSON collection. It relies on osmtogeojson. Hook options are the following:

  • any option supported by the osmtogeojson API
  • dataPath: property path where to read the OSM object, defaults to result.data

reprojectGeoJson(options)

Reproject a GeoJSON from a given projection system to another one, hook options are the following:

  • from: EPSG code of the input projection, defaults to EPSG:4326
  • to: EPSG code of the output projection, defaults to EPSG:4326
  • dataPath: property path where to store the resulting GeoJSON object on the hook object, defaults to result.data

KML

source

readKML(options)

Read a KML from an input stream/store and convert it to in-memory JSON values, hook options are the following:

  • objectPath: property path where to read the KML object in the KML coming from the store, not defined by default so that the whole KML is retrieved
  • dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
  • storePath: see description in common options
  • store: see description in common options
  • key: see description in common options
  • transform: perform transformation using these options after read, see description in transformJson
  • features: this boolean indicates if only the features are extracted when reading a GeoJson collection, defaults to false

MongoDB

source

connectMongo(options)

Connect to a MongoDB database. The connection options of the client are defined in the hook options plus:

  • url: MongoDB URI connection string
  • dbName: the name of the DB to connect to
  • clientPath: property path where to store the client object to be used by the MongoDB hooks, defaults to client

TIP

Since Krawler relies on the version 3.1.13 of the MongoDB driver, it automatically adds the option useNewUrlParser: true when connecting to the database.

disconnectMongo(options)

Disconnect from a MongoDB database. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client

dropMongoCollection(options)

Drop if exists a collection in a MongoDB database. Hook options are the following:

  • collection: the name of the collection to be removed, defaults to the hook object ID
  • clientPath: property path where to retrieve the client object, defaults to client

createMongoCollection(options)

Create a collection in a MongoDB database. Hook options are the following:

  • collection: the name of the collection to be created, defaults to the hook object ID
  • index/indices: the specification of the index associated to the collection, uses an array as indices if multiple indices are provided
  • clientPath: property path where to retrieve the client object, defaults to client

dropMongoIndex(options)

Drop if exists a collection index in a MongoDB database. Hook options are the following:

  • collection: the name of the collection holding the index, defaults to the hook object ID
  • index: the specification of the index associated to the collection
  • clientPath: property path where to retrieve the client object, defaults to client

createMongoIndex(options)

Create a collection index in a MongoDB database. Hook options are the following:

  • collection: the name of the collection to create the index on, defaults to the hook object ID
  • index: the specification of the index associated to the collection
  • clientPath: property path where to retrieve the client object, defaults to client

readMongoCollection(options)

Read JSON documents from an existing collection. Hook options are the following:

  • collection: the name of the collection to be read, defaults to the hook object ID
  • dataPath: property path where to write the output JSON objects on the hook object, defaults to data.result
  • clientPath: property path where to retrieve the client object, defaults to client
  • transform: perform transformation using these options after read, see description in transformJson
  • query: find query to be performed, fields can be templates, learn more about templating
  • excludedProperties: array of property names to be excluded from automated type conversion after templating, useful if you have a number-like string (e.g. '81') that you don't want to convert automatically into a number
  • project: project options for cursor
  • sort: sort options for cursor
  • skip: skip options for cursor
  • limit: limit options for cursor

WARNING

Because templating is restricted to string output, any ISO date string or comparison operator value in the query object is automatically converted back to its native type so that matching works as expected in JS.
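
A sketch of hook options for reading from a collection with a templated query is given below. The connection URL, database and collection names, the clientPath value, and the startTime template variable are placeholders; the `<%= %>` syntax assumes lodash-style templates.

```javascript
// Hypothetical property path shared by the MongoDB hooks
const clientPath = 'taskTemplate.client'

const mongoHooks = {
  connectMongo: { url: 'mongodb://127.0.0.1:27017', dbName: 'xxx', clientPath },
  readMongoCollection: {
    clientPath,
    collection: 'yyy',
    query: {
      // Rendered as a string by templating, then converted back to a date
      time: { $gte: '<%= startTime %>' },
      // Number-like string that must stay a string
      code: '81'
    },
    excludedProperties: ['code'],
    sort: { time: -1 },
    limit: 10
  },
  disconnectMongo: { clientPath }
}
```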

writeMongoCollection(options)

Inserts JSON documents into an existing collection (uses insertOne operations under the hood). Hook options are the following:

  • collection: the name of the collection to be written, defaults to the hook object ID
  • dataPath: property path where to read the input JSON object on the hook object, defaults to data.result
  • chunkSize: number of GeoJson features for the batch insert
  • clientPath: property path where to retrieve the client object, defaults to client
  • transform: perform transformation using these options before write, see description in transformJson
  • any option supported by options argument of the bulkWrite function.

TIP

If the input data is a GeoJSON collection the array of features will be pushed into the collection not the root object, this is to conform with MongoDB geospatial capabilities that can not handle recursive collections.
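
The combination of chunkSize and the GeoJSON handling can be pictured as follows: documents are extracted, split into batches, and each batch becomes a list of insertOne operations in the format accepted by the MongoDB driver bulkWrite function. This is a sketch of the idea rather than the hook's code, and the toInsertBatches name is hypothetical.

```javascript
// Split input data into batches of insertOne operations
function toInsertBatches (data, chunkSize) {
  // For a GeoJSON collection the features are written, not the root object
  let documents = data
  if (data.type === 'FeatureCollection') documents = data.features
  else if (!Array.isArray(data)) documents = [data]
  const batches = []
  for (let i = 0; i < documents.length; i += chunkSize) {
    const chunk = documents.slice(i, i + chunkSize)
    batches.push(chunk.map(document => ({ insertOne: { document } })))
  }
  return batches
}

// Five hypothetical Point features written by batches of two
const features = [0, 1, 2, 3, 4].map(index => ({
  type: 'Feature',
  geometry: { type: 'Point', coordinates: [index, index] },
  properties: { index }
}))
const batches = toInsertBatches({ type: 'FeatureCollection', features }, 2)
```

Each batch could then be passed to a single bulkWrite call on the target collection.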

updateMongoCollection(options)

Updates JSON documents in an existing collection (uses updateOne operations under the hood). Hook options are the following:

  • collection: the name of the collection to be written, defaults to the hook object ID
  • dataPath: property path where to read the input JSON object on the hook object, defaults to data.result
  • chunkSize: number of GeoJson features for the batch insert
  • clientPath: property path where to retrieve the client object, defaults to client
  • transform: perform transformation using these options before update, see description in transformJson
  • filter/upsert/hint: corresponding option for updateOne operation, filter fields can be templates, learn more about templating
  • excludedProperties: array of property names to be excluded from automated type conversion after templating, useful if you have a number-like string (e.g. '81') that you don't want to convert automatically into a number
  • any option supported by options argument of the bulkWrite function.

TIP

If the input data is a GeoJSON collection the array of features will be updated into the collection not the root object, this is to conform with MongoDB geospatial capabilities that can not handle recursive collections.

deleteMongoCollection(options)

Removes documents from an existing collection (uses deleteMany operations under-the-hood). Hook options are the following:

  • collection: the name of the collection to remove documents from, defaults to the hook object ID
  • filter: deletion criteria for deleteMany operation, fields can be templates, learn more about templating
  • excludedProperties: array of property names to be excluded from automated type conversion after templating, useful if you have a number-like string (e.g. '81') that you don't want to convert automatically into a number

createMongoAggregation(options)

Creates an aggregation pipeline on an existing collection. Hook options are the following:

  • collection: the name of the collection to be used, defaults to the hook object ID
  • dataPath: property path where to store the result of the aggregation, defaults to data.result
  • clientPath: property path where to retrieve the client object, defaults to client
  • transform: perform transformation using these options before write, see description in transformJson
  • pipeline: the aggregation pipeline to be executed
  • any option supported by options argument of the aggregate function.

TIP

If the input data is a GeoJSON collection the array of features will be pushed into the collection not the root object, this is to conform with MongoDB geospatial capabilities that can not handle recursive collections.

dropMongoBucket(options)

Drop if exists a bucket in a MongoDB database. Hook options are the following:

  • bucket: the name of the bucket to be removed, defaults to the hook object ID
  • clientPath: property path where to retrieve the client object, defaults to client

createMongoBucket(options)

Create a bucket in a MongoDB database. Hook options are the following:

  • bucket: the name of the bucket to be created, defaults to the hook object ID
  • clientPath: property path where to retrieve the client object, defaults to client

readMongoBucket(options)

Read file from an existing bucket. Hook options are the following:

  • bucket: the name of the bucket to be read, defaults to the hook object ID
  • storePath: see description in common options, specify store to write file to
  • store: see description in common options, specify store to write file to
  • key: see description in common options, defaults to the hook object ID

writeMongoBucket(options)

Insert file into an existing bucket. Hook options are the following:

  • bucket: the name of the bucket to be written, defaults to the hook object ID
  • storePath: see description in common options, specify store to read file from
  • store: see description in common options, specify store to read file from
  • key: see description in common options, defaults to the hook object ID
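The bucket hooks can be sketched as follows. The bucket name `archives`, the store ID `fs` and the key template are hypothetical, and the template assumes lodash-style `<%= %>` delimiters with the item as context.

```javascript
// Hypothetical options: create the bucket once, then push one file per task into it
const createBucketOptions = {
  createMongoBucket: { bucket: 'archives' }
}

const writeBucketOptions = {
  writeMongoBucket: {
    bucket: 'archives',
    store: 'fs', // ID of a store assumed to exist already
    key: '<%= id %>.zip' // file to read from the store
  }
}
```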

If the input data is a GeoJSON collection, the array of features is pushed into the collection rather than the root object. This conforms with MongoDB geospatial capabilities, which cannot handle recursive collections.

Feathers

source

connectFeathers(options)

Connect to a Feathers API. The connection options of the client are defined in the hook options plus:

  • distributed: Boolean indicating if the target service is retrieved using distribution (you will need to set the distribution job options and CLI api option), in this case you don't need the other properties
  • origin: Feathers connection URL
  • path: the Feathers API path prefix if any
  • authentication: the Feathers API authentication options if any (including service path)
  • clientPath: property path where to store the client object to be used by the Feathers hooks, defaults to client

TIP

Krawler relies on version 5 of the Feathers client.
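A minimal connection sketch is given below. The URL and path prefix are hypothetical and should be replaced by those of your own API; `taskTemplate.client` simply mirrors the client path used in the connectPG example.

```javascript
// Hypothetical connection settings for a local Feathers API
const feathersHooks = {
  connectFeathers: {
    origin: 'http://localhost:8081',
    path: '/api',
    // Where the client is stored so that other Feathers hooks can retrieve it
    clientPath: 'taskTemplate.client'
  }
}
```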

disconnectFeathers(options)

Disconnect from a Feathers API. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client

callFeathersServiceMethod(options)

Performs a service operation using the API. Hook options are the following:

  • service: the name of the service to be used, defaults to the hook object ID
  • method: the name of the method to be called, defaults to find
  • id: the ID of the item to read/write, defaults to item ID
  • data: the data payload of the operation, if not given will be hook item data
  • dataPath: property path where to read/write the input/output JSON objects on the hook object, defaults to data.result
  • chunkSize: number of items for a multi operation
  • clientPath: property path where to retrieve the client object, defaults to client
  • transform: perform transformation using these options after/before read/write, see description in transformJson
  • query: operation query to be performed (use only if not giving the whole params object), fields can be templates, learn more about templating
  • excludedProperties: array of property names to be excluded from automated type conversion after templating, useful if you have a number-like string (e.g. '81') that you don't want to be converted automatically into a number
  • params: operation params to be used, fields can be templates, learn more about templating
  • updateResult: if true service operation results will not replace item data (default for read operations)

WARNING

Due to templating restricted to string output any ISO date string or comparison operator value in the query object will be automatically converted back to native types so that matching will work as expected in JS
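The sketch below shows what a templated query could look like. The service name `measurements`, the `runTime` item property and the `data.measurements` path are hypothetical, and the template assumes lodash-style `<%= %>` delimiters; `$gte` and `$limit` are common Feathers query operators.

```javascript
// Hypothetical options for a find operation with a templated query
const serviceCallHooks = {
  callFeathersServiceMethod: {
    service: 'measurements',
    method: 'find',
    dataPath: 'data.measurements',
    query: {
      // The template produces an ISO date string, converted back to a native date
      time: { $gte: '<%= runTime %>' },
      $limit: 100
    }
  }
}
```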

writeMongoCollection(options)

Inserts JSON into an existing collection (uses insertOne operations under-the-hood). Hook options are the following:

  • collection: the name of the collection to be written, defaults to the hook object ID
  • dataPath: property path where to read the input JSON object on the hook object, defaults to data.result
  • chunkSize: number of GeoJson features for the batch insert
  • clientPath: property path where to retrieve the client object, defaults to client
  • transform: perform transformation using these options before write, see description in transformJson
  • any option supported by options argument of the bulkWrite function.

TIP

If the input data is a GeoJSON collection, the array of features is pushed into the collection rather than the root object. This conforms with MongoDB geospatial capabilities, which cannot handle recursive collections.

Numerical Weather Prediction

source

Numerical Weather Prediction (NWP) data are now available from the major meteorological agencies and institutions on a day-to-day basis. These hooks aim at gathering weather forecast data generated by forecast models easily.

Each forecast model outputs hundreds of forecast elements (a.k.a. meteorological elements) such as temperature, wind direction, etc. The production of a set of forecast data is called a run of the model and occurs on a regular daily basis, e.g. every 6 hours. The spatial properties of a model are completely defined by a longitude/latitude grid and a set of altitude levels (meter or pressure scale). The temporal properties are defined by interval values describing at which frequency/time the forecast data are produced (a.k.a. run interval) and which time steps are available (a.k.a. forecast interval).

generateNwpTasks(options)

Generate tasks to download data for each variable. It is intended to be used as a job hook and the required hook options (can be overridden by input data) are the following:

  • elements: the array of meteorological elements to be retrieved
  • runInterval: the run interval in seconds
  • runIndex: the index of the run to be retrieved, 0 means the nearest to the current time, -1 the previous one, etc.
  • interval: the forecast interval in seconds
  • lowerLimit: the lowest offset in seconds from which forecast data are retrieved (e.g. 3600 means we start gathering at T0 + 1h)
  • upperLimit: the highest offset in seconds at which forecast data are not retrieved (e.g. 10800 means we stop gathering at T0 + 3h)

A task will be generated for each element, level and gathered forecast time with the following properties: level, runTime, forecastTime, timeOffset.

TIP

This hook is intended to work with task templating to generate the actual download tasks (e.g. HTTP or WCS request)
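To make the time options more concrete, the sketch below derives run and forecast times from them. It is not Krawler's implementation: the helper names are hypothetical, the nearest run is assumed to be the most recent run interval boundary not later than the current time, and `upperLimit` is assumed to be inclusive.

```javascript
// Round a date down to the closest multiple of runInterval (in seconds since the Unix epoch)
function nearestRunTime (now, runInterval) {
  const seconds = Math.floor(now.getTime() / 1000)
  return new Date((seconds - (seconds % runInterval)) * 1000)
}

// List the run time, forecast time and offset for each gathered time step
function forecastTimes ({ now, runInterval, runIndex, interval, lowerLimit, upperLimit }) {
  const base = nearestRunTime(now, runInterval)
  // A negative run index moves back by whole run intervals
  const runTime = new Date(base.getTime() + runIndex * runInterval * 1000)
  const times = []
  for (let timeOffset = lowerLimit; timeOffset <= upperLimit; timeOffset += interval) {
    const forecastTime = new Date(runTime.getTime() + timeOffset * 1000)
    times.push({ runTime, forecastTime, timeOffset })
  }
  return times
}

const times = forecastTimes({
  now: new Date('2024-01-01T14:30:00Z'),
  runInterval: 6 * 3600, // one run every 6 hours
  runIndex: 0,
  interval: 3600, // one forecast step per hour
  lowerLimit: 3600, // start at T0 + 1h
  upperLimit: 3 * 3600 // stop at T0 + 3h
})
```

With these sample values the run time resolves to 12:00 UTC and three entries are produced, for offsets 3600, 7200 and 10800. The actual hook additionally generates one task per element and level.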

OGC

source

getCapabilities(options)

Execute a GetCapabilities request to get the general information about an OGC service such as WMS, WCS, WPS... Hook options are the following:

  • url: the base URL of the request to be executed
  • service: the service to request
  • token: an access token if required by the server
  • dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data

The following example illustrates how to use this hook:

getCapabilities: {
  url: 'http://geoserver.kalisio.xyz/geoserver/Kalisio/wms',
  service: 'WMS'
}

PostgreSQL

source

connectPG(options)

Connect to a PostgreSQL database. The connection options of the client are defined in the hook options plus:

  • clientPath: property path where to store the client object to be used by the PostgreSQL hooks, defaults to client

Also, this hook allows you to use the same environment variables as node-postgres to store the connection information:

  • PGUSER=dbuser
  • PGPASSWORD=secretpassword
  • PGHOST=database.server.com
  • PGPORT=5432
  • PGDATABASE=database

Finally, for security reasons, it is highly recommended to combine both approaches, as in the following example:

connectPG: {
  user: process.env.PG_USER,
  password: process.env.PG_PASSWORD,
  host: 'localhost',
  database: 'test',
  port: 5432,
  clientPath: 'taskTemplate.client'
}

disconnectPG(options)

Disconnect from a PostgreSQL database. Hook options are the following:

  • clientPath: property path where to retrieve the client object, defaults to client

dropPGTable(options)

Drop if exists a table in a PostgreSQL database. Hook options are the following:

  • table: the name of the table to be removed, defaults to the hook object ID
  • clientPath: property path where to retrieve the client object, defaults to client

createPGTable(options)

Create a table in a PostgreSQL database with the following structure:

  • id: a SERIAL (primary key)
  • geom: a PostGIS geometry of type POINTZ expressed in a geodetic reference system.
  • properties: an object of type JSON.

For now the structure has been defined to store a GeoJSON collection. Hook options are the following:

  • table: the name of the table to be created, defaults to the hook object ID
  • clientPath: property path where to retrieve the client object, defaults to client

writePGTable(options)

Inserts a GeoJSON collection or an array of features into an existing table. The table must have the same structure as a table created using the createPGTable hook. Hook options are the following:

  • dataPath: property path where to read the input JSON object on the hook object, defaults to data.result
  • chunkSize: number of GeoJson features for the batch insert
  • clientPath: property path where to retrieve the client object, defaults to client
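The sketch below shows input data matching the table structure described above (a 3D point plus free-form properties) together with possible hook options. The feature content and the chunk size are hypothetical; `taskTemplate.client` mirrors the connectPG example.

```javascript
// A GeoJSON collection whose features fit the geom (POINTZ) and properties (JSON) columns
const collection = {
  type: 'FeatureCollection',
  features: [{
    type: 'Feature',
    geometry: { type: 'Point', coordinates: [1.44, 43.6, 150] }, // longitude, latitude, altitude
    properties: { name: 'Toulouse', temperature: 21.5 }
  }]
}

// Hypothetical options for writing such a collection by batches of 100 features
const pgHooks = {
  writePGTable: {
    dataPath: 'data.result',
    chunkSize: 100,
    clientPath: 'taskTemplate.client'
  }
}
```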

Raster

source

readGeoTiff(options)

Read a GeoTiff from an input stream/store and convert it to in-memory JSON values, hook options are the following:

  • dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
  • fields: set of fields to be exported for each cell, if empty values will be directly exported as a JSON array, otherwise fields among the following can be selected
    • x: pixel x-coordinate
    • y: pixel y-coordinate
    • bbox: pixel bounding box
    • value: pixel value

computeStatistics(options)

Computes minimum and maximum values on a GeoTiff file, hook options are the following:

  • min: boolean indicating if minimum value should be computed
  • max: boolean indicating if maximum value should be computed
  • statisticsPath: property path where to write the output statistics on the hook object, defaults to result
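The raster hooks above could be configured as sketched here. Passing `fields` as an array of names is an assumption, since the exact type is not stated.

```javascript
// Hypothetical options: export each cell with its bounding box and value, then compute min/max
const rasterHooks = {
  readGeoTiff: {
    dataPath: 'result.data',
    fields: ['bbox', 'value']
  },
  computeStatistics: {
    min: true,
    max: true
  }
}
```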

Store

source

createStores(options)

Create (a set of) store(s), hook options are (array of) the following:

  • any option supported by the stores service
  • storePath: property path where to set the created store on the hook object, if not given the store will be created through the service but not stored on the hook

removeStores(options)

Remove (a set of) store(s), hook options are (array of) the following:

  • id: the store ID
  • storePath: property path where to unset the removed store on the hook object, if not given the store will be removed through service but not on the hook

TIP

As a shortcut, the options provided can simply be store IDs when storePath is not used
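A possible pairing of both hooks is sketched below, creating the stores before a job and removing them afterwards. The store IDs and the `options.path` property are assumptions about the stores service, not a definitive reference.

```javascript
// Hypothetical job hooks: an in-memory store and a file system store
const storeHooks = {
  before: {
    createStores: [
      { id: 'memory' },
      { id: 'fs', options: { path: './output' } }
    ]
  },
  after: {
    // Shortcut form: plain store IDs since storePath is not used
    removeStores: ['memory', 'fs']
  }
}
```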

discardIfExistsInStore(options)

Discard the task if a target file already exists in an output store, hook options are the following:

copyToStore(options)

Copy the item(s) from an input store to an output store, hook options are the following:

gzipToStore(options)

Gzip the item(s) from an input store to an output store, hook options are the following:

gunzipFromStore(options)

Gunzip the item(s) from an input store to an output store, hook options are the following:

unzipFromStore(options)

Unzip the item(s) from an input store to an output store, hook options are the following:

  • input: the input store options, see description in common options
  • output: the output store options, see description in common options
    • path: the output path in output store
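As an illustration, the input and output options could look as follows. The store ID `fs` and the key/path templates are hypothetical and assume lodash-style `<%= %>` delimiters with the item as context.

```javascript
// Hypothetical options: unzip <id>.zip from the fs store into an <id> folder of the same store
const unzipHooks = {
  unzipFromStore: {
    input: { key: '<%= id %>.zip', store: 'fs' },
    output: { path: '<%= id %>', store: 'fs' }
  }
}
```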

System

source

tar(options)

Tar files or directories using node-tar, hook options are the following:

  • files: array of paths to add to the tarball
  • any option supported by node-tar for packing

TIP

file, files and cwd options can be templates, learn more about templating
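A sketch of tar options follows. `file`, `cwd` and `gzip` are node-tar packing options; the paths are hypothetical and the templates assume lodash-style `<%= %>` delimiters.

```javascript
// Hypothetical options: pack the <id> folder found in ./output into a gzipped tarball
const tarHooks = {
  tar: {
    cwd: './output',
    file: '<%= id %>.tar.gz',
    gzip: true,
    files: ['<%= id %>']
  }
}
```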

untar(options)

Untar files or directories using node-tar, hook options are the following:

  • files: array of paths to extract from the tarball
  • any option supported by node-tar for unpacking

TIP

file, files and cwd options can be templates, learn more about templating

runCommand(options)

Run a system command. Hook options are the following:

  • command: the template of the command to be run with the hook object as context (could be an array of commands for a sequence)
  • spawn: true to use child_process.spawn instead of child_process.exec (default) to run the command(s), in that case a command is given as an array of args instead of a single string
  • stdout: boolean indicating if stdout is logged and stored in the hook object
  • stderr: boolean indicating if stderr is logged and stored in the hook object

TIP

Learn more about templating
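The two ways of giving a command can be sketched as follows. The commands themselves are hypothetical and kept free of templates for simplicity.

```javascript
// Default mode (child_process.exec): the command is a single string
const execHook = {
  runCommand: {
    command: 'ls -l ./output',
    stdout: true,
    stderr: true
  }
}

// Spawn mode (child_process.spawn): the command is an array of args
const spawnHook = {
  runCommand: {
    command: ['ls', '-l', './output'],
    spawn: true
  }
}
```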

envsubst(options)

Provides file-level environment variable substitution. Hook options are the following:

  • templateFile: the file to apply the substitution
  • outputFile: the resulting file
  • any option supported by envsub for substituting

TXT

source

readTXT(options)

Read a TXT from an input stream/store and convert it to in-memory JSON values, hook options are the following:

  • objectPath: property path where to read the object in the TXT content coming from the store, not defined by default so that the whole content is retrieved
  • dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
  • storePath: see description in common options
  • store: see description in common options
  • key: see description in common options
  • transform: perform transformation using these options after read, see description in transformJson

Utils

source

generateId(options)

Generate a UUID (V1) for the item using node-uuid.

template(options)

Perform templating of the options using the item as context and merge it with item.

discardIf(options)

Discard all subsequent hooks and task if the input data passes the given match filter options, filter options are similar to the match filter described in common options.

apply(options)

Apply a given function to the hook item(s), hook options are the following:

  • function: a function taking the hook item(s) as input and updating it (can be async)
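For instance, the function could convert units in place, as in this sketch. The item layout (a `data` array of records with a `pressure` in Pa) is hypothetical.

```javascript
// Hypothetical options: convert pressure values from Pa to hPa on the item
const applyHooks = {
  apply: {
    function: (item) => {
      item.data = item.data.map(record => ({ ...record, pressure: record.pressure / 100 }))
    }
  }
}

// Calling the function by hand shows its effect on a sample item
const sample = { id: 'task', data: [{ station: 'A', pressure: 101300 }] }
applyHooks.apply.function(sample)
```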

healthcheck(options)

Apply a given function to the hook item(s) and healthcheck structure, hook options are the following:

  • function: a function taking the hook item(s) and healthcheck structure as input and updating it

addOutputs(outputs)

Declare a new output for the job/task, hook options is an array of objects with the following properties:

  • name: the name of the output
  • type: the type of the output (defaults to intermediate so that it will be cleaned)

Tasks and write hooks automatically track generated outputs, but sometimes outputs are generated by an external process (e.g. a command hook), so you need to declare them in order to properly clean them with the clearOutputs hook.

runTask(options)

Run a given task, hook options are those of a task.

emitEvent(options)

Emit a 'krawler' event on the underlying service, hook options are the following:

  • type: the custom type of the event to be emitted
  • any transformation option, see description in transformJson, the transformed object will be used as event payload in the data field

XML

source

readXML(options)

Read an XML file from a store and convert it to in-memory JSON values, hook options are the following:

YAML

source

readYAML(options)

Read a YAML file from a store and convert it to in-memory JSON values, hook options are the following:

  • dataPath: property path where to store the resulting JSON object on the hook object, defaults to result.data
  • storePath: see description in common options
  • store: see description in common options
  • key: see description in common options

writeYAML(options)

Generate a YAML file from in-memory JSON values, hook options are the following:

  • dataPath: property path where to read the input JSON object on the hook object, defaults to result
  • storePath: see description in common options
  • store: see description in common options
  • key: see description in common options
  • outputType: the type of output produced by this hook, defaults to intermediate
  • any option supported by js-yaml
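A possible set of options is sketched below. The store ID and key template are hypothetical (assuming lodash-style `<%= %>` delimiters); `indent` and `sortKeys` are js-yaml dump options.

```javascript
// Hypothetical options: dump the JSON found in result.data to <id>.yaml in the fs store
const yamlHooks = {
  writeYAML: {
    dataPath: 'result.data',
    store: 'fs',
    key: '<%= id %>.yaml',
    // Forwarded to js-yaml
    indent: 2,
    sortKeys: true
  }
}
```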