2.8.5
Unified place to parse the options.
The argument passing has the following priority:
1. options.argv passed as a parameter.
2. Arguments passed on the command line.
3. Options inside a JSON config file. The file must be named with --config-file FILENAME in one of the sources above.
(Object = {})
A dictionary with options
Name | Description |
---|---|
options.argv Array | An array of arguments. Defaults to the command-line arguments passed to the process. |
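The priority order above can be sketched roughly as follows. This is only an illustration, not the library's implementation: the helper names pickArgv and findConfigFile are hypothetical, and only the --config-file flag and the options.argv parameter come from the text above.

```javascript
// Hypothetical sketch of the priority described above; not libingester's code.

// Source 1 wins over source 2: an explicit options.argv replaces the
// command-line arguments passed to the process.
function pickArgv(options = {}) {
    return Array.isArray(options.argv) ? options.argv : process.argv.slice(2);
}

// Source 3: look for "--config-file FILENAME" in the chosen arguments.
// Returns the file name, or null when the flag is absent or has no value.
function findConfigFile(argv) {
    const index = argv.indexOf('--config-file');
    return index !== -1 && index + 1 < argv.length ? argv[index + 1] : null;
}

const argv = pickArgv({ argv: ['--config-file', 'ingester.json'] });
console.log(findConfigFile(argv)); // prints "ingester.json"
```

Values read from the config file would then act as the lowest-priority source, used only for options that the arguments did not set.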
Creates the hatch and parses arguments.
(string)
A name for the ingester
(string)
ISO 639-1 two-letter language code, for use in generating spelling and stopword dictionaries for search
(Object)
A dictionary with options. For a list of available options see config.parse_options.
(any = {})
(any = assetVerifier)
20-byte hex ID of the document
Set multiple metadata at once.
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.title string | See BaseAsset#set_title. |
metadata.synopsis string | See BaseAsset#set_synopsis. |
metadata.thumbnail ImageAsset | See BaseAsset#set_thumbnail. |
metadata.canonical_uri string | See BaseAsset#set_canonical_uri. |
metadata.last_modified_date (Date | number | string) | See BaseAsset#set_last_modified_date. |
metadata.date_published (Date | number | string) | See BaseAsset#set_date_published. |
metadata.sequence_number number | See BaseAsset#set_sequence_number. |
metadata.license string | See BaseAsset#set_license. |
metadata.can_export boolean | See BaseAsset#set_can_export. |
metadata.can_print boolean | See BaseAsset#set_can_print. |
metadata.tags Array<string> | See BaseAsset#set_tags. |
Sets an image asset to be the article's thumbnail. (An image asset is returned e.g. from util.download_img or util.download_image.)
(ImageAsset)
Image
Pass the date that the post or article was published. You can pass a Date object here. If it is not a Date object, then new Date() will be called on it, so you can also pass e.g. a timestamp.
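The coercion described above can be illustrated with a small sketch. The helper name toDate is hypothetical and is not part of the library; it only mirrors the rule that non-Date values are handed to new Date().

```javascript
// Hypothetical helper mirroring the rule above: Date objects are kept,
// anything else is passed to the Date constructor.
function toDate(value) {
    return value instanceof Date ? value : new Date(value);
}

const fromTimestamp = toDate(0);                    // milliseconds since the Unix epoch
const fromString = toDate('2018-03-01T12:00:00Z');  // ISO 8601 string
const original = new Date(2018, 2, 1);
const fromDate = toDate(original);                  // returned unchanged

console.log(fromTimestamp.toISOString()); // prints "1970-01-01T00:00:00.000Z"
console.log(fromString.getTime() > fromTimestamp.getTime()); // prints "true"
console.log(fromDate === original); // prints "true"
```

Note that a numeric timestamp is interpreted by new Date() as milliseconds, so a value in seconds needs to be multiplied by 1000 first.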
Pass in an array of ID strings for "sets" (categories, tags). The ID string is not the human-readable string (though it's possible for the two to be identical); you can create a human-readable name for the set in your app using other tools.
Compare NewsArticle#set_section.
Extends BaseAsset
(any = {})
Set multiple metadata at once.
Extends BaseAsset#set_metadata
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.author string | See BlogArticle#set_author. |
metadata.main_image ImageAsset | See BlogArticle#set_main_image. |
metadata.main_image_caption string | See BlogArticle#set_main_image_caption. |
metadata.body (string | Cheerio) | See BlogArticle#set_body. |
metadata.custom_scss string | See BlogArticle#set_custom_scss. |
metadata.read_more_text string | See BlogArticle#set_read_more_text. |
metadata.as_static_page boolean | See BlogArticle#set_as_static_page. |
Pass in the name of the post author here. (The blog article format is currently designed to handle only one author.) For the news article format, see NewsArticle#set_authors.
(string)
The author of the blog post
Marks an image as the "main" image of the post, which is marked up specially. (Make sure to remove this image from the HTML that you pass to BlogArticle#set_body, otherwise it will be present twice in the rendered post.)
(ImageAsset)
Main image of the article
Pass the body of the article as HTML here. Note that this will be rendered mostly unchanged, inside an <article> element. The ingester itself is responsible for cleaning up this HTML such that the stylesheet applies cleanly to it.
If you use BlogArticle#set_main_image, make sure to remove the main image from this HTML.
The ingester is responsible for passing cleaned-up HTML that the stylesheet's rules will apply to. In many cases this will not be necessary, but in other cases quite a lot of cleanup might be needed. Some hints to take into account:
- Images in <figure> elements, not paragraphs
- Captions in <figcaption> elements
- Quotations in <blockquote> elements, properly formatted inside with <p> (and <cite> if applicable)
- Paragraphs in <p> elements, not denoted with <br> linebreaks
((string | Cheerio))
Body HTML of article
Here is where to customize the stylesheet. Most customizations can be accomplished just by tweaking some SCSS variables that the stylesheet exposes. Here is a list of the variables:
$primary-light-color
$primary-dark-color
$accent-light-color
$accent-dark-color
$background-light-color
$background-dark-color
$title-font
$body-font
$context-font
$support-font
The default stylesheet is included with @import '_default'; (note that you can also leave out this import and start from a blank stylesheet to completely do your own thing). If you are adding rules as well as customizing variables, make sure that the variables are specified before the import, and the rules are specified after it.
(string)
A string containing SCSS code
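The ordering rule above (variables, then the import, then extra rules) can be sketched as a plain string. The variable names and the import line come from the list above; the color, the font name, and the blockquote rule are illustrative values only.

```javascript
// Variables first, then the default stylesheet, then additional rules.
// The values below are placeholders chosen for illustration.
const customScss = `
$primary-light-color: #729fcf;
$title-font: 'Merriweather';

@import '_default';

blockquote {
    font-style: italic;
}
`;

console.log(customScss);
```

A string built this way is what would be passed as the custom SCSS for the article.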
Sets the text rendered at the bottom of the post. Unlike NewsArticle#set_read_more_link, this takes text rather than HTML, because the blog article format is a bit more strict about what it renders there. An example might be "Original article at Planet GNOME" which is turned into a link during rendering.
(string)
Text for the post bottom
Pass in an array of ID strings for "sets" (categories, tags). The ID string is not the human-readable string (though it's possible for the two to be identical); you can create a human-readable name for the set in your app using other tools. (Compare NewsArticle#set_section.)
Marks the page as a "static" page so that it will show up on the main menu of an app. Use this for e.g. "About the author" or "FAQ" pages.
Takes the metadata that has been set on this asset so far, and renders an HTML page internally in preparation for saving into a hatch.
You must call render() before passing this asset to Hatch#save_asset. No other methods should be called on the asset after calling render().
Set multiple metadata at once.
Extends BaseAsset#set_metadata
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.word string | See DictionaryAsset#set_word. |
metadata.definition string | See DictionaryAsset#set_definition. |
metadata.part_of_speech string | See DictionaryAsset#set_part_of_speech. |
metadata.source string | See DictionaryAsset#set_source. |
metadata.body (string | Cheerio) | See DictionaryAsset#set_body. |
metadata.custom_scss string | See DictionaryAsset#set_custom_scss. |
metadata.main_image ImageAsset | See DictionaryAsset#set_main_image. |
Set multiple metadata at once.
Extends BlogArticle#set_metadata
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.temporal_coverage string | See GalleryImageArticle#set_temporal_coverage. |
Set multiple metadata at once.
Extends BlogArticle#set_metadata
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.main_video string | See GalleryVideoArticle#set_main_video. |
Extends BaseAsset
(any = {})
Set multiple metadata at once.
Extends BaseAsset#set_metadata
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.image_data (Buffer | Promise<Buffer>) | See ImageAsset#set_image_data. |
Extends BaseAsset
(any = {})
Set multiple metadata at once.
Extends BaseAsset#set_metadata
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.authors (string | Array<string>) | See NewsArticle#set_authors. |
metadata.main_image Object | See NewsArticle#set_main_image. |
metadata.section string | See NewsArticle#set_section. |
metadata.as_static_page boolean | See NewsArticle#set_as_static_page. |
metadata.source string | See NewsArticle#set_source. |
metadata.lede (string | Cheerio) | See NewsArticle#set_lede. |
metadata.body (string | Cheerio) | See NewsArticle#set_body. |
metadata.custom_scss string | See NewsArticle#set_custom_scss. |
metadata.read_more_link (string | Cheerio) | See NewsArticle#set_read_more_link. |
Pass an array of names of article authors. If value is a string instead of an array, it is assumed there is only one author. For the blog article format, see BlogArticle#set_author.
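The string-or-array rule above amounts to a small normalization step. The sketch below is an assumption about how such a step could look, using a hypothetical helper name normalizeAuthors; it is not taken from the library's source.

```javascript
// Hypothetical helper: a single string is treated as a one-author list,
// an array is used as given.
function normalizeAuthors(value) {
    return Array.isArray(value) ? value : [value];
}

const single = normalizeAuthors('Jane Doe');
const several = normalizeAuthors(['Jane Doe', 'John Roe']);

console.log(single.length);  // prints "1"
console.log(several.length); // prints "2"
```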
Marks an image as the "main" image of the article, which is marked up specially. Includes an optional caption. Compare BlogArticle#set_main_image and BlogArticle#set_main_image_caption.
(Make sure to remove this image from the HTML that you pass to NewsArticle#set_body, otherwise it will be present twice in the rendered post.)
(ImageAsset)
The main image
((string | Cheerio)?)
HTML for the main image's caption
Pass in the ID string of a "set" (category, tag). This is not the human-readable string (though it's possible for the two to be identical); you can create a human-readable name for the set in your app using other tools.
(Compare BlogArticle#set_tags for the blog article format; news articles are usually in one section of the news site at a time.)
(string)
The ID string of the set to give this article
Marks the page as a "static" page so that it will show up on the main menu of an app. Use this for e.g. "About the author" or "FAQ" pages.
Sets the lede (opening paragraph) of the news article. This paragraph is marked up specially.
(Make sure to remove this paragraph from the HTML that you pass to NewsArticle#set_body, otherwise it will be present twice in the rendered article.)
((string | Cheerio))
Opening paragraph of article
Pass the body of the article here. Note that this will be rendered mostly unchanged, inside an <article> element. The ingester itself is responsible for cleaning up this HTML such that the stylesheet applies cleanly to it.
Note that if you use NewsArticle#set_lede, you should make sure the lede is not included in value, and likewise for NewsArticle#set_main_image.
The ingester is responsible for passing cleaned-up HTML that the stylesheet's rules will apply to. In many cases this will not be necessary, but in other cases quite a lot of cleanup might be needed. Some hints to take into account:
- Images in <figure> elements, not paragraphs
- Captions in <figcaption> elements
- Quotations in <blockquote> elements, properly formatted inside with <p> (and <cite> if applicable)
- Paragraphs in <p> elements, not denoted with <br> linebreaks
((string | Cheerio))
Body HTML of article
Here is where to customize the stylesheet. Most customizations can be accomplished just by tweaking some SCSS variables that the stylesheet exposes. Here is a list of the variables:
$primary-light-color
$primary-dark-color
$accent-light-color
$accent-dark-color
$background-light-color
$background-dark-color
$title-font
$body-font
$context-font
$support-font
The default stylesheet is included with @import '_default'; (note that you can also leave out this import and start from a blank stylesheet to completely do your own thing). If you are adding rules as well as customizing variables, make sure that the variables are specified before the import, and the rules are specified after it.
(string)
A string containing SCSS code
Sets the HTML rendered at the bottom of the article, which can include a link back to the original source. An example might be:
Read more at <a href="http://...">Planet GNOME</a>
Use this for crediting original sources, for example.
((string | Cheerio))
HTML for the link
Takes the metadata that has been set on this asset so far, and renders an HTML page internally in preparation for saving into a hatch.
You must call render() before passing this asset to Hatch#save_asset. No other methods should be called on the asset after calling render().
Extends BaseAsset
(any = {})
Extends BaseAsset
(any = {})
Set multiple metadata at once.
Extends BaseAsset#set_metadata
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.download_uri string | See VideoAsset#set_download_uri. |
A simple wrapper over VideoAsset that should be used for toplevel videos.
This asset type is for videos that aren't embedded as part of another piece of content, but should instead be directly accessible from content listings and search results. For instance, if you were building an app where the only content was videos, and those videos required no context other than some displayable metadata such as the authors and a brief synopsis, then you should use this asset type.
In the past, developers used GalleryVideoArticle, which wrapped the video in another HTML page. That asset type is now deprecated since apps are capable of displaying the video and its corresponding metadata correctly without the need for a page in between.
Extends VideoAsset
Set multiple metadata at once.
Extends VideoAsset#set_metadata
(Object)
The metadata can be:
Name | Description |
---|---|
metadata.author string | See VideoArticle#set_author. |
Returns a DOM object and a NewsArticle corresponding to the URI provided. If fetching the DOM fails, the NewsArticle is marked as failed in the hatch.
(Hatch)
If the asset fails it will be saved as failed in this Hatch.
(string)
URI to load
(string?)
A text encoding, overriding the one specified by @uri
Promise<Object>: a DOM object constructed from the HTML at @uri, and a NewsArticle
Returns a DOM object and a BlogArticle corresponding to the URI provided. If fetching the DOM fails, the BlogArticle is marked as failed in the hatch.
(Hatch)
If the asset fails it will be saved as failed in this Hatch.
(string)
URI to load
(string?)
A text encoding, overriding the one specified by @uri
Promise<Object>: a DOM object constructed from the HTML at @uri, and a BlogArticle
(Cheerio)
DOM element to replace with a placeholder for the video to be downloaded later
(string)
URI from which to download the video
VideoAsset
(string)
URI of an image
ImageAsset
Downloads a thumbnail image for the YouTube video specified by $embedUrl.
(string)
YouTube embed video URL
ImageAsset
Downloads an image specified by the src attribute of an <img> element represented by $img. It modifies the <img> element to associate it with the image asset, which will be used later on in the ingestion process.
This is a convenience function which saves you from having to extract the image URI yourself and pass it to download_image.
Pass baseUri to handle relative links in the src attribute.
(Cheerio)
DOM tree of an <img> element
(string = '')
Base URI of the document
ImageAsset
Get parsed settings to limit the ingested items.
The maxItems and maxDaysOld parameters can be overridden by the LIBINGESTER_MAX_ITEMS and LIBINGESTER_MAX_DAYS_OLD environment variables, respectively.
(number = Infinity)
Maximum number of entries to ingest
(number = 1)
Maximum age of entries to ingest, in days
Object: Object with parsed settings max_items and oldest_date
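How the parameters, their defaults, and the environment overrides could combine is sketched below. The environment variable names, the defaults, and the max_items and oldest_date keys come from the text above; the function name ingestionLimits, the extra env and now parameters, and the parsing with Number() are assumptions made for the illustration.

```javascript
// Sketch only, not the library's implementation.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// env and now are injectable here purely to make the sketch easy to try out.
function ingestionLimits(maxItems = Infinity, maxDaysOld = 1,
                         env = process.env, now = Date.now()) {
    // Environment variables, when set, take precedence over the parameters.
    const items = env.LIBINGESTER_MAX_ITEMS !== undefined
        ? Number(env.LIBINGESTER_MAX_ITEMS) : maxItems;
    const days = env.LIBINGESTER_MAX_DAYS_OLD !== undefined
        ? Number(env.LIBINGESTER_MAX_DAYS_OLD) : maxDaysOld;
    return {
        max_items: items,
        // Entries published before this date are considered too old.
        oldest_date: new Date(now - days * MS_PER_DAY),
    };
}

const limits = ingestionLimits(20, 7, { LIBINGESTER_MAX_ITEMS: '5' });
console.log(limits.max_items); // prints "5"
```

An ingester would then stop after max_items entries and skip any entry dated before oldest_date.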
A utility for fetching entries from an RSS feed for further processing.
The max_items and max_days_old parameters can be overridden by the LIBINGESTER_MAX_ITEMS and LIBINGESTER_MAX_DAYS_OLD environment variables, respectively.
(number = Infinity)
Maximum number of entries to return
(number = 1)
Maximum age of entries returned, in days
(Object = {})
Options to pass to the RSS request
Promise<Array<Object>>: Array of RSS feed entries
A utility to create a pagination function for WordPress-generated feeds, since they are so common. Make sure to verify that your feed is generated by WordPress before using this!
function: A paginator function that can be passed to util.fetch_rss_entries
Utility to extend the util.cleanup_body defaults.
Name | Description |
---|---|
options.remove any (default []) | |
options.removeNoText any (default []) | |
options.removeData any (default []) | |
options.removeAttrs any (default {}) | |
options.noComments any | |
Object: This is a merge of util.CLEANUP_DEFAULTS and the passed options.
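A merge of defaults and passed options could look like the sketch below. The option names come from the table above; the contents of CLEANUP_DEFAULTS here are placeholders, and the merge strategy (arrays concatenated, removeAttrs keys merged, noComments overridden only when given) is an assumption rather than the library's documented behavior.

```javascript
// Placeholder defaults for illustration; the library's real defaults differ.
const CLEANUP_DEFAULTS = {
    remove: ['iframe'],
    removeNoText: ['p'],
    removeData: ['*'],
    removeAttrs: { img: ['style'] },
    noComments: true,
};

// Assumed merge strategy, see the note above.
function extendCleanupDefaults({
    remove = [], removeNoText = [], removeData = [], removeAttrs = {}, noComments,
} = {}) {
    return {
        remove: [...CLEANUP_DEFAULTS.remove, ...remove],
        removeNoText: [...CLEANUP_DEFAULTS.removeNoText, ...removeNoText],
        removeData: [...CLEANUP_DEFAULTS.removeData, ...removeData],
        removeAttrs: { ...CLEANUP_DEFAULTS.removeAttrs, ...removeAttrs },
        noComments: noComments === undefined ? CLEANUP_DEFAULTS.noComments : noComments,
    };
}

const merged = extendCleanupDefaults({ remove: ['.advert'], noComments: false });
console.log(merged.remove.join(', ')); // prints "iframe, .advert"
```

The returned object has the same shape as the defaults, so it can be passed wherever cleanup options are expected.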
Utility to clean up the article body.
The cleanup should be among the last operations done with the body. For example, the cleanup utility removes iframe elements by default, so an ingester that makes video assets from iframes must do so before calling it.
(Cheerio)
The Cheerio object that will be cleaned up. It is called body because it usually represents an article body.
(Object = {})
Options to configure the cleanup. By default util.CLEANUP_DEFAULTS is used. You can expand or override the defaults with util.extend_cleanup_defaults.
Name | Description |
---|---|
options.remove Array (default CLEANUP_DEFAULTS.remove) | Array of CSS selectors. The elements matching the selectors will be removed. |
options.removeNoText Array (default CLEANUP_DEFAULTS.removeNoText) | Array of CSS selectors. If the elements matching the selectors don’t have text inside, they will be removed. |
options.removeData Array (default CLEANUP_DEFAULTS.removeData) | Array of CSS selectors. Any custom data attribute will be removed from the matching elements. |
options.removeAttrs Object<string, Array> (default CLEANUP_DEFAULTS.removeAttrs) | Use this to remove attributes from elements. ‘removeAttrs’ is an Object with CSS selectors as keys and an Array of attributes as values. |
options.noComments boolean (default CLEANUP_DEFAULTS.noComments) | If true (default), all HTML comments will be removed from the body. |