author eta-orionis <3466670+eta-orionis@users.noreply.github.com> 2024-01-10 08:23:45 +0100
committer GitHub <noreply@github.com> 2024-01-10 08:23:45 +0100
commit 9c97d8ca729e3cfb067445c0d3c9ad8284132aeb
tree 256588d7a65cc8658c808bc7852c816f6ccc1cd2 /docs
parent 9a80dde238caf1338b803f67003cd459393efdc3
JSONFeeds, JSON scraping, and POST requests for feeds (#5662)

* allow POST requests for feeds
* added json dotpath and jsonfeed subscriptions. No translation strings yet
* debug and fix jsonfeed parser
* bugfix params saved when editing feed
* added translations for JSON features
* Update docs for web scraping
* make fix-all and revert unrelated changes, plus a few manual fixes, but there are still several type errors
* Fix some i18n
* refactor json parsing for both feed types
* cleanup unnecessary comment
* refactored generation of SimplePie for XPath and JSON feeds
* Fix merge error
* Update to newer FreshRSS code
* A bit of refactoring
* doc, whitespace
* JSON Feed is in two words
* Add support for array syntax
* Whitespace
* Add OPML export/import
* Work on i18n
* Accept application/feed+json
* Rework POST
* Fix update
* OPML for cURL options
* Fix types
* Fix Typos

---------

Co-authored-by: Erion Elmasllari <elmasllari@factorsixty.com>
Co-authored-by: Alexandre Alapetite <alexandre@alapetite.fr>

Diffstat (limited to 'docs')
-rw-r--r-- docs/en/developers/OPML.md 38
-rw-r--r-- docs/en/users/11_website_scraping.md 41
2 files changed, 76 insertions, 3 deletions
diff --git a/docs/en/developers/OPML.md b/docs/en/developers/OPML.md
index 5191592a8..0cc7cabbf 100644
--- a/docs/en/developers/OPML.md
+++ b/docs/en/developers/OPML.md
@@ -44,6 +44,44 @@ The following attributes are using similar naming conventions than [RSS-Bridge](
* `frss:xPathItemCategories`: XPath expression for extracting a list of categories (tags) from the item context.
* `frss:xPathItemUid`: XPath expression for extracting an item’s unique ID from the item context. If left empty, a hash is computed automatically.
+### JSON+DotPath
+
+* `<outline type="JSON+DotPath" ...`: Similar to `HTML+XPath` but for JSON and using a dot/bracket syntax such as `object.object.array[2].property`.
+
+* `frss:jsonItem`: JSON dot path for extracting the feed items from the source page.
+ * Example: `data.items`
+* `frss:jsonItemTitle`: JSON dot path for extracting the item’s title from the item context.
+ * Example: `meta.title`
+* `frss:jsonItemContent`: JSON dot path for extracting an item’s content from the item context.
+ * Example: `content`
+* `frss:jsonItemUri`: JSON dot path for extracting an item link from the item context.
+ * Example: `meta.links[0]`
+* `frss:jsonItemAuthor`: JSON dot path for extracting an item author from the item context.
+* `frss:jsonItemTimestamp`: JSON dot path for extracting an item timestamp from the item context. The result will be parsed by [`strtotime()`](https://php.net/strtotime).
+* `frss:jsonItemTimeFormat`: Date/Time format to parse the timestamp, according to [`DateTime::createFromFormat()`](https://php.net/datetime.createfromformat).
+* `frss:jsonItemThumbnail`: JSON dot path for extracting an item’s thumbnail (image) URL from the item context.
+* `frss:jsonItemCategories`: JSON dot path for extracting a list of categories (tags) from the item context.
+* `frss:jsonItemUid`: JSON dot path for extracting an item’s unique ID from the item context. If left empty, a hash is computed automatically.
+
+### JSON Feed
+
+* `<outline type="JSONFeed" ...`: Uses `JSON+DotPath` behind the scenes to parse a [JSON Feed](https://www.jsonfeed.org/).
+
+### cURL
+
+A number of [cURL options](https://curl.se/libcurl/c/curl_easy_setopt.html) are supported:
+
+* `frss:CURLOPT_COOKIE`
+* `frss:CURLOPT_COOKIEFILE`
+* `frss:CURLOPT_FOLLOWLOCATION`
+* `frss:CURLOPT_HTTPHEADER`
+* `frss:CURLOPT_MAXREDIRS`
+* `frss:CURLOPT_POST`
+* `frss:CURLOPT_POSTFIELDS`
+* `frss:CURLOPT_PROXY`
+* `frss:CURLOPT_PROXYTYPE`
+* `frss:CURLOPT_USERAGENT`
+
### Miscellaneous
* `frss:cssFullContent`: [CSS Selector](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors) to enable the download and extraction of the matching HTML section of each article’s Web address.
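
Reviewer note on the `JSON+DotPath` attributes documented in this hunk: the following is a minimal Python sketch of how such an `<outline>` element could be assembled, using the example paths from the list (`data.items`, `meta.title`, `content`, `meta.links[0]`). It is only an illustration. The namespace URI bound to the `frss:` prefix is an assumption to check against a real FreshRSS export, and the helper name `build_json_dotpath_outline`, the feed title, and the feed URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the frss: prefix; verify against a real FreshRSS export.
FRSS_NS = "https://freshrss.org/opml"
ET.register_namespace("frss", FRSS_NS)


def build_json_dotpath_outline(title, url, paths):
    """Build an <outline> element of type JSON+DotPath (hypothetical helper).

    `paths` maps attribute names such as "jsonItem" to JSON dot paths.
    """
    outline = ET.Element(
        "outline",
        {"text": title, "type": "JSON+DotPath", "xmlUrl": url},
    )
    for name, dot_path in paths.items():
        # ElementTree expects namespaced attribute names as "{uri}local".
        outline.set(f"{{{FRSS_NS}}}{name}", dot_path)
    return outline


outline = build_json_dotpath_outline(
    "Example news",                       # hypothetical feed title
    "https://example.net/api/news.json",  # hypothetical feed URL
    {
        "jsonItem": "data.items",
        "jsonItemTitle": "meta.title",
        "jsonItemContent": "content",
        "jsonItemUri": "meta.links[0]",
    },
)
outline_xml = ET.tostring(outline, encoding="unicode")
print(outline_xml)
```

The serialized element carries the `frss:` prefixed attributes together with the `xmlns:frss` declaration, so it can be pasted into the `<body>` of an OPML file for a manual import test.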
diff --git a/docs/en/users/11_website_scraping.md b/docs/en/users/11_website_scraping.md
index f51c2ba40..f725280ea 100644
--- a/docs/en/users/11_website_scraping.md
+++ b/docs/en/users/11_website_scraping.md
@@ -5,9 +5,14 @@ FreshRSS has a built-in [Web scraping](https://en.wikipedia.org/wiki/Web_scrapin
## How to add
Go to “Subscription Management” where a new feed can be added.
-Change the “Type of feed source” to “HTML + XPath (Web scraping)”.
-An additional list of text boxes to configure the web scraping.
-[XPath 1.0](https://www.w3.org/TR/xpath-10/) is used as traversing language.
+Change the “Type of feed source” to one of:
+- “HTML + XPath (Web scraping)”
+- JSON Feed (see [`jsonfeed.org`](https://www.jsonfeed.org/))
+- JSON (Dotted paths)
+
+An additional list of text boxes to configure the Web scraping will show.
+
+For HTML + XPath, [XPath 1.0](https://www.w3.org/TR/xpath-10/) is used as traversing language.
### Get the XPath path
@@ -15,6 +20,36 @@ Firefox: the built-in “inspect” tool may be used to help create a valid XPat
Select the node in the HTML, right click with your mouse and choose “Copy” and “XPath”.
The XPath is stored in your clipboard now.
+### Get the JSON dotted path
+
+Suppose the JSON you are subscribing to (or scraping) looks like this:
+
+```json
+{
+ "data": {
+ "items": [
+ {
+ "meta": {"title": "Some news item"},
+ "content": "Content of the news",
+ "links": ["https://example.net/1", "https://example.org/1"]
+ },
+ {
+ "meta": {"title": "Some other news item"},
+ "content": "Yet more content",
+ "links": ["https://example.net/2", "https://example.org/2"]
+ }
+ ]
+ }
+}
+```
+
+The *dot notation* and *bracket notation* (only numeric) are supported.
+
+Then the items are under `data.items`, and within each item, the title is `meta.title`,
+and the link would be `links[1]`.
+
+It is a similar syntax to the JavaScript way to access JSON: `object.object.array[2].property`.
+
## Tips & tricks
- [Timezone of date](https://github.com/FreshRSS/FreshRSS/discussions/5483)
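
Reviewer note on the “Get the JSON dotted path” section: to make the dot and numeric bracket notation concrete, here is a minimal Python sketch of how such a path could be resolved against the sample document from that section. It is not the FreshRSS implementation (which is written in PHP); the function name `resolve_dot_path` and the choice to return `None` for a missing key or index are assumptions made for illustration.

```python
import json
import re

# One path segment: an optional key followed by numeric indexes, e.g. "links[1]".
_SEGMENT = re.compile(r"^([^.\[\]]*)((?:\[\d+\])*)$")


def resolve_dot_path(data, path):
    """Walk `data` along a dotted path such as "data.items" or "links[1]".

    Returns None when a segment is malformed or a key/index is missing
    (an assumed convention for this sketch).
    """
    current = data
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            return None
        key, brackets = match.groups()
        if key:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        # Apply each numeric index in order, e.g. "[0][2]".
        for index in re.findall(r"\[(\d+)\]", brackets):
            position = int(index)
            if not isinstance(current, list) or position >= len(current):
                return None
            current = current[position]
    return current


# The sample document from the section above.
SAMPLE = json.loads("""
{
  "data": {
    "items": [
      {
        "meta": {"title": "Some news item"},
        "content": "Content of the news",
        "links": ["https://example.net/1", "https://example.org/1"]
      },
      {
        "meta": {"title": "Some other news item"},
        "content": "Yet more content",
        "links": ["https://example.net/2", "https://example.org/2"]
      }
    ]
  }
}
""")

items = resolve_dot_path(SAMPLE, "data.items")
titles = [resolve_dot_path(item, "meta.title") for item in items]
links = [resolve_dot_path(item, "links[1]") for item in items]
print(titles)
print(links)
```

Note that indexes are zero-based, so `links[1]` selects the second URL of each item (the `example.org` one in the sample).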