| author | 2024-01-10 08:23:45 +0100 | |
|---|---|---|
| committer | 2024-01-10 08:23:45 +0100 | |
| commit | 9c97d8ca729e3cfb067445c0d3c9ad8284132aeb (patch) | |
| tree | 256588d7a65cc8658c808bc7852c816f6ccc1cd2 /docs | |
| parent | 9a80dde238caf1338b803f67003cd459393efdc3 (diff) | |
JSONFeeds, JSON scraping, and POST requests for feeds (#5662)
* allow POST requests for feeds
* added json dotpath and jsonfeed subscriptions. No translation strings yet
* debug and fix jsonfeed parser
* bugfix params saved when editing feed
* added translations for JSON features
* Update docs for web scraping
* make fix-all
and revert unrelated changes, plus a few manual fixes, but there are still several type errors
* Fix some i18n
* refactor json parsing for both feed types
* cleanup unnecessary comment
* refactored generation of SimplePie for XPath and JSON feeds
* Fix merge error
* Update to newer FreshRSS code
* A bit of refactoring
* doc, whitespace
* JSON Feed is in two words
* Add support for array syntax
* Whitespace
* Add OPML export/import
* Work on i18n
* Accept application/feed+json
* Rework POST
* Fix update
* OPML for cURL options
* Fix types
* Fix Typos
---------
Co-authored-by: Erion Elmasllari <elmasllari@factorsixty.com>
Co-authored-by: Alexandre Alapetite <alexandre@alapetite.fr>
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/en/developers/OPML.md | 38 |
| -rw-r--r-- | docs/en/users/11_website_scraping.md | 41 |
2 files changed, 76 insertions, 3 deletions
diff --git a/docs/en/developers/OPML.md b/docs/en/developers/OPML.md
index 5191592a8..0cc7cabbf 100644
--- a/docs/en/developers/OPML.md
+++ b/docs/en/developers/OPML.md
@@ -44,6 +44,44 @@ The following attributes are using similar naming conventions than [RSS-Bridge](
 * `frss:xPathItemCategories`: XPath expression for extracting a list of categories (tags) from the item context.
 * `frss:xPathItemUid`: XPath expression for extracting an item’s unique ID from the item context. If left empty, a hash is computed automatically.

+### JSON+DotPath
+
+* `<outline type="JSON+DotPath" ...`: Similar to `HTML+XPath` but for JSON and using a dot/bracket syntax such as `object.object.array[2].property`.
+
+* `frss:jsonItem`: JSON dot path for extracting the feed items from the source page.
+	* Example: `data.items`
+* `frss:jsonItemTitle`: JSON dot path for extracting the item’s title from the item context.
+	* Example: `meta.title`
+* `frss:jsonItemContent`: JSON dot path for extracting an item’s content from the item context.
+	* Example: `content`
+* `frss:jsonItemUri`: JSON dot path for extracting an item link from the item context.
+	* Example: `meta.links[0]`
+* `frss:jsonItemAuthor`: JSON dot path for extracting an item author from the item context.
+* `frss:jsonItemTimestamp`: JSON dot path for extracting an item timestamp from the item context. The result will be parsed by [`strtotime()`](https://php.net/strtotime).
+* `frss:jsonItemTimeFormat`: Date/Time format to parse the timestamp, according to [`DateTime::createFromFormat()`](https://php.net/datetime.createfromformat).
+* `frss:jsonItemThumbnail`: JSON dot path for extracting an item’s thumbnail (image) URL from the item context.
+* `frss:jsonItemCategories`: JSON dot path for extracting a list of categories (tags) from the item context.
+* `frss:jsonItemUid`: JSON dot path for extracting an item’s unique ID from the item context. If left empty, a hash is computed automatically.
+
+### JSON Feed
+
+* `<outline type="JSONFeed" ...`: Uses `JSON+DotPath` behind the scenes to parse a [JSON Feed](https://www.jsonfeed.org/).
+
+### cURL
+
+A number of [cURL options](https://curl.se/libcurl/c/curl_easy_setopt.html) are supported:
+
+* `frss:CURLOPT_COOKIE`
+* `frss:CURLOPT_COOKIEFILE`
+* `frss:CURLOPT_FOLLOWLOCATION`
+* `frss:CURLOPT_HTTPHEADER`
+* `frss:CURLOPT_MAXREDIRS`
+* `frss:CURLOPT_POST`
+* `frss:CURLOPT_POSTFIELDS`
+* `frss:CURLOPT_PROXY`
+* `frss:CURLOPT_PROXYTYPE`
+* `frss:CURLOPT_USERAGENT`
+
 ### Miscellaneous

 * `frss:cssFullContent`: [CSS Selector](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors) to enable the download and extraction of the matching HTML section of each articles’ Web address.
diff --git a/docs/en/users/11_website_scraping.md b/docs/en/users/11_website_scraping.md
index f51c2ba40..f725280ea 100644
--- a/docs/en/users/11_website_scraping.md
+++ b/docs/en/users/11_website_scraping.md
@@ -5,9 +5,14 @@ FreshRSS has a built-in [Web scraping](https://en.wikipedia.org/wiki/Web_scrapin
 ## How to add

 Go to “Subscription Management” where a new feed can be added.
-Change the “Type of feed source” to “HTML + XPath (Web scraping)”.
-An additional list of text boxes to configure the web scraping.
-[XPath 1.0](https://www.w3.org/TR/xpath-10/) is used as traversing language.
+Change the “Type of feed source” to one of:
+- “HTML + XPath (Web scraping)”
+- JSON Feed (see [`jsonfeed.org`](https://www.jsonfeed.org/))
+- JSON (Dotted paths)
+
+An additional list of text boxes to configure the Web scraping will show.
+
+For HTML + XPath, [XPath 1.0](https://www.w3.org/TR/xpath-10/) is used as traversing language.

 ### Get the XPath path

@@ -15,6 +20,36 @@ Firefox: the built-in “inspect” tool may be used to help create a valid XPat
 Select the node in the HTML, right click with your mouse and chose “Copy” and “XPath”.
 The XPath is stored in your clipboard now.

+### Get the JSON dotted path
+
+Suppose the JSON to which you are subscribing to (or scraping) looks like this:
+
+```json
+{
+	"data": {
+		"items": [
+			{
+				"meta": {"title": "Some news item"},
+				"content": "Content of the news",
+				"links": ["https://example.net/1", "https://example.org/1"]
+			},
+			{
+				"meta": {"title": "Some other news item"},
+				"content": "Yet more content",
+				"links": ["https://example.net/2", "https://example.org/2"]
+			}
+		]
+	}
+}
+```
+
+The *dot notation* and *bracket notation* (only numeric) are supported.
+
+Then the items are under `data.items`, and within each item, the title is `meta.title`,
+and the link would be `links[1]`.
+
+It is a similar syntax to the JavaScript way to access JSON: `object.object.array[2].property`.
+
 ## Tips & tricks

 - [Timezone of date](https://github.com/FreshRSS/FreshRSS/discussions/5483)
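Reviewer note: the dot/bracket lookup that the new user documentation describes can be illustrated with a short Python sketch. This is not FreshRSS's PHP implementation; the helper name `resolve_dot_path` and its behaviour on missing keys (returning `None`) are assumptions made only for illustration. It uses the sample JSON and the paths `data.items`, `meta.title`, and `links[1]` from the documentation in this commit.

```python
import json
import re

# Hypothetical helper, not FreshRSS code.
# Tokenizes "a.b[2].c" into key steps ("a", "b", "c") and numeric index steps (2).
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_dot_path(data, path):
    """Walk `data` along a dot/bracket path; return None if any step is missing."""
    current = data
    for key, index in _TOKEN.findall(path):
        if index:  # numeric bracket step such as [1]
            position = int(index)
            if not isinstance(current, list) or position >= len(current):
                return None
            current = current[position]
        else:  # object key step such as meta or title
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
    return current


# Sample document copied from the user documentation added in this commit.
document = json.loads("""
{
  "data": {
    "items": [
      {
        "meta": {"title": "Some news item"},
        "content": "Content of the news",
        "links": ["https://example.net/1", "https://example.org/1"]
      },
      {
        "meta": {"title": "Some other news item"},
        "content": "Yet more content",
        "links": ["https://example.net/2", "https://example.org/2"]
      }
    ]
  }
}
""")

# The item list is resolved once; the other paths are relative to each item.
items = resolve_dot_path(document, "data.items")
for item in items:
    print(resolve_dot_path(item, "meta.title"), resolve_dot_path(item, "links[1]"))
```

In this sketch bracket indices are zero-based, so `links[1]` selects the second URL of each item, and the per-item paths are evaluated against the item rather than the whole document, mirroring the "item context" wording in the OPML attribute list.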
