| Field | Value |
|---|---|
| author | 2022-02-28 20:22:43 +0100 |
| committer | 2022-02-28 20:22:43 +0100 |
| commit | 1fe66ad020ca8f0560bb9c6e311852ed77228f78 |
| tree | df78da3f33a9f13a9d6ba3f2744c369bd6e313a6 /app/views |
| parent | fa23ae76ea46b329fb65329081df95e864b03b23 |
Implement Web scraping "HTML + XPath" (#4220)
* More PHP type hints for Fever
Follow-up of https://github.com/FreshRSS/FreshRSS/pull/4201
Related to https://github.com/FreshRSS/FreshRSS/issues/4200
* Detail
* Draft
* Progress
* More draft
* Fix thumbnail PHP type hint
https://github.com/FreshRSS/FreshRSS/issues/4215
* More types
* A bit more
* Refactor FreshRSS_Entry::fromArray
* Progress
* Starts to work
* Categories
* Functional
* Layout update
* Fix relative URLs
* Cache system
* Forgotten files
* Remove a debug line
* Automatic form validation of XPath expressions
* data-leave-validation
* Fix reload action
* Simpler examples
* Fix column type for PostgreSQL
* Enforce HTTP encoding
* Readme
* Fix get full content
* target="_blank"
* gitignore
* htmlspecialchars_utf8
* Implement HTML <base>
And fix/revert `xml:base` support in SimplePie https://github.com/simplepie/simplepie/commit/e49c578817aa504d8d05cd7f33857aeda9d41908
* SimplePie upstream PR merged
https://github.com/simplepie/simplepie/pull/723
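The commit message only names the feature. As a rough illustration of what "HTML + XPath" scraping involves, here is a minimal PHP sketch, assuming PHP 8 with the DOM extension: one absolute XPath expression selects a node per item, and relative expressions (like the form's "relative" fields such as `xPathItemTitle` and `xPathItemUri`) are evaluated against each item node, with links resolved against an HTML `<base>` when the page has one. `scrapeHtmlXPath()`, `xpathString()`, `resolveUrl()` and the array keys are hypothetical names for this sketch, not FreshRSS's implementation, and the URL resolution is deliberately simplified.

```php
<?php
// Hypothetical sketch of HTML + XPath scraping; not FreshRSS's actual code.

// Evaluate an XPath expression and return its text value (first node, or scalar result).
function xpathString(DOMXPath $xp, string $expr, ?DOMNode $context = null): string
{
    $result = $xp->evaluate($expr, $context);
    if ($result instanceof DOMNodeList) {
        return $result->length > 0 ? trim($result->item(0)->textContent) : '';
    }
    return trim((string) $result); // string/number/boolean result, or false on error
}

// Simplified relative-URL resolution: handles absolute, protocol-relative,
// root-relative and path-relative links, but not "../" or query-only links.
function resolveUrl(string $base, string $href): string
{
    if ($href === '' || parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href; // empty or already absolute
    }
    $p = parse_url($base);
    if (str_starts_with($href, '//')) {
        return $p['scheme'] . ':' . $href;
    }
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (str_starts_with($href, '/')) {
        return $origin . $href;
    }
    // Keep the directory part of the base path, e.g. "/news/index.html" -> "/news/".
    $dir = preg_replace('#/[^/]*$#', '/', $p['path'] ?? '/');
    return $origin . $dir . $href;
}

/** @param array<string,string> $xpaths hypothetical keys: feedTitle, item, itemTitle, itemUri */
function scrapeHtmlXPath(string $html, string $pageUrl, array $xpaths): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed: collect libxml errors instead of printing warnings.
    // Note: without a charset declaration, loadHTML() assumes ISO-8859-1.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    // Prefer an HTML <base href="..."> over the page URL for relative links.
    $base = (string) $xp->evaluate('string(//base/@href)');
    $base = $base !== '' ? $base : $pageUrl;

    $feed = ['title' => xpathString($xp, $xpaths['feedTitle']), 'items' => []];
    foreach ($xp->query($xpaths['item']) ?: [] as $node) {
        // Item fields are evaluated relative to each item node.
        $feed['items'][] = [
            'title' => xpathString($xp, $xpaths['itemTitle'], $node),
            'uri' => resolveUrl($base, xpathString($xp, $xpaths['itemUri'], $node)),
        ];
    }
    return $feed;
}

$html = <<<HTML
<html><head><title>Example news</title><base href="https://example.org/news/"></head>
<body>
<article><h2>First</h2><a href="first.html">more</a></article>
<article><h2>Second</h2><a href="/archive/second.html">more</a></article>
</body></html>
HTML;

$feed = scrapeHtmlXPath($html, 'https://example.org/', [
    'feedTitle' => '//title',
    'item' => '//article',
    'itemTitle' => './/h2',
    'itemUri' => './/a/@href',
]);
echo $feed['items'][1]['uri'], "\n"; // https://example.org/archive/second.html
```

In this sketch the first item's link, `first.html`, resolves to `https://example.org/news/first.html` because the page's `<base>` takes precedence over the page URL, which mirrors the intent of the "Implement HTML <base>" bullet.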
Diffstat (limited to 'app/views')
| Mode | File | Lines changed |
|---|---|---:|
| -rw-r--r-- | app/views/helpers/export/articles.phtml | 2 |
| -rw-r--r-- | app/views/helpers/feed/update.phtml | 104 |
| -rw-r--r-- | app/views/index/normal.phtml | 7 |
| -rw-r--r-- | app/views/index/reader.phtml | 2 |
| -rwxr-xr-x | app/views/index/rss.phtml | 30 |
| -rw-r--r-- | app/views/subscription/add.phtml | 91 |
6 files changed, 229 insertions, 7 deletions
```diff
diff --git a/app/views/helpers/export/articles.phtml b/app/views/helpers/export/articles.phtml
index c131b8474..ad5210968 100644
--- a/app/views/helpers/export/articles.phtml
+++ b/app/views/helpers/export/articles.phtml
@@ -22,7 +22,7 @@ foreach ($this->entriesRaw as $entryRaw) {
 	if ($entryRaw == null) {
 		continue;
 	}
-	$entry = FreshRSS_EntryDAO::daoToEntry($entryRaw);
+	$entry = FreshRSS_Entry::fromArray($entryRaw);
 	if (!isset($this->feed)) {
 		$feed = FreshRSS_CategoryDAO::findFeed($this->categories, $entry->feed());
 		if ($feed === null) {
diff --git a/app/views/helpers/feed/update.phtml b/app/views/helpers/feed/update.phtml
index 264881f77..f71be5135 100644
--- a/app/views/helpers/feed/update.phtml
+++ b/app/views/helpers/feed/update.phtml
@@ -373,6 +373,110 @@
 			</div>
 		</div>
 
+		<legend><?= _t('sub.feed.kind') ?></legend>
+		<div class="form-group">
+			<label class="group-name" for="feed_kind"><?= _t('sub.feed.kind') ?></label>
+			<div class="group-controls">
+				<select name="feed_kind" id="feed_kind" class="select-show">
+					<option value="<?= FreshRSS_Feed::KIND_RSS ?>" <?= $this->feed->kind() == FreshRSS_Feed::KIND_RSS ? 'selected="selected"' : '' ?>><?= _t('sub.feed.kind.rss') ?></option>
+					<option value="<?= FreshRSS_Feed::KIND_HTML_XPATH ?>" <?= $this->feed->kind() == FreshRSS_Feed::KIND_HTML_XPATH ? 'selected="selected"' : '' ?> data-show="html_xpath"><?= _t('sub.feed.kind.html_xpath') ?></option>
+				</select>
+			</div>
+		</div>
+
+		<fieldset id="html_xpath">
+			<?php
+				$xpath = Minz_Helper::htmlspecialchars_utf8($this->feed->attributes('xpath'));
+			?>
+			<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.help') ?></p>
+			<div class="form-group">
+				<label class="group-name" for="xPathFeedTitle"><small><?= _t('sub.feed.kind.html_xpath.xpath') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.feed_title') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathFeedTitle" id="xPathFeedTitle" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['feedTitle'] ?? '' ?>"><?= $xpath['feedTitle'] ?? '' ?></textarea>
+					<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.feed_title.help') ?></p>
+				</div>
+			</div>
+			<div class="form-group">
+				<label class="group-name" for="xPathItem"><small><?= _t('sub.feed.kind.html_xpath.xpath') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.item') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathItem" id="xPathItem" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['item'] ?? '' ?>"><?= $xpath['item'] ?? '' ?></textarea>
+					<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item.help') ?></p>
+				</div>
+			</div>
+			<div class="form-group">
+				<label class="group-name" for="xPathItemTitle"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.item_title') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathItemTitle" id="xPathItemTitle" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['itemTitle'] ?? '' ?>"><?= $xpath['itemTitle'] ?? '' ?></textarea>
+					<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_title.help') ?></p>
+				</div>
+			</div>
+			<div class="form-group">
+				<label class="group-name" for="xPathItemContent"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.item_content') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathItemContent" id="xPathItemContent" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['itemContent'] ?? '' ?>"><?= $xpath['itemContent'] ?? '' ?></textarea>
+					<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_content.help') ?></p>
+				</div>
+			</div>
+			<div class="form-group">
+				<label class="group-name" for="xPathItemUri"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.item_uri') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathItemUri" id="xPathItemUri" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['itemUri'] ?? '' ?>"><?= $xpath['itemUri'] ?? '' ?></textarea>
+					<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_uri.help') ?></p>
+				</div>
+			</div>
+			<div class="form-group">
+				<label class="group-name" for="xPathItemThumbnail"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.item_thumbnail') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathItemThumbnail" id="xPathItemThumbnail" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['itemThumbnail'] ?? '' ?>"><?= $xpath['itemThumbnail'] ?? '' ?></textarea>
+					<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_thumbnail.help') ?></p>
+				</div>
+			</div>
+			<div class="form-group">
+				<label class="group-name" for="xPathItemAuthor"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.item_author') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathItemAuthor" id="xPathItemAuthor" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['itemAuthor'] ?? '' ?>"><?= $xpath['itemAuthor'] ?? '' ?></textarea>
+					<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_author.help') ?></p>
+				</div>
+			</div>
+			<div class="form-group">
+				<label class="group-name" for="xPathItemTimestamp"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.item_timestamp') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathItemTimestamp" id="xPathItemTimestamp" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['itemTimestamp'] ?? '' ?>"><?= $xpath['itemTimestamp'] ?? '' ?></textarea>
+					<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_timestamp.help') ?></p>
+				</div>
+			</div>
+			<div class="form-group">
+				<label class="group-name" for="xPathItemCategories"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+					<?= _t('sub.feed.kind.html_xpath.item_categories') ?></label>
+				<div class="group-controls">
+					<textarea class="valid-xpath" name="xPathItemCategories" id="xPathItemCategories" rows="2" cols="64" spellcheck="false"
+						data-leave-validation="<?= $xpath['itemCategories'] ?? '' ?>"><?= $xpath['itemCategories'] ?? '' ?></textarea>
+				</div>
+			</div>
+		</fieldset>
+		<div class="form-group form-actions">
+			<div class="group-controls">
+				<button class="btn btn-important"><?= _t('gen.action.submit') ?></button>
+				<button type="reset" class="btn"><?= _t('gen.action.cancel') ?></button>
+			</div>
+		</div>
+
 		<legend><?= _t('sub.feed.advanced') ?></legend>
 		<div class="form-group">
 			<label class="group-name" for="path_entries"><?= _t('sub.feed.css_path') ?></label>
diff --git a/app/views/index/normal.phtml b/app/views/index/normal.phtml
index 5dde2a171..06323dcb0 100644
--- a/app/views/index/normal.phtml
+++ b/app/views/index/normal.phtml
@@ -21,14 +21,17 @@ $today = @strtotime('today');
 </div><?php
 	$lastEntry = null;
 	$nbEntries = 0;
+	/** @var FreshRSS_Entry */
 	foreach ($this->entries as $item):
 		$lastEntry = $item;
 		$nbEntries++;
 		ob_flush();
-		$this->entry = Minz_ExtensionManager::callHook('entry_before_display', $item);
-		if ($this->entry == null) {
+		/** @var FreshRSS_Entry */
+		$item = Minz_ExtensionManager::callHook('entry_before_display', $item);
+		if ($item == null) {
 			continue;
 		}
+		$this->entry = $item;
 
 		// We most likely already have the feed object in cache
 		$this->feed = FreshRSS_CategoryDAO::findFeed($this->categories, $this->entry->feed());
diff --git a/app/views/index/reader.phtml b/app/views/index/reader.phtml
index e4fb74708..b408e3480 100644
--- a/app/views/index/reader.phtml
+++ b/app/views/index/reader.phtml
@@ -15,10 +15,12 @@ $content_width = FreshRSS_Context::$user_conf->content_width;
 </div><?php
 	$lastEntry = null;
 	$nbEntries = 0;
+	/** @var FreshRSS_Entry */
 	foreach ($this->entries as $item):
 		$lastEntry = $item;
 		$nbEntries++;
 		ob_flush();
+		/** @var FreshRSS_Entry */
 		$item = Minz_ExtensionManager::callHook('entry_before_display', $item);
 		if ($item == null) {
 			continue;
diff --git a/app/views/index/rss.phtml b/app/views/index/rss.phtml
index eedb31fa4..0b07a02f3 100755
--- a/app/views/index/rss.phtml
+++ b/app/views/index/rss.phtml
@@ -1,15 +1,26 @@
 <?php /** @var FreshRSS_View $this */ ?>
 <?= '<?xml version="1.0" encoding="UTF-8" ?>'; ?>
-<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
+<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"
+	<?= $this->rss_base == '' ? '' : ' xml:base="' . $this->rss_base . '"' ?>
+>
 	<channel>
 		<title><?= $this->rss_title ?></title>
-		<link><?= Minz_Url::display('', 'html', true) ?></link>
+		<link><?= $this->internal_rendering ? $this->rss_url : Minz_Url::display('', 'html', true) ?></link>
 		<description><?= _t('index.feed.rss_of', $this->rss_title) ?></description>
 		<pubDate><?= date('D, d M Y H:i:s O') ?></pubDate>
 		<lastBuildDate><?= gmdate('D, d M Y H:i:s') ?> GMT</lastBuildDate>
-		<atom:link href="<?= Minz_Url::display($this->url, 'html', true) ?>" rel="self" type="application/rss+xml" />
+		<atom:link href="<?= $this->internal_rendering ? $this->rss_url :
+			Minz_Url::display($this->rss_url, 'html', true) ?>" rel="self" type="application/rss+xml" />
 <?php
+/** @var FreshRSS_Entry */
 foreach ($this->entries as $item) {
+	if (!$this->internal_rendering) {
+		/** @var FreshRSS_Entry */
+		$item = Minz_ExtensionManager::callHook('entry_before_display', $item);
+		if ($item == null) {
+			continue;
+		}
+	}
 ?>
 	<item>
 		<title><?= $item->title() ?></title>
@@ -27,12 +38,23 @@ foreach ($this->entries as $item) {
 			echo "\t\t\t" , '<category>', $category, '</category>', "\n";
 		}
 	}
+	$enclosures = $item->enclosures(false);
+	if (is_array($enclosures)) {
+		foreach ($enclosures as $enclosure) {
+			// https://www.rssboard.org/media-rss
+			echo "\t\t\t" , '<media:content url="' . $enclosure['url']
+				. (empty($enclosure['medium']) ? '' : '" medium="' . $enclosure['medium'])
+				. (empty($enclosure['type']) ? '' : '" type="' . $enclosure['type'])
+				. (empty($enclosure['length']) ? '' : '" length="' . $enclosure['length'])
+				. '"></media:content>', "\n";
+		}
+	}
 ?>
 		<description><![CDATA[<?php echo $item->content(); ?>]]></description>
 		<pubDate><?= date('D, d M Y H:i:s O', $item->date(true)) ?></pubDate>
-		<guid isPermaLink="false"><?= $item->id() ?></guid>
+		<guid isPermaLink="false"><?= $item->id() > 0 ? $item->id() : $item->guid() ?></guid>
 	</item>
 <?php } ?>
diff --git a/app/views/subscription/add.phtml b/app/views/subscription/add.phtml
index 380f5434f..344e25ade 100644
--- a/app/views/subscription/add.phtml
+++ b/app/views/subscription/add.phtml
@@ -53,6 +53,97 @@
 		<details class="form-advanced">
 			<summary class="form-advanced-title">
+				<?= _t('sub.feed.kind') ?>
+			</summary>
+
+			<div class="form-group">
+				<label class="group-name" for="feed_kind"><?= _t('sub.feed.kind') ?></label>
+				<div class="group-controls">
+					<select name="feed_kind" id="feed_kind" class="select-show">
+						<option value="<?= FreshRSS_Feed::KIND_RSS ?>" selected="selected"><?= _t('sub.feed.kind.rss') ?></option>
+						<option value="<?= FreshRSS_Feed::KIND_HTML_XPATH ?>" data-show="html_xpath"><?= _t('sub.feed.kind.html_xpath') ?></option>
+					</select>
+				</div>
+			</div>
+
+			<fieldset id="html_xpath">
+				<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.help') ?></p>
+				<div class="form-group">
+					<label class="group-name" for="xPathFeedTitle"><small><?= _t('sub.feed.kind.html_xpath.xpath') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.feed_title') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathFeedTitle" id="xPathFeedTitle" rows="2" cols="64" spellcheck="false"></textarea>
+						<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.feed_title.help') ?></p>
+					</div>
+				</div>
+				<div class="form-group">
+					<label class="group-name" for="xPathItem"><small><?= _t('sub.feed.kind.html_xpath.xpath') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.item') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathItem" id="xPathItem" rows="2" cols="64" spellcheck="false"></textarea>
+						<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item.help') ?></p>
+					</div>
+				</div>
+				<div class="form-group">
+					<label class="group-name" for="xPathItemTitle"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.item_title') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathItemTitle" id="xPathItemTitle" rows="2" cols="64" spellcheck="false"></textarea>
+						<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_title.help') ?></p>
+					</div>
+				</div>
+				<div class="form-group">
+					<label class="group-name" for="xPathItemContent"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.item_content') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathItemContent" id="xPathItemContent" rows="2" cols="64" spellcheck="false"></textarea>
+						<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_content.help') ?></p>
+					</div>
+				</div>
+				<div class="form-group">
+					<label class="group-name" for="xPathItemUri"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.item_uri') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathItemUri" id="xPathItemUri" rows="2" cols="64" spellcheck="false"></textarea>
+						<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_uri.help') ?></p>
+					</div>
+				</div>
+				<div class="form-group">
+					<label class="group-name" for="xPathItemThumbnail"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.item_thumbnail') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathItemThumbnail" id="xPathItemThumbnail" rows="2" cols="64" spellcheck="false"></textarea>
+						<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_thumbnail.help') ?></p>
+					</div>
+				</div>
+				<div class="form-group">
+					<label class="group-name" for="xPathItemAuthor"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.item_author') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathItemAuthor" id="xPathItemAuthor" rows="2" cols="64" spellcheck="false"></textarea>
+						<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_author.help') ?></p>
+					</div>
+				</div>
+				<div class="form-group">
+					<label class="group-name" for="xPathItemTimestamp"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.item_timestamp') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathItemTimestamp" id="xPathItemTimestamp" rows="2" cols="64" spellcheck="false"></textarea>
+						<p class="help"><?= _i('help') ?> <?= _t('sub.feed.kind.html_xpath.item_timestamp.help') ?></p>
+					</div>
+				</div>
+				<div class="form-group">
+					<label class="group-name" for="xPathItemCategories"><small><?= _t('sub.feed.kind.html_xpath.relative') ?></small><br />
+						<?= _t('sub.feed.kind.html_xpath.item_categories') ?></label>
+					<div class="group-controls">
+						<textarea class="valid-xpath" name="xPathItemCategories" id="xPathItemCategories" rows="2" cols="64" spellcheck="false"></textarea>
+					</div>
+				</div>
+			</fieldset>
+		</details>
+
+		<details class="form-advanced">
+			<summary class="form-advanced-title">
 				<?= _t('sub.feed.advanced') ?>
 			</summary>
```
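The `rss.phtml` hunk emits one Media RSS `<media:content>` element per enclosure and omits `medium`, `type` and `length` when they are empty. The PHP sketch below reproduces that rule in a hypothetical helper, `mediaContentTag()`, and additionally escapes each attribute value with `htmlspecialchars()`. Whether FreshRSS already escapes enclosure fields before they reach this template is not visible in this diff, so the escaping is an assumption of the sketch, not a claim about the commit.

```php
<?php
// Hypothetical helper, not part of FreshRSS: renders one Media RSS <media:content>
// element from an enclosure array with the keys used in rss.phtml
// (url, medium, type, length), escaping every attribute value for XML.

/** @param array<string,string|int> $enclosure */
function mediaContentTag(array $enclosure): string
{
    $attributes = ['url' => $enclosure['url'] ?? ''];
    foreach (['medium', 'type', 'length'] as $key) {
        // Same rule as the template: skip empty optional attributes.
        if (!empty($enclosure[$key])) {
            $attributes[$key] = $enclosure[$key];
        }
    }
    $parts = [];
    foreach ($attributes as $name => $value) {
        $parts[] = $name . '="' . htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8') . '"';
    }
    return '<media:content ' . implode(' ', $parts) . '></media:content>';
}

echo mediaContentTag([
    'url' => 'https://example.org/a.jpg?w=100&h=50',
    'medium' => 'image',
    'type' => 'image/jpeg',
]), "\n";
// <media:content url="https://example.org/a.jpg?w=100&amp;h=50" medium="image" type="image/jpeg"></media:content>
```

Building the attribute list first and joining it afterwards avoids the template's pattern of opening each optional attribute with a closing quote from the previous one, which is easy to get wrong when attributes are added or reordered.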
