summaryrefslogtreecommitdiff
path: root/lib/lib_phpQuery.php
diff options
context:
space:
mode:
authorGravatar Alexandre Alapetite <alexandre@alapetite.fr> 2022-02-28 20:22:43 +0100
committerGravatar GitHub <noreply@github.com> 2022-02-28 20:22:43 +0100
commit1fe66ad020ca8f0560bb9c6e311852ed77228f78 (patch)
treedf78da3f33a9f13a9d6ba3f2744c369bd6e313a6 /lib/lib_phpQuery.php
parentfa23ae76ea46b329fb65329081df95e864b03b23 (diff)
Implement Web scraping "HTML + XPath" (#4220)
* More PHP type hints for Fever Follow-up of https://github.com/FreshRSS/FreshRSS/pull/4201 Related to https://github.com/FreshRSS/FreshRSS/issues/4200 * Detail * Draft * Progress * More draft * Fix thumbnail PHP type hint https://github.com/FreshRSS/FreshRSS/issues/4215 * More types * A bit more * Refactor FreshRSS_Entry::fromArray * Progress * Starts to work * Categories * Fonctional * Layout update * Fix relative URLs * Cache system * Forgotten files * Remove a debug line * Automatic form validation of XPath expressions * data-leave-validation * Fix reload action * Simpler examples * Fix column type for PostgreSQL * Enforce HTTP encoding * Readme * Fix get full content * target="_blank" * gitignore * htmlspecialchars_utf8 * Implement HTML <base> And fix/revert `xml:base` support in SimplePie https://github.com/simplepie/simplepie/commit/e49c578817aa504d8d05cd7f33857aeda9d41908 * SimplePie upstream PR merged https://github.com/simplepie/simplepie/pull/723
Diffstat (limited to 'lib/lib_phpQuery.php')
-rw-r--r--lib/lib_phpQuery.php3
1 files changed, 2 insertions, 1 deletions
diff --git a/lib/lib_phpQuery.php b/lib/lib_phpQuery.php
index 411aa120c..1fabfcb6d 100644
--- a/lib/lib_phpQuery.php
+++ b/lib/lib_phpQuery.php
@@ -436,7 +436,8 @@ class DOMDocumentWrapper {
}
protected function isXML($markup) {
// return strpos($markup, '<?xml') !== false && stripos($markup, 'xhtml') === false;
- return strpos(substr($markup, 0, 100), '<'.'?xml') !== false;
+ $head = substr($markup, 0, 100);
+ return strpos($head, '<'.'?xml') !== false && stripos($head, '<html ') === false;
}
protected function contentTypeToArray($contentType) {
$matches = explode(';', trim(strtolower($contentType)));