[PHP function] URL page image tag extraction

Jammie 2018. 1. 16. 01:33

Let's take a look at how to fetch a web page's HTML and use regular expressions to extract only its images.

(Using images from other websites without permission violates copyright. If you want to use them commercially, get the copyright holder's permission first. Otherwise, use this technique only to manage or research your own website.)

To do this, we created two functions:

- getImgTag: extracts the requested values from the HTML using a regular expression.

- getRC: fetches the HTML source of the given URL.

getImgTag first calls getRC to get the page source, then uses a regular expression to extract the img tags (or just one of their attributes, such as src) and returns the matches as an array.
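
To see what the regular expression captures without fetching a real page, here is a minimal offline sketch. The sample HTML string is made up for illustration, and the pattern has the same shape as the one getImgTag uses for img/src (with a \s before src so that data-src is not matched). In the array filled by preg_match_all, index 0 holds each whole matching tag and index 1 holds only the captured src value.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<p>Intro</p>'
    . '<img src="/images/a.png" alt="A">'
    . "<IMG class='wide' src='https://example.com/b.jpg'>"
    . '<img data-src="lazy.gif">';

// Group 1 captures the value of the src attribute; the /i flag makes <IMG> match too.
// The \s before "src" requires whitespace in front of the name, so data-src is skipped.
preg_match_all('/<img\b[^>]*\ssrc\s*=\s*["\']?([^>"\']+)["\']?[^>]*>/i', $html, $matches);

print_r($matches[0]); // the whole matching <img ...> tags
print_r($matches[1]); // only the src values
```

The third tag has no plain src attribute, so it appears in neither list.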

<?php
// Usage: getImgTag('URL address', 'tag', 'attribute')
print_r(getImgTag('{Web page URL address}', 'img', 'src'));

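// Extracts tags, or one attribute of those tags, from a page using regular expressions.
function getImgTag($url, $tag, $attribute = null)
{
    if (!empty($tag)) {
        $htmlDom = getRC($url);
        // Escape the tag name so it is matched literally inside the pattern.
        $tagPattern = preg_quote($tag, '/');
        if (empty($attribute)) {
            // Extract the entire tag, e.g. the whole <img ...> element.
            preg_match_all('/<' . $tagPattern . '\b[^>]*>/i', $htmlDom, $imageList);
            $result = $imageList[0];
        } else {
            // Extract only the attribute value, e.g. the src of each img tag.
            // The \s before the attribute name keeps "src" from matching "data-src".
            $attrPattern = preg_quote($attribute, '/');
            preg_match_all('/<' . $tagPattern . '\b[^>]*\s' . $attrPattern . '\s*=\s*["\']?([^>"\']+)["\']?[^>]*>/i', $htmlDom, $imageList);
            $result = $imageList[1];
        }
        // Return the matches as an array.
        return $result;
    } else {
        return null;
    }
}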

// Fetches the HTML source of a URL.
function getRC($url)
{
    if (ini_get('allow_url_fopen') == '1') {
        // Separate the host name, path, query string, and port.
        $parsedUrl = parse_url($url);
        $host = $parsedUrl['host'];
        $path = isset($parsedUrl['path']) ? $parsedUrl['path'] : '/';
        if (isset($parsedUrl['query'])) {
            $path .= '?' . $parsedUrl['query'];
        }
        // HTTPS needs a TLS socket (ssl://) and defaults to port 443; plain HTTP uses 80.
        $isHttps = isset($parsedUrl['scheme']) && strtolower($parsedUrl['scheme']) === 'https';
        if (isset($parsedUrl['port'])) {
            $port = $parsedUrl['port'];
        } else {
            $port = $isHttps ? 443 : 80;
        }
        $timeout = 10;
        $response = '';
        // Connect to the remote server.
        $fp = fsockopen(($isHttps ? 'ssl://' : '') . $host, $port, $errno, $errstr, $timeout);
        if (!$fp) {
            echo "Cannot retrieve $url ($errstr)";
        } else {
            // Send the request headers. "Connection: close" tells the server to close
            // the socket after the response, so the read loop below can finish.
            fputs($fp, "GET $path HTTP/1.0\r\n" .
                "Host: $host\r\n" .
                "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3\r\n" .
                "Accept: */*\r\n" .
                "Accept-Language: en-us,en;q=0.5\r\n" .
                "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n" .
                "Connection: close\r\n" .
                "Referer: http://$host\r\n\r\n");
            // Read the response until the server closes the connection.
            while (!feof($fp)) {
                $response .= fread($fp, 4096);
            }
            fclose($fp);
            // Remove the header part (everything up to the first blank line).
            $pos = strpos($response, "\r\n\r\n");
            if ($pos !== false) {
                $response = substr($response, $pos + 4);
            }
        }
    } else {
        // If allow_url_fopen is disabled, fall back to cURL (requires the curl extension).
        $curl = curl_init();
        $timeOut = 10;
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_HEADER, false);
        curl_setopt($curl, CURLOPT_TIMEOUT, $timeOut);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        $curlData = curl_exec($curl);
        curl_close($curl);
        // The body is HTML, not JSON, so use it as-is; curl_exec returns false on failure.
        $response = ($curlData === false) ? '' : $curlData;
    }
    // Return the HTML body (an empty string if nothing could be fetched).
    return $response;
}
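
Regular expressions are fragile on real-world HTML: they also match tags inside HTML comments, and a > inside an attribute value (for example in alt text) cuts the match short. As an alternative, here is a minimal sketch that swaps the regex for PHP's built-in DOMDocument parser (the dom extension, enabled in most PHP builds). The helper name getImgSrcWithDom is hypothetical and not part of the original code; the sample HTML is made up, and with the functions above you would pass getRC($url) instead.

```php
<?php
// Hypothetical helper: returns the src of every <img> element, using an HTML parser instead of a regex.
function getImgSrcWithDom($html)
{
    // DOMDocument::loadHTML rejects an empty string, so handle that case up front.
    if ($html === '') {
        return [];
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        // Skip <img> elements that have no src attribute.
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Usage with inline sample markup.
$html = '<html><body><img src="/a.png"><img alt="no src"><img data-src="lazy.gif" src=\'b.jpg\'></body></html>';
print_r(getImgSrcWithDom($html));
```

Because the parser reads attributes by name, data-src and src are never confused, and quoting style does not matter.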

License: Attribution, NonCommercial, NoDerivatives
