Let's look at how to fetch a web page's HTML and use regular expressions to extract only the images from it.
(Using images from other websites without permission violates copyright, so if you want to use them commercially, get the copyright holder's permission first. Otherwise, use this technique only for managing or researching your own website.)
First, we created two functions:
- getImgTag extracts the required values using a regular expression.
- getRC retrieves the HTML source of a URL.
getImgTag calls getRC to get the page source, matches the img tags with a regular expression, and returns the matched tags (or their attribute values) as an array.
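To see what the regular expression step returns without fetching a live page, here is a minimal sketch that runs the same pattern shape as getImgTag on an inline HTML string. The helper name extractTagAttribute and the sample HTML are hypothetical; preg_quote is added so that tag or attribute names containing regex metacharacters are matched literally.

```php
<?php
// Minimal sketch (hypothetical helper): the regex step of getImgTag applied to
// an inline HTML string, so no network request is needed.
function extractTagAttribute(string $html, string $tag, string $attribute): array
{
    // Same pattern shape as getImgTag, with the names escaped by preg_quote.
    $pattern = '/<' . preg_quote($tag, '/') . '[^>]*' . preg_quote($attribute, '/')
        . '=["\']?([^>"\']+)["\']?[^>]*>/i';
    preg_match_all($pattern, $html, $matches);
    // $matches[0] holds the whole matched tags, $matches[1] the captured values.
    return ['tags' => $matches[0], 'values' => $matches[1]];
}

// Made-up sample: one lowercase tag with double quotes, one uppercase tag with single quotes.
$html = '<p>Logo: <img src="/logo.png" alt="logo"></p><IMG SRC=\'photo.jpg\'>';
$result = extractTagAttribute($html, 'img', 'src');
print_r($result['values']); // prints the two src values: /logo.png and photo.jpg
```

Index 0 of the preg_match_all result holds the full tags and index 1 the captured values, which is why getImgTag returns $imageList[0] when no attribute is given. Note that because [^>]* can end just before "src=", this pattern also matches attributes such as data-src; a regular expression is a quick approximation, not a full HTML parser.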
<?php
// getImgTag('URL address', 'Tag', 'Attribute')
print_r(getImgTag('{Web page URL address}', 'img', 'src'));
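// This function extracts whole tags, or one attribute value of those tags, using a regular expression.
function getImgTag($url, $tag, $attribute = null)
{
    if (!empty($tag)) {
        $htmlDom = getRC($url);
        // Escape the names so they are matched literally inside the pattern.
        $tagPattern = preg_quote($tag, '/');
        $attributePattern = preg_quote((string) $attribute, '/');
        preg_match_all('/<' . $tagPattern . '[^>]*' . $attributePattern . '=["\']?([^>"\']+)["\']?[^>]*>/i', (string) $htmlDom, $imageList);
        $result = null;
        if (empty($attribute)) {
            // Extract the entire img tag.
            $result = $imageList[0];
        } else {
            // Extract only the requested attribute value (e.g. src) of each img tag.
            $result = $imageList[1];
        }
        // Return in array form.
        return $result;
    } else {
        return null;
    }
}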
// HTML extraction function: returns the HTML body of the page at the URL address.
function getRC($url)
{
    $response = '';
    if (ini_get('allow_url_fopen') == '1') {
        // Separate the scheme, hostname, path, and port values.
        $parsedUrl = parse_url($url);
        if ($parsedUrl === false || !isset($parsedUrl['host'])) {
            echo "Invalid URL: $url";
            return $response;
        }
        $host = $parsedUrl['host'];
        $isHttps = isset($parsedUrl['scheme']) && strtolower($parsedUrl['scheme']) === 'https';
        if (isset($parsedUrl['path'])) {
            $path = $parsedUrl['path'];
        } else {
            $path = '/';
        }
        if (isset($parsedUrl['query'])) {
            $path .= '?' . $parsedUrl['query'];
        }
        if (isset($parsedUrl['port'])) {
            $port = $parsedUrl['port'];
        } else {
            // Default port: 443 for https, 80 for http.
            $port = $isHttps ? 443 : 80;
        }
        $timeout = 10;
        // Connect to the remote server; the ssl:// prefix adds TLS for https URLs.
        $fp = fsockopen(($isHttps ? 'ssl://' : '') . $host, $port, $errno, $errstr, $timeout);
        if (!$fp) {
            echo "Cannot retrieve $url: $errstr ($errno)";
        } else {
            stream_set_timeout($fp, $timeout);
            // Send the request headers. "Connection: close" makes the server end
            // the connection after the response, so the read loop below finishes.
            fputs($fp, "GET $path HTTP/1.0\r\n" .
                "Host: $host\r\n" .
                "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3\r\n" .
                "Accept: */*\r\n" .
                "Accept-Language: en-us,en;q=0.5\r\n" .
                "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n" .
                "Connection: close\r\n" .
                "Referer: http://$host\r\n\r\n");
            // Read the response until the server closes the connection or the read times out.
            while (($chunk = fread($fp, 4096)) !== false && $chunk !== '') {
                $response .= $chunk;
            }
            fclose($fp);
            // Remove the header part (everything up to the first blank line).
            $pos = strpos($response, "\r\n\r\n");
            if ($pos !== false) {
                $response = substr($response, $pos + 4);
            }
        }
    } else {
        // If allow_url_fopen is disabled, fall back to cURL.
        $curl = curl_init();
        $timeOut = 10;
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_HEADER, false);
        curl_setopt($curl, CURLOPT_TIMEOUT, $timeOut);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        $curlData = curl_exec($curl);
        curl_close($curl);
        // The body is HTML, not JSON, so keep it as a string (curl_exec returns false on failure).
        $response = ($curlData === false) ? '' : $curlData;
    }
    // Return the HTML body.
    return $response;
}
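Because getRC talks to the server over a raw socket, the data it reads still starts with the status line and headers, and the "Remove the header part" step drops everything up to the first blank line. Here is a minimal sketch of that step on a canned response string, which also parses the status line and headers so the structure of the response is visible. The function name splitHttpResponse and the sample response are hypothetical.

```php
<?php
// Minimal sketch (hypothetical helper): split a raw HTTP/1.0 response at the
// first blank line (CRLF CRLF) into status line, headers, and body.
function splitHttpResponse(string $raw): array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        // No header/body separator found: return empty parts.
        return ['status' => '', 'headers' => [], 'body' => ''];
    }
    $headerLines = explode("\r\n", substr($raw, 0, $pos));
    // The first line is the status line, e.g. "HTTP/1.0 200 OK".
    $status = array_shift($headerLines);
    $headers = [];
    foreach ($headerLines as $line) {
        // Each header is "Name: value"; split only on the first colon.
        [$name, $value] = array_pad(explode(':', $line, 2), 2, '');
        $headers[strtolower(trim($name))] = trim($value);
    }
    return [
        'status' => $status,
        'headers' => $headers,
        'body' => substr($raw, $pos + 4), // skip the 4-byte "\r\n\r\n" separator
    ];
}

// Made-up response for illustration.
$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<img src=\"a.png\">";
$parts = splitHttpResponse($raw);
echo $parts['status'], "\n";                   // HTTP/1.0 200 OK
echo $parts['headers']['content-type'], "\n";  // text/html
echo $parts['body'], "\n";                     // <img src="a.png">
```

This sketch does not decode chunked transfer encoding, which is an HTTP/1.1 feature, so it suits the HTTP/1.0 request that getRC sends.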