Replacing text that does NOT match a given string
Agaric likes to ask the hard questions...
ereg_replace beginning is not found
ereg_replace not present
ereg_replace not equal
ereg_replace character does not match
This is what Agaric came up with. We think that's what the ^ (caret) character does in that context, at the start of a bracket expression: negate, deny, exclude the following characters.
We used the . to also match on shttp, or any letter plus http in case they've got others. Oh man, it's https, and the . doesn't work that way inside a bracket expression anyway. Aborting that part of this.
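For the record, here's a quick test of what a negated bracket expression like [^https?://] (used in the code below) actually does. The strings are made up and it uses preg_match rather than ereg, but the behavior is the same: [^...] excludes single characters, not the string http://, so it skips absolute URLs only by accident and also skips any relative path that happens to start with h, t, p, s, a colon or a slash.
// [^...] negates single characters, not the whole string "http://".
$pattern = '~<img src="[^https?://]~';
var_dump(preg_match($pattern, '<img src="images/foo.png">'));            // int(1): 'i' is not in the set
var_dump(preg_match($pattern, '<img src="http://example.com/a.png">'));  // int(0): 'h' is in the set
var_dump(preg_match($pattern, '<img src="pics/foo.png">'));              // int(0): 'p' is in the set, so this relative path gets skipped too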
From our scraper module, no, not right:
// point image paths to original
$output = ereg_replace('<img src="/', '<img src="' . $domain . '/', $output);
// point relative image paths to original, attempt to avoid mangling full paths
$output = ereg_replace('<img src="[^http(.)?://]', '<img src="' . $domain . $path, $output);
// point URLs to original
$output = ereg_replace('<a href="', '<a href="/' . $path, $output);
// point relative URLs to original, attempt to avoid mangling full URLs
$output = ereg_replace('<a href="', '<a href="[^.http://]' . $domain . $path, $output);
Currently:
// point image paths to original
$output = ereg_replace('<img src="/', '<img src="' . $domain . '/', $output);
// point relative image paths to original, attempt to avoid mangling full paths
$output = ereg_replace('<img src="[^https?://]', '<img src="' . $domain . $path, $output);
// point URLs to original
$output = ereg_replace('<a href="', '<a href="/' . $path, $output);
// point relative URLs to original, attempt to avoid mangling full URLs
$output = ereg_replace('<a href="', '<a href="[^https?]' . $domain . $path, $output);
I think preg_replace (which is reportedly faster) works much the same, aside from wanting delimiters around the pattern.
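One thing PCRE has that POSIX ereg doesn't is the negative lookahead, (?!...), which really does mean "not followed by this string." A sketch of that idea, with made-up $domain, $path and $output values rather than the scraper module's actual code:
$domain = 'http://example.com';
$path   = '/archive/';
$output = '<img src="/logo.png"> <img src="pics/foo.png"> <img src="https://other.org/x.png">';

// Root-relative image paths: point them at the original domain.
$output = preg_replace('~<img src="/~', '<img src="' . $domain . '/', $output);

// Anything not already an absolute http(s) URL: prefix the original domain and path.
$output = preg_replace('~<img src="(?!https?://)~', '<img src="' . $domain . $path, $output);

echo $output;
// <img src="http://example.com/logo.png"> <img src="http://example.com/archive/pics/foo.png"> <img src="https://other.org/x.png">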