Replacing text that does NOT match a given string
Agaric likes to ask the hard questions...
ereg_replace beginning is not found
ereg_replace not present
ereg_replace not equal
ereg_replace character does not match
This is what Agaric came up with. We think that's what the ^ (caret) character does in that context, at the start of a bracket expression: negate, deny, exclude the following characters.
We used the . to also match on shttp, or any letter plus http in case they've got others. Oh man, it's https, and the . doesn't work that way inside a bracket expression anyway. Aborting that part of this.
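For the record, here's a quick test of what a negated bracket expression like [^https?://] (used in the code below) actually does. The strings are made up and it uses preg_match rather than ereg, but the behavior is the same: [^...] excludes single characters, not the string http://, so it skips absolute URLs only by accident and also skips any relative path that happens to start with h, t, p, s, a colon or a slash.
// [^...] negates single characters, not the whole string "http://".
$pattern = '~<img src="[^https?://]~';
var_dump(preg_match($pattern, '<img src="images/foo.png">'));            // int(1): 'i' is not in the set
var_dump(preg_match($pattern, '<img src="http://example.com/a.png">'));  // int(0): 'h' is in the set
var_dump(preg_match($pattern, '<img src="pics/foo.png">'));              // int(0): 'p' is in the set, so this relative path gets skipped too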
From our scraper module, no, not right:
// point image paths to original
$output = ereg_replace('<img src="/', '<img src="' . $domain . '/', $output);
// point relative image paths to original, attempt to avoid mangling full paths
$output = ereg_replace('<img src="[^http(.)?://]', '<img src="' . $domain . $path, $output);
// point URLs to original
$output = ereg_replace('<a href="', '<a href="/' . $path, $output);
// point relative URLs to original, attempt to avoid mangling full URLs
$output = ereg_replace('<a href="', '<a href="[^.http://]' . $domain . $path, $output);
Currently:
// point image paths to original
$output = ereg_replace('<img src="/', '<img src="' . $domain . '/', $output);
// point relative image paths to original, attempt to avoid mangling full paths
$output = ereg_replace('<img src="[^https?://]', '<img src="' . $domain . $path, $output);
// point URLs to original
$output = ereg_replace('<a href="', '<a href="/' . $path, $output);
// point relative URLs to original, attempt to avoid mangling full URLs
$output = ereg_replace('<a href="', '<a href="[^https?]' . $domain . $path, $output);
I think preg_replace (which is reportedly faster) works much the same, aside from wanting delimiters around the pattern.
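One thing PCRE has that POSIX ereg doesn't is the negative lookahead, (?!...), which really does mean "not followed by this string." A sketch of that idea, with made-up $domain, $path and $output values rather than the scraper module's actual code:
$domain = 'http://example.com';
$path   = '/archive/';
$output = '<img src="/logo.png"> <img src="pics/foo.png"> <img src="https://other.org/x.png">';

// Root-relative image paths: point them at the original domain.
$output = preg_replace('~<img src="/~', '<img src="' . $domain . '/', $output);

// Anything not already an absolute http(s) URL: prefix the original domain and path.
$output = preg_replace('~<img src="(?!https?://)~', '<img src="' . $domain . $path, $output);

echo $output;
// <img src="http://example.com/logo.png"> <img src="http://example.com/archive/pics/foo.png"> <img src="https://other.org/x.png">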