Posted

I've been fooling around with a bit of code lately. It's supposed to convert links like [[link]] in a text to a link like http://www.example.com/index.php?title=link or whatever. Like a Wikipedia link, really.

Here it is:

while($linkpos = strpos($article_text, "[[", $i) && ($linkpos != FALSE))
{
$linkendpos = strpos($article_text, "]]", $linkpos + 1);
$link = substr($article_text, $linkpos + 2, ($linkendpos - 1) - ($linkpos + 2));
$newlink = '<a href="' . $site_url . 'index.php?title="' . $link . '" class="interlink">' . $link . '</a>';
$article_text = substr($article_text, 0, $linkpos - 1) . substr($article_text, $linkpos - 1, ($linkendpos + 2) - ($linkpos - 1)) . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));
$i = $linkendpos + 2;
}

Of course, it doesn't work. The text comes out unchanged.

 

At the moment, $article_text is the text that is being parsed. $site_url is the url of whatever the site is.

 

Anyways, you're probably thinking "What was he thinking when he wrote that code?" or something like that. Well, I'm not that great at PHP. I've never had to do regex or string manipulation before, really. I'm living off of the book until I can get enough experience. And I don't have that much, at the moment.

 

Any help would be appreciated. Thanks!

Posted

I'd say that, in this instance, regex is definitely the way to go. No whiles or anything, and it can be done with a one-liner:

 

preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'"index.php?title=\\1">Linkage</a>', $article_text);

 

Something in that order anyway. There's an excellent tutorial on regexps here.

Posted

Just to detail what the original problem might have been, shouldn't the line -

 

$article_text = substr($article_text, 0, $linkpos - 1) . substr($article_text, $linkpos - 1, ($linkendpos + 2) - ($linkpos - 1)) . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));

 

have been something along the lines of -

 

$article_text = substr($article_text, 0, $linkpos - 1) . $newlink . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));

 

Thereby placing the new link in the place of the old [[Link]].
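
For reference, here is a minimal sketch of the loop with the substitution in place. A few other things in the original also look like they would stop it from working: `&&` binds tighter than `=`, so `$linkpos` ends up holding a boolean and the loop never runs (which would explain the unchanged text); `!= FALSE` also rejects a match at offset 0; the `- 1` offsets cut one character too many; and there is a stray `"` after `title=`. The function name `convert_wiki_links` and the use of `urlencode()` and `htmlspecialchars()` are additions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper; $site_url and the link format follow the thread.
function convert_wiki_links(string $article_text, string $site_url): string
{
    $i = 0;
    // Parenthesise the assignment, then compare with !== false,
    // because a match at offset 0 is a valid position.
    while (($linkpos = strpos($article_text, '[[', $i)) !== false) {
        $linkendpos = strpos($article_text, ']]', $linkpos + 2);
        if ($linkendpos === false) {
            break; // unterminated [[, leave the rest untouched
        }
        // Text between the brackets.
        $link = substr($article_text, $linkpos + 2, $linkendpos - ($linkpos + 2));
        $newlink = '<a href="' . $site_url . 'index.php?title=' . urlencode($link)
                 . '" class="interlink">' . htmlspecialchars($link) . '</a>';
        // Everything before [[, then the anchor, then everything after ]].
        $article_text = substr($article_text, 0, $linkpos)
                      . $newlink
                      . substr($article_text, $linkendpos + 2);
        // Continue scanning after the inserted anchor.
        $i = $linkpos + strlen($newlink);
    }
    return $article_text;
}

echo convert_wiki_links('See [[Main Page]] now.', 'http://www.example.com/');
```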

 

 

 

Agreed with Dave, regexps are much nicer and make a lot of sense once you get to know them. To be honest, his regexp there is easier to read than the while code made to do the same thing (not saying your code is nasty, just that the regexp is short and sweet).

Posted

Never mind that, dave. I forgot to add $article_text = in the front. Although now the text it returns is completely blank... argh.

 

edit: anything comes out blank with your preg_replace in it, now that it actually stores the result to $article_text. Had to comment that out for the moment.

 

I have to give you people some credit; nobody has even replied on WebHostingTalk.

Posted

Crap, sorry - the PHP should read:

 

$article_text = preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'"index.php?title=\\1">Linkage</a>', $article_text);

 

It might be doing the regexp replace, but it wasn't assigning it to $article_text :P

Posted

Well then. That works (I had accidentally removed something from the original one as well, so it all came out blank) but it links right back to index.php? rather than index.php?title=blah.

 

Hmmm...

Posted

Aha! You had an extra quotation mark there.

 

$article_text = preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'index.php?title=\\1">Linkage</a>', $article_text); 

works. (You had added an extra " after $site_url.')
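
If the link text should be the page title rather than a fixed "Linkage", one possible variation is `preg_replace_callback()`, which also gives a place to encode the title. This is only a sketch: the sample text and the `$site_url` value are made up, and the `urlencode()` and `htmlspecialchars()` calls are additions that the one-liner above does not have.

```php
<?php
// Placeholder values for illustration only.
$site_url = 'http://www.example.com/';
$article_text = 'Read [[Main Page]] and [[Help]].';

$article_text = preg_replace_callback(
    '/\[\[(.*?)\]\]/',
    function (array $m) use ($site_url): string {
        // urlencode() for the query string, htmlspecialchars() for the visible text.
        return '<a href="' . $site_url . 'index.php?title=' . urlencode($m[1]) . '">'
             . htmlspecialchars($m[1]) . '</a>';
    },
    $article_text
);

echo $article_text;
```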

 

Thanks!

 

 

Now, to work on un-parsing it for when you edit the page again...

Posted

Well, store the un-parsed version in the db, and then parse it every time you want it displayed. That's the simplest way.

Posted

But that's slower to display.

 

I'll fool around and see what I can do.

 

edit: sigh. I guess nothing beats the easy way...

 

Well, thanks a lot for the help, folks!

Posted

Well, yes. But preg_replace is stupidly fast, especially for something as simple as that regexp.

Posted

I'll be needing preg, too--bbcode and all that fun stuff.

 

Is ereg_replace just as fast? That's all this book covers.

 

(maybe I should buy one of those "Regular Expressions in a Nutshell" books)

 

edit: dang. Doesn't look like it is. I hate this book.

Posted

More woes from regexp land...

 

I have this code:

$article_text = eregi_replace( "[[:<:]]((http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,10}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\.\,_\?\'/\\\+&%\$#\=~;])*)[[:>:]]",  '<a href="\\1">\\1</a>', $article_text);

to match URLs and convert them to nice neat <a href=""></a> tags. Now I'm trying to add [img] tags and proper [url] tags. How would I go about making this not parse the url if the url is enclosed in tags?

Posted

Yes, indeed. I already do that with [nobb] tags so the text inside isn't parsed like the rest of it. I can just modify that code to do it for URLs, if necessary. But I think it would be easier to just make the thing not parse a url if it's already in an <a href=""> or <img src=""/> tag. A quick addition to the regex, I think.

Posted

To be honest, I'm not sure it's that simple. Adding additional things to match is fine, but negating things and matching others at the same time becomes quite a bit more complicated. I tried doing what you said using negative lookbehind and lookahead, but it doesn't seem easy to get it to work properly in all situations.

 

I just can't find anything that simply requires a string not to be there for a match (this is hard to do because, if the string can't be there, the regex will just match everything after the string instead). There may be an easy but inelegant way of doing it by negating specific characters, and I'll try a few things out when I get home, but I don't think it's as easy as it might seem.
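
One way around the lookaround problem, sketched here under assumptions, is to let the pattern match existing tags as well and hand them back untouched. The first alternative consumes a whole `<a>...</a>` element or any other single tag, so a URL inside it is never seen by the second alternative. The function name `linkify_bare_urls` and the simplified URL pattern are hypothetical and not the regex from the post above; URLs followed directly by punctuation would need extra care.

```php
<?php
// Hypothetical helper: link bare URLs, skip anything already inside a tag.
function linkify_bare_urls(string $text): string
{
    // Group 1: an existing <a>...</a> element or any single tag (e.g. <img .../>).
    // Group 2: a bare URL (deliberately simplified pattern).
    $pattern = '~(<a\b[^>]*>.*?</a>|<[^>]+>)|((?:https?|ftp)://[^\s<>"]+)~is';
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            if ($m[1] !== '') {
                return $m[1]; // existing markup: return it unchanged
            }
            return '<a href="' . $m[2] . '">' . $m[2] . '</a>';
        },
        $text
    );
}

echo linkify_bare_urls('Go to http://example.com/a now, or <a href="http://example.org/">here</a>');
```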

Posted

$article_text = eregi_replace("\[url=([^\[]+)\]([^\[]+)\[/url\]","<a href=\"\\1\" target=\"_blank\">\\2</a>", $article_text);
$article_text = eregi_replace("\[url\]([^\[]+)\[/url\]","<a href=\"\\1\" target=\"_blank\">\\1</a>", $article_text);
$article_text = eregi_replace("\[img\]([^\[]+)\[/img\]","<img src=\"\\1\" border=\"0\" alt=\"user posted image\"/>", $article_text);

That seems to do it. Any potential problems you can see with it?
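
One thing worth knowing: the ereg family was deprecated in PHP 5.3 and removed in PHP 7.0, so the same rules are sketched below with `preg_replace()`. The function name `parse_bbcode` is made up for illustration, and the first pattern excludes `]` from the URL part, which is slightly stricter than the original. The sketch assumes the text has already been through something like `htmlspecialchars()`, since the URL goes straight into an attribute.

```php
<?php
// Hypothetical helper: the three rules above, rewritten for the preg functions.
function parse_bbcode(string $text): string
{
    // The i modifier replaces the case-insensitivity of eregi_replace().
    $rules = [
        '~\[url=([^\[\]]+)\]([^\[]+)\[/url\]~i' => '<a href="$1" target="_blank">$2</a>',
        '~\[url\]([^\[]+)\[/url\]~i'            => '<a href="$1" target="_blank">$1</a>',
        '~\[img\]([^\[]+)\[/img\]~i'            => '<img src="$1" border="0" alt="user posted image"/>',
    ];
    // With array arguments, preg_replace() applies the rules one after another.
    return preg_replace(array_keys($rules), array_values($rules), $text);
}

echo parse_bbcode('[url=http://a.example/]A[/url] [img]http://c.example/x.png[/img]');
```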

 

I love google... so easy to find free code.

 

(edit: it seems to have slowed my script down a bit. All this bbcode takes an effect. Let's see if it can be made more efficient...)

Posted

Would I be correct in saying that strtr() is faster than str_replace()? In my trials it seems so: over 1000 repetitions of each, strtr comes out faster by 0.01 seconds or more every time.
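
A rough way to repeat that kind of trial is sketched below. The sample text and the replacement map are made up, and the timings will vary by machine and PHP version, so treat the numbers as indicative only. Note that the two functions are not always interchangeable: `str_replace()` applies its pairs one after another and can re-replace earlier output, while `strtr()` with an array works in a single pass.

```php
<?php
// Made-up sample data; 1000 repetitions as in the trial described above.
$text = str_repeat('a < b && c > d; ', 1000);
$map = ['<' => '&lt;', '>' => '&gt;'];

$start = microtime(true);
for ($n = 0; $n < 1000; $n++) {
    $viaStrtr = strtr($text, $map);
}
$strtrSeconds = microtime(true) - $start;

$start = microtime(true);
for ($n = 0; $n < 1000; $n++) {
    $viaStrReplace = str_replace(array_keys($map), array_values($map), $text);
}
$strReplaceSeconds = microtime(true) - $start;

printf("strtr: %.4fs, str_replace: %.4fs\n", $strtrSeconds, $strReplaceSeconds);
```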

 

 

Also: Having trouble making code to replace all < and > with &lt; and &gt; without doing it to <table>, <tr>, <td>, and <th> tags (don't want to have to do bbcode tables).

Posted

$a = preg_replace( '/<(?!\/{0,1}(?:table|tr|td|th)[a-zA-Z0-9\-\.,_\?\\\'\/\+&%$\#\=~;]*?>)/', '&lt;', $a ); // Less thans

 

Seems to work for the less-thans; the greater-thans are a little bit more tricky (due to restrictions placed on the length variability of negative lookbehinds). Bear in mind that there's probably a much simpler method than this (some of it is there to ensure that stuff like cheese<table, which isn't a real tag, still gets matched), and I've just got back from work and it's late. I'll take another crack at the greater-than replacement in the morning (after I get back from work again around 2 o'clock GMT) when I've had some sleep.

Posted

$a = preg_replace( '/<(?!\/{0,1}(?:table|tr|td|th)[a-zA-Z0-9\-\.,_\?\s\\\'\/\+&%$\#\=~;]*?>)/', '&lt;', $a ); // Less thans

 

for the less thans sorry (forgot to add the space to the character class in there).

 

However, doing the same with a negative lookbehind for the greater-thans is proving more of a problem, as the lookbehind requires the regex within it to be a fixed length (i.e. the same number of characters being matched), so things such as the non-greedy character class repetition in the less-than regex won't work. I'm trying to figure out a way to do it, but I haven't got my hopes up.

 

A simpler way, as I said before with the ignoring, is to replace the tags you need with a special symbol or string and store them, then go back after the replacement of the <'s and >'s and swap the symbols for the correct replacements. But you said you wanted a single regex for it, so...
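
A related option that avoids both the lookbehind and the placeholders is a single pass with `preg_replace_callback()`: match either a whole allowed tag or a lone `<` or `>`, keep the former and escape the latter. This is a swapped-in technique rather than the placeholder method described above, and the function name `escape_except_table_tags` is hypothetical. It assumes attribute values do not themselves contain `<` or `>`.

```php
<?php
// Hypothetical helper: escape < and > except in table, tr, td and th tags.
function escape_except_table_tags(string $html): string
{
    // Group 1: a complete opening or closing table/tr/td/th tag, with optional attributes.
    // Otherwise: any single < or >.
    $pattern = '~(</?(?:table|tr|td|th)(?:\s[^<>]*)?>)|[<>]~i';
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            if (isset($m[1]) && $m[1] !== '') {
                return $m[1]; // allowed tag: keep as-is
            }
            return $m[0] === '<' ? '&lt;' : '&gt;';
        },
        $html
    );
}

echo escape_except_table_tags('<table><tr><td>1 < 2</td></tr></table> cheese<table');
```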
