Cap'n Refsmmat Posted July 7, 2005 Posted July 7, 2005 I've been fooling around with a bit of code lately. It's supposed to convert links like [[link]] in a text to a link like http://www.example.com/index.php?title=link or whatever. Like a Wikipedia link, really. Here it is:
while($linkpos = strpos($article_text, "[[", $i) && ($linkpos != FALSE)) {
    $linkendpos = strpos($article_text, "]]", $linkpos + 1);
    $link = substr($article_text, $linkpos + 2, ($linkendpos - 1) - ($linkpos + 2));
    $newlink = '<a href="' . $site_url . 'index.php?title="' . $link . '" class="interlink">' . $link . '</a>';
    $article_text = substr($article_text, 0, $linkpos - 1) . substr($article_text, $linkpos - 1, ($linkendpos + 2) - ($linkpos - 1)) . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));
    $i = $linkendpos + 2;
}
Of course, it doesn't work. The text comes out unchanged. At the moment, $article_text is the text that is being parsed. $site_url is the URL of whatever the site is. Anyways, you're probably thinking "What was he thinking when he wrote that code?" or something like that. Well, I'm not that great at PHP. I've never had to do regex or string manipulation before, really. I'm living off of the book until I can get enough experience. And I don't have that much, at the moment. Any help would be appreciated. Thanks!
Dave Posted July 7, 2005 Posted July 7, 2005 I'd say that, in this instance, regex is definitely the way to go. No whiles or anything, and it can be done with a one-liner:
preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'"index.php?title=\\1">Linkage</a>', $article_text);
Something in that order anyway. There's an excellent tutorial on regexps here.
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 Dang, you're good. The book I have just doesn't give regex justice. I haven't had much need for it either. I'll test that out in a minute.
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 No luck. It still comes out unchanged. I'm using the url [[blah]].
Aeternus Posted July 7, 2005 Posted July 7, 2005 Just to detail what the original problem might have been, shouldn't the line
$article_text = substr($article_text, 0, $linkpos - 1) . substr($article_text, $linkpos - 1, ($linkendpos + 2) - ($linkpos - 1)) . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));
have been something along the lines of
$article_text = substr($article_text, 0, $linkpos - 1) . $newlink . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));
thereby placing the new link in the place of the old [[Link]]? Agreed with Dave, regexps are much nicer and make a lot of sense once you get to know them. To be honest, his regexp there is easier to read than the while code made to do the same thing (not saying your code is nasty, just that the regexp is short and sweet).
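For anyone who wants to keep the loop, here is a rough sketch of how it could be repaired. It is an illustration rather than the code anyone in the thread ended up using, and the function name link_wiki_titles_loop is made up. Three things change compared with the original: the strpos() result is wrapped in parentheses and compared with !== false (in the original, && binds tighter than =, so $linkpos receives a boolean, and != FALSE would also reject a match at position 0); the new link is actually spliced into the text; and the offsets no longer subtract 1. The urlencode() and htmlspecialchars() calls are an added precaution, not something the original did.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function link_wiki_titles_loop($article_text, $site_url)
{
    $offset = 0;
    // strpos() returns int 0 for a match at the very start, so test with !== false.
    while (($start = strpos($article_text, '[[', $offset)) !== false) {
        $end = strpos($article_text, ']]', $start + 2);
        if ($end === false) {
            break; // unmatched "[[": leave the rest of the text alone
        }
        // Text between the brackets, excluding the brackets themselves.
        $title = substr($article_text, $start + 2, $end - ($start + 2));
        $link = '<a href="' . $site_url . 'index.php?title=' . urlencode($title)
              . '" class="interlink">' . htmlspecialchars($title) . '</a>';
        // Splice: everything before "[[", then the link, then everything after "]]".
        $article_text = substr($article_text, 0, $start) . $link
                      . substr($article_text, $end + 2);
        // Continue scanning after the inserted link.
        $offset = $start + strlen($link);
    }
    return $article_text;
}
```

Called as link_wiki_titles_loop($article_text, $site_url), it returns the converted text instead of modifying its argument.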
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 Indeed. But to no avail, it seems. Still doesn't work. I'll read up on regex and see what I can do. Thanks for the help, people.
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 Never mind that, dave. I forgot to add $article_text = in the front. Although now the text it returns is completely blank... argh. edit: anything comes out blank with your preg_replace in it, now that it actually stores the result to $article_text. Had to comment that out for the moment. I have to give you people some credit; nobody has even replied on WebHostingTalk.
Dave Posted July 7, 2005 Posted July 7, 2005 Crap, sorry - the PHP should read:
$article_text = preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'"index.php?title=\\1">Linkage</a>', $article_text);
It might be doing the regexp replace, but it wasn't assigning it to $article_text.
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 Well then. That works (I had accidentally removed something from the original one as well, so it all came out blank) but it links right back to index.php? rather than index.php?title=blah. Hmmm...
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 Aha! You had an extra quotation mark there.
$article_text = preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'index.php?title=\\1">Linkage</a>', $article_text);
works. (You had added an extra " after $site_url.') Thanks! Now, to work on un-parsing it for when you edit the page again...
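One possible refinement of the working one-liner, offered as a sketch: it uses the matched title as the link text instead of the fixed word "Linkage", and it encodes the title before putting it into the URL and the HTML. The function name link_wiki_titles is made up for the example, and the anonymous function assumes PHP 5.3 or later.

```php
<?php
// Hypothetical wrapper around the thread's regex; the name is illustrative.
function link_wiki_titles($article_text, $site_url)
{
    return preg_replace_callback(
        '/\[\[(.*?)\]\]/', // same non-greedy pattern as in the thread
        function ($m) use ($site_url) {
            // $m[1] is the text between the double brackets.
            return '<a href="' . $site_url . 'index.php?title=' . urlencode($m[1]) . '">'
                 . htmlspecialchars($m[1]) . '</a>';
        },
        $article_text
    );
}
```

Usage would look like $article_text = link_wiki_titles($article_text, $site_url);. The callback form is chosen here only because preg_replace on its own cannot run urlencode() on the captured title.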
Dave Posted July 7, 2005 Posted July 7, 2005 Well, store the un-parsed version in the db, and then parse it every time you want it displayed. That's the simplest way.
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 But that's slower to display. I'll fool around and see what I can do. edit: sigh. I guess nothing beats the easy way... Well, thanks a lot for the help, folks!
Dave Posted July 7, 2005 Posted July 7, 2005 Well, yes. But preg_replace is stupidly fast, especially for something as simple as that regexp.
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 I'll be needing preg, too--bbcode and all that fun stuff. Is ereg_replace just as fast? That's all this book covers. (maybe I should buy one of those "Regular Expressions in a Nutshell" books) edit: dang. Doesn't look like it is. I hate this book.
radiohead Posted July 7, 2005 Posted July 7, 2005 I don't see why people always insist on buying books when ebooks are free. http://www.radiohead.is-a-geek.org/etc/ebooks I have PHP books in there if you do not like your book. Sorry I couldn't help you sooner, I just saw this post.
Cap'n Refsmmat Posted July 7, 2005 Author Posted July 7, 2005 If you had a book on PCRE syntax I'd be fine. (PCRE = Perl Compatible Regular Expression syntax. preg stuff.)
Dave Posted July 7, 2005 Posted July 7, 2005 Nice collection of eBooks there. I may have to steal a few to add to my collection.
Cap'n Refsmmat Posted July 8, 2005 Author Posted July 8, 2005 More woes from regexp land... I have this code: $article_text = eregi_replace( "[[:<:]]((http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,10}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\.\,_\?\'/\\\+&%\$#\=~;])*)[[:>:]]", '<a href="\\1">\\1</a>', $article_text); to match URLs and convert them to nice neat <a href=""></a> tags. Now I'm trying to add tags. How would I go about making this not parse the url if the url is enclosed in tags?
Aeternus Posted July 8, 2005 Posted July 8, 2005 Something that might be simpler is to use the regex you already have from Dave to replace all the instances with some symbol or set of symbols that is unique and won't likely be found in the text, storing the matched text in an array as you go (using something like preg_match or similar). Then run a simple regex that looks for links and replaces them with the notation. Finally, go back through the article and replace each special symbol with the original content stored in the array, either by finding it with strpos() and doing a bit of substr()ing, or by using preg_replace or something similar in a more complex way than normal. A good regex tutorial can be found Here.
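A minimal sketch of that placeholder idea, assuming the spans to protect are [url]...[/url] bbcode tags. The function name, the NUL-delimited token format, and the simplified URL pattern are all assumptions made for illustration; the URL pattern is much looser than the eregi one posted above and will, for instance, swallow a trailing full stop.

```php
<?php
// Hypothetical helper: protect [url] spans, autolink the rest, then restore.
function autolink_outside_url_tags($text)
{
    $stash = array();

    // 1. Swap every [url]...[/url] or [url=...]...[/url] span for a unique token.
    $text = preg_replace_callback(
        '#\[url(?:=[^\]]*)?\].*?\[/url\]#is',
        function ($m) use (&$stash) {
            // NUL bytes are assumed never to occur in article text.
            $token = "\x00" . count($stash) . "\x00";
            $stash[$token] = $m[0];
            return $token;
        },
        $text
    );

    // 2. Autolink bare URLs in what is left (simplified pattern).
    $text = preg_replace(
        '#\b((?:https?|ftp)://[^\s<>\[\]\x00]+)#i',
        '<a href="$1">$1</a>',
        $text
    );

    // 3. Put the protected spans back, untouched.
    return strtr($text, $stash);
}
```

The restored [url] spans can then be handed to whatever bbcode step runs afterwards.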
Cap'n Refsmmat Posted July 8, 2005 Author Posted July 8, 2005 Yes, indeed. I already do that with [nobb] tags so the text inside isn't parsed like the rest of it. I can just modify that code to do it for URLs, if necessary. But I think it would be easier to just make the thing not parse a url if it's already in an <a href=""> or <img src=""/> tag. A quick addition to the regex, I think.
Aeternus Posted July 8, 2005 Posted July 8, 2005 To be honest, I'm not sure it's that simple. Adding additional things to match is fine, but negating things and matching others at the same time becomes quite a bit more complicated. I tried doing what you said using Negative Lookbehind and Lookahead, but it doesn't seem easy to get it to work properly in all situations. I just can't find anything that simply requires a string not to be there for a match (this is hard to do because, if the string can't be there, the regex will just match everything after the string instead). There may be an easy but non-elegant way of doing it by negating specific characters, and I'll try a few things out when I get home, but I don't think it's as easy as it might seem.
Cap'n Refsmmat Posted July 8, 2005 Author Posted July 8, 2005
$article_text = eregi_replace("\[url=([^\[]+)\]([^\[]+)\[/url\]","<a href=\"\\1\" target=\"_blank\">\\2</a>", $article_text);
$article_text = eregi_replace("\[url\]([^\[]+)\[/url\]","<a href=\"\\1\" target=\"_blank\">\\1</a>", $article_text);
$article_text = eregi_replace("\[img\]([^\[]+)\[/img\]","<img src=\"\\1\" border=\"0\" alt=\"user posted image\"/>", $article_text);
That seems to do it. Any potential problems you can see with it? I love google... so easy to find free code. (edit: it seems to have slowed my script down a bit. All this bbcode takes an effect. Let's see if it can be made more efficient...)
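For comparison, the same three rules could be written with the preg functions, which is worth considering because the ereg family was deprecated in PHP 5.3 and removed in PHP 7. This is only a sketch: the function name bbcode_to_html is invented, and it assumes the text has already been HTML-escaped, since otherwise a quotation mark inside a URL could break out of the attribute. It also does nothing about javascript: URLs.

```php
<?php
// Hypothetical helper: the thread's three bbcode rules, expressed as PCRE patterns.
function bbcode_to_html($text)
{
    // Pattern => replacement; the /i flag mirrors eregi's case-insensitivity.
    $rules = array(
        '#\[url=([^\[\]]+)\]([^\[]+)\[/url\]#i' => '<a href="$1" target="_blank">$2</a>',
        '#\[url\]([^\[]+)\[/url\]#i'            => '<a href="$1" target="_blank">$1</a>',
        '#\[img\]([^\[]+)\[/img\]#i'            => '<img src="$1" border="0" alt="user posted image"/>',
    );
    // preg_replace accepts arrays and applies the rules in order.
    return preg_replace(array_keys($rules), array_values($rules), $text);
}
```

Passing all the patterns in one call keeps the rules in one table, which makes adding further tags a one-line change.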
Cap'n Refsmmat Posted July 8, 2005 Author Posted July 8, 2005 Would I be correct in saying that strtr() is faster than str_replace()? In my trials it seems so. Over 1000 repetitions of strtr or str_replace, strtr is faster by more than 0.01 of a second every time. Also: I'm having trouble writing code to replace all < and > with &lt; and &gt; without doing it to <table>, <tr>, <td>, and <th> tags (don't want to have to do bbcode tables).
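A timing comparison like the one described could be set up roughly as follows. The helper name time_call, the sample string, and the iteration count are all assumptions for the sake of the example, and the numbers it prints will vary with the PHP version, the input, and the machine, so it shows how to measure rather than which function wins.

```php
<?php
// Hypothetical timing helper: runs $fn $iterations times and returns elapsed seconds.
function time_call($fn, $iterations = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Sample input containing the characters being replaced.
$subject = str_repeat('a < b and c > d; ', 200);
$pairs = array('<' => '&lt;', '>' => '&gt;');

$strtr_time = time_call(function () use ($subject, $pairs) {
    return strtr($subject, $pairs);
});

$str_replace_time = time_call(function () use ($subject, $pairs) {
    return str_replace(array_keys($pairs), array_values($pairs), $subject);
});

printf("strtr: %.5fs  str_replace: %.5fs\n", $strtr_time, $str_replace_time);
```

Running it several times and with inputs of realistic size gives a fairer picture than a single run.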
Aeternus Posted July 9, 2005 Posted July 9, 2005
$a = preg_replace( '/<(?!\/{0,1}(?:table|tr|td|th)[a-zA-Z0-9\-\.,_\?\\\'\/\+&%$\#\=~;]*?>)/', '&lt;', $a ); // Less thans
Seems to work for the less thans; the greater thans are a little bit more tricky (due to restrictions placed on the length variability of Negative LookBehinds). Bear in mind that there's probably a much simpler method than this (some of it is there to make sure that something like cheese<table, which isn't a real tag, still gets matched), and I've just got back from work and it's late. I'll take another crack at the greater than replacement in the morning (after I get back from work again around 2 o'clock GMT) when I've had some sleep.
Aeternus Posted July 9, 2005 Posted July 9, 2005
$a = preg_replace( '/<(?!\/{0,1}(?:table|tr|td|th)[a-zA-Z0-9\-\.,_\?\s\\\'\/\+&%$\#\=~;]*?>)/', '&lt;', $a ); // Less thans
for the less thans, sorry (forgot to add the space to the character class in there). However, doing the same with a Negative LookBehind for the greater thans is proving more of a problem, as the LookBehind requires the regex within it to be a fixed length (i.e. the same number of characters being matched), and so things such as the non-greedy character class repetition in the less than regex won't work. I'm trying to figure out a way to do it but I haven't got my hopes up. A simpler way, as I said before with the ignoring, is simply to replace the tags you need with a special symbol or string and store them, then go back after the replacement of the <'s and >'s and replace the symbols with the correct replacements, but you said you wanted a single regex for it so...
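One way to sidestep the lookbehind limitation, sketched here as an alternative rather than as the single regex asked for: escape everything first, then turn the escaped forms of the allowed tags back into real tags. The function name escape_except_table_tags is made up, and the sketch only restores tags without attributes (<table>, </td> and so on); a tag such as <table border="1"> would stay escaped. Note that htmlspecialchars() also converts & to &amp;, which goes beyond the < and > replacement discussed above.

```php
<?php
// Hypothetical helper: escape all markup, then re-enable a whitelist of bare table tags.
function escape_except_table_tags($text)
{
    // ENT_NOQUOTES: convert <, > and & but leave quotation marks alone.
    $escaped = htmlspecialchars($text, ENT_NOQUOTES);

    // Restore only complete, attribute-less opening or closing tags from the whitelist.
    return preg_replace('#&lt;(/?(?:table|tr|td|th))&gt;#i', '<$1>', $escaped);
}
```

Because the second step only matches a complete escaped tag, a stray fragment such as cheese<table stays escaped.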