I really need to think of a clever name for my blog

Code samples, geeky links and other musings. Note this blog has recently moved from Posterous, so isn't fully back up to speed yet!

Cutting the Stream: Counting and stripping tags cont.

As I continue to work on truncating long activity stream posts, there are more and more steps that need to be taken to clean up the code.

I don’t have time to fully explain everything I have done lately, so for now this is more a bunch of useful links, plus the current function and classes, to point you in the right direction.

First off, some activities come from the forum and contain BBCode, which the activity stream does not parse. Updating the strip_html_tags() function to remove BBCode tags too is easy:

function strip_html_tags( $text )
{
    $text = preg_replace(
        array(
            // Remove BBCode tags such as [b], [/b] and [url=...]
            '@\[[\/!]*?[^\[\]]*?\]@siu',
            // Remove invisible content
            '@<head[^>]*?>.*?</head>@siu',
            '@<style[^>]*?>.*?</style>@siu',
            '@<script[^>]*?.*?</script>@siu',
            '@<object[^>]*?.*?</object>@siu',
            '@<embed[^>]*?.*?</embed>@siu',
            '@<applet[^>]*?.*?</applet>@siu',
            '@<noframes[^>]*?.*?</noframes>@siu',
            '@<noscript[^>]*?.*?</noscript>@siu',
            '@<noembed[^>]*?.*?</noembed>@siu',
            // Add line breaks before and after blocks
            '@</?((address)|(blockquote)|(center)|(del))@iu',
            '@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
            '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
            '@</?((table)|(th)|(td)|(caption))@iu',
            '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
            '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
            '@</?((frameset)|(frame)|(iframe))@iu',
        ),
        array(
            // One space for each BBCode / invisible-content pattern
            ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
            // A newline in front of each block-level tag ($0 is the whole match)
            "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
        ),
        $text );
    // Strip the remaining tags, then turn the newlines into <br /> elements
    $text = strip_tags( $text );
    $text = nl2br( $text );
    return $text;
}

Argh! Knackernuts! While I get on to Posterous about the continued issue with object and embed in markdown, get the code from http://pastebin.com/kYRPPYrp
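To show what the BBCode pattern on its own does, here is a minimal sketch that pulls it out into a separate helper. The name strip_bbcode_tags() is hypothetical (it is not part of the function above), and it assumes BBCode tags are simple bracketed tokens with no nested brackets.

```php
<?php
// Hypothetical helper isolating the BBCode pattern from strip_html_tags().
// Assumes tags look like [b], [/b], [url=...]: no nested brackets.
function strip_bbcode_tags($text)
{
    // '[' then an optional '/' or '!', then any non-bracket characters, then ']'
    return preg_replace('@\[[\/!]*?[^\[\]]*?\]@siu', ' ', $text);
}

$post = 'Check [b]this[/b] out: [url=http://example.com]link[/url]';
echo strip_bbcode_tags($post), "\n"; // "Check  this  out:  link "
```

One trade-off worth knowing: the pattern removes any bracketed token, so ordinary text such as "[sic]" or "[1]" disappears as well.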

strip_html_tags() now also inserts HTML line breaks before newline characters, using PHP's nl2br() function, once all tags have been stripped. The function name is perhaps becoming inaccurate, but the function is doing what I demand of it!
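The order of those last steps matters: the newlines are added while the block tags are still present, and only turned into <br /> once the tags are gone. A simplified sketch of that sequence, treating only <p> as a block tag for brevity (the real function uses the full list of block patterns):

```php
<?php
// Simplified illustration of the final steps of strip_html_tags();
// only <p> is treated as a block tag here.
$html = '<p>First paragraph</p><p>Second paragraph</p>';

// 1. Put a newline in front of every opening or closing <p> tag
$marked = preg_replace('@</?p\b@iu', "\n\$0", $html);

// 2. Remove the tags themselves; the newlines survive
$plain = strip_tags($marked);

// 3. Insert <br /> before each remaining newline
$result = nl2br($plain);

echo $result;
```

Because a closing tag and the next opening tag each get a newline, consecutive paragraphs end up separated by two <br /> elements, i.e. a blank line.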

I also had some issues getting the html_count class working in production, where it had been fine in testing: http://www.phpclasses.org/package/2653-PHP-Count-the-occurrences-of-a-given-H…

The issue here was that html_count is designed to parse the contents of an external file, not a string. I can request an individual activity as an external file, but this requires POST data to be sent. Wez Furlong’s do_post_request function set me on the right track to solving that one: http://wezfurlong.org/blog/2006/nov/http-post-from-php-without-curl
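The general idea behind a cURL-free POST is to describe the request in an HTTP stream context and then read the URL through the normal file functions. The sketch below follows that idea but is not Wez Furlong's original code; build_post_options() and post_request() are hypothetical names, and the example URL and parameter in the usage line are placeholders.

```php
<?php
// Hypothetical sketch of an HTTP POST without cURL, using PHP's http
// stream wrapper. Not the original do_post_request() implementation.

// Builds stream-context options for a form-encoded POST.
function build_post_options(array $params)
{
    $body = http_build_query($params);
    return array(
        'http' => array(
            'method'        => 'POST',
            'header'        => "Content-Type: application/x-www-form-urlencoded\r\n"
                             . 'Content-Length: ' . strlen($body) . "\r\n",
            'content'       => $body,
            'ignore_errors' => true, // still return the body on 4xx/5xx responses
        ),
    );
}

// Sends the POST and returns the response body, or null on failure.
function post_request($url, array $params)
{
    $context = stream_context_create(build_post_options($params));
    // @ suppresses the warning; failure is reported by the false return value
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage (placeholder URL and parameter):
// $html = post_request('http://example.com/activity.php', array('activity_id' => 42));
```

Keeping the option-building separate from the request makes it easy to check what will be sent without touching the network.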

However, it seemed a little (understatement of the year) inefficient to generate an external file and parse it for each activity in a stream (which can get very long when you ‘show older posts’ a few times), particularly when all the data being parsed is already in my hands at that point. So html_count needed tweaking to handle strings as well as files, and there are now two classes: string_html_tag_count and file_html_tag_count. These could easily be wrapped into one class with a switch to select the variant you want. I just haven’t spent the extra few minutes doing that, as this is all taking far too long as is!

////////////////////////////////////////////////////////////////////////////////
// html_count_class.php
//
// This script was written by Mahesh V. More, maheshmore79 at yahoo dot com
//
// This program is freeware software;
//
// To contact me: http://www.maheshmore.tk/
////////////////////////////////////////////////////////////////////////////////

/**
 * @class name  html_count
 * @shortdesc   Pattern matching and counting occurrences of HTML tag
 * @author      Mahesh V. More  maheshmore79 at yahoo dot com
 * @version     1.0.0
 * @date        26th October 2005
 * @downloaded from http://www.phpclasses.org/package/2653-PHP-Count-the-occurrences-of-a-given-HTML-tag.html
 * @modified by David Benson dmbenson1978 at gmail dot com
 * @modified string_html_tag_count added to permit counting from strings
 * @modified html_count changed to file_html_tag_count to allow sending of GET/POST params
 * @methods used  __construct() (constructor), call_html_count()
 **/

class string_html_tag_count
{
    #
    # stores count of number of tags found
    #
    public $count;

    #
    # stores regular expression pattern used for checking tag
    #
    public $pattern;

    //
    // function name: __construct()
    // description: constructor
    // purpose: to read data from a string and count regex matches
    // arguments: $string, $pattern
    // returns: nothing
    // sets: $this->count
    //
    public function __construct($string, $pattern)
    {
        $this->pattern = $pattern;
        // preg_match_all() returns the number of full matches (false on error)
        $this->count = (int) preg_match_all($this->pattern, $string);
    }
}

// file version
class file_html_tag_count
{
    #
    # stores count of number of tags found
    #
    public $count;

    #
    # stores regular expression pattern used for checking tag
    #
    public $pattern;

    //
    // function name: __construct()
    // description: constructor
    // purpose: to read data from a file or URL (optionally sending GET/POST
    //          params), then call the call_html_count() method
    // arguments: $file, $pattern, $params, $method
    // returns: nothing
    //
    public function __construct($file, $pattern, $params = null, $method = 'GET')
    {
        $this->count = 0;
        $this->pattern = $pattern;
        $cparams = array(
            'http' => array(
                'method' => $method,
                'ignore_errors' => true
            )
        );
        if ($params !== null) {
            $params = http_build_query($params);
            if ($method == 'POST') {
                $cparams['http']['header'] = "Content-Type: application/x-www-form-urlencoded\r\n";
                $cparams['http']['content'] = $params;
            } else {
                // GET: append the query string to the URL being read
                $file .= '?' . $params;
            }
        }
        $context = stream_context_create($cparams);
        // Read the whole response at once so no tag is split across chunk boundaries
        $contents = file_get_contents($file, false, $context);
        if ($contents !== false) {
            $this->call_html_count($contents);
        }
    }

    //
    // function name: call_html_count()
    // description: count tags
    // purpose: to perform pattern matching and add the number of matches to $this->count
    // arguments: $contents
    // returns: nothing
    //
    public function call_html_count($contents)
    {
        $this->count += (int) preg_match_all($this->pattern, $contents);
    }
}
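As suggested above, the string and file variants could be wrapped into one class with a switch. Here is a rough sketch of what that might look like; html_tag_count and its $type parameter are hypothetical names, and for brevity the file branch does not send GET/POST parameters.

```php
<?php
// Hypothetical merged class: html_tag_count is an illustrative name,
// not part of the original html_count package.
class html_tag_count
{
    // number of pattern matches found
    public $count = 0;

    // $source:  an HTML string, or a local path/URL when $type is 'file'
    // $pattern: PCRE pattern for the tag to count, e.g. '@<a\b[^>]*>@i'
    // $type:    'string' or 'file'
    public function __construct($source, $pattern, $type = 'string')
    {
        switch ($type) {
            case 'string':
                $html = $source;
                break;
            case 'file':
                $html = file_get_contents($source);
                if ($html === false) {
                    throw new RuntimeException("Could not read $source");
                }
                break;
            default:
                throw new InvalidArgumentException("Unknown source type: $type");
        }

        // Match against the whole document in one go, so no tag can be
        // split across a read boundary
        $found = preg_match_all($pattern, $html);
        if ($found === false) {
            throw new RuntimeException('Invalid pattern or subject');
        }
        $this->count = $found;
    }
}

// Usage: count the opening <a> tags in an activity already held as a string
$counter = new html_tag_count('<a href="#">1</a> <a href="#">2</a>', '@<a\b[^>]*>@i');
echo $counter->count, "\n"; // 2
```

To support GET/POST parameters in the file branch, the stream-context setup from file_html_tag_count could be moved in and passed to file_get_contents().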