Breaking down the Regex
In my previous post I used some complex Regex with PHP to manipulate some HTML.
The principles in the Regex can be used to manipulate all kinds of HTML if you know how to break it down. Reading Regex can be a pain, and I always wish people would break theirs down, so, for my own reference as much as anything else, here it is:
The Regex:
/<object\s+[^>]*width\s*=\s*(?:"([^"]+)")\s+[^>]*height\s*=\s*(?:"([^"]+)")>((<param\s+[^>]*>)*)<embed\s+[^>]*width\s*=\s*(?:"([^"]+)")\s+[^>]*height\s*=\s*(?:"([^"]+)")><\/[e]mbed><\/object>/ims
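To see what each capture group ends up holding, here is a minimal sketch that runs the regex above against a snippet. It uses Python's re module rather than the PHP from the previous post, so the / delimiters are dropped and the ims modifiers become flags. The SAMPLE_HTML markup is made up for illustration.

```python
import re

# The pattern from the post without the PHP-style / delimiters; the i, m and s
# modifiers become re.IGNORECASE, re.MULTILINE and re.DOTALL.
PATTERN = re.compile(
    r'<object\s+[^>]*width\s*=\s*(?:"([^"]+)")\s+[^>]*height\s*=\s*(?:"([^"]+)")>'
    r'((<param\s+[^>]*>)*)'
    r'<embed\s+[^>]*width\s*=\s*(?:"([^"]+)")\s+[^>]*height\s*=\s*(?:"([^"]+)")>'
    r'<\/[e]mbed><\/object>',
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)

# Hypothetical sample markup (an assumption, not the sample from the post).
SAMPLE_HTML = (
    '<object classid="x" width="425" height="344">'
    '<param name="movie" value="clip.swf">'
    '<param name="wmode" value="transparent">'
    '<embed src="clip.swf" width="425" height="344"></embed></object>'
)

match = PATTERN.search(SAMPLE_HTML)
groups = match.groups() if match else ()

# Print every back reference with its number, starting at 1.
for number, value in enumerate(groups, start=1):
    print(number, repr(value))
```

As written, the pattern only matches when the values are double-quoted, width comes before height, and height is the last attribute in each tag, directly before the >.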
The Breakdown
Wow. That's some hefty regex, so let's break it down:
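| / | start of regex string |
| <object | string literal <object |
| \s+ | 1+ (+) whitespace chars (\s) |
| [^>]* | 0+ (*) chars that are not > ([^>]) |
| width | string literal width |
| \s* | 0+ (*) whitespace chars (\s) |
| = | string literal = |
| \s* | 0+ (*) whitespace chars (\s) |
| (?:"([^"]+)") | non-capturing group (?: ) holding: string literal ", then 1+ (+) chars that are not " ([^"]) captured as back reference 1 (), then string literal " |
| \s+ | 1+ (+) whitespace chars (\s) |
| [^>]* | 0+ (*) chars that are not > ([^>]) |
| height | string literal height |
| \s* | 0+ (*) whitespace chars (\s) |
| = | string literal = |
| \s* | 0+ (*) whitespace chars (\s) |
| (?:"([^"]+)") | non-capturing group (?: ) holding: string literal ", then 1+ (+) chars that are not " ([^"]) captured as back reference 2 (), then string literal " |
| > | string literal > |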
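| ((<param\s+[^>]*>)*) | back reference 3 (outer ()) holding 0+ (*) of the following: string literal <param, 1+ (+) whitespace chars (\s), 0+ (*) chars that are not > ([^>]), string literal >. The inner () is back reference 4, which keeps only the last <param> matched |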
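| <embed | string literal <embed |
| \s+ | 1+ (+) whitespace chars (\s) |
| [^>]* | 0+ (*) chars that are not > ([^>]) |
| width | string literal width |
| \s* | 0+ (*) whitespace chars (\s) |
| = | string literal = |
| \s* | 0+ (*) whitespace chars (\s) |
| (?:"([^"]+)") | non-capturing group (?: ) holding: string literal ", then 1+ (+) chars that are not " ([^"]) captured as back reference 5 (), then string literal " |
| \s+ | 1+ (+) whitespace chars (\s) |
| [^>]* | 0+ (*) chars that are not > ([^>]) |
| height | string literal height |
| \s* | 0+ (*) whitespace chars (\s) |
| = | string literal = |
| \s* | 0+ (*) whitespace chars (\s) |
| (?:"([^"]+)") | non-capturing group (?: ) holding: string literal ", then 1+ (+) chars that are not " ([^"]) captured as back reference 6 (), then string literal " |
| ([^>]*) | 0+ (*) chars that are not > ([^>]), captured as back reference 7 () |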
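| ><\/[e]mbed><\/object> | string literal ></embed></object>. Each / is escaped with \ because / is also the regex delimiter. \e is the escape character, so as a precaution the e is placed in a character class [e] to isolate it from the backslash before it; [e] matches exactly what a plain e would |
| / | end of regex string |
| ims | modifiers: i ignores case, m makes ^ and $ match at line breaks, s lets . match newlines (this pattern uses no ^, $ or ., so only i changes what it matches) |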
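The breakdown can also be kept next to the pattern itself. Below is a rough sketch using Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments (PCRE's x modifier works along the same lines). It follows the table, so it includes the seventh group ([^>]*) even though the one-line regex above does not show it. The sample markup is hypothetical, and / needs no escaping because Python patterns have no delimiters.

```python
import re

# Each line mirrors one stretch of the breakdown table.
ANNOTATED = re.compile(r'''
    <object \s+ [^>]*                 # tag name, whitespace, any leading attributes
    width  \s*=\s* (?:"([^"]+)") \s+  # group 1: object width
    [^>]*
    height \s*=\s* (?:"([^"]+)") >    # group 2: object height, then end of tag
    ( (<param \s+ [^>]* >)* )         # group 3: all params; group 4: last param only
    <embed \s+ [^>]*
    width  \s*=\s* (?:"([^"]+)") \s+  # group 5: embed width
    [^>]*
    height \s*=\s* (?:"([^"]+)")      # group 6: embed height
    ([^>]*)                           # group 7: whatever is left before the closing bracket
    > </embed> </object>
''', re.IGNORECASE | re.VERBOSE)

# Hypothetical sample markup with an extra attribute after height on the embed.
sample = (
    '<object width="640" height="480">'
    '<param name="movie" value="a.swf">'
    '<embed src="a.swf" width="640" height="480" wmode="opaque"></embed></object>'
)

m = ANNOTATED.search(sample)
if m:
    for number in range(1, 8):
        print(number, repr(m.group(number)))
```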
Test it out! Grab the sample HTML and the regex above (everything between the start and end slashes) and paste them into http://regex.larsolavtorvik.com/
You can paste it in one section at a time to see how the regex builds up piece by piece.
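If you would rather do the piece-by-piece build-up in a script, something like the following sketch could work. It is written in Python as a stand-in for the online tester; the way the regex is cut into PIECES, the grow helper, and the sample markup are all my own assumptions for illustration.

```python
import re

# Hypothetical sample markup; any snippet of the same shape should do.
SAMPLE_HTML = (
    '<object classid="x" width="425" height="344">'
    '<param name="movie" value="clip.swf">'
    '<embed src="clip.swf" width="425" height="344"></embed></object>'
)

# The regex from the post, cut into sections (the cut points are arbitrary).
PIECES = [
    r'<object\s+[^>]*',
    r'width\s*=\s*(?:"([^"]+)")\s+[^>]*',
    r'height\s*=\s*(?:"([^"]+)")>',
    r'((<param\s+[^>]*>)*)',
    r'<embed\s+[^>]*',
    r'width\s*=\s*(?:"([^"]+)")\s+[^>]*',
    r'height\s*=\s*(?:"([^"]+)")>',
    r'<\/[e]mbed><\/object>',
]


def grow(pieces, html):
    """Add one piece at a time and record what the pattern so far matches."""
    steps = []
    pattern = ''
    for piece in pieces:
        pattern += piece
        match = re.search(pattern, html, re.IGNORECASE)
        steps.append((pattern, match.group(0) if match else None))
    return steps


steps = grow(PIECES, SAMPLE_HTML)
for pattern, matched in steps:
    # None here would mean the piece just added broke the match.
    print(repr(matched))
```

Each printed line is the text matched so far, so the first line that shows None points at the section of the regex that needs attention.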
