
JS RX: Strict Mode interferes with back references

Strict mode prevents the use of back references such as \4 inside a plain string passed to new RegExp(): the string literal parser treats them as octal escape sequences and throws a syntax error. Escape the backslash in the string (\\4) instead.

This sucks.

The explanation:

I have a regex:

This little snippet is used to extract properties from objects portrayed as strings. They aren’t valid JSON, but they are highly predictable, so I’m parsing them myself.

This string: htmlSelector: '.awe-display-code' will produce the following result:

This appears to be all fine and dandy, but we are matching based on this crucial chunk: [\'\"`] which appears on either side of our property value. Each occurrence matches any one of the three quote characters, independently of the other.

Let’s say our value is encapsulated with single quotes: if a double quote or backtick is present in the value, the regex will stop matching there and then.

(Note: for the sake of this explanation, we aren’t concerned with handling escaped single quotes.)
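To make the failure concrete, here is a minimal sketch. The pattern is a simplified, hypothetical stand-in rather than the real one from my parser, and the input string is made up for illustration:

```javascript
// Hypothetical, simplified pattern: name, colon, any quote, value, any quote.
const loosePattern = /(\w+)\s*:\s*['"`](.*?)['"`]/;

// The value is wrapped in single quotes but contains double quotes.
const looseMatch = loosePattern.exec(`label: 'say "hi" to all'`);

// The lazy value group stops at the first quote of ANY kind.
console.log(looseMatch[2]); // 'say '
```

The closing character class is happy with the first double quote it meets, so the value is cut short.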

We can ensure that the second encapsulation character matches the first with a backreference. By wrapping the encapsulating chars in parentheses to make a group, thus: ([\'\"`]) we can refer to them later in the regex by their group index. In this case the index is 4, as we’ve added it before the group containing the property we wanted. Our regex now looks like this:
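As a rough sketch of the same idea, written as a regex literal: the pattern below is a simplified, hypothetical one, so the quote group lands at index 2 here rather than 4, and the input string is made up.

```javascript
// Group 1: property name, group 2: opening quote, group 3: value.
// \2 demands the same character that group 2 captured.
const strictQuotes = /(\w+)\s*:\s*(['"`])(.*?)\2/;

const quotedMatch = strictQuotes.exec(`label: 'say "hi" to all'`);

console.log(quotedMatch[2]); // "'"
console.log(quotedMatch[3]); // 'say "hi" to all'
```

The double quotes inside the value are now skipped, because only another single quote can close the value.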

Problem solved!

Sadly, if our file declares 'use strict'; this will throw an error.

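The parser is mistaking \4 for an octal escape sequence inside the string literal, and octal escapes are forbidden in strict mode. The failure happens while the string itself is parsed, before the regex engine ever sees the pattern.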
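A small sketch that shows the difference between the two modes. It compiles source text with new Function so the syntax error can be caught at runtime; tryCompile is a hypothetical helper, and (a)\1 is a stand-in pattern with a single group instead of four:

```javascript
// Compile a function body from source text and report what happens.
function tryCompile(body) {
  try {
    return { ok: true, value: new Function(body)() };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// Body text seen by the parser: return "(a)\1".length;
const sloppy = tryCompile('return "(a)\\1".length;');
// Same body text, preceded by the strict mode directive.
const strict = tryCompile('"use strict"; return "(a)\\1".length;');

console.log(sloppy); // { ok: true, value: 4 }
console.log(strict); // { ok: false, error: 'SyntaxError' }
```

Note that the non-strict version does not work either: it quietly turns \1 into a single control character, so the backreference never reaches the regex. Strict mode just makes the mistake loud.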

The solution:

Instead of passing the pattern to new RegExp(string) as written, pass it as an escaped string, doubling each backslash so that \4 becomes \\4:
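A minimal sketch of the fix, again using a simplified, hypothetical pattern (the quote group is at index 2 here, not 4) and a hypothetical helper named extractProperty:

```javascript
function extractProperty(text) {
  'use strict';
  // Every backslash is doubled, so the regex engine receives:
  //   (\w+)\s*:\s*(['"`])(.*?)\2
  const pattern = new RegExp('(\\w+)\\s*:\\s*([\'"`])(.*?)\\2');
  const m = pattern.exec(text);
  return m ? { name: m[1], value: m[3] } : null;
}

const extracted = extractProperty("htmlSelector: '.awe-display-code'");
console.log(extracted); // { name: 'htmlSelector', value: '.awe-display-code' }
```

The doubled backslash survives string parsing as a single backslash, which the regex engine then reads as a backreference. A regex literal would sidestep the string escaping altogether, but that is not an option when the pattern has to be built from a string.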