WordPress: How to use get_shortcode_regex()

get_shortcode_regex() (GSR() from now on) is used
to parse shortcodes from a post’s text.

I was writing a filter to take the post text,
parse the shortcodes, and modify them by adding
an “id” parameter.

After I spent some time writing a regex to parse
the shortcodes, I discovered GSR(). GSR() was
better and more complete.

Now I just had to learn to use it – and there weren’t
any docs.

Let’s Review How to Use Shortcodes

You’ve basically got five ways to use shortcodes:

[[fe-escape]]
[[fe-escape] ... [/fe-escape]]
[fe-single /]
[fe-single]
[fe-single title="foobar" name="janedoe"]Header Content[/fe-single]

The first two, the fe-escape examples, are escaped with doubled
brackets, so the raw tags are displayed instead of being processed.

The last three are the normal usages.

[fe-single /]  # A standalone shortcode.

[fe-single]  # Another standalone shortcode.

[fe-single title="foobar" name="janedoe"]Header Content[/fe-single]
# A shortcode with attributes, and wrapping content.

Now Some Code to Dig Around WordPress

In functions.php (in my child theme) I added this code:

function fe_set_ids($text) {
    $regex = get_shortcode_regex();
    $count = preg_match_all( '/'.$regex.'/s', $text, $matches, PREG_SET_ORDER );
    if ($count==0) return $text;

    throw new Exception();
    return $text;
}
add_filter( 'content_save_pre', 'fe_set_ids' );

function fe_single_shortcode() { }
function fe_escape_shortcode() { }
add_shortcode( 'fe-single', 'fe_single_shortcode' );
add_shortcode( 'fe-escape', 'fe_escape_shortcode' );

That code doesn’t do anything useful yet. It accepts the text of
a post, parses it with GSR(), and then throws an exception so we get
the debugging output.

We specify PREG_SET_ORDER so that $matches holds one array per matched
shortcode, rather than one array per capture group. If nothing matches,
we just pass the text through.

    $count = preg_match_all( '/'.$regex.'/s', $text, $matches, PREG_SET_ORDER );
    if ($count==0) return $text;
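
To see what PREG_SET_ORDER changes, here’s a small standalone sketch you can run outside WordPress. The pattern is a deliberately tiny stand-in I made up for the demo; it is not the regex GSR() returns, and its group numbers don’t line up with GSR()’s.

```php
<?php
// Hypothetical demo pattern (NOT the one get_shortcode_regex() returns):
// group 1 is the tag name, group 2 is whatever follows it up to "]".
$pattern = '/\[([\w-]+)([^\]]*)\]/';
$text    = '[fe-single /] and [fe-single title="foobar"]';

// Default flag (PREG_PATTERN_ORDER): results are grouped by capture group,
// so $by_group[1] lists the tag name of every match.
preg_match_all( $pattern, $text, $by_group );

// PREG_SET_ORDER: results are grouped by match,
// so $by_match[1] holds everything about the second match.
preg_match_all( $pattern, $text, $by_match, PREG_SET_ORDER );

echo $by_group[2][1], "\n"; // group 2 of the second match
echo $by_match[1][2], "\n"; // the same value, addressed match-first
```

With PREG_SET_ORDER you can loop over $matches and treat each entry as one shortcode, which is the shape the rest of this post relies on.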

GSR() matches only registered shortcodes, so we need to define them:

function fe_single_shortcode() { }
function fe_escape_shortcode() { }
add_shortcode( 'fe-single', 'fe_single_shortcode' );
add_shortcode( 'fe-escape', 'fe_escape_shortcode' );

The handlers are empty; they only need to exist.

Time to Blow It Up

Fire up WordPress, log in, and create a new Page. Paste this code into
the text:

[[fe-escape] ... [/fe-escape]]
[[fe-escape]]
[fe-single /]
[fe-single title="foobar" name="janedoe"]Header Content[/fe-single]

For best results, click the “Text” tab first, then paste in the code.

We’re only using four shortcodes because the parser messes up when you have
both the single and paired “fe-single” shortcodes next to each other.

When you hit “Update” the system barfs out some debugging info, including
$matches:

array (size=4)
  0 => 
    array (size=7)
      0 => string '[[fe-escape] ... [/fe-escape]]' (length=30)
      1 => string '[' (length=1)
      2 => string 'fe-escape' (length=9)
      3 => string '' (length=0)
      4 => string '' (length=0)
      5 => string ' ... ' (length=5)
      6 => string ']' (length=1)
  1 => 
    array (size=7)
      0 => string '[[fe-escape]]' (length=13)
      1 => string '[' (length=1)
      2 => string 'fe-escape' (length=9)
      3 => string '' (length=0)
      4 => string '' (length=0)
      5 => string '' (length=0)
      6 => string ']' (length=1)
  2 => 
    array (size=7)
      0 => string '[fe-single /]' (length=13)
      1 => string '' (length=0)
      2 => string 'fe-single' (length=9)
      3 => string ' ' (length=1)
      4 => string '/' (length=1)
      5 => string '' (length=0)
      6 => string '' (length=0)
  3 => 
    array (size=7)
      0 => string '[fe-single title="foobar" name="janedoe"]Header Content[/fe-single]' (length=71)
      1 => string '' (length=0)
      2 => string 'fe-single' (length=9)
      3 => string ' title="foobar" name="janedoe"' (length=34)
      4 => string '' (length=0)
      5 => string 'Header Content' (length=14)
      6 => string '' (length=0)

Let’s see what’s up here.

In each array, you have 7 elements (indexes 0 through 6), each one
partially deconstructing the shortcode.

Element 0 is the complete match (as usual).

Element 1 is '[' if the shortcode is escaped. This is paired with element 6, which is the matching closing escape character, ']'.

Element 2 is the shortcode name.

Element 3 is a string with the attributes. The fourth match shows this.

Element 4 is “/” if it’s a self-closing shortcode. The third match shows this.

Element 5 is the content that’s wrapped by the shortcode.
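
The numbered elements are easy to mix up, so here’s a small sketch that gives them names. fe_describe_match() and its array keys are hypothetical, made up for this post; the input is the third match copied from the dump above.

```php
<?php
/**
 * Give the numbered elements of one match readable names.
 * fe_describe_match() and its keys are hypothetical, for illustration only.
 */
function fe_describe_match( array $m ) {
    return array(
        'raw'          => $m[0],                              // the complete match
        'escaped'      => ( '[' === $m[1] && ']' === $m[6] ), // [[tag]] style escape
        'tag'          => $m[2],                              // the shortcode name
        'atts_string'  => $m[3],                              // unparsed attributes
        'self_closing' => ( '/' === $m[4] ),                  // [tag /]
        'content'      => $m[5],                              // text between the tags
    );
}

// The third match from the dump: [fe-single /]
$info = fe_describe_match( array( '[fe-single /]', '', 'fe-single', ' ', '/', '', '' ) );

var_dump( $info['self_closing'] ); // bool(true)
var_dump( $info['escaped'] );      // bool(false)
```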

Parsing the Attributes

The shortcode_parse_atts() function (SPA() from now on) parses element 3
and returns an array of attributes.

The code’s hacked to throw an exception when it finds an
attribute.

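function fe_set_ids($text) {
    // The incoming text appears to be slashed; remove the slashes before parsing.
    $text = stripcslashes($text);
    $regex = get_shortcode_regex();
    $count = preg_match_all( '/'.$regex.'/s', $text, $matches, PREG_SET_ORDER );
    // preg_match_all() returns 0 for no matches and false on error.
    if (!$count) return addslashes($text);

    foreach($matches as $m) {
        // Element 3 is the raw attribute string.
        $atts = shortcode_parse_atts( $m[3] );
        if ($atts) {
            // Deliberate crash so the debugging output shows $atts.
            throw new Exception();
        }
    }

    // Put the slashes back before handing the text on.
    return addslashes($text);
}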
add_filter( 'content_save_pre', 'fe_set_ids' );

It looked like the text coming in had C-style escapes, so
I used stripcslashes() to remove them. That also meant
that I needed to re-introduce them before returning the text.

And the dump includes this for $atts:

array (size=2)
  'title' => string 'foobar' (length=6)
  'name' => string 'janedoe' (length=7)

If you don’t apply stripcslashes() to the input, the attributes won’t
parse, because there is a backslash in front of
each doublequote character: title=\"foobar\"
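
Here’s a standalone sketch of that slashing round trip, using only plain PHP functions. It assumes the incoming text was slashed the way addslashes() does it, which is my reading of the behaviour rather than something I traced through WordPress.

```php
<?php
// The attribute string as you would type it in the editor.
$atts_string = ' title="foobar" name="janedoe"';

// Assumption: the text reaches the filter slashed as addslashes() would do it,
// with a backslash in front of every double quote.
$slashed = addslashes( $atts_string );

echo strlen( $atts_string ), "\n"; // 30
echo strlen( $slashed ), "\n";     // 34 (four added backslashes)

// stripcslashes() removes the backslashes again, so the round trip is lossless here.
$clean = stripcslashes( $slashed );
var_dump( $clean === $atts_string );               // bool(true)
var_dump( addslashes( $clean ) === $slashed );     // bool(true)
```

The slashed length of 34 appears to match the length reported for element 3 in the earlier dump, which was taken before the stripcslashes() call was added.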

So, there you have it. The rest of the code isn’t shown
here, but it performs a simple search-and-replace on
shortcodes that don’t have an “id='some-random-value'” attribute,
and adds the attribute.
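
For a rough idea of what such a search-and-replace could look like, here’s a minimal sketch. It is not the code I actually used: fe_add_ids() is a hypothetical name, the id format from uniqid() is just a placeholder, and the function assumes it is handed text that has already been unslashed.

```php
<?php
/**
 * Sketch: add id="..." to every non-escaped shortcode that lacks one.
 * fe_add_ids() and the id format are hypothetical. Expects unslashed text.
 */
function fe_add_ids( $text ) {
    $regex = get_shortcode_regex();

    return preg_replace_callback(
        '/' . $regex . '/s',
        function ( $m ) {
            // Elements 1 and 6 hold the extra brackets of an escaped
            // shortcode such as [[fe-escape]]; leave those untouched.
            if ( '[' === $m[1] && ']' === $m[6] ) {
                return $m[0];
            }

            // Skip shortcodes that already carry an id attribute.
            $atts = shortcode_parse_atts( $m[3] );
            if ( is_array( $atts ) && isset( $atts['id'] ) ) {
                return $m[0];
            }

            // Insert the attribute right after the shortcode name, so it lands
            // before any existing attributes and before a trailing "/".
            $prefix = '[' . $m[2];
            return $prefix . ' id="' . uniqid( 'fe-' ) . '"'
                . substr( $m[0], strlen( $prefix ) );
        },
        $text
    );
}
```

This only touches the outermost shortcode of each match; a shortcode nested inside another one’s content would need a second pass over element 5.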

Once that’s done, it’s possible to programmatically refer
to a specific shortcode in a post, and manipulate it.