wp_kses_hair( string $attr, string[] $allowed_protocols ): array<string,

Given a string of HTML attributes and values, parse into a structured attribute list.

Description

This function performs a number of transformations while parsing attribute strings:

  • It normalizes attribute values and surrounds them with double quotes.
  • It normalizes HTML character references inside attribute values.
  • It removes “bad” URL protocols from the values of URL-bearing attributes (those listed by wp_kses_uri_attributes()).

Otherwise, this function reads the attributes as if they were part of an HTML tag. The transformations lower the risk of mis-parsing later on and keep URL sanitization consistent with the rest of the kses subsystem. Importantly, it does not decode the attribute values: special HTML syntax characters remain encoded as character references in the value property.

Example:

$attrs = wp_kses_hair( 'class="is-wide" inert data-lazy=\'&lt;img&#00062\' =/🐮=/', wp_allowed_protocols() );
$attrs === array(
    'class'     => array( 'name' => 'class', 'value' => 'is-wide', 'whole' => 'class="is-wide"', 'vless' => 'n' ),
    'inert'     => array( 'name' => 'inert', 'value' => '', 'whole' => 'inert', 'vless' => 'y' ),
    'data-lazy' => array( 'name' => 'data-lazy', 'value' => '&lt;img&gt;', 'whole' => 'data-lazy="&lt;img&gt;"', 'vless' => 'n' ),
    '='         => array( 'name' => '=', 'value' => '', 'whole' => '=', 'vless' => 'y' ),
    '🐮'        => array( 'name' => '🐮', 'value' => '/', 'whole' => '🐮="/"', 'vless' => 'n' ),
);
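
Because the value entries stay encoded, code that needs the literal text has to decode them itself. The following is a minimal sketch of that step, not part of WordPress: the $attrs array is hand-built to match the shape shown above (so the snippet runs without WordPress), and example_attr_text() is a hypothetical helper name.

```php
<?php
// Hand-built stand-in for part of the result shown in the example above.
$attrs = array(
	'inert'     => array( 'name' => 'inert', 'value' => '', 'whole' => 'inert', 'vless' => 'y' ),
	'data-lazy' => array( 'name' => 'data-lazy', 'value' => '&lt;img&gt;', 'whole' => 'data-lazy="&lt;img&gt;"', 'vless' => 'n' ),
);

/**
 * Hypothetical helper: returns the decoded text of an attribute,
 * true for a valueless (boolean) attribute, or null if it is absent.
 */
function example_attr_text( array $attrs, string $name ) {
	if ( ! isset( $attrs[ $name ] ) ) {
		return null;
	}
	if ( 'y' === $attrs[ $name ]['vless'] ) {
		return true;
	}
	// The value holds character references, so decode before using it as text.
	return html_entity_decode( $attrs[ $name ]['value'], ENT_QUOTES | ENT_HTML5, 'UTF-8' );
}

$lazy  = example_attr_text( $attrs, 'data-lazy' ); // '<img>'
$inert = example_attr_text( $attrs, 'inert' );     // true
```

Checking vless first matters here: a valueless attribute and an attribute with an empty value both carry '' in value, and only vless tells them apart.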

Parameters

$attr string required
Attribute list from HTML element to closing HTML element tag.
$allowed_protocols string[] required
Array of allowed URL protocols.

Return

array<string, array{name: string, value: string, whole: string, vless: 'y'|'n'}> Array of attribute information after parsing.
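
One way to consume this structure is to filter it by attribute name and join the normalized whole strings back into an attribute list. This is a sketch under stated assumptions: example_rebuild_attributes() is a hypothetical helper, not a WordPress function, and the input array is hand-built in the documented shape so the snippet runs on its own.

```php
<?php
/**
 * Hypothetical helper: keeps only the attributes named in $allowed_names
 * and joins their normalized `whole` strings with single spaces.
 */
function example_rebuild_attributes( array $attrs, array $allowed_names ): string {
	$kept = array();
	foreach ( $attrs as $attr ) {
		// Compare against the 'name' field rather than the array key,
		// since PHP converts numeric-looking string keys to integers.
		if ( in_array( $attr['name'], $allowed_names, true ) ) {
			$kept[] = $attr['whole'];
		}
	}
	return implode( ' ', $kept );
}

// Hand-built stand-in for a parsed attribute list.
$attrs = array(
	'class'     => array( 'name' => 'class', 'value' => 'is-wide', 'whole' => 'class="is-wide"', 'vless' => 'n' ),
	'inert'     => array( 'name' => 'inert', 'value' => '', 'whole' => 'inert', 'vless' => 'y' ),
	'data-lazy' => array( 'name' => 'data-lazy', 'value' => '&lt;img&gt;', 'whole' => 'data-lazy="&lt;img&gt;"', 'vless' => 'n' ),
);

$html_attrs = example_rebuild_attributes( $attrs, array( 'class', 'inert' ) );
// 'class="is-wide" inert'
```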

Source

function wp_kses_hair( $attr, $allowed_protocols ) {
	$attributes = array();
	$uris       = wp_kses_uri_attributes();

	$processor = new WP_HTML_Tag_Processor( "<wp {$attr}>" );
	$processor->next_token();

	$attribute_names = $processor->get_attribute_names_with_prefix( '' );
	if ( null === $attribute_names || 0 === count( $attribute_names ) ) {
		return $attributes;
	}

	$syntax_characters = array(
		'&' => '&amp;',
		'<' => '&lt;',
		'>' => '&gt;',
		"'" => '&apos;',
		'"' => '&quot;',
	);

	foreach ( $attribute_names as $name ) {
		$value   = $processor->get_attribute( $name );
		$is_bool = true === $value;
		if ( is_string( $value ) && in_array( $name, $uris, true ) ) {
			$value = wp_kses_bad_protocol( $value, $allowed_protocols );
		}

		// Reconstruct and normalize the attribute value.
		$recoded = $is_bool ? '' : strtr( $value, $syntax_characters );
		$whole   = $is_bool ? $name : "{$name}=\"{$recoded}\"";

		$attributes[ $name ] = array(
			'name'  => $name,
			'value' => $recoded,
			'whole' => $whole,
			'vless' => $is_bool ? 'y' : 'n',
		);
	}

	return $attributes;
}
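
The re-encoding step in the listing can be tried in isolation. The sketch below copies the $syntax_characters map from the source and applies it to a sample string; the sample input and the title attribute name are illustrative assumptions. When given an array, strtr() replaces in a single pass and does not rescan text it has already substituted, so the & inside a freshly produced &lt; is not encoded a second time.

```php
<?php
// Same replacement map as in the listing above.
$syntax_characters = array(
	'&' => '&amp;',
	'<' => '&lt;',
	'>' => '&gt;',
	"'" => '&apos;',
	'"' => '&quot;',
);

// Illustrative plain-text attribute value (an assumption, not from the source).
$decoded = 'Tom & "Jerry" <3';

// Single-pass replacement: each syntax character becomes a character reference.
$recoded = strtr( $decoded, $syntax_characters );
// 'Tom &amp; &quot;Jerry&quot; &lt;3'

// Rebuild the attribute the same way the listing builds `whole`.
$whole = "title=\"{$recoded}\"";
// 'title="Tom &amp; &quot;Jerry&quot; &lt;3"'
```

Because every double quote in the value is encoded, wrapping the result in double quotes cannot terminate the attribute early.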

Changelog

Version    Description
7.0.0      Reliably parses HTML via the HTML API.
1.0.0      Introduced.
