I.7 The Unicode Bidirectional Algorithm and Office Open XML

Office Open XML provides explicit markup to specify the classification of character runs, as well as to apply embedding and override settings to text. It is therefore important to understand how the constructs of the Unicode Bidirectional Algorithm are represented in Office Open XML.

Before displaying text contained within WordprocessingML documents, a consumer must resolve the classification of characters in each line using the Unicode Bidirectional Algorithm (http://www.unicode.org/reports/tr9/). Specifically, the sections of that document that detail resolving weak types and resolving neutral types must be followed. After applying this algorithm, the higher-level protocol specified by the rtl element is used to set the directionality of all characters that are resolved as R, AN and EN. Note that any AN and EN characters that remain after rule W7 of that document has been applied are in a right-to-left context.
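As a rough illustration of the classification step, the sketch below looks up the initial bidirectional class of a few characters with Python's standard `unicodedata` module. It shows only the starting classes; it does not implement the weak-type or neutral-type resolution rules, and the helper name `bidi_classes` is hypothetical. Note that the initial class AL is changed to R by rule W3 of the algorithm.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, initial bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hebrew alef, Arabic alef, European digit, Arabic-Indic digit, comma, Latin letter
sample = "\u05d0\u06271\u0661,a"
for ch, cls in bidi_classes(sample):
    # Print the code point rather than the character to avoid display reordering.
    print(f"U+{ord(ch):04X} {cls}")
```

The classes reported for this sample are R, AL, EN, AN, CS and L, in that order.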

This table lists the recommended way to interpret Unicode control characters in Office Open XML:

| Unicode | Equivalent markup | Comment |
|---|---|---|
| RLO (U+202E) | <w:bdo w:val="rtl"> | These Unicode characters should be represented by the equivalent markup within Office Open XML documents. The behavior resulting from directly embedding these discouraged characters within Office Open XML text is unspecified. |
| LRO (U+202D) | <w:bdo w:val="ltr"> | Same as for RLO. |
| RLE (U+202B) | <w:dir w:val="rtl"> | Same as for RLO. |
| LRE (U+202A) | <w:dir w:val="ltr"> | |
| PDF (U+202C) | </w:bdo> or </w:dir> | |
| RLM (U+200F) and LRM (U+200E) | None | These characters affect resolving the surrounding neutral and weak types while classifying text as described above. Once the characters are resolved, R runs should be tagged with <w:rtl w:val="1"/>. It is up to the implementer to preserve these control characters in the text. |
| ZWJ (U+200D) and ZWNJ (U+200C) | None | These characters should be preserved as they impact the shaping process at text rendering time. |
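The mapping in this table can be sketched as a small converter that turns explicit embedding and override characters into nested `w:dir` and `w:bdo` elements. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `controls_to_markup` is hypothetical, every text span is emitted as a single run, no `w:rtl` run properties are added, and RLM, LRM, ZWJ and ZWNJ are simply left in the text.

```python
from xml.sax.saxutils import escape

# Element name and w:val for each explicit embedding/override character.
OPEN_TAGS = {
    "\u202E": ("bdo", "rtl"),  # RLO
    "\u202D": ("bdo", "ltr"),  # LRO
    "\u202B": ("dir", "rtl"),  # RLE
    "\u202A": ("dir", "ltr"),  # LRE
}
PDF = "\u202C"

def controls_to_markup(text):
    """Replace RLO/LRO/RLE/LRE/PDF in text with w:bdo / w:dir elements."""
    out = []    # markup fragments, in document order
    stack = []  # names of the currently open w:dir / w:bdo elements
    buf = []    # literal characters waiting to be written as a run

    def flush():
        if buf:
            content = escape("".join(buf))
            out.append('<w:r><w:t xml:space="preserve">%s</w:t></w:r>' % content)
            buf.clear()

    for ch in text:
        if ch in OPEN_TAGS:
            flush()
            name, val = OPEN_TAGS[ch]
            out.append('<w:%s w:val="%s">' % (name, val))
            stack.append(name)
        elif ch == PDF:
            flush()
            if stack:  # a PDF with nothing open is ignored in this sketch
                out.append("</w:%s>" % stack.pop())
        else:
            buf.append(ch)  # RLM, LRM, ZWJ and ZWNJ pass through unchanged
    flush()
    while stack:  # close anything still open at the end of the text
        out.append("</w:%s>" % stack.pop())
    return "".join(out)

print(controls_to_markup("a\u202Bb\u202Cc"))
```

Keeping a stack of open element names lets PDF close whichever of `w:bdo` or `w:dir` was opened most recently, which mirrors the "</w:bdo> or </w:dir>" entry in the table.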

Example #1:

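The following logical text fragment, in which uppercase letters stand for right-to-left characters and [RLE] and [PDF] stand for the corresponding control characters:

he said: “[RLE]I LEAVE FOR united states TOMORROW[PDF]”.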

would be represented using the following Office Open XML markup:

<w:r>
  <w:t>he said: “</w:t>
</w:r>
<w:dir w:val="rtl">
  <w:r>
    <w:rPr>
      <w:rtl />
    </w:rPr>
    <w:t xml:space="preserve">I LEAVE FOR </w:t>
  </w:r>
  <w:r>
    <w:t>united states</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rtl />
    </w:rPr>
    <w:t xml:space="preserve"> TOMORROW</w:t>
  </w:r>
</w:dir>
<w:r>
  <w:t>”.</w:t>
</w:r>

Example #2:

The following logical text fragment:

product number: [RLO]ad-326D-FG[PDF]

would be represented using the following Office Open XML markup:

<w:r>
  <w:t xml:space="preserve">product number: </w:t>
</w:r>
<w:bdo w:val="rtl">
  <w:r>
    <w:t>ad-326D-FG</w:t>
  </w:r>
</w:bdo>

This generates the following text layout:

product number: GF-D623-da
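Because the override forces every enclosed character, digits included, to be treated as strong right-to-left, the visual order of the overridden span is the reverse of its logical order. A minimal sketch of that observation, which ignores character mirroring and shaping:

```python
logical = "ad-326D-FG"  # content of the <w:bdo w:val="rtl"> element, in logical order
visual = logical[::-1]  # every character is forced to R, so the display order reverses
print("product number: " + visual)  # prints "product number: GF-D623-da"
```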

Example #3:

The following logical text fragment:

FIRST[LRM],SECOND[LRM],THIRD

would be represented using the following Office Open XML markup:

<w:r>
  <w:rPr>
    <w:rtl />
  </w:rPr>
  <w:t>FIRST</w:t>
</w:r>
<w:r>
  <w:t>,</w:t>
</w:r>
<w:r>
  <w:rPr>
    <w:rtl />
  </w:rPr>
  <w:t>SECOND</w:t>
</w:r>
<w:r>
  <w:t>,</w:t>
</w:r>
<w:r>
  <w:rPr>
    <w:rtl />
  </w:rPr>
  <w:t>THIRD</w:t>
</w:r>
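The effect of the LRM characters in this example can be sketched with a heavily simplified resolver. The code below is an illustration under stated assumptions, not the Unicode Bidirectional Algorithm itself: it only distinguishes strong left-to-right and strong right-to-left characters, treats everything else (including digits) as neutral, assumes a left-to-right paragraph, and applies only the idea behind rules N1 and N2. The function names are hypothetical, and the LRM and RLM characters are dropped from the output, which is one of the choices left to the implementer.

```python
import unicodedata
from itertools import groupby
from xml.sax.saxutils import escape

LRM, RLM = "\u200E", "\u200F"

def strong_type(ch):
    """Return 'L' or 'R' for strong characters and None for all others."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None  # simplification: weak and neutral types are all treated as neutral

def resolve_directions(text, base="L"):
    """Give each character a direction; neutrals follow matching neighbors, else base."""
    types = [strong_type(ch) for ch in text]
    resolved = list(types)
    i = 0
    while i < len(types):
        if types[i] is not None:
            i += 1
            continue
        j = i
        while j < len(types) and types[j] is None:
            j += 1  # j ends one past the current neutral sequence
        before = types[i - 1] if i > 0 else base
        after = types[j] if j < len(types) else base
        direction = before if before == after else base
        for k in range(i, j):
            resolved[k] = direction
        i = j
    return resolved

def to_runs(text, base="L"):
    """Group characters by resolved direction and emit one w:r element per group."""
    pairs = [(d, ch) for d, ch in zip(resolve_directions(text, base), text)
             if ch not in (LRM, RLM)]
    runs = []
    for direction, group in groupby(pairs, key=lambda p: p[0]):
        chars = "".join(ch for _, ch in group)
        rpr = "<w:rPr><w:rtl/></w:rPr>" if direction == "R" else ""
        space = ' xml:space="preserve"' if chars != chars.strip() else ""
        runs.append("<w:r>%s<w:t%s>%s</w:t></w:r>" % (rpr, space, escape(chars)))
    return runs

# Three two-letter Hebrew words stand in for FIRST, SECOND and THIRD.
text = "\u05d0\u05d1" + LRM + "," + "\u05d2\u05d3" + LRM + "," + "\u05d4\u05d5"
for run in to_runs(text):
    print(run)
```

With the LRM characters present, each comma sits between a left-to-right and a right-to-left neighbor and falls back to the paragraph direction, so the sketch yields five runs that alternate between right-to-left and left-to-right, the same pattern as the markup above. Without the LRM characters, the commas would sit between two right-to-left words and the whole string would collapse into a single right-to-left run.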

End informative Annex.
