| author | John MacFarlane <jgm@berkeley.edu> | 2019-11-11 12:38:43 -0800 | 
|---|---|---|
| committer | John MacFarlane <jgm@berkeley.edu> | 2019-11-11 12:38:43 -0800 | 
| commit | d4711bb865a17dcefb3b0907c0d452ef49c33c16 (patch) | |
| tree | aa58c04ccffb74b7c7368dd834e48db00b02146e /test | |
| parent | ca83398c7aed70a73b010a6ce9366bac90b7c32d (diff) | |
Update spec.txt.
Diffstat (limited to 'test')
| -rw-r--r-- | test/spec.txt | 730 | 
1 files changed, 370 insertions, 360 deletions
| diff --git a/test/spec.txt b/test/spec.txt index a09394e..1197d1b 100644 --- a/test/spec.txt +++ b/test/spec.txt @@ -326,6 +326,9 @@ A [space](@) is `U+0020`.  A [non-whitespace character](@) is any character  that is not a [whitespace character]. +An [ASCII control character](@) is a character between `U+0000–1F` (both +including) or `U+007F`. +  An [ASCII punctuation character](@)  is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,  `*`, `+`, `,`, `-`, `.`, `/` (U+0021–2F),  @@ -478,6 +481,347 @@ bar  For security reasons, the Unicode character `U+0000` must be replaced  with the REPLACEMENT CHARACTER (`U+FFFD`). + +## Backslash escapes + +Any ASCII punctuation character may be backslash-escaped: + +```````````````````````````````` example +\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ +. +<p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p> +```````````````````````````````` + + +Backslashes before other characters are treated as literal +backslashes: + +```````````````````````````````` example +\→\A\a\ \3\φ\« +. +<p>\→\A\a\ \3\φ\«</p> +```````````````````````````````` + + +Escaped characters are treated as regular characters and do +not have their usual Markdown meanings: + +```````````````````````````````` example +\*not emphasized* +\<br/> not a tag +\[not a link](/foo) +\`not code` +1\. not a list +\* not a list +\# not a heading +\[foo]: /url "not a reference" +\ö not a character entity +. +<p>*not emphasized* +<br/> not a tag +[not a link](/foo) +`not code` +1. not a list +* not a list +# not a heading +[foo]: /url "not a reference" +&ouml; not a character entity</p> +```````````````````````````````` + + +If a backslash is itself escaped, the following character is not: + +```````````````````````````````` example +\\*emphasis* +. +<p>\<em>emphasis</em></p> +```````````````````````````````` + + +A backslash at the end of the line is a [hard line break]: + +```````````````````````````````` example +foo\ +bar +. 
+<p>foo<br /> +bar</p> +```````````````````````````````` + + +Backslash escapes do not work in code blocks, code spans, autolinks, or +raw HTML: + +```````````````````````````````` example +`` \[\` `` +. +<p><code>\[\`</code></p> +```````````````````````````````` + + +```````````````````````````````` example +    \[\] +. +<pre><code>\[\] +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +~~~ +\[\] +~~~ +. +<pre><code>\[\] +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +<http://example.com?find=\*> +. +<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> +```````````````````````````````` + + +```````````````````````````````` example +<a href="/bar\/)"> +. +<a href="/bar\/)"> +```````````````````````````````` + + +But they work in all other contexts, including URLs and link titles, +link references, and [info strings] in [fenced code blocks]: + +```````````````````````````````` example +[foo](/bar\* "ti\*tle") +. +<p><a href="/bar*" title="ti*tle">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo] + +[foo]: /bar\* "ti\*tle" +. +<p><a href="/bar*" title="ti*tle">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +``` foo\+bar +foo +``` +. +<pre><code class="language-foo+bar">foo +</code></pre> +```````````````````````````````` + + +## Entity and numeric character references + +Valid HTML entity references and numeric character references +can be used in place of the corresponding Unicode character, +with the following exceptions: + +- Entity and character references are not recognized in code +  blocks and code spans. + +- Entity and character references cannot stand in place of +  special characters that define structural elements in +  CommonMark.  
For example, although `*` can be used +  in place of a literal `*` character, `*` cannot replace +  `*` in emphasis delimiters, bullet list markers, or thematic +  breaks. + +Conforming CommonMark parsers need not store information about +whether a particular character was represented in the source +using a Unicode character or an entity reference. + +[Entity references](@) consist of `&` + any of the valid +HTML5 entity names + `;`. The +document <https://html.spec.whatwg.org/entities.json> +is used as an authoritative source for the valid entity +references and their corresponding code points. + +```````````````````````````````` example +  & © Æ Ď +¾ ℋ ⅆ +∲ ≧̸ +. +<p>  & © Æ Ď +¾ ℋ ⅆ +∲ ≧̸</p> +```````````````````````````````` + + +[Decimal numeric character +references](@) +consist of `&#` + a string of 1--7 arabic digits + `;`. A +numeric character reference is parsed as the corresponding +Unicode character. Invalid Unicode code points will be replaced by +the REPLACEMENT CHARACTER (`U+FFFD`).  For security reasons, +the code point `U+0000` will also be replaced by `U+FFFD`. + +```````````````````````````````` example +# Ӓ Ϡ � +. +<p># Ӓ Ϡ �</p> +```````````````````````````````` + + +[Hexadecimal numeric character +references](@) consist of `&#` + +either `X` or `x` + a string of 1-6 hexadecimal digits + `;`. +They too are parsed as the corresponding Unicode character (this +time specified with a hexadecimal numeral instead of decimal). + +```````````````````````````````` example +" ആ ಫ +. +<p>" ആ ಫ</p> +```````````````````````````````` + + +Here are some nonentities: + +```````````````````````````````` example +  &x; &#; &#x; +� +&#abcdef0; +&ThisIsNotDefined; &hi?; +. 
+<p>&nbsp &x; &#; &#x; +&#87654321; +&#abcdef0; +&ThisIsNotDefined; &hi?;</p> +```````````````````````````````` + + +Although HTML5 does accept some entity references +without a trailing semicolon (such as `©`), these are not +recognized here, because it makes the grammar too ambiguous: + +```````````````````````````````` example +© +. +<p>&copy</p> +```````````````````````````````` + + +Strings that are not on the list of HTML5 named entities are not +recognized as entity references either: + +```````````````````````````````` example +&MadeUpEntity; +. +<p>&MadeUpEntity;</p> +```````````````````````````````` + + +Entity and numeric character references are recognized in any +context besides code spans or code blocks, including +URLs, [link titles], and [fenced code block][] [info strings]: + +```````````````````````````````` example +<a href="öö.html"> +. +<a href="öö.html"> +```````````````````````````````` + + +```````````````````````````````` example +[foo](/föö "föö") +. +<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo] + +[foo]: /föö "föö" +. +<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +``` föö +foo +``` +. +<pre><code class="language-föö">foo +</code></pre> +```````````````````````````````` + + +Entity and numeric character references are treated as literal +text in code spans and code blocks: + +```````````````````````````````` example +`föö` +. +<p><code>f&ouml;&ouml;</code></p> +```````````````````````````````` + + +```````````````````````````````` example +    föfö +. +<pre><code>f&ouml;f&ouml; +</code></pre> +```````````````````````````````` + + +Entity and numeric character references cannot be used +in place of symbols indicating structure in CommonMark +documents. + +```````````````````````````````` example +*foo* +*foo* +. 
+<p>*foo* +<em>foo</em></p> +```````````````````````````````` + +```````````````````````````````` example +* foo + +* foo +. +<p>* foo</p> +<ul> +<li>foo</li> +</ul> +```````````````````````````````` + +```````````````````````````````` example +foo

bar +. +<p>foo + +bar</p> +```````````````````````````````` + +```````````````````````````````` example +	foo +. +<p>→foo</p> +```````````````````````````````` + + +```````````````````````````````` example +[a](url "tit") +. +<p>[a](url "tit")</p> +```````````````````````````````` + + +  # Blocks and inlines  We can think of a document as a sequence of @@ -2045,7 +2389,7 @@ need not match the start tag).  **End condition:** line contains the string `?>`.  4.  **Start condition:** line begins with the string `<!` -followed by an uppercase ASCII letter.\ +followed by an ASCII letter.\  **End condition:** line contains the character `>`.  5.  **Start condition:**  line begins with the string @@ -5506,345 +5850,6 @@ Thus, for example, in  backtick. -## Backslash escapes - -Any ASCII punctuation character may be backslash-escaped: - -```````````````````````````````` example -\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ -. -<p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p> -```````````````````````````````` - - -Backslashes before other characters are treated as literal -backslashes: - -```````````````````````````````` example -\→\A\a\ \3\φ\« -. -<p>\→\A\a\ \3\φ\«</p> -```````````````````````````````` - - -Escaped characters are treated as regular characters and do -not have their usual Markdown meanings: - -```````````````````````````````` example -\*not emphasized* -\<br/> not a tag -\[not a link](/foo) -\`not code` -1\. not a list -\* not a list -\# not a heading -\[foo]: /url "not a reference" -\ö not a character entity -. -<p>*not emphasized* -<br/> not a tag -[not a link](/foo) -`not code` -1. not a list -* not a list -# not a heading -[foo]: /url "not a reference" -&ouml; not a character entity</p> -```````````````````````````````` - - -If a backslash is itself escaped, the following character is not: - -```````````````````````````````` example -\\*emphasis* -. 
-<p>\<em>emphasis</em></p> -```````````````````````````````` - - -A backslash at the end of the line is a [hard line break]: - -```````````````````````````````` example -foo\ -bar -. -<p>foo<br /> -bar</p> -```````````````````````````````` - - -Backslash escapes do not work in code blocks, code spans, autolinks, or -raw HTML: - -```````````````````````````````` example -`` \[\` `` -. -<p><code>\[\`</code></p> -```````````````````````````````` - - -```````````````````````````````` example -    \[\] -. -<pre><code>\[\] -</code></pre> -```````````````````````````````` - - -```````````````````````````````` example -~~~ -\[\] -~~~ -. -<pre><code>\[\] -</code></pre> -```````````````````````````````` - - -```````````````````````````````` example -<http://example.com?find=\*> -. -<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> -```````````````````````````````` - - -```````````````````````````````` example -<a href="/bar\/)"> -. -<a href="/bar\/)"> -```````````````````````````````` - - -But they work in all other contexts, including URLs and link titles, -link references, and [info strings] in [fenced code blocks]: - -```````````````````````````````` example -[foo](/bar\* "ti\*tle") -. -<p><a href="/bar*" title="ti*tle">foo</a></p> -```````````````````````````````` - - -```````````````````````````````` example -[foo] - -[foo]: /bar\* "ti\*tle" -. -<p><a href="/bar*" title="ti*tle">foo</a></p> -```````````````````````````````` - - -```````````````````````````````` example -``` foo\+bar -foo -``` -. -<pre><code class="language-foo+bar">foo -</code></pre> -```````````````````````````````` - - - -## Entity and numeric character references - -Valid HTML entity references and numeric character references -can be used in place of the corresponding Unicode character, -with the following exceptions: - -- Entity and character references are not recognized in code -  blocks and code spans. 
- -- Entity and character references cannot stand in place of -  special characters that define structural elements in -  CommonMark.  For example, although `*` can be used -  in place of a literal `*` character, `*` cannot replace -  `*` in emphasis delimiters, bullet list markers, or thematic -  breaks. - -Conforming CommonMark parsers need not store information about -whether a particular character was represented in the source -using a Unicode character or an entity reference. - -[Entity references](@) consist of `&` + any of the valid -HTML5 entity names + `;`. The -document <https://html.spec.whatwg.org/multipage/entities.json> -is used as an authoritative source for the valid entity -references and their corresponding code points. - -```````````````````````````````` example -  & © Æ Ď -¾ ℋ ⅆ -∲ ≧̸ -. -<p>  & © Æ Ď -¾ ℋ ⅆ -∲ ≧̸</p> -```````````````````````````````` - - -[Decimal numeric character -references](@) -consist of `&#` + a string of 1--7 arabic digits + `;`. A -numeric character reference is parsed as the corresponding -Unicode character. Invalid Unicode code points will be replaced by -the REPLACEMENT CHARACTER (`U+FFFD`).  For security reasons, -the code point `U+0000` will also be replaced by `U+FFFD`. - -```````````````````````````````` example -# Ӓ Ϡ � -. -<p># Ӓ Ϡ �</p> -```````````````````````````````` - - -[Hexadecimal numeric character -references](@) consist of `&#` + -either `X` or `x` + a string of 1-6 hexadecimal digits + `;`. -They too are parsed as the corresponding Unicode character (this -time specified with a hexadecimal numeral instead of decimal). - -```````````````````````````````` example -" ആ ಫ -. -<p>" ആ ಫ</p> -```````````````````````````````` - - -Here are some nonentities: - -```````````````````````````````` example -  &x; &#; &#x; -� -&#abcdef0; -&ThisIsNotDefined; &hi?; -. 
-<p>&nbsp &x; &#; &#x; -&#987654321; -&#abcdef0; -&ThisIsNotDefined; &hi?;</p> -```````````````````````````````` - - -Although HTML5 does accept some entity references -without a trailing semicolon (such as `©`), these are not -recognized here, because it makes the grammar too ambiguous: - -```````````````````````````````` example -© -. -<p>&copy</p> -```````````````````````````````` - - -Strings that are not on the list of HTML5 named entities are not -recognized as entity references either: - -```````````````````````````````` example -&MadeUpEntity; -. -<p>&MadeUpEntity;</p> -```````````````````````````````` - - -Entity and numeric character references are recognized in any -context besides code spans or code blocks, including -URLs, [link titles], and [fenced code block][] [info strings]: - -```````````````````````````````` example -<a href="öö.html"> -. -<a href="öö.html"> -```````````````````````````````` - - -```````````````````````````````` example -[foo](/föö "föö") -. -<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> -```````````````````````````````` - - -```````````````````````````````` example -[foo] - -[foo]: /föö "föö" -. -<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> -```````````````````````````````` - - -```````````````````````````````` example -``` föö -foo -``` -. -<pre><code class="language-föö">foo -</code></pre> -```````````````````````````````` - - -Entity and numeric character references are treated as literal -text in code spans and code blocks: - -```````````````````````````````` example -`föö` -. -<p><code>f&ouml;&ouml;</code></p> -```````````````````````````````` - - -```````````````````````````````` example -    föfö -. -<pre><code>f&ouml;f&ouml; -</code></pre> -```````````````````````````````` - - -Entity and numeric character references cannot be used -in place of symbols indicating structure in CommonMark -documents. - -```````````````````````````````` example -*foo* -*foo* -. 
-<p>*foo* -<em>foo</em></p> -```````````````````````````````` - -```````````````````````````````` example -* foo - -* foo -. -<p>* foo</p> -<ul> -<li>foo</li> -</ul> -```````````````````````````````` - -```````````````````````````````` example -foo

bar -. -<p>foo - -bar</p> -```````````````````````````````` - -```````````````````````````````` example -	foo -. -<p>→foo</p> -```````````````````````````````` - - -```````````````````````````````` example -[a](url "tit") -. -<p>[a](url "tit")</p> -```````````````````````````````` -  ## Code spans @@ -7461,10 +7466,11 @@ A [link destination](@) consists of either    closing `>` that contains no line breaks or unescaped    `<` or `>` characters, or -- a nonempty sequence of characters that does not start with -  `<`, does not include ASCII space or control characters, and -  includes parentheses only if (a) they are backslash-escaped or -  (b) they are part of a balanced pair of unescaped parentheses. +- a nonempty sequence of characters that does not start with `<`, +  does not include [ASCII control characters][ASCII control character] +  or [whitespace][], and includes parentheses only if (a) they are +  backslash-escaped or (b) they are part of a balanced pair of +  unescaped parentheses.    (Implementations may impose limits on parentheses nesting to    avoid performance issues, but at least three levels of nesting    should be supported.) @@ -7616,6 +7622,13 @@ However, if you have unbalanced parentheses, you need to escape or use the  `<...>` form:  ```````````````````````````````` example +[link](foo(and(bar)) +. +<p>[link](foo(and(bar))</p> +```````````````````````````````` + + +```````````````````````````````` example  [link](foo\(and\(bar\))  .  <p><a href="foo(and(bar)">link</a></p> @@ -7923,9 +7936,8 @@ perform the *Unicode case fold*, strip leading and trailing  matching reference link definitions, the one that comes first in the  document is used.  (It is desirable in such cases to emit a warning.) -The contents of the first link label are parsed as inlines, which are -used as the link's text.  The link's URI and title are provided by the -matching [link reference definition]. 
+The link's URI and title are provided by the matching [link +reference definition].  Here is a simple example: @@ -8018,11 +8030,11 @@ emphasis grouping:  ```````````````````````````````` example -[foo *bar][ref] +[foo *bar][ref]*  [ref]: /uri  . -<p><a href="/uri">foo *bar</a></p> +<p><a href="/uri">foo *bar</a>*</p>  ```````````````````````````````` @@ -8070,11 +8082,11 @@ Matching is case-insensitive:  Unicode case fold is used:  ```````````````````````````````` example -[Толпой][Толпой] is a Russian word. +[ẞ] -[ТОЛПОЙ]: /url +[SS]: /url  . -<p><a href="/url">Толпой</a> is a Russian word.</p> +<p><a href="/url">ẞ</a></p>  ```````````````````````````````` @@ -8707,9 +8719,9 @@ a link to the URI, with the URI as the link's label.  An [absolute URI](@),  for these purposes, consists of a [scheme] followed by a colon (`:`) -followed by zero or more characters other than ASCII -[whitespace] and control characters, `<`, and `>`.  If -the URI includes these characters, they must be percent-encoded +followed by zero or more characters other [ASCII control +characters][ASCII control character] or [whitespace][] , `<`, and `>`. +If the URI includes these characters, they must be percent-encoded  (e.g. `%20` for a space).  For purposes of this spec, a [scheme](@) is any sequence @@ -8942,10 +8954,8 @@ consists of the string `<?`, a string  of characters not including the string `?>`, and the string  `?>`. -A [declaration](@) consists of the -string `<!`, a name consisting of one or more uppercase ASCII letters, -[whitespace], a string of characters not including the -character `>`, and the character `>`. +A [declaration](@) consists of the string `<!`, an ASCII letter, zero or more +characters not including the character `>`, and the character `>`.  A [CDATA section](@) consists of  the string `<