Skip to content

Fixes XML Unescaping#362

Merged
stleary merged 1 commit into
stleary:masterfrom
johnjaylward:FixXMLUnescape
Aug 27, 2017
Merged

Fixes XML Unescaping#362
stleary merged 1 commit into
stleary:masterfrom
johnjaylward:FixXMLUnescape

Conversation

@johnjaylward
Copy link
Copy Markdown
Contributor

@johnjaylward johnjaylward commented Aug 19, 2017

Key Changes:

Fixes #361

  • Removes unescape from the XML class calls
  • fixes bug with unescape method
  • moves unescape logic into the XMLTokener class for more consistency

What problem does this code solve?
Bug fix for XML Escaping that was included in a2d3b59 as part of a fix supporting unicode entities (i.e. &#xA which maps to a new line character)

The original fix was causing XML string to be unescaped twice which was also causing the bug as mentioned in #361.

Risks
Low/none. This is a bug fix to correct incorrect parsing of XML entities.

Changes to the API?
One new package level method was added to XMLTokener that handles all valid XML entity unescaping.

Will this require a new release?
Yes. The bug affects processing XML documents only, however, the bug prevents valid XML files from being read.

Should the documentation be updated?
No

Does it break the unit tests?
No. All unit tests should pass. New unit tests were added in order to test the changes and ensure that entities are only unencoded once. See stleary/JSON-Java-unit-test#79

Was any code refactored in this commit?
Yes. The actual implementation for the unescapeing was moved from the XML class to the XMLTokener class.

XMLTokener was already unescaping named entities (i.e. &lt; to < and &amp; to &) however it was not converting unicode entities back to their codepoints (i.e &#34 to " or &#x34 to 4). To compensate for that I originally created the XML.unescape method and incorrectly injected it into the XML processing.

This PR corrects that by moving all the entity handling into the XMLTokener class and removing the unescape injection that I placed into the processing last time.

Review status
APPROVED Starting 3 day window for comments.

* Removes unescape from the XML class calls
* fixes bug with unescape method
* moves unescape logic into the XMLTokener class for more consistency
@stleary
Copy link
Copy Markdown
Owner

stleary commented Aug 23, 2017

Sorry for the delay, back from eclipse, approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Xml.java doesn't cope with encoded ampersand followed by semicolon

2 participants