IceTV Forum

IceTV Guide for IceTV enabled PVRs => XMLTV (General) => Topic started by: DeltaMikeCharlie on February 19, 2018, 05:12:14 PM

Title: Strange JSON escape characters in EPG data.
Post by: DeltaMikeCharlie on February 19, 2018, 05:12:14 PM
I have noticed a difference in the "show" description text depending on whether the selected format is JSON or XML.

Here is an XML example.  Note the quotes around the first line of the description.

--SNIP--
  <desc lang="en">"I wanted to know what their plan was. I was their plan!"

The Doctor has been summoned by an old friend, but in the Cabinet War Rooms far below the streets of blitz-torn London, it's his oldest enemy he finds waiting for him...

The Daleks are back - but can Winston Churchill be in league with them? </desc>
  <credits>
--SNIP--

However, the JSON version has a series of escape sequences that do not appear to correspond to the characters shown in the XML version.

--SNIP--
    "desc": "\u00E2\u0080\u009CI wanted to know what their plan was. I was their plan!\u00E2\u0080\u009D\r\n\r\nThe Doctor has been summoned by an old friend, but in the Cabinet War Rooms far below the streets of blitz-torn London, it's his oldest enemy he finds waiting for him...\r\n\r\nThe Daleks are back - but can Winston Churchill be in league with them? ",
--SNIP--


"\u00E2\u0080\u009C" actually appears to be the UTF-8 byte sequence (0xE2 0x80 0x9C) of the "LEFT DOUBLE QUOTATION MARK" character, which is code point U+201C, with each byte escaped as if it were a separate character.

I have encountered a number of web sites suggesting that the correct JSON encoding should actually be "\u201C".

https://www.fileformat.info/info/unicode/char/201c/index.htm
http://graphemica.com/%E2%80%9C
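To make the difference concrete, here is a small Python sketch (my own illustration, using a shortened copy of the description above, not anything taken from the IceTV service). It parses the escapes as served and the escapes those sites suggest, then compares the resulting code points.

```python
import json

# Shortened sample of the string as served, and as the sites above suggest it should be.
served = r'"\u00E2\u0080\u009CI was their plan!\u00E2\u0080\u009D"'
expected = r'"\u201CI was their plan!\u201D"'

served_text = json.loads(served)
expected_text = json.loads(expected)

# As served: three separate characters, U+00E2, U+0080, U+009C.
print([hex(ord(c)) for c in served_text[:3]])  # ['0xe2', '0x80', '0x9c']

# As suggested: a single character, U+201C.
print(hex(ord(expected_text[0])))  # 0x201c

# The three served code points are exactly the UTF-8 bytes of U+201C.
utf8_bytes = "\u201C".encode("utf-8")
print(utf8_bytes.hex())  # e2809c
```

So a JSON parser given the served text produces three characters where one was intended.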

I have also seen this occur with "\u00E2\u0080\u009D" (right double quote "\u201D") and "\u00E2\u0080\u0093" (en dash "\u2013").  Perhaps there are others.
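Until this is fixed at the source, a client could try to undo the damage after parsing. The following is only a sketch of one possible workaround; `repair_double_encoded` is a hypothetical helper name of my own, and it assumes every affected character was mangled in the same byte-by-byte way.

```python
import json

def repair_double_encoded(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were escaped one byte at a time.

    Each character is mapped back to a single byte (Latin-1), and the
    resulting bytes are decoded as UTF-8. If either step fails, the text
    is assumed to be fine already and is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

# Sample containing all three sequences mentioned above.
raw = r'"\u00E2\u0080\u009CI was their plan!\u00E2\u0080\u009D \u00E2\u0080\u0093 the Daleks"'
desc = repair_double_encoded(json.loads(raw))
print(desc == "\u201CI was their plan!\u201D \u2013 the Daleks")  # True

# Text that is already correct passes through untouched.
print(repair_double_encoded("\u201Cfine\u201D") == "\u201Cfine\u201D")  # True
```

Note that this is all-or-nothing per string: a description mixing mangled and correctly encoded characters would be left as it is.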

I'm happy to be proven wrong, but I thought that files containing JSON were already supposed to be in Unicode and that only a few characters inside strings (double quotes, backslashes and control characters) needed to be escaped.

It appears that the process that is creating the JSON output is reading the source as single-byte text (ASCII or similar) and not UTF-8 text, and is escaping each byte of the UTF-8 sequence individually rather than escaping the character as a whole.
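That guess can be reproduced in a few lines of Python. This is only a sketch of the suspected failure mode, not the actual server code, which I have not seen. Latin-1 stands in for the single-byte reading, because a strict ASCII decode in Python rejects bytes above 0x7F.

```python
import json

original = "\u201CI was their plan!\u201D"

# The source text stored as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Suspected bug: the bytes are read as a single-byte encoding,
# so every byte becomes its own character.
misread = utf8_bytes.decode("latin-1")

bad_json = json.dumps(misread)    # escapes each byte separately
good_json = json.dumps(original)  # escapes each character once

print(bad_json)   # "\u00e2\u0080\u009cI was their plan!\u00e2\u0080\u009d"
print(good_json)  # "\u201cI was their plan!\u201d"
```

Python writes the hex digits in lower case, but otherwise the first line matches the escapes seen in the JSON feed.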

I am using cURL with [--header "Accept: application/json"].  I was willing to concede that cURL might be mangling the data on the way in; however, the trace file shows the data arriving pre-mangled.

Is there something wrong with my request or is the server genuinely serving up these seemingly erroneously escaped characters?
Title: Re: Strange JSON escape characters in EPG data.
Post by: prl on February 19, 2018, 09:44:12 PM
I sent a long and detailed email about this and a related problem to Daniel Hall a few weeks ago. I haven't heard back.
Title: Re: Strange JSON escape characters in EPG data.
Post by: DeltaMikeCharlie on February 20, 2018, 06:13:42 AM
Quote from: prl on February 19, 2018, 09:44:12 PM
I sent a long and detailed email about this and a related problem to Daniel Hall a few weeks ago. I haven't heard back.
Thanks prl.
Title: Re: Strange JSON escape characters in EPG data.
Post by: Daniel Hall at IceTV on February 20, 2018, 03:16:58 PM
This one is being looked into.
Title: Re: Strange JSON escape characters in EPG data.
Post by: prl on February 20, 2018, 04:47:37 PM
Thanks, Daniel.