Incorrect Unicode characters in IceTV data

Started by prl, March 11, 2016, 04:23:36 PM

prl

In Re: ICE causes rebooting on 7100+ recorders beginning today:
Quote from: Dave at IceTV on March 10, 2016, 12:36:04 AM
The ABC guide data comes direct from ABC and sometimes they include weird characters that cause Topfield PVRs problems. Previously it was only the old 7100 (not 7100plus) that had the rebooting problem when unsafe characters crept in to the guide.

We are thinking about how best to verify the data before it gets sent to PVRs.

I'm seeing similar problems with incorrect Unicode characters in the IceGuide for the Beyonwiz T series, but not restricted to ABC data. These problems don't crash the T series, but they look a bit ugly.

I first noticed it in the title of "The House That £100K Built" (7TWO), which is rendered in the EPG as "The House That Â£100K Built".

There's a good reason for that, because the JSON data for the title in the EPG response is: "The House That \u00C2\u00A3100K Built", which is in fact the string "The House That Â£100K Built". "\u00C2" is being correctly displayed by the Beyonwiz as Â, but it shouldn't be there.
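
This pattern is what you get when the UTF-8 bytes for "£" (C2 A3) are read back as if each byte were a separate Latin-1 character. A minimal Python sketch of that round trip, assuming that is what happened somewhere upstream (the thread doesn't confirm where or whether it did); the variable names are just for illustration:

```python
import json

# The title exactly as it appears in the EPG JSON response.
raw = r'"The House That \u00C2\u00A3100K Built"'
title = json.loads(raw)

# Assumption: the text was UTF-8 that got decoded as Latin-1 upstream.
# Reversing that turns the two code points C2 A3 back into a single "£".
repaired = title.encode("latin-1").decode("utf-8")

# Going the other way reproduces the mangled form from the correct one.
mangled_again = repaired.encode("utf-8").decode("latin-1")

print(title)     # The House That Â£100K Built
print(repaired)  # The House That £100K Built
```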

There are similar problems in the Jonathan Creek episode Time Waits for Norman, where the synopsis, which should read "Antonia's husband's", is encoded as "Antonia\u00E2\u0080\u0099s husband\u00E2\u0080\u0099s" and displayed as "Antoniaâs husbandâs". "\u00E2" is being correctly displayed as "â", and "\u0080" and "\u0099" are both being correctly displayed as empty strings, because both are "Unused Control Character". None of these Unicode characters is correct. It should be either "\u0027" (ASCII single quote) or, at a stretch, "\u2019" (Right single quotation mark), though I don't know whether the T series fonts have that character (it may be mapped to "\u0027" before display).
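
The apostrophe can be checked the same way. A hedged sketch, again assuming the UTF-8-read-as-Latin-1 explanation: the three code points collapse to a single U+2019, which could then be swapped for "\u0027" if the fonts lack it. The fallback step is only an illustration, not something the T series firmware is known to do.

```python
import unicodedata

synopsis = "Antonia\u00e2\u0080\u0099s husband\u00e2\u0080\u0099s"

# Latin-1 maps code points 0x00-0xFF straight to bytes, C1 controls included.
repaired = synopsis.encode("latin-1").decode("utf-8")

# The character right after "Antonia" is the recovered apostrophe.
quote = repaired[7]
print(f"U+{ord(quote):04X}", unicodedata.name(quote))

# Hypothetical fallback for displays without the typographic apostrophe.
ascii_safe = repaired.replace("\u2019", "'")
print(ascii_safe)
```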

There were similar incorrect "apostrophes" in:
"title":"Sonic Boom","subtitle":"Robot Battle Royal/it Wasn\u00E2\u0080\u0099t Me It Was The One Armed Hedgehog" (twice)
"title":"[R] Dr. Quinn, Medicine Woman","subtitle":"Cooper vs. Quinn - Part 2","desc":"Ethan\u00E2\u0080\u0099s new position ... wails of Myra and Horace\u00E2\u0080\u0099s baby ..."
"title":"[R] Touched By An Angel","subtitle":"The Face on the Barroom Floor","desc":"... when Everett\u00E2\u0080\u0099s millionaire father ..."
"title":"[R] Frasier","subtitle":"Something Borrowed, Someone Blue - Part 1","desc":"Daphne\u00E2\u0080\u0099s having ..."
"title":"[R] Movie: William & Catherine: A Royal Romance","subtitle":"","desc":"... William\u00E2\u0080\u0099s service"
... and many, many more.

There were similar UK pound sign errors in:
"title":"The Cube","subtitle":"","desc":"Contestants battle their way through a series of tricky games inside the Cube aiming for the \u00C2\u00A3250,000 prize"

Some others:
"title":"[R] Borgen","subtitle":"The Art Of The Possible",... "credits":{"actors":[...,{"name":"Johan Philip Asb\u00C3\u00A6k"},...]} (Johan Philip AsbÃ¦k: should be Johan Philip Asbæk)
"title":"[R] Movie: For the Good Of Others", ... "credits":{... "actors":[{...{"name":"Bel\u00C3\u00A9n Rueda"} ...]} (BelÃ©n Rueda: should be Belén Rueda)
"title":"The Bridge","subtitle":"Series 3, Episode 10","desc":"... but the question is if they will make it in time\u00E2\u0080\u00A6 " ("timeâ<Unused Control Character>¦", presumably it should be "time?")
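
All of the examples above fit the same pattern, so a check could in principle be run over whole guide records before they go out. The following is only a sketch under that assumption: fix_text and fix_record are hypothetical helper names, and the records are cut-down samples built from the strings quoted above, plus one already-correct name to show it is left alone.

```python
def fix_text(s):
    """Undo UTF-8-read-as-Latin-1 damage; return s unchanged if it doesn't fit."""
    try:
        return s.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either a code point above U+00FF or bytes that aren't valid UTF-8,
        # so the string was probably fine to begin with.
        return s


def fix_record(value):
    """Apply fix_text to every string inside a decoded JSON structure."""
    if isinstance(value, str):
        return fix_text(value)
    if isinstance(value, list):
        return [fix_record(v) for v in value]
    if isinstance(value, dict):
        return {k: fix_record(v) for k, v in value.items()}
    return value


# Cut-down sample records (hypothetical shape, strings taken from this post).
records = [
    {"title": "[R] Borgen",
     "credits": {"actors": [{"name": "Johan Philip Asb\u00c3\u00a6k"}]}},
    {"title": "The Bridge", "desc": "make it in time\u00e2\u0080\u00a6 "},
    # Already correct: must come back unchanged.
    {"title": "[R] Movie: For the Good Of Others",
     "credits": {"actors": [{"name": "Bel\u00e9n Rueda"}]}},
]

fixed = fix_record(records)
for rec in fixed:
    print(rec)
```

The try/except is what keeps correct text safe: a properly encoded "é" on its own is not valid UTF-8 when turned back into a byte, so the string is returned untouched. A string that was meant to contain a sequence like "Â£" would still be altered, so this is a heuristic rather than a guarantee.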

Getting "funny characters" like currency signs, accented characters and ligatures wrong is one thing, but surely we should be able to get apostrophes and question marks that aren't mangled?
Peter
Beyonwiz T4 in-use
Beyonwiz T2, T3, T4, U4 & V2 for testing

Dave at IceTV

#1
Quote from: prl on March 11, 2016, 04:23:36 PM
presumably it should be "time?"

On the DP Beyonwiz it shows up as: "make it in time"
On the website it shows up as: "make it in time..."
In the database it shows up as: "make it in time... "
In a spell checker it shows up as: "make it in time<?><?><?>"

PS I've removed it now.
cheers

Dave
Customer Service

prl

Quote from: Dave at IceTV on March 11, 2016, 07:05:36 PM
Quote from: prl on March 11, 2016, 04:23:36 PM
presumably it should be "time?"

On the DP Beyonwiz it shows up as: "make it in time"
On the website it shows up as: "make it in time..."
In the database it shows up as: "make it in time... "

The database has "\u2026" (Horizontal ellipsis), then?
Peter
Beyonwiz T4 in-use
Beyonwiz T2, T3, T4, U4 & V2 for testing

Dave at IceTV

Quote from: prl on March 11, 2016, 07:23:08 PM
Quote from: Dave at IceTV on March 11, 2016, 07:05:36 PM
Quote from: prl on March 11, 2016, 04:23:36 PM
presumably it should be "time?"

On the DP Beyonwiz it shows up as: "make it in time"
On the website it shows up as: "make it in time..."
In the database it shows up as: "make it in time... "

The database has "\u2026" (Horizontal ellipsis), then?

I'm sure it was 3 separate characters. But it was supposed to be a single full stop. I say "was" because I have replaced it with a full stop now.
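
For what it's worth, a horizontal ellipsis is one character but three bytes in UTF-8, and those three bytes are exactly the "\u00E2\u0080\u00A6" seen in the JSON. That would be consistent with a byte-oriented tool showing three unknown characters, though nothing in this thread settles what the database actually held. A small Python sketch of the arithmetic, with illustrative variable names:

```python
ellipsis = "\u2026"  # HORIZONTAL ELLIPSIS, a single code point

# Encoded as UTF-8 it becomes three bytes: e2 80 a6.
utf8 = ellipsis.encode("utf-8")
hex_bytes = " ".join(f"{b:02x}" for b in utf8)
print(len(ellipsis), len(utf8), hex_bytes)

# Read byte-by-byte as Latin-1, those bytes become three separate characters.
mangled = utf8.decode("latin-1")
print([f"U+{ord(c):04X}" for c in mangled])
```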
cheers

Dave
Customer Service