Author Topic: Incorrect Unicode characters in IceTV data  (Read 1330 times)

Offline prl

  • Guru
  • *****
  • Posts: 3154
    • View Profile
Incorrect Unicode characters in IceTV data
« on: March 11, 2016, 04:23:36 PM »
In Re: ICE causes rebooting on 7100+ recorders beginning today:
The ABC guide data comes direct from ABC and sometimes they include weird characters that cause Topfield PVRs problems. Previously it was only the old 7100 (not 7100plus) that had the rebooting problem when unsafe characters crept in to the guide.

We are thinking about how best to verify the data before it gets sent to PVRs.

I'm seeing similar problems with incorrect Unicode characters in the IceGuide for the Beyonwiz T series, but they are not restricted to ABC data. These problems don't crash the T series, but they look a bit ugly.

I first noticed it in the title of "The House That £100K Built" (7TWO), which is rendered in the EPG as "The House That Â£100K Built".

There's a good reason for that, because the JSON data for the title in the EPG response is: "The House That \u00C2\u00A3100K Built", which is in fact the string "The House That Â£100K Built". "\u00C2" is being correctly displayed by the Beyonwiz as Â, but it shouldn't be there.
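For anyone wanting to check this, here is a minimal Python sketch. It assumes (the thread does not confirm this) that the stray character appears because the UTF-8 bytes of "£" (C2 A3) were read as one-byte-per-character text somewhere upstream. The variable names are only for illustration.

```python
import json

# The title exactly as it appears in the EPG JSON response.
raw = r'"The House That \u00C2\u00A3100K Built"'
title = json.loads(raw)

# Assumed cause: UTF-8 bytes were treated as Latin-1 characters.
# Undo it by turning each character back into the byte it came from
# (Latin-1 maps U+0000..U+00FF one-to-one onto bytes), then decoding
# those bytes as UTF-8.
repaired = title.encode("latin-1").decode("utf-8")

print(title)     # the string as the EPG currently shows it
print(repaired)  # the string after undoing the assumed double encoding
```

If the assumption is right, the repaired title contains only the pound sign, with no extra character in front of it.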

There are similar problems in the Jonathan Creek episode "Time Waits for Norman", where the synopsis, which should read "Antonia's husband's", is encoded as "Antonia\u00E2\u0080\u0099s husband\u00E2\u0080\u0099s" and displayed as "Antoniaâs husbandâs". "\u00E2" is being correctly displayed as "â", and "\u0080" and "\u0099" are both being correctly displayed as empty strings, because each is an "Unused Control Character". None of the Unicode characters is correct. The character should be either "\u0027" (ASCII single quote) or, at a stretch, "\u2019" (Right single quotation mark), though I don't know whether the T series fonts have that character (it may be mapped to "\u0027" before display).
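The three code points E2, 80, 99 are the UTF-8 byte sequence for a typographic apostrophe, which a short Python sketch can confirm. The `ASCII_FALLBACK` table is a hypothetical illustration of mapping typographic quotes to "\u0027" for a font that lacks them; it is not something the Beyonwiz firmware is known to do.

```python
import json
import unicodedata

synopsis = json.loads(r'"Antonia\u00E2\u0080\u0099s husband\u00E2\u0080\u0099s"')

# Recover the original byte sequence (E2 80 99) and decode it as UTF-8.
fixed = synopsis.encode("latin-1").decode("utf-8")

# "Antonia" is 7 characters long, so index 7 is the recovered apostrophe.
quote = fixed[7]
print(f"U+{ord(quote):04X}", unicodedata.name(quote))

# Hypothetical fallback: map typographic single quotes to the ASCII quote.
ASCII_FALLBACK = str.maketrans({"\u2018": "'", "\u2019": "'"})
plain = fixed.translate(ASCII_FALLBACK)
print(plain)
```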

There were similar incorrect "apostrophes" in:
"title":"Sonic Boom","subtitle":"Robot Battle Royal/it Wasn\u00E2\u0080\u0099t Me It Was The One Armed Hedgehog" (twice)
"title":"[R] Dr. Quinn, Medicine Woman","subtitle":"Cooper vs. Quinn - Part 2","desc":"Ethan\u00E2\u0080\u0099s new position ... wails of Myra and Horace\u00E2\u0080\u0099s baby ..."
"title":"[R] Touched By An Angel","subtitle":"The Face on the Barroom Floor","desc":"... when Everett\u00E2\u0080\u0099s millionaire father ..."
"title":"[R] Frasier","subtitle":"Something Borrowed, Someone Blue - Part 1","desc":"Daphne\u00E2\u0080\u0099s having ..."
"title":"[R] Movie: William & Catherine: A Royal Romance","subtitle":"","desc":"... William\u00E2\u0080\u0099s service"
... and many, many more.

There were similar UK pound sign errors in:
"title":"The Cube","subtitle":"","desc":"Contestants battle their way through a series of tricky games inside the Cube aiming for the \u00C2\u00A3250,000 prize"

Some others:
"title":"[R] Borgen","subtitle":"The Art Of The Possible",... "credits":{"actors":[...,{"name":"Johan Philip Asb\u00C3\u00A6k"},...]} (Johan Philip AsbÃ¦k: should be Johan Philip Asbæk)
"title":"[R] Movie: For the Good Of Others", ... "credits":{... "actors":[{...{"name":"Bel\u00C3\u00A9n Rueda"} ...]} (BelÃ©n Rueda: should be Belén Rueda)
"title":"The Bridge","subtitle":"Series 3, Episode 10","desc":"... but the question is if they will make it in time\u00E2\u0080\u00A6 " ("timeâ<Unused Control Character>¦", presumably it should be "time?")

Getting "funny characters" like currency signs, accented characters and ligatures wrong is one thing, but surely we should be able to get apostrophes and question marks that aren't mangled?
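On verifying the data before it gets sent to PVRs, one possible check is sketched below in Python. It is only a heuristic and the function names are made up for this example: it treats a string as suspect when every character fits in one byte and those bytes happen to form valid UTF-8, which is very unlikely for correctly encoded text that contains accented characters.

```python
import json


def looks_double_encoded(text: str) -> bool:
    """Heuristic: non-ASCII text whose characters, taken as bytes, are valid UTF-8."""
    if text.isascii():
        return False
    try:
        text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character above U+00FF is present, or the bytes are
        # not valid UTF-8; in both cases the text is left alone.
        return False
    return True


def repair(text: str) -> str:
    """Undo the assumed double encoding, leaving clean strings untouched."""
    if looks_double_encoded(text):
        return text.encode("latin-1").decode("utf-8")
    return text


# Values copied from the guide data quoted above.
samples = json.loads(
    r'["Asb\u00C3\u00A6k", "Bel\u00C3\u00A9n Rueda", "time\u00E2\u0080\u00A6 "]'
)
for sample in samples:
    print(repr(sample), "->", repr(repair(sample)))
```

Correctly encoded strings such as "Belén Rueda" or "£250,000" fail the UTF-8 decode step and pass through unchanged, so the check could run over every field without damaging good data.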
Peter
Beyonwiz T4 in-use
Beyonwiz T2, T3 & T4 for testing

Offline Dave at IceTV

  • Guru
  • *****
  • Posts: 1258
    • View Profile
    • IceTV Knowledgebase Articles
Re: Incorrect Unicode characters in IceTV data
« Reply #1 on: March 11, 2016, 07:05:36 PM »
presumably it should be "time?"

On the DP Beyonwiz it shows up as: "make it in time"
On the website it shows up as: "make it in time..."
In the database it shows up as: "make it in time… "
In a spell checker it shows up as: "make it in time<?><?><?>"

PS I've removed it now.
« Last Edit: March 11, 2016, 07:21:36 PM by Dave at IceTV »
cheers

Dave
Customer Service

Offline prl

Re: Incorrect Unicode characters in IceTV data
« Reply #2 on: March 11, 2016, 07:23:08 PM »
presumably it should be "time?"

On the DP Beyonwiz it shows up as: "make it in time"
On the website it shows up as: "make it in time..."
In the database it shows up as: "make it in time… "

The database has "\u2026" (Horizontal ellipsis), then?

Offline Dave at IceTV

Re: Incorrect Unicode characters in IceTV data
« Reply #3 on: March 11, 2016, 08:10:14 PM »
presumably it should be "time?"

On the DP Beyonwiz it shows up as: "make it in time"
On the website it shows up as: "make it in time..."
In the database it shows up as: "make it in time… "

The database has "\u2026" (Horizontal ellipsis), then?

I'm sure it was 3 separate characters. But it was supposed to be a single full stop. I say "was" because I have replaced it with a full stop now.
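One possible reading of the "3 separate characters", offered only as a guess since the thread does not say which encoding the database or its viewer uses: U+2026 is a single character but occupies three bytes in UTF-8, and a tool that assumes a one-byte encoding such as Windows-1252 would show those bytes as three characters. A short Python sketch:

```python
ellipsis = "\u2026"               # Horizontal ellipsis, one character
utf8 = ellipsis.encode("utf-8")   # ...but three bytes in UTF-8

hex_bytes = " ".join(f"{b:02X}" for b in utf8)
print(len(ellipsis), len(utf8), hex_bytes)

# What a viewer assuming Windows-1252 (an assumption) would display.
as_cp1252 = utf8.decode("cp1252")
print(len(as_cp1252), as_cp1252)
```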

