Author Topic: Invalid Characters / Dodgy Characters  (Read 7907 times)

BJReplay

  • Full Member
  • Posts: 57
Invalid Characters / Dodgy Characters
« on: April 11, 2005, 06:21:47 PM »
E.G. ABC Sydney & Melbourne (Channel 2 & 9), Australian Story for tonight (11 April) - ICETV Ep Num 35611 & 36672:
Description contains climbed Ball’s Pyramid and ends with waiting….

Other characters (e.g. the é in café) are correctly escaped (é), but it looks like apostrophes, amongst other things, are being handled incorrectly.

Cheers

BJ

Russell at IceTV

  • Guru
  • Posts: 444
Re: Invalid Characters / Dodgy Characters
« Reply #1 on: April 12, 2005, 04:13:09 PM »
Thanks for letting us know about this BJ, I've found the problem and it should be fixed very soon.

Russell

tonymy01

  • Guru
  • Posts: 740
Re: Invalid Characters / Dodgy Characters
« Reply #2 on: July 05, 2005, 10:35:21 PM »
I notice it for tonight too, for Rove: "perhaps his polar opposite &#8211; highly acclaimed actor".
edit: I guess this may actually be OK, and I should be talking to John (TED author) about correctly interpreting these when he builds out the ASCII file format.
Regards
Tony

Beyonwiz DP-S1 & Topfield 5K (using PerlTGD to upload ICE EPG/timers for the 5K, normal ICE interactive for the Wiz).

Russell at IceTV

  • Guru
  • Posts: 444
Re: Invalid Characters / Dodgy Characters
« Reply #3 on: July 06, 2005, 07:39:15 AM »
Quote
I notice it for tonight too, for Rove: "perhaps his polar opposite &#8211; highly acclaimed actor".
edit: I guess this may actually be OK, and I should be talking to John (TED author) about correctly interpreting these when he builds out the ASCII file format.
Regards
Tony
Hi Tony,

Yes, in this case the ampersand code is an "en dash". The problem is what to do with characters outside the basic ASCII range, such as accented characters from foreign languages. We're converting all of them to ampersand escape sequences like the one you found, but some PVRs may not be displaying them correctly. As far as I know this is the only good way of handling these characters, but I'm not actually sure what the official word from XMLTV is on this subject. We're always open to suggestions though, if there's a method that's compatible with more PVRs.
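
As a rough illustration of the kind of conversion described above, here is a minimal Python sketch. It is an assumption about the general approach, not IceTV's actual server code, and the function name to_numeric_refs is made up for the example. It relies on Python's standard xmlcharrefreplace error handler, which replaces each non-ASCII character with a decimal numeric character reference.

```python
def to_numeric_refs(text):
    """Replace every non-ASCII character with a decimal numeric reference.

    Hypothetical helper for illustration only. ASCII characters, including
    '&', are left untouched, so '&' still needs its own escaping step.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# \u2013 is the en dash, \u00e9 is e-acute.
print(to_numeric_refs("perhaps his polar opposite \u2013 highly acclaimed actor"))
print(to_numeric_refs("caf\u00e9"))
```

Numeric references of this form are valid XML without a DTD, so a conforming XML parser should decode them. Whether a particular PVR displays them correctly is a separate question.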

Thanks,
Russell

trapper

  • Newbie
  • Posts: 7
Re: Invalid Characters / Dodgy Characters
« Reply #4 on: August 23, 2005, 03:45:40 PM »
One of the problems at the moment with the XML ICEguide data is that the conversions that cater for extended characters etc. are not happening in the correct sequence.

Take for example the following snippet from Aug 21:
Code: [Select]
A not-to-be missed analysis of the week&amp;#8217;s political news

This has resulted from 2 separate conversions done in the wrong order.

The apostrophe in week's was initially converted to <ampersand>#8217; but ampersands (as in Law & Order) also have to be converted to <ampersand>amp;

The problem in the above code snippet is that the apostrophe was converted before the ampersands. This has resulted in the ampersands which were introduced from the first conversion themselves being converted. So apostrophe becomes <ampersand>#8217 but then the ampersand there is further converted so we end up with &amp;#8217;

The problem with that is that no ready-made HTML-decode routine will fix it in a single pass. The fix has to be purpose-built.
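
The ordering problem can be sketched in a few lines of Python. This is only an illustration: the helper names and the sample sentence are invented here, and the real ICEguide conversion code is not shown in this thread.

```python
import re


def numeric_refs(text):
    """Turn non-ASCII characters into numeric references such as &#8217;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def escape_amp(text):
    """Escape literal ampersands for XML."""
    return text.replace("&", "&amp;")


def undo_double_escape(text):
    """Purpose-built repair: turn '&amp;#NNNN;' back into '&#NNNN;'.

    Caveat: this cannot tell a double-escaped reference from source text
    that really contained the literal characters '&#NNNN;'.
    """
    return re.sub(r"&amp;(#\d+;)", r"&\1", text)


raw = "the week\u2019s news on Law & Order"  # \u2019 is the curly apostrophe

# Wrong order: the '&' introduced by the first step gets escaped again.
wrong = escape_amp(numeric_refs(raw))
# Right order: escape real ampersands first, then add numeric references.
right = numeric_refs(escape_amp(raw))

print(wrong)   # the week&amp;#8217;s news on Law &amp; Order
print(right)   # the week&#8217;s news on Law &amp; Order
print(undo_double_escape(wrong) == right)   # True
```

Escaping ampersands first means every "&" in the output was either a real ampersand (now &amp;) or the start of a reference added afterwards, so nothing is converted twice.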

Cheers...

EDIT: Just as a matter of interest, I've found from my work on TED/S that the only conversion which seems necessary is & to &amp;. Extended characters seem to display fine in HTML... at least in IE. Internet Explorer is actually very useful for checking XML files: it will report any parsing errors, highlighting where and what caused the error.
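
The same kind of well-formedness check can be done without a browser. Below is a minimal Python sketch that assumes only the standard library's xml.etree.ElementTree parser. The check_xml helper and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_xml(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return str(err)
    return None


# An escaped ampersand plus a raw extended character (\u00e9 is e-acute).
good = "<desc>Law &amp; Order at the caf\u00e9</desc>"
# A bare ampersand is not allowed in XML character data.
bad = "<desc>Law & Order</desc>"

print(check_xml(good))   # None
print(check_xml(bad))    # an error message pointing at the bare '&'
```

Note that the raw é is accepted here because the text is passed as a Python string. For a file on disk, the bytes would need to be UTF-8 or match the encoding declared in the XML header.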

Russell at IceTV

  • Guru
  • Posts: 444
Re: Invalid Characters / Dodgy Characters
« Reply #5 on: August 23, 2005, 04:14:20 PM »
Hi trapper,

Looking at the guide data for Brisbane on August 21st, I see the following XML data:

Code: [Select]
<programme start="20050820230000 +0000" stop="20050820234500 +0000" channel="15">
     <title lang="en">Insiders</title>
     <sub-title lang="en"></sub-title>
     <desc lang="en">A not-to-be missed analysis of the week&#8217;s political news, with interviews, discussion and analysis with Barrie Cassidy and guests.</desc>

Is it possible you're loading the data into another program that's then modifying the & to be &amp; ?

Try this URL and take a look at the source, then search for the show "Insiders" to see what our server is sending (you'll be prompted for your Ice User ID and Password):

http://www.icetv.com.au/cgi-bin/epg/iceguide.cgi?op=xmlguide&start_date=050821&end_date=050821

Let me know what you find, and if I've misunderstood your post.

Thanks,
Russell

trapper

  • Newbie
  • Posts: 7
Re: Invalid Characters / Dodgy Characters
« Reply #6 on: August 23, 2005, 05:09:44 PM »
Hi Russell... yep I did find a problem here.  :-[

I was wondering though why you use a mix of HTML4.0 entities and Unicode numbers. Eg, for apostrophe you use the HTML4.0 entity &apos; yet for the left quote you use the Unicode number 8216 rather than &lsquo;

Would it be possible to standardise on HTML4.0 entities?

Also, in the data for 24 August the word Mélodine has become M<ampersand>#3927;dine

Cheers...

Russell at IceTV

  • Guru
  • Posts: 444
Re: Invalid Characters / Dodgy Characters
« Reply #7 on: August 23, 2005, 05:48:38 PM »
Hi trapper,

Glad to hear you found the problem.

We use the Unicode number for characters that are multibyte characters, such as the accented é character you mentioned.  But for simple ones like the apostrophe, we use the HTML entities.  The XML spec doesn't allow things like the é character to be left unquoted, and it's a lot easier to just convert them to their numeric value than to have to use a large lookup table for all the characters.  I'd have to check to be sure, but the left quote you mentioned was probably a "smart" left quote, or something similar, that wasn't a normal character, and thus was actually a multibyte character that got converted to Unicode.

Thanks for spotting the problem with Mélodine -- it was a typo actually.  I've fixed it, and it should now appear as M<ampersand>#233;lodine
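
To make the numeric-versus-named trade-off concrete, here is a hedged Python sketch. It is not IceTV's code; the helper names are invented, and the lookup table used is the HTML 4 entity table that ships with Python as html.entities.codepoint2name. The last part shows why numeric references are the safer choice inside XML: a plain XML parser predefines only amp, lt, gt, apos and quot, so HTML names such as lsquo are rejected unless a DTD declares them.

```python
import html
import xml.etree.ElementTree as ET
from html.entities import codepoint2name


def numeric_ref(ch):
    """Numeric reference: needs no table, just the character's code point."""
    return "&#%d;" % ord(ch)


def named_ref(ch):
    """Named HTML 4 entity where one exists, otherwise fall back to numeric."""
    name = codepoint2name.get(ord(ch))
    return "&%s;" % name if name else numeric_ref(ch)


def parses(xml_text):
    """True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(numeric_ref("\u00e9"))              # &#233;
print(named_ref("\u2018"))                # &lsquo;
print(html.unescape("M&#233;lodine"))     # decodes to the accented name

print(parses("<desc>&#8216;quoted&#8217;</desc>"))   # True
print(parses("<desc>&lsquo;quoted&rsquo;</desc>"))   # False: undefined entity
```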

Russell

