Missing Genre Descriptions

Started by DeltaMikeCharlie, July 10, 2023, 07:06:09 PM

DeltaMikeCharlie

I have noticed that a significant number of programmes are missing genre descriptions that should have them.

screenshot00021.jpg

I checked the IceTV web site, and the EPG there clearly shows genre descriptions.

ICE Web Site.png

This is also reflected in the raw ICE EPG JSON feed.

        }, {
            "id": "176785103",
            "series_id": "9892",
            "episode_id": "216252",
            "channel_id": "38",
            "date": "2016",
            "season": "5",
            "episode-num": "4",
            "start": "2023-07-10T08:57:00+00:00",
            "stop": "2023-07-10T09:05:00+00:00",
            "title": "Shaun The Sheep",
            "subtitle": "Baa-d Hair Day",
            "desc": "The flock don't recognise Shaun when he loses one his most distinctive features. His old pal Bitzer helps him track down the lost thatch, but not before Shaun becomes very attached to one of the chickens",
            "icon": {
                "src": "http://images.icetv.com.au/ee37-9443-b346-38c3.jpg",
                "width": "831",
                "height": "467"
            },
            "category": [{
                    "name": "Children",
                    "eit": "0x50"
                }, {
                    "name": "Animation",
                    "eit": "0x55"
                }
            ],
            "language": "English",
            "country": "United Kingdom",
            "video": {
                "aspect": "16:9",
                "colour": "YES",
                "quality": "SDTV"
            },
            "previously-shown": {
                "start": "2016-01-13",
                "channel_id": ""
            },
            "subtitles": {
                "onscreen": "English"
            },
            "part_of_series": "Yes",
            "rating": "G",
            "external_ids": {
                "tvdb_series": "79890",
                "tvdb_episode": "5544592",
                "imdb_series": "tt0983983",
                "imdb_episode": "tt6226402"
            }
        }, {

I did some experimentation with XMLTV importing and I found that "Children's / Youth programs" was accepted, but that "Children" was not accepted.

Having a quick look at the tvheadend source code, it would seem that it is doing some text matching between the received genre description and the ETSI genre descriptions.

https://github.com/tvheadend/tvheadend/blob/14298acb6a8e3a83ed1091fab1f3a924077ddfea/src/epg.c#L1820C26-L1820C26

More investigation is obviously required, however, it would appear that IceTV needs to pass the ETSI description, or at least TVH's version of the ETSI description, for the genre description to be accepted.

The JSON feed already has the ETSI code, so perhaps the ICE Kodi addon needs a lookup table to find a value that will work with TVH.

DeltaMikeCharlie

Quote from: DeltaMikeCharlie on July 10, 2023, 07:06:09 PM
More investigation is obviously required, however, it would appear that IceTV needs to pass the ETSI description, or at least TVH's version of the ETSI description, for the genre description to be accepted.
I have an idea regarding sourcing the genre names.....

The tvheadend JSON API has a function to return a list of genre codes and descriptions.

http://{IceBox IP}:9981/api/epg/content_type/list?limit=99999&full=1

The IceTV addon could extract this list at startup and then match the ICE 'EIT' field to the TVH 'key' field (with appropriate hex/dec conversions) to extract the description expected by TVH.

This hypothesis has not been tested.  It's possible that the TVH API module returns descriptions that are different to those expected by the XMLTV module.
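
For illustration, a minimal Python sketch of that matching step. It is a sketch under assumptions, not the addon's code: the function names are hypothetical, fetching the list from the URL above is left out, and the sample entries simply use the 'key'/'val' shape and the two values that the API is reported to return later in this thread.

```python
def build_tvh_genre_lookup(content_types):
    """Map each TVH decimal genre 'key' to its description 'val'."""
    return {int(entry["key"]): entry["val"] for entry in content_types}


def tvh_text_for_ice_category(category, lookup):
    """Return the TVH description for one ICE category, or None if unknown."""
    code = int(category["eit"], 16)  # ICE sends hex text, e.g. "0x50" -> 80
    return lookup.get(code)


# Sample entries in the shape of the TVH content_type list (assumed subset).
tvh_entries = [
    {"key": 80, "val": "Children's / Youth programmes"},
    {"key": 85, "val": "Cartoons / Puppets"},
]
lookup = build_tvh_genre_lookup(tvh_entries)
children_text = tvh_text_for_ice_category({"name": "Children", "eit": "0x50"}, lookup)
```

Whether the text returned this way is the text that the XMLTV importer accepts is exactly the untested part of the hypothesis.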

prl

Quote from: DeltaMikeCharlie on July 13, 2023, 08:42:28 AM
http://{IceBox IP}:9981/api/epg/content_type/list?limit=99999&full=1

What text does that list for genre ids 0x50 (decimal 80) and 0x55 (decimal 85)?

The mapping between numeric genre ids and the text names of the genres used in the IceTV EPG is specific to IceTV, and it doesn't follow either the DVB standard for genres ("content descriptors" in the standard) or the Australian DVB standard.

If the IceBox's EPG only stores the numeric genre id (as is the case for Beyonwiz PVRs), handling the difference is messy (I know, because I implemented it in the Beyonwiz code).
Peter
Beyonwiz T4 in-use
Beyonwiz T2, T3, T4, U4 & V2 for testing

DeltaMikeCharlie

Quote from: prl on July 13, 2023, 11:42:42 AM
What text does that list for genre ids 0x50 (decimal 80) and 0x55 (decimal 85)?

Here are the codes that you requested:

        }, {
            "key": 80,
            "val": "Children's / Youth programmes"
        }, {
<SNIP>
        }, {
            "key": 85,
            "val": "Cartoons / Puppets"
        }, {

The ICE JSON feed shows both the ICE proprietary description plus the ETSI 'content_descriptor' field from the EIT.

            "category": [{
                    "name": "Children",
                    "eit": "0x50"
                }, {
                    "name": "Cartoon",
                    "eit": "0x55"
                }, {
                    "name": "Education",
                    "eit": "0x90"
                }
            ],

Where the ICE text reads 'Children', the associated 'eit' field provided by ICE is 0x50 (80 decimal).  Using this value to look up what TVH thinks, 80 yields 'Children's / Youth programmes'.

There are also some genres that don't have an ETSI code, like 'Mini Series'.

            "category": [{
                    "name": "Documentary",
                    "eit": "0x23"
                }, {
                    "name": "Real Life",
                    "eit": "0xf0"
                }, {
                    "name": "Mini Series",
                    "eit": "0x0"
                }, {
                    "name": "Society & Culture",
                    "eit": "0x80"
                }
            ],

The IceBox is based on a 'tvheadend' backend.  Its EPG JSON API always delivers the genres as decimal values.  I have not dug super-deep into their code, but it seems that they store their genres as multiple codes, not texts.  From memory, the HTSP binary feed also provides codes, not text.

        }, {
            "eventId": 69341,
            "episodeId": 69342,
            "episodeUri": "crid://CRID://sydney20.abc.net.au/NC2304H018S00",
            "serieslinkId": 38074,
            "serieslinkUri": "crid://CRID://sydney20.abc.net.au/NC2304H",
            "channelName": "ABC HD-FTA",
            "channelUuid": "ef9f56f1f0dfbf42f6cdce8c8db80dcf",
            "channelNumber": "920",
            "channelIcon": "imagecache/12",
            "start": 1685360130,
            "stop": 1685363910,
            "title": "Q+A",
            "subtitle": "Writing the wrongs of history as literary giants join the panel. We discuss calls to end the pursuit of Julian Assange, Indian PM Narendra Modi's Australian visit plus balancing trade with China and staying strong on defence.",
            "summary": "Writing the wrongs of history as literary giants join the panel. We discuss calls to end the pursuit of Julian Assange, Indian PM Narendra Modi's Australian visit plus balancing trade with China and staying strong on defence.",
            "genre": [128],
            "nextEventId": 69344
        }, {

128 = 0x80 = 'social/political issues/economics (general)'

My understanding of the TVH XMLTV import process is that it tries to match the genre text that ICE provides to its list of ETSI text descriptions.  If it gets a match, it stores the ETSI code.  If no match is found, it looks like the text is ignored and the code set to 0x00.

Here is the whole list that the TVH JSON API returns.
content_type.json
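
The import behaviour described above can be modelled in a few lines of Python. This is only a model of my understanding, not TVH's actual code: the function name is hypothetical, the table holds a single entry taken from this thread, and the use of an exact, case-sensitive comparison is an assumption.

```python
def import_genre(text, etsi_descriptions):
    """Return the ETSI code whose description equals text, else 0x00."""
    for code, description in etsi_descriptions.items():
        if text == description:  # assumption: exact match only
            return code
    return 0x00  # no match: the text is dropped and the code left undefined


# One description that the XMLTV import is reported to accept.
etsi_descriptions = {0x50: "Children's / Youth programs"}

accepted = import_genre("Children's / Youth programs", etsi_descriptions)
ignored = import_genre("Children", etsi_descriptions)
```

Under this model the ICE text 'Children' can never produce a genre, which matches what is seen on screen.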

prl

Quote from: DeltaMikeCharlie on July 13, 2023, 01:25:30 PM
My understanding of the TVH XMLTV import process is that it tries to match the genre text that ICE provides to its list of ETSI text descriptions.  If it gets a match, it stores the ETSI code.  If no match is found, it looks like the text is ignored and the code set to 0x00.

Yes, I agree that that is how it looks. That wouldn't give very good results for the IceTV genres. The IceTV genre numbering is quite different from the ETSI numbering: IceTV reuses codes for different text strings, uses different codes than ETSI for some of the same strings, uses different strings for similar genres, uses more codes than the ETSI table, and doesn't follow the 2-level hierarchy in the ETSI table.

DeltaMikeCharlie

When I was working on a Topfield TAP to import ICE EPG data, I settled on augmenting the 'extended description' with the additional data provided by ICE.

https://forum.icetv.com.au/index.php?topic=7185.msg37787#msg37787

With the content identifier, for example, I only converted the first ICE value into an EIT format that was fed into the native Topfield EPG.  However, I augmented the extDesc with every genre text that ICE provided.

Using mini-series as an example:  Even though there is no ETSI code for this description, adding it to the extDesc is a way to provide the user with this additional information that is currently lost.  It would be visible on screen and still available to a full text search if required.

ICE already augment the extDesc on their web site EPG display.
ICE augmented web desc.png
They just need to add that logic to the IceBox EPG import.

prl

Quote from: DeltaMikeCharlie on July 13, 2023, 02:56:10 PM
With the content identifier, for example, I only converted the first ICE value into an EIT format that was fed into the native Topfield EPG.  However, I augmented the extDesc with every genre text that ICE provided.

That's certainly one way of doing it. For the Beyonwiz, I did some remapping of IceTV codes so that they could be used in the EIT content descriptor fields, and had an IceTV-specific mapping of the genre codes to strings. I'd always wanted to put the other IceTV metadata like credits, language & year into tagged fields of the extended descriptor, so that they'd have visibility as separate entities. But that would have required implementing tagged extended descriptors in the low-level code, and I never got around to it.

DeltaMikeCharlie

Quote from: prl on July 13, 2023, 04:06:17 PM
I'd always wanted to put the other IceTV metadata like credits, language & year into tagged fields of the extended descriptor, so that they'd have visibility as separate entities. But that would have required implementing tagged extended descriptors in the low-level code, and I never got around to it.
This rings a bell, but it was so long ago.  Either the Topfield EPG did not recognise 'items' within the extDesc at all or it presented them in an odd way.  In either case, it did not produce the result that I was looking for so it was abandoned.

prl

I think we're getting a bit off topic. It'd be interesting to hear from Daniel whether our speculation about the cause of the problem in the IceBox is correct.

DeltaMikeCharlie

I created a bogus channel that I could feed XMLTV into via Unix Domain Sockets.

I then extracted the genre description list from TVH and created dummy EPG events using the genre names as titles, etc.

Mostly they worked except for the Children's block 0x5_.

TVH Missing Genres.png

I found that the JSON API returned "Entertainment programmes for 6 to 14" whereas the XMLTV feed accepted "Entertainment programs for 6 to 14".

When I was creating my XMLTV feed, I also found that the Python library that I was using inserted escape characters for certain characters.

"Children's / Youth programs" would have become "Children\'s / Youth programs" and therefore ignored.  When I tried with a manual entry where the apostrophe was not escaped, the description worked.  It would appear that TVH can accept escaped characters in the text fields, but not the genre fields.

prl

I have a list of IceTV genre names/numbers from Daniel, but I'm not sure that I can share it. It's possible to extract most of them from downloads of the IceTV EPG, anyway.

DeltaMikeCharlie

Quote from: prl on July 15, 2023, 10:44:32 AM
I have a list of IceTV genre names/numbers from Daniel, but I'm not sure that I can share it. It's possible to extract most of them from downloads of the IceTV EPG, anyway.
I extracted that (partial?) list from the ICE EPG data some time ago.  However, it is not really needed.

All that is needed is the ETSI code that ICE has associated with their proprietary genre text descriptions.  They already provide this information with every event that has a genre in their JSON EPG feed.

        }, {
            "id": "175928552",
            "series_id": "51693",
            "episode_id": "317266",
            "channel_id": "1",
            "date": "2020",
            "season": "1",
            "episode-num": "1",
            "start": "2023-05-29T10:30:00+00:00",
            "stop": "2023-05-29T11:25:00+00:00",
            "title": "Michael Palin In North Korea",
            "subtitle": "Series 1, Episode 1",
            "desc": "This incredible two-part series, which required two years of planning and high-level negotiations, grants presenter Michael Palin and his team unprecedented access to the reclusive country of North Korea.\r\nCovering more than 1,300 miles in just over a week, Palin travels from the capital Pyongyang to the snowy peaks of Mount Paektu, interviewing locals and understanding what it's like to live there, experiencing North Korea in a way most Westerners will have never previously seen.",
            "credits": {
                "actors": [{
                        "name": "Michael Palin"
                    }
                ]
            },
            "category": [{
                    "name": "Documentary",
                    "eit": "0x23"
                }, {
                    "name": "Real Life",
                    "eit": "0xf0"
                }, {
                    "name": "Mini Series",
                    "eit": "0x0"
                }, {
                    "name": "Society & Culture",
                    "eit": "0x80"
                }
            ],
            "language": "English",
            "country": "United Kingdom",
            "video": {
                "aspect": "16:9",
                "colour": "YES",
                "quality": "SDTV"
            },
            "previously-shown": {
                "start": "2020-02-23",
                "channel_id": ""
            },
            "subtitles": {
                "onscreen": "English"
            },
            "part_of_series": "Yes",
            "rating": "G"
        }, {

1) To make TVH recognise the event genre, ICE just need to take the ETSI code that they already provide with each event, look up the exact description that TVH expects as genre text and send that text instead of the proprietary genre text descriptions that ICE uses.

E.g.:  ICE get 'Children' + '0x50' from their JSON feed.  They look up '0x50' and find 'Children's / Youth programs'.  They set the TVH genre text to 'Children's / Youth programs' instead of 'Children'.

2) So that the end users (us lot) can get the extra value out of the proprietary genre text descriptions that ICE uses, ICE could append their proprietary text to the extended description for us to read.
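
Step 1 could look something like the following Python sketch. The XMLTV `<category>` element is part of the XMLTV format; everything else here is an assumption for illustration: the table name and function name are hypothetical, the table holds only three texts quoted in this thread, and a real `<programme>` element would also need its start, stop and channel attributes, title and so on.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: ETSI code -> genre text that TVH is expected to accept.
TVH_GENRE_TEXT = {
    0x11: "Detective / Thriller",
    0x50: "Children's / Youth programs",
    0x80: "Social / Political issues / Economics",
}


def add_tvh_categories(programme, ice_categories):
    """Append one <category> per distinct, known ETSI code in the ICE list."""
    seen = set()
    for category in ice_categories:
        text = TVH_GENRE_TEXT.get(int(category["eit"], 16))
        if text is None or text in seen:
            continue  # skip 0x0, unknown codes and doubled-up codes
        seen.add(text)
        ET.SubElement(programme, "category").text = text
    return programme


programme = ET.Element("programme")
add_tvh_categories(programme, [
    {"name": "Crime", "eit": "0x11"},
    {"name": "Thriller", "eit": "0x11"},
    {"name": "Mini Series", "eit": "0x0"},
])
xml_text = ET.tostring(programme, encoding="unicode")
```

Several ICE names that share one code collapse into a single category, and names with code 0x0 produce none, which is why step 2 is still worth having.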

DeltaMikeCharlie

Quote from: DeltaMikeCharlie on July 15, 2023, 09:30:55 AM
"Children's / Youth programs" would have become "Children\'s / Youth programs" and therefore ignored.  When I tried with a manual entry where the apostrophe was not escaped, the description worked.  It would appear that TVH can accept escaped characters in the text fields, but not the genre fields.
This was absolute nonsense on my part.  The problem was with 'programme' vs 'program', not the escape character.  The escape character only appeared on my data dump screen, not in the actual data sent.
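
If the genre texts are taken from the JSON API, the spelling difference could be patched with something as small as the Python sketch below. The function name is hypothetical, and it assumes that 'programmes' vs 'programs' is the only difference between the two lists; only the examples in this thread have been seen so far.

```python
def to_xmltv_spelling(api_text):
    """Convert a JSON API genre text to the spelling the XMLTV import accepted."""
    return api_text.replace("programmes", "programs")


converted = to_xmltv_spelling("Entertainment programmes for 6 to 14")
```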

prl

Quote from: DeltaMikeCharlie on July 15, 2023, 11:59:36 AM
All that is needed is the ETSI code that ICE has associated with their proprietary genre text descriptions.

Unfortunately, that's not all that's needed. More than half of the IceTV genres have 0 as their ETSI code. The ETSI code 0 is "undefined content" (in fact the whole range 0x00 to 0x0f is "undefined content").

There are doubled-up non-zero codes, too. For example, "Business & Finance", "Parliament" and "Society & Culture" all have code 128. "Crime", "Murder", "Mystery" and "Thriller" are all code 17, and there are a lot more examples.

It can be done (because I've done it), but it may be more complicated than you expect.


DeltaMikeCharlie

I'm sorry, I'm not quite sure that I see the problem:

ICE 'Business & Finance' + '0x80' = ETSI 'Social / Political issues / Economics'
ICE 'Parliament' + '0x80' = ETSI 'Social / Political issues / Economics'
ICE 'Society & Culture' + '0x80' = ETSI 'Social / Political issues / Economics'
ICE 'Crime' + '0x11' = ETSI 'Detective / Thriller'
ICE 'Murder' + '0x11' = ETSI 'Detective / Thriller'
ICE 'Mystery' + '0x11' = ETSI 'Detective / Thriller'
ICE 'Thriller' + '0x11' = ETSI 'Detective / Thriller'

All of these proprietary ICE descriptions seem to have been mapped to reasonable ETSI equivalents.  The mapping, however, will never be perfect.
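
To make the many-to-one nature of the mapping visible, here is a short Python sketch that groups the ICE names listed above by their code. The pair list contains only the seven examples from this thread, not the full ICE genre list, and the function name is hypothetical.

```python
from collections import defaultdict

# (ICE name, ETSI code) pairs as listed above.
ice_genres = [
    ("Business & Finance", 0x80),
    ("Parliament", 0x80),
    ("Society & Culture", 0x80),
    ("Crime", 0x11),
    ("Murder", 0x11),
    ("Mystery", 0x11),
    ("Thriller", 0x11),
]


def group_by_code(pairs):
    """Collect the ICE names that share each ETSI code."""
    groups = defaultdict(list)
    for name, code in pairs:
        groups[code].append(name)
    return dict(groups)


groups = group_by_code(ice_genres)
```

Going from ICE name to ETSI text loses detail, but each name still lands on exactly one ETSI genre.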

As for the null (and other invalid) values, I only see that as a potential issue if there are no other genres for that event.  In the sample EPG entry that I provided earlier, both 'Real Life' and 'Mini Series' have invalid codes; however, 'Documentary' and 'Society & Culture' are still valid genres.  The only issue that I can see is that perhaps ICE could have added their version of ETSI 0xA1 'Tourism / Travel' if they have one.
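
A rough Python check for "does this code carry an ETSI genre" is sketched below. The function names are hypothetical, and the cut-off is my reading of the ETSI content descriptor table: the upper nibble is the level 1 genre, 0x0 is undefined, 0x1 to 0xB are defined, and 0xF is user defined. Treat that range as an assumption and check it against the standard.

```python
def split_nibbles(code):
    """Split an ETSI content code into (level 1, level 2) nibbles."""
    return code >> 4, code & 0x0F


def has_etsi_genre(code):
    """True if the level 1 nibble falls in the assumed defined range."""
    level_1, _ = split_nibbles(code)
    return 0x1 <= level_1 <= 0xB


# Codes from the sample event: Documentary, Real Life, Mini Series, Society & Culture.
sample_codes = [0x23, 0xF0, 0x00, 0x80]
usable = [code for code in sample_codes if has_etsi_genre(code)]
```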

Adding the text for the proprietary ICE descriptions to the end of the extended description would still allow the proprietary descriptions without ETSI equivalents to be useful.

I too have implemented a solution for my Topfield TAP.  I read through all of the genres that ICE provides.  Every text item is appended to the extended description and the first non-null value is fed into the firmware's EPG (it can only store 1 genre) via its ETSI code.
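
In Python terms (the TAP itself was not written in Python), the approach above amounts to roughly the following sketch. The function name and the bracketed suffix format are made up for illustration.

```python
def summarise_genres(desc, categories):
    """Return (first non-null ETSI code or None, description plus ICE texts)."""
    names = [category["name"] for category in categories]
    codes = [int(category["eit"], 16) for category in categories]
    first_code = next((code for code in codes if code != 0), None)
    ext_desc = f"{desc} [{', '.join(names)}]" if names else desc
    return first_code, ext_desc


first_code, ext_desc = summarise_genres(
    "Part one of two.",
    [
        {"name": "Mini Series", "eit": "0x0"},
        {"name": "Documentary", "eit": "0x23"},
    ],
)
```

The null code on 'Mini Series' is skipped for the single stored genre, but the text still reaches the user through the extended description.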

As long as we have TVH as the back end, and as long as that back end only recognises ETSI genres, then we need to provide ETSI genre data.  To the best of my knowledge, TVH can also receive/record ATSC broadcasts.  The ATSC genre codes are very different to the ETSI codes.  If TVH is updated in the future to accommodate ATSC genres, perhaps ICE could make use of that feature.

Consideration also needs to be given to the front end, Kodi.  Unless it can somehow make use of custom genres, then we are stuck once more.  From previous investigations into the Kodi language files, I only remember finding ETSI descriptions.

I look forward to ICE's solution once implemented.