As Daniel posted, he sent me some more EPG sample files that have this problem, and I've verified that the issue is with the mapping of the 11-digit (decimal) IceTV event ids to the 16-bit event ids used in the Beyonwiz EPG (and elsewhere on the Beyonwiz side, like in timers).
It boils down to the Birthday Problem (and is similar to the Birthday Attack).
With a 16-bit hash used to reduce the IceTV event ids to EIT event id size, and about 200 events/channel in the EPG (the problem only occurs when the same short event id is used for two different programs on the same channel), there's about a 25% probability that at least two different events on any given channel will have the same hash, if the hashes are uniformly distributed. If that seems an unreasonably high probability, that's why the Birthday Problem is also called the Birthday Paradox.
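For anyone who wants to check that figure, here is a minimal sketch of the standard Birthday Problem calculation. It is not code from the plugin: the function name `collision_probability` is made up for illustration, and it assumes the hashes are uniformly distributed over all 2^16 values.

```python
def collision_probability(n_events: int, hash_bits: int = 16) -> float:
    """Probability that at least two of n_events share a hash value,
    assuming hashes are uniform over 2**hash_bits possible values."""
    slots = 2 ** hash_bits
    if n_events > slots:
        return 1.0  # pigeonhole: more events than hash values
    p_no_collision = 1.0
    for i in range(n_events):
        # The (i+1)-th event must avoid the i hash values already taken.
        p_no_collision *= (slots - i) / slots
    return 1.0 - p_no_collision


# About 200 events on one channel, 16-bit event ids.
print(f"{collision_probability(200):.2f}")
```

For 200 events this comes out at roughly 0.26, in line with the "about 25%" estimate.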
When a collision occurs, the event already in the EPG with that EIT event id is removed, and the event that collided is added to the EPG. The EPG allows lookup of an event by its EIT event id, so multiple events with the same EIT event id can't co-exist in the EPG.
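The effect can be sketched with a dictionary keyed by the short event id. This is only an illustration of the behaviour described above, not the Beyonwiz EPG code: `short_event_id`, `load_channel` and the modulus of 2^16 are assumptions made for the example.

```python
MODULUS = 2 ** 16  # assumption: the actual modulus used by the plugin may differ


def short_event_id(icetv_id: int) -> int:
    """Reduce a long IceTV event id to a 16-bit id (hypothetical hash)."""
    return icetv_id % MODULUS


def load_channel(events):
    """Insert (icetv_id, title) pairs into a per-channel EPG keyed by short id.

    Returns the EPG and the list of events that were displaced by a collision.
    """
    epg = {}
    lost = []
    for icetv_id, title in events:
        key = short_event_id(icetv_id)
        if key in epg and epg[key][0] != icetv_id:
            lost.append(epg[key])  # the earlier event is removed
        epg[key] = (icetv_id, title)  # the colliding event takes its place
    return epg, lost


epg, lost = load_channel([
    (10000000001, "Program A"),
    (10000000001 + MODULUS, "Program B"),  # same short id as Program A
])
print(len(epg), lost)
```

Both events were received, but only "Program B" is left in the EPG.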
The IceTV event ids are not uniformly distributed: they tend to run in sequences, and the simple modulo hash used in the Beyonwiz IceTV plugin tends to preserve those sequences. That means that the collisions tend to cluster: if one collision occurs in a channel, there's a fairly high probability that a run of several collisions will occur, and that removes a block of entries from the EPG on the Beyonwiz side, even though all the data has been correctly sent and received. However, there seems to be about the same overall probability that collisions occur as in the uniform distribution case.
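A small sketch of why the collisions cluster. The two runs of ids below are invented for the example, and the modulus of 2^16 is an assumption rather than the plugin's actual value; the point is only that a modulo hash maps sequential ids to sequential short ids, so two overlapping runs collide on a whole block of entries.

```python
MODULUS = 2 ** 16  # assumption: stand-in for the plugin's modulo hash


def modulo_hash(icetv_id: int) -> int:
    return icetv_id % MODULUS


# Two hypothetical runs of sequential IceTV ids on the same channel.
run_a = [10_000_000_000 + i for i in range(10)]
run_b = [10_000_000_000 + 3 * MODULUS + 5 + i for i in range(10)]

keys_a = {modulo_hash(e) for e in run_a}
colliding = [e for e in run_b if modulo_hash(e) in keys_a]

# The collisions are a contiguous block: the first five ids of run_b.
print(len(colliding), colliding == run_b[:5])
```

Here five consecutive events in the second run collide with five consecutive events in the first, which is the kind of block of missing EPG entries described above.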
I have experimented with re-hashing the IceTV event id to remove runs (e.g. by using the IceTV event id's MD5 hash) before reducing it by the modulo hash. That stops blocks of missing entries in the EPG from happening, but it doesn't significantly change the total number of collisions and missing entries in the EPG. They're just scattered over more channels and are less obtrusive.
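Roughly, the re-hashing idea looks like the sketch below. The details are assumptions for illustration only: the function name `md5_then_modulo`, hashing the id as a decimal string, and the modulus of 2^16 are not taken from the plugin.

```python
import hashlib

MODULUS = 2 ** 16  # assumption: stand-in for the plugin's modulo hash


def md5_then_modulo(icetv_id: int) -> int:
    """Scramble the IceTV id with MD5, then reduce it to a 16-bit id."""
    digest = hashlib.md5(str(icetv_id).encode("ascii")).digest()
    return int.from_bytes(digest, "big") % MODULUS


# A hypothetical run of sequential IceTV ids.
run = [10_000_000_000 + i for i in range(10)]
print([md5_then_modulo(e) for e in run])
```

The sequential ids no longer map to sequential short ids, so collisions don't come in blocks, but there are still only 65536 possible short ids, so the Birthday Problem estimate for the number of collisions is unchanged.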
I don't yet have a solution to this problem that appeals to me in any way.
