It took me many iterations to settle on the exact logic. At first, I used only the HiG2Vec GO similarity embedding. It did alright, but I didn’t like how proteins from the same family could get wildly different scores based on just pathway participation or tissue expression. I added an ESM2 sequence-based embedding to tame this inconsistency. As a side effect, the “your guess is top-9 similar” hint is now arranged in order of increasing sequence similarity, which is a nice bonus for late-game triangulation.
I tried building a single shared embedding out of the two, but ran into statistical issues because they needed very different normalization. Instead, I opted to calculate intermediate “evidence strengths” for each embedding separately and then combine them into a final similarity percentage in a way that highly rewards both “only similar by sequence” and “only functionally similar”. That way a player of any background has a chance to close in on the target using their own experience, whether that experience is in pathways or in structural families.
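To make the idea concrete, here is a minimal sketch of one way such a combination could look. It is not the game's actual formula: the percentile-rank normalization, the noisy-OR combination, and the names `evidence_strength` and `combined_similarity` are all assumptions chosen for illustration.

```python
from bisect import bisect_left


def evidence_strength(score, background):
    """Hypothetical normalization: map a raw similarity to [0, 1) as its
    percentile rank within that embedding's own background scores, so the
    two embeddings never have to share a scale."""
    ranked = sorted(background)
    return bisect_left(ranked, score) / len(ranked)


def combined_similarity(go_strength, seq_strength):
    """Hypothetical combination (noisy-OR): the result is high when EITHER
    evidence strength is high, and highest when both are.
    Returns a percentage in [0, 100]."""
    return 100.0 * (1.0 - (1.0 - go_strength) * (1.0 - seq_strength))


if __name__ == "__main__":
    # Made-up background similarities for each embedding.
    go_background = [0.05, 0.10, 0.20, 0.40, 0.80]
    seq_background = [0.30, 0.35, 0.40, 0.45, 0.90]

    go = evidence_strength(0.75, go_background)     # strong functional match
    seq = evidence_strength(0.32, seq_background)   # weak sequence match
    print(round(combined_similarity(go, seq), 1))
```

With a noisy-OR, a guess that is strong on only one axis still scores high, which matches the goal of rewarding both “only similar by sequence” and “only functionally similar” guesses.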
Thanks for checking it out.