Forgot to follow up here but turning up the learning rate multiplier to 10 seemed to do the trick without introducing any over-fitting weirdness or instability
Forgot to follow up here but turning up the learning rate multiplier to 10 seemed to do the trick without introducing any over-fitting weirdness or instability