2 Apr 2022
12:11 p.m.
Ok, so let's check out this detokenizer. I don't know how speech-to-text models convert a long stream of samples into a sequence of tokens, but I suspect they avoid any feedback loop around word boundaries. It seems like they go to some pains to keep feedback out of their architectures entirely, though I could be wrong.
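If that's right, CTC-style decoding would be the textbook example: the model scores every frame independently, and the "detokenizer" just collapses repeated predictions and blanks, with nothing feeding back across boundaries. A minimal sketch of greedy CTC collapse, assuming that's roughly the scheme in play here (the vocabulary and logits below are toy values for illustration, not from any specific model):

import numpy as np

BLANK = 0  # CTC reserves an index for the "blank" (no-emission) symbol
VOCAB = {1: "h", 2: "e", 3: "l", 4: "o"}  # toy alphabet, purely illustrative

def ctc_greedy_decode(logits: np.ndarray) -> str:
    """Collapse per-frame argmax predictions into a token string.

    logits: (frames, vocab) array of per-frame scores. Each frame is
    scored independently; there's no recurrence across frames, which
    is how CTC sidesteps feedback at word/token boundaries.
    """
    frame_ids = logits.argmax(axis=1)  # best symbol per frame
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != BLANK:   # drop repeats, then drop blanks
            out.append(VOCAB[int(i)])
        prev = i
    return "".join(out)

# Toy run: 8 frames whose argmaxes are h, h, -, e, l, -, l, o
frames = np.full((8, 5), -10.0)
for t, sym in enumerate([1, 1, 0, 2, 3, 0, 3, 4]):
    frames[t, sym] = 1.0
print(ctc_greedy_decode(frames))  # -> "hello"

Note the order of operations: repeats are merged before blanks are removed, which is what lets the blank separate a genuine double letter (the two l's here) from one prediction smeared across frames.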