Improving acoustic quality
I have a set of audio files (~1000 hours), all speech from a single speaker. A good portion is of reasonable quality, but many recordings have various acoustic defects. I thought I could train a neural network to learn what a good recording looks like and then translate the defective recordings into good ones, much as a CycleGAN can translate a sketch into a photograph. The problem is that the defective recordings are not all defective in the same way: some have a lot of noise, some have echo, and some have the high frequencies cut off. So I cannot train a CycleGAN in the traditional manner, because I don't have two domains to translate between, but rather a variety of defect domains and combinations of them.
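For concreteness, here is a minimal sketch of the three defect types I mean (noise, echo, high-frequency cutoff), simulated with numpy/scipy on a clean signal; the sample rate and all parameter values are just assumptions for illustration. One thing I considered is synthesizing such defects on the clean portion to get paired training data, instead of unpaired translation:

```python
import numpy as np
from scipy.signal import butter, lfilter

def add_noise(x, snr_db=10.0, rng=None):
    """Mix in white noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    sig_power = np.mean(x ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), x.shape)

def add_echo(x, delay_samples=800, decay=0.4):
    """Add a single delayed, attenuated copy of the signal (crude echo)."""
    y = x.copy()
    y[delay_samples:] += decay * x[:-delay_samples]
    return y

def cut_highs(x, sr=16000, cutoff_hz=3400):
    """Low-pass filter that discards content above cutoff_hz."""
    b, a = butter(4, cutoff_hz / (sr / 2), btype="low")
    return lfilter(b, a, x)

# Example: degrade one second of a clean 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
degraded = cut_highs(add_echo(add_noise(clean)), sr=sr)
```

Each file in my set seems to have some random combination of these, which is why a single source/target domain pair doesn't fit.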
I wonder how to tackle this problem. Any ideas are welcome.