Here's a rough schema of what I want to accomplish. Keep in mind that the voice files can be of any length (anywhere from a couple of seconds to several minutes) and that there are between three and five of them. They all play one after another, starting at roughly the 90-minute mark of the music.
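The timing described above comes down to simple arithmetic: each clip starts where the previous one ends, beginning at the 90-minute point. Below is a minimal Python sketch of that calculation. It assumes the clip durations are already known in seconds (for example, read from the files beforehand) and treats "about 90 minutes" as exactly 5400 s. The function name `schedule_voices` is hypothetical, not part of any audio library.

```python
def schedule_voices(durations_s, start_s=90 * 60):
    """Return (start, end) times in seconds for clips played back to back.

    Hypothetical helper: assumes durations are known in advance and that
    playback begins exactly at start_s (90 minutes = 5400 s by default).
    """
    # The post says there are three to five voice files.
    if not 3 <= len(durations_s) <= 5:
        raise ValueError("expected between 3 and 5 voice clips")

    schedule = []
    t = float(start_s)
    for d in durations_s:
        if d <= 0:
            raise ValueError("clip duration must be positive")
        # Each clip starts where the previous one ended.
        schedule.append((t, t + d))
        t += d
    return schedule


print(schedule_voices([4.0, 90.0, 12.5]))
```

With clips of 4 s, 90 s and 12.5 s, the first starts at 5400 s, the second at 5404 s and the third at 5494 s, so the voice section ends at 5506.5 s. Because the offsets are computed from the actual durations, the same logic works whether a clip lasts seconds or minutes.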