A deep learning (DL) model that accepts recorded audio as input and outputs the words spoken in it, along with evaluation metrics such as Word Error Rate (WER) and Character Error Rate (CER). We applied transfer learning to a pre-trained Transformer model provided by Google, using the open-source LibriSpeech audio dataset, which contains approximately 1,000 hours of speech sampled at 16 kHz.
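For readers unfamiliar with the two metrics, both are conventionally defined as the edit distance between the reference transcript and the model's output, divided by the length of the reference (counted in words for WER, in characters for CER). The snippet below is a minimal sketch of that standard definition, not the project's own evaluation code. The function names `edit_distance`, `wer`, and `cer` are illustrative, and it assumes whitespace tokenization with no text normalization (casing and punctuation are compared as-is).

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference item missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis item)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, used only to illustrate the calculation.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Because the numerator counts insertions as well as substitutions and deletions, both rates can exceed 1.0 when the output is much longer than the reference. In practice, transcripts are usually normalized (for example lowercased and stripped of punctuation) before scoring, which this sketch leaves to the caller.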