With the aim of improving existing telemental health services, we present TeleEngage, a novel framework that uses a semi-supervised multimodal GAN to detect engagement levels during conversations from videos. Inspired by psychology practices used to capture patient engagement, we create features for Affective and Cognitive engagement. We feed these features to a semi-supervised GAN and regress on the resulting latent representations to obtain the corresponding engagement values for the people in the videos. We also publicly release a new dataset, MEDICA, for mental health patient engagement detection. We demonstrate the effectiveness of our approach through experiments on multiple datasets, analyzing and comparing its performance on MEDICA and RECOLA. On the RECOLA dataset, we demonstrate the utility of our framework on the Valence and Arousal estimation task commonly performed in psychology studies of patients. We report an average improvement of 40% in RMSE over existing methods on these tasks. To the best of our knowledge, our approach is the first to estimate mental health patient engagement from psychology-driven features in a multimodal, semi-supervised setup.
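For readers unfamiliar with the metric, the following is a minimal sketch of how RMSE and a percentage improvement in RMSE could be computed. It assumes the improvement is measured as the relative reduction in RMSE with respect to a baseline, which the abstract does not state explicitly. The function names `rmse` and `relative_improvement` are hypothetical and are not taken from the TeleEngage code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline (assumed definition)."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse
```

Under this assumed definition, a baseline RMSE of 0.5 and a new RMSE of 0.3 (illustrative values, not results from the paper) would correspond to a 40% improvement.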