To aid existing telemental health services, we propose DeepTMH, a novel framework that models telemental health session videos by extracting latent vectors corresponding to Affective and Cognitive features frequently used in the psychology literature. Our approach leverages advances in semi-supervised learning to tackle data scarcity in the telemental health session video domain and consists of a multimodal semi-supervised GAN that detects important mental health indicators during telemental health sessions. We demonstrate the usefulness of our framework and compare it against existing work on two tasks, Engagement regression and Valence-Arousal regression, both of which are important to psychologists during a telemental health session. Our framework reports a 40% improvement in RMSE over the state-of-the-art (SOTA) method in Engagement regression and a 50% improvement in RMSE over the SOTA method in Valence-Arousal regression. To address the scarcity of publicly available datasets in the telemental health space, we release a new dataset, MEDICA, for mental health patient engagement detection. MEDICA consists of 1,299 videos, each 3 seconds long. To the best of our knowledge, our approach is the first method to model telemental health session data based on psychology-driven Affective and Cognitive features, and it also accounts for data scarcity by leveraging a semi-supervised setup.
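The abstract reports RMSE improvements as percentages but does not spell out the formula. A common convention, assumed here and not confirmed by the source, is the relative reduction (RMSE_SOTA - RMSE_ours) / RMSE_SOTA. The sketch below shows that computation; the function names `rmse` and `relative_improvement` and the numbers in the usage note are hypothetical and illustrative only, not results from this work.

```python
import math


def rmse(preds, targets):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction relative to the baseline (assumed convention).

    A return value of 0.4 corresponds to a "40% improvement".
    """
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    return (baseline_rmse - new_rmse) / baseline_rmse
```

For instance, with a hypothetical baseline RMSE of 0.20 and a new RMSE of 0.12, `relative_improvement(0.20, 0.12)` evaluates to approximately 0.4, which this convention would report as a 40% improvement.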