This paper presents a data-driven methodology for estimating capacity regions in multi-user communication scenarios, focusing on channels with discrete alphabets, both with and without feedback. Prior work has successfully used neural networks to estimate capacity regions for channels with continuous alphabets; the shift to discrete alphabets, however, introduces a significant challenge, since the joint model is no longer end-to-end differentiable. To address this issue, we first cast the maximization of the causally conditioned directed information rate as a decentralized Markov decision process (MDP). Building on this formulation, we introduce a tractable optimization procedure for estimating rate pairs on the boundary of the capacity region. To cope with the complexity of the MDP state space, we employ a reinforcement learning (RL) algorithm to learn the optimal policies. We demonstrate the methodology on several communication scenarios, including the two-way channel and the multiple access channel (MAC), all under the constraint of a discrete input alphabet. The results highlight the adaptability and performance of the proposed RL-based framework, which estimates capacity regions without explicit knowledge of the underlying channel model.