Motivated by communication systems with constrained complexity, we consider the problem of input symbol selection for discrete memoryless channels (DMCs). Given a DMC, the goal is to find a subset of its input alphabet, so that the optimal input distribution that is only supported on these symbols maximizes the capacity among all other subsets of the same size (or smaller). We observe that the resulting optimization problem is non-concave and non-submodular, and so generic methods for such cases do not have theoretical guarantees. We derive an analytical upper bound on the capacity loss when selecting a subset of input symbols based only on the properties of the transition matrix of the channel. We propose a selection algorithm that is based on input-symbols clustering, and an appropriate choice of representatives for each cluster, which uses the theoretical bound as a surrogate objective function. We provide numerical experiments to support the findings.