Following initial work by JaJa and by Ahlswede and Cai, and inspired by a recent surge of renewed interest in deterministic identification (DI) via noisy channels, we consider the problem in full generality for memoryless channels with finite output alphabet but arbitrary input alphabet. Such a channel is essentially given by (the closure of) the subset of its output distributions in the probability simplex. Our main findings are that the maximum number of messages thus identifiable scales super-exponentially as $2^{R\,n\log n}$ with the block length $n$, and that the optimal rate $R$ is bounded above and below in terms of the covering (a.k.a. Minkowski, Kolmogorov, or entropy) dimension $d$ of a certain algebraic transformation of the output set: $\frac14 d \leq R \leq \frac12 d$. Along the way, we present a \emph{Hypothesis Testing Lemma} showing that it suffices to ensure pairwise reliable distinguishability of the output distributions in order to construct a DI code. Although we do not know the exact capacity formula, we can conclude that the DI capacity exhibits super-activation: there exist two channels, each of capacity zero, whose product has positive capacity. These results are then generalised to classical-quantum channels with finite-dimensional output quantum system (but arbitrary input alphabet), and in particular to quantum channels on finite-dimensional quantum systems under the constraint that the identification code can only use tensor product inputs.
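For concreteness, the covering dimension invoked in the bounds can be recalled as follows; this is the standard Minkowski (box-counting) definition for a bounded set, stated here as a sketch (the specific transformed output set to which it is applied is defined in the body of the paper, not here):

```latex
% Covering (Minkowski/box-counting) dimension of a bounded set A,
% where N(A,\delta) denotes the minimal number of \delta-balls covering A:
\[
  \dim_M A \;=\; \lim_{\delta \to 0} \frac{\log N(A,\delta)}{\log(1/\delta)} ,
\]
% assuming the limit exists (otherwise one takes lower/upper limits).
% With d = \dim_M of the transformed output set, the abstract's bounds on the
% optimal rate R governing the message count N(n) \approx 2^{R\,n\log n} read:
\[
  \tfrac{1}{4}\, d \;\leq\; R \;\leq\; \tfrac{1}{2}\, d .
\]
```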