Reinforcement learning algorithms often require finite state and action spaces in Markov decision processes. In this presentation, we show that under mild regularity conditions (in particular, involving only weak continuity or Wasserstein continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions (called Quantized Q-Learning) converges to a limit under mild ergodicity conditions, and furthermore this limit satisfies an optimality equation whose solution yields near-optimal policies, either with explicit performance bounds or with guaranteed asymptotic optimality. Our approach builds on (i) near-optimality of finite-state model approximations for MDPs with weakly continuous kernels, and (ii) convergence of quantized Q-learning to a limit which corresponds to the fixed point of a constructed approximate finite MDP that depends on the exploration policy used during learning. This result also implies near-optimality of empirical model learning, where one fits a finite MDP model to data as an alternative to quantized Q-learning, for which we also obtain sample complexity bounds. Thus, we present a general and rigorous convergence and near-optimality result on the applicability of Q-learning and model learning to continuous MDPs. Our analysis also applies to problems with non-compact state spaces via non-uniform quantization with convergence bounds, to non-Markovian stochastic control problems which can be lifted to measure-valued MDPs under appropriate topologies (as in POMDPs and decentralized stochastic control), and to controlled diffusions via time-discretization. [Joint work with Ali Kara, Emre Demirci, Omar Mrani-Zentar, Naci Saldi, and Somnath Pradhan]
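
To make the quantization idea concrete, the following is a minimal Python sketch of quantized Q-learning on a toy scalar example. The bounded state space [0, 1], the simulator step, the uniform exploration policy, and all numerical parameters are illustrative placeholders assumed for this sketch, not taken from the talk; the actual conditions (ergodicity of the exploration policy, step-size schedule, continuity of the kernel) are as described above.

```python
# Minimal sketch of quantized Q-learning for a continuous-state MDP.
# Assumptions (not from the talk): a bounded scalar state space [0, 1],
# a small finite action set, and a hypothetical simulator `step(x, u)`.

import numpy as np

rng = np.random.default_rng(0)

N_BINS = 20                            # number of quantization bins for the state
ACTIONS = np.array([-0.1, 0.0, 0.1])   # finite (quantized) action set
BETA = 0.95                            # discount factor
N_STEPS = 200_000                      # length of the single exploration trajectory


def quantize(x):
    """Map a continuous state in [0, 1] to a bin index."""
    return min(int(x * N_BINS), N_BINS - 1)


def step(x, u):
    """Hypothetical simulator: noisy drift on [0, 1] with a running
    reward that prefers states near 0.5 (placeholder example)."""
    x_next = np.clip(x + u + 0.05 * rng.standard_normal(), 0.0, 1.0)
    r = -abs(x_next - 0.5)
    return r, x_next


# Q-table over quantized states and actions, plus per-pair visit counts
Q = np.zeros((N_BINS, len(ACTIONS)))
visits = np.zeros((N_BINS, len(ACTIONS)))

x = rng.uniform()
for _ in range(N_STEPS):
    i = quantize(x)
    a_idx = rng.integers(len(ACTIONS))      # uniform exploration policy
    r, x_next = step(x, ACTIONS[a_idx])
    j = quantize(x_next)

    visits[i, a_idx] += 1
    alpha = 1.0 / visits[i, a_idx]          # decreasing step size per state-action pair
    target = r + BETA * Q[j].max()          # standard Q-learning target on quantized states
    Q[i, a_idx] += alpha * (target - Q[i, a_idx])

    x = x_next

# Greedy policy extracted from the learned Q-table (one action per bin)
policy = ACTIONS[Q.argmax(axis=1)]
print(policy)
```

In the terminology of the talk, the table Q converges (under the stated ergodicity and regularity conditions) to the fixed point of an approximate finite MDP constructed from the quantization and the exploration policy, and the greedy policy extracted from this fixed point is near-optimal for the original continuous MDP.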