CLaD: Planning with Grounded Foresight via Cross-Modal Latent Dynamics