ConLA: Contrastive Latent Action Learning from Human Videos for Robotic Manipulation