MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation