Virtual power plants (VPPs) have emerged as an advanced solution for coordinating distributed energy resources (DERs), including the energy stored in electric vehicles (EVs). The substantial demand for EV charging places significant stress on the electrical grid and raises energy costs for operators. At the same time, reversible charging technologies offer a promising way to harness surplus energy from EVs that do not require immediate charging. In this study, we introduce the concept of an EV-integrated VPP as a replacement for the traditional charging station. We formulate a tailored mathematical model to optimize the charging and discharging schedule, a problem we term optimal power orchestration, with the goal of minimizing both energy costs and EV battery degradation. We then design a lightweight multi-agent reinforcement learning (MARL) approach that tackles this problem by reformulating it as a decentralized partially observable Markov decision process (Dec-POMDP). Knowledge distillation is incorporated into the proposed method to enable efficient deployment in this distributed, resource-constrained environment. Extensive experiments using real-world EV charging data and realistic scenario settings show that our approach reduces energy costs and battery degradation by 15.5% and 71.1%, respectively, compared to the baseline method.