I have read a bunch of posts on SO regarding the computation of end-to-end delay in Veins, but have not found an answer to be fulfilling in explaining why the delay is seemingly
This end-to-end delay is to be expected. If your application's simulation model does not explicitly model processing delay (e.g., by an application running on a slow general purpose computer), all you would expect to delay a frame is propagation delay (lightspeed, so negligible here) and queueing delay on the MAC (time from inserting frame into TX queue until transmission finishes).
To give an example, for a 2400 bit frame sent at 6 Mbit/s this delay is roughly 0.45 ms. You are likely using slightly shorter frames, so your values appear to be reasonable.
For background information, see F. Klingler, F. Dressler, C. Sommer: "The Impact of Head of Line Blocking in Highly Dynamic WLANs" (DOI 10.1109/TVT.2018.2837157), which also includes a comparison of theory vs. Veins vs. real measurements.