Use a pair of values (doubles, integers, whatever) for your distance calculation.
The first is the distance, the second the number of turns.
Sort lexically, so the first one matters more than the second one.
This is cleaner than "use a small penalty for turns" both mathematically and programmatically.
Each node is duplicated. The node "having entered vertically" and "having entered horizontally", as they make a difference to the number of turns.
The heuristic is Manhattan distance, with a turn if you aren't exactly horizontal or vertically aligned with the target.
As a down side, this technique gets in the way of the Jump Point optimization, as there are far fewer symmetric paths to a location (as some paths have more turns than others).