I'm new to Docker and have been reading up on it. It's a great way to test systems in a self-contained, reproducible, standardized configuration (when done correctly).
I prefer your option (3), i.e. including the test code in the production deployable artifact (the Docker image).
I'll quote Alister Scott from GTAC 2015, which I attended:
Don’t be afraid to add testability specific features to your app that don’t serve a functional purpose. I recently had to get new tyres on my car and realized that a lot of tyres have testability features called tread indicators. These don’t serve a functional purpose
For integration and e2e tests, i.e. tests that need more than one Docker image, I prefer to keep the tests in a separate git repo and have the CI tool use docker-compose to orchestrate the creation of all the containers the larger test needs. Again, the Docker images should be exactly the same as in production; the only thing that varies is the configuration (e.g. environment variables), which points the tests at test data and/or staging services.
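To make that a bit more concrete, here is a rough sketch of what the CI step could look like as a small Python wrapper around docker-compose. It is only an illustration under my own assumptions: the compose file name `docker-compose.test.yml`, the project name `e2e`, the `tests` service, and all the environment variable names and URLs are hypothetical placeholders, not something from your setup. The idea is that the images stay untouched and only the environment passed to docker-compose changes.

```python
import os
import subprocess

# Hypothetical test-only settings; every name and URL here is illustrative.
TEST_OVERRIDES = {
    "APP_ENV": "test",
    "DATABASE_URL": "postgres://test-db:5432/fixtures",
    "PAYMENTS_API_URL": "https://staging.example.com/payments",
}


def build_test_env(base_env, overrides):
    """Return a copy of base_env with the test-only settings layered on top."""
    env = dict(base_env)
    env.update(overrides)
    return env


def compose_up_command(compose_file, project, test_service="tests"):
    """Build the docker-compose call that starts all containers and exits
    with the exit code of the (assumed) test-runner service."""
    return [
        "docker-compose",
        "-f", compose_file,
        "-p", project,
        "up",
        "--abort-on-container-exit",
        "--exit-code-from", test_service,
    ]


def run_integration_tests(compose_file="docker-compose.test.yml", project="e2e"):
    """Run the whole stack with production images and test configuration."""
    env = build_test_env(os.environ, TEST_OVERRIDES)
    command = compose_up_command(compose_file, project)
    return subprocess.run(command, env=env).returncode
```

The CI job would call `run_integration_tests()` and fail the build on a non-zero return code. For this to work, the compose file would have to reference those variables (for example through `${DATABASE_URL}` substitution) so the same images pick up test data instead of production services.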