How should you build your database from source control?

我寻月下人不归 2020-11-30 15:48

There has been some discussion on the SO community wiki about whether database objects should be version controlled. However, I haven't seen much discussion about t

11 Answers
  • 2020-11-30 16:38

    +1 for Liquibase: Liquibase is an open source (LGPL), database-independent library for tracking, managing, and applying database changes. It is built on a simple premise: all database changes (structure and data) are stored in an XML-based descriptive manner and checked into source control. A good point is that DML changes are stored semantically, not just as a diff, so you can track the purpose of each change.

    It can be combined with Git version control for better interaction. I'm going to configure our dev-prod environment to try it out.

    You could also use a build system such as Maven or Ant to build production code from scripts.

    The downside is that Liquibase doesn't integrate with widespread SQL IDEs, so you have to do basic operations yourself.

    In addition to this, you could use DBUnit for DB testing. This tool allows data generation scripts to be used for testing your production environment, with cleanup afterwards.
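The DBUnit idea above (load known data, run the test, clean up afterwards) can be sketched without DBUnit itself. The following is a minimal Python illustration using the standard-library sqlite3 module, not DBUnit's API. The `customer` table, the seed rows, and the `seeded` helper are hypothetical names chosen for the example.

```python
import sqlite3
from contextlib import contextmanager

# Hypothetical schema and data scripts; in practice these would be files
# kept in source control next to the schema scripts.
SCHEMA_SQL = "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);"
SEED_SQL = """
INSERT INTO customer (id, name) VALUES (1, 'Alice');
INSERT INTO customer (id, name) VALUES (2, 'Bob');
"""
CLEANUP_SQL = "DELETE FROM customer;"


@contextmanager
def seeded(conn, seed_sql, cleanup_sql):
    """Load test data, hand control to the test, then always clean up."""
    conn.executescript(seed_sql)
    try:
        yield conn
    finally:
        conn.executescript(cleanup_sql)


def count_customers(conn):
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_SQL)

with seeded(conn, SEED_SQL, CLEANUP_SQL):
    rows_during_test = count_customers(conn)  # the test body sees the seed data

rows_after_test = count_customers(conn)  # the cleanup script has run
```

Because the cleanup runs in a `finally` clause, a failing test still leaves the database empty for the next one.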

    IMHO:

    1. Store DML in files so that you can version them.
    2. Automate the schema build process from source control.
    3. For testing purposes, a developer could use a local DB built from source control via the build system, plus test data loaded with scripts or DBUnit scripts (from source control).
    4. Liquibase lets you define a "run sequence" of scripts to respect dependencies.
    5. There should be a DBA team that checks the master branch with ALL changes before production use. I mean they check the trunk/branch from other DBAs before committing into the MASTER trunk, so that master is always consistent and production ready.

    We faced all of the problems mentioned (code changes, merging, rewriting) in our billing production database. This topic is great for discovering all that stuff.

  • 2020-11-30 16:39

    Here are some answers to your questions:

    1. Should both test and production environments be built from source control? YES
      • Should both be built using automation - or should production be built by copying objects from a stable, finalized test environment?
      • Automation for both. Do NOT copy data between the environments.
      • How do you deal with potential differences between test and production environments in deployment scripts?
      • Use templates, so that you actually produce a different set of scripts for each environment (e.g. references to external systems, linked databases, etc.)
      • How do you test that the deployment scripts will work as effectively against production as they do in test?
      • Test them in a pre-production environment: run the deployment against an exact copy of the production environment (database and potentially other systems).
    2. What types of objects should be version controlled?
      • Just code (procedures, packages, triggers, java, etc)?
      • Indexes?
      • Constraints?
      • Table Definitions?
      • Table Change Scripts? (eg. ALTER scripts)
      • Everything?
      • Everything, and:
        • Do not forget static data (lookup lists etc.), so you do not need to copy ANY data between environments
        • Keep only the current version of the database scripts (version controlled, of course), and
        • Store ALTER scripts: 1 BIG script (or a directory of scripts named like 001_AlterXXX.sql, so that running them in natural sort order will upgrade from version A to B)
    3. Which types of objects shouldn't be version controlled?
      • Sequences?
      • Grants?
      • User Accounts?
      • See 2. If your users/roles (or technical user names) differ between environments, you can still script them using templates (see 1.)
    4. How should database objects be organized in your SCM repository?
      • How do you deal with one-time things like conversion scripts or ALTER scripts?
      • see 2.
      • How do you deal with retiring objects from the database?
      • Delete them from the DB and remove them from the source control trunk/tip
      • Who should be responsible for promoting objects from development to test level?
      • dev/test/release schedule
      • How do you coordinate changes from multiple developers?
      • Try NOT to create a separate database for each developer. You use source control, right? In that case developers change the database and check in the scripts. To be completely safe, re-create the database from the scripts during the nightly build.
      • How do you deal with branching for database objects used by multiple systems?
      • Tough one: try to avoid it at all costs.
    5. What exceptions, if any, can reasonably be made to this process?
      • Security issues?
      • Do not store passwords for test/prod. You may allow it for dev, especially if you have automated daily/nightly DB rebuilds.
      • Data with de-identification concerns?
      • Scripts that can't be fully automated?
      • Document them and store them with the release info/ALTER script
    6. How can you make the process resilient and enforceable?
      • To developer error?
      • Test with a daily build from scratch, and compare the result to the incremental upgrade (from version A to B using ALTER). Compare both the resulting schema and the static data.
      • To unexpected environmental issues?
      • Use version control and backups
      • Compare the PROD database schema to what you think it is, especially before deployment. A SuperDuperCool DBA may have fixed a bug that was never in your ticket system :)
      • For disaster recovery?
    7. How do you convince decision makers that the benefits of DB-SCM truly justify the cost?
      • Anecdotal evidence?
      • Industry research?
      • Industry best-practice recommendations?
      • Appeals to recognized authorities?
      • Cost/Benefit analysis?
      • If developers and DBAs agree, you do not need to convince anyone, I think (unless you need money to buy software such as dbGhost for MSSQL)
    8. Who should "own" database objects in this model?
      • Developers?
      • DBAs?
      • Data Analysts?
      • More than one?
      • Usually DBAs approve the model (before check-in, or after as part of code review). They definitely own performance-related objects. But in general the team owns it [and the employer, of course :)]
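The template advice in items 1 and 3 of this answer can be illustrated with a short sketch. It uses Python's standard-library `string.Template` as one possible templating mechanism; the answer does not name a tool. The SQL, the placeholder names, and the environment values below are all hypothetical.

```python
from string import Template

# Hypothetical script template: ${...} placeholders mark everything that
# differs between environments (user names, linked databases).
SCRIPT_TEMPLATE = Template(
    "CREATE USER ${app_user};\n"
    "GRANT SELECT ON orders TO ${app_user};\n"
    "CREATE SYNONYM billing_orders FOR orders@${billing_link};\n"
)

# Hypothetical per-environment settings. Passwords are deliberately absent.
ENVIRONMENTS = {
    "test": {"app_user": "app_test", "billing_link": "billing_test_db"},
    "prod": {"app_user": "app", "billing_link": "billing_db"},
}


def render_script(env_name):
    """Return the deployment script for one environment.

    Template.substitute raises KeyError when a placeholder has no value,
    so an incomplete environment definition fails when the script is built
    rather than when it is deployed.
    """
    return SCRIPT_TEMPLATE.substitute(ENVIRONMENTS[env_name])


test_script = render_script("test")
prod_script = render_script("prod")
```

Only the template and the settings table need to be version controlled; the rendered scripts are build output.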
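The check suggested under item 6 of the answer above (build from scratch, build by incremental upgrade, compare the two) might look like the following sketch. It uses an in-memory SQLite database purely for illustration; the `account` table and all three scripts are hypothetical, and a real comparison would also cover indexes, constraints, and static data.

```python
import sqlite3

# Hypothetical scripts: the previous full definition (version A), the ALTER
# script that upgrades A to B, and the current full definition (version B).
VERSION_A = "CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT);"
ALTER_A_TO_B = "ALTER TABLE account ADD COLUMN email TEXT;"
VERSION_B = "CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT, email TEXT);"
# A deliberately wrong upgrade script, to show what drift looks like.
BAD_ALTER = "ALTER TABLE account ADD COLUMN e_mail TEXT;"


def build(*scripts):
    """Create an empty in-memory database and run the scripts in order."""
    conn = sqlite3.connect(":memory:")
    for script in scripts:
        conn.executescript(script)
    return conn


def schema_snapshot(conn):
    """Map each table name to its (column, declared type, not-null, pk) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    snapshot = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        snapshot[table] = [(c[1], c[2], c[3], c[5]) for c in columns]
    return snapshot


from_scratch = schema_snapshot(build(VERSION_B))
upgraded = schema_snapshot(build(VERSION_A, ALTER_A_TO_B))
drifted = schema_snapshot(build(VERSION_A, BAD_ALTER))

schemas_match = from_scratch == upgraded  # the ALTER script is consistent
drift_detected = from_scratch != drifted  # the bad script is caught
```

Run as part of the nightly build, a mismatch fails the build before the ALTER script reaches a shared environment.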
  • 2020-11-30 16:43

    We have our Silverlight project with an MSSQL database in Git version control. The easiest way is to make sure you've got a slimmed-down database (content-wise) and do a complete dump from, for example, Visual Studio. Then you can run 'sqlcmd' from your build script to recreate the database on each dev machine.
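A build-script step that calls `sqlcmd` could be sketched as below. The server name, database name, and dump path are hypothetical, and the use of Windows integrated authentication (`-E`) is an assumption about the dev machines. The command is only executed when `recreate_database` is called, so the sketch can be read and tested without SQL Server installed.

```python
import subprocess


def sqlcmd_args(server, database, script_path):
    """Build the sqlcmd command line for running one script file."""
    return [
        "sqlcmd",
        "-S", server,       # server to connect to
        "-d", database,     # database to run the script in
        "-E",               # Windows integrated authentication (assumption)
        "-b",               # exit with an error code if a batch fails
        "-i", script_path,  # input script: the dump kept in source control
    ]


def recreate_database(server, database, script_path):
    """Run the dump script; raises CalledProcessError if sqlcmd reports failure."""
    subprocess.run(sqlcmd_args(server, database, script_path), check=True)


# Hypothetical values for a developer machine.
command = sqlcmd_args("localhost", "master", "db/dev_dump.sql")
```

Pairing `-b` with `check=True` makes a failed script stop the build instead of leaving a half-created database.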

    For deployment this is not possible since the databases are too large: that's the main reason for having them in a database in the first place.

  • 2020-11-30 16:52

    Rather than get into ivory tower arguments, here's a solution that has worked very well for me on real-world problems.

    Building a database from scratch can be summarised as managing SQL scripts.

    DBdeploy is a tool that checks the current state of a database: which scripts have previously been run against it, which scripts are available to be run, and therefore which scripts still need to be run.

    It will then collate all the needed scripts together and run them. It then records which scripts have been run.
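The bookkeeping described above can be sketched in a few lines. This is not dbdeploy's code or API, only an illustration of the idea using SQLite; the `changelog` table layout, the `pending` and `deploy` helpers, and the three scripts are assumptions made for the example.

```python
import sqlite3

# Hypothetical numbered change scripts; with a real tool these would be
# files checked out from source control.
SCRIPTS = {
    1: "CREATE TABLE invoice (id INTEGER PRIMARY KEY, total REAL);",
    2: "ALTER TABLE invoice ADD COLUMN currency TEXT;",
    3: "CREATE INDEX idx_invoice_currency ON invoice (currency);",
}


def pending(conn, scripts):
    """Return the numbers of scripts that have not been run yet, in order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changelog ("
        "change_number INTEGER PRIMARY KEY, "
        "applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT change_number FROM changelog")}
    return sorted(number for number in scripts if number not in applied)


def deploy(conn, scripts):
    """Run every pending script, record it, and return the numbers that ran."""
    to_run = pending(conn, scripts)
    for number in to_run:
        conn.executescript(scripts[number])
        conn.execute("INSERT INTO changelog (change_number) VALUES (?)", (number,))
        conn.commit()
    return to_run


conn = sqlite3.connect(":memory:")
first_run = deploy(conn, {n: SCRIPTS[n] for n in (1, 2)})  # only 1 and 2 exist yet
second_run = deploy(conn, SCRIPTS)  # script 3 has been added since
third_run = deploy(conn, SCRIPTS)   # nothing new to apply
```

Since the record of applied scripts lives in the target database, the same script set can be pointed at any instance and only the missing changes are run.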

    It's not the prettiest tool or the most complex, but with careful management it can work very well. It's open source and easily extensible. Once the running of the scripts is handled nicely, it is easy to add extra components, such as a shell script that checks out the latest scripts and runs dbdeploy against a particular instance.

    See a good introduction here:

    http://code.google.com/p/dbdeploy/wiki/GettingStarted

  • 2020-11-30 16:55

    I strongly believe that a DB should be part of source control and, to a large degree, part of the build process. If it is in source control, then I have the same coding safeguards when writing a stored procedure in SQL as I do when writing a class in C#. I do this by including a DB scripts directory under my source tree. This script directory doesn't necessarily have one file for each object in the database. That would be a pain in the butt!

    I develop in my db just as I would in my code project. Then, when I am ready to check in, I do a diff between the last version of my database and the current one I am working on. I use SQL Compare for this, and it generates a script of all the changes. This script is then saved to my db_update directory with a specific naming convention, 1234_TasksCompletedInThisIteration, where the number is the next number in the set of scripts already there and the name describes what is being done in this check-in. I do it this way because, as part of my build process, I start with a fresh database that is then built up programmatically using the scripts in this directory. I wrote a custom NAnt task that iterates through each script, executing its contents on the bare db. Obviously, if I need some data to go into the db, then I have data insert scripts too.

    This has many benefits to it. One, all of my stuff is versioned. Two, each build is a fresh build, which means that there won't be any sneaky stuff eking its way into my development process (such as dirty data that causes oddities in the system). Three, when a new guy is added to the dev team, they simply need to get latest and their local dev is built for them on the fly. Four, I can run test cases (I didn't call it a "unit test"!) on my database, as the state of the database is reset with each build (meaning I can test my repositories without worrying about adding test data to the db).
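The numbered-script convention and the fresh build can be sketched as follows. This is a Python stand-in for the idea, not the author's NAnt task. The `.sql` extension, the helper names, and the three demo scripts are assumptions, and SQLite replaces the real database so the example is self-contained.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

# Assumed naming convention: <number>_<description>.sql
SCRIPT_NAME = re.compile(r"^(\d+)_.+\.sql$")


def numbered_scripts(directory):
    """Return (number, path) pairs sorted numerically, so 2_ runs before 10_."""
    pairs = []
    for path in Path(directory).iterdir():
        match = SCRIPT_NAME.match(path.name)
        if match:
            pairs.append((int(match.group(1)), path))
    return sorted(pairs, key=lambda pair: pair[0])


def next_script_name(directory, description):
    """Name the next script: highest existing number plus one."""
    pairs = numbered_scripts(directory)
    last = pairs[-1][0] if pairs else 0
    return f"{last + 1}_{description}.sql"


def build_fresh_database(directory):
    """Create an empty database and run every script against it in order."""
    conn = sqlite3.connect(":memory:")
    for _, path in numbered_scripts(directory):
        conn.executescript(path.read_text())
    return conn


# Demo with a throwaway directory standing in for db_update.
with tempfile.TemporaryDirectory() as db_update:
    Path(db_update, "2_AddTaskStatus.sql").write_text(
        "ALTER TABLE task ADD COLUMN status TEXT;")
    Path(db_update, "1_CreateTask.sql").write_text(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, title TEXT);")
    Path(db_update, "10_SeedTasks.sql").write_text(
        "INSERT INTO task (title, status) VALUES ('Write scripts', 'done');")

    run_order = [path.name for _, path in numbered_scripts(db_update)]
    upcoming = next_script_name(db_update, "TasksCompletedInThisIteration")
    conn = build_fresh_database(db_update)
    task_count = conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]
```

Sorting on the parsed integer rather than on the file name matters once the numbers have different lengths: a plain alphabetical sort would run `10_` before `2_`.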

    This is not for everyone.

    This is not for every project. I usually work on green field projects which allows me this convenience!
