We have two tables: OriginalDocument and ProcessedDocument. In the first one we put an original, not processed document. After it\'s validated and processed (converted to our XM
Do try to make distinction between logical and physical modelling.
Even if the difference between the two entities is only seven properties, they are logically a different thing in those seven items. At the same time they are a same thing in other properties.
The way to logically represent that is this have one-to-one-or-zero relationship between the two tables, and to have one table store all the common properties (superclass) and in the other (subclass) you would only store the ID from the superclass.
In terms of performance this is not so bad:
Depending on the processes you are modelling, the frequency of these queries and other things (such as security for both entities, ownership, difference in integrity rules) you might decide to store this information in one table in the database or in two (either can be much faster in border-line cases and two table solution can also be denormalized a bit; for example you could still store information in a main table about the type of the document to avoid the join if that kind of query is all you care).
Or maybe your implementation decisions might be driven by your choice of application framework and for that reason you might really prefer working with single table or the other way around (for example automatic creation of data entry forms in frameworks such as django-admin).
Whatever you do, realize the difference between the logical and physical design. In your logical design normalize everything - it will pay off. In physical implementation make different scenarios and - test, test, test with your own data. Never confuse the order of the two (logical-conceptual and physical-practical modelling).