SQL design approach for searching a table with an unlimited number of bit fields

失恋的感觉 2020-12-31 23:01

Consider searching a table that contains Apartment Rental Information: A client using the interface selects a number of criteria that are represented as bit fields in the D

5 Answers
  • 2020-12-31 23:33

    something like this may work for you:

    define tables:

    CREATE TABLE #Apartments
    (
         ApartmentID    int          not null primary key identity(1,1)
        ,ApartmentName  varchar(500) not null
        ,Status         char(1)      not null default ('A') 
        --....
    )
    
    CREATE TABLE #AttributeTypes
    (
        AttributeType         smallint     not null primary key
        ,AttributeDescription varchar(500) not null
    )
    
    CREATE TABLE #Attributes  --boolean attributes, if row exists apartment has this attribute 
    (
         ApartmentID     int not null --FK to Apartments.ApartmentID    
        ,AttributeID     int not null primary key identity(1,1)
        ,AttributeType   smallint  not null --fk to AttributeTypes
    )
    

    insert sample data:

    SET NOCOUNT ON
    INSERT INTO #Apartments VALUES ('one','A')
    INSERT INTO #Apartments VALUES ('two','A')
    INSERT INTO #Apartments VALUES ('three','I')
    INSERT INTO #Apartments VALUES ('four','I')
    
    INSERT INTO #AttributeTypes VALUES (1,'dishwasher')
    INSERT INTO #AttributeTypes VALUES (2,'deck')
    INSERT INTO #AttributeTypes VALUES (3,'pool')
    INSERT INTO #AttributeTypes VALUES (4,'pets allowed')
    INSERT INTO #AttributeTypes VALUES (5,'washer/dryer')
    INSERT INTO #AttributeTypes VALUES (6,'Pets Allowed')
    INSERT INTO #AttributeTypes VALUES (7,'No Pets')
    
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,1)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,2)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,3)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,4)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,5)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,6)
    
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,1)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,2)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,3)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,4)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,7)
    
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,1)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,2)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,3)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,4)
    
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (4,1)
    INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (4,2)
    SET NOCOUNT OFF
    

    sample search query:

    ;WITH GetMatchingAttributes AS
    (
    SELECT
        ApartmentID,COUNT(AttributeID) AS CountOfMatches
        FROM #Attributes
        WHERE AttributeType IN (1,2,3)  --<<change dynamically or split a CSV string and join in
        GROUP BY ApartmentID
        HAVING COUNT(AttributeID)=3--<<change dynamically or split a CSV string and use COUNT(*) from resulting table
    )
    SELECT
        a.*
        FROM #Apartments                      a
            INNER JOIN GetMatchingAttributes m ON a.ApartmentID=m.ApartmentID
        WHERE a.Status='A'
        ORDER BY m.CountOfMatches DESC
    

    OUTPUT:

    ApartmentID ApartmentName 
    ----------- --------------
    1           one           
    2           two           
    
    (2 row(s) affected)
    

    In the search query above, I just hard-coded a comma-separated list of attribute IDs to search for. In reality, you could create a search stored procedure that takes a CSV parameter containing the IDs to search on. You can look at this answer to learn about loop-free splitting of a CSV string into a table that you can join to. That way you would not need any dynamic SQL.
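
    As a rough sketch of the stored-procedure idea, the Python below uses the standard sqlite3 module rather than SQL Server, so the table names drop the # prefix. The names find_apartments, build_sample_db and the Wanted temp table are hypothetical. It splits the CSV in the application, loads the IDs into a table, and joins to that table, so the query text itself never changes:

```python
import sqlite3

def build_sample_db():
    """In-memory copy of the sample data above (SQLite stand-in for the temp tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Apartments (
            ApartmentID   INTEGER PRIMARY KEY,
            ApartmentName TEXT NOT NULL,
            Status        TEXT NOT NULL DEFAULT 'A');
        CREATE TABLE Attributes (
            ApartmentID   INTEGER NOT NULL,
            AttributeType INTEGER NOT NULL,
            PRIMARY KEY (ApartmentID, AttributeType));  -- assumed: no duplicate pairs
    """)
    conn.executemany("INSERT INTO Apartments VALUES (?, ?, ?)",
                     [(1, "one", "A"), (2, "two", "A"), (3, "three", "I"), (4, "four", "I")])
    sample = {1: [1, 2, 3, 4, 5, 6], 2: [1, 2, 3, 4, 7], 3: [1, 2, 3, 4], 4: [1, 2]}
    conn.executemany("INSERT INTO Attributes VALUES (?, ?)",
                     [(apt, attr) for apt, attrs in sample.items() for attr in attrs])
    return conn

def find_apartments(conn, csv_ids):
    """Return active apartments that have every attribute type listed in csv_ids."""
    wanted = sorted({int(tok) for tok in csv_ids.split(",") if tok.strip()})
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS Wanted (AttributeType INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM Wanted")
    conn.executemany("INSERT INTO Wanted VALUES (?)", [(t,) for t in wanted])
    # An apartment matches when it has as many wanted attributes as there are wanted IDs.
    return conn.execute("""
        SELECT a.ApartmentID, a.ApartmentName
        FROM Apartments a
        JOIN Attributes x ON x.ApartmentID = a.ApartmentID
        JOIN Wanted w     ON w.AttributeType = x.AttributeType
        WHERE a.Status = 'A'
        GROUP BY a.ApartmentID, a.ApartmentName
        HAVING COUNT(*) = (SELECT COUNT(*) FROM Wanted)
        ORDER BY a.ApartmentID
    """).fetchall()

conn = build_sample_db()
print(find_apartments(conn, "1,2,3"))    # the include list used in the query above
print(find_apartments(conn, "1,2,3,7"))  # the same list plus "No Pets"
```

    Note that in this sketch an empty CSV string matches nothing; a real search procedure would probably want to treat "no criteria" as "all active apartments".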

    EDIT based on the many comments:

    if you add a few columns to the #AttributeTypes table you could dynamically build the search page. Here are a few suggestions:

    • Status: "A"ctive "I"nactive
    • ListOrder: can use this to sort by to build the screen
    • ColumnNumber: can help organize fields on the same screen row
    • AttributeGroupID: to group fields, see below
    • etc.

    You could make all the fields checkboxes, or add another table called #AttributesGroups, group some attributes together, and use radio buttons for them. For example, since "Pets Allowed" and "No Pets" are mutually exclusive, add a row "Pets" to the #AttributesGroups table. The application would then group the attributes in the interface. Attributes in groups work the same as regular ungrouped attributes: just collect the selected IDs and pass them in to the search procedure. However, for each group the application needs to include a "no preference" radio button and select it by default. This option has no attribute ID and is not passed in, since you don't want to consider the attribute.
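
    To make the grouping idea concrete, here is a minimal Python sketch of the application side. The row layout (AttributeType, description, ListOrder, AttributeGroupID), the sample rows, and the function names build_form and selected_ids are all assumptions for illustration, not part of the schema above:

```python
from collections import defaultdict

# Hypothetical rows: (AttributeType, AttributeDescription, ListOrder, AttributeGroupID or None)
ATTRIBUTE_TYPES = [
    (1, "dishwasher", 10, None),
    (2, "deck", 20, None),
    (3, "pool", 30, None),
    (6, "Pets Allowed", 40, 1),
    (7, "No Pets", 50, 1),
]
ATTRIBUTE_GROUPS = {1: "Pets"}  # AttributeGroupID -> group name

def build_form(attribute_types, groups):
    """Split attributes into standalone checkboxes and radio-button groups."""
    checkboxes, radios = [], defaultdict(list)
    for attr_id, desc, _order, group_id in sorted(attribute_types, key=lambda r: r[2]):
        if group_id is None:
            checkboxes.append((attr_id, desc))
        else:
            radios[groups[group_id]].append((attr_id, desc))
    # Every radio group starts with a "no preference" option that carries no ID.
    radio_groups = {name: [(None, "no preference")] + opts for name, opts in radios.items()}
    return checkboxes, radio_groups

def selected_ids(checked, radio_choices):
    """Build the CSV of attribute IDs for the search; None means no preference."""
    ids = list(checked) + [c for c in radio_choices.values() if c is not None]
    return ",".join(str(i) for i in sorted(ids))

checkboxes, radio_groups = build_form(ATTRIBUTE_TYPES, ATTRIBUTE_GROUPS)
print(checkboxes)
print(radio_groups)
print(selected_ids([1, 3], {"Pets": 7}))     # "No Pets" selected in the Pets group
print(selected_ids([1, 3], {"Pets": None}))  # no preference: the group adds nothing
```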

    My example does include one "super attribute" that lives in the #Apartments table itself: "Status". You should reserve this table for major attributes only. If you start using these, you may want to alter the CTE to select FROM #Apartments, filter on these fields, and then join to #Attributes. However, you will then run into the problem of dynamic search conditions, so read this article by Erland Sommarskog.

    EDIT on latest comments:

    here is code that also takes a list of attributes to exclude:

    ;WITH GetMatchingAttributes AS
    (
    SELECT
        ApartmentID,COUNT(AttributeID) AS CountOfMatches
        FROM #Attributes
        WHERE AttributeType IN (1,2,3)  --<<change dynamically or split an include CSV string and join in
        GROUP BY ApartmentID
        HAVING COUNT(AttributeID)=3--<<change dynamically or split a CSV string and use COUNT(*) from resulting include table
    )
    , SomeRemoved AS
    (
    SELECT
        m.ApartmentID
        FROM GetMatchingAttributes      m
            LEFT OUTER JOIN #Attributes a ON m.ApartmentID=a.ApartmentID 
                AND a.AttributeType IN (5,6)   --<<change dynamically or split an exclude CSV string and join in
        WHERE a.ApartmentID IS NULL
    )
    SELECT
        a.*
        FROM #Apartments           a
            INNER JOIN SomeRemoved m ON a.ApartmentID=m.ApartmentID
        WHERE a.Status='A'
    

    I don't think I would go this way though. I'd go with the approach I outlined in my previous EDIT above. When include/exclude of an attribute is necessary, I'd just add an attribute for each: "Pets allowed" and "No Pets".

    I updated the sample data from the original post to show this.

    Run the original query with:

    • (..,..,6,..) to find apartments that allow pets
    • (..,..,7,..) to find apartments where no pets are allowed
    • (..,..,..) if there is no preference.

    I think this is the better approach. Combined with the grouping idea and the dynamically built search page described in the previous edit, I think it would also run faster.

  • 2020-12-31 23:33

    I suggest you go with the second approach, known as the Entity-attribute-value (EAV) model. It's probably the only approach that will scale as you need.

    You could also have two searches, a basic one and an advanced one. You keep the attributes for the basic search in one table and all the advanced attributes in another. This way at least the basic search stays fast as the number of attributes grows over time.

  • 2020-12-31 23:38

    Create a table that stores attributes or search columns per apartment. Definitely do not keep adding more bit field columns: that is a maintenance nightmare and a coding nightmare. And please don't dynamically generate WHERE clauses and run them with EXEC.

  • 2020-12-31 23:44

    I've walked down this path a few times trying to store health status markers!

    When I first started (in 2000?) I tried a character position approach (your #2) and found that it quickly became pretty unwieldy as I wrestled with the same questions over and over: "which position held 'Allows Pets' again?" or, worse yet, "how long is this string now? / which position am I on?" Can you work around this problem - developing objects to manage things for you? Well, yes, to an extent. But I really didn't appreciate how much extra work it cost compared to having the field identities managed for me by the database.

    The second time around, I used an attribute/value pair approach similar to your solution #3. This basically worked and, for specialty needs, I still generate attribute/value pairs using a PIVOT. Also, my background is in AI and we used attribute/value pairs all the time in mechanical theorem proving so this was very natural for me.

    However, there is a huge problem with this approach: pulling any one fact out ("Show me the apartments that allow pets") is easy but pulling all of the records meeting multiple constraints quickly gets very, very ugly (see my example below).

    SO... I ended up adding fields to a table. I understand the theoretical reasons that Jon and 'Unknown' and 'New In Town' give for preferring other approaches and I'd have agreed with either or both at one point. But experience is a pretty harsh teacher...

    A Couple More Things

    First, I disagree that adding more bit fields is a nightmare of maintenance - at least compared with a character-bit approach (your #2). That is, having a distinct field for each attribute ensures that there is no 'management' necessary to figure out which slot belongs to which attribute.

    Second, having 300 fields isn't really the problem - any decent database can do that without problem.

    Third, your real issue and the source of pain is really the matter of dynamically generating your queries. If you are like me, this question is really all about "Do I really have to have this massive, grody and inelegant chain of "IF" statements to construct a query?"

    The answer, unfortunately, is Yes. All three of the approaches you suggest will still boil down to a chain of IF statements.

    In a database bit-field approach, you'll end up with a series of IF statements where all of your columns have to be added like so:

    string SQL = "Select X,Y,Z From Apartments Where ";  // table name assumed for illustration
    
    if (AllowsPets == 0)
      SQL += "(AllowsPets = 0) AND ";
    else if (AllowsPets == 1)
      SQL += "(AllowsPets = 1) AND ";  // Else AllowsPets not in query
    .
    .
    .
    SQL = SQL.Substring(0, SQL.Length - 4);  // Get rid of trailing 'AND ' / alternatively append '(1=1)'
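
    For what it's worth, the chain of IFs can be folded into a loop over a list of column names, although the query is still being built dynamically. The Python/sqlite3 sketch below is only an illustration: the BIT_COLUMNS whitelist, the AptID column, and the build_query helper are assumed names. It uses the '1=1' variant so there is no trailing AND to strip, and it binds the values as parameters:

```python
import sqlite3

# Whitelist of searchable bit columns (illustrative names). Only these fixed
# strings are ever placed into the SQL text; values are bound as parameters.
BIT_COLUMNS = ("AllowsPets", "HasParking", "AllowsChildren")

def build_query(criteria):
    """criteria maps column name -> 1, 0, or None (None = not part of the search)."""
    clauses, params = ["1=1"], []
    for column in BIT_COLUMNS:
        value = criteria.get(column)
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(int(bool(value)))
    sql = "SELECT AptID FROM Apartments WHERE " + " AND ".join(clauses) + " ORDER BY AptID"
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Apartments (
    AptID INTEGER PRIMARY KEY, AllowsPets INTEGER, HasParking INTEGER, AllowsChildren INTEGER)""")
conn.executemany("INSERT INTO Apartments VALUES (?, ?, ?, ?)",
                 [(1, 1, 0, 1), (2, 0, 1, 1), (3, 1, 1, 0)])

# Pets and parking required, no preference on children.
sql, params = build_query({"AllowsPets": 1, "HasParking": 1, "AllowsChildren": None})
print(sql)
print(conn.execute(sql, params).fetchall())
```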
    

    In a character-position approach, you'll do the same thing but your "Appends" will add "0", "1" or "_" to your SQL. You'll also, of course, run into the maintenance issues deciding which one is which that I discussed above (enums help but don't completely solve the problem).

    As mentioned above, the Attribute-Value approach is actually the worst. You'll have to either create a nasty chain of sub-queries (which surely will cause a stack overflow of some sort with 300 clauses) or you need to have an IF-THEN like this:

    // Kill any previously stored selections.
    SQLObject.Execute("Delete From SelectedApts Where SessionKey=X");
    // Start with your first *known* attr/value and fill a table with the results.
    .
    .
    // Logic to pick first known attr/value pair
    .
    .
    SQLObject.Execute("Insert Into SelectedApts Select X as SessionKey, AptID From AttrValue Where AllowsPets=1");
    
    // Now you have the widest set that meets your criteria. Time to whittle it down.
    if (HasParking == 1)
      SQLObject.Execute("Delete From SelectedApts Where AptID not in (Select AptID From AttrValue Where HasParking=1)");
    if (AllowsChildren == 0)
      SQLObject.Execute("Delete From SelectedApts Where AptID not in (Select AptID From AttrValue Where AllowsChildren=0)");
    .
    .
    .
    // Perform 2-300 more queries to keep whittling down your set to the actual match.
    

    Now, you may be able to optimize this a bit so you run fewer queries (a PIVOT, sets of subqueries or using the UNION operator) but the fact is that this gets VERY expensive compared to the single query that you can use (but have to build) using the other approaches.
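
    As one possible shape for the "sets of subqueries" variant, the Python/sqlite3 sketch below builds a single query with one EXISTS clause per constraint instead of running a DELETE per constraint. It assumes a hypothetical AttrValue(AptID, Attr, Value) layout with one row per apartment and attribute, which is not necessarily the layout used above, and whether this stays practical with hundreds of clauses would need testing:

```python
import sqlite3

def build_match_query(constraints):
    """constraints: list of (attribute name, value) pairs that must all hold."""
    sql = "SELECT DISTINCT a.AptID FROM AttrValue a WHERE 1=1"
    params = []
    for attr, value in constraints:
        # One correlated subquery per constraint; values are bound, not concatenated.
        sql += (" AND EXISTS (SELECT 1 FROM AttrValue v"
                " WHERE v.AptID = a.AptID AND v.Attr = ? AND v.Value = ?)")
        params += [attr, value]
    return sql + " ORDER BY a.AptID", params

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE AttrValue (
    AptID INTEGER NOT NULL, Attr TEXT NOT NULL, Value INTEGER NOT NULL,
    PRIMARY KEY (AptID, Attr))""")
conn.executemany("INSERT INTO AttrValue VALUES (?, ?, ?)", [
    (1, "AllowsPets", 1), (1, "HasParking", 1), (1, "AllowsChildren", 0),
    (2, "AllowsPets", 1), (2, "HasParking", 0), (2, "AllowsChildren", 0),
    (3, "AllowsPets", 0), (3, "HasParking", 1), (3, "AllowsChildren", 1),
])

sql, params = build_match_query([("AllowsPets", 1), ("AllowsChildren", 0)])
print(conn.execute(sql, params).fetchall())
sql, params = build_match_query([("AllowsPets", 1), ("AllowsChildren", 0), ("HasParking", 1)])
print(conn.execute(sql, params).fetchall())
```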

    Thus, this is a painful kind of problem no matter what approach you take - there really is no magic that helps you to avoid it. But, having been there before, I would absolutely recommend approach #1.

    Update: If you are really focused on pulling straight criteria matches ("All Apartments That Have A, B and C") and don't need other queries (like "...Sum(AllowsPets), Sum(AllowsChildren)..." or "...(AllowsPets=1) OR (AllowsChildren=1)...") then I really like KM's answer the more I look at it. It is very clever and looks likely to be acceptably fast.

  • 2020-12-31 23:51

    I've never tested this, but what if you were to create a varchar(256) field that stores all of your flags as one long string of 0s and 1s?

    For example,

    • AllowsPets = 1
    • HasParking = 0
    • HasDeck = 1
    • ModernKitchen = 1

    would be:

    • PropertyFlags = 1011

    and if you were looking for something that AllowsPets and HasDeck, then the search query would look something like this:

    WHERE PropertyFlags LIKE '1_1_' (the underscore is the single-character wildcard in a LIKE pattern; the pattern has to cover every position in the string, or end in % to match any remaining flags)

    this would solve your issues with adding additional columns to the search in the future, but I'm not sure how this would do performance-wise.
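
    Here is a small Python/sqlite3 sketch of how the pattern could be generated from a fixed flag order. The FLAG_ORDER list and the encode_flags and like_pattern helpers are hypothetical names, and the trailing % is my own addition so that flags appended to the string later do not break older patterns. It says nothing about performance on a large table:

```python
import sqlite3

# Hypothetical fixed ordering of flags within the PropertyFlags string.
FLAG_ORDER = ["AllowsPets", "HasParking", "HasDeck", "ModernKitchen"]

def encode_flags(flags):
    """Turn {flag name: truthy/falsy} into a string such as '1011'."""
    return "".join("1" if flags.get(name) else "0" for name in FLAG_ORDER)

def like_pattern(required):
    """required maps flag name -> True/False; unlisted flags become '_' wildcards."""
    chars = []
    for name in FLAG_ORDER:
        if name in required:
            chars.append("1" if required[name] else "0")
        else:
            chars.append("_")
    return "".join(chars) + "%"  # trailing % tolerates flags appended later

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Apartments (AptID INTEGER PRIMARY KEY, PropertyFlags TEXT NOT NULL)")
conn.executemany("INSERT INTO Apartments VALUES (?, ?)", [
    (1, encode_flags({"AllowsPets": 1, "HasParking": 0, "HasDeck": 1, "ModernKitchen": 1})),
    (2, "0011"),
    (3, "1110"),
    (4, "10110"),  # a row written after a fifth flag was added
])

pattern = like_pattern({"AllowsPets": True, "HasDeck": True})
print(pattern)
print(conn.execute(
    "SELECT AptID FROM Apartments WHERE PropertyFlags LIKE ? ORDER BY AptID",
    (pattern,)).fetchall())
```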

    has anyone out there tried anything similar to this?
