Get a number of unique values without separating values that belong to the same block of values

伪装坚强ぢ 2020-12-19 15:06

I'm OK with either a PL/SQL solution or an Access VBA/Excel VBA one (though Access VBA is preferred over Excel VBA). So PL/SQL is the first choice, Access VBA is second a

6 answers
  • 2020-12-19 15:24

    This gets you most of the way there in standard SQL. It's not quite perfect, and I expect the MODEL clause is what would work best, but...

    What this does is:

    1. In all_possible, work out every possible combination.
    2. In some_counting, unpivot this and count the number of unique otherids per fax. We can also restrict this to 6 here, so that we exclude any faxes which are never going to qualify.
    3. In uniquify, use row_number() so that records with the same number of otherids per fax can be told apart later, and also work out the greatest count. If this is 6 then you've got a simple win.
    4. In cumulative_sum, work out the running sum of the number of otherids per fax. The trick here is the order in which you do it. I've chosen to pick the greatest first and then add in the smaller ones. I'm sure there's a cleverer way to do this... I did this because if the greatest is 6, you win. If it's 4, say, then you can fill it in with 2 faxes which only have 1 associated otherid each, etc.
    5. Lastly, restrict the cumulative sum to 6 and pull in all the extra data you need.

    Assuming a table as follows, filled with your data:

    create table tmp_table ( 
       r number
     , otherid number
     , fax number
       );
    

    the code would look like this:

    with all_possible as (
    select t.r as t_r, t.otherid as t_otherid, t.fax as t_fax
         , u.r as u_r, u.otherid as u_otherid, u.fax as u_fax
      from tmp_table t
      left outer join tmp_table u
        on t.fax = u.fax
       and t.r <> u.r
           )
    , some_counting as (
     select fax 
          , count(distinct otherid) as no_o_per_fax
       from all_possible
    unpivot ( (r, otherid, fax) 
            for (a, b, c)
             in ( (t_r, t_otherid, t_fax)
                , (u_r, u_otherid, u_fax)
                ))
     group by fax
    having count(distinct otherid) <= 6
            )
    , uniquify as (
    select c.*
         , row_number() over (order by no_o_per_fax asc) as rn
         , max(no_o_per_fax) over () as m_fax
      from some_counting c
           )
    , cumulative_sum as (
    select u.*, sum(no_o_per_fax) over (order by case when no_o_per_fax = m_fax then 0 else 1 end
                                            , no_o_per_fax asc 
                                            , rn ) as csum
      from uniquify u
           )
    , candidates as (
    select a.*
      from cumulative_sum a
     where csum <= 6
           )
    select a.*
      from tmp_table a
      join candidates b
        on a.fax = b.fax
    


    I make extensive use of common table expressions here to make the code look cleaner.
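
    As a rough illustration of steps 2 to 5, here is a minimal Python sketch of the same running-sum idea. It is not a translation of the query: the function name pick_faxes, the (otherid, fax) tuple layout and the use of the fax value as a tie-breaker (where the SQL uses row_number()) are assumptions made for the example. Like the query, it adds up per-fax counts, so an otherid shared by two faxes is counted once for each fax.

```python
from collections import defaultdict


def pick_faxes(rows, limit):
    """rows: iterable of (otherid, fax). Returns the faxes kept by the running-sum rule."""
    ids_per_fax = defaultdict(set)
    for otherid, fax in rows:
        ids_per_fax[fax].add(otherid)

    # some_counting: distinct otherids per fax, dropping faxes that can never fit
    counts = {fax: len(ids) for fax, ids in ids_per_fax.items() if len(ids) <= limit}
    if not counts:
        return []

    greatest = max(counts.values())
    # cumulative_sum ordering: the largest groups first, then the rest by ascending size.
    # The fax value is a hypothetical tie-breaker standing in for row_number().
    ordered = sorted(counts, key=lambda fax: (counts[fax] != greatest, counts[fax], fax))

    chosen, running = [], 0
    for fax in ordered:
        running += counts[fax]
        if running > limit:
            break  # the running sum only grows, so nothing later can fit either
        chosen.append(fax)
    return chosen


# Made-up sample data: fax A has 4 ids, B and C have 1 each, D has 2.
rows = [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "B"), (6, "C"), (7, "D"), (8, "D")]
print(pick_faxes(rows, 6))  # ['A', 'B', 'C']
```

    With this sample, A (4) is taken first, B and C bring the total to 6, and D is left out because it would push the total to 8.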

  • 2020-12-19 15:28

    This is not a full answer, but I don't want to write a lot of queries in comments.
    Your main goal is to send information to people while avoiding the situation where one person receives the fax twice. So first you need a list of unique recipients, like this:

    select distinct otherid
      from NR_PVO_120
    

    If one person has two fax numbers, you need to decide which one to choose:

    select otherid, fax
      from (select otherid, fax, row_number() over (partition by otherid order by <choosing rule>) rn
              from NR_PVO_120)
     where rn = 1
    

    (You already have all of this in the answers to your previous question.)
    If you take this list of fax numbers, all of your recipients receive the fax, and each person receives it only once. But some fax numbers will not be used. You can easily find them:

    select otherid, fax
      from (select otherid, fax, row_number() over (partition by otherid order by <choosing rule>) rn
              from NR_PVO_120)
     where rn > 1
    

    If you send the fax to any of these numbers, some people will get it twice.
    English is not my native language, so I don't understand what you mean by "without breaking up fax numbers". As far as I can see, you may need to use the order of the fax numbers in your question as their priority (the higher a number sits in the table, the more likely it is to be used). It seems you can use the following:

    select otherid, fax
      from (select otherid, fax, row_number() over (partition by otherid order by row) rn
              from NR_PVO_120)
     where rn = 1
    

    Here, row in the order by clause is the Row column from your example table.

    UPD
    P.S. About my last query: we have a table with a certain order, and the order is important. We take the rows of the table one by one. Take the first row and put its otherid and fax into the result table. Then take the next row. If it contains a different fax number and otherid, we take it; if its otherid is already in our result table, we skip it. Is this the algorithm you were asking for?
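
    The row-by-row procedure in the P.S. can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name first_fax_per_recipient is made up, and the rows are taken to be (otherid, fax) tuples already in table order.

```python
def first_fax_per_recipient(rows):
    """rows: (otherid, fax) tuples in table order. Keeps the first fax seen for each otherid."""
    seen = set()
    result = []
    for otherid, fax in rows:
        if otherid in seen:
            continue  # this recipient already has a fax, so skip the row
        seen.add(otherid)
        result.append((otherid, fax))
    return result


# Made-up sample: recipient 1 appears under two faxes and keeps only the first one.
print(first_fax_per_recipient([(1, "A"), (2, "A"), (1, "B"), (3, "B")]))
# [(1, 'A'), (2, 'A'), (3, 'B')]
```

    This gives the same rows as the row_number() query with rn = 1 when the ordering column reflects the table order.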

  • 2020-12-19 15:29

    I'm not sure about your requirement, but this is my best understanding of your question. The code first sorts the data by Fax and then extracts one ID for each distinct Fax (from the row where the Fax value changes). Because of the data, duplicate IDs remain even after that, so the list is sorted again and the duplicates are removed.

    Sub Unique_fax()
    

    Finding the last row so that loop can run that many times

    lastrow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Row
    

    Copying the data to new rows so that your original data remains intact

    For i = 1 To lastrow
    
    Worksheets("Sheet1").Cells(i, 5).Value = Trim(Worksheets("Sheet1").Cells(i, 1))
    Worksheets("Sheet1").Cells(i, 6).Value = Trim(Worksheets("Sheet1").Cells(i, 2))
    Worksheets("Sheet1").Cells(i, 7).Value = Trim(Worksheets("Sheet1").Cells(i, 3))
    
    Next
    

    Sorting the data based on Fax

    Range("E1:G" & lastrow).Select
        Selection.Sort Key1:=Range("G1"), Order1:=xlAscending, _
                       Header:=xlNo, OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom
    

    Copying the IDs where the Fax is different to a new row

    x = 1
    For i = 1 To lastrow
    If Cells(i, 7) <> Cells(i + 1, 7) Then
    Cells(x, 9) = Cells(i, 6)
    x = x + 1
    End If
    Next
    

    Sorting the list of IDs and removing duplicates

    lastrowUnq = Worksheets("Sheet1").Cells(Rows.Count, 9).End(xlUp).Row
    
    Range("I1:I" & lastrowUnq).Select
        Selection.Sort Key1:=Range("I1"), Order1:=xlAscending, _
                       Header:=xlNo, OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom
    y = 1
    For j = 1 To lastrowUnq
    If Cells(j, 9) <> Cells(j + 1, 9) Then
    Cells(y, 11) = Cells(j, 9)
    y = y + 1
    End If
    Next
    
    End Sub
    

    Columns A, B and C hold your original data. Columns E, F and G hold the data sorted by Fax. Column I contains the list of IDs taken once per Fax. Column K contains the final list of IDs (as required).
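
    For readers who want the logic without the worksheet, here is a hedged Python sketch of the same three passes. The function name ids_one_per_fax and the (otherid, fax) tuple layout are assumptions for illustration; the macro itself works on trimmed cell values in columns E to K.

```python
def ids_one_per_fax(rows):
    """rows: (otherid, fax) tuples. Returns the sorted, de-duplicated IDs, one taken per fax."""
    # Pass 1: sort by fax (Python's sort is stable, so ties keep their original order)
    ordered = sorted(rows, key=lambda row: row[1])

    # Pass 2: keep the ID from the last row of each run of equal fax values,
    # which is the row where the macro sees the fax change
    id_for_fax = {}
    for otherid, fax in ordered:
        id_for_fax[fax] = otherid

    # Pass 3: sort the collected IDs and drop duplicates
    return sorted(set(id_for_fax.values()))


print(ids_one_per_fax([(10, "B"), (11, "A"), (12, "A"), (10, "C")]))  # [10, 12]
```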

  • 2020-12-19 15:34

    If I understand the requirements correctly, this should do it.

    EDIT: I missed the uniqueness requirement. So, I've updated the code to account for that.

    EDIT2: Added fax to the output, using a record type.

    declare
        input_number int := 6;
        cursor get_faxes is
               select fax, count(*) num_ids from listofids
                group by fax
                order by fax;
        cursor get_ids (p_fax in int) is
               select otherid from listofids
                 where fax = p_fax;
        type idrec is record(id listofids.otherid%type, fax listofids.fax%type);
        type idlist is table of idrec;
        output_list idlist := idlist();
        v_memberof  boolean;
    begin
        for fax_rec in get_faxes loop
            if output_list.count + fax_rec.num_ids <= input_number then
                for id_rec in get_ids(fax_rec.fax) loop
                    v_memberof := False;
                    for i in 1..output_list.count loop
                        if output_list(i).id = id_rec.otherid then
                            v_memberof := true;
                        end if;
                    end loop;
                    if not v_memberof then
                        output_list.extend(1);
                        output_list(output_list.count).id := id_rec.otherid;
                        output_list(output_list.count).fax := fax_rec.fax;
                    end if;
                end loop;
            end if;
        end loop;
        for i in 1..output_list.count loop
            dbms_output.put_line('id: ' || output_list(i).id || '  fax:' || output_list(i).fax);
        end loop;
    end;
    

    This now returns the following:

    id: 11098554  fax:2063504752
    id: 56200936  fax:2080906666
    id: 56166614  fax:7180930966
    id: 56159509  fax:7180930966
    id: 25138850  fax:7182160901
    id: 56148974  fax:7182232046
    

    If you actually need a random selection, you can change the order by to use dbms_random.random instead of fax.
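
    The control flow of the block above can be mirrored in Python as a sanity check. This is a sketch under assumptions, not the PL/SQL itself: select_ids is a made-up name, and the rows are (otherid, fax) tuples standing in for the listofids table.

```python
from collections import defaultdict


def select_ids(rows, limit):
    """rows: (otherid, fax) tuples. Adds whole faxes, in fax order, while they still fit."""
    ids_by_fax = defaultdict(list)
    for otherid, fax in rows:
        ids_by_fax[fax].append(otherid)

    output = []  # list of (otherid, fax), like output_list
    for fax in sorted(ids_by_fax):  # order by fax, as in get_faxes
        ids = ids_by_fax[fax]
        # len(ids) plays the role of count(*) per fax
        if len(output) + len(ids) <= limit:
            for otherid in ids:
                # linear membership test, as in the v_memberof loop
                if all(otherid != kept for kept, _ in output):
                    output.append((otherid, fax))
    return output


rows = [(1, "A"), (2, "A"), (1, "B"), (3, "B"), (4, "C"), (5, "C"), (6, "C")]
print(select_ids(rows, 4))  # [(1, 'A'), (2, 'A'), (3, 'B')]
```

    In this made-up sample, fax C is skipped because its three rows would exceed the limit of 4, even though only one slot is left unused.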

  • 2020-12-19 15:41

    EDIT 2/13/2015: After using the accepted answer for a few months, I came across a scenario that hadn't happened before and realized that his solution only works if the number I need is not too close to the total. For example, if my total number of records is 15000 and I'm asking for 12000, then his code will give 10 or 11k. If I ask for 8k, then I will probably get the 8.

    I don't understand what his code does and he never replied, so I can't explain why this is happening. My guess is that he's taking the counts in a certain order, and since the results depend on the order the faxes are sorted in, he won't necessarily get the best results every time. When there's enough room (asking 8k out of 15k), any combination can yield an acceptable result, but once you ask for a tighter number (12k out of 15k) he's locked into his order and runs out of acceptable counts fast.

    So this is the code that will give the correct result no matter what. It's not nearly as elegant and is extremely slow, but it works.

    12/13/14: I think I got it. PL/SQL, not the best solution by far, but it gives better results than what they currently get by hand. Actually, I would be really interested to hear about possible problems.

    12/13/14 EDIT: The accepted answer is the way to do it. I'm only leaving this here for contrast, so people can see how not to code lol.

    DECLARE
         CountsNeededTotal NUMBER;
         CountsNeededRemaining NUMBER;
         CurCountsTotal NUMBER;
         CurFaxCount NUMBER;
         CurFaxCountPicked NUMBER;
    BEGIN
         CountsNeededTotal := 420;
         CurCountsTotal := 0;
         CurFaxCount := 0;
    
         CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
    
         EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_121';
    
    
         --############################################################################################
         --START BLOCK
         --this block just gets the first fax, the fax with the largest number of people
         --############################################################################################

         --get the first fax with the most people as long as that number isn't larger than the number needed
         SELECT MAX(CountOfPeople) CountOfPeople
        INTO CurFaxCount
        FROM (SELECT     fax
                ,COUNT(1) CountOfPeople
               FROM NR_PVO_120
              GROUP BY Fax
             HAVING COUNT(1) <= CountsNeededRemaining);
    
         COMMIT;
    
         --if there is a number that's not larger then add to the table and keep looping
         --if there isn't then there's no providers from this campaign that can be used
         IF CurFaxCount >= 0 THEN
           --insert into the 121 table (final list of faxes)
           INSERT INTO NR_PVO_121
             SELECT   fax
                  ,COUNT(1) CountOfPeople
                 FROM NR_PVO_120
               HAVING COUNT(1) = (SELECT MAX(CountOfPeople) CountOfPeople
                           FROM (SELECT   fax
                                   ,COUNT(1) CountOfPeople
                                  FROM NR_PVO_120
                              GROUP BY Fax
                                HAVING COUNT(1) <= CountsNeededTotal))
             GROUP BY Fax;
    
    
    
           COMMIT;
    
           --############################################################################################
           --START BLOCK
           --this block loops through remaining faxes
           --############################################################################################
    
    
    
           SELECT SUM(CountOfPeople) INTO CurCountsTotal FROM NR_PVO_121;
    
    
           IF CurCountsTotal < CountsNeededTotal THEN
             CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
    
    
             --loop until counts needed remaining is 0 or as close as 0 as possible without going in the negative
             WHILE CountsNeededRemaining >= 0 LOOP
                  --clear 122 table
                  EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_122';
    
    
                  --loop through all faxes in 120 table  MINUS the ones in the 121 table
                  DECLARE
                    CURSOR CurRec  IS
                      SELECT DISTINCT Fax
                        FROM NR_PVO_120
                       WHERE Fax NOT IN (SELECT Fax FROM NR_PVO_121);
                    PVO CurRec%ROWTYPE;
                  BEGIN
                    OPEN CurRec;
                    LOOP
                      FETCH CurRec INTO PVO;
                      --leave the loop as soon as the cursor is exhausted, before PVO is used
                      EXIT WHEN CurRec%NOTFOUND;

                      SELECT DISTINCT COUNT(OtherID) CountOfPeople
                        INTO CurFaxCount
                        FROM NR_PVO_120
                       WHERE     Fax = PVO.fax
                          AND OtherID NOT IN (SELECT DISTINCT OtherID
                                       FROM NR_PVO_120
                                      WHERE fax IN (SELECT Fax FROM NR_PVO_121));
                      --                                                          DBMS_OUTPUT.put_line('CurFaxCount ' || CurFaxCount);
                      --                                                          DBMS_OUTPUT.put_line('CountsNeededRemaining ' || CountsNeededRemaining);

                      IF CurFaxCount <= CountsNeededRemaining THEN
                        --record their unique counts in 122 table IF THEY'RE NOT LARGER THAN CountsNeededRemaining
                        INSERT INTO NR_PVO_122
                             SELECT PVO.fax
                                ,CurFaxCount
                            FROM DUAL;

                        COMMIT;
                      END IF;
                    --end fax loop
                    END LOOP;
                    CLOSE CurRec;
                  END;
    
    
                  --pick the highest count from 122 table
                  SELECT MAX(CountOfPeople) CountOfPeople INTO CurFaxCountPicked FROM NR_PVO_122;
    
                  --add this fax to the 121 table
                  INSERT INTO NR_PVO_121
                    SELECT MIN(Fax) Fax
                       ,CurFaxCountPicked
                      FROM NR_PVO_122
                     WHERE CountOfPeople = CurFaxCountPicked;
    
    
                  COMMIT;
                  --add the counts to the CurCountsTotal
                  CurCountsTotal := CurCountsTotal + CurFaxCountPicked;
                  --recalc   CountsNeededRemaining
                  CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
                  --
                  --                                                          DBMS_OUTPUT.put_line('CurCountsTotal ' || CurCountsTotal);
                  --                                                          DBMS_OUTPUT.put_line('CurFaxCountPicked ' || CurFaxCountPicked);
                  --                                                          DBMS_OUTPUT.put_line('CurFaxCount ' || CurFaxCount);
                  --                                                          DBMS_OUTPUT.put_line('CountsNeededRemaining ' || CountsNeededRemaining);
                  --                                                          DBMS_OUTPUT.put_line('CountsNeededTotal ' || CountsNeededTotal);
    
                  --clear 122 table
                  EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_122';
             --end while loop
             END LOOP;
           END IF;
         --############################################################################################
         --END BLOCK
         --this block loops through remaining faxes
         --############################################################################################
    
    
    
         END IF;
    --############################################################################################
    --END BLOCK
    --this block just gets the first fax, the fax with the largest number of people
    --############################################################################################
    
    
    
    END;
    

    Here's a better version, MUCH faster than the above, but it probably won't return perfect results in some cases. I wasn't able to get wrong results while testing, but there is a possibility because I'm not trying every possible combination (as in the first version, which takes days to finish for a dataset of 20K records).

    DECLARE
        CountsNeededTotal NUMBER;
        CountsNeededRemaining NUMBER;
        CurCountsTotal NUMBER;
    BEGIN
        CurCountsTotal := 0;
    
        SELECT NoOfProvToKeep INTO CountsNeededTotal FROM NR_PVO_121;
    
        CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
    
        EXECUTE IMMEDIATE 'TRUNCATE TABLE nr_pvo_122';
    
    
        COMMIT;
    
        IF CurCountsTotal <= CountsNeededTotal THEN
            --loop until counts needed remaining is 0 or as close as 0 as possible without going in the negative
            WHILE CountsNeededRemaining > 0 LOOP
                --add the fax with the most not-yet-counted people that still fits
                INSERT INTO NR_PVO_122
                    SELECT Fax
                          ,CountOfPeople
                      FROM (SELECT   DISTINCT COUNT(OtherID) CountOfPeople
                                   ,Fax
                           FROM NR_PVO_120
                          WHERE OtherID NOT IN (SELECT DISTINCT OtherID
                                        FROM NR_PVO_120
                                       WHERE fax IN (SELECT Fax FROM NR_PVO_122))
                         HAVING COUNT(1) <= CountsNeededRemaining
                            GROUP BY fax
                            ORDER BY 1 DESC)
                     WHERE ROWNUM = 1;
    
    
    
                SELECT SUM(CountOfPeople) INTO CurCountsTotal FROM NR_PVO_122;
    
                COMMIT;
                --recalc   CountsNeededRemaining
                CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
            --
            --DBMS_OUTPUT.put_line('CurCountsTotal ' || CurCountsTotal || ', CountsNeededRemaining ' || CountsNeededRemaining);
            --end while loop
            END LOOP;
        END IF;
    
    
    
        DELETE FROM NR_PVO_112
              WHERE NVL(Fax, '999999999999') NOT IN (SELECT Fax FROM NR_PVO_122);
    END;
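
    The greedy idea behind this faster version (repeatedly take the fax that adds the most not-yet-counted people without overshooting) can be sketched in Python as follows. The name greedy_faxes, the (otherid, fax) tuples and the alphabetical tie-break are assumptions for illustration. The sketch also stops when no remaining fax fits, a guard that is added here and is not spelled out in the block above.

```python
def greedy_faxes(rows, needed):
    """rows: (otherid, fax) tuples. Returns faxes picked greedily by newly covered people."""
    ids_by_fax = {}
    for otherid, fax in rows:
        ids_by_fax.setdefault(fax, set()).add(otherid)

    covered, chosen = set(), []
    remaining = needed
    while remaining > 0:
        best_fax, best_new = None, set()
        for fax in sorted(ids_by_fax):
            new_ids = ids_by_fax[fax] - covered  # people this fax would add
            # keep the fax adding the most new people that still fits
            if len(best_new) < len(new_ids) <= remaining:
                best_fax, best_new = fax, new_ids
        if best_fax is None:
            break  # nothing fits any more (added guard)
        chosen.append(best_fax)
        covered |= best_new
        remaining -= len(best_new)
    return chosen


rows = [(1, "A"), (2, "A"), (3, "A"), (3, "B"), (4, "B"), (5, "C")]
print(greedy_faxes(rows, 4))  # ['A', 'B']
```

    In the made-up sample, A contributes 3 people, after which B adds only person 4 (person 3 is already covered), reaching exactly 4. Because each pick is only locally best, a greedy pass like this can miss a better combination, which matches the caveat above.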
    
  • 2020-12-19 15:44

    Data tested at the beginning. Note that OtherID is in column A and Fax is in column B:

    [Image: Before]

    First we are going to find the number of unique IDs you want. NOTE: YOU WILL NEED A NEW SHEET CALLED "Use Me". We will need a custom function for this. This function can be run as a cell formula with the syntax =UniqueItems(B2:D5), but we are going to use it in our Sub:

    Function UniqueItems(ArrayIn, Optional Count As Variant) As Variant
    '   Accepts an array or range as input
    '   If Count = True or is missing, the function returns the number of unique elements
    '   If Count = False, the function returns a variant array of unique elements
        Dim Unique() As Variant ' array that holds the unique items
        Dim Element As Variant
        Dim i As Long, NumUnique As Long
        Dim FoundMatch As Boolean
    '   If 2nd argument is missing, assign default value
        If IsMissing(Count) Then Count = True
    '   Counter for number of unique elements
        NumUnique = 0
    '   Loop thru the input array
        For Each Element In ArrayIn
            FoundMatch = False
    '       Has item been added yet?
            For i = 1 To NumUnique
                If Element = Unique(i) Then
                    FoundMatch = True
                    Exit For '(exit loop)
                End If
            Next i
    AddItem:
    '       If not in list, add the item to unique list
            If Not FoundMatch And Not IsEmpty(Element) Then
                NumUnique = NumUnique + 1
                ReDim Preserve Unique(NumUnique)
                Unique(NumUnique) = Element
            End If
        Next Element
    '   Assign a value to the function
        If Count Then UniqueItems = NumUnique Else UniqueItems = Unique
    End Function
    

    Here is the sub you need to find your Unique IDs and copy them over to the sheet "Use Me"

    Sub FaxesToUse()
        Dim LastRow As Long, CurRow As Long, UniqueTotal As Long, SubTotal As Long
    
        UniqueTotal = Val(InputBox("How Many Unique OtherIDs is Max?"))
        If Not UniqueTotal > 0 Then
            Exit Sub
        End If
    
        LastRow = Range("A" & Rows.Count).End(xlUp).Row
        SubTotal = 0
        For CurRow = 2 To LastRow
            SubTotal = UniqueItems(Range("A2:A" & CurRow))
            If SubTotal > UniqueTotal Then
                SubTotal = UniqueItems(Range("A2:A" & CurRow - 1))
                Range("A1:B" & CurRow - 1).Copy
                Sheets("Use Me").Cells.Clear
                Sheets("Use Me").Range("A1").PasteSpecial xlPasteValues
                Sheets("Use Me").Activate
                MsgBox "Use Me Sheet rows contain " & SubTotal & " Unique OtherIDs"
                Exit Sub
            End If
            Cells(CurRow, 1).EntireRow.Interior.Color = RGB(255, 255, 0)
        Next CurRow
    
    End Sub
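
    Stripped of the worksheet handling, FaxesToUse looks for the longest leading run of rows that stays within the allowed number of unique OtherIDs. A minimal Python sketch of that scan, assuming a list of (otherid, fax) tuples and a made-up function name rows_to_use:

```python
def rows_to_use(rows, max_unique):
    """rows: list of (otherid, fax). Returns the longest leading run of rows
    containing at most max_unique distinct OtherIDs."""
    seen = set()
    for index, (otherid, _fax) in enumerate(rows):
        if otherid not in seen and len(seen) == max_unique:
            # this row would introduce one OtherID too many, so cut just before it
            return rows[:index]
        seen.add(otherid)
    # the limit was never exceeded; returning everything is a choice made for this sketch
    return rows


rows = [(1, "A"), (2, "A"), (1, "B"), (3, "C"), (4, "C")]
print(rows_to_use(rows, 3))  # [(1, 'A'), (2, 'A'), (1, 'B'), (3, 'C')]
```

    The sketch tracks the seen IDs in a set instead of recounting the whole range on every row, so it makes a single pass over the data.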
    

    That will get you a page that looks like this:

    [Image: After Faxes Runs]

    Now we just need to remove all the duplicate Faxes using this macro:

    Sub RemoveDups()
    
    Dim CurRow As Long, LastRow As Long, LastCol As Long, DestLast As Long, DestRng As Range, ws As Worksheet
    
    Set ws = Sheets("Use Me")
    LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row
    
    For CurRow = LastRow To 3 Step -1
         Set DestRng = ws.Range("B2:B" & CurRow - 1).Find(ws.Range("B" & CurRow).Value, LookIn:=xlValues, LookAt:=xlWhole, SearchDirection:=xlNext)
         If DestRng Is Nothing Then
             'Do Nothing
         Else
            DestLast = ws.Cells(DestRng.Row, Columns.Count).End(xlToLeft).Column + 1
            ws.Cells(DestRng.Row, DestLast).Value = ws.Cells(CurRow, 1).Value
            ws.Cells(CurRow, 1).EntireRow.Delete xlShiftUp
         End If
         Next CurRow
    ws.Columns("B:B").Cut
    ws.Columns("A:A").Insert Shift:=xlToRight
    Application.CutCopyMode = False
    
    LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row
    LastCol = 0
    For CurRow = 2 To LastRow
        If ws.Cells(CurRow, Columns.Count).End(xlToLeft).Column > LastCol Then
            LastCol = ws.Cells(CurRow, Columns.Count).End(xlToLeft).Column
        End If
    Next CurRow
    
    MsgBox "Use Me Sheet Rows contain " & UniqueItems(ws.Range(ws.Cells(2, 2), ws.Cells(LastRow, LastCol))) & " Unique OtherIDs"
    
    End Sub
    

    This leaves you with:

    [Image: After Remove Dups]
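
    The end result of RemoveDups is one row per Fax with its OtherIDs listed alongside. As a rough Python equivalent of that regrouping (the name group_ids_by_fax is made up, and the order of IDs within a group may differ from what the macro produces):

```python
def group_ids_by_fax(rows):
    """rows: (otherid, fax) tuples. Returns a dict mapping each fax to its OtherIDs."""
    grouped = {}
    for otherid, fax in rows:
        grouped.setdefault(fax, []).append(otherid)
    return grouped


print(group_ids_by_fax([(1, "A"), (2, "B"), (3, "A")]))  # {'A': [1, 3], 'B': [2]}
```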
