Print rows with condition on field data

天涯浪子 submitted on 2021-02-07 20:18:36


I have a file with the following data:

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3
INV    I1         ZN    comb    (!I1)              1  
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4

From this data, I want to print a set of rows whose level field (6th column) adds up to 7.

For example, to get a level sum of 7 we can select the AI20, BUF and INV rows, giving 2+4+1=7, and print them.
Or we can select XOR, IAD and INV, giving 3+3+1=7, and print them. Any random selection of rows works, as long as the level sum is 7.

The output can be:

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
INV    I1         ZN    comb    (!I1)              1  
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4

Or the output can also be:

cell   input     out    type      fun            level
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3
INV    I1         ZN    comb    (!I1)              1

I tried it using awk:

awk '{{ sum[i] += $6} for (i=1;i<8;i++) print $0}' file

But this prints each row 7 times instead of the desired output.

Part 2. This problem continues from part 1.

file2 contains this data:

cell   input  out  type   fun  level
CLK    C       Z    seq   Cq   1      
DFk    C,Cp    Q    seq   IQ   1
DFR    D,C     Qn   seq   IN   1
SKN    SE,Q    Qp   seq   Iq   1

Desired output for part 2:

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
INV    I1         ZN    comb    (!I1)              1  
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4
CLK    C          Z     seq      Cq                1
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3
INV    I1         ZN    comb    (!I1)              1
DFk    C,Cp       Q     seq      IQ                1
IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
INV    I1         ZN    comb    (!I1)              1

For part 2, each time we get a group of file1 rows whose level sum is 7, the next line from file2 is inserted after it. So after the first group with level sum 7, insert the first line from file2. Then look for another group with level sum 7 and, once found, insert the second line from file2. Then look for a third group and insert the third line from file2. The whole process runs 3 times.
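A minimal Python sketch of the part 2 behaviour described above may make the intent easier to see. It is only an illustration under assumptions: the function names (`parse_rows`, `level`, `pick_group`, `interleave`) are made up for this example, each group is drawn independently (so a row such as INV can appear in more than one group, as in the desired output), and one file2 row is appended after every group.

```python
import random

FILE1 = """\
cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3
INV    I1         ZN    comb    (!I1)              1
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4
"""

FILE2 = """\
cell   input  out  type   fun  level
CLK    C       Z    seq   Cq   1
DFk    C,Cp    Q    seq   IQ   1
DFR    D,C     Qn   seq   IN   1
SKN    SE,Q    Qp   seq   Iq   1
"""


def parse_rows(text):
    """Split text into (header line, list of data lines), dropping blank lines."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    return lines[0], lines[1:]


def level(line):
    """The level is assumed to be the last whitespace-separated field."""
    return int(line.split()[-1])


def pick_group(rows, target, rng, max_tries=10_000):
    """Shuffle the rows and take those that still fit; retry until the sum hits target."""
    for _ in range(max_tries):
        order = rows[:]
        rng.shuffle(order)
        group, total = [], 0
        for row in order:
            if total + level(row) <= target:
                group.append(row)
                total += level(row)
            if total == target:
                return group
    raise ValueError(f"no group with level sum {target} found")


def interleave(rows1, rows2, target, rounds, rng):
    """Emit `rounds` groups from rows1, each followed by the next row of rows2."""
    out = []
    for extra in rows2[:rounds]:
        out.extend(pick_group(rows1, target, rng))
        out.append(extra)
    return out


header, rows1 = parse_rows(FILE1)
_, rows2 = parse_rows(FILE2)
print(header)
print("\n".join(interleave(rows1, rows2, target=7, rounds=3, rng=random.Random())))
```

The output differs from run to run because the groups are random. Column alignment of the file2 rows is left as in file2.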


Here is an awk solution for this job:

cat rnd.awk
function rnd(max) {        # generate a random integer between 2 and max
   return int(rand() * (max - 1)) + 2
}
BEGIN {
   srand()                 # seed random generation
}
NR == 1 {                  # for header row
   print                   # print header record
   next                    # skip to the next record
}
{                          # for every data row
   rec[NR] = $0            # save each record in rec array with NR as key
   num[NR] = $NF           # save last column in num array with NR as key
}
END {
   while (1) {             # loop until a match is found (never ends if none exists)
      r = rnd(NR)          # generate a random number between 2 and NR
      if (!seen[r]++)      # count a row only the first time it is drawn
         s += num[r]       # add its level to the running sum

      if (s == 7)          # if sum is 7 then break the loop
         break
      else if (s > 7) {    # if sum > 7 then start over
         delete seen
         s = 0
      }
   }
   for (j in seen)         # for each val in seen print rec array
      print rec[j]
}

Use it as:

awk -f rnd.awk file
cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
INV    I1         ZN    comb    (!I1)              1
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4

and again:

awk -f rnd.awk file
cell   input     out    type      fun            level
IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3
XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3
INV    I1         ZN    comb    (!I1)              1


There are two places where efficiency is important in this problem:

  1. Generating all the possible combinations;
  2. Retrieving the right lines once the combination is known.

The first issue depends heavily on the number of distinct values that "level" can take. If you have hundreds of different values, the number of combinations that give the requested sum is going to be very large, so that is the part of the algorithm you want to optimize.
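As a rough illustration of the first point, here is a small Python sketch that enumerates the combinations of level values adding up to a target, working on distinct values and their counts rather than on individual rows. The function name `level_combinations` is made up for this example, and levels are assumed to be positive integers.

```python
from collections import Counter


def level_combinations(levels, target):
    """Yield tuples of level values (ascending) that sum to target.

    Each value may be used at most as many times as it occurs in `levels`.
    """
    counts = sorted(Counter(levels).items())  # [(value, number of rows with it), ...]

    def extend(i, remaining, chosen):
        if remaining == 0:
            yield tuple(chosen)
            return
        if i == len(counts):
            return
        value, available = counts[i]
        for k in range(available + 1):        # use this value 0..available times
            if k * value > remaining:
                break
            yield from extend(i + 1, remaining - k * value, chosen + [value] * k)

    yield from extend(0, target, [])


# Levels from the question's file, in file order.
for combo in level_combinations([2, 3, 3, 3, 1, 4], 7):
    print(combo)
```

For the six rows in the question this finds three combinations: 3+4, 1+3+3 and 1+2+4. Enumerating over distinct values keeps the search small when many rows share the same level.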

The second part depends on the number of lines in the file. To address it, I would create a hash table whose keys are the "level" values and whose values are arrays of strings, each string being one of your lines. Once you have a given combination of levels, you can generate (virtually infinitely many) row selections almost instantaneously with the following steps:

  1. retrieve the array of strings associated with each level value present in the combination;
  2. from each array of strings retrieve a random string;
  3. repeat the process to get as many combinations of strings as you want for a given combination of level numbers.
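The steps above could be sketched in Python roughly as follows. The names `build_index` and `rows_for` are hypothetical, the level is assumed to be the last field of each line, and when a level occurs more than once in the combination the sketch draws distinct lines for it (an assumption, so that the same row is not printed twice).

```python
import random
from collections import Counter, defaultdict

LINES = [
    "AI20   A1,A2      Z     comb    ((A1A2))           2",
    "IA2    A1,A2,A3   Z     comb    ((!A1A2)A3)        3",
    "XOR    A1,A2,B1   Z     comb    (((A1A2)B1)        3",
    "IAD    A1,A2,A3   Z     comb    (!((A1A2)A3))      3",
    "INV    I1         ZN    comb    (!I1)              1",
    "BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4",
]


def build_index(lines):
    """Hash table: level value -> list of lines having that level."""
    index = defaultdict(list)
    for line in lines:
        index[int(line.split()[-1])].append(line)
    return index


def rows_for(combination, index, rng=random):
    """Pick random lines matching a combination of levels, e.g. (1, 3, 3)."""
    picked = []
    for lvl, count in Counter(combination).items():
        # sample() raises ValueError if fewer than `count` lines have this level
        picked.extend(rng.sample(index[lvl], count))
    return picked


index = build_index(LINES)
for line in rows_for((1, 3, 3), index):
    print(line)
```

Building the index is a single pass over the file. After that, each new selection for a known combination only costs a few dictionary lookups.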


The following function returns a random combination of rows where the sum of the level column equals the target (currently 7, as per your question). It works with any dataframe (as long as it has a numerical column 'level') and any target:

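import random

def get_one(df, target):
    values = []     # levels of the rows picked so far
    indices = []    # index labels of the rows picked so far
    while sum(values) < target:
        # candidate rows: positive level that still fits in the remaining budget
        dftemp = df[(df['level'] <= target - sum(values)) & (df['level'] > 0)]
        # pick one not-yet-used candidate (raises IndexError if none is left)
        ind1 = random.choice(list(set(dftemp.index) - set(indices)))
        values.append(df.loc[ind1, 'level'])
        indices.append(ind1)
    return df.loc[indices, :]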

To get a result, just run the function with df and your target as parameters:

>>>get_one(df, 7)

cell   input     out    type      fun            level
AI20   A1,A2      Z     comb    ((A1A2))           2
INV    I1         ZN    comb    (!I1)              1  
BUF  A1,A2,A3,B1  Z     comb    (!(((A1A2)A3)B1))  4

If you want a different total, change the target parameter, for example:

>>>get_one(df, 10)
>>>get_one(df, 15)


