I need a little help in the following. I have this kind of datafile:
0 0 # <--- Group 1 -- 1 house (0) and 1 room (0)
0 0 # <--- Group 2 -- 2 ho
I would first define a House class and a Group class:
class House:
    def __init__(self, rooms):
        self.rooms = rooms


class Group:
    def __init__(self, index, houses):
        self.index = index
        # houses maps each house number to its number of rooms.
        # Sort numerically so that house 10 comes after house 2.
        self.houses = [House(houses[house_nr]) for house_nr in sorted(houses, key=int)]

    def __str__(self):
        return 'Group {}'.format(self.index)

    def __repr__(self):
        return 'Group {}'.format(self.index)
Then parse the data into this hierarchical structure:
import collections

with open('in.txt') as f:
    groups = []
    # Accumulates the number of rooms per house for the current group.
    group = collections.defaultdict(int)
    i = 1
    for line in f:
        if not line.strip():
            # Empty line found, create a new group.
            groups.append(Group(i, group))
            # Reset accumulator.
            group = collections.defaultdict(int)
            i += 1
            continue
        # Each data line adds one room to the house it names.
        house_nr, room_nr = line.split()
        group[house_nr] += 1
    # Create the last group at EOF
    groups.append(Group(i, group))
Then you can do stuff like this:
found = filter(
    lambda g:
        len(g.houses) == 1 and      # Group contains one house
        g.houses[0].rooms == 1,     # First house contains one room
    groups)
print(list(found))  # Prints [Group 1, Group 5, Group 6]

found = filter(
    lambda g:
        len(g.houses) == 2 and      # Group contains two houses
        g.houses[0].rooms == 3 and  # First house contains three rooms
        g.houses[1].rooms == 2,     # Second house contains two rooms
    groups)
print(list(found))  # Prints [Group 2]
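The two filters above differ only in the expected room counts, so they could be folded into one helper. What follows is a minimal sketch, not part of the code above: `SAMPLE`, `parse_groups` and `find_groups` are hypothetical names, and it assumes the data is held in memory as text with one blank line between groups.

```python
from collections import Counter

# Sample data; assumes groups are separated by a single blank line.
SAMPLE = """\
0 0

0 0
0 1
0 2
1 0
1 1

0 0
0 1
0 2

0 0
1 0
2 0
3 0

0 0

0 0
"""


def parse_groups(text):
    """Return one list per group: room counts ordered by house number."""
    groups = []
    for block in text.strip().split("\n\n"):
        # Count how many lines (rooms) mention each house number.
        rooms_per_house = Counter(int(line.split()[0]) for line in block.splitlines())
        groups.append([rooms_per_house[h] for h in sorted(rooms_per_house)])
    return groups


def find_groups(groups, room_counts):
    """Return 1-based indices of groups whose houses have exactly these room counts."""
    wanted = list(room_counts)
    return [i for i, rooms in enumerate(groups, start=1) if rooms == wanted]


groups = parse_groups(SAMPLE)
print(find_groups(groups, [1]))     # [1, 5, 6]
print(find_groups(groups, [3, 2]))  # [2]
```

Comparing whole lists keeps each query to one line, at the cost of only supporting exact matches on the ordered room counts.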
I don't know what your expected output is, but I have decoded your number pattern into a readable group/house/rooms format. Any further "query" can be run on this content.
See below:
kent$ cat file
0 0

0 0
0 1
0 2
1 0
1 1

0 0
0 1
0 2

0 0
1 0
2 0
3 0

0 0

0 0
awk:
kent$ awk 'BEGIN{RS=""}
    { print "\ngroup "++g;
      delete a;
      for(i=1;i<=NF;i++) if(i%2) a[$i]++;
      for(x in a) printf "House#: %s , Room(s): %s \n", x, a[x]; }' file
We get this output:
group 1
House#: 0 , Room(s): 1
group 2
House#: 0 , Room(s): 3
House#: 1 , Room(s): 2
group 3
House#: 0 , Room(s): 3
group 4
House#: 0 , Room(s): 1
House#: 1 , Room(s): 1
House#: 2 , Room(s): 1
House#: 3 , Room(s): 1
group 5
House#: 0 , Room(s): 1
group 6
House#: 0 , Room(s): 1
Note that the generated format can be changed to fit your "filter" or "query".
UPDATE
OP's comment:
I need to know, the number of the group(s) which have/has for example 1 house with one room. The output would be in the above case: 1, 5 ,6
As I said, the awk output can be adjusted to your query criteria for the next step. I have changed the awk command above to:
awk 'BEGIN{RS=""}
{print ""; gid=++g;
delete a;
for(i=1;i<=NF;i++) if(i%2) a[$i]++;
for(x in a) printf "%s %s %s\n", gid,x, a[x]; }' file
This will output:
1 0 1

2 0 3
2 1 2

3 0 3

4 0 1
4 1 1
4 2 1
4 3 1

5 0 1

6 0 1
The format is groupIdx houseIdx numberOfRooms, with a blank line between groups. Save the text above to a file named decoded.txt.
Your query can then be run on this text:
kent$ awk 'BEGIN{RS="\n\n"}{if (NF==3 && $3==1)print $1}' decoded.txt
1
5
6
The last awk command prints the group number when the group block consists of a single line (NF==3, i.e. exactly three fields) and the number of rooms ($3) is 1.
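For comparison with the Python answer, the same query over the decoded text can be sketched in Python. This is only an illustration, assuming the decoded text uses the groupIdx houseIdx numberOfRooms format with one blank line between groups; `DECODED` and `groups_with_single_house` are hypothetical names.

```python
# Decoded text in "groupIdx houseIdx numberOfRooms" format,
# assuming one blank line between groups.
DECODED = """\
1 0 1

2 0 3
2 1 2

3 0 3

4 0 1
4 1 1
4 2 1
4 3 1

5 0 1

6 0 1
"""


def groups_with_single_house(decoded, rooms):
    """Group numbers of blocks that have exactly one line with `rooms` rooms."""
    found = []
    for block in decoded.strip().split("\n\n"):
        fields = block.split()
        # Three fields means the block is a single "group house rooms" line.
        if len(fields) == 3 and int(fields[2]) == rooms:
            found.append(int(fields[0]))
    return found


print(groups_with_single_house(DECODED, 1))  # [1, 5, 6]
print(groups_with_single_house(DECODED, 3))  # [3]
```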
Perl solution. It converts the input into this format:
1|0
2|1 2
3|2
4|0 0 0 0
5|0
6|0
The first column is the group number; the second column lists, for each house in the group, its number of rooms minus one, sorted. To search for groups with two different houses with 2 and 3 rooms, you can just grep '|1 2$'; to search for groups with just one house with one room, grep '|0$'.
#!/usr/bin/perl
#-*- cperl -*-
#use Data::Dumper;
use warnings;
use strict;

sub report {
    print join ' ', sort {$a <=> $b} @_;
    print "\n";
}

my $group = 1;
my @last = (0);
print '1|';
my @houses = ();
while (<>) {
    if (/^$/) { # group end
        report(@houses, $last[1]);
        undef @houses;
        print ++$group, '|';
        @last = (0);
    } else {
        my @tuple = split;
        if ($tuple[0] != $last[0]) { # new house
            push @houses, $last[1];
        }
        @last = @tuple;
    }
}
report(@houses, $last[1]);
It is based on the fact that for each house, only the last line is important.
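The "only the last line matters" idea can also be sketched in Python, the language of the first answer. This is a rough illustration rather than a port of the Perl script: `SAMPLE` and `signatures` are hypothetical names, and it assumes groups are separated by one blank line and that the rooms of each house are numbered consecutively from 0, so the last room index equals the room count minus one.

```python
# Sample data; assumes groups are separated by a single blank line.
SAMPLE = """\
0 0

0 0
0 1
0 2
1 0
1 1

0 0
0 1
0 2

0 0
1 0
2 0
3 0

0 0

0 0
"""


def signatures(text):
    """Map group number -> sorted list of each house's last room index."""
    result = {}
    for group_nr, block in enumerate(text.strip().split("\n\n"), start=1):
        last_room = {}
        for line in block.splitlines():
            house, room = map(int, line.split())
            # Later lines overwrite earlier ones, so only the last one survives.
            last_room[house] = room
        result[group_nr] = sorted(last_room.values())
    return result


for group_nr, sig in signatures(SAMPLE).items():
    print("{}|{}".format(group_nr, " ".join(map(str, sig))))
```

With the sample data this prints the same six lines as the format shown at the top of this answer (1|0 through 6|0).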