Trying to figure out how to build a query in CakePHP where I can select all Events that are between X and Y dates (user-entered dates).
The problem lies in that the
In this kind of situation, I tend not to use Cake's associations, or Containable, and craft the joins myself:
$events = $this->Event->find('all', array(
'joins'=>array(
array(
'table' => $this->Schedule->table,
'alias' => 'Schedule',
'type' => 'INNER',
'foreignKey' => false,
'conditions'=> array(
'Schedule.event_id = Event.id',
),
),
array(
'table' => $this->Date->table,
'alias' => 'Date',
'type' => 'INNER',
'foreignKey' => false,
'conditions'=> array(
'Date.schedule_id = Schedule.id',
),
),
),
'conditions'=>array(
'Date.start >=' => $start_date,
'Date.start <=' => $end_date,
),
'order'=>'Event.created DESC',
'limit'=>5
));
It's a bit chunky, but results in the exact query I want.
UPDATE
Let's break your code in parts and see where we could improve it. The first part is the preparation for the find
. I've rewritten your code trying to make it shorter, and this is what I came up with:
// Default options go here
$defaultOpts = array(
'start' => date('Y-m-d') . ' 00:00:00',
'end' => date('Y-m-d') . ' 23:59:59',
'limit' => 10
)
// Use default options if nothing is passed, otherwise merge passed options with defaults
$opts = is_array($opts) ? array_merge($defaultOpts, $opts) : $defaultOpts;
// Initialize array to hold query conditions
$conditions = array();
//date conditions
$conditions[] = array(
"Date.start >=" => $qOpts['start'],
"Date.start <=" => $qOpts['end'],
));
//cities conditions
if(isset($opts['cities']) && is_array($opts['cities'])) {
$conditions['OR'] = array();
$conditions['OR'][] = array('Venue.city_id'=>$opts['cities']);
$conditions['OR'][] = array('Restaurant.city_id'=>$opts['cities']);
}
//event types conditions
//$opts['event_types'] = array('1');
if(isset($opts['event_types']) && is_array($opts['event_types'])) {
$conditions[] = 'EventTypesEvents.event_type_id' => $opts['event_types']
}
//event sub types conditions
if(isset($opts['event_sub_types']) && is_array($opts['event_sub_types'])) {
$conditions[] = 'EventSubTypesEvents.event_sub_type_id' => $opts['event_sub_types']
}
//event sub sub types conditions
if(isset($opts['event_sub_types']) && is_array($opts['event_sub_sub_types'])) {
$conditions[] = 'EventSubSubTypesEvents.event_sub_sub_type_id' => $opts['event_sub_sub_types']
}
Notice that I eliminated most of the ORs. That's because you can pass an array as a value in conditions
, and Cake will make it an IN(...)
statement in the SQL Query. For example: 'Model.field' => array(1,2,3)
generates 'Model.field IN (1,2,3)'
. This works just like ORs, but requires less code. So the code block above does exactly the same your code was doing, but it's shorter.
Now comes the complex part, the find
itself.
Usually I'd recommend the forced joins alone, without Containable, and with 'recursive'=>false
. I believe this usually is the best way to deal with complex finds. With Associations and Containable, Cake runs several SQL queries against the database (one query per Model/table), which tends to be inefficient. Also, Containable not always returns the expected results (as you noticed when you tried it).
But since in your case there are four complex associations involved, maybe a mixed approach will be the ideal solution - otherwise, it would be too complicated to clean-up the duplicate data. (The 4 complex associations are: Event hasMany Dates [through Event hasMany Schedule, Schedule hasMany Date], Event HABTM EventType, Event HABTM EventSubType, Event HABTM EventSubSubType). So, we could let Cake handle data retrieval of EventType, EventSubType and EventSubSubType, avoiding too many duplicates.
So here is what I suggest: use joins for all the required filtering, but do not include Date and [Sub[Sub]]Types in fields. Because of the model associations you have, Cake will automatically run extra queries against the DB to fetch those bits of data. No Containable needed.
The code:
// We already fetch the data from these 2 models through
// joins + fields, so we can unbind them for the next find,
// avoiding extra unnecessary queries.
$this->unbindModel(array('belongsTo'=>array('Restaurant', 'Venue'));
$data = $this->find('all', array(
// The other fields required will be added by Cake later
'fields' => "
Event.*,
Restaurant.id, Restaurant.name, Restaurant.slug, Restaurant.address, Restaurant.GPS_Lon, Restaurant.GPS_Lat, Restaurant.city_id,
Venue.id, Venue.name, Venue.slug, Venue.address, Venue.GPS_Lon, Venue.GPS_Lat, Venue.city_id,
City.id, City.name, City.url_name
",
'joins' => array(
array(
'table' => $this->Schedule->table,
'alias' => 'Schedule',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => 'Schedule.event_id = Event.id',
),
array(
'table' => $this->Schedule->Date->table,
'alias' => 'Date',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => 'Date.schedule_id = Schedule.id',
),
array(
'table' => $this->EventTypesEvent->table,
'alias' => 'EventTypesEvents',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => 'EventTypesEvents.event_id = Event.id',
),
array(
'table' => $this->EventSubSubTypesEvent->table,
'alias' => 'EventSubSubTypesEvents',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => 'EventSubSubTypesEvents.event_id = Event.id',
),
array(
'table' => $this->Restaurant->table,
'alias' => 'Restaurant',
'type' => 'LEFT',
'foreignKey' => false,
'conditions' => 'Event.restaurant_id = Restaurant.id',
),
array(
'table' => $this->City->table,
'alias' => 'RestaurantCity',
'type' => 'LEFT',
'foreignKey' => false,
'conditions' => 'Restaurant.city_id = city.id',
),
array(
'table' => $this->Venue->table,
'alias' => 'Venue',
'type' => 'LEFT',
'foreignKey' => false,
'conditions' => 'Event.venue_id = Venue.id',
),
array(
'table' => $this->City->table,
'alias' => 'VenueCity',
'type' => 'LEFT',
'foreignKey' => false,
'conditions' => 'Venue.city_id = city.id',
),
),
'conditions' => $conditions,
'limit' => $opts['limit'],
'recursive' => 2
));
We eliminated contains
, and some of the extra queries Cake was running because of it. Most joins are of type INNER
. This means that at least one record must exist on both tables involved in the join, or you'll get less results then you'd expect. I'm assuming each Event takes place at a Restaurant OR a Venue, but not both, that's why I used LEFT
for those tables (and cities). If some of the fields used in the joins are optional, you should use LEFT
instead of INNER
on the related joins.
If we used 'recursive'=>false
here, we'd still get the right events, and no data repetition, but dates and [Sub[Sub]]Types would be missing. With the 2 levels of recursion, Cake will automatically loop through the returned events, and for each event it will run the necessary queries to fetch the associated model data.
This is almost what you were doing, but without Containable, and with a few extra tweaks. I know it's still a long, ugly and boring piece of code, but after all there are 13 database tables involved...
This is all untested code, but I believe it should work.
Dave, you have a reasonably complicated problem to solve, which requires more than basic Cake. You must understand what's going on in order to be able to solve it. I'm assuming you don't have much experience with SQL, and don't know much 'under-the-hood' Cake. So I'll try to explain the basics here.
Consider you have two tables, called 'main' and 'related':
main related id | val id | main_id | val 1 | A 1 | 1 | Foo 2 | B 2 | 1 | FooBar 3 | C 3 | 2 | Bar 4 | D 4 | 3 | BarFoo
In Cake, you'll have models Main and Related to deal with them. Main hasMany Related
, and Related belongsTo Main
. Now you do the following (from a method inside Main):
$data = $this->find('all', array(
'recursive' => 1
));
Here is what Cake will do behind the scenes:
Retrieve all rows from table 'main'
SELECT * FROM main
With the results, Cake will build an array of IDs, which will then be used to get the data for the associated model Related. This data will be fetched from MySQL using a query like this:
SELECT * FROM related WHERE main_id IN ([comma_separated_list_of_ids_here])
Finally, Cake will loop through results array from Main, and add the related data to each row as applicable. When it finishes, it returns the "decorated" array.
Sometimes, depending on the type of association, Cake will do an extra SQL query for every row retrieved for the main model. That can be really slow. The solution would be to use single query to get data from both tables, and that's what JOINs are for. The problem with that is data repetition. For example:
SELECT Main.*, Related.*
FROM main as Main
INNER JOIN related AS Related
ON Related.main_id = main.id
Results:
Main.id | Main.val | Related.id | Related.main_id | Related.val 1 | A | 1 | 1 | Foo 1 | A | 2 | 1 | FooBar 2 | B | 3 | 2 | Bar 3 | C | 4 | 3 | BarFoo
Things to notice here:
We have 2 rows for Main.id = 1. The difference between them is at Related.id and Related.val. If you remove those columns from the SELECT clause, the repetition will go away. This is very useful if you need to add conditions on the related table. For example:
SELECT DISTINCT Main.*
FROM main as Main
INNER JOIN related AS Related
ON Related.main_id = main.id
WHERE Related.val LIKE '%Foo%'
Gives:
Main.id | Main.val 1 | A 3 | C
There are actually 2 rows on related that match on our conditions (Foo and FooBar), but A shows up only once in the results beacause we didn't ask SQL to display Related.val, and also told it to ignore exact duplicates (with DISTINCT
).
INNER JOIN
, which limits the results to rows from Main that also have one or more corresponding rows on Related. If we used a LEFT JOIN
, the results would have an extra line, as below:Main.id | Main.val | Related.id | Related.main_id | Related.val 1 | A | 1 | 1 | Foo 1 | A | 2 | 1 | FooBar 2 | B | 3 | 2 | Bar 3 | C | 4 | 3 | BarFoo 4 | D | NULL | NULL | NULL
(if you need more details on INNER vs. LEFT JOINs, see here). (EDIT: link updated)
Back to the duplicates: it's easy to clean them up with a simple foreach
loop in PHP. It's simple when there are just 2 tables involved, but becomes more and more complex for every extra table you add to the query (if the new table has a one-to-many relationship with either main or related).
But you do have lots of tables and associations involved. So, the solution I suggested above is somewhat a compromise between performance and code simplicity. Let me try to explain my line of thought when I wrote it.
You need to deal with 13 tables to get all the data you want. You need to display data that comes from most of those tables, and need to filter events based on quite a few tables too.
Cake alone can't understand what you want, and will return too much data, including stuff you expected it to have filtered out.
There are some 1-n and n-n relashionships involved. If you jast add all 13 to a single query with JOINs, the result will have too many dupes, and will be unmanageable.
So I decided to try a mixed approach: start by getting a filtered list of events, with no dupes, then let Cake 'decorate' it with data from some associated models. To do that, you must:
JOIN
all tables which need conditions applied to them. This will allow us to retrieve our final list of events, considering all conditions, with a single query.JOIN
ed can cause duplicates, do not include their fields in the SELECT
clause (or Cake's fields
list). If the associations are properly setup, Cake will run an extra query later to get the associated data (since we used recursive=2
).If this is still returning fields you don't want, and such fields come from associated models, you must use Containable to tell Cake which fields you want from each of those models.
I know it might sound complicated, but you won't be able to solve this on your own unless you understand what Cake does, and how SQL works. I hope this helps.
You could try the following, assuming Cake 1.3 and using containable behaviour. I assumed that the date fields in your table are called start_date
and end_date
, so these conditions might need adjusting.
This may, or may not work, and will likely produce a massive query if you have a lot of data, so some further refinement is probably necessary - certainly with the fields selected.
You could post the query generated, if it runs, and we may be able to help adjust it further.
/* in your EventController (the method) for on the fly */
$this->Event->Behaviors->attach('Containable');
/* Your dates */
$x_date = '2011-06-01';
$y_date = '2011-07-01';
$this->paginate = array(
'limit'=>10,
'order'=>'Event.created DESC',
'contain'=>array(
'Schedule',
'Event'=>array(
'conditions'=>array('Event.start_date'=>$x_date,
'Event.end_date'<=$y_date)
)
),
);
$this->set('events',$this->Paginate('Event'));
// print_r($events);
GROUP_CONCAT to the rescue!!! Long story short - I needed to return Events with their many Dates (with being able to query against different HABTM tables) - but when I tried, I'd either get way too many events (one for each date...etc) or I'd use GROUP BY, and not get all the dates. The answer... still use GROUP BY, but combine the Dates into a single field using GROUP_CONCAT:
$qOpts['fields'] = array(
...
'GROUP_CONCAT(Date.start, "|", Date.end ORDER BY Date.start ASC SEPARATOR "||") AS EventDates'
);
I'm posting a lot of code - feel free to browser if you got stuck like I did.
Things I learned:
Hope this helps someone. Feel free to point out any issues w/ my code - I always like to improve. But for now, I'm dancing in circles because it WORKS!!! Now that it works, I'm going to go back in and try to clean up those ORs like @bfavaretto mentioned.
//returns events based on category, subcategory, and start/end datetimes
function getEvents($opts = null) {
//$opts = limit, start(date), end(date), types, subtypes, subsubtypes, cities, paginate(0,1), venues, excludes(event ids)
$qOpts['conditions'] = array();
//order
$qOpts['order'] = 'Date.start ASC';
if(isset($opts['order'])) $qOpts['order'] = $opts['order'];
//dates
$qOpts['start'] = date('Y-m-d') . ' 00:00:00';
if(isset($opts['start'])) $qOpts['start'] = $opts['start'];
//limit
$qOpts['limit'] = 10;
if(isset($opts['limit'])) $qOpts['limit'] = $opts['limit'];
//event excludes (example: when you want "other events at this venue", you need to exclude current event)
if(isset($opts['excludes'])) {
if(is_array($opts['excludes'])) {
foreach($opts['excludes'] as $exclude_id) {
array_push($qOpts['conditions'], array('Event.id <>' => $exclude_id));
}
}
}
//approval status conditions
if(!isset($opts['approval_statuses'])) $opts['approval_statuses'] = array('1'); //default 1 = approved
if(isset($opts['approval_statuses'])) {
if(is_array($opts['approval_statuses'])) {
$approvalStatusesConditions['OR'] = array();
foreach($opts['approval_statuses'] as $status) {
array_push($approvalStatusesConditions['OR'], array('Event.approval_status_id' => $status));
}
array_push($qOpts['conditions'], $approvalStatusesConditions);
}
}
//date conditions
$date_conditions = array();
array_push($qOpts['conditions'], array('Date.start >=' => $qOpts['start']));
array_push($date_conditions, array('Date.start >=' => $qOpts['start']));
if(isset($opts['end'])) {
array_push($qOpts['conditions'], array('Date.start <=' => $opts['end']));
array_push($date_conditions, array('Date.start <=' => $opts['end']));
}
//venues conditions
if(isset($opts['venues'])) {
if(is_array($opts['venues'])) {
$venueConditions['OR'] = array();
foreach($opts['venues'] as $venue_id) {
array_push($venueConditions['OR'], array('OR'=>array('Venue.id'=>$venue_id)));
}
array_push($qOpts['conditions'], $venueConditions);
}
}
//cities conditions
if(isset($opts['cities'])) {
if(is_array($opts['cities'])) {
$cityConditions['OR'] = array();
foreach($opts['cities'] as $city_id) {
array_push($cityConditions['OR'], array('OR'=>array('Venue.city_id'=>$city_id, 'Restaurant.city_id'=>$city_id)));
}
array_push($qOpts['conditions'], $cityConditions);
}
}
//event types conditions
if(isset($opts['event_types'])) {
if(is_array($opts['event_types'])) {
$eventTypeConditions['OR'] = array();
foreach($opts['event_types'] as $event_type_id) {
array_push($eventTypeConditions['OR'], array('EventTypesEvents.event_type_id' => $event_type_id));
}
array_push($qOpts['conditions'], $eventTypeConditions);
}
}
//event sub types conditions
if(isset($opts['event_sub_types'])) {
if(is_array($opts['event_sub_types'])) {
$eventSubTypeConditions['OR'] = array();
foreach($opts['event_sub_types'] as $event_sub_type_id) {
array_push($eventSubTypeConditions['OR'], array('EventSubTypesEvents.event_sub_type_id' => $event_sub_type_id));
}
array_push($qOpts['conditions'], $eventSubTypeConditions);
}
}
//event sub sub types conditions
if(isset($opts['event_sub_sub_types'])) {
if(is_array($opts['event_sub_sub_types'])) {
$eventSubSubTypeConditions['OR'] = array();
foreach($opts['event_sub_sub_types'] as $event_sub_sub_type_id) {
array_push($eventSubSubTypeConditions['OR'], array('EventSubSubTypesEvents.event_sub_sub_type_id' => $event_sub_sub_type_id));
}
array_push($qOpts['conditions'], $eventSubSubTypeConditions);
}
}
//joins
$qOpts['joins'] = array();
//Restaurants join
array_push($qOpts['joins'], array(
'table' => $this->Restaurant->table,
'alias' => 'Restaurant',
'type' => 'LEFT',
'foreignKey' => false,
'conditions' => array(
'Restaurant.id = Event.restaurant_id',
),
)
);
//Venues join
array_push($qOpts['joins'], array(
'table' => $this->Venue->table,
'alias' => 'Venue',
'type' => 'LEFT',
'foreignKey' => false,
'conditions' => array(
'Venue.id = Event.venue_id',
),
)
);
//Schedules join
array_push($qOpts['joins'], array(
'table' => $this->Schedule->table,
'alias' => 'Schedule',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => array(
'Schedule.event_id = Event.id',
),
)
);
//Dates join
array_push($qOpts['joins'], array(
'table' => $this->Schedule->Date->table,
'alias' => 'Date',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => array(
'Date.schedule_id = Schedule.id',
//$date_conditions
),
));
//Uploads join
array_push($qOpts['joins'], array(
'table' => $this->Upload->table,
'alias' => 'Upload',
'type' => 'LEFT',
'foreignKey' => false,
'conditions' => array(
'Upload.event_id = Event.id',
),
)
);
//Event types join
if(isset($opts['event_types'])) {
if(is_array($opts['event_types'])) {
array_push($qOpts['joins'], array(
'table' => $this->EventTypesEvent->table,
'alias' => 'EventTypesEvents',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => array(
'EventTypesEvents.event_id = Event.id',
),
));
}
}
if(isset($opts['event_sub_types'])) {
if(is_array($opts['event_sub_types'])) {
array_push($qOpts['joins'], array(
'table' => $this->EventSubTypesEvent->table,
'alias' => 'EventSubTypesEvents',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => array(
'EventSubTypesEvents.event_id = Event.id',
),
));
}
}
if(isset($opts['event_sub_sub_types'])) {
if(is_array($opts['event_sub_sub_types'])) {
array_push($qOpts['joins'], array(
'table' => $this->EventSubSubTypesEvent->table,
'alias' => 'EventSubSubTypesEvents',
'type' => 'INNER',
'foreignKey' => false,
'conditions' => array(
'EventSubSubTypesEvents.event_id = Event.id',
),
));
}
}
$qOpts['fields'] = array(
'Event.*',
'Venue.id', 'Venue.slug', 'Venue.name', 'Venue.GPS_Lon', 'Venue.GPS_Lat',
'Restaurant.id', 'Restaurant.slug', 'Restaurant.name', 'Restaurant.GPS_Lat', 'Restaurant.GPS_Lon',
'GROUP_CONCAT(Date.start, "|", Date.end ORDER BY Date.start ASC SEPARATOR "||") AS EventDates'
);
//group by
$qOpts['group'] = 'Event.id';
//you need to set the recursion to -1 for this type of join-search
$this->recursive = -1;
$paginate = false;
if(isset($opts['paginate'])) {
if($opts['paginate']) {
$paginate = true;
}
}
//either return the options just created (paginate)
if($paginate) {
return $qOpts;
//or return the events data
} else {
$data = $this->find('all', $qOpts);
return $data;
}
}