My question is the following:
If you look below you\'ll see there is a datastructure with message ids and then the final datastructure containing the message detail
When you print_r the $emails array what structure you get? Maybe the below should do it?
$threads[$root][$key] = $emails[$key];
Remember that in php whatever function you use it will be finally converted to some sort of loop. There are, however some steps you could take to make it more efficient and they are different in PHP 5.5 and in 5.3/5.4.
The most efficient way of doing this would be to split the function to 2 separate steps. In first step you would generate a map of keys for the list of emails.
$keys = array();
foreach($emails as $k => $email)
{
$keys[$email->msgno] = $k;
}
In 2nd step you iterate all values in the multi-dimensional $threads and replace them with the email details:
// Iterate threads
$threads = array_map(function($thread) use($emails, $keys)
{
// Iterate emails in these threads
return array_map(function($msgno) use($emails, $keys)
{
// Swap the msgno with the email details
return $emails[$keys[$msgno]];
}, $thread);
}, $threads);
Proof of concept: http://pastebin.com/rp5QFN4J
Explanation of keyword use in anonymous functions:
In order to make use of variables defined in the parent scope, it is possible to import variables from the parent scope into the closure scope with the use () keyword. Although it was introduced in PHP 5.3 it hasn't been documented in the official PHP manual yet. There's only a draft document on php's wiki here https://wiki.php.net/rfc/closures#userland_perspective
One of the new features in this version enables you to use generators, which have significantly smaller memory thumbprint thus are more efficient.
Explanation of keyword yield in generators:
The heart of a generator function is the yield keyword. In its simplest form, a yield statement looks much like a return statement, except that instead of stopping execution of the function and returning, yield instead provides a value to the code looping over the generator and pauses execution of the generator function.
1st step:
function genetateKeyMap($emails)
{
foreach($emails as $k => $email)
{
// Yielding key => value pair to result set
yield $email->msgno => $k;
}
};
$keys = iterator_to_array(genetateKeyMap($emails));
2nd step:
function updateThreads($emails, $threads, $keys)
{
foreach($threads as $thread)
{
$array = array();
// Create a set of detailed emails
foreach($thread as $msgno)
{
$array[] = $emails[$keys[$msgno]];
}
// Yielding array to result set
yield $array;
}
};
$threads = iterator_to_array(updateThreads($emails, $threads, $keys));
A few words about the values being returned by genrators:
Generators return an object which is an instance of SPL Iterator thus it needs to use iterator_to_array() in order to convert it into exactly the same array structure your code is expecting. You don't need to do this, but it would require an update of your code following the generator function, which could be even more efficient.
Proof of concept: http://pastebin.com/9Z4pftBH
I generated a list of 7000 threads with 5 messages each and tested the performance of each method (avg from 5 tests):
Takes: Memory used:
----------------------------
3x foreach(): 2.8s 5.2 MB
PHP 5.3/5.4 way 0.061s 2.7 MB
PHP 5.5 way 0.036s 2.7 MB
Although the results on your machine/server might be different but the overview shows that the 2-step method is around 45-77 times faster than using 3 foreach loops
Test script: http://pastebin.com/M40hf0x7
I don't have access to PHP right now, to test, but I believe what you're trying to do is something like
foreach($emails as $email) {
foreach($threads as $root => $messages) {
foreach($messages as $index =>$message_id){
if($message_id == $email->msgno){
$threads[$root][$index] = $email;
}
}
}
}
That being said, even if this works, there is probably a more efficient way to approach this than with three nested loops. What is your reason for needing to store the output in this format?
An implementation with branches (more complex then a single thread array('5' => array(5,7,8))
, but unless I was only talking to 1 person, threads always tend to branch for me personally, so I'll have to cope with the added complexity)
<?php
$threads = imap_thread($imap, SE_UID);
/*
* threads returns entries as follows:
* <id>.num = <messageid>
* <id>.next = <messageid of first reply to <id>>, 0 = no replies
* <id>.branch = <messageid of nth. reply to <parent of id>>, 0 = no more branches
* Keep in mind: _every_ message 'starts' a branch, but that may be empty.
*/
$nodes = array( 0 => array( 'children' => array()));
$ids = array();
foreach ($threads as $key => $val) {
list($treeid,$type) = explode('.',$key);
switch($type){
case 'num':
//the actual message number of this tree node
//store id for retrieval later:
$ids[$val] = null;
if($val==0){
//return to root
$nodes[$treeid] = &$nodes[0];
} else {
if(!isset($nodes[$treeid])) $nodes[$treeid] = array();
$nodes[$treeid] = array_merge($nodes[$treeid],array(
'id' => $val,
'message' => &$ids[$val],
'treeid' => $treeid));
}
break;
case 'next':
// 0 means no next message, anything else is a reply
if (0!=$val) {
if(!isset($nodes[$val])) $nodes[$val] = array('parent' => $treeid);
$nodes[$treeid][] = &$nodes[$val];
}
break;
case 'branch':
//0 means end of branch, a number means continue as sibling \
//so we need to know the parent
if (0!=$val) {
if(!isset($nodes[$val])) $nodes[$val] = array('parent' => $nodes[$treeid]['parent']?:0);
$nodes[$nodes[$val]['parent']][] = &$nodes[$val];
}
break;
default:
trigger_error("Unknown tree traverse-type: $type", E_USER_WARNING);
}
}
//the great thing is we can get all our ID's at once:
$keystofetch = implode(',',array_filter(array_keys($nodes)));
$messages = imap_fetch_overview($imap,$keystofetch, FT_UID);
foreach($messages as $message){
// you can of course store the _whole_ message in this thread like:
// $nodes[$message->uid]['message'] = get_object_vars($message);
// and do what you like with $tree[0]['children'] (be it a resursive array iterator,
// or a resursive function, your pick.
// However, for this example we are going to only set message to a string of p.o.c
// (which is also nicer for our treeiterator)
$ids[$message->uid] = $message->from.':'.$message->subject;
}
//let's show the result:
$it = new RecursiveTreeIterator(new RecursiveArrayIterator($nodes[0]),
RecursiveTreeIterator::BYPASS_CURRENT,
CachingIterator::TOSTRING_USE_KEY);
foreach($it as $key => $item){
echo "$key".(is_scalar($item)?': '.$item:'').PHP_EOL;
}
Which give us:
|-children
|-0
| |-parent: 0
| |-id: 35
| |-message: Friend Purple Acc2 <purple2@example.com>:A bigger message thread
| |-treeid: 1
| \-0
| |-parent: 1
| |-id: 7
| |-message: Friend White <white@example.com>:Re: A bigger message thread
| |-treeid: 2
| \-0
| |-parent: 2
| |-id: 11
| |-message: Friend Grey <grey@example.com>Re: A bigger message thread
| |-treeid: 3
| \-0
| |-parent: 3
| |-id: 39
| |-message: Friend Purple Acc2 <purple2@example.com>:Re: A bigger message thread
| |-treeid: 4
| \-0
| |-parent: 4
| |-id: 40
| |-message: Friend Pink <pink@example.com>:Re: A bigger message thread
| |-treeid: 5
| \-0
| |-parent: 5
| |-id: 38
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 6
| \-0
| |-parent: 6
| |-id: 12
| |-message: Friend Pink <pink@example.com>:Re: A bigger message thread
| |-treeid: 7
| \-0
| |-parent: 7
| |-id: 25
| |-message: Friend White <white@example.com>:Re: A bigger message thread
| |-treeid: 8
| \-0
| |-parent: 8
| |-id: 19
| |-message: Friend Black <black@example.com>:Re: A bigger message thread
| |-treeid: 9
| \-0
| |-parent: 9
| |-id: 23
| |-message: Friend Black <black@example.com>:Re: A bigger message thread
| |-treeid: 10
| \-0
| |-parent: 10
| |-id: 30
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 11
| \-0
| |-parent: 11
| |-id: 2
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 12
| |-0
| | |-parent: 12
| | |-id: 20
| | |-message: Me <me@example.com>:Re: A bigger message thread
| | |-treeid: 13
| | \-0
| | |-parent: 13
| | |-id: 1
| | |-message: Fiend Silver <silver@example.com>:Re: A bigger message thread
| | |-treeid: 14
| | \-0
| | |-parent: 14
| | |-id: 41
| | |-message: Fiend Silver <silver@example.com>:Re: A bigger message thread
| | |-treeid: 15
| | \-0
| | |-parent: 15
| | |-id: 27
| | |-message: Friend Grey <grey@example.com>Re: A bigger message thread
| | |-treeid: 16
| | \-0
| | |-parent: 16
| | |-id: 17
| | |-message: Friend Magenta <magenta@example.com>:Re: A bigger message thread
| | |-treeid: 17
| | |-0
| | | |-parent: 17
| | | |-id: 31
| | | |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| | | |-treeid: 18
| | | \-0
| | | |-parent: 18
| | | |-id: 4
| | | |-message: Friend Black <black@example.com>:Re: A bigger message thread
| | | |-treeid: 19
| | | \-0
| | | |-parent: 19
| | | |-id: 37
| | | |-message: Friend Black <black@example.com>:Re: A bigger message thread
| | | |-treeid: 20
| | | \-0
| | | |-parent: 20
| | | |-id: 24
| | | |-message: Friend Purple Acc2 <purple2@example.com>:Re: A bigger message thread
| | | |-treeid: 21
| | | \-0
| | | |-parent: 21
| | | |-id: 13
| | | |-message: Friend White <white@example.com>:Re: A bigger message thread
| | | \-treeid: 22
| | \-1
| | |-parent: 17
| | |-id: 15
| | |-message: Friend Grey <grey@example.com>Re: A bigger message thread
| | |-treeid: 23
| | \-0
| | |-parent: 23
| | |-id: 18
| | |-message: Friend Magenta <magenta@example.com>:Re: A bigger message thread
| | |-treeid: 24
| | \-0
| | |-parent: 24
| | |-id: 45
| | |-message: Friend Black <black@example.com>:Re: A bigger message thread
| | \-treeid: 25
| \-1
| |-parent: 12
| |-id: 46
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 26
| \-0
| |-parent: 26
| |-id: 29
| |-message: Fiend Silver <silver@example.com>:Re: A bigger message thread
| |-treeid: 27
| \-0
| |-parent: 27
| |-id: 26
| |-message: Friend Magenta <magenta@example.com>:Re: A bigger message thread
| |-treeid: 28
| |-0
| | |-parent: 28
| | |-id: 34
| | |-message: Friend Grey <grey@example.com>Re: A bigger message thread
| | \-treeid: 29
| |-1
| | |-parent: 28
| | |-id: 33
| | |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| | |-treeid: 30
| | \-0
| | |-parent: 30
| | |-id: 36
| | |-message: Friend White <white@example.com>:Re: A bigger message thread
| | |-treeid: 31
| | |-0
| | | |-parent: 31
| | | |-id: 10
| | | |-message: Friend White <white@example.com>:Re: A bigger message thread
| | | \-treeid: 32
| | \-1
| | |-parent: 31
| | |-id: 48
| | |-message: Friend Pink <pink@example.com>:Re: A bigger message thread
| | \-treeid: 33
| \-2
| |-parent: 28
| |-id: 47
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| |-treeid: 34
| \-0
| |-parent: 34
| |-id: 5
| |-message: Friend White <white@example.com>:Re: A bigger message thread
| |-treeid: 35
| \-0
| |-parent: 35
| |-id: 3
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| |-treeid: 36
| \-0
| |-parent: 36
| |-id: 21
| |-message: Friend Yellow <yellow@example.com>:Re: A bigger message thread
| |-treeid: 37
| \-0
| |-parent: 37
| |-id: 8
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| |-treeid: 38
| \-0
| |-parent: 38
| |-id: 43
| |-message: Friend White <white@example.com>:Re: A bigger message thread
| |-treeid: 39
| \-0
| |-parent: 39
| |-id: 28
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| |-treeid: 40
| \-0
| |-parent: 40
| |-id: 42
| |-message: Friend Brown <brown@example.com>:Re: A bigger message thread
| |-treeid: 41
| \-0
| |-parent: 41
| |-id: 22
| |-message: Friend Purple <purple@example.com>:Re: A bigger message thread
| \-treeid: 42
|-1
| |-parent: 0
| |-id: 9
| |-message: Friend Blue <blue@example.com>:RE: A bigger message thread
| \-treeid: 43
|-2
| \-parent: 0
|-3
| |-parent: 44
| |-id: 49
| |-message: Some Subcription <foo@example.com>:Newsletter #1
| \-treeid: 45
|-4
| |-parent: 44
| |-id: 50
| |-message: Some Subcription <foo@example.com>:Newsletter #2
| \-treeid: 46
\-5
|-parent: 0
|-id: 32
|-message: Friend Red <red@example.com>:A second mainthread
|-treeid: 47
\-0
|-parent: 47
|-id: 16
|-message: Friend Black <black@example.com>:Re: A second mainthread
|-treeid: 48
\-0
|-parent: 48
|-id: 14
|-message: Friend Red <red@example.com>:Re: A second mainthread
|-treeid: 49
\-0
|-parent: 49
|-id: 6
|-message: Friend White <white@example.com>:Re: A second mainthread
|-treeid: 50
\-0
|-parent: 50
|-id: 44
|-message: Fiend Silver <silver@example.com>:Re: A second mainthread
\-treeid: 51
A few things of note:
imap_thread
isn't perfect: we see id=9
as an orphan, although it seems it should be in the first thread somewhere. However, due to
the headers not mentioning this, Google Apps here decided to make it
it's own node.N.num.N.branch,N.next
method has apparently no other way to return
to root. This is the /return to root $nodes[$treeid] = &$nodes[0];
bit. You can/should filter this out after determining all other
nodes, but you need it to build the array at first.To get only the nodes starting new threads (Nth reply on message, N>1):
$threads = imap_thread($imap, SE_UID);
$branchestarts = array();
foreach($threads as $key => $value){
list($num,$type) = explode('.',$key);
if (
$type=='num' // an id
&& $value == 0 // which is actually root
&& isset($threads[$num.'.next']) // then check for next
&& isset($threads[$threads[$num.'.next'].'.num'])
){
$branchestarts[] = $threads[$threads[$num.'.next'].'.num'];
} else if(
$type=='branch' // branch movement
&& $value != 0 // not back
&& isset($threads[$value.'.num']) // sanity: target exists
&& $threads[$value.'.num'] != 0 // and is not a return to root
){
$branchestarts[] = $threads[$value.'.num'];
}
}
echo json_encode($branchestarts);
Which gives us:
[35,15,46,33,48,47,9,49,50,32]
And indeed, 35,49,50 & 32 are starts of threads, 9 is recognized as such by the imap server too, and the rest are 2nd or more replies starting their own branches.
Now, you could indeed split out branches as seperate conversation, but as you can see, these are often only 1 or 2 replies more, longer threads tend to develop a bit more rarely. To see how these 'branches' go:
$branches = array();
$currenttree = null;
foreach($threads as $key => $value){
list($num,$type) = explode('.',$key);
switch($type){
case 'num':
//nothing
break;
case 'next':
if(is_null($currenttree)) $currenttree = &$branches[$threads[$value.'.num']];
if($value && isset($threads[$value.'.num'])) $currenttree[] = $threads[$value.'.num'];
break;
case 'branch':
unset($currenttree);
if($value && $threads[$value.'.num']){
$branches[$threads[$value.'.num']] = array($threads[$value.'.num']);
$currenttree =& $branches[$threads[$value.'.num']];
}
}
}
echo json_encode($branches, JSON_PRETTY_PRINT);
Which gives you roots & branches & their replies:
{
"35": [
35,
7,
11,
39,
40,
38,
12,
25,
19,
23,
30,
2,
20,
1,
41,
27,
17,
31,
4,
37,
24,
13
],
"15": [
15,
18,
45
],
"46": [
46,
29,
26,
34
],
"33": [
33,
36,
10
],
"48": [
48
],
"47": [
47,
5,
3,
21,
8,
43,
28,
42,
22
],
"9": [
9
],
"49": [
49
],
"50": [
50
],
"32": [
32,
16,
14,
6,
44
]
}
With some slight alterations we can get the messages in there:
$branches = array();
$currenttree = null;
$messages = array();
foreach($threads as $key => $value){
list($num,$type) = explode('.',$key);
switch($type){
case 'num':
//nothing
break;
case 'next':
if(is_null($currenttree)) $currenttree = &$branches[$threads[$value.'.num']];
if($value && isset($threads[$value.'.num'])) $currenttree[] = &$messages[$threads[$value.'.num']];
break;
case 'branch':
unset($currenttree);
if($value && $threads[$value.'.num']){
$branches[$threads[$value.'.num']] = array(&$messages[$threads[$value.'.num']]);
$currenttree =& $branches[$threads[$value.'.num']];
} else {
$currenttree = null;
}
}
}
$keystofetch = implode(',',array_filter(array_keys($messages)));
foreach(imap_fetch_overview($imap,$keystofetch,FT_UID) as $message){
$messages[$message->uid] = $message;
}
echo json_encode($branches);//won't show it's output, this answer is to large as it is ;)
Another option is to just sort them by datetime value, which would be alright for conversations with little/negligible branching, probably making most of the code you are planning just work.
A combination of the two would be 'moving branches', follow threads in series, so this:
1 2013-06-01
2 2013-06-02
3 2013-06-03
4 2013-06-03
5 2013-06-04
Becomes a sequence of 1,2,3,4,5
but a reply on 3
would resort it:
1 2013-06-01
4 2013-06-03
5 2013-06-04
2 2013-06-02
3 2013-06-03
6 2013-06-05
Making it a sequence of 1,4,5,2,3,6
, which would keep it a logically flowing conversation, with always the thread/branch with the last reply as last.