I have a simple array with some names in it and I want to group them by their first letter. E.g. all names with A to C as their first letter go in one array, and those with D to F go in another one.
You can do this:
function buckets($array, callable $bucketFunc) {
    $buckets = [];
    foreach ($array as $val) {
        $bucket = $bucketFunc($val);
        if (!isset($buckets[$bucket])) {
            $buckets[$bucket] = [];
        }
        $buckets[$bucket][] = $val;
    }
    return $buckets;
}
function myBucketFunc($value) {
    // Gets the index of the first character and returns which triple of characters it belongs to
    return floor((ord(ucfirst($value)) - ord("A")) / 3);
}
$array = [ "Abc", "Cba", "Foo", "Hi", "Bar" ];
$buckets = buckets($array, 'myBucketFunc'); // any callable would work here
print_r($buckets) would then output:
Array
(
    [0] => Array
        (
            [0] => Abc
            [1] => Cba
            [2] => Bar
        )
    [1] => Array
        (
            [0] => Foo
        )
    [2] => Array
        (
            [0] => Hi
        )
)
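Further clarification: ord returns the ASCII value of a character, so ord("X") - ord("A") gives the letter index of X (A = 0, B = 1, and so on), provided X is an uppercase letter, which is why the value is passed through ucfirst first. Dividing that letter index by 3 and rounding down with floor gives the bucket number of X if we split the alphabet into buckets of 3 letters each.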
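The arithmetic can be checked on its own. This is a small sketch rather than part of the answer: bucketNumber() is a hypothetical helper name, it uses intdiv() where the answer uses floor() (the two agree for non-negative letter indices), and it assumes every word starts with an ASCII letter A to Z in either case.

```php
<?php
// Hypothetical helper: bucket number for the first letter of $word.
// Assumes $word starts with an ASCII letter (A-Z or a-z).
function bucketNumber(string $word, int $size = 3): int
{
    // Letter index: A = 0, B = 1, ..., Z = 25 (uppercased first so 'a' and 'A' match).
    $letterIndex = ord(strtoupper($word[0])) - ord('A');
    // Integer division rounds down for non-negative values, like floor().
    return intdiv($letterIndex, $size);
}

foreach (['Abc', 'Cba', 'Foo', 'Hi', 'Bar'] as $word) {
    echo $word, ' => ', bucketNumber($word), PHP_EOL;
}
```

This prints 0 for Abc, Cba and Bar, 1 for Foo and 2 for Hi, which matches the grouping in the print_r output above.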
This is a good use of array_reduce in a non-scalar fashion, i.e. with an array as the accumulator:
function keyize(string $word, $stride = 3): string {
    $first = strtoupper($word[0]);
    $index = (int)floor((ord($first) - ord('A'))/$stride);
    return implode('', array_chunk(range('A', 'Z'), $stride)[$index]);
}
function bucketize(array $words, $stride = 3): array {
    return array_reduce(
        $words,
        function ($index, $word) use ($stride) {
            $index[keyize($word, $stride)][] = $word;
            return $index;
        },
        []
    );
}
$words = [ 'alpha', 'Apple', 'Bravo', 'banana', 'charlie', 'Cucumber', 'echo', 'Egg', ];
shuffle($words);
$buckets = bucketize($words, 3); // change the number of letters per group, e.g. 1, 13, 26
ksort($buckets);
var_dump($buckets);
So we're using array_reduce to walk the words and, at the same time, build the buckets. It's not the most efficient approach as implemented, because the bucket array is copied through each closure invocation. However, it's compact.
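If that copying matters, the same buckets can be built with a plain foreach loop that appends to one array in place. This is a sketch, not the answer's code: bucketizeLoop() is a hypothetical name, keyize() is repeated with $word[0] for the first character, and, like the original, it assumes every word starts with an ASCII letter.

```php
<?php
// keyize() from the answer, using $word[0] for the first character.
// Assumes $word starts with an ASCII letter.
function keyize(string $word, int $stride = 3): string
{
    $first = strtoupper($word[0]);
    $index = (int)floor((ord($first) - ord('A')) / $stride);
    return implode('', array_chunk(range('A', 'Z'), $stride)[$index]);
}

// Hypothetical loop-based counterpart of bucketize(): each word is appended
// directly to its bucket, with no accumulator passed through a closure.
function bucketizeLoop(array $words, int $stride = 3): array
{
    $buckets = [];
    foreach ($words as $word) {
        $buckets[keyize($word, $stride)][] = $word;
    }
    ksort($buckets); // same ordering step the answer applies after bucketize()
    return $buckets;
}

$words = ['alpha', 'Apple', 'Bravo', 'banana', 'charlie', 'Cucumber', 'echo', 'Egg'];
print_r(bucketizeLoop($words));     // keys ABC and DEF
print_r(bucketizeLoop($words, 13)); // a single key, ABCDEFGHIJKLM
```

The keys are the same letter-range strings keyize() produces, so the result can be used interchangeably with the array_reduce version.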
I now have four methods to offer. All can be modified to allow for larger or smaller groups by changing $size.
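To see what changing $size does to the group labels, the label-building line used in Code#3 and Code#4 below can be run for a few sizes. chunkLabels() is a hypothetical wrapper around that line; note that the last group is shorter whenever $size does not divide 26.

```php
<?php
// Hypothetical wrapper around the $chunks line from Code#3/Code#4:
// join A..Z into one string, then cut it into pieces of $size letters.
function chunkLabels(int $size): array
{
    return str_split(implode(range('A', 'Z')), $size);
}

foreach ([3, 4, 13] as $size) {
    echo $size, ': ', implode(' ', chunkLabels($size)), PHP_EOL;
}
// 3: ABC DEF GHI JKL MNO PQR STU VWX YZ
// 4: ABCD EFGH IJKL MNOP QRST UVWX YZ
// 13: ABCDEFGHIJKLM NOPQRSTUVWXYZ
```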
Code#1 processes the values as an array by using two foreach() loops and a comparison on the first character of each value. This is the easiest to comprehend.
$fruits=array("date","guava","lemon","Orange","kiwi","Banana","apple");
natcasesort($fruits); // pre-sort them for alphabetized output
$size=3; // <-modify group sizes here
$chunks=array_chunk(range('A','Z'),$size); // 0=>["A","B","C"],1=>["D","E","F"],etc...
foreach($fruits as $fruit){
    foreach($chunks as $letters){
        if(in_array(strtoupper($fruit[0]),$letters)){ // check if the capitalized first letter exists in the $letters array
            $groups[implode($letters)][]=$fruit; // push value into this group
            break; // go to next fruit/value
        }
    }
}
var_export($groups);
Code#2 integrates apokryfos' very clever ord() line with Code#1 to eliminate the non-matching iterations of the inner loop (and the inner loop itself). This improves efficiency but has a negative impact on readability.
$fruits=array("date","guava","lemon","Orange","kiwi","Banana","apple");
natcasesort($fruits); // pre-sort them for alphabetized output
$size=3; // <-modify group sizes here
$chunks=array_chunk(range('A','Z'),$size); // 0=>["A","B","C"],1=>["D","E","F"],etc...
foreach($fruits as $fruit){
    $groups[implode($chunks[floor((ord(strtoupper($fruit[0]))-ord("A"))/$size)])][]=$fruit;
}
var_export($groups);
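Code#2 depends on the ord() arithmetic selecting the same chunk that Code#1's inner loop finds by searching. A quick sketch that compares the two for every letter A to Z at a given $size; $mismatches is just a counter introduced for the check.

```php
<?php
$size = 3; // try other group sizes here as well
$chunks = array_chunk(range('A', 'Z'), $size);
$mismatches = 0;

foreach (range('A', 'Z') as $letter) {
    // Code#1 approach: scan the chunks until one contains the letter.
    $scanned = null;
    foreach ($chunks as $i => $letters) {
        if (in_array($letter, $letters)) {
            $scanned = $i;
            break;
        }
    }
    // Code#2 approach: compute the chunk index directly from the letter.
    $computed = (int)floor((ord($letter) - ord('A')) / $size);
    if ($scanned !== $computed) {
        $mismatches++;
        echo "mismatch for $letter", PHP_EOL;
    }
}
echo 'mismatches: ', $mismatches, PHP_EOL;
```

Since both approaches agree for every uppercase letter, replacing the search with the computed index does not change which group a value lands in.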
Code#3 processes the values as a CSV string by using preg_match_all() and some filtering functions. This assumes that no values contain commas. In my opinion, this code is hard to comprehend at a glance because of all the functions and the very long regex pattern.
$fruits=array("date","guava","lemon","Orange","kiwi","Banana","apple");
natcasesort($fruits); // pre-sort them for alphabetized output // array(6 => 'apple',5 => 'Banana',0 => 'date',1 => 'guava',4 => 'kiwi',2 => 'lemon',3 => 'Orange')
$size=3; // <-modify group sizes here
$chunks=str_split(implode(range('A','Z')),$size); // ['ABC','DEF','GHI','JKL','MNO','PQR','STU','VWX','YZ']
$regex="/((?<=^|,)[".implode('][^,]*)|((?<=^|,)[',$chunks)."][^,]*)/i"; // '/((?<=^|,)[ABC][^,]*)|((?<=^|,)[DEF][^,]*)|((?<=^|,)[GHI][^,]*)|((?<=^|,)[JKL][^,]*)|((?<=^|,)[MNO][^,]*)|((?<=^|,)[PQR][^,]*)|((?<=^|,)[STU][^,]*)|((?<=^|,)[VWX][^,]*)|((?<=^|,)[YZ][^,]*)/i'
if(preg_match_all($regex,implode(",",$fruits),$out)){
    $groups=array_map('array_values', // 0-index subarray elements
        array_filter( // omit empty subarrays
            array_map('array_filter', // omit empty subarray elements
                array_combine($chunks, // use $chunks as keys for $out
                    array_slice($out,1) // remove the full-match subarray from $out
                )
            )
        )
    );
var_export($groups);
}
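The filtering in Code#3 is needed because each alternative in the pattern is its own capture group: for every match, preg_match_all() in its default PREG_PATTERN_ORDER mode stores an empty string for the groups that did not take part. A reduced sketch with two groups and a made-up three-word string shows this, followed by the same filter chain (minus the array_combine() step that attaches the letter-range keys):

```php
<?php
// Two alternatives, each in its own capture group (simplified from Code#3's pattern).
$regex = '/((?<=^|,)[ab][^,]*)|((?<=^|,)[cd][^,]*)/i';
preg_match_all($regex, 'apple,cherry,Banana', $out);
// $out[1] holds group-1 matches, with '' where group 2 matched instead;
// $out[2] is the reverse.
print_r($out);

// Same chain as Code#3: drop empty strings, drop empty groups, reindex.
$groups = array_map('array_values',
    array_filter(
        array_map('array_filter', array_slice($out, 1))
    )
);
print_r($groups); // groups: apple, Banana | cherry
```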
Code#4 processes the values as an array without loops or conditionals by using array_map(), preg_grep(), array_values(), array_combine(), and array_filter() to form a one-liner (discounting the $size and $chunks declarations). ...I don't like to stop until I've produced a one-liner, no matter how ugly. ;)
$fruits=array("date","guava","lemon","Orange","kiwi","Banana","apple");
natcasesort($fruits); // pre-sort them for alphabetized output
$size=3; // <-modify group sizes here
$chunks=str_split(implode(range('A','Z')),$size); // ['ABC','DEF','GHI','JKL','MNO','PQR','STU','VWX','YZ']
$groups=array_filter(array_combine($chunks,array_map(function($v)use($fruits){return array_values(preg_grep("/^[$v].*/i",$fruits));},$chunks)));
var_export($groups);
// $groups=array_filter( // remove keys with empty subarrays
//     array_combine($chunks, // use $chunks as keys and subarrays as values
//         array_map(function($v)use($fruits){ // check every chunk
//             return array_values( // reset subarray's keys
//                 preg_grep("/^[$v].*/i",$fruits) // create subarray of matches
//             );
//         },$chunks)
//     )
// );
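The array_values() call in Code#4 is there because preg_grep() keeps the keys of the input array, and after natcasesort() those keys are still the original, unsorted positions. A small sketch using the same $fruits array:

```php
<?php
$fruits = ['date', 'guava', 'lemon', 'Orange', 'kiwi', 'Banana', 'apple'];
natcasesort($fruits); // keys preserved: 6 => apple, 5 => Banana, 0 => date, ...

$abc = preg_grep('/^[ABC].*/i', $fruits);
print_r($abc);               // keys 6 and 5 are kept
print_r(array_values($abc)); // reindexed to 0 and 1
```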
All four snippets output an identical result:
array (
  'ABC' =>
  array (
    0 => 'apple',
    1 => 'Banana',
  ),
  'DEF' =>
  array (
    0 => 'date',
  ),
  'GHI' =>
  array (
    0 => 'guava',
  ),
  'JKL' =>
  array (
    0 => 'kiwi',
    1 => 'lemon',
  ),
  'MNO' =>
  array (
    0 => 'Orange',
  ),
)