Question
I'm looking for an efficient way in C# to sort an array of email addresses so that items with the same domain are not consecutive.
The email addresses in the array are already distinct, and all of them are lowercase.
Example:
Given an array with the following entries:
john.doe@domain1.com
jane_doe@domain1.com
patricksmith@domain2.com
erick.brown@domain3.com
I would like to obtain something similar to the following:
john.doe@domain1.com
patricksmith@domain2.com
jane_doe@domain1.com
erick.brown@domain3.com
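To check any candidate ordering against this requirement, a small helper can compare each address's domain with its neighbour's. It also helps to know when a valid ordering exists at all: with n addresses, one exists exactly when no single domain accounts for more than ceil(n / 2) of them. The class and method names below (`DomainChecks`, `HasAdjacentSameDomain`, `CanBeSeparated`) are hypothetical, and the sketch assumes every address contains a single '@'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; not part of the question or any answer.
public static class DomainChecks
{
    // Domain part of an address; assumes a single '@', as in the sample data.
    public static string DomainOf(string email) => email.Substring(email.IndexOf('@') + 1);

    // True if any two neighbouring addresses share a domain.
    public static bool HasAdjacentSameDomain(IList<string> emails)
    {
        for (int i = 1; i < emails.Count; i++)
        {
            if (DomainOf(emails[i - 1]) == DomainOf(emails[i]))
            {
                return true;
            }
        }
        return false;
    }

    // An ordering with no adjacent same-domain pair exists exactly when
    // the largest domain group has at most ceil(n / 2) addresses.
    public static bool CanBeSeparated(IList<string> emails)
    {
        if (emails.Count == 0)
        {
            return true;
        }
        int largest = emails.GroupBy(e => DomainOf(e)).Max(g => g.Count());
        return largest <= (emails.Count + 1) / 2;
    }
}
```

For the sample above, the input order fails the check (the two domain1 addresses are adjacent) while the desired order passes. For the 18 addresses in the second answer's test data, 10 are on a.com, which is more than ceil(18 / 2) = 9, so at least one adjacent pair is unavoidable there.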
Answer 1:
With the help of an extension method (stolen from https://stackoverflow.com/a/27533369/172769), you can go like this:
List<string> emails = new List<string>();
emails.Add("john.doe@domain1.com");
emails.Add("jane_doe@domain1.com");
emails.Add("patricksmith@domain2.com");
emails.Add("erick.brown@domain3.com");
var q = emails.GroupBy(m => m.Split('@')[1]).Select(g => new List<string>(g)).Interleave();
The Interleave method is defined as:
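using System.Collections.Generic;
using System.Linq;
// Extension methods must be declared in a non-generic static class (the name is arbitrary).
public static class EnumerableExtensions
{
public static IEnumerable<T> Interleave<T>(this IEnumerable<IEnumerable<T>> source)
{
// One FIFO queue per inner sequence (here, one per domain).
var queues = source.Select(x => new Queue<T>(x)).ToList();
// Each pass yields one item from every queue that still has items.
while (queues.Any(x => x.Count > 0)) {
foreach (var queue in queues.Where(x => x.Count > 0)) {
yield return queue.Dequeue();
}
}
}
}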
So basically, we group the email addresses by their domain part, project (Select) each group into a List<string>, and then "Interleave" those lists.
I have tested against your sample data, but more thorough testing might be needed to find edge cases.
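Putting the grouping call and the Interleave method together gives a compilable sketch like the following (the class names `InterleaveExtensions` and `InterleaveDemo` are my own). Traced on the question's four addresses, the first pass takes one address from each of the three domain queues, and the second pass takes the remaining domain1 address.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InterleaveExtensions
{
    // Round-robin over the inner sequences, as in Answer 1's Interleave.
    public static IEnumerable<T> Interleave<T>(this IEnumerable<IEnumerable<T>> source)
    {
        var queues = source.Select(x => new Queue<T>(x)).ToList();
        while (queues.Any(q => q.Count > 0))
        {
            foreach (var queue in queues.Where(q => q.Count > 0))
            {
                yield return queue.Dequeue();
            }
        }
    }
}

public static class InterleaveDemo
{
    public static List<string> Arrange(IEnumerable<string> emails) =>
        emails
            .GroupBy(e => e.Split('@')[1]) // groups come out in first-seen key order
            .Select(g => g.ToList())       // IEnumerable<List<string>> binds via covariance
            .Interleave()
            .ToList();
}
```

The result (john.doe, patricksmith, erick.brown, jane_doe) is not identical to the order in the question, but it also has no two domain1 addresses next to each other.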
Cheers
Answer 2:
This distributes them semi-evenly and tries to avoid placing matching domains next to each other (although for certain lists that may be impossible). This answer uses OOP and LINQ.
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
var seed = new List<string>()
{
"1@a.com",
"2@a.com",
"3@a.com",
"4@a.com",
"5@a.com",
"6@a.com",
"7@a.com",
"8@a.com",
"9@a.com",
"10@a.com",
"1@b.com",
"2@b.com",
"3@b.com",
"1@c.com",
"4@b.com",
"2@c.com",
"3@c.com",
"4@c.com"
};
var work = seed
// Create a list of EmailAddress objects
.Select(s => new EmailAddress(s)) // s.ToLowerInvariant() ?
// Group the list by Domain
.GroupBy(s => s.Domain)
// Create a List<EmailAddressGroup>
.Select(g => new EmailAddressGroup(g))
.ToList();
var currentDomain = string.Empty;
while(work.Count > 0)
{
// this list should not be the same domain we just used
var noDups = work.Where(w => w.Domain != currentDomain);
// if none exist we are done, or it can't be solved
if (!noDups.Any())
{
break;
}
// find the first group with the most items (compute the maximum once per step)
var maxCount = noDups.Max(g => g.Count());
var workGroup = noDups.First(w => w.Count() == maxCount);
// get the email address and remove it from the group list
var workItem = workGroup.Remove();
// if the group is empty remove it from *work*
if (workGroup.Count() == 0)
{
work.Remove(workGroup);
Console.WriteLine("removed: " + workGroup.Domain);
}
Console.WriteLine(workItem.FullEmail);
// last domain looked at.
currentDomain = workItem.Domain;
}
// only report leftovers if some addresses could not be placed
if (work.Count > 0)
{
Console.WriteLine("Cannot disperse email addresses affectively, left overs:");
}
foreach(var workGroup in work)
{
while(workGroup.Count() > 0)
{
var item = workGroup.Remove();
Console.WriteLine(item.FullEmail);
}
}
}
public class EmailAddress
{
public EmailAddress(string emailAddress)
{
// Additional Email Address Validation
var result = emailAddress.Split(new char[] {'@'}, StringSplitOptions.RemoveEmptyEntries)
.ToList();
if (result.Count != 2)
{
throw new ArgumentException("Not a valid email address: " + emailAddress, nameof(emailAddress));
}
this.FullEmail = emailAddress;
this.Name = result[0];
this.Domain = result[1];
}
public string Name { get; private set; }
public string Domain { get; private set; }
public string FullEmail { get; private set; }
}
public class EmailAddressGroup
{
private List<EmailAddress> _emails;
public EmailAddressGroup(IEnumerable<EmailAddress> emails)
{
this._emails = emails.ToList();
this.Domain = emails.First().Domain;
}
public int Count()
{
return _emails.Count();
}
public string Domain { get; private set; }
public EmailAddress Remove()
{
var result = _emails.First();
_emails.Remove(result);
return result;
}
}
}
Output:
1@a.com
1@b.com
2@a.com
1@c.com
3@a.com
2@b.com
4@a.com
2@c.com
5@a.com
3@b.com
6@a.com
3@c.com
7@a.com
removed: b.com
4@b.com
8@a.com
removed: c.com
4@c.com
9@a.com
Cannot disperse email addresses affectively, left overs:
10@a.com
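The rule used above (take from the largest remaining group whose domain differs from the last one used) can be made cheaper by keeping the groups in a heap instead of rescanning them with Max on every step. The sketch below swaps in .NET 6's `PriorityQueue<TElement, TPriority>`; the class name `GreedyDisperse` is hypothetical. Each step dequeues the largest group, takes one address from it, and only then puts the previously used group back, so the same domain is never picked twice in a row while another domain is available. Each step then costs O(log d) for d domains.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class name; requires .NET 6+ for PriorityQueue<TElement, TPriority>.
public static class GreedyDisperse
{
    public static List<string> Arrange(IEnumerable<string> emails)
    {
        // PriorityQueue is a min-heap, so the negated group size is the priority.
        var heap = new PriorityQueue<Queue<string>, int>();
        foreach (var group in emails.GroupBy(e => e.Substring(e.IndexOf('@') + 1)))
        {
            var queue = new Queue<string>(group);
            heap.Enqueue(queue, -queue.Count);
        }

        var result = new List<string>();
        Queue<string>? held = null; // group used in the previous step, kept out of the heap
        while (heap.TryDequeue(out var current, out _))
        {
            // current is the largest group whose domain differs from the previous item's
            result.Add(current.Dequeue());
            if (held != null && held.Count > 0)
            {
                heap.Enqueue(held, -held.Count);
            }
            held = current;
        }

        // Anything still held shares the last domain and could not be separated.
        if (held != null)
        {
            result.AddRange(held);
        }
        return result;
    }
}
```

When one domain has too many addresses, as in the 10 a.com out of 18 data above, the addresses that cannot be placed end up appended at the end, much like the "left overs" in the output above.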
Answer 3:
Something like this will spread them evenly, but you can still get the problem (consecutive elements with the same domain) at the end of the new list, once only one domain still has addresses left...
var list = new List<string>();
list.Add("john.doe@domain1.com");
list.Add("jane_doe@domain1.com");
list.Add("patricksmith@domain2.com");
list.Add("erick.brown@domain3.com");
// ToList() so the grouping runs once instead of on every pass of the while loop
var x = list.GroupBy(content => content.Split('@')[1]).ToList();
var newlist = new List<string>();
bool addedSomething=true;
int i = 0;
while (addedSomething) {
addedSomething = false;
foreach (var grp in x) {
if (grp.Count() > i) {
newlist.Add(grp.ElementAt(i));
addedSomething = true;
}
}
i++;
}
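To see where those consecutive elements come from, here is the same index-based loop wrapped in a hypothetical `IndexRoundRobin.Arrange` method, with the grouping materialized once via `ToList()`. With three x.com addresses and one each on y.com and z.com, the first pass takes x, y, z, and the remaining two passes only find x.com items, so the list ends with two x.com addresses even though x, y, x, z, x would have been valid.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexRoundRobin
{
    // Answer 3's loop: on pass i, take the i-th element of every group that has one.
    public static List<string> Arrange(List<string> list)
    {
        var groups = list.GroupBy(content => content.Split('@')[1]).ToList();
        var newlist = new List<string>();
        bool addedSomething = true;
        int i = 0;
        while (addedSomething)
        {
            addedSomething = false;
            foreach (var grp in groups)
            {
                if (grp.Count() > i)
                {
                    newlist.Add(grp.ElementAt(i));
                    addedSomething = true;
                }
            }
            i++;
        }
        return newlist;
    }
}
```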
Answer 4:
Edit: Added a high level description :)
What this code does is group the elements by domain, project the elements of each group into a stack, and then pop them off the stacks, always taking the next element from the largest stack whose domain differs from that of the previous email (so the largest groups are drained first). If the only stack left has the same domain as the previous email, its contents are yielded anyway.
This should ensure that all domains are distributed as evenly as possible.
MaxBy extension method from: https://stackoverflow.com/a/31560586/969962
private IEnumerable<string> GetNonConsecutiveEmails(List<string> list)
{
var emailAddresses = list.Distinct().Select(email => new EmailAddress { Email = email, Domain = email.Split('@')[1]}).ToArray();
var groups = emailAddresses
.GroupBy(addr => addr.Domain)
.Select (group => new { Domain = group.Key, EmailAddresses = new Stack<EmailAddress>(group)})
.ToList();
EmailAddress lastEmail = null;
while(groups.Any(g => g.EmailAddresses.Any()))
{
// Try and pick from the largest stack.
var stack = groups
.Where(g => g.EmailAddresses.Any() && (lastEmail == null || lastEmail.Domain != g.Domain))
.MaxBy(g => g.EmailAddresses.Count);
// Null check to account for only 1 stack being left.
// If so, pop the elements off the remaining stack.
lastEmail = (stack ?? groups.First(g => g.EmailAddresses.Any())).EmailAddresses.Pop();
yield return lastEmail.Email;
}
}
class EmailAddress
{
public string Domain;
public string Email;
}
public static class Extensions
{
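public static T MaxBy<T,U>(this IEnumerable<T> data, Func<T,U> f) where U:IComparable
{
// DefaultIfEmpty() makes an empty sequence return default(T) (null here) instead of
// throwing, so the "stack ?? ..." fallback in the caller can actually run.
return data.DefaultIfEmpty().Aggregate((i1, i2) => f(i1).CompareTo(f(i2)) > 0 ? i1 : i2);
}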
}
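The `stack ?? groups.First(...)` fallback in GetNonConsecutiveEmails only runs if `MaxBy` returns null when no stack qualifies. A seedless `Aggregate` call, which is what the linked `MaxBy` uses, throws `InvalidOperationException` on an empty sequence instead. Calling `DefaultIfEmpty()` first is one way to get `default(T)` back. The sketch below contrasts the two; `MaxByDemo` and both method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaxByDemo
{
    // Seedless Aggregate: throws InvalidOperationException for an empty sequence.
    public static T MaxByThrowing<T, U>(IEnumerable<T> data, Func<T, U> f) where U : IComparable
        => data.Aggregate((i1, i2) => f(i1).CompareTo(f(i2)) > 0 ? i1 : i2);

    // DefaultIfEmpty() turns an empty sequence into a single default(T),
    // which Aggregate returns without calling the selector.
    public static T MaxByOrDefault<T, U>(IEnumerable<T> data, Func<T, U> f) where U : IComparable
        => data.DefaultIfEmpty().Aggregate((i1, i2) => f(i1).CompareTo(f(i2)) > 0 ? i1 : i2);
}
```

For value types `default(T)` is not null (for example 0 for int), so this null-check pattern only fits reference types such as the anonymous group objects used above.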
Answer 5:
What I'm doing here is sorting them by domain first, then rearranging them by taking elements alternately from the two ends of the sorted array. I'm sure there are more efficient ways to do this, but this is an easy one.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication4
{
class Program
{
static void Main(string[] args)
{
String[] emails = { "john.doe@domain1.com", "jane_doe@domain1.com", "patricksmith@domain2.com", "erick.brown@domain3.com" };
var result = process(emails);
Console.WriteLine(string.Join(Environment.NewLine, result));
}
static String[] process(String[] emails)
{
String[] result = new String[emails.Length];
var comparer = new DomainComparer();
Array.Sort(emails, comparer);
// i <= j (not i < j) so the middle element of an odd-length array is also copied
for (int i = 0, j = emails.Length - 1, k = 0; i <= j; i++, j--, k += 2)
{
if (i == j)
result[k] = emails[i];
else
{
result[k] = emails[i];
result[k + 1] = emails[j];
}
}
return result;
}
}
public class DomainComparer : IComparer<string>
{
public int Compare(string left, string right)
{
int at_pos = left.IndexOf('@');
var left_domain = left.Substring(at_pos, left.Length - at_pos);
at_pos = right.IndexOf('@');
var right_domain = right.Substring(at_pos, right.Length - at_pos);
return String.Compare(left_domain, right_domain);
}
}
}
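The two-ended idea can be checked in isolation with the hypothetical `EndsAlternate.Arrange` below. It copies the input before sorting (`Array.Sort` sorts in place), compares domains ordinally, and runs the loop while `i <= j`; with `i < j`, the middle element of an odd-length array is never copied and the last slot of the result stays null. Note that this only spreads domains out and does not guarantee separation: one x.com, two y.com and one z.com address come out as x, z, y, y, although y, x, y, z would be valid.

```csharp
using System;
using System.Collections.Generic;

public static class EndsAlternate
{
    // Domain including the '@', as in Answer 5's DomainComparer.
    static string Domain(string e) => e.Substring(e.IndexOf('@'));

    // Sort by domain, then take items alternately from the front and the back.
    public static string[] Arrange(string[] emails)
    {
        var sorted = (string[])emails.Clone(); // leave the caller's array untouched
        Array.Sort(sorted, (a, b) => string.CompareOrdinal(Domain(a), Domain(b)));
        var result = new string[sorted.Length];
        for (int i = 0, j = sorted.Length - 1, k = 0; i <= j; i++, j--, k += 2)
        {
            result[k] = sorted[i];
            if (i != j)
            {
                result[k + 1] = sorted[j]; // skipped for the middle element
            }
        }
        return result;
    }
}
```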
Source: https://stackoverflow.com/questions/33849021/interleave-an-array-of-email-addresses-avoiding-items-with-same-domain-to-be-con