I have a list of files and directories, List<string> paths. Now I'd like to calculate the deepest common branch that all paths share.
We can assume that they all share a common path, but it is not known in advance.
Let's say I have the following three entries:
- C:/Hello/World/This/Is/An/Example/Bla.cs
- C:/Hello/World/This/Is/Not/An/Example/
- C:/Hello/Earth/Bla/Bla/Bla
This should yield C:/Hello/, as Earth breaks the "chain" of subdirectories.
Second example:
- C:/Hello/World/This/Is/An/Example/Bla.cs
- C:/Hello/World/This/Is/Not/An/Example/
-> C:/Hello/World/This/Is/
How would you proceed? I tried to use string.Split('/'), start with the first string, and check whether each part of the resulting array is contained in the other strings. However, this would be very expensive, as I would be comparing every entry against every other entry (roughly list_of_entries^2 comparisons). Is there a better solution available?
My current attempt would be something like the following (C# + LINQ):
public string CalculateCommonPath(IEnumerable<string> paths)
{
    // Use the path with the fewest segments as the reference;
    // the common path cannot be deeper than that.
    int minSlash = int.MaxValue;
    string minPath = null;
    foreach (var path in paths)
    {
        int splits = path.Split('/').Length;
        if (minSlash > splits)
        {
            minSlash = splits;
            minPath = path;
        }
    }
    if (minPath != null)
    {
        string[] splits = minPath.Split('/');
        for (int i = 0; i < minSlash; i++)
        {
            // Compare segment i of every path and stop at the first mismatch.
            if (paths.Any(x => x.Split('/')[i] != splits[i]))
            {
                return string.Join("/", splits.Take(i));
            }
        }
    }
    return minPath;
}
A function to get the longest common prefix may look like this:
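public static string GetLongestCommonPrefix(string[] s)
{
    if (s == null || s.Length == 0)
        return string.Empty;

    // k is the length of the prefix shared by all strings seen so far.
    int k = s[0].Length;
    for (int i = 1; i < s.Length; i++)
    {
        k = Math.Min(k, s[i].Length);
        for (int j = 0; j < k; j++)
            if (s[i][j] != s[0][j])
            {
                // First mismatch: the shared prefix ends here.
                k = j;
                break;
            }
    }
    return s[0].Substring(0, k);
}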
Then you may need to trim the prefix on the right-hand side. E.g. for
c:/dir/file1
c:/dir/file2
we want to return c:/dir instead of c:/dir/file.
You also may want to normalize the paths before processing. See Normalize directory names in C#.
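A minimal sketch of the two follow-up steps mentioned above, assuming forward slashes as the canonical separator. The class and method names (PathPrefixHelpers, Normalize, TrimToLastSeparator) are hypothetical and not part of any library:

```csharp
using System;

public static class PathPrefixHelpers
{
    // Assumption: '/' is the canonical separator. Backslashes are converted
    // and trailing separators are dropped, so "C:\dir\" and "C:/dir" compare equal.
    public static string Normalize(string path)
    {
        return path.Replace('\\', '/').TrimEnd('/');
    }

    // Cuts a character-wise prefix back to the last separator,
    // e.g. "c:/dir/file" becomes "c:/dir". Returns "" if there is no separator.
    public static string TrimToLastSeparator(string prefix)
    {
        int last = prefix.LastIndexOf('/');
        return last < 0 ? string.Empty : prefix.Substring(0, last);
    }
}
```

Note that this always drops the last segment, so a prefix that already is a complete directory name (for example c:/dir for the inputs c:/dir and c:/dir/sub) would be cut to c:. Comparing whole segments instead of characters avoids that.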
I don't know whether this is the best-performing solution (probably not), but it is certainly very easy to implement.
- Sort your list alphabetically
- Compare the first entry in that sorted list to the last, character by character, and stop when you find a difference (everything before that point is the longest common prefix of those two strings, and therefore of the whole list)
Sample code:
List<string> paths = new List<string>();
paths.Add(@"C:/Hello/World/This/Is/An/Example/Bla.cs");
paths.Add(@"C:/Hello/World/This/Is/Not/An/Example/");
paths.Add(@"C:/Hello/Earth/Bla/Bla/Bla");
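// Ordinal sort, so that the first and last entries really are the two extremes character by character.
List<string> sortedPaths = paths.OrderBy(s => s, StringComparer.Ordinal).ToList();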
Console.WriteLine("Most common path here: {0}", sharedSubstring(sortedPaths[0], sortedPaths[sortedPaths.Count - 1]));
And that function of course:
public static string sharedSubstring(string string1, string string2)
{
    // Walk both strings while the characters match,
    // without running past the end of the shorter one.
    int max = Math.Min(string1.Length, string2.Length);
    int index = 0;
    while (index < max && string1[index] == string2[index])
    {
        index++;
    }
    return string1.Substring(0, index);
} // returns an empty string if no common characters were found
First sort the list of paths to inspect. Then you can split the first and the last item and compare them segment by segment: if they are the same, proceed to the next level until you find a difference.
So you just need to sort once and then inspect two items.
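The idea above could be sketched roughly as follows. This is only one possible reading, under the assumptions that '/' is the separator in every path and that an ordinal sort is used. The class name SortedCommonPath is hypothetical. Instead of splitting, it compares the two extremes character by character and then cuts back to the last separator, which amounts to a segment-wise comparison once every path ends in '/':

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedCommonPath
{
    // Hypothetical helper; assumes '/' is the separator in every path.
    public static string Find(IEnumerable<string> paths)
    {
        // Give every path exactly one trailing separator so that each
        // segment is terminated by '/', then sort ordinally.
        List<string> sorted = paths
            .Select(p => p.TrimEnd('/') + "/")
            .OrderBy(p => p, StringComparer.Ordinal)
            .ToList();
        if (sorted.Count == 0)
            return string.Empty;

        string first = sorted[0];
        string last = sorted[sorted.Count - 1];

        // Character-wise common prefix of the two extremes.
        int max = Math.Min(first.Length, last.Length);
        int i = 0;
        while (i < max && first[i] == last[i])
            i++;

        // Cut back to the last separator inside the shared part,
        // so that only complete segments are returned.
        int cut = first.Substring(0, i).LastIndexOf('/');
        return first.Substring(0, cut + 1);
    }
}
```

For the first example in the question this yields C:/Hello/, and for the second C:/Hello/World/This/Is/. A file that is shared by all entries is treated like a directory here, so a list containing the same file twice returns that file's path with a trailing slash.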
I would iterate over each character in the first path, comparing it with the character at the same position in every other path in the collection:
public string FindCommonPath(List<string> paths)
{
    string firstPath = paths[0];
    bool same = true;
    int i = 0;
    string commonPath = string.Empty;
    while (same && i < firstPath.Length)
    {
        for (int p = 1; p < paths.Count && same; p++)
        {
            // Guard against paths shorter than the first one before indexing.
            same = i < paths[p].Length && firstPath[i] == paths[p][i];
        }
        if (same)
        {
            commonPath += firstPath[i];
        }
        i++;
    }
    return commonPath;
}
You could iterate through the list first to find the shortest path and possibly improve it slightly.
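A rough sketch of that refinement, assuming the scan is simply bounded by the length of the shortest path. The names CommonPrefixFinder and FindCommonPrefix are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonPrefixFinder
{
    // Hypothetical variant: the common prefix can never be longer than
    // the shortest path, so that length bounds the scan.
    public static string FindCommonPrefix(IReadOnlyList<string> paths)
    {
        if (paths == null || paths.Count == 0)
            return string.Empty;

        int limit = paths.Min(p => p.Length);
        string first = paths[0];
        int i = 0;
        for (; i < limit; i++)
        {
            char c = first[i];
            // Stop at the first position where any path differs.
            if (paths.Any(p => p[i] != c))
                break;
        }
        // Take the prefix in one go instead of appending character by character.
        return first.Substring(0, i);
    }
}
```

Like the version above, this is a plain character-wise prefix, so the result may still end in the middle of a file or directory name.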
To return c:/dir for
c:/dir/file1
c:/dir/file2
I would code it this way:
public static string GetLongestCommonPrefix(params string[] s)
{
    return GetLongestCommonPrefix((ICollection<string>)s);
}

public static string GetLongestCommonPrefix(ICollection<string> paths)
{
    if (paths == null || paths.Count == 0)
        return null;
    if (paths.Count == 1)
        return paths.First();

    // Split every path into its segments and compare segment by segment.
    var allSplittedPaths = paths.Select(p => p.Split('\\')).ToList();
    var min = allSplittedPaths.Min(a => a.Length);
    var i = 0;
    for (i = 0; i < min; i++)
    {
        var reference = allSplittedPaths[0][i];
        if (allSplittedPaths.Any(a => !string.Equals(a[i], reference, StringComparison.OrdinalIgnoreCase)))
        {
            break;
        }
    }
    // The first i segments are shared by all paths.
    return string.Join("\\", allSplittedPaths[0].Take(i));
}
And here are some tests for it:
[TestMethod]
public void GetLongestCommonPrefixTest()
{
    var str1 = @"C:\dir\dir1\file1";
    var str2 = @"C:\dir\dir1\file2";
    var str3 = @"C:\dir\dir1\file3";
    var str4 = @"C:\dir\dir2\file3";
    var str5 = @"C:\dir\dir1\file1\file3";
    var str6 = @"C:\dir\dir1\file1\file3";

    var res = Utilities.GetLongestCommonPrefix(str1, str2, str3);
    Assert.AreEqual(@"C:\dir\dir1", res);

    var res2 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str4);
    Assert.AreEqual(@"C:\dir", res2);

    var res3 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str5);
    Assert.AreEqual(@"C:\dir\dir1", res3);

    var res4 = Utilities.GetLongestCommonPrefix(str5, str6);
    Assert.AreEqual(@"C:\dir\dir1\file1\file3", res4);

    var res5 = Utilities.GetLongestCommonPrefix(str5);
    Assert.AreEqual(str5, res5);

    var res6 = Utilities.GetLongestCommonPrefix();
    Assert.AreEqual(null, res6);

    var res7 = Utilities.GetLongestCommonPrefix(null);
    Assert.AreEqual(null, res7);
}
Source: https://stackoverflow.com/questions/24866683/find-common-parent-path-in-list-of-files-and-directories