data-partitioning

partitioning a float array into similar segments (clustering)

妖精的绣舞 submitted on 2019-11-30 02:50:47
Question: I have an array of floats like this:

[1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200]

Now I want to partition the array like this:

[[1.91, 2.87, 3.61], [10.91, 11.91, 12.82], [100.73, 100.71, 101.89], [200]]
// [200] will be considered an outlier because of its low cluster support

I have to find this kind of segmentation for several arrays, and I don't know in advance what the partition size should be. I tried hierarchical (agglomerative) clustering, and it gives…
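Because the gaps between the intended segments are much wider than the gaps inside them, a one-dimensional gap threshold is often enough here, and it needs no fixed partition size or cluster count. A minimal Python sketch (the `gap` and `min_support` thresholds are my assumptions, not values from the question):

```python
def cluster_1d(values, gap=5.0, min_support=2):
    """Split sorted floats into segments wherever the jump between
    neighbours exceeds `gap`; segments with fewer than `min_support`
    members are reported separately as outliers."""
    values = sorted(values)
    segments, current = [], [values[0]]
    for v in values[1:]:
        if v - current[-1] <= gap:
            current.append(v)
        else:
            segments.append(current)
            current = [v]
    segments.append(current)
    clusters = [s for s in segments if len(s) >= min_support]
    outliers = [s for s in segments if len(s) < min_support]
    return clusters, outliers

data = [1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200]
print(cluster_1d(data))
# ([[1.91, 2.87, 3.61], [10.91, 11.91, 12.82], [100.71, 100.73, 101.89]], [[200]])
```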

Querying Windows Azure Table Storage with multiple query criteria

≡放荡痞女 submitted on 2019-11-29 09:23:56
I'm trying to query a table in Windows Azure storage and was initially using TableQuery.CombineFilters inside the TableQuery<RecordEntity>().Where call, as follows:

```csharp
TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThanOrEqual, lowDate),
    TableOperators.And,
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThanOrEqual, lowDate),
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, entityId)
));
```

Unfortunately CombineFilters allows only 2 query criteria max. So I'm currently doing this: var…
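The usual way around the two-argument limit is to nest the CombineFilters calls, combining two filters at a time. A hedged C# sketch mirroring the question's conditions (note the question compares PartitionKey against lowDate twice, which looks like a typo; `highDate` below is my assumption):

```csharp
// Combine the two date bounds first...
string dateRange = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThanOrEqual, lowDate),
    TableOperators.And,
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThanOrEqual, highDate));

// ...then combine the result with the third condition.
string filter = TableQuery.CombineFilters(
    dateRange,
    TableOperators.And,
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, entityId));

var query = new TableQuery<RecordEntity>().Where(filter);
```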

Using an iterator to Divide an Array into Parts with Unequal Size

こ雲淡風輕ζ submitted on 2019-11-27 09:50:30
I have an array which I need to divide into 3-element sub-arrays. I wanted to do this with iterators, but I end up iterating past the end of the array and segfaulting, even though I never dereference the iterator. Given:

```cpp
auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
```

I'm doing:

```cpp
auto bar = cbegin(foo);

for (auto it = next(bar, 3); it < foo.end(); bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << endl; });
```

Now I can solve this by defining a finish iterator: auto bar = cbegin(foo); auto finish…
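The crash comes from `next(bar, 3)` itself: advancing an iterator past the end of its range is undefined behaviour even if it is never dereferenced. A minimal sketch of one safe alternative (an illustration, not the `finish`-iterator answer the question was heading toward): clamp each step to the distance actually remaining.

```cpp
#include <algorithm>
#include <initializer_list>
#include <iostream>
#include <iterator>

int main() {
    auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    auto bar = std::cbegin(foo);
    while (bar != std::cend(foo)) {
        // Advance by at most 3, never past cend(foo).
        auto it = std::next(bar,
            std::min<std::ptrdiff_t>(3, std::distance(bar, std::cend(foo))));
        std::for_each(bar, it, [](const auto& i) { std::cout << i << ' '; });
        std::cout << '\n';
        bar = it;
    }
}
```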

U-SQL Output in Azure Data Lake

柔情痞子 submitted on 2019-11-27 09:08:39
Would it be possible to automatically split a table into several files based on column values if I don't know how many distinct key values the table contains? Is it possible to put the key value into the filename?

Michael Rys: This is our top ask (and it has been asked on Stack Overflow before, too). We are currently working on it and hope to have it available by summer. Until then you have to write a script generator. I tend to use U-SQL to generate the script, but you could do it with PowerShell or T4, etc. Here is an example. Let's assume you want to write files for the column name in the…
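A rough sketch of that script-generator idea in U-SQL, with hypothetical paths and schema (this is not Michael Rys's original example, which is cut off above, and the exact outputter parameters may need adjusting):

```sql
// Pass 1: generate one OUTPUT statement per distinct key value.
@data =
    EXTRACT name string, value int
    FROM "/input/data.csv"
    USING Extractors.Csv();

@keys = SELECT DISTINCT name FROM @data;

@stmts =
    SELECT "OUTPUT (SELECT * FROM @data WHERE name == \"" + name
         + "\") TO \"/output/" + name + ".csv\" USING Outputters.Csv();" AS stmt
    FROM @keys;

OUTPUT @stmts
TO "/output/generated.usql"
USING Outputters.Text(quoting : false);

// Pass 2 (manual): prepend the same EXTRACT of @data to generated.usql and
// submit it; it writes one file per key, with the key value in the filename.
```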

QuickSort and Hoare Partition

十年热恋 submitted on 2019-11-27 08:42:30
I'm having a hard time translating QuickSort with Hoare partitioning into C code, and I can't figure out why. The code I'm using is shown below:

```c
void QuickSort(int a[], int start, int end)
{
    int q = HoarePartition(a, start, end);
    if (end <= start)
        return;
    QuickSort(a, q + 1, end);
    QuickSort(a, start, q);
}

int HoarePartition(int a[], int p, int r)
{
    int x = a[p], i = p - 1, j = r;
    while (1) {
        do j--; while (a[j] > x);
        do i++; while (a[i] < x);
        if (i < j)
            swap(&a[i], &a[j]);
        else
            return j;
    }
}
```

Also, I don't really get why HoarePartition works. Can someone explain why it works, or at least link me to an article that does? I…
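Why the partition works: each pass moves i right past elements already <= the pivot and j left past elements already >= it, swapping the first out-of-place pair it finds; when the cursors meet or cross, everything at or left of j is <= the pivot and everything to its right is >= it, so j is a valid split point. As for the crash: the posted QuickSort partitions *before* testing the base case (so it calls HoarePartition on empty and one-element ranges) and, with end exclusive, `QuickSort(a, start, q)` drops element q from the left half. A hedged corrected sketch (my own fix, treating end as one past the last index, as the posted HoarePartition does; `swap` is assumed missing and supplied here):

```c
#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Pivot x = a[p].  On return, a[p..j] <= x <= a[j+1..r-1]. */
int HoarePartition(int a[], int p, int r)
{
    int x = a[p], i = p - 1, j = r;
    while (1) {
        do j--; while (a[j] > x);
        do i++; while (a[i] < x);
        if (i < j)
            swap(&a[i], &a[j]);
        else
            return j;
    }
}

void QuickSort(int a[], int start, int end) /* end is exclusive */
{
    if (end - start < 2)            /* base case BEFORE partitioning */
        return;
    int q = HoarePartition(a, start, end);
    QuickSort(a, start, q + 1);     /* left half includes index q */
    QuickSort(a, q + 1, end);
}

int main(void)
{
    int a[] = { 5, 2, 9, 1, 7, 3 };
    QuickSort(a, 0, 6);
    for (int i = 0; i < 6; i++)
        printf("%d ", a[i]);        /* prints: 1 2 3 5 7 9 */
    return 0;
}
```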

C# - elegant way of partitioning a list?

ぐ巨炮叔叔 submitted on 2019-11-27 07:05:24
I'd like to partition a list into a list of lists by specifying the number of elements in each partition. For instance, suppose I have the list {1, 2, ..., 11} and would like to partition it so that each set has 4 elements, with the last set holding as many elements as it can. The resulting partition would look like {{1..4}, {5..8}, {9..11}}. What would be an elegant way of writing this?

Here is an extension method that will do what you want: public static IEnumerable<List<T>> Partition<T>(this IList<T> source, Int32 size) { for (int i = 0; i < (source.Count / size) + (source.Count % size >…
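The answer's method is cut off above; here is a hedged completion in the same spirit (a simple Skip/Take loop rather than the answer's exact index arithmetic):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListExtensions
{
    // Yield consecutive chunks of `size` elements; the final chunk
    // simply keeps whatever remainder is left.
    public static IEnumerable<List<T>> Partition<T>(this IList<T> source, int size)
    {
        for (int i = 0; i < source.Count; i += size)
            yield return source.Skip(i).Take(size).ToList();
    }
}

// Usage: Enumerable.Range(1, 11).ToList().Partition(4)
//        -> {1,2,3,4}, {5,6,7,8}, {9,10,11}
```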

Iterator over all partitions into k groups?

烂漫一生 submitted on 2019-11-26 22:03:28
Question: Say I have a list L. How can I get an iterator over all partitions of L into K groups?

Example: L = [2, 3, 5, 7, 11, 13], K = 3. Some of the possible partitions into 3 groups:

[[2], [3, 5], [7, 11, 13]]
[[2, 3, 5], [7, 11], [13]]
[[3, 11], [5, 7], [2, 13]]
[[3], [11], [5, 7, 2, 13]]
etc.

=== UPDATE === I was working on a solution that seems to work, so I will just copy-paste it:

```python
# -*- coding: utf-8 -*-
import itertools

# return ( list1 - list0 )
def l1_sub_l0( l1, l0 ) :
    ""…
```
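The asker's solution is cut off above. Here is a self-contained Python sketch of one standard approach (my own recursive formulation, not the asker's code), built on the recurrence that the first element either joins one of the k groups of a partition of the rest, or forms a singleton group beside k-1 groups of the rest:

```python
def partitions_k(seq, k):
    """Yield every partition of seq into k non-empty groups
    (group order and order within groups are ignored)."""
    if k <= 0 or len(seq) < k:
        return
    if k == 1:
        yield [list(seq)]
        return
    head, *rest = seq
    # Either `head` joins one of the k groups of a partition of the rest...
    for partial in partitions_k(rest, k):
        for i in range(k):
            yield partial[:i] + [[head] + partial[i]] + partial[i + 1:]
    # ...or `head` forms a group of its own beside k-1 groups of the rest.
    for partial in partitions_k(rest, k - 1):
        yield [[head]] + partial

for p in partitions_k([2, 3, 5, 7, 11, 13], 3):
    print(p)
```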

Spark SQL - Difference between df.repartition and DataFrameWriter partitionBy?

泪湿孤枕 submitted on 2019-11-26 18:27:20
What is the difference between the DataFrame repartition() and DataFrameWriter partitionBy() methods? I assume both are used to "partition data based on a dataframe column"? Or is there a difference?

If you run repartition(COL), you change the partitioning during calculations: you will get spark.sql.shuffle.partitions (default: 200) partitions. If you then call .write, you will get one directory with many files. If you run .write.partitionBy(COL), you will get as many directories as there are unique values in COL. This speeds up further data reading (if you filter by the partitioning column) and…
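A minimal PySpark sketch of the contrast (the paths and the tiny DataFrame are made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("2019-11-26", 1), ("2019-11-27", 2)], ["date", "value"]
)

# repartition(COL): reshuffles the data in memory; the writer then emits
# one file per in-memory partition, all under a single directory.
df.repartition("date").write.mode("overwrite").parquet("/tmp/out_repartition")

# partitionBy(COL): the writer lays files out on disk as one subdirectory
# per distinct value, e.g. /tmp/out_partitionby/date=2019-11-26/.
df.write.mode("overwrite").partitionBy("date").parquet("/tmp/out_partitionby")
```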

Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?

半腔热情 submitted on 2019-11-26 17:08:30
Question: I have a large JSON file with, I'm guessing, 4 million objects. Each top-level object has a few levels nested inside. I want to split that into multiple files of 10,000 top-level objects each (retaining the structure inside each). jq should be able to do that, right? I'm not sure how. So, data like this:

```json
[{
  "id": 1,
  "user": {
    "name": "Nichols Cockle",
    "email": "ncockle0@tmall.com",
    "address": { "city": "Turt", "state": "Thị Trấn Yên Phú" }
  },
  "product": { "name": "Lychee - Canned", "code": "36987-1526" }…
```
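One low-memory approach, as a sketch with assumed filenames (not a tested one-liner for this exact data): stream the array elements out as one compact JSON object per line, cut the stream every 10,000 lines with split, then re-wrap each chunk as a JSON array:

```sh
# Emit one compact top-level object per line (nested structure is kept).
jq -c '.[]' input.json | split -l 10000 - chunk_

# Turn each 10000-line chunk back into a JSON array file.
for f in chunk_*; do
  jq -s '.' "$f" > "$f.json" && rm "$f"
done
```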