hpc

What is the recommended compression for HDF5 for fast read/write performance (in Python/pandas)?

Submitted by 北慕城南 on 2019-12-31 13:29:34

Question: I have read several times that turning on compression in HDF5 can lead to better read/write performance. I wonder what the ideal settings are to achieve good read/write performance with:

    data_df.to_hdf(..., format='fixed', complib=..., complevel=..., chunksize=...)

I'm already using fixed format (i.e. h5py) as it's faster than table. I have strong processors and do not care much about disk space. I often store DataFrames of float64 and str types in files of approx. 2500 rows x 9000 columns.
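A minimal sketch of one commonly suggested starting point, assuming the blosc compressor available through PyTables (the file name and random data are illustrative; benchmark complevel values on your own data before settling):

    import numpy as np
    import pandas as pd

    # Illustrative DataFrame of roughly the shape described in the question.
    data_df = pd.DataFrame(np.random.rand(2500, 9000))

    # 'blosc' at a low-to-moderate level is often recommended when speed
    # matters more than disk space; writing HDF5 requires the 'tables' package.
    data_df.to_hdf('data.h5', key='data', format='fixed',
                   complib='blosc', complevel=5)

    read_back = pd.read_hdf('data.h5', key='data')

Note that the chunksize argument only applies to table-format writes, so it can be dropped for format='fixed'.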

Why does this MPI code execute out of order? [duplicate]

Submitted by 依然范特西╮ on 2019-12-25 16:49:24

Question: This question already has answers here: Why is my MPI program outputting incorrectly (2 answers). Closed 5 years ago. I'm trying to create a "Hello, world!" application in (Open)MPI such that each process will print out in order. My idea was to have the first process send a message to the second when it's finished, then the second to the third, etc.:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD,
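For reference, a complete sketch of the token-passing idea the question describes (names are illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, token = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Every rank except 0 blocks until its predecessor has printed. */
        if (rank > 0)
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        printf("Hello from rank %d of %d\n", rank, size);
        fflush(stdout);

        /* Let the next rank print. */
        if (rank < size - 1)
            MPI_Send(&token, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }

Even with this synchronization, the MPI runtime forwards each rank's stdout to the launcher asynchronously, so inter-process output order is still not guaranteed; answers to the linked duplicate typically recommend gathering the strings to rank 0 and printing them from there.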

starting slurm array job with a specified number of nodes

Submitted by 北战南征 on 2019-12-25 04:12:12

Question: I'm trying to align 168 sequence files on our HPC using slurm version 14.03.0. I'm only allowed to use a maximum of 9 compute nodes at once, to keep some nodes open for other people. I changed the file names so I could use the array function in sbatch. The sequence files look like this: Sequence1.fastq.gz, Sequence2.fastq.gz, … Sequence168.fastq.gz. I can't seem to figure out how to tell it to run all 168 files, 9 at a time. I can get it to run all 168 files, but it uses all the available nodes
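Slurm job arrays support a concurrency throttle: --array=1-168%9 runs all 168 tasks but caps the number running at once at 9. A sketch with the aligner command as a placeholder; note the % throttle may not exist in a release as old as 14.03, so check man sbatch on your cluster:

    #!/bin/bash
    #SBATCH --job-name=align
    #SBATCH --array=1-168%9          # 168 tasks, at most 9 running at a time
    #SBATCH --ntasks=1
    #SBATCH --output=align_%A_%a.out

    # SLURM_ARRAY_TASK_ID takes the values 1..168, one per array task.
    my_aligner Sequence${SLURM_ARRAY_TASK_ID}.fastq.gz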

Application performance vs Peak performance

Submitted by 为君一笑 on 2019-12-25 03:26:25

Question: I have questions about real application performance on a cluster vs the cluster's peak performance. Let's say an HPC cluster reports a peak performance of 1 petaflops. How is this calculated? To me, it seems that there are two measures: one is performance calculated from the hardware specifications, and the other comes from running HPL. Is my understanding correct? When I read about one real application running on the system at full scale, the developer mentions that it could
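Theoretical peak (Rpeak) is indeed computed from the hardware alone; a worked example with illustrative numbers:

    Rpeak = nodes × sockets/node × cores/socket × clock × FLOPs/cycle/core

    e.g.  1250 nodes × 2 sockets × 12 cores × 2.6 GHz × 16 FLOPs/cycle
          ≈ 1.25 PFLOPS double precision

(16 FLOPs/cycle assumes an AVX2 core issuing two fused multiply-adds on 4 doubles each per cycle.) HPL measures the sustained value, Rmax, on the LINPACK benchmark; Top500 reports both figures, and real applications usually achieve only a fraction of either.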

Garbage output after MPI gather using derived type

Submitted by 折月煮酒 on 2019-12-25 02:51:38

Question: Given this struct:

    struct mpi_energy_data {
        int rank;
        time_t from;
        time_t to;
        char hostname[HOST_NAME_MAX];
    };

I'm attempting to build a derived MPI type, which I later use in a gather operation, but all output in the receiving array is garbage except what is sent from rank 0.

    MPI_Datatype time_interval_mpi;
    MPI_Datatype type[4] = { MPI_INT, MPI_CHAR, MPI_INT, MPI_INT };
    int blocklen[4] = { 1, HOST_NAME_MAX, 1, 1 };
    MPI_Aint offsets[4];
    offsets[0] = offsetof(struct mpi_energy_data, rank);
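For comparison, a sketch of a type map whose entries line up field-for-field with the struct, assuming time_t can be sent as MPI_INT64_T (verify sizeof(time_t) on your platform). The question's type[] array lists MPI_CHAR second even though hostname is the struct's last field, which is one way such mismatches produce garbage:

    #include <mpi.h>
    #include <stddef.h>   /* offsetof */
    #include <limits.h>   /* HOST_NAME_MAX on Linux */
    #include <time.h>

    struct mpi_energy_data { int rank; time_t from; time_t to;
                             char hostname[HOST_NAME_MAX]; };

    MPI_Datatype make_energy_type(void) {
        MPI_Datatype tmp, result;
        MPI_Datatype types[4]  = { MPI_INT, MPI_INT64_T, MPI_INT64_T, MPI_CHAR };
        int          blocklen[4] = { 1, 1, 1, HOST_NAME_MAX };
        MPI_Aint     offsets[4] = {
            offsetof(struct mpi_energy_data, rank),
            offsetof(struct mpi_energy_data, from),
            offsetof(struct mpi_energy_data, to),
            offsetof(struct mpi_energy_data, hostname)
        };
        MPI_Type_create_struct(4, blocklen, offsets, types, &tmp);
        /* Match the extent to the C struct so arrays of it stride correctly. */
        MPI_Type_create_resized(tmp, 0, sizeof(struct mpi_energy_data), &result);
        MPI_Type_commit(&result);
        MPI_Type_free(&tmp);
        return result;
    }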

Pass a Parameter object (PSCredential) inside a ScriptBlock programmatically in C#

Submitted by 穿精又带淫゛_ on 2019-12-24 02:45:21

Question: I am trying to run an HPC cmdlet programmatically to change the HPC install credential on a remote computer. If I run the cmdlet locally, it's pretty straightforward:

    Runspace rs = GetPowerShellRunspace();
    rs.Open();
    Pipeline pipeline = rs.CreatePipeline();

    PSCredential credential = new PSCredential(domainAccount, newPassword);
    Command cmd = new Command("Set-HpcClusterProperty");
    cmd.Parameters.Add("InstallCredential", credential);
    pipeline.Commands.Add(cmd);
    Collection<PSObject> ret = pipeline
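One common pattern for the remote case (a sketch, not the asker's solution; the host name and password handling are placeholders) is to invoke Invoke-Command with an ArgumentList, so the PSCredential arrives inside the remote ScriptBlock as a parameter:

    using System.Management.Automation;
    using System.Security;

    var password = new SecureString();
    foreach (char c in "newPassword") password.AppendChar(c);   // placeholder
    var credential = new PSCredential(@"DOMAIN\account", password);

    using (var ps = PowerShell.Create())
    {
        // param($cred) inside the script block receives the object
        // passed through ArgumentList.
        ps.AddCommand("Invoke-Command")
          .AddParameter("ComputerName", "headnode")             // placeholder
          .AddParameter("ScriptBlock", ScriptBlock.Create(
              "param($cred) Set-HpcClusterProperty -InstallCredential $cred"))
          .AddParameter("ArgumentList", new object[] { credential });

        var results = ps.Invoke();
    }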

What all operations does FLOPS include?

Submitted by 半世苍凉 on 2019-12-24 01:47:28

Question: FLOPS stands for FLoating-point Operations Per Second, and I have some idea what floating-point is. I want to know what these operations are. Are +, -, *, / the only operations, or do operations like logarithm() and exponential() also count as floating-point operations? Do + and * of two floats take the same time? And if they take different times, what interpretation should I draw from the statement "Performance is 100 FLOPS"? How many + and * are there in one second? I am not a computer science guy, so kindly
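As an illustrative bit of counting (not from the question): FLOPS ratings conventionally count the basic add/subtract/multiply operations, with a fused multiply-add counted as two; division is slower, and transcendentals such as log() and exp() are library routines built from many basic operations rather than single FLOPs. For example:

    dot product of two length-n vectors:  n multiplies + (n-1) adds ≈ 2n FLOPs
    at a sustained 100 FLOPS:             2n ≈ 100  →  n ≈ 50 per second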

Slurm: Use cores from multiple nodes for R parallelization

Submitted by 别等时光非礼了梦想. on 2019-12-24 00:52:46

Question: I want to parallelize an R script on an HPC with a Slurm scheduler. SLURM is configured with SelectType: CR_Core_Memory. Each compute node has 16 cores (32 threads). I pass the R script to SLURM with the following configuration, using clustermq as the interface to Slurm.

    #!/bin/sh
    #SBATCH --job-name={{ job_name }}
    #SBATCH --partition=normal
    #SBATCH --output={{ log_file | /dev/null }} # you can add .%a for array index
    #SBATCH --error={{ log_file | /dev/null }}
    #SBATCH --mem-per-cpu={{
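On the R side, a sketch of how clustermq is typically driven (the template path and worker count are illustrative). Each worker is submitted as its own single-task Slurm job, so the scheduler is free to place the workers across several 16-core nodes:

    library(clustermq)

    options(clustermq.scheduler = "slurm",
            clustermq.template  = "~/slurm_clustermq.tmpl")  # template as above

    fx <- function(x) {
        Sys.sleep(1)   # stand-in for real work
        x * 2
    }

    # 32 workers -> 32 single-CPU Slurm jobs, potentially on multiple nodes.
    result <- Q(fx, x = 1:1000, n_jobs = 32)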

slurm: use a control node also for computing

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-23 17:22:39

Question: I have set up a small cluster (9 nodes) for computing in our lab. Currently I am using one node as the slurm controller, i.e. it is not being used for computing. I would like to use it too, but I do not want to allocate all the CPUs; I would like to keep 2 CPUs free for scheduling and other master-node-related tasks. Is it possible to write something like this in slurm.conf:

    NodeName=master NodeHostname=master CPUs=10 RealMemory=192000 TmpDisk=200000 State=UNKNOWN
    NodeName=node0[1-8]
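Slurm has a node parameter aimed at exactly this: CoreSpecCount reserves cores for system use so the scheduler never hands them to jobs. A sketch keeping the question's numbers (enforcement depends on your task plugin configuration; see the slurm.conf man page):

    # Reserve 2 of the master's cores for slurmctld/slurmd and other system
    # tasks; jobs can only be allocated the remaining 8.
    NodeName=master NodeHostname=master CPUs=10 CoreSpecCount=2 RealMemory=192000 TmpDisk=200000 State=UNKNOWN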

What is the analogue of an NDIS filter in linux?

Submitted by 馋奶兔 on 2019-12-22 20:31:12

Question: I am working on an as-close-to-real-time system as possible in Linux and need to send about 600-800 bytes in a TCP packet as soon as I receive a specific packet. For the best possible latencies I want this packet to be sent directly from the kernel, instead of the received packet going all the way up to userspace and the application and then making its way back. If I were on Windows I'd have written an NDIS filter, with which I would cache the packet to be sent and the matching parameters, so
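The closest analogue is a netfilter hook registered from a kernel module: the hook function sees every sk_buff as it traverses the stack, with no userspace round trip. A minimal sketch assuming a 4.x+ kernel API; the matching and reply logic are placeholders:

    #include <linux/module.h>
    #include <linux/netfilter.h>
    #include <linux/netfilter_ipv4.h>
    #include <linux/skbuff.h>
    #include <net/net_namespace.h>

    static unsigned int hook_fn(void *priv, struct sk_buff *skb,
                                const struct nf_hook_state *state)
    {
        /* Placeholder: inspect skb here; on a match, build and transmit
         * the cached 600-800 byte TCP reply directly from the kernel. */
        return NF_ACCEPT;
    }

    static struct nf_hook_ops ops = {
        .hook     = hook_fn,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_PRE_ROUTING,   /* earliest IPv4 hook point */
        .priority = NF_IP_PRI_FIRST,
    };

    static int __init filter_init(void)
    {
        return nf_register_net_hook(&init_net, &ops);
    }

    static void __exit filter_exit(void)
    {
        nf_unregister_net_hook(&init_net, &ops);
    }

    module_init(filter_init);
    module_exit(filter_exit);
    MODULE_LICENSE("GPL");

Newer alternatives with the same intent include tc/eBPF and XDP, which can react to packets even earlier in the receive path.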