
Dividing MATLAB® Computations into Tasks

The Parallel Computing Toolbox™ enables us to execute our MATLAB® programs on a cluster of computers. In this example, we look at how to divide a large collection of MATLAB operations into smaller work units, called tasks, which the workers in our cluster can execute. We do this programmatically using the pctdemo_helper_split_scalar and pctdemo_helper_split_vector functions.


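Obtaining the Profile

Like every other example in the Parallel Computing Toolbox, this example needs to know which cluster to use. We use the cluster identified by the default profile. See Cluster Profiles in the documentation for how to create new profiles and how to change the default profile.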

profileName = parallel.defaultClusterProfile();

Starting with Sequential Code

One of the important advantages of the Parallel Computing Toolbox is that it builds very well on top of existing sequential code. It is actually beneficial to focus on sequential MATLAB code during the algorithm development, debugging and performance evaluation stages, because we then benefit from the rapid prototyping and interactive editing, debugging, and execution capabilities that MATLAB offers. During the development of the sequential code, we should separate the computations from the pre- and the post-processing, and make the core of the computations as simple and independent from the rest of the code as possible. Once our code is somewhat stable, it is time to look at distributing the computations. If we do a good job of creating modular sequential code for a coarse grained application, it should be rather easy to distribute those computations.

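Analyzing the Sequential Problem

The Parallel Computing Toolbox supports the execution of coarse-grained applications, that is, independent, simultaneous executions of a single program using multiple input arguments. We now show examples of what coarse-grained computations often look like in MATLAB code, and explain how to distribute them. We focus on two common scenarios, which arise when the original, sequential MATLAB code consists of either of the following: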

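  • Invoking a single function several times, using different values for the input parameter. Computations of this nature are sometimes referred to as parameter sweeps, and the code often looks similar to the following MATLAB code:

for i = 1:n
    y(i) = f(x(i));
end

  • Invoking a single stochastic function several times. Suppose that the calculations of g(x) involve random numbers, and that the function returns a different value each time it is invoked (even though the input parameter x remains the same). Such calculations are sometimes referred to as Monte Carlo simulations, and the code often looks similar to the following MATLAB code:

for i = 1:n
    y(i) = g(x);
end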

It is quite possible that the parameter sweeps and simulations appear in a slightly different form in our sequential MATLAB code. For example, if the function f is vectorized, the parameter sweep may simply appear as

y = f(x);

and the Monte Carlo simulation may appear as

y = g(x, n);
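The loop form and the vectorized form can be compared on a small case. The following is a minimal sketch, assuming that f is sin and that g is a hypothetical stochastic function built from rand; neither choice is prescribed by this example.

```matlab
% Parameter sweep: loop form versus vectorized form.
% Assumption: f is sin, chosen only for illustration.
f = @sin;
x = 0.1:0.1:1;
n = numel(x);

yLoop = zeros(1, n);          % Preallocate the result vector.
for i = 1:n
    yLoop(i) = f(x(i));       % One function call per parameter value.
end
yVec = f(x);                  % The same sweep in one vectorized call.
maxDiff = max(abs(yLoop - yVec));

% Monte Carlo simulation: loop form versus vectorized form.
% Assumption: g is a hypothetical stochastic function that adds uniform
% noise to its input; g(x0, m) returns m independent samples at once.
g = @(x0, m) x0 + rand(1, m);
x0 = 0.5;

zLoop = zeros(1, n);
for i = 1:n
    zLoop(i) = g(x0, 1);      % Each call returns a different value.
end
zVec = g(x0, n);              % n samples from a single call.
```

The two sweep results should agree to within rounding, while the two Monte Carlo results differ from run to run because they draw different random numbers.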

Example: Dividing a Simulation into Tasks

We use a very small example in what follows, using rand as our function of interest. Imagine that we have a cluster with four workers, and we want to divide the function call rand(1, 10) between them. This is by far the simplest to do with parfor, because it divides the computations between the workers without our having to make any decisions about how best to do that.

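We can expand the function call rand(1, 10) into the corresponding for loop:

for i = 1:10
    y(i) = rand();
end

Parallelizing this with parfor simply consists of replacing the for with a parfor. If a parallel pool is open on the four workers, the loop executes on the workers:

parfor i = 1:10
    y(i) = rand();
end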

Alternatively, we can use createJob and createTask to divide the execution of rand(1, 10) between the four workers. We use four tasks, and have them generate random vectors of length 3, 3, 2, and 2. We have created a function called pctdemo_helper_split_scalar that helps divide the generation of the 10 random numbers between the 4 tasks:

numRand = 10;   % We want this many random numbers.
numTasks = 4;   % We want to split the work into this many tasks.
clust = parcluster(profileName);
job = createJob(clust);
[numPerTask, numTasks] = pctdemo_helper_split_scalar(numRand, numTasks);

Notice how pctdemo_helper_split_scalar splits the work of generating 10 random numbers between the numTasks tasks. The elements of numPerTask are all positive, the vector length is numTasks, and their sum equals numRand.

disp(numPerTask)
     3     3     2     2
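The source of pctdemo_helper_split_scalar is not shown here. As a rough sketch of one way such a split could be computed, the following spreads the remainder over the first few tasks. This is an assumed stand-in, and the actual helper may be implemented differently.

```matlab
% Hypothetical stand-in for pctdemo_helper_split_scalar; the actual helper
% may be implemented differently.
numRand = 10;                       % Total amount of work to divide.
numTasks = 4;                       % Requested number of tasks.

numTasks = min(numTasks, numRand);  % Assumption: never create an empty task.
base = floor(numRand / numTasks);   % Every task gets at least this much.
extra = mod(numRand, numTasks);     % The first 'extra' tasks get one more.

numPerTask = repmat(base, 1, numTasks);
numPerTask(1:extra) = numPerTask(1:extra) + 1;

disp(numPerTask)
```

For 10 numbers and 4 tasks, this gives the same 3, 3, 2, 2 division as the helper, and the elements always sum to numRand.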

We can now write a for-loop that creates all the tasks in the job. Task i is to create a matrix of the size 1-by-numPerTask(i). When all the tasks have been created, we submit the job, wait for it to finish, and then retrieve the results.

for i = 1:numTasks
    createTask(job, @rand, 1, {1, numPerTask(i)});
end
submit(job);
wait(job);
y = fetchOutputs(job);
cat(2, y{:})   % Concatenate all the cells in y into one row vector.
delete(job);

ans =
  Columns 1 through 7
    0.3246    0.6618    0.6349    0.2646    0.0968    0.5052    0.8847
  Columns 8 through 10
    0.9993    0.8939    0.2502

Example: Dividing a Parameter Sweep into Tasks

For the purposes of this example, let's use the sin function as a very simple example. We let x be a vector of length 10:

x = 0.1:0.1:1;

and now we want to distribute the calculations of sin(x) on a cluster with 4 workers. As before, this is easiest to achieve with parfor:

parfor i = 1:length(x)
    y(i) = sin(x(i));
end

If we decide to achieve this using jobs and tasks, we first need to determine how to divide the computations among the tasks. We have the 4 workers evaluate sin(x(1:3)), sin(x(4:6)), sin(x(7:8)), and sin(x(9:10)) simultaneously. Because this kind of division of a parameter sweep into separate tasks occurs frequently in our examples, we have created a function that does exactly that:

numTasks = 4;
[xSplit, numTasks] = pctdemo_helper_split_vector(x, numTasks);
celldisp(xSplit);

xSplit{1} =
    0.1000    0.2000    0.3000
xSplit{2} =
    0.4000    0.5000    0.6000
xSplit{3} =
    0.7000    0.8000
xSplit{4} =
    0.9000    1.0000
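The source of pctdemo_helper_split_vector is not shown either. One way to get an equivalent division, sketched here as an assumption, is to compute the chunk lengths and then cut the vector with mat2cell. The shape of the cell array that the actual helper returns may differ.

```matlab
% Hypothetical stand-in for pctdemo_helper_split_vector; the actual helper
% may be implemented differently.
x = 0.1:0.1:1;
numTasks = 4;

% Chunk lengths: spread the remainder over the first few tasks.
numTasks = min(numTasks, numel(x));   % Assumption: no empty chunks.
base = floor(numel(x) / numTasks);
extra = mod(numel(x), numTasks);
numPerTask = repmat(base, 1, numTasks);
numPerTask(1:extra) = numPerTask(1:extra) + 1;

% Cut the row vector into a 1-by-numTasks cell array of row vectors.
xSplit = mat2cell(x, 1, numPerTask);
celldisp(xSplit);
```

Concatenating the cells with cat(2, xSplit{:}) recovers the original vector, which is a convenient check that no element was lost or duplicated.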

and it is now relatively easy to use createJob and createTask to perform the computations:

job = createJob(clust);
for i = 1:numTasks
    xThis = xSplit{i};
    createTask(job, @sin, 1, {xThis});
end
submit(job);
wait(job);
y = fetchOutputs(job);
delete(job);
cat(2, y{:})   % Concatenate all the cells in y into one row vector.

ans =
  Columns 1 through 7
    0.0998    0.1987    0.2955    0.3894    0.4794    0.5646    0.6442
  Columns 8 through 10
    0.7174    0.7833    0.8415

More on Parameter Sweeps

The example involving the sin function was particularly simple, because the sin function is vectorized. We look at how to deal with nonvectorized functions in the Writing Task Functions example.

Dividing MATLAB Operations into Tasks: Best Practices

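When using jobs and tasks, we have to decide how to divide our computations into appropriately sized tasks, paying attention to the following:

  • The number of function calls we want to make

  • The time it takes to execute each function call

  • The number of workers that we want to use in our cluster

We want at least as many tasks as there are workers, so that we can use all of them simultaneously. This encourages us to break our work into small units. On the other hand, there is an overhead associated with each task, and that encourages us to minimize the number of tasks. Consequently, we arrive at the following: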

  • If we only need to invoke our function a few times, and it takes only one or two seconds to evaluate it, we are better off not using the Parallel Computing Toolbox. Instead, we should simply perform our computations using MATLAB running on our local machine.

  • If we can evaluate our function very quickly, but we have to calculate many function values, we should let a task consist of calculating a number of function values. This way, we can potentially use many of our workers simultaneously, yet the task and job overhead is negligible relative to the running time. Note that we may have to write a new task function to do this; see the Writing Task Functions example. The rule of thumb is: the faster we can evaluate the function, the more important it is to combine several function evaluations into a single task.

  • If it takes a long time to invoke our function, but we only need to calculate a few function values, it seems sensible to let one task consist of calculating one function value. This way, the startup cost of the job is negligible, and we can have several workers in our cluster work simultaneously on the tasks in our job.

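  • If it takes a long time to invoke our function, and we need to calculate many function values, we can choose either of the two approaches we have presented: let a task consist of invoking our function once, or several times.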
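Combining several function evaluations into one task usually means wrapping the function in a task function that loops over a chunk of inputs. The following is a minimal sketch that runs locally, with no cluster. The names slowScalarFcn and evalChunk are hypothetical, and the integrand is chosen only because integral requires scalar limits, so the function cannot be called on a whole vector at once.

```matlab
% Hypothetical nonvectorized function: it accepts only a scalar input,
% because integral requires scalar integration limits.
slowScalarFcn = @(t) integral(@(s) exp(-s.^2), 0, t);

% Hypothetical task function: evaluate the scalar function on every
% element of a chunk and return a vector of the same size.
evalChunk = @(xChunk) arrayfun(slowScalarFcn, xChunk);

x = 0.1:0.1:1;
xSplit = mat2cell(x, 1, [3 3 2 2]);   % Chunks of length 3, 3, 2, and 2.

% Run the chunks locally to check the task function. In a job, each chunk
% would instead be passed to a task, along the lines of
% createTask(job, evalChunk, 1, {xSplit{i}}).
y = cell(size(xSplit));
for i = 1:numel(xSplit)
    y{i} = evalChunk(xSplit{i});
end
yAll = cat(2, y{:});                  % One row vector with all 10 values.
```

Testing the task function locally on a few chunks before creating any tasks keeps the debugging in the sequential setting, in line with the advice to start from sequential code.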

There is a drawback to having many tasks in a single job: Due to network overhead, it may take a long time to create a job with a large number of tasks, and during that time the cluster may be idle. It is therefore advisable to split the MATLAB operations into as many tasks as needed, but to limit the number of tasks in a job to a reasonable number, say never more than a few hundred tasks in a job.
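As an illustration of limiting the number of tasks per job, the sketch below spreads a given number of tasks evenly over as few jobs as the limit allows. The total of 1030 tasks and the limit of 200 tasks per job are hypothetical values chosen for the example, not recommendations from the toolbox.

```matlab
% Hypothetical values, for illustration only.
numTasksTotal = 1030;     % Tasks needed for the whole computation.
maxTasksPerJob = 200;     % Assumed cap, in the spirit of "a few hundred".

% Smallest number of jobs that respects the cap.
numJobs = ceil(numTasksTotal / maxTasksPerJob);

% Spread the tasks evenly over the jobs; the first 'extra' jobs get one more.
base = floor(numTasksTotal / numJobs);
extra = mod(numTasksTotal, numJobs);
tasksPerJob = repmat(base, 1, numJobs);
tasksPerJob(1:extra) = tasksPerJob(1:extra) + 1;

disp(tasksPerJob)
```

Each entry of tasksPerJob is then the number of createTask calls to make for the corresponding job.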
