Future Generation Computer Systems 111 (2020) 570-581

Efficient development of high performance data analytics

partitions. The application starts by generating random centers (line 3). Then, the application iterates until it reaches a fixed number of iterations or converges (line 7). In each iteration, the application computes the partial sums of each partition given the current centers (lines 11–12). To achieve this, the application spawns a cluster_points_sum task for each partition (line 12). Then, the application merges the results of the cluster_points_sum tasks in a reduction process (line 14). The merge_reduce function iteratively calls the merge_reduce_task task until a single dictionary of centers, number of vectors, and sum of these vectors is left. Finally, the application divides the sum of the vectors by the number of vectors to get the new mean of each center (line 16).
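The iteration described above can be sketched, sequentially and in plain Python, as follows. The names cluster_points_sum, merge_reduce, and merge_reduce_task come from the text, but their bodies here are assumptions, as are the kmeans driver, its parameters (max_iter, tol, seed), the partial-result layout (center index mapped to a (count, vector sum) pair), and drawing initial centers uniformly from [0, 1). In the paper these functions run as tasks; here they are ordinary calls, so nothing executes in parallel.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_points_sum(partition, centers):
    """Partial result for one partition: {center index: (count, vector sum)}."""
    partial = {}
    for point in partition:
        # Assign the point to its closest center (squared Euclidean distance).
        closest = min(range(len(centers)),
                      key=lambda c: squared_distance(point, centers[c]))
        count, total = partial.get(closest, (0, [0.0] * len(point)))
        partial[closest] = (count + 1, [t + x for t, x in zip(total, point)])
    return partial


def merge_reduce_task(a, b):
    """Add up counts and vector sums of two partial results, center by center."""
    merged = dict(a)
    for c, (count, total) in b.items():
        if c in merged:
            m_count, m_total = merged[c]
            merged[c] = (m_count + count, [x + y for x, y in zip(m_total, total)])
        else:
            merged[c] = (count, total)
    return merged


def merge_reduce(partials):
    """Merge partial results two at a time until a single dictionary is left."""
    queue = list(partials)
    while len(queue) > 1:
        first, second = queue.pop(0), queue.pop(0)
        queue.append(merge_reduce_task(first, second))
    return queue[0]


def kmeans(partitions, k, max_iter=10, tol=1e-9, seed=0):
    rng = random.Random(seed)
    dim = len(partitions[0][0])
    # Random initial centers, drawn uniformly from [0, 1) in every dimension.
    centers = [[rng.random() for _ in range(dim)] for _ in range(k)]
    for _ in range(max_iter):
        # One partial result per partition (a cluster_points_sum task in the paper).
        partials = [cluster_points_sum(p, centers) for p in partitions]
        totals = merge_reduce(partials)
        # New mean = vector sum / count; a center with no points keeps its position.
        new_centers = [[s / totals[c][0] for s in totals[c][1]] if c in totals
                       else centers[c] for c in range(k)]
        shift = max(squared_distance(n, o) for n, o in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:  # converged: no center moved noticeably
            break
    return centers
```

For example, kmeans([[[0, 0], [0, 1], [10, 10]], [[1, 0], [10, 11], [11, 10]]], k=2) clusters two partitions of three points each; only the small per-center dictionaries flow between cluster_points_sum and the merge step.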
Note that the sums of vectors of the partitions are computed in parallel, and that the reduction process also accumulates these sums in a distributed manner. Synchronization occurs only at the end of each iteration, to retrieve the final sum of vectors (line 15). In this manner, the application never transfers the input vectors, only means and center identifiers. This reduces communication costs, especially for large datasets.
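One way to obtain a distributed reduction is to merge partial results pairwise, level by level. The paragraph above does not fix the exact shape of the reduction, so the pairwise tree below is an assumption, and tree_merge_reduce and payload_size are hypothetical helpers; merge_reduce_task reuses the name from the text with an assumed body. The sketch counts the merge levels and measures how many numbers one partial result carries.

```python
def merge_reduce_task(a, b):
    """Add up counts and vector sums of two partial results, center by center."""
    merged = dict(a)
    for c, (count, total) in b.items():
        if c in merged:
            m_count, m_total = merged[c]
            merged[c] = (m_count + count, [x + y for x, y in zip(m_total, total)])
        else:
            merged[c] = (count, total)
    return merged


def tree_merge_reduce(partials):
    """Merge partial results pairwise, level by level; return (result, levels).

    Merges within one level do not depend on each other, so a task-based
    runtime could execute them in parallel. For n partials this takes
    ceil(log2(n)) levels.
    """
    level = list(partials)
    levels = 0
    while len(level) > 1:
        next_level = [merge_reduce_task(level[i], level[i + 1])
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd element is carried to the next level
        level = next_level
        levels += 1
    return level[0], levels


def payload_size(partial):
    """Numbers carried by one partial result: a count plus a vector sum per center."""
    return sum(1 + len(total) for _, total in partial.values())
```

The payload of a partial result is at most k * (d + 1) numbers for k centers and d features, independent of how many vectors the partition holds, which is why moving these dictionaries instead of the vectors keeps communication small.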
4.2. Cascade support vector machines
As mentioned before, C-SVM is a parallel version of support vector machines (SVM), a widely used classification algorithm. The objective of a classification algorithm is to build a decision function from a set of input vectors that belong to specific categories. This decision function can then be used to categorize other vectors whose category is unknown. The process of building the decision function is called training, and the process of categorizing unknown vectors is called prediction. Categories are also known as labels, and vector dimensions are called features. In the case of SVM, the decision function is a function of a subset of the input vectors, which are called the support vectors. The SVM algorithm finds the subset of the input vectors that best represents the categories in the input dataset, of which there are typically two. Finding the support vectors of a given dataset is a quadratic optimization problem.
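To make "a function of a subset of the input vectors" concrete, the sketch below evaluates the standard dual-form SVM decision function f(x) = sum_i alpha_i * y_i * K(s_i, x) + b, where only the support vectors s_i (those with alpha_i > 0) contribute and labels are -1 or +1. Training, that is, solving the quadratic problem for alpha_i and b, is not shown: the coefficients are hand-picked, and the linear kernel and all function names are illustrative assumptions.

```python
def linear_kernel(x, z):
    """K(x, z) = <x, z>; other kernels (e.g. RBF) could be swapped in."""
    return sum(a * b for a, b in zip(x, z))


def decision_function(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """f(x) = sum_i alpha_i * y_i * K(s_i, x) + b, summed over support vectors only."""
    return sum(a * y * kernel(s, x)
               for s, y, a in zip(support_vectors, labels, alphas)) + b


def predict(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Prediction: label +1 or -1 depending on the sign of the decision function."""
    f = decision_function(x, support_vectors, labels, alphas, b, kernel)
    return 1 if f >= 0 else -1


# Hand-picked coefficients (not the output of a solver): for these two points
# they are the exact hard-margin solution, with y_i * f(s_i) = 1 for both.
SUPPORT_VECTORS = [[1.0, 1.0], [-1.0, -1.0]]
LABELS = [1, -1]
ALPHAS = [0.25, 0.25]
B = 0.0
```

With these values, predict([2.0, 0.0], SUPPORT_VECTORS, LABELS, ALPHAS, B) returns 1 and predict([0.0, -3.0], SUPPORT_VECTORS, LABELS, ALPHAS, B) returns -1; any training vector that is not a support vector could be discarded without changing the result.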
The main idea behind the C-SVM algorithm is to split the set of input vectors into N
