
Draft:Distributed Machine Learning

From Wikipedia, the free encyclopedia
Machine learning across multiple nodes

Distributed Machine Learning (DML)


Distributed Machine Learning (DML) deals with the problem of analyzing data in a distributed environment while paying attention to issues such as computation, communication, storage, and human factors. Like many other fields of computing, DML focuses on algorithms and systems that scale along these dimensions. DML algorithms have appeared in many fields under different names, including Distributed Data Mining[1][2][3], Meta Learning[4], High Performance Data Mining[5], Privacy Preserving Distributed Data Mining[6][7][8], Federated Machine Learning[9], and Multi-Agent Learning[10]. The benefits of parallel and distributed computing in machine learning have been acknowledged in many fields, including Neural Networks[11], Parallel Genetic Algorithms[12][13][14], Multi-Agent Systems[15], and Data Fusion, among others.

Data Models, Computation, and Topology in Distributed Machine Learning


DML algorithms vary depending on the nature of the data models supported by the different distributed sites. For example, all the data sites may have the same set of features but different data tuples; traditionally, this is called the homogeneous data model. On the other hand, the features observed at different sites may differ, possibly with some overlap; this scenario is called the heterogeneous data model. A wide range of DML algorithms have been developed for learning from both homogeneous[16][17] and heterogeneous data[18][19]. Various DML algorithms also exist for semi-structured and unstructured data[20].

DML algorithms can also be classified based on how they perform computational operations. For example, the computation of the machine learning tasks can be distributed among different nodes or processors[21]. Depending upon the architecture, a single instruction may be computed on multiple data (SIMD) items simultaneously on different processors, or multiple instructions may be computed on multiple data (MIMD) items in parallel. A DML architecture may also be designed so that different nodes are highly interdependent and often work in a synchronized manner (tightly coupled); alternatively, nodes may be loosely coupled, working fairly independently by exchanging messages, often asynchronously.

Network topology also plays an important role in the design of DML algorithms. An overlay network topology may be created to define how different nodes communicate with each other. For example, a client-server topology may be selected in which the client nodes communicate only with the server. On the other hand, a peer-to-peer topology may be used in which there is no single server and every node communicates only with a small number of nodes (its neighbors). Often, local and asynchronous algorithms are used for such P2P DML applications[22].

Distributed Representation Construction


Principal Component Analysis (PCA) is frequently used for creating a low-dimensional representation of the data by constructing features that capture the maximally varying directions in the data. PCA is often used for clustering, classification, and predictive model building.

PCA from distributed homogeneous data is relatively straightforward in most cases. Since the covariance matrix is additively decomposable, one can simply compute the covariance matrix at each of the local participating sites and send those matrices to the central site. The central site can construct the global covariance matrix by adding the local covariance matrices with appropriate weights. This can be followed by a regular PCA of the global covariance matrix. The global eigenvectors can be broadcast to the local sites, and they can be subsequently used for projecting the local data for clustering and other related applications.
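A minimal sketch of this aggregation, written in plain NumPy with hypothetical function names (it is an illustration, not code from the cited works), is shown below. The outer-product correction term accounts for the spread of the local means around the global mean, which is what makes the weighted combination exact.

```python
import numpy as np

def local_statistics(X):
    """Statistics one homogeneous site would send to the central site."""
    n = X.shape[0]
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)  # local covariance, divided by n
    return n, mean, cov

def global_pca(local_stats, n_components):
    """Combine the weighted local covariances into a global covariance and eigendecompose it."""
    n_total = sum(n for n, _, _ in local_stats)
    global_mean = sum(n * m for n, m, _ in local_stats) / n_total
    # The outer-product term corrects for the spread of local means around the global mean.
    global_cov = sum(
        n * (c + np.outer(m - global_mean, m - global_mean))
        for n, m, c in local_stats
    ) / n_total
    eigvals, eigvecs = np.linalg.eigh(global_cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return global_mean, eigvecs[:, order]  # broadcast back to the local sites

# Three sites holding different rows drawn from the same feature space.
rng = np.random.default_rng(0)
sites = [rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5)) for _ in range(3)]
mean, components = global_pca([local_statistics(X) for X in sites], n_components=2)
local_projections = [(X - mean) @ components for X in sites]
```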

PCA from distributed heterogeneous data is a relatively more challenging problem. The Collective Principal Component Analysis (CPCA) algorithm [23] [24] offers one way to perform distributed PCA from heterogeneous sites. The main steps of the CPCA algorithm are given below:

  • Perform local PCA at each site; select dominant eigenvectors and project the data along them.
  • Send a sample of the projected data, along with the dominant eigenvectors, to the central site.
  • Combine the projected data from all the sites.
  • Perform PCA on the global data set, identify the dominant eigenvectors, and transform them back to the original space.

To compute exact Principal Components (PCs), in principle one would need to reconstruct the original data from all the projected local samples. However, since PCA is invariant under linear transformation, the global PCs can be computed directly from the projected samples.
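The following sketch illustrates these steps for vertically partitioned data whose rows are aligned across sites; the function names (`local_step`, `collective_pca`) and parameter choices are illustrative assumptions rather than the published CPCA implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_step(X, k, sample_idx):
    """Site-local step: centre the data, run a local PCA, and return the dominant
    eigenvectors plus a sample of the projected rows (the only data that leaves the site)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                 # d_i x k local eigenvectors
    return V, (Xc @ V)[sample_idx]

def collective_pca(sites, k_local, k_global, n_sample):
    n = sites[0].shape[0]
    sample_idx = rng.choice(n, size=n_sample, replace=False)  # shared row sample
    local_results = [local_step(X, k_local, sample_idx) for X in sites]
    # Central site: column-wise combination of the projected samples.
    Y = np.hstack([Ys for _, Ys in local_results])
    _, _, Wt = np.linalg.svd(Y - Y.mean(axis=0), full_matrices=False)
    W = Wt[:k_global].T          # dominant eigenvectors in the projected space
    # Map back to the original feature space block by block (a block-diagonal transform).
    blocks, offset = [], 0
    for V, _ in local_results:
        blocks.append(V @ W[offset:offset + V.shape[1]])
        offset += V.shape[1]
    return np.vstack(blocks)     # (sum of d_i) x k_global global eigenvectors

# Three sites observe disjoint feature subsets of the same 500 records.
full = rng.normal(size=(500, 9)) @ rng.normal(size=(9, 9))
sites = [full[:, :3], full[:, 3:6], full[:, 6:]]
global_pcs = collective_pca(sites, k_local=2, k_global=3, n_sample=100)
```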

Distributed Clustering


A wide range of distributed clustering algorithms have been reported in the DML literature. They can be grouped based on the type of data model supported by the distributed nodes.

Distributed Clustering from Homogeneous Data


Forman and Zhang[25] proposed a center-based distributed clustering algorithm for homogeneous data sites that requires only the exchange of sufficient statistics; it is essentially an extension of their earlier parallel clustering work[26]. The Recursive Agglomeration of Clustering Hierarchies by Encircling Tactic (RACHET)[27] is also based on the exchange of sufficient statistics. It collects local dendrograms that are merged into a global dendrogram. Each local dendrogram contains descriptive statistics about the local cluster centroids that are sufficient for the global aggregation. Both approaches iterate until the sufficient statistics converge or the desired quality is achieved.
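A simple way to see how exchanging only sufficient statistics can support exact center-based clustering is the k-means sketch below; it is a generic illustration (hypothetical function names, centralized seeding kept only for brevity), not the algorithm of the cited papers.

```python
import numpy as np

def local_sufficient_statistics(X, centroids):
    """One site's contribution to a k-means step: per-cluster sums and counts only."""
    labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1), axis=1)
    sums = np.zeros_like(centroids)
    counts = np.zeros(len(centroids))
    np.add.at(sums, labels, X)
    np.add.at(counts, labels, 1)
    return sums, counts

def distributed_kmeans(sites, k, n_iter=20, seed=0):
    """Central site aggregates the local statistics and updates the shared centroids."""
    rng = np.random.default_rng(seed)
    all_points = np.vstack(sites)  # centralized seeding, kept only to keep the sketch short
    centroids = all_points[rng.choice(len(all_points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        stats = [local_sufficient_statistics(X, centroids) for X in sites]
        sums = sum(s for s, _ in stats)
        counts = sum(c for _, c in stats)
        nonempty = counts > 0
        centroids[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return centroids

rng = np.random.default_rng(3)
sites = [rng.normal(loc=c, size=(150, 2)) for c in ((0, 0), (6, 6), (0, 6))]
print(np.round(distributed_kmeans(sites, k=3), 2))
```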

Parthasarathy and Ogihara[28] note that finding a suitable distance metric is an important problem in clustering, including distributed clustering. They define one such metric based on association rules.

The PADMA system[29] is yet another distributed clustering-based system for document analysis from homogeneous data sites. Distributed clustering in PADMA is aided by relevance feedback-based supervised learning techniques. Additional work on parallel and distributed clustering is reported elsewhere[30] [31] .

Distributed Clustering from Heterogeneous Data


McClean and her colleagues[32] consider the clustering of heterogeneous distributed databases. They particularly focus on clustering heterogeneous data cubes composed of attributes from different domains. They utilize Euclidean distance and Kullback-Leibler information divergence to measure differences between aggregates.

Clustering heterogeneous, distributed data sets constitutes an important class of problems. Kargupta et al.[33] proposed a distributed clustering algorithm based on CPCA. This technique first applies a given off-the-shelf clustering algorithm to the local Principal Components (PCs). Then the global PCs are obtained from an appropriate (projected) data subset that is the union of all representative points from the local clusters. Each site projects its local data on the global PCs and again obtains new clusters, which are subsequently combined at the central site. A collective approach toward hierarchical clustering is proposed elsewhere[34].

An ensemble-based approach to combining multiple clusterings is proposed by Strehl and Ghosh[35]. Given different clusterings (possibly with a different number of clusters in each), they propose a framework for constructing an ensemble of clusters in a way that maximizes the shared information between the original clusterings. In order to quantify the shared information, they use a mutual information-based approach; mutual information essentially measures how similar two clusterings are in terms of the distributions of shared objects.
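A small sketch of such a mutual information score between two labelings is given below; the normalization by the geometric mean of the entropies is one common choice, and the function name is an assumption used only for illustration.

```python
import numpy as np

def normalized_mutual_information(labels_a, labels_b):
    """Shared information between two clusterings of the same objects,
    computed from the contingency table of cluster co-memberships."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    contingency = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(contingency, (a, b), 1)
    pxy = contingency / contingency.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())
    hx = -float((px[px > 0] * np.log(px[px > 0])).sum())
    hy = -float((py[py > 0] * np.log(py[py > 0])).sum())
    return mi / np.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0

# Two clusterings that agree up to a relabeling of the cluster ids score 1.0.
print(normalized_mutual_information([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```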

A distributed clustering algorithm for analyzing click-stream data is reported elsewhere[36] . This algorithm works by generating local clusterings and then combining them by analyzing the local cluster descriptions. A cluster is represented using a set of transaction IDs. The combining phase uses duplicate cluster removal and a technique for generating maximal large itemsets (where items correspond to the transaction IDs) to define the new global clusters.

Distributed Supervised Learning


Just like unsupervised DML algorithms, their supervised counterparts can also be grouped based on the distributed data models they are designed to work with.

Distributed Supervised Learning from Homogeneous Data


Many of the DML algorithms for distributed supervised learning from homogeneous data sites are related to ensemble learning techniques[37][38][39][40]. The ensemble approach produces multiple models (base predictors) and combines the outputs of the base models in order to enhance accuracy. Different models can be generated at different sites and ultimately aggregated using ensemble strategies. Several ensemble-based techniques have been reported in the literature.

Fan et al.[41] discussed an AdaBoost-based ensemble approach from this perspective. Breiman[42] considered Arcing as a way to aggregate multiple blocks of data, especially in an on-line setting. An experimental investigation of Stacking[43] for combining multiple models was reported elsewhere[44]. The meta-learning framework[45][46] offers another possible approach to learning classifiers from homogeneous, distributed data. In this approach, supervised learning techniques are first used to learn classifiers at local data sites; then meta-level classifiers are constructed either by learning from a data set generated using the locally learned concepts or by combining local classifiers using ensemble techniques. The meta-level learning may be applied recursively, producing a hierarchy of meta-classifiers. Meta learning follows three main steps:

  • Generate base classifiers at each site using a classifier learning algorithm.
  • Collect the base classifiers at a central site. Produce meta-level data from a separate validation set and the predictions generated by the base classifiers on it.
  • Generate the final classifier (meta-classifier) from the meta-level data.

Learning at the meta-level can work in many different ways. For example, a new data set may be generated using the locally learned classifiers. Some of the original training data may also be moved from the local sites, blended with data artificially generated by the local classifiers, and then used by a learning algorithm to learn the meta-level classifiers. Alternatively, the output of the meta-classifier may be decided by counting votes cast by the different base models. A meta-learning-type technique known as knowledge probing is reported elsewhere[47]. A Java-based distributed system for meta-learning is reported elsewhere[48][49]. Meta-learning illustrates two characteristics of DML algorithms, parallelism and reduced communication: all base classifiers are generated in parallel and collected at the central location along with the validation set.
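The three steps above can be sketched as follows, here using scikit-learn classifiers purely for illustration; the helper names and model choices are assumptions, not part of any cited system.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

def meta_learn(local_datasets, validation_set):
    """Meta-learning sketch: base classifiers are trained locally; a meta-classifier
    is trained centrally on their predictions over a shared validation set."""
    X_val, y_val = validation_set
    # Step 1: each site trains a base classifier on its own data.
    base = [DecisionTreeClassifier(max_depth=4).fit(X, y) for X, y in local_datasets]
    # Step 2: the central site builds meta-level data from base predictions on the validation set.
    meta_features = np.column_stack([clf.predict(X_val) for clf in base])
    # Step 3: learn the meta-classifier from the meta-level data.
    meta = LogisticRegression(max_iter=1000).fit(meta_features, y_val)

    def predict(X_new):
        return meta.predict(np.column_stack([clf.predict(X_new) for clf in base]))
    return predict

rng = np.random.default_rng(0)
def synth(n):  # a simple two-class problem shared by all sites
    X = rng.normal(size=(n, 4))
    return X, (X[:, 0] + X[:, 1] > 0).astype(int)

predict = meta_learn([synth(300) for _ in range(4)], synth(200))
X_test, y_test = synth(500)
print("meta-classifier accuracy:", (predict(X_test) == y_test).mean())
```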

Distributed Supervised Learning from Heterogeneous Data


Homogeneous DML algorithms usually do not work well for learning from heterogeneous distributed data. In the heterogeneous case, each local site observes only a subset of features. Therefore, a DML algorithm must be able to learn a model using different features observed at different sites without downloading all the data to a single location. Ensemble-based approaches described in the previous subsection usually generate high-variance local models[50] and fail to detect the interaction between features observed at different sites. This makes the problem fundamentally challenging.

In some applications, heterogeneous DML may not require detecting interactions between features from different sites. In other words, the underlying problem may be node-wise decomposable. This scenario is relatively easy to handle. An ensemble-based approach to learn distributed classifiers is likely to work well for this case. Even if the application does not involve distributed data, vertical partitioning of data for decomposing the learning problem into smaller sub-problems using a data parallel approach can speed up the process[51] . However, the assumption of node-wise decomposability is not necessarily correct in every application. In the general case, heterogeneous DML may require building classifiers using non-linearly interacting features from different sites.

The WoRLD system[52] also works by making some assumptions about the class of DML problems. It works by collecting first order statistics from the data. It considers the problem of concept learning from heterogeneous sites by developing an "activation spreading" approach. This approach first computes the cardinal distribution of the feature values in the individual data sets. Next, this distribution information is propagated across different sites. Features with strong correlations with the concept space are identified based on the first order statistics of the cardinal distribution. Since the technique is based on the first order statistical approximation of the underlying distribution, it may not be appropriate for machine learning problems where concept learning requires higher order statistics.

There exist a few DML algorithms that use an ensemble of classifiers for mining heterogeneous data sites. However, these techniques use special-purpose aggregation algorithms in order to handle some of the issues discussed earlier in this section. The aggregation technique proposed by Tumer and Ghosh[53] uses an order statistics-based approach for combining the high-variance local models generated from heterogeneous sites. The technique works by ordering the predictions of the different classifiers and using them in an appropriate manner. Their work developed several methods, including selecting an appropriate order statistic as the classifier and taking a linear combination of some of the order statistics ("spread" and "trimmed mean" classifiers), and it analyzes the error of such classifiers in various situations. Although these techniques are more robust than other ensemble-based models, they do not explicitly consider interactions across multiple sites.
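A generic sketch of order statistics-based combining is shown below; selecting a particular order statistic or trimming before averaging follows the general idea described above, while the function name and array layout are illustrative assumptions.

```python
import numpy as np

def order_statistic_combiner(probabilities, k=None, trim=0):
    """Combine per-classifier class-probability estimates with order statistics.

    probabilities: array of shape (n_classifiers, n_samples, n_classes).
    k: index of the order statistic to select (None selects the median);
    trim: number of extreme classifiers to drop from each end before averaging
          (trim > 0 gives a "trimmed mean" style combiner).
    """
    ordered = np.sort(probabilities, axis=0)  # order statistics per sample and class
    if trim > 0:
        combined = ordered[trim:probabilities.shape[0] - trim].mean(axis=0)
    else:
        k = probabilities.shape[0] // 2 if k is None else k
        combined = ordered[k]
    return combined.argmax(axis=1)            # predicted class per sample

probs = np.array([
    [[0.9, 0.1], [0.4, 0.6]],   # classifier 1
    [[0.6, 0.4], [0.2, 0.8]],   # classifier 2
    [[0.1, 0.9], [0.3, 0.7]],   # classifier 3 (an outlier on the first sample)
])
print(order_statistic_combiner(probs))          # median combiner
print(order_statistic_combiner(probs, trim=1))  # trimmed mean combiner
```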

Park and his colleagues[54] developed a technique to learn decision trees from heterogeneous, distributed sites. The approach can be classified as an ensemble-based approach. However, they also proposed a Fourier spectrum-based technique to aggregate the ensemble of decision trees. They note that any pattern involving features from different sites cannot be captured by the simple aggregation of local classifiers generated using only the local features. In order to detect such patterns, they first identify a subset of data that none of the local classifiers can classify with high confidence. This subset of the data is merged at the central site, and another classifier (central classifier) is constructed from it. When a combination of local classifiers cannot classify a new observation with a high confidence, the central classifier is used instead. This approach exhibits a better performance than a simple aggregation of local models. However, its performance is sensitive to the confidence threshold.

Kargupta and his colleagues considered a Collective framework to address data analysis in heterogeneous DML environments and proposed the CDM[55] framework. CDM can be deployed for learning classifiers and predictive models from distributed data. Instead of combining incomplete local models, it seeks to find globally meaningful pieces of information from each local site. In other words, it obtains local building blocks that directly constitute the global model. Given a set of labeled training data, CDM learns a function that approximates it. The foundation of CDM is based on the observation that any function can be represented in a distributed fashion using an appropriate set of basis functions. When the basis functions are orthonormal, the local analysis produces correct and useful results that can be directly used as a component of the global model without any loss of accuracy. The main steps of CDM can be summarized as follows:

  • Generate approximate orthonormal basis coefficients at each local site.
  • Move an appropriately chosen sample of the data sets from each site to a single site and generate the approximate basis coefficients corresponding to non-linear cross terms.
  • Combine the local models, transform the model into the user-described canonical representation, and output the model.

Here the non-linear terms represent a set of coefficients (or patterns) that cannot be determined at a local site. In essence, the performance of a CDM model depends on the quality of the estimated cross terms. Typically, CDM requires the exchange of a small sample that is often negligible compared to the entire data set.

The CDM approach was originally explored using two important classes of function induction problems: learning decision trees and multivariate regressors. Fourier and wavelet-based representations of functions have been proposed elsewhere[56] for constructing decision trees and multivariate regressors, respectively. The Fourier spectrum-based approach works by estimating the Fourier Coefficients (FCs) from the data. It estimates the local FCs from the local data, and the FCs involving features from different data sites from a selected small subset of data collected at the central site. It has been shown elsewhere[57][58] that one can easily compute the Fourier spectrum of a decision tree and vice versa. This observation can be exploited to construct decision trees from the estimated FCs. However, fast estimation of FCs from data is a non-trivial task. Estimation techniques usually work well when the data is uniformly distributed. This problem is addressed by the development of a resampling-based technique[59] for the estimation of the Fourier spectrum.
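For Boolean features, a Fourier coefficient is simply the sample correlation between the labels and a parity function over a feature subset, which a local site can estimate for subsets of its own features; a sketch under that assumption (with hypothetical helper names) follows.

```python
import numpy as np
from itertools import combinations

def fourier_coefficient(X, y, subset):
    """Estimate one Fourier coefficient of a Boolean-input function from samples.
    X: 0/1 feature matrix, y: +/-1 labels, subset: tuple of feature indices."""
    parity = (-1) ** X[:, list(subset)].sum(axis=1)  # parity basis function for the subset
    return float((y * parity).mean())

def low_order_spectrum(X, y, max_order=2):
    """All coefficients up to a given order. A local site can estimate the subsets that
    involve only its own features; the cross terms need the small merged sample."""
    spectrum = {(): float(y.mean())}
    for order in range(1, max_order + 1):
        for subset in combinations(range(X.shape[1]), order):
            spectrum[subset] = fourier_coefficient(X, y, subset)
    return spectrum

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 4))
y = np.where((X[:, 0] ^ X[:, 2]) == 1, 1, -1)         # a cross-feature parity pattern
spectrum = low_order_spectrum(X, y, max_order=2)
print(max(spectrum, key=lambda s: abs(spectrum[s])))  # the dominant coefficient is (0, 2)
```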

The collective multivariate regression[60] chooses a wavelet basis to represent the local data. For each feature in the data, a wavelet transformation is applied, and the significant coefficients are collected at the central site. Then the regression is performed directly on the wavelet coefficients. This approach offers a significant advantage in communication reduction, since a set of wavelet coefficients usually represents the raw data in a highly compressed format.
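A much-simplified sketch of this idea, assuming a plain Haar transform, a power-of-two number of rows, and an energy-based selection of coefficient positions (all simplifying assumptions, not the cited algorithm), is shown below; because the transform is orthonormal, keeping every coefficient would reproduce the ordinary least-squares fit exactly.

```python
import numpy as np

def haar(col):
    """Orthonormal Haar wavelet transform of a length-2^m vector."""
    c = np.asarray(col, dtype=float)
    detail = []
    while len(c) > 1:
        detail.append((c[0::2] - c[1::2]) / np.sqrt(2.0))
        c = (c[0::2] + c[1::2]) / np.sqrt(2.0)
    return np.concatenate([c] + detail[::-1])

def collective_regression(feature_blocks, y, keep=64):
    """Each site would transform its own columns; regression is then run on the
    coefficient positions with the most energy instead of on the raw data."""
    Wy = haar(y)
    WX = np.column_stack([haar(X[:, j]) for X in feature_blocks for j in range(X.shape[1])])
    energy = (WX ** 2).sum(axis=1) + Wy ** 2
    rows = np.argsort(energy)[::-1][:keep]       # "significant" coefficient positions
    beta, *_ = np.linalg.lstsq(WX[rows], Wy[rows], rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 256                                          # power of two for the plain Haar transform
full = rng.normal(size=(n, 6))
y = full @ np.array([2.0, 0.0, -1.0, 0.5, 0.0, 0.0]) + 0.01 * rng.normal(size=n)
blocks = [full[:, :3], full[:, 3:]]              # two heterogeneous sites
print(np.round(collective_regression(blocks, y), 2))
```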

Cohen et al.[61] developed an algorithm for distributed asynchronous deep neural network training using a single momentum buffer to mitigate gradient staleness. This approach tries to improve the scalability, stability, and speed of training large neural networks in a distributed environment.

Scaling Up Machine Learning Using High Performance Machines


The field of high-performance parallel and distributed computing is also closely related to DML in many ways. High-performance parallel computing environments are widely used for scaling up machine learning from very large data sets. There exists a large volume of high-performance machine learning and data mining literature[62] [63] [64] [65] [66] [67] [68] [69] .

Peer-to-Peer (P2P) DML


Algorithms for machine learning over P2P networks can be grouped in the following main areas: (1) heuristics-based, (2) broadcast-based, (3) gossip-based, and (4) local algorithms.

The P2P k-Means algorithm by Bandyopadhyay et al.[70] is an example of the heuristics-based approach. In this category of algorithms, usually a peer learns a model based on its own data and the data collected from its neighbors. Often, these algorithms do not come with accuracy guarantees.

Sharfman et al.[71] reported broadcast-based algorithms for P2P systems. Since these algorithms rely on broadcast-based communication, the communication cost usually increases quickly as the number of nodes increases.

Gossip algorithms rely on the properties of random samples to provide probabilistic guarantees on the accuracy of the results. Researchers have proposed various algorithms for computing data aggregates, such as the average, sum, and max on P2P networks using gossip-based techniques. Kempe et al.[72] and Boyd et al.[73] present such primitives. In gossip protocols, a peer exchanges data or statistics with a random peer. However, they can still be quite costly.
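A toy sketch of pairwise gossip averaging is given below; it illustrates the general mechanism (each exchange preserves the global sum, so all values drift toward the average) and is not any particular published protocol.

```python
import numpy as np

def gossip_average(values, n_rounds=500, seed=0):
    """Pairwise gossip: in each round a random pair of peers averages their values.
    Every exchange preserves the global sum, so all values converge to the mean."""
    rng = np.random.default_rng(seed)
    x = np.array(values, dtype=float)
    for _ in range(n_rounds):
        i, j = rng.choice(len(x), size=2, replace=False)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x

peers = [3.0, 7.0, 10.0, 20.0]
print(gossip_average(peers), "true average:", np.mean(peers))
```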

Local distributed algorithms have been proposed for data analysis in P2P networks. Local algorithms rely upon data-dependent conditions (local rules) in order to stop propagating messages. This means that if the data distribution does not change, the communication overhead is relatively low; on the other hand, the local rules are violated when the distribution changes. Local algorithms were first introduced in the context of graph theory by Afek et al.[74] and Linial[75]. Local algorithms have been developed for several data analysis problems over P2P networks: association rule mining[76], outlier detection[77], meta-classification[78], eigen-monitoring[79], decision tree induction[80], and top-k inner product monitoring[81].

Privacy Preserving Distributed Machine Learning


Privacy is an important issue in many applications. A growing body of literature on privacy-sensitive DML is emerging. These algorithms can be divided into different groups based on the model of privacy they adopt. One approach adopts a distributed framework with various supported models of privacy. On the other hand, there exist some approaches that add randomized perturbations to the data in such a way that the individual data values are distorted while still preserving the underlying distribution properties at a macroscopic level. The following part of this section briefly discusses these two approaches.

The distributed approach supports the computation of machine learning models and the extraction of "patterns" at a given node by exchanging only the minimal necessary information among the participating nodes, without transmitting the raw data. The field of DML[82] offers several distributed algorithms that are sensitive to privacy. For example, the meta-learning-based JAM system[83] was designed for analyzing multi-party distributed sensitive data, such as financial fraud detection. The Fourier spectrum-based approach to representing and constructing decision trees[84][85] and the collective hierarchical clustering[86] are examples of additional distributed machine learning algorithms that may have applications in privacy-preserving learning from distributed data. Several additional distributed techniques for analyzing multi-party data have been reported; examples include a privacy-preserving technique for constructing decision trees[87] proposed elsewhere[88], a secure multi-party computation framework[89], and association rule mining from homogeneous[90] and heterogeneous[91] distributed data sets. There also exists a collection of useful secure multi-party computation primitives (e.g., secure sum computation[92], secure scalar product computation) that can be used for developing distributed privacy-preserving machine learning algorithms.
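A minimal sketch of the classic ring-based secure sum primitive is shown below; the variable names and modulus are arbitrary choices for illustration, and a production protocol would also need to address collusion among nodes.

```python
import random

def secure_sum(local_values, modulus=10**9):
    """Ring-based secure sum sketch: the initiator masks its value with a random offset,
    every node adds its own value modulo the modulus, and the initiator removes the
    mask at the end. No node observes another node's raw value."""
    mask = random.randrange(modulus)
    running = (mask + local_values[0]) % modulus  # initiator adds its masked value
    for v in local_values[1:]:                    # the running total is passed around the ring
        running = (running + v) % modulus
    return (running - mask) % modulus             # initiator unmasks the global sum

print(secure_sum([120, 45, 300, 18]))  # 483
```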

A somewhat different group of algorithms works by first perturbing the data using randomized techniques. The perturbed data are then used to extract the patterns and models. The randomized value distortion technique for learning decision trees and association rules[93] is an example of this approach. Additional work on randomized masking of data can be found elsewhere[94]. However, Kargupta et al.[95] showed that simple additive noise may not be suitable for privacy protection.
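The additive-perturbation idea can be illustrated with a few lines of NumPy (an illustration only; the distributions and noise level are arbitrary assumptions): individual released values are distorted, yet simple aggregates such as the mean and the noise-corrected variance remain estimable, which is also why carefully designed filtering can sometimes partially undo such noise.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.exponential(scale=5.0, size=10_000)  # sensitive values held by a data owner
noise_sd = 4.0
perturbed = original + rng.normal(scale=noise_sd, size=original.size)  # what is released

# Individual released values are distorted, but aggregates remain estimable:
print("released mean:", perturbed.mean(), "true mean:", original.mean())
print("noise-corrected variance:", perturbed.var() - noise_sd**2, "true variance:", original.var())
```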

Federated Machine Learning


Federated machine learning is a restricted form of DML that primarily focuses on iterative deep learning from distributed data. More information about federated machine learning can be found elsewhere[96].
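A minimal federated averaging-style sketch is shown below, using a simple logistic-regression model in place of a deep network; the function names, learning rate, and round counts are illustrative assumptions rather than the algorithm of the cited work.

```python
import numpy as np

def client_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local gradient steps for a simple logistic-regression model."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def federated_averaging(clients, dim, rounds=50):
    """FedAvg-style loop: broadcast the global model, let each client train locally,
    then average the returned models weighted by local data size."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        updates = [(len(y), client_update(global_w, X, y)) for X, y in clients]
        total = sum(n for n, _ in updates)
        global_w = sum(n * w for n, w in updates) / total
    return global_w

rng = np.random.default_rng(0)
def make_client(n):
    X = rng.normal(size=(n, 3))
    return X, (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

clients = [make_client(n) for n in (200, 50, 120)]
w = federated_averaging(clients, dim=3)
print(np.round(w / np.linalg.norm(w), 2))  # direction roughly follows [1.5, -2.0, 0.5] normalized
```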

References

  1. ^ Kargupta H. and Sivakumar K. Existential Pleasures of Distributed Data Mining. In Data Mining: Next Generation Challenges and Future Directions, edited by H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha, MIT/AAAI Press, 2004.
  2. ^ Kargupta, H., & Chan, P. Advances in Distributed Data Mining. MIT/AAAI Press, 2000.
  3. ^ DML Bibliography. https://agnik.com/sparks/dmlbib.html
  4. ^ Chan, P., & Stolfo, S. J. A Comparative Evaluation of Voting and Meta-learning on Partitioned Data. In Proceedings of the Twelfth International Conference on Machine Learning (pp. 90–98), 1995. https://doi.org/10.1016/B978-1-55860-377-6.50020-7
  5. ^ M. Joshi, E. Han, G. Karypis, and V. Kumar. Parallel algorithms for data mining. In CRPC Parallel Computing Handbook. Morgan Kaufmann, 2000. https://hdl.handle.net/11299/215466
  6. ^ M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In SIGMOD Workshop on DMKD, Madison, WI, June 2002. https://doi.org/10.1109/TKDE.2004.45
  7. ^ H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the Privacy Preserving Properties of Random Data Perturbation Techniques. Proceedings of the IEEE International Conference on Data Mining, Melbourne, Florida, USA, pp. 99–106, 2003. https://doi.org/10.1109/ICDM.2003.1250908
  8. ^ Gilburd, Bobi, Assaf Schuster, and Ran Wolff. "Privacy-preserving data mining on data grids in the presence of malicious participants." Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, IEEE, 2004. https://doi.org/10.1109/HPDC.2004.1323540
  9. ^ McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2016). Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv preprint arXiv:1602.05629. https://doi.org/10.48550/arXiv.1602.05629
  10. ^ A. Joshi. To learn or not to learn. In Gerhard Weiß and Sandip Sen, editors, Adaption and Learning in Multi-Agent Systems, number 1042 in Lecture Notes in Computer Science: Lecture Notes in Artificial Intelligence, pages 127–139, New York, 1995. Springer-Verlag. Proceedings IJCAI'95 Workshop, Montreal, Canada, 1995.
  11. ^ Rumelhart, D. E., McClelland, J. L., & PDP Research Group. Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations. MIT Press, 1986. https://doi.org/10.7551/mitpress/5236.001.0001
  12. ^ Holland, J. H. Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. U Michigan Press, 1975.
  13. ^ David E. Goldberg, "Genetic Algorithms in Search, Optimization and Machine Learning," Kluwer Academic Publishers, Boston, MA, 1989.
  14. ^ Cantú-Paz, Erick. "A survey of parallel genetic algorithms." Calculateurs parallèles, réseaux et systèmes répartis 10.2 (1998): 141–171.
  15. ^ A. Joshi. To learn or not to learn. In Gerhard Weiß and Sandip Sen, editors, Adaption and Learning in Multi-Agent Systems, number 1042 in Lecture Notes in Computer Science: Lecture Notes in Artificial Intelligence, pages 127–139, New York, 1995. Springer-Verlag. Proceedings IJCAI'95 Workshop, Montreal, Canada, 1995.
  16. ^ H. Kargupta, I. Hamzaoglu, and B. Stafford. Scalable, distributed data mining using an agent based architecture. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, Proceedings of Knowledge Discovery And Data Mining, pages 211–214, Menlo Park, CA, 1997. AAAI Press.
  17. ^ S. Stolfo et al. Jam: Java agents for meta-learning over distributed databases. In Proceedings Third International Conference on Knowledge Discovery and Data Mining, pages 74–81, Menlo Park, CA, 1997. AAAI Press
  18. ^ H. Kargupta, W. Huang, Krishnamoorthy S., and E. Johnson. Distributed clustering using collective principal component analysis. Knowledge and Information Systems Journal Special Issue on Distributed and Parallel Knowledge Discovery, 3:422–448, 2001. https://doi.org/10.1007/PL00011677
  19. ^ Alexander Strehl and Joydeep Ghosh. Cluster ensembles – a knowledge reuse framework for combining partitionings. In Proceedings of AAAI 2002, Edmonton, Canada, July 2002. AAAI.
  20. ^ Eric Bauer and Ron Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1–2):105–139, 1999. https://doi.org/10.1023/A:1007515423169
  21. ^ M. Joshi, E. Han, G. Karypis, and V. Kumar. Parallel algorithms for data mining. In CRPC Parallel Computing Handbook. Morgan Kaufmann, 2000. https://hdl.handle.net/11299/215466
  22. ^ Schuster, Assaf, Ran Wolff, and Dan Trock. "A high-performance distributed algorithm for mining association rules." Knowledge and Information Systems 7.4 (2005): 458-475. https://doi.org/10.1007/s10115-004-0176-3
  23. ^ H. Kargupta, W. Huang, S. Krishnamrthy, B. Park, and S. Wang. Collective principal component analysis from distributed, heterogeneous data. In Proceedings of the Principals of Data Mining and Knowledge Discovery, May 2000. https://doi.org/10.1007/3-540-45372-5_50
  24. ^ H. Kargupta, W. Huang, Krishnamoorthy S., and E. Johnson. Distributed clustering using collective principal component analysis. Knowledge and Information Systems Journal Special Issue on Distributed and Parallel Knowledge Discovery, 3:422–448, 2001. https://doi.org/10.1007/PL00011677
  25. ^ G. Forman and B. Zhang. Distributed data clustering can be efficient and exact. In SIGKDD Explorations, volume 2, issue 2, pages 34–38. ACM Press, New York, 2000.
  26. ^ B. Zhang, M. Hsu, and G. Forman. Accurate recasting of parameter estimation algorithms using sufficient statistics for efficient parallel speed-up: Demonstrated for center-based data clustering algorithms. In PKDD, September 2000. https://doi.org/10.1007/3-540-45372-5_24
  27. ^ N.F. Samatova, G. Ostrouchov, A. Geist, and A. Melechko. Rachet: An efficient cover-based merging of clustering hierarchies from distributed datasets. An International Journal of Distributed and Parallel Databases, 11(2):157–180, 2002. https://doi.org/10.1023/A:1013988102576
  28. ^ S. Parthasarathy and M. Ogihara. Clustering distributed homogeneous datasets. In PKDD, pages 566–574, 2000. https://doi.org/10.1007/3-540-45372-5_67
  29. ^ Eric Bauer and Ron Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1–2):105–139, 1999. https://doi.org/10.1023/A:1007515423169
  30. ^ Inderjit Dhillon and Dharmendra Modha. A data-clustering algorithm on distributed memory multiprocessors. In Proceedings of the KDD’99 Workshop on High Performance Knowledge Discovery, pages 245–260, 1999. https://doi.org/10.1023/A:1007612920971
  31. ^ B. Zhang, M. Hsu, and G. Forman. Accurate recasting of parameter estimation algorithms using sufficient statistics for efficient parallel speed-up: Demonstrated for center-based data clustering algorithms. In PKDD, September 2000. https://doi.org/10.1007/3-540-45372-5_24
  32. ^ S. McClean, B. Scotney, and K. Greer. Conceptual clustering heterogeneous distributed databases. In Workshop on Distributed and Parallel Knowledge Discovery, Boston, MA, USA, 2000.
  33. ^ H. Kargupta, W. Huang, Krishnamoorthy S., and E. Johnson. Distributed clustering using collective principal component analysis. Knowledge and Information Systems Journal Special Issue on Distributed and Parallel Knowledge Discovery, 3:422–448, 2001. https://doi.org/10.1007/PL00011677
  34. ^ E. Johnson and H. Kargupta. Collective, hierarchical clustering from distributed, heterogeneous data. In Lecture Notes in Computer Science, volume 1759, pages 221–244. Springer-Verlag, 1999. https://doi.org/10.1007/3-540-46502-2_12
  35. ^ Alexander Strehl and Joydeep Ghosh. Cluster ensembles – a knowledge reuse framework for combining partitionings. In Proceedings of AAAI 2002, Edmonton, Canada, July 2002. AAAI.
  36. ^ M. Sayal and P. Scheuermann. A distributed clustering algorithm for web-based access patterns. In Workshop on Distributed and Parallel Knowledge Discovery at KDD-2000, pages 41–48, Boston, 2000.
  37. ^ Eric Bauer and Ron Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1–2):105–139, 1999. https://doi.org/10.1023/A:1007515423169
  38. ^ Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization. Machine Learning, 40(2):139–158, 2000. https://doi.org/10.1023/A:1007607513941
  39. ^ Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization. Machine Learning, 40(2):139–158, 2000. https://doi.org/10.1023/A:1007607513941
  40. ^ David Opitz and Richard Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11:169–198, 1999. https://doi.org/10.1613/jair.614
  41. ^ Wei Fan, Sal Stolfo, and Junxin Zhang. The application of Adaboost for distributed, scalable and on-line learning. In Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 362–366, San Diego, California, 1999
  42. ^ Leo Breiman. Pasting small votes for classification in large databases and online. Machine Learning, 36(1–2):85–103, 1999. https://doi.org/10.1023/A:1007563306331
  43. ^ D. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992. https://doi.org/10.1016/S0893-6080(05)80023-1
  44. ^ K.M. Ting and B.T. Low. Model combination in the multiple-data-base scenario. In 9th European Conference on Machine Learning, pages 250–265, 1997. https://doi.org/10.1007/3-540-62858-4_90
  45. ^ P. Chan and S. Stolfo. Experiments on multistrategy learning by meta-learning. In Proceeding of the Second International Conference on Information Knowledge Management, pages 314–323, 1993.
  46. ^ P. Chan and S. Stolfo. Toward scalable learning with non-uniform class and cost distribution: A case study in credit card fraud detection. In Proceeding of the Fourth International Conference on Knowledge Discovery and Data Mining. AAAI Press, September 1998.
  47. ^ Y. Guo and J. Sutiwaraphun. Distributed learning with knowledge probing: A new framework for distributed data mining. In Hillol Kargupta and Phillip Chan, editors, Advances in Distributed and Parallel Knowledge Discovery, pages 113–131. AAAI Press, 2000. https://doi.org/10.1007/s100440200035
  48. ^ W. Lee, S. Stolfo, and Kui Mok. Adaptive intrusion detection: a data mining approach. Artificial Intelligence Review, 14(6):533–567, December 2000. https://doi.org/10.1023/A:1006624031083
  49. ^ S. Stolfo et al. Jam: Java agents for meta-learning over distributed databases. In Proceedings Third International Conference on Knowledge Discovery and Data Mining, pages 74–81, Menlo Park, CA, 1997. AAAI Press.
  50. ^ K. Tumer and J. Ghosh. Robust order statistics based ensemble for distributed data mining. In Hillol Kargupta and Philip Chan, editors, Advances in Distributed and Parallel Knowledge Discovery. MIT, 2000. https://www.researchgate.net/publication/2603941_Distributed_Data_Mining_Bibliography
  51. ^ F. J. Provost and B. Buchanan. Inductive policy: The pragmatics of bias selection. Machine Learning, 20:35–61, 1995. https://doi.org/10.1023/A:1022634118255
  52. ^ J. Aronis, V. Kolluri, F. Provost, and B. Buchanan. The WoRLD: Knowledge discovery and multiple distributed databases. In Proceedings of Florida Artificial Intelligence Research Symposium (FLAIRS-97), 1997. https://www.researchgate.net/publication/2685085_The_WoRLD_Knowledge_Discovery_from_Multiple_Distributed_Databases
  53. ^ K. Tumer and J. Ghosh. Robust order statistics based ensemble for distributed data mining. In Hillol Kargupta and Philip Chan, editors, Advances in Distributed and Parallel Knowledge Discovery. MIT, 2000. https://www.researchgate.net/publication/2603941_Distributed_Data_Mining_Bibliography
  54. ^ B. Park, H. Kargupta, E. Johnson, E. Sanseverino, D. Hershberger, and L. Silvestre. Distributed, collaborative data analysis from heterogeneous sites using a scalable evolutionary technique. Applied Intelligence, 16(1), January 2002. https://doi.org/10.1023/A:1012813326519
  55. ^ H. Kargupta, B. Park, D. Hershberger, and E. Johnson. Collective data mining: a new perspective towards distributed data mining. In Hillol Kargupta and Philip Chan, editors, Advances in Distributed and Parallel Knowledge Discovery. AAI/MIT Press, 2000. https://www.researchgate.net/publication/2631605_Collective_Data_Mining_A_New_Perspective_Toward_Distributed_Data_Mining
  56. ^ Daryl E. Hershberger and Hillol Kargupta. Distributed multivariate regression using wavelet-based collective data mining. Journal of Parallel and Distributed Computing, 61(3):372–400, 2001. https://doi.org/10.1006/jpdc.2000.1694
  57. ^ H. Kargupta, K. Sivakumar, and S. Ghosh. Dependency detection in mobimine and random matrices. In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 250–262. Springer, 2002. https://doi.org/10.1007/3-540-45681-3_21
  58. ^ B. Park, R. Ayyagari, and H. Kargupta. A Fourier analysis-based approach to learn classifier from distributed heterogeneous data. In Proceedings of the First SIAM International Conference on Data Mining, Chicago, US, 2001. https://doi.org/10.1137/1.9781611972719.19
  59. ^ R. Ayyagari and H. Kargupta. A resampling technique for learning the Fourier spectrum of skewed data. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD’2002), Madison, WI, June 2002. https://www.researchgate.net/publication/2529306_A_Resampling_Technique_for_Learning_the_Fourier_Spectrum_of_Skewed_Data
  60. ^ Daryl E. Hershberger and Hillol Kargupta. Distributed multivariate regression using wavelet-based collective data mining. Journal of Parallel and Distributed Computing, 61(3):372–400, 2001. https://doi.org/10.1006/jpdc.2000.1694
  61. ^ Cohen, Refael, Ido Hakimi, and Assaf Schuster. "SMEGA2: distributed asynchronous deep neural network training with a single momentum buffer." Proceedings of the 51st International Conference on Parallel Processing. 2022. https://doi.org/10.1145/3545008.3545010
  62. ^ Chan, P., & Stolfo, S. J. A Comparative Evaluation of Voting and Meta-learning on Partitioned Data. In Proceedings of the Twelfth International Conference on Machine Learning (pp. 90–98), 1995. https://doi.org/10.1016/B978-1-55860-377-6.50020-7
  63. ^ Alex A. Freitas and Simon H. Lavington. Mining Very Large Databases With Parallel Processing. Kluwer Academic Publishers, 1998.
  64. ^ J. Han, J. Chiang, S. Chee, J. Chen, Q. Chen, S. Cheng, W. Gong, M. Kamber, K. Koperski, G. Liu, Y. Lu, N. Stefanovic, L. Winstone, B. Xia, O. R. Zaiane, S. Zhang, and H. Zhu. DBMiner: A system for data mining in relational databases and data warehouses. In Proc. CASCON'97: Meeting of Minds, Toronto, Canada, November 1997.
  65. ^ M. Joshi, E. Han, G. Karypis, and V. Kumar. Parallel algorithms for data mining. In CRPC Parallel Computing Handbook. Morgan Kaufmann, 2000. https://hdl.handle.net/11299/215466
  66. ^ K. Kamath and R. Musick. Scalable data mining through fine-grained parallelism: The present and the future. In H. Kargupta and P. Chan, editors, Advances in Distributed and Parallel Knowledge Discovery, pages 29–77. MIT Press, 2000.
  67. ^ S. Parthasarathy, M. Zaki, M. Ogihara, and W. Li. Parallel data mining for association rules on shared-memory systems. Knowledge and Information Systems, 3(1):1–29, 2001. https://doi.org/10.1007/PL00011656
  68. ^ M. J. Zaki, S. Parthasarathy, and W. Li. A localized algorithm for parallel association mining. In 9th ACM Symp. Parallel Algorithms and Architectures, pages 321–330, 1997. https://doi.org/10.1145/258492.258524
  69. ^ M. J. Zaki et al. Parallel data mining for association rules on shared memory multi-processors. In Supercomputing '96, 1996. https://doi.org/10.1145/369028.369117
  70. ^ S. Bandyopadhyay, C. Giannella, U. Maulik, H. Kargupta, K. Liu, and S. Datta. Clustering distributed data streams in peer-to-peer environments. Information Sciences, 176(14):1952–1985, 2006. https://doi.org/10.1016/j.ins.2005.11.007
  71. ^ I. Sharfman, A. Schuster, and D. Keren. A geometric approach to monitoring threshold functions over distributed data streams. In ACM SIGMOD, pages 301–312, Chicago, Illinois, June 2006. https://doi.org/10.1145/1292609.1292613
  72. ^ D. Kempe, A. Dobra, and J. Gehrke. Computing aggregate information using gossip. In FOCS, Cambridge, 2003.
  73. ^ S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Gossip algorithms: Design, analysis and applications. In Proceedings IEEE Infocom, pages 1653–1664, Miami, March 2005. https://doi.org/10.1109/INFCOM.2005.1498447
  74. ^ Y. Afek, S. Kutten, and M. Yung. Local detection for global self stabilization. Theoretical Computer Science, 186(1–2):199–230, October 1997. https://doi.org/10.1016/S0304-3975(96)00286-1
  75. ^ N. Linial. Locality in distributed graph algorithms. SIAM Journal of Computing, 21:193–201, 1992. https://doi.org/10.1137/0221015
  76. ^ R. Wolff and A. Schuster. Association rule mining in peer-to-peer systems. In ICDM, 2003. https://doi.org/10.1109/TSMCB.2004.836888
  77. ^ J. Branch, B. Szymanski, C. Giannella, R. Wolff, and H. Kargupta. In-network outlier detection in wireless sensor networks. In Proceedings of ICDCS'06, Lisbon, Portugal, July 2006. https://doi.org/10.1007/s10115-011-0474-5
  78. ^ Ping Luo, Hui Xiong, Kevin Lu, and Zhongzhi Shi. Distributed classification in peer-to-peer networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 968–976, 2007. https://doi.org/10.1145/1281192.1281296
  79. ^ K. Bhaduri, K. Das, K. Borne, C. Giannella, T. Mahule, H. Kargupta. (2011) Scalable, Asynchronous, Distributed Eigen-Monitoring of Astronomy Data Streams. Statistical Analysis and Data Mining Journal. Volume 4, Issue 3, pp. 336-352. June 2011. https://doi.org/10.1002/sam.10120
  80. ^ K. Bhaduri, R. Wolff, C. Giannella, H. Kargupta. (2008). Distributed Decision Tree Induction in Peer-to-Peer Systems. Statistical Analysis and Data Mining. Volume 1, Issue 2, pp. 85-103. https://doi.org/10.1002/sam.10006
  81. ^ K. Das, K Bhaduri, K. Liu, and H. Kargupta. Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network. IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 20, No. 4, pp. 475-488, April 2008. https://doi.org/10.1109/TKDE.2007.190714
  82. ^ H. Kargupta, B. Park, D. Hershberger, and E. Johnson. Collective data mining: a new perspective towards distributed data mining. In Hillol Kargupta and Philip Chan, editors, Advances in Distributed and Parallel Knowledge Discovery. AAI/MIT Press, 2000. https://www.researchgate.net/publication/2631605_Collective_Data_Mining_A_New_Perspective_Toward_Distributed_Data_Mining
  83. ^ S. Stolfo et al. Jam: Java agents for meta-learning over distributed databases. In Proceedings Third International Conference on Knowledge Discovery and Data Mining, pages 74–81, Menlo Park, CA, 1997. AAAI Press.
  84. ^ H. Kargupta and B. Park. Mining Time-Critical Data Streams from Mobile Devices Using Decision Trees and Their Fourier Spectrum. Accepted for publication in IEEE Transactions on Knowledge and Data Engineering, 2003.
  85. ^ B. Park, R. Ayyagari, and H. Kargupta. A Fourier analysis-based approach to learn classifier from distributed heterogeneous data. In Proceedings of the First SIAM International Conference on Data Mining, Chicago, US, 2001. https://doi.org/10.1137/1.9781611972719.19
  86. ^ E. Johnson and H. Kargupta. Collective, hierarchical clustering from distributed, heterogeneous data. In Lecture Notes in Computer Science, volume 1759, pages 221–244. Springer-Verlag, 1999. https://doi.org/10.1007/3-540-46502-2_12
  87. ^ J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986. https://doi.org/10.1007/BF00116251
  88. ^ Y. Lindell and B. Pinkas. Privacy preserving data mining. Lecture Notes in Computer Science, 1880:36–54, 2000. https://doi.org/10.1007/3-540-44598-6_3
  89. ^ W. Du and M. J. Atallah. Secure multi-party computation problems and their applications: A review and open problems. In New Security Paradigms Workshop, pages 11 – 20, 2001. https://doi.org/10.1145/508171.508174
  90. ^ M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In SIGMOD Workshop on DMKD, Madison, WI, June 2002. https://doi.org/10.1109/TKDE.2004.45
  91. ^ J. Vaidya and C. Clifton. Privacy preserving association rule mining in vertically partitioned data, July 2002. https://doi.org/10.1145/775047.775142
  92. ^ B. Schneier. Applied cryptography. John Wiley and Sons, 1995.
  93. ^ S. Evfimievski. Randomization techniques for privacy preserving association rule mining. In SIGKDD Explorations, volume 4(2), Dec 2002. https://doi.org/10.1145/772862.772869
  94. ^ J. F. Traub, Y. Yemini, and H. Woźniakowski. The statistical security of a statistical database. ACM Transactions on Database Systems (TODS), 9(4):672–679, 1984. https://doi.org/10.1145/1994.383392
  95. ^ H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the Privacy Preserving Properties of Random Data Perturbation Techniques. Proceedings of the IEEE International Conference on Data Mining, Melbourne, Florida, USA, pp. 99–106, 2003. https://doi.org/10.1109/ICDM.2003.1250908
  96. ^ McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2016). Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv preprint arXiv:1602.05629. https://doi.org/10.48550/arXiv.1602.05629
