276°
Posted 20 hours ago

Mining of Massive Datasets

£9.99 Clearance
Shared by ZTS2023 · Joined in 2023

About this deal

Together with each chapter there is also a set of lecture slides that the authors use for teaching Stanford CS246: Mining Massive Datasets. The focus of the book is on data mining (on large datasets) as opposed to machine learning. The distinction may strike the reader as somewhat arbitrary, given the degree of interaction between these two fields, but the authors justify it in terms of a focus on algorithms that can be applied directly to data. Although these include what is known in machine-learning circles as "unsupervised learning," the book draws most heavily on database and information-retrieval sources. The first two chapters cover the relevant concepts and tools from these main sources, along with preliminaries on statistical modeling and hash functions, the latter being pervasive throughout the book. The MapReduce programming model is naturally given a prominent place and is explained in great detail. Comments are most welcome. Please let us know if you are using these materials in your course and we will list and link to your course.
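The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is a single-process toy, not the distributed framework; the word-count mapper and reducer are the standard illustrative example:

```python
from collections import defaultdict

def map_word_count(document):
    # Map step: emit a (key, value) pair for every word seen.
    for word in document.split():
        yield (word, 1)

def reduce_word_count(word, counts):
    # Reduce step: combine all values that share a key.
    return (word, sum(counts))

def map_reduce(documents, mapper, reducer):
    # The "framework": run the mapper, group pairs by key (the shuffle),
    # then hand each key and its value list to the reducer.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

docs = ["the cat sat", "the dog sat"]
print(map_reduce(docs, map_word_count, reduce_word_count))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

In a real deployment the grouping (shuffle) and the per-key reduce calls run in parallel across many machines; the mapper and reducer stay exactly this simple.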

It contains new material on Spark, TensorFlow, minhashing, community finding, SimRank, graph algorithms, and decision trees. To support deeper exploration, most of the chapters are supplemented with further-reading references. For the CS246 course itself, good knowledge of Java and Python will be extremely helpful, since most assignments require the use of Spark. Familiarity with basic probability theory is expected (CS109 or Stat116 or equivalent is sufficient but not necessary), as is familiarity with basic linear algebra (e.g., any of Math 51, Math 103, Math 113, CS 205, or EE 263 would be much more than necessary). You can earn the Stanford Mining Massive Datasets graduate certificate by completing a sequence of four Stanford Computer Science courses. A graduate certificate is a great way to keep the skills and knowledge in your field current. More information is available at the Stanford Center for Professional Development (SCPD).

Clustering is the process of examining a collection of "points" and grouping them into "clusters" according to some distance measure. The goal is that points in the same cluster have a small distance from one another, while points in different clusters are at a large distance from one another. A suggestion of what clusters might look like was seen in Fig. 1.1. However, there the intent was that there were three clusters around three different road intersections, but two of the clusters blended into one another because they were not sufficiently separated.

CS246: Mining Massive Datasets is a graduate-level course that discusses data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis is on MapReduce as a tool for creating parallel algorithms that can process very large amounts of data. You will then be able to create a class using these materials; manuals explaining the use of the system are available. Note that the slides do not necessarily cover all the material covered in the corresponding chapters.


Lecture slides will be posted here shortly before each lecture. If you wish to view slides further in advance, refer to the 2022 course offering's slides, which are mostly similar.

To begin, we introduce the "market-basket" model of data, which is essentially a many-many relationship between two kinds of elements, called "items" and "baskets," but with some assumptions about the shape of the data. The frequent-itemsets problem is that of finding sets of items that appear in (are related to) many of the same baskets. Next, we consider approximate algorithms that work faster but are not guaranteed to find all frequent itemsets. Also in this class of algorithms are those that exploit parallelism, including the parallelism we can obtain through a MapReduce formulation. Finally, we discuss briefly how to find frequent itemsets in a data stream.

The introduction is followed by the book's main topics, starting with a chapter on techniques for assessing the similarity of data items in large datasets. This covers the similarity and distance measures used in conventional applications, but with special emphasis on the techniques needed to render these measures applicable to large-scale data processing. This approach is nicely illustrated by the use of min-hash functions to approximate Jaccard similarity. The next chapter focuses on mining data streams, including sampling, Bloom filters, counting, and moment estimation.

This is the second edition of the book. There are three new chapters, on mining large graphs, dimensionality reduction, and machine learning. There is also a revised Chapter 2 that treats MapReduce programming in a manner closer to how it is used in practice.
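The min-hash idea mentioned above can be shown in a short sketch: the fraction of hash functions on which two sets share the same minimum hash value estimates their Jaccard similarity. The sets and hash parameters below are invented for the example, and simple linear hashes stand in for random permutations:

```python
import random

def jaccard(a, b):
    """Exact Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)

# Each (a, b) pair defines h(x) = (a*hash(x) + b) mod P, a cheap stand-in
# for a random permutation of the element universe.
P = 10007  # a prime modulus for the toy hash family
rnd = random.Random(42)
coeffs = [(rnd.randrange(1, P), rnd.randrange(P)) for _ in range(200)]

def minhash_signature(s):
    """The minimum hash value of the set under each hash function."""
    return [min((a * hash(x) + b) % P for x in s) for a, b in coeffs]

def estimate_jaccard(sig_a, sig_b):
    # The probability that two sets share a minimum under a random
    # permutation equals their Jaccard similarity, so the agreement
    # rate across many hash functions estimates it.
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)

A = {"apple", "banana", "cherry", "date"}
B = {"banana", "cherry", "date", "fig"}
print(jaccard(A, B))  # exact: 0.6
print(estimate_jaccard(minhash_signature(A), minhash_signature(B)))
```

The point of the signature is compression: two hundred small integers replace an arbitrarily large set, yet similarity can still be estimated from signatures alone.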

Asda Great Deal

Free UK shipping. 15 day free returns.