10.20.2005 A Hadoop Primer Feb 2011

DESCRIPTION

A simple introduction to Hadoop talk given to the Maine Java Users' Group February 15, 2011.

Page 1: A Hadoop Primer

10.20.2005

A Hadoop Primer

Feb 2011

Page 2

http://redmonk.com/public/hadoop.pdf

Page 3

The Background

Page 4

October, 2003

Page 5

December, 2004

Page 6

Map::Reduce

Page 7

Job::Map Reduce::Output
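
Read as a pipeline, a job's input passes through a map phase and a reduce phase to produce output. A minimal single-process sketch of that flow in plain Java follows. It is not Hadoop's API: MiniJob, run, and the mapper and reducer lambdas are hypothetical names, and the sorted TreeMap only imitates the shuffle-and-sort step a real cluster performs between the two phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Toy, in-memory model of a MapReduce job: input -> map -> shuffle/sort -> reduce -> output.
// Hypothetical names throughout; this is not Hadoop's API.
class MiniJob {

    // mapper: gets one input item plus an "emit" callback for intermediate (key, value) pairs.
    // reducer: gets one key with every value emitted for it, and returns that key's output.
    static <I, K extends Comparable<K>, V, O> Map<K, O> run(
            List<I> input,
            BiConsumer<I, BiConsumer<K, V>> mapper,
            BiFunction<K, List<V>, O> reducer) {

        // Map + shuffle: collect every emitted value under its key.
        // TreeMap keeps keys sorted, mimicking the sort that precedes the reduce phase.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I item : input) {
            mapper.accept(item, (k, v) -> grouped.computeIfAbsent(k, key -> new ArrayList<>()).add(v));
        }

        // Reduce: one call per distinct key, in sorted key order.
        Map<K, O> output = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            output.put(e.getKey(), reducer.apply(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Example job: count words by their first letter.
        Map<Character, Integer> firstLetters = run(
                List.of("map reduce", "hadoop map"),
                (String line, BiConsumer<Character, Integer> emit) -> {
                    for (String word : line.split("\\s+")) {
                        emit.accept(word.charAt(0), 1);
                    }
                },
                (Character letter, List<Integer> ones) -> ones.size());
        System.out.println(firstLetters); // {h=1, m=2, r=1}
    }
}
```

Only the mapper and reducer change from job to job; the grouping between them is the framework's responsibility, which is the division of labor Hadoop provides at cluster scale.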

Page 8

Counting Shakespeare
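
Counting words is the classic first MapReduce program. Below is a minimal single-JVM sketch, assuming tokens are lower-cased and split on anything other than letters and apostrophes: map emits (word, 1) for each word on a line, the driver groups the pairs by word, and reduce sums each group. ShakespeareWordCount and its methods are hypothetical; in Hadoop the same logic would live in Mapper and Reducer subclasses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Word count simulated in one JVM. Hypothetical class, not Hadoop code.
class ShakespeareWordCount {

    // Map phase: one line of text -> a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all the counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Driver: map every line, shuffle (group by word, sorted), then reduce each group.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffled.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> hamlet = List.of(
                "To be, or not to be, that is the question:",
                "Whether 'tis nobler in the mind to suffer");
        System.out.println(count(hamlet));
    }
}
```

Run on those two lines from Hamlet, the result includes to=3, be=2 and the=2. On a cluster, each mapper would see only its own slice of the plays, and the framework would route every occurrence of a word to the same reducer.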

Page 9

The Birth of Hadoop

Page 10

Page 11

Page 12

Project Architecture

Source: Running Hadoop On Ubuntu Linux, Michael G. Noll, 8.8.07
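
In Hadoop's storage layer, HDFS, each file is split into large fixed-size blocks (64 MB by default in the 0.20 releases of this era), and a NameNode records which DataNodes hold a copy of each block, three copies by default. The sketch below models only that bookkeeping, under a deliberately simplified assumption: replicas are assigned round-robin, whereas real HDFS uses a rack-aware placement policy. BlockPlacementSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of HDFS-style block splitting and replica assignment.
// ASSUMPTION: round-robin placement for illustration only; real HDFS is rack-aware.
class BlockPlacementSketch {

    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB, the Hadoop 0.20-era default
    static final int REPLICATION = 3;                  // default replication factor

    // Blocks needed for a file of the given size; the last block may be partial.
    static long blockCount(long fileBytes) {
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // For each block, pick REPLICATION distinct DataNodes, rotating the starting node per block.
    static List<List<String>> place(long fileBytes, List<String> dataNodes) {
        if (dataNodes.size() < REPLICATION) {
            throw new IllegalArgumentException("need at least " + REPLICATION + " DataNodes");
        }
        List<List<String>> placement = new ArrayList<>();
        long blocks = blockCount(fileBytes);
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < REPLICATION; r++) {
                replicas.add(dataNodes.get((int) ((b + r) % dataNodes.size())));
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4"); // hypothetical node names
        long fileBytes = 200L * 1024 * 1024;                       // a 200 MB file
        List<List<String>> p = place(fileBytes, nodes);
        for (int i = 0; i < p.size(); i++) {
            System.out.println("block " + i + " -> " + p.get(i));
        }
    }
}
```

With four DataNodes, the 200 MB file needs four blocks, and block 0 lands on dn1, dn2 and dn3. Because every block exists on several machines, a map task can usually be scheduled on a node that already holds its input, and losing one node does not lose data.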

Page 13

Project Traction

Page 14

Employment Potential

Page 15

Hadoop Users

Page 16

Why Hadoop?

Page 17

More Machines = More Faster
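
The scale-out idea is that when input can be split into independent pieces, each piece can be processed on a different machine and the partial results combined afterwards. A minimal sketch of that pattern, assuming threads in one JVM as stand-ins for machines and a sum of squares as a stand-in workload; PartitionedSum is a hypothetical name, and no particular speedup is claimed, since that depends on the workload and hardware.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Partition -> process independently -> combine. Threads stand in for machines.
class PartitionedSum {

    // Sums i*i for i in 1..n by splitting the range across `workers` threads.
    static long sumOfSquares(long n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long chunk = (n + workers - 1) / workers; // ceiling division so every value is covered
            List<Future<Long>> partials = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                long from = w * chunk + 1;
                long to = Math.min(n, (w + 1) * chunk);
                // Each worker sees only its own slice, like a map task on its own input split.
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long i = from; i <= to; i++) {
                        sum += i * i;
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            // Combine the partial results, like a reduce step.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(sumOfSquares(n, 1)); // one worker
        System.out.println(sumOfSquares(n, 8)); // eight workers, same answer
    }
}
```

Both calls print 333333833333500000, which matches n(n+1)(2n+1)/6 for n = 1,000,000: partitioning changes how the work is spread, not the answer. Hadoop applies the same split-and-combine shape across machines rather than threads.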

Page 18

The reason everyone knows

Page 19

BIG DATA

Page 20

“The big issue is not that everyone will suddenly operate at petabyte scale; a lot of folks do not have that much data.

The more important topics are the specifics of the storage and processing infrastructure and what approaches best suit each problem.”

- Bradford Cross, Flightcaster/Woven

Page 21

The reason not everyone knows

Page 22

Unstructured Data

Page 23

What Hadoop Is

Page 24

  • “build Amazon's product search indices”
  • “build the recommender system for behavioral targeting”
  • “ETL style processing and statistics generation”
  • “information extraction & search”
  • “searching and analysis of millions of rental bookings”
  • “we use Hadoop to summarize of user's tracking data”
  • “we use Hadoop to store ad serving logs”
  • “the freedom to query the data in an ad-hoc manner”
  • “generating web graphs on 100 nodes”
  • “we use Hadoop for batch-processing large RDF datasets”
  • “facial similarity and recognition across large datasets”
  • “We are using Hadoop and Nutch to crawl Blog posts”
  • “Used for ETL & data analysis on terascale datasets”

Source: http://wiki.apache.org/hadoop/PoweredBy

Page 25

What Hadoop Isn't

Page 26

A relational database killer

No Yes

Page 27

Beyond Hadoop

Page 28

The Hadoop Ecosystem

Page 29

What We Use Hadoop For

Page 30

Crawling Largeish Unstructured Datasets

Page 31

Like 1.3M StackOverflow Questions

Page 32

Or 1.7M HackerNews Entries

Page 33

Or Years of Apache Log Files
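
Log analysis fits the same map/reduce shape: map each line to a key (here, the HTTP status code) and reduce by counting. A sketch assuming Apache's Common Log Format; the Combined format only appends fields, so the pattern matches it too. StatusCodeCount is hypothetical; the first sample line is the Common Log Format example from the Apache httpd documentation, and the second is made up for illustration. In a single process the shuffle and reduce collapse into one Map.merge call.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counting HTTP status codes in Apache access logs, as a map step plus a reduce step.
// ASSUMPTION: Common Log Format (or Combined). Hypothetical class name.
class StatusCodeCount {

    // host ident user [time] "request" status bytes ...
    private static final Pattern CLF =
            Pattern.compile("^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    // Map: one log line -> its status code, or null if the line does not parse.
    static String mapStatus(String line) {
        Matcher m = CLF.matcher(line);
        return m.find() ? m.group(6) : null;
    }

    // Shuffle + reduce: tally each status code; malformed lines are skipped.
    static Map<String, Integer> countByStatus(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String status = mapStatus(line);
            if (status != null) {
                counts.merge(status, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326",
                "127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] \"GET /missing.html HTTP/1.0\" 404 209",
                "not a log line");
        System.out.println(countByStatus(log)); // {200=1, 404=1}
    }
}
```

Across years of logs, each mapper would parse its own share of the files, and the per-code totals would be summed by the reducers.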

Page 34

How to Get Started

Page 35

We use Cloudera

Page 36

Mostly because it's easy

Page 37

This easy

Page 38

Or if you prefer

Page 39

Or maybe this

Page 40

QUESTIONS

Page 41

Student? Talk to us