9871: Information Theory and Coding (May 3, 2011)


  • INFORMATION THEORY and CODING

    Founded by Shannon in 1948.

    This theory concerns the transmission of information over a channel, or its recording in a storage medium.

    The channel can be a wireless or wired channel (copper telephone or fiber-optic cables), or a storage medium such as magnetic or optical disks.

    There are three aspects:

    Efficiency (Compression)

    Reliability (Error Detection and Correction)

    Security (Cryptography)

    Information Theory is based on Probability Theory.

    Coding Theory makes use of finite fields, linear algebra, combinatorial design, etc.

  • Shannon's Information Theory and Coding

    [Block diagram: the source emits sent messages; the encoder performs source coding (compression, characterized by the source entropy) and channel coding (error detection and correction, characterized by the channel capacity); the resulting symbols (e.g., 0110101001110) pass through the channel; the decoder performs channel decoding and source decoding (decompression) to deliver the received messages to the receiver.]

  • Overview of Digital Communication and Storage Systems

    [Block diagram: Information Source -> m -> Encoder -> c -> Channel -> r -> Decoder -> m̂ -> Information Sink]

    GOAL: Information conveyed through (or stored in) the channel must be reproduced at the destination as reliably as possible. At the same time, the system must allow the transmission of as much information as possible per unit time (communication system) or per unit of storage (storage system).

    Channel: produces a received signal r which differs from the original signal c (the channel introduces noise, distortion, etc.).

    The decoder can only produce an estimate m̂ of the original message m.
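
    To make the role of the channel concrete, here is a minimal Python sketch (assuming a binary symmetric channel, a model the slides do not specify) in which each transmitted bit of c is flipped independently with a small crossover probability, producing the received signal r.

        import random

        def binary_symmetric_channel(c, flip_prob=0.05, seed=None):
            """Flip each bit of the codeword c independently with probability flip_prob."""
            rng = random.Random(seed)
            return [bit ^ (rng.random() < flip_prob) for bit in c]

        # The decoder only sees r, which may differ from c in a few positions.
        c = [0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0]
        r = binary_symmetric_channel(c, flip_prob=0.1, seed=42)
        print("sent:    ", c)
        print("received:", r)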

  • Information Source

    The source message m consists of a time sequence of symbols emitted by the information source.

    The source can be:

    Continuous-time Source, if the message is continuous in time, e.g., a speech waveform.

    Discrete-time Source, if the message is discrete in time, e.g., data sequences from a computer.

    The symbols emitted by the source can be:

    Continuous-amplitude, e.g., speech waveform.

    Discrete-amplitude, e.g., text with a finite symbol alphabet.

    This course is primarily concerned with discrete-time and discrete-amplitude sources, as practically all new communication or storage systems fall into this category.

  • DISCRETE SOURCES AND ENTROPY

    Discrete Information Sources and Entropy

    Source Alphabets and Entropy

    Joint and Conditional Entropy

    Entropy of Symbol Blocks and the Chain Rule

    Source Coding

    Huffman Coding

    Dictionary Codes and Lempel-Ziv Coding

    Arithmetic Coding

  • Source Alphabet and Entropy

    Information Theory is based on Probability Theory, as the term information carries with it a connotation of UNPREDICTABILITY (SURPRISE) in the transmitted signal.

    The Information Source is defined by:

    - The set of output symbols

    - The probability rules which govern the emission of these symbols.

    Finite-Discrete Source: finite number of unique symbols.

    The symbol set is called the Source Alphabet.

  • Source Alphabet and Entropy (cont'd)

    Definitions:

    A = {a_0, a_1, ..., a_(M-1)} is an alphabet with M possible symbols. We can say that the emitted symbol is a random variable which takes values in A.

    The number of elements in a set is called its cardinality, e.g., M = |A|.

    The source output symbols are a = (s_0, s_1, ..., s_t, ...), where s_t is the symbol emitted by the source at time t. Note that here t is an integer time index.

    Stationary Source: the set of probabilities is not a function of time.

    At any given time moment, the probability that the source emits a_m is p_m ≡ Pr(a_m).

    Probability mass function: P_A = {p_0, p_1, ..., p_(M-1)}.

    Since the source emits only members of its alphabet, the probabilities sum to one: p_0 + p_1 + ... + p_(M-1) = 1.
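
    A minimal Python sketch of these definitions (the 4-symbol alphabet and its probabilities below are illustrative, not taken from the slides): it represents a stationary finite-discrete source by its alphabet A and probability mass function P_A, checks that the probabilities sum to one, and emits an output sequence s_0, s_1, ....

        import random

        class DiscreteSource:
            """A stationary finite-discrete source defined by its alphabet and pmf."""
            def __init__(self, alphabet, probabilities, seed=None):
                assert len(alphabet) == len(probabilities)
                # The source emits only members of its alphabet, so the pmf sums to 1.
                assert abs(sum(probabilities) - 1.0) < 1e-9
                self.alphabet = list(alphabet)   # A = {a_0, ..., a_(M-1)}
                self.pmf = list(probabilities)   # P_A = {p_0, ..., p_(M-1)}
                self.M = len(self.alphabet)      # cardinality M = |A|
                self._rng = random.Random(seed)

            def emit(self, n):
                """Return the output sequence (s_0, s_1, ..., s_(n-1))."""
                return self._rng.choices(self.alphabet, weights=self.pmf, k=n)

        # Hypothetical 4-symbol source used only for illustration.
        src = DiscreteSource("abcd", [0.5, 0.25, 0.125, 0.125], seed=0)
        print(src.M, src.emit(10))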

  • Source Alphabet and Entropy (cont'd)

    Information Sources: Classification

    Stationary Versus Non-Stationary Source:

    For a stationary source the set of probabilities is not a function of time, whereas for a non-stationary source it is.

    Synchronous Source Versus Asynchronous Source:

    A synchronous source emits a new symbol at a fixed time interval Ts, whereas for an asynchronous source the interval between emitted symbols is not fixed.

    The latter can be approximated as synchronous by defining a null character: when the source does not emit anything at time t, we say it emits the null character at time t.
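
    The short sketch below illustrates this null-character trick (the time slots and symbols are made up for the example): an asynchronous source described by (time, symbol) pairs is turned into a synchronous sequence by emitting a null character in every time slot where nothing was sent.

        NULL = "-"  # illustrative null character

        def synchronize(emissions, n_slots, null=NULL):
            """Map {time index: symbol} onto a fixed-rate sequence, filling gaps with null."""
            return [emissions.get(t, null) for t in range(n_slots)]

        # Asynchronous emissions at the irregular times 0, 3, 4 and 7.
        emissions = {0: "a", 3: "b", 4: "a", 7: "c"}
        print(synchronize(emissions, 9))  # ['a', '-', '-', 'b', 'a', '-', '-', 'c', '-']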

  • Source Alphabet and Entropy (cont'd)

    Representation of the Source Symbols

    - In digital systems, the binary representation is used.

    Distinction between Data and Information

    Example: An information source has an alphabet with only 1 symbol. This symbol is data, but this data is not information, as it is completely uninformative. Since information carries the connotation of uncertainty, the information content of this source is zero.

    HOW CAN ONE MEASURE THE INFORMATION CONTENT OF A SOURCE?

    The symbols are referred to as Source Data.

    Pop Quiz: How many bits are needed to represent the symbols 1, 2, 3, 4? Or, more generally, an alphabet of n symbols 1, 2, 3, ..., n?

    Answer: 2 bits, and log2(n) bits, respectively, since 4 = 2^(log2 4) and n = 2^(log2 n).
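
    To make the quiz answer concrete, a small helper (assuming fixed-length binary codewords; the alphabet sizes below are just examples) gives the number of bits needed to label n distinct symbols, rounding log2(n) up when n is not a power of two.

        import math

        def bits_needed(n):
            """Fixed-length binary representation: ceil(log2 n) bits for n distinct symbols."""
            return max(1, math.ceil(math.log2(n)))

        print(bits_needed(4))   # 2 bits for the symbols 1, 2, 3, 4
        print(bits_needed(26))  # 5 bits for a 26-symbol alphabet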

  • Source Alphabet and Entropy (cont'd)

    Each transmitted symbol 1 is just one choice out of 1/p_1 many possible choices, and therefore symbol 1 contains log2(1/p_1) bits of information (since 1/p_1 = 2^(log2(1/p_1))).

    Similarly, symbol k contains log2(1/p_k) bits of information.

    The average number of information bits per symbol for our source is H(A) = p_1 log2(1/p_1) + ... + p_n log2(1/p_n).

    Example: Pick a marble from a bag of 2 blue and 5 red marbles. The probability of picking a red marble is p_red = 5/7, so the number of choices for each red marble picked is 1/p_red = 7/5 = 1.4.

    Shannon gave this precise mathematical definition of the average amount of information conveyed per source symbol, used to measure the information content of a source.
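
    The sketch below evaluates H(A) directly from this definition and checks it on the marble-bag example above (p_blue = 2/7, p_red = 5/7); the entropy function is a straightforward reading of the formula, not code from the slides.

        import math

        def entropy(pmf):
            """H(A) = sum over m of p_m * log2(1/p_m), in bits per symbol."""
            return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

        # Marble bag from the slide: 2 blue and 5 red marbles.
        print(round(entropy([2 / 7, 5 / 7]), 4))  # about 0.8631 bits per drawn marble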

  • Source Alphabet and Entropy (cont'd)

    Entropy of a Source: H(A) = p_0 log2(1/p_0) + ... + p_(M-1) log2(1/p_(M-1))

    Unit of Measure: bit

    M is the cardinality of the source A.

    Range: 0 <= H(A) <= log2(M), with the maximum reached when p_m = 1/M for m = 1, ..., M.

    Measurement of the Information Efficiency of the Source

    RATIO OF THE ENTROPY OF THE SOURCE TO THE (AVERAGE) NUMBER OF BINARY DIGITS USED TO REPRESENT THE SOURCE DATA.

    When the entropy of the source is lower than the (average) number of bits used to represent the source data, an efficient coding scheme can be used to encode the source information using, on average, fewer binary digits. This is called Data Compression, and the encoder used for that is called the Source Encoder.
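
    As a small numerical illustration of this ratio (the skewed 4-symbol source below is hypothetical), the sketch compares the source entropy to the 2 binary digits a fixed-length code would spend per symbol; a ratio below 1 indicates room for compression by a source encoder.

        import math

        def entropy(pmf):
            """Average information bits per symbol, H(A)."""
            return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

        def efficiency(pmf, avg_bits_per_symbol):
            """Information efficiency: H(A) divided by the (average) binary digits used."""
            return entropy(pmf) / avg_bits_per_symbol

        # Hypothetical skewed source; a fixed-length code spends log2(4) = 2 bits per symbol.
        pmf = [0.7, 0.15, 0.1, 0.05]
        print(round(entropy(pmf), 3))          # about 1.319 bits/symbol
        print(round(efficiency(pmf, 2.0), 3))  # about 0.66 -> compressible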