Paper essay

yhnmzzz
paperpartII.pdf

Il. CODE CONSTRUCTION BASED ON

LEARNING A. The constructor-evaluatorframework We advocate a code design framework, as shown in Fig. 2. The

framework consists of two parts, i.e. a code constructor and a code evaluator. The code constructor iteratively learns a series Of valid code constructions based on the performance metric feedback from the code evaluator. The code evaluator provides an accurate performance metric calculation/estimation for a code construction under the decoder and channel condition of interest. The code construction keeps improving through the iterative interactions between the constructor and evaluator until the performance metric converges.

The constructor-evaluator framework in Fig. 2 is quite general. The code constructor knows neither the internal

fig. 2: Constructor-evaluator framework

mechanism nor the channel condition adopted by the code evaluator but requests the code evaluator to feed back an accurate performance metric of its current code construction under the evaluator-defined environment, by which the exploration of possible code construction opens for a wide range of decoding algorithms and channel conditions. Similar ideas have been proposed in existing works. For example, the differential

evolution algorithm was used to optimize the degree distribution of LDPC codes under both erasure channel [29] and AWGN channel The algorithm also treats the optimization problem as a black box, and merely relies on the cost function (e.g., decoding threshold) as feedback.

In most cases, the code constructor is trained offline, because both coded bits and decoding results can be efficiently generated and training computation is not an issue in an offline simulator. Once constructed e.g.. the performance metric converges, the resultant codes can be directly implemented in a low-complexity practical system with legacy encoding and decoding algorithms, which is applicable from the industry's point of view.

Note that the constructed codes are closely related to the code evaluator because of performance metric. Different code evaluators, e.g. with different decoding parameters, channel condition assumptions, or decoding algorithms, may result in different code constructions in the end. During training prcxedure, we choose some realistic decoding algorithms and channel condition. In theory, an online training is a natural extension by collecting performance metric samples from a real- world decoder and continuing to improve code B. Constructor

The code constructor generates code constructions

based performance metrics feedback from the code evaluator.

To fit the code design problem into the Al algorithms, a code construction is defined under the constraints of code representations. For example.

• Binary matrix: the generator matrix for any linear block codes. or the parity-check matrix for LDPC codes. This is the most general form of definition.

• Binary vector: a more succinct form of definition for some codes, including the generator polynomials for

convolutional codes, or whether a synthesized subchannel is frozen for polar codes.

• Nested representation: defining a set of codes in a set of nested matrices or vectors. This bears practical

importance due to low implementation complexity and rate compatibility. Examples include LTE-RM codes and 5G Polar codes.

According to the code representations and how the construction improvement procedure is modeled, there are several approaches to implement the constructor:

I) Reinforcement learning approach: RL approach can be used, because we model construction procedure as a Markov decision process (MDP). An MDP is defined by a 4-tuple (S,

S is the state space, A is the action space,

= Pr(st+l s' 1st — s,af a) is the probability that action a in stare s at rime t will lead to state s' at time t + 1,

R is the immediate reward feedback by code evaluator after transitioning from state s to state s', triggered by action a.

Code construction can be viewed as a decision process in general- For the binary matrix and binary vector code representations, the decisions correspond to which positions are O or l. For the nested code representation, the decisions correspond to how to evolve from a set of subcodes to their supercodes (or vice versa). In our setting, a state s con-esponds to a (valid) code construction. and an action a that leads to the state transition (s s') corresponds to a modification to the previous construction s. The state transition (s s') herein is deterministic by the action and the previous state s. The reward function is the performance metric measurement with respect to the decoder and channel condition of interest. In the end, a desired code construction can be obtained from the final state of the MDP.

There are several classical algorithms to solve MDP that are model-free, i.e., they do not require the model of the code evaluator. These include:

Q-learning [301: given the architecture of a code construction, its potential code construction schemes correspond to a finite set of discrete states and action spaces. In Q-learning, a table Q(s, a) is maintained and updated to record an expected total reward metric to take action a at state s. At learning stage, an e-greedy approach is often used to diverge the exploration in the state and action space- When a state transition (s, a, s', R) has been explored. the table Q(s,a) can be updated by: AQ(s, a) = (2)

where is learning rate, is reward discount factor, and R is reward from the evaluator. After sufficient exploration, the table Q(s, a) is then used to guide the MDP to maximize the total reward. Policy Gradient (PG) [31]: if the architecture of a code construction translates into an immense set of

states and actions, we consider continuous state and action space. PG defines a differentiable policy function

a), parameterized by OPG, to select the action at each state. The policy function (s, a) outputs a probability desity/mass function of taking each action a at state s according to the policy Then the next state s' is determined and the reward R is evaluated by the code evaluator. When a complete episode

is explored, where t is the time stamp and T is the time horizon length, the policy function is updated:

E Ri'l•

(3)

After sufficient exploration, the policy function can be used to lead the MDP to maximize the total

Advantage Actor critic (A2C) [321: A2C merges the idea of state value function into PG to take advantage of stepwise update, and speeds up the convergence. In addition to the policy (actor) function (s, a), A2C defines a differentiable value (critic) function Voc (s). The interaction between A2C and the code evaluator is similar to that of PG. For A2C, the policy (actor) update can be more frequent, i.e- in Ste}wise manner, since the cumulative reward from st, R. is estimated by the critic function. At each state translation exploration (s, a, s', R), the advantage value is calculated:

+7. voc - voc(s). (4) Then the actor function The critic function Voc can be updated by:

A9A = aA • Adv(s,

S', R) (5)

- Adv(s, s', R) voc voc(s). (6) By viewing code construction as a decision process, its

influence on code performance can be modeled (or approximated) as differentiable functions that can be realized by neural networks. A code construction (to be optimized) is embedded in the coefficients of the neural networks. Due to the excellent function approximation capability of neural networks, these coefficients can be learned through optimizations techniques such as gradient descent2) Genetic algorithm appmach: we observe that a code construction can usually be decomposed into many discrete decisions. For the binary matrix and binary vector code representations, the decisions on which positions are 0 or I can be made individually. These decisions may collaboratively contribute to the overall code

performance. They resemble the "chromosomes" of a code construction, where a set of good decisions is likely to produce good code constructions. The process of refining these decisions can be defined in iterative steps, where each step produces a better construction based on several candidate constructions. Genetic algorithm is well-suited for this purpose.

Below, we briefly describe how a genetic algorithm can be applied to the code design.

l) A number of code constructions {Cl , C2, • • • } are randomly generated, defined as initial population.

2) A subset of (good) code constructions are selected as parents, e.g., Cpl, Cp2•

3) New constructions (offspring) are produced from crossover among parents, e.g., {Cpl, Cp2} —+ Col.

4) Offsprings go through a random mutation to introduce new features, e.g., Col —Y Cml.

5) Finally, good offspring replace bad ones in the population, and the process repeats.

The above operations are defined in the context of error correction codes- Regarding code definition C, it may boil down to a set of chromosomes (binary vectors and matrices) accordingly.

Crossover is defined as taking part of the chromosomes from each parent. and combining them into an offspring. This step resembles "reproduction" in biologlœal terms, in which offspring are expected to inherit some good properties from their parents. Subsequently, mutation randomly alters some of the chromosomes to encourage exploration during evolution.

A fitness function is defined to indicate whether the newly produced offspring are good or bad. In this work, the fitness is defined as the code performance.

C. Evaluator

The evaluator provides performance metric measurements for code constructions. If the performance metric of the decoder is analyzable, it can be directly calculated. In most cases, the error correction performance estimation in terms of block error rate (BLER) can be performed based on sampling techniques, such as Monte-Carlo (MC) method. To ensure that the estimation is accurate and thereby does not mislead the constructor, sufficient MC sampling (simulation) should be performed to control the estimation confidence level. If the designed code is to work within a range of BLER level, the performance metric measurements can merge error correction performance at several SNR points. Measuring more than one SNR points provides a better control over the slope of BLER curve, at the cost of longer evaluation time. In addition, power consumption and implementation complexity for the encoding and the corresponding decoding also can be factored into the performance metric measurements.

Intuitively, the evaluator can be stationary. The decoding algorithm, including the parameters and the channel statistics can be static. Then the code design is preferred to be realized offline, and is not very sensitive to the code design time consumption. On the other hand, the evaluator can be nonstationary. For example, the

can be adjusted by:

channel statistics can be timevarying. Then online design may be required. In this case, a feedback link is required for performance measurement of each code construction. The communication cost and code design time consumption should be considered as well.