The attached file is the part of a paper, I need the summary of this part. You can put the picture from this paper into the summary. I need 1.5 ~ 2 pages. If you need the original paper, the link is: https://arxiv.org/pdf/1901.05719 ?fbclid=IwAR3PdeB_a9DRgoLZ3M4FF72S7C_0_Q-Rmr6Qeq6O4ZqJekC_P-mSS3egk0w
P.S. I need it in 24 hours.
Il. CODE CONSTRUCTION BASED ON
LEARNING A. The constructor-evaluatorframework
We advocate a code design framework, as shown in Fig. 2. The
framework consists of two parts, i.e. a code constructor and a
code evaluator. The code constructor iteratively learns a series
Of valid code constructions based on the performance metric
feedback from the code evaluator. The code evaluator provides
an accurate performance metric calculation/estimation for a code
construction under the decoder and channel condition of interest.
The code construction keeps improving through the iterative
interactions between the constructor and evaluator until the
performance metric converges.
The constructor-evaluator framework in Fig. 2 is quite general.
The code constructor knows neither the internal
fig. 2: Constructor-evaluator framework
mechanism nor the channel condition adopted by the code
evaluator but requests the code evaluator to feed back an accurate
performance metric of its current code construction under the
evaluator-defined environment, by which the exploration of
possible code construction opens for a wide range of decoding
algorithms and channel conditions. Similar ideas have been
proposed in existing works. For example, the differential
evolution algorithm was used to optimize the degree distribution
of LDPC codes under both erasure channel [29] and AWGN
channel The algorithm also treats the optimization problem as a
black box, and merely relies on the cost function (e.g., decoding
threshold) as feedback.
In most cases, the code constructor is trained offline, because
both coded bits and decoding results can be efficiently generated
and training computation is not an issue in an offline simulator.
Once constructed e.g.. the performance metric converges, the
resultant codes can be directly implemented in a low-complexity
practical system with legacy encoding and decoding algorithms,
which is applicable from the industry’s point of view.
Note that the constructed codes are closely related to the code
evaluator because of performance metric. Different code
evaluators, e.g. with different decoding parameters, channel
condition assumptions, or decoding algorithms, may result in
different code constructions in the end. During training
prcxedure, we choose some realistic decoding algorithms and
channel condition. In theory, an online training is a natural
extension by collecting performance metric samples from a real-
world decoder and continuing to improve code
B. Constructor
The code constructor generates code constructions
based performance metrics feedback from the code
evaluator.
To fit the code design problem into the Al algorithms,
a code construction is defined under the constraints of
code representations. For example.
• Binary matrix: the generator matrix for any linear
block codes. or the parity-check matrix for LDPC
codes. This is the most general form of definition.
• Binary vector: a more succinct form of definition for
some codes, including the generator polynomials for
convolutional codes, or whether a synthesized
subchannel is frozen for polar codes.
• Nested representation: defining a set of codes in a
set of nested matrices or vectors. This bears practical
importance due to low implementation complexity
and rate compatibility. Examples include LTE-RM
codes and 5G Polar codes.
According to the code representations and how the
construction improvement procedure is modeled, there are
several approaches to implement the constructor:
I) Reinforcement learning approach: RL approach can
be used, because we model construction procedure as a
Markov decision process (MDP). An MDP is defined by
a 4-tuple (S,
S is the state space,
A is the action space,
= Pr(st+l s’ 1st — s,af a) is the probability
that action a in stare s at rime t will lead to state s’ at
time t + 1,
R is the immediate reward feedback by code
evaluator after transitioning from state s to state s’,
triggered by action a.
Code construction can be viewed as a decision process
in general- For the binary matrix and binary vector code
representations, the decisions correspond to which
positions are O or l. For the nested code representation,
the decisions correspond to how to evolve from a set of
subcodes to their supercodes (or vice versa). In our
setting, a state s con-esponds to a (valid) code
construction. and an action a that leads to the state
transition (s s’) corresponds to a modification to the
previous construction s. The state transition (s s’) herein is
deterministic by the action and the previous state s. The
reward function is the performance metric measurement
with respect to the decoder and channel condition of
interest. In the end, a desired code construction can be
obtained from the final state of the MDP.
There are several classical algorithms to solve MDP
that are model-free, i.e., they do not require the model of
the code evaluator. These include:
Q-learning [301: given the architecture of a code
construction, its potential code construction schemes
correspond to a finite set of discrete states and action
spaces. In Q-learning, a table Q(s, a) is maintained
and updated to record an expected total reward
metric to take action a at state s. At learning stage, an
e-greedy approach is often used to diverge the
exploration in the state and action space- When a
state transition (s, a, s’, R) has been explored. the
table Q(s,a) can be updated by: AQ(s, a) = (2)
where is learning rate, is reward discount factor, and
R is reward from the evaluator. After sufficient
exploration, the table Q(s, a) is then used to guide the
MDP to maximize the total reward.
Policy Gradient (PG) [31]: if the architecture of a
code construction translates into an immense set of
states and actions, we consider continuous state and
action space. PG defines a differentiable policy function
a), parameterized by OPG, to select the action
at each state. The policy function (s, a) outputs a
probability desity/mass function of taking each action a
at state s according to the policy Then the next state s’
is determined and the reward R is evaluated by the code
evaluator. When a complete episode
is explored, where t is the time stamp and T is the time
horizon length, the policy function is updated:
E Ri’l•
(3)
After sufficient exploration, the policy function
can be used to lead the MDP to maximize the total
Advantage Actor critic (A2C) [321: A2C merges the
idea of state value function into PG to take advantage of
stepwise update, and speeds up the convergence. In
addition to the policy (actor) function (s, a), A2C
defines a differentiable value (critic) function Voc (s).
The interaction between A2C and the code evaluator is
similar to that of PG. For A2C, the policy (actor) update
can be more frequent, i.e- in Ste}wise manner, since the
cumulative reward from st, R. is estimated by the critic
function. At each state translation exploration (s, a, s’,
R), the advantage value is calculated:
+7. voc – voc(s). (4)
Then the actor function The critic function Voc
can be updated by:
A9A = aA • Adv(s,
S’, R) (5)
– Adv(s, s’, R) voc voc(s). (6)
By viewing code construction as a decision process, its
influence on code performance can be modeled (or
approximated) as differentiable functions that can be
realized by neural networks. A code construction (to be
optimized) is embedded in the coefficients of the neural
networks. Due to the excellent function approximation
capability of neural networks, these coefficients can be
learned through optimizations techniques such as gradient
descent2) Genetic algorithm appmach: we observe that a
code construction can usually be decomposed into many
discrete decisions. For the binary matrix and binary vector
code representations, the decisions on which positions are
0 or I can be made individually. These decisions may
collaboratively contribute to the overall code
performance. They resemble the “chromosomes” of a
code construction, where a set of good decisions is likely
to produce good code constructions. The process of
refining these decisions can be defined in iterative steps,
where each step produces a better construction based on
several candidate constructions. Genetic algorithm is
well-suited for this purpose.
Below, we briefly describe how a genetic algorithm can
be applied to the code design.
l) A number of code constructions {Cl , C2, • • • } are
randomly generated, defined as initial population.
2) A subset of (good) code constructions are selected
as parents, e.g., Cpl, Cp2•
3) New constructions (offspring) are produced from
crossover among parents, e.g., {Cpl, Cp2} —+ Col.
4) Offsprings go through a random mutation to
introduce new features, e.g., Col —Y Cml.
5) Finally, good offspring replace bad ones in the
population, and the process repeats.
The above operations are defined in the context of error
correction codes- Regarding code definition C, it may boil
down to a set of chromosomes (binary vectors and
matrices) accordingly.
Crossover is defined as taking part of the chromosomes
from each parent. and combining them into an offspring.
This step resembles “reproduction” in biologlœal terms,
in which offspring are expected to inherit some good
properties from their parents. Subsequently, mutation
randomly alters some of the chromosomes to encourage
exploration during evolution.
A fitness function is defined to indicate whether the newly
produced offspring are good or bad. In this work, the fitness is
defined as the code performance.
C. Evaluator
The evaluator provides performance metric
measurements for code constructions. If the performance
metric of the decoder is analyzable, it can be directly
calculated. In most cases, the error correction
performance estimation in terms of block error rate
(BLER) can be performed based on sampling techniques,
such as Monte-Carlo (MC) method. To ensure that the
estimation is accurate and thereby does not mislead the
constructor, sufficient MC sampling (simulation) should
be performed to control the estimation confidence level.
If the designed code is to work within a range of BLER
level, the performance metric measurements can merge
error correction performance at several SNR points.
Measuring more than one SNR points provides a better
control over the slope of BLER curve, at the cost of longer
evaluation time. In addition, power consumption and
implementation complexity for the encoding and the
corresponding decoding also can be factored into the
performance metric measurements.
Intuitively, the evaluator can be stationary. The
decoding algorithm, including the parameters and the
channel statistics can be static. Then the code design is
preferred to be realized offline, and is not very sensitive
to the code design time consumption. On the other hand,
the evaluator can be nonstationary. For example, the
can be adjusted by:
channel statistics can be timevarying. Then online design
may be required. In this case, a feedback link is required
for performance measurement of each code construction.
The communication cost and code design time
consumption should be considered as well.