[Andrew Ng Team NLP, Course 3, Part 2] LSTM, NER, Siamese Networks

LSTM

Outline

  • RNNs and vanishing/exploding gradients
  • Solutions

RNNs

  • Advantages

    • Captures dependencies within a short range
    • Takes up less RAM than other n-gram models
  • Disadvantages

    • Struggles with longer sequences

    • Prone to vanishing or exploding gradients


Solving for vanishing or exploding gradients

  • Identity RNN with ReLU activation: initialize the recurrent weights as the identity matrix; ReLU maps negative values (e.g. -1) to 0, keeping the transformation close to the identity

    \[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

  • Gradient clipping: e.g. 32 → 25; values above 25 are clipped to 25, limiting the magnitude of the gradient (a minimal sketch follows this list)

  • Skip connections

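A minimal sketch of element-wise gradient clipping with NumPy, using the threshold of 25 from the example above (the gradient values themselves are made up for illustration):

import numpy as np

def clip_gradient(grad, threshold=25.0):
    # Any value whose magnitude exceeds the threshold is clipped back to +/- threshold
    # (e.g. 32 -> 25), which limits the size of the gradient update.
    return np.clip(grad, -threshold, threshold)

print(clip_gradient(np.array([3.0, -40.0, 32.0])))   # -> [  3. -25.  25.]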

Introduction

Outline

  • Meet the Long short-term memory unit!
  • LSTM architecture
  • Applications

LSTMs: a memorable solution

  • Learns when to remember and when to forget

  • Basic anatomy:

    • A cell state
    • A hidden state with three gates
    • Loops back again at the end of each time step
  • Gates allow gradients to flow unchanged

LSTMs: Based on previous understanding

  • Cell state = before the conversation: before your friend calls, you are thinking about things unrelated to them.
  • Forget gate = beginning of the conversation: when you pick up the phone, you set aside the unrelated thoughts and keep whatever you want to hold onto.
  • Input gate = thinking of a response: while the call is going on, you take in new information from your friend while thinking about what to say next.
  • Output gate = responding: when you decide what to say next.
  • Updated cell state = after the conversation: this repeats until you hang up, and the memory has been updated several times along the way.

LSTM Basic Structure


Applications of LSTMs


Summary

  • LSTMs offer a solution to vanishing gradients
  • Typical LSTMs have a cell and three gates:
    • Forget gate
    • Input gate
    • Output gate

LSTM architecture

Cell State, Hidden State

  • Cell state: acts as the network's memory
  • Hidden state: is what the predictions are made from


The Forget Gate

Decides what should be kept and what should be discarded. The previous hidden state and the current input pass through a sigmoid function, which squashes values to between 0 and 1: values close to 0 should be thrown away, values close to 1 kept.

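In the standard LSTM formulation (the weight and bias names here are the usual textbook ones, not taken from the course figure), the forget gate is:

\[ f_t = \sigma\big(W_f[h_{t-1}; x_t] + b_f\big) \]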

The Input Gate

Updates the state. It has two layers: a sigmoid layer and a tanh layer.

sigmoid: takes the previous hidden state and the current input and selects which values to update; values are squashed to between 0 and 1, with values closer to 1 being more important.

tanh: also takes the previous hidden state and the current input; values are squashed to between -1 and 1, which helps regulate the flow of information through the network.

Finally, the two outputs are multiplied together to give the gate's output.

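In the same assumed notation, the sigmoid and tanh layers of the input gate are:

\[ i_t = \sigma\big(W_i[h_{t-1}; x_t] + b_i\big), \qquad \tilde{C}_t = \tanh\big(W_C[h_{t-1}; x_t] + b_C\big) \]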

Calculating the Cell State

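In the standard formulation, the new cell state combines what the forget gate keeps with what the input gate adds:

\[ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \]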

The Output Gate

Decides what the next hidden state will be.

The previous hidden state and the current input pass through a sigmoid; the most recently updated cell state passes through a tanh; the two results are then multiplied to give the new hidden state h.

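In the same assumed notation:

\[ o_t = \sigma\big(W_o[h_{t-1}; x_t] + b_o\big), \qquad h_t = o_t \odot \tanh(C_t) \]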

Summary

  • LSTMs use a series of gates to decide which information to keep:

    • Forget gate decides what to keep

    • Input gate decides what to add

    • Output gate decides what the next hidden state will be

  • One time step is completed after updating the states

Named Entity Recognition (NER)

Introduction

What is Named Entity Recognition

  • Locates and extracts predefined entities from text
  • Places, organizations, names, time and dates

Types of Entities


Example of a labeled sentence

For instance (an illustrative labeling, not the course's exact figure): in the sentence "Sharon flew to Miami last Friday", Sharon would be tagged as a person, Miami as a geographic entity, last Friday as a time indicator, and every other word as O (no entity).

Application of NER systems

  • Search engine efficiency
  • Recommendation engines
  • Customer service
  • Automatic trading

Training NERs process

Data Processing

Outline

  • Convert words and entity classes into arrays
  • Token padding
  • Create a data generator

Processing data for NERs

  • Assign each class a number: each entity class gets its own unique number

  • Assign each word a number: each word in the vocabulary gets its own number (a sketch follows below)

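A minimal sketch of this mapping; the sentence, tag names, and dictionaries are illustrative, not taken from the course dataset:

sentence = ["Sharon", "flew", "to", "Miami"]
tags     = ["B-per", "O", "O", "B-geo"]

word2idx = {"<PAD>": 0}          # reserve an id for the padding token
tag2idx  = {}
for w in sentence:
    word2idx.setdefault(w, len(word2idx))
for t in tags:
    tag2idx.setdefault(t, len(tag2idx))

x = [word2idx[w] for w in sentence]   # e.g. [1, 2, 3, 4]
y = [tag2idx[t]  for t in tags]       # e.g. [0, 1, 1, 2]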

Token padding

For LSTMs, all sequences need to be the same size.

  • Set sequence length to a certain number
  • Use the <PAD> token to fill empty spaces
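A small sketch of padding token-id sequences to a fixed length (the sequence length and ids are illustrative):

PAD_ID = 0     # id reserved for the <PAD> token
MAX_LEN = 6    # chosen sequence length

def pad(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    # Truncate sequences that are too long, fill the remaining positions with <PAD>.
    return token_ids[:max_len] + [pad_id] * max(0, max_len - len(token_ids))

print(pad([1, 2, 3, 4]))   # -> [1, 2, 3, 4, 0, 0]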

Training the NER

  1. Create a tensor for each input and its corresponding number
  2. Put them in a batch (batch sizes are typically powers of two: 64, 128, 256, 512, ...)
  3. Feed it into an LSTM unit
  4. Run the output through a dense layer
  5. Predict using a log softmax over K classes


Layers in Trax

from trax import layers as tl

# vocab_size, d_feature, and n_classes are hyperparameters set elsewhere.
model = tl.Serial(
    tl.Embedding(vocab_size, d_feature),  # map token ids to embedding vectors
    tl.LSTM(d_feature),                   # LSTM over the embedded sequence
    tl.Dense(n_classes),                  # one score per entity class
    tl.LogSoftmax()                       # log-probabilities over the classes
)

Summary

  • Convert words and entities into same-length numerical arrays
  • Train in batches for faster processing
  • Run the output through a final layer and activation

Computing Accuracy

Evaluating the model

  1. Pass test set through the model
  2. Get arg max across the prediction array
  3. Mask padded tokens
  4. Compare outputs against test labels (see the sketch below)

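A NumPy sketch of steps 2–4; the shapes are assumptions, and pad_label is whatever id marks padded positions in the labels:

import numpy as np

def masked_accuracy(log_probs, labels, pad_label=0):
    # log_probs: (batch, seq_len, n_classes); labels: (batch, seq_len).
    preds = np.argmax(log_probs, axis=-1)        # step 2: arg max over the classes
    mask = labels != pad_label                   # step 3: ignore padded positions
    return (preds[mask] == labels[mask]).mean()  # step 4: compare against the labels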

Summary

  • If the sequences were padded, remember to mask the padding tokens when computing accuracy
  • Coding assignment!

Siamese Networks

Introduction

A Siamese network is made up of two identical neural networks whose outputs are merged at the end.

Question Duplicates


It compares the meaning of word sequences rather than just the individual words.

What do Siamese Networks learn?


Siamese Networks in NLP


Architecture

Each question is fed through an identical branch (an embedding layer followed by an LSTM) to produce a vector; the two vectors are then compared with cosine similarity.

Cost function

Loss function

The question "How old are you?" is used as the anchor, the reference against which other questions are compared.

A question with a similar meaning to the anchor is a positive question; one without is a negative question.

For similar questions the similarity is close to 1; for dissimilar ones it is close to -1.

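A small sketch of the cosine similarity used for the comparison (the vectors are illustrative):

import numpy as np

def cosine_similarity(v1, v2):
    # Ranges from -1 (opposite direction) to 1 (same direction).
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))   # -> approximately 1.0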

Triplets

Triplets


If the model receives a positive loss value, it uses it to update its weights and improve.

If the model receives a negative loss value, that effectively tells it that it did well and reinforces its current weights, so negative values should not be passed on: whenever the loss would be less than 0, it is set to 0.


Simple loss:

\[ \mathrm{diff} = s(A,N) - s(A,P) \]

Non linearity:

\[ \mathcal{L} = \begin{cases} 0, & \text{if } \mathrm{diff} \leq 0 \\ \mathrm{diff}, & \text{if } \mathrm{diff} > 0 \end{cases} \tag{1} \]

Triplet Loss

Shift the function to the left by adding a margin, e.g. alpha = 0.2.

If the gap between the two similarities is small, say diff = -0.1, then after adding alpha the result is greater than 0, so the model can still learn from the example.

Alpha margin:


\[ \mathcal{L} = \begin{cases} 0, & \text{if } \mathrm{diff} + \alpha \leq 0 \\ \mathrm{diff} + \alpha, & \text{if } \mathrm{diff} + \alpha > 0 \end{cases} \qquad \mathcal{L}(A,P,N) = \max(\mathrm{diff} + \alpha,\ 0) \]

Triplet Selection

When the model already predicts correctly that (A, P) is more similar than (A, N), the loss is 0.

At that point not much more can be learned from such triplets, so training can be made more effective by choosing triplets the model gets wrong rather than picking them at random. These are called hard triplets: triplets where s(A, N) is close to, but still less than, s(A, P).


Compute the cost

Introduction


d_model is the embedding dimension, equal to the number of columns (5 in the example); batch_size is the number of rows (4 in the example).


In the similarity matrix, the green diagonal holds the similarities of duplicate question pairs, which should be larger than the off-diagonal values.

The orange off-diagonal entries are the similarities of non-duplicate pairs.


mean negative: the mean of the off-diagonal values in each row (e.g. in the first row, everything except the diagonal value 0.9).

closest negative: the off-diagonal value in each row closest to (but less than) the value on the diagonal, e.g. 0.3 in the first row; that negative example with similarity 0.3 contributes the most to learning.


mean_neg: training against the mean of the negatives reduces noise (the noise terms are centred around 0, so the average of several noisy values is usually close to 0).

closest_neg: uses the negative example whose cosine similarity is closest to the positive's, i.e. the smallest difference; a small difference still produces a sizeable loss once alpha is added, and focusing training on the examples that produce the highest loss lets the model update its weights faster.

The two losses are then added together.

Hard Negative Mining

\[ \mathcal{L}_{\mathrm{Full}}(A,P,N) = \mathcal{L}_1 + \mathcal{L}_2 \qquad \mathcal{J} = \sum_{i=1}^{m} \mathcal{L}_{\mathrm{Full}}\big(A^{(i)}, P^{(i)}, N^{(i)}\big) \]
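A NumPy sketch of computing this cost from an in-batch similarity matrix, following the mean-negative / closest-negative construction described above; the matrix layout, alpha, and helper name are assumptions for illustration, with L1 built from the mean negative and L2 from the closest negative:

import numpy as np

def hard_negative_cost(sim, alpha=0.25):
    # sim: (batch, batch) matrix of cosine similarities; sim[i, i] = s(A_i, P_i),
    # and sim[i, j] for i != j gives s(A_i, N_j) for the in-batch negatives.
    batch = sim.shape[0]
    positive = np.diag(sim)                            # s(A, P) for each row
    off_diag = ~np.eye(batch, dtype=bool)
    mean_neg = np.where(off_diag, sim, 0.0).sum(axis=1) / (batch - 1)
    # Closest negative: the largest off-diagonal value still below the diagonal.
    closest_neg = np.where(off_diag & (sim < positive[:, None]), sim, -2.0).max(axis=1)
    l1 = np.maximum(mean_neg - positive + alpha, 0.0)      # L1: mean negative
    l2 = np.maximum(closest_neg - positive + alpha, 0.0)   # L2: closest negative
    return np.sum(l1 + l2)                                 # J: summed over the batch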

One Shot learning

Classification vs One Shot Learning

[吴恩达团队自然语言处理第3课_2]LSTM NER SiameseNetwork

For example, to decide whether a poem was written by Lucas: with classification, adding this poet's work would mean adding a new class, turning the problem into K + 1 classes and requiring the model to be retrained.

One-shot learning is not classification: the model is trained to measure similarity, and you compute the similarity between the new poem and Lucas's poems.

No need for retraining: for instance, when a new signature shows up at a bank, the model does not have to be retrained.

A threshold is set to decide whether the two examples belong to the same class.


Training / Testing

Dataset


Prepare Batches

Within one batch no two questions are duplicates of each other, but each question is a duplicate of the question at the corresponding position in the second batch.


Siamese Model

The two subnetworks share the same parameters, so only one set of weights is trained (a minimal sketch follows the list below).


Create a subnetwork:

  1. Embedding
  2. LSTM
  3. Vectors
  4. Cosine Similarity
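A minimal sketch of such a shared subnetwork in Trax, in the spirit of the list above; the layer sizes, the mean-over-time step, and the normalization are assumptions rather than the course's exact code. Because both outputs are normalized to unit length, the cosine similarity of the two vectors reduces to a dot product.

from trax import layers as tl
from trax.fastmath import numpy as fastnp

def Siamese(vocab_size, d_model=128):
    # Normalize each output vector to unit length so that cosine similarity
    # of two outputs is simply their dot product.
    def normalize(x):
        return x / fastnp.sqrt(fastnp.sum(x * x, axis=-1, keepdims=True))

    q_processor = tl.Serial(
        tl.Embedding(vocab_size, d_model),   # 1. embedding
        tl.LSTM(d_model),                    # 2. LSTM
        tl.Mean(axis=1),                     # 3. average over time -> one vector per question
        tl.Fn('Normalize', normalize),       # unit-length vector
    )
    # The same q_processor object is reused for both branches,
    # so the two subnetworks share one set of weights.
    return tl.Parallel(q_processor, q_processor)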

Testing

Run the two questions of each test pair through the model, compute the cosine similarity of the two output vectors, and compare it against the chosen threshold to decide whether they are duplicates.
