Last updated: 2025-07-27 (UTC).

This page contains Sequence Models glossary terms. For all glossary terms,
[click here](/machine-learning/glossary).

Key points:

- Sequence models are used to analyze sequential data such as text or video sequences.
- Recurrent neural networks (RNNs) are a key type of sequence model, with LSTMs being a popular variant.
- Common challenges in training sequence models include the exploding and vanishing gradient problems.
- N-grams represent ordered sequences of words and are important for natural language understanding tasks.

## B

### bigram

#seq #language

An [**N-gram**](#N-gram) in which N=2.

## E

### exploding gradient problem

#seq

The tendency for [**gradients**](/machine-learning/glossary#gradient) in
[**deep neural networks**](/machine-learning/glossary#deep_neural_network) (especially
[**recurrent neural networks**](#recurrent_neural_network)) to become
surprisingly steep (high). Steep gradients often cause very large updates
to the [**weights**](/machine-learning/glossary#weight) of each
[**node**](/machine-learning/glossary#node) in a deep neural network.

Models suffering from the exploding gradient problem become difficult
or impossible to train.
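The clipping mitigation defined under **G** below can be sketched in a few lines. A minimal NumPy illustration, where the threshold of 1.0 is an arbitrary choice for demonstration, not a recommended default:

```python
import numpy as np

def clip_by_norm(gradient, max_norm=1.0):
    """Scale the gradient down so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(gradient)
    if norm > max_norm:
        gradient = gradient * (max_norm / norm)
    return gradient

# An "exploded" gradient with a large norm...
g = np.array([30.0, -40.0])            # L2 norm = 50.0
clipped = clip_by_norm(g, max_norm=1.0)
print(np.linalg.norm(clipped))         # ...is rescaled to norm 1.0
```

In practice, frameworks provide this directly (for example, `torch.nn.utils.clip_grad_norm_` in PyTorch), so hand-rolling it is rarely necessary.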
[**Gradient clipping**](#gradient_clipping)
can mitigate this problem.

Compare to the [**vanishing gradient problem**](#vanishing_gradient_problem).

## F

### forget gate

#seq

The portion of a [**Long Short-Term Memory**](#Long_Short-Term_Memory)
cell that regulates the flow of information through the cell.
Forget gates maintain context by deciding which information to discard
from the cell state.

## G

### gradient clipping

#seq

A commonly used mechanism to mitigate the
[**exploding gradient problem**](#exploding_gradient_problem) by artificially
limiting (clipping) the maximum value of gradients when using
[**gradient descent**](/machine-learning/glossary#gradient_descent) to
[**train**](/machine-learning/glossary#training) a model.

## L

### Long Short-Term Memory (LSTM)

#seq

A type of cell in a
[**recurrent neural network**](#recurrent_neural_network) used to process
sequences of data in applications such as handwriting recognition,
[**machine translation**](/machine-learning/glossary#machine-translation), and
image captioning. LSTMs address the
[**vanishing gradient problem**](#vanishing_gradient_problem) that occurs when
training RNNs on long data sequences by maintaining history in an internal
memory state based on new input and context from previous cells in the RNN.

### LSTM

#seq

Abbreviation for [**Long Short-Term Memory**](#Long_Short-Term_Memory).

## N

### N-gram

#seq #language

An ordered sequence of N words. For example, *truly madly* is a 2-gram.
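Extracting every N-gram from a word sequence is a simple sliding window. A minimal sketch (the function name is illustrative, not part of any library):

```python
def ngrams(words, n):
    """Return the ordered n-word subsequences of `words` as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams(["truly", "madly", "deeply"], 2))
# [('truly', 'madly'), ('madly', 'deeply')]
```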
Because
order is relevant, *madly truly* is a different 2-gram than *truly madly*.

| N | Name(s) for this kind of N-gram | Examples |
|---|---------------------------------|-----------------------------------------------------------|
| 2 | bigram or 2-gram | *to go, go to, eat lunch, eat dinner* |
| 3 | trigram or 3-gram | *ate too much, happily ever after, the bell tolls* |
| 4 | 4-gram | *walk in the park, dust in the wind, the boy ate lentils* |

Many [**natural language understanding**](/machine-learning/glossary#natural_language_understanding)
models rely on N-grams to predict the next word that the user will type
or say. For example, suppose a user typed *happily ever*.
An NLU model based on trigrams would likely predict that the
user will next type the word *after*.

Contrast N-grams with [**bag of words**](/machine-learning/glossary#bag_of_words), which is an
unordered set of words.

See [Large language models](/machine-learning/crash-course/llm)
in Machine Learning Crash Course for more information.

## R

### recurrent neural network

#seq

A [**neural network**](/machine-learning/glossary#neural_network) that is intentionally run multiple
times, where parts of each run feed into the next run. Specifically,
hidden layers from the previous run provide part of the
input to the same hidden layer in the next run. Recurrent neural networks
are particularly useful for evaluating sequences, because the hidden layers
can learn from previous runs of the neural network on earlier parts of
the sequence.

For example, the following figure shows a recurrent neural network that
runs four times. Notice that the values learned in the hidden layers from
the first run become part of the input to the same hidden layers in
the second run. Similarly, the values learned in the hidden layers on the
second run become part of the input to the same hidden layers in the
third run.
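The unrolled computation just described can be sketched directly: the same cell runs once per timestep, and each run's hidden state feeds into the next run. A minimal NumPy sketch with random placeholder weights (not a trained model; shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))   # input-to-hidden weights (placeholder)
W_hh = rng.normal(size=(4, 4))   # hidden-to-hidden weights (placeholder)

def rnn_forward(inputs):
    """Run the same cell once per timestep, feeding hidden state forward."""
    h = np.zeros(4)                          # initial hidden state
    for x in inputs:                         # one "run" per sequence element
        h = np.tanh(x @ W_xh + h @ W_hh)     # previous h feeds the next run
    return h

sequence = rng.normal(size=(4, 3))           # four timesteps, 3 features each
final_state = rnn_forward(sequence)
print(final_state.shape)                     # (4,)
```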
In this way, the recurrent neural network gradually trains and
predicts the meaning of the entire sequence rather than just the meaning
of individual words.

### RNN

#seq

Abbreviation for [**recurrent neural networks**](#recurrent_neural_network).

## S

### sequence model

#seq

A model whose inputs have a sequential dependence. For example, predicting
the next video watched from a sequence of previously watched videos.

## T

### timestep

#seq

One "unrolled" cell within a
[**recurrent neural network**](#recurrent_neural_network).
For example, the following figure shows three timesteps (labeled with
the subscripts t-1, t, and t+1):

### trigram

#seq #language

An [**N-gram**](#N-gram) in which N=3.

## V

### vanishing gradient problem

#seq

The tendency for the gradients of early [**hidden layers**](/machine-learning/glossary#hidden_layer)
of some [**deep neural networks**](/machine-learning/glossary#deep_neural_network) to become
surprisingly flat (low). Increasingly lower gradients result in increasingly
smaller changes to the weights on nodes in a deep neural network, leading to
little or no learning. Models suffering from the vanishing gradient problem
become difficult or impossible to train.
[**Long Short-Term Memory**](#Long_Short-Term_Memory) cells address this issue.

Compare to the [**exploding gradient problem**](#exploding_gradient_problem).
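The vanishing and exploding behaviors contrasted above can be seen numerically: backpropagating through an RNN repeatedly multiplies the gradient by the recurrent weight, so magnitudes below 1 shrink toward zero while magnitudes above 1 blow up. A toy scalar illustration (a stand-in for the product of Jacobians, not a full backpropagation):

```python
def backprop_factor(recurrent_weight, timesteps):
    """Scalar stand-in for the gradient scaling over `timesteps` steps."""
    factor = 1.0
    for _ in range(timesteps):
        factor *= recurrent_weight
    return factor

print(backprop_factor(0.9, 50))   # ~0.005: the gradient vanishes
print(backprop_factor(1.1, 50))   # ~117:   the gradient explodes
```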