What is Context-Free Grammar?
Last Updated: 23 May, 2025
A grammar consists of one or more variables that represent classes of strings (i.e., languages). Rules say how the strings in each class are constructed. The construction can use:
- Symbols of the alphabet
- Strings that are already known to be in one of the classes
- Or both
Context-Free Grammar
A context-free grammar (CFG) is a formal system used to describe a class of languages known as context-free languages (CFLs). The purpose of a context-free grammar is:
- To describe all strings in a language using a set of production rules.
- To extend the capabilities of regular expressions and finite automata.
A CFG (or just a grammar) G is a tuple G = (V, T, P, S) where
- V is the (finite) set of variables (or non terminals or syntactic categories). Each variable represents a language, i.e., a set of strings
- T is a finite set of terminals, i.e., the symbols that form the strings of the language being defined
- P is a set of production rules that represent the recursive definition of the language.
- S is the start symbol that represents the language being defined. Other variables represent auxiliary classes of strings that are used to define the language of the start symbol.
A grammar is said to be context-free if every production is of the form:
A -> α, where A ∊ V and α ∊ (V∪T)*
- V (Variables/Non-terminals): These are symbols that can be replaced using production rules. They help in defining the structure of the grammar. Typically, non-terminals are represented by uppercase letters (e.g., S, A, B).
- T (Terminals): These are symbols that appear in the final strings of the language and cannot be replaced further. They are usually represented by lowercase letters (e.g., a, b, c) or specific symbols.
- The left-hand side must be a single variable; it cannot be a terminal.
- The right-hand side can be any combination of variables and terminals, including the empty string.
In other words, a grammar is context-free when every production has a single variable on its left-hand side and any string of variables and terminals on its right-hand side.
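The definition above can be sketched in code. The following is a minimal illustration, not a standard API: the `Grammar` class, its field names, and the `is_context_free` helper are hypothetical names chosen for this example, and the two sample grammars are made up to show one valid and one invalid case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for G = (V, T, P, S)."""
    variables: frozenset   # V
    terminals: frozenset   # T
    productions: tuple     # P: (head, body) pairs, body is a tuple of symbols
    start: str             # S


def is_context_free(g: Grammar) -> bool:
    """Check that every production has a single variable as its head
    and a body drawn from (V ∪ T)*."""
    if g.variables & g.terminals or g.start not in g.variables:
        return False
    symbols = g.variables | g.terminals
    for head, body in g.productions:
        if head not in g.variables:                 # head must be one variable
            return False
        if any(sym not in symbols for sym in body):  # body symbols must be known
            return False
    return True


# Illustrative grammars (assumptions, not taken from the text above).
good = Grammar(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("S", ("a", "S", "b")), ("S", ())),  # S -> aSb | ε
    start="S",
)
bad = Grammar(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("a", ("b", "S", "a")),),            # terminal on the left
    start="S",
)

print(is_context_free(good))  # True
print(is_context_free(bad))   # False
```

An empty tuple stands for an ε body here, which keeps the right-hand side a plain sequence of symbols.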
Core Concepts of CFGs
A CFG is defined by:
- Nonterminal symbols (variables): Represent abstract categories or placeholders
(e.g., E, S).
- Terminal symbols (alphabet): The actual characters or tokens in the language
(e.g., a, b, +, *, (, )).
- Production rules: Specify how nonterminals can be replaced with other nonterminals or terminals
(e.g., E → E + E).
- Start symbol: A special nonterminal from which derivations begin.
CFG vs. Other Models
Model | Description |
---|---|
Finite Automata | Accept strings via computation (accept/reject). |
Regular Expressions | Match strings by describing their structure. |
CFG | Generate strings via recursive replacement. |
Example: Arithmetic Expressions
Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, and division.
Here is one possible CFG, given by its production rules:
E → int
E → E Op E
E → (E)
Op → +
Op → -
Op → *
Op → /
Example Derivation:
E
⇒ E Op E
⇒ E Op int
⇒ int Op int
⇒ int / int
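The derivation above can be replayed step by step. This is a small sketch under the assumption that a sentential form is stored as a tuple of symbols; `PRODUCTIONS`, `derive`, and `steps` are illustrative names, not part of any library.

```python
# Productions of the arithmetic grammar, keyed by variable.
PRODUCTIONS = {
    "E": [("int",), ("E", "Op", "E"), ("(", "E", ")")],
    "Op": [("+",), ("-",), ("*",), ("/",)],
}


def derive(start, steps):
    """Apply each (index, body) step to the current sentential form.

    `index` is the position of the variable to rewrite and `body` is the
    right-hand side to substitute. Returns every form along the way.
    """
    form = (start,)
    history = [form]
    for index, body in steps:
        head = form[index]
        if body not in PRODUCTIONS.get(head, []):
            raise ValueError(f"no production {head} -> {' '.join(body)}")
        form = form[:index] + body + form[index + 1:]
        history.append(form)
    return history


steps = [
    (0, ("E", "Op", "E")),  # E => E Op E
    (2, ("int",)),          # => E Op int
    (0, ("int",)),          # => int Op int
    (1, ("/",)),            # => int / int
]

for form in derive("E", steps):
    print(" ".join(form))
```

Each step rewrites exactly one variable, which is what the ⇒ symbol denotes in the derivation.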
Designing a CFG
When creating CFGs:
- Base case: Define the simplest valid strings.
- Recursive rules: Combine smaller components into larger ones.
Examples:
1. Palindromes over {a, b}:
S → ε | a | b | aSa | bSb
2. Balanced Parentheses:
S → ε | (S) | SS
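As a rough sketch, both example grammars can be turned into membership checks. The palindrome check follows the base-case and recursive rules directly. The parentheses check swaps in a depth counter, a simpler technique that accepts the same language as S → ε | (S) | SS rather than translating the rules literally. The function names are illustrative.

```python
def is_palindrome(s: str) -> bool:
    """Membership test mirroring S -> ε | a | b | aSa | bSb."""
    if any(ch not in "ab" for ch in s):
        return False
    if len(s) <= 1:
        return True                      # base cases: ε, a, b
    # Recursive rules aSa and bSb: the outer symbols must match and the
    # inner part must itself be derivable from S.
    return s[0] == s[-1] and is_palindrome(s[1:-1])


def is_balanced(s: str) -> bool:
    """Membership test for balanced parentheses using a depth counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:                # a ")" with no matching "("
                return False
        else:
            return False                 # symbol outside the alphabet
    return depth == 0


print(is_palindrome("abba"), is_palindrome("ab"))    # True False
print(is_balanced("(()())"), is_balanced(")("))      # True False
```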
Languages Defined by CFGs
The language L(G) generated by a CFG G = (V, T, P, S) is: L(G) = { ω ∈ T* | S ⇒* ω }
- ω: A string made only of terminals.
- S ⇒* ω: S derives ω through zero or more production applications.
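To make the definition of L(G) concrete, the sketch below enumerates every string of the palindrome grammar up to a length bound by searching over sentential forms. `language_up_to` and `PALINDROMES` are illustrative names. The pruning rule assumes a grammar like this one, where forms cannot grow indefinitely without adding terminals; a grammar with a rule such as S → SS would need a stronger bound.

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Collect every terminal string w with start =>* w and len(w) <= max_len."""
    variables = set(productions)
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Find the leftmost variable; a form without one is a finished string.
        pos = next((i for i, sym in enumerate(form) if sym in variables), None)
        if pos is None:
            found.add("".join(form))
            continue
        for body in productions[form[pos]]:
            new_form = form[:pos] + body + form[pos + 1:]
            terminals = sum(1 for sym in new_form if sym not in variables)
            # Prune forms that already contain too many terminals.
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=lambda w: (len(w), w))


# S -> ε | a | b | aSa | bSb, with () standing for ε.
PALINDROMES = {"S": [(), ("a",), ("b",), ("a", "S", "a"), ("b", "S", "b")]}

print(language_up_to(PALINDROMES, "S", 3))
# ['', 'a', 'b', 'aa', 'bb', 'aaa', 'aba', 'bab', 'bbb']
```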
Regular Languages vs. Context-Free Languages
Property | Regular Languages | Context-Free Languages |
---|---|---|
Power | Limited | More expressive |
Memory Requirements | Finite (a fixed number of states) | Unbounded (a stack, as in a pushdown automaton) |
Definable Structures | Simple patterns (e.g., repetition) | Nested structures (e.g., palindromes, balanced parentheses) |
Non-CFG Example
Productions such as a -> bSa or a -> ba are not allowed in a CFG, because the left-hand side is a terminal, and the left-hand side of every CFG production must be a single variable.
Productions with the variable S on the left-hand side, such as S -> bSa and S -> a, are valid. Starting from S and applying S -> bSa and then S -> a gives S ⇒ bSa ⇒ baa, so these two rules derive the string "baa". Deriving the string "aba" in the same way needs rules of the shape S -> aSa and S -> b, which give S ⇒ aSa ⇒ aba.
Figure: Parse tree of the string "aba"
Context-free grammars are used widely in computer science, especially in formal language theory, compiler development, and natural language processing. They are also used to describe the syntax of programming languages and other formal languages.
Limitations of Context-Free Grammar
- Cannot Handle Everything:
  - CFGs are good for defining the basic syntactic structure of a language, but some languages are not context-free.
  - For example, no CFG generates { aⁿbⁿcⁿ | n ≥ 0 }, and rules such as "a variable must be declared before it is used" are beyond what a CFG can express.
- Can Be Ambiguous:
  - A grammar can allow more than one parse tree for the same string. In the arithmetic grammar above, int + int * int can be grouped as (int + int) * int or as int + (int * int).
  - This is called ambiguity, and it makes it hard for a parser to decide which structure, and therefore which meaning, is intended.
- Can't Check Meaning:
  - CFGs only describe structure, not meaning.
  - They can't check whether types match, whether variables are used properly, or whether functions are called correctly.
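Ambiguity can be observed by counting parse trees. The sketch below counts the parse trees that the arithmetic grammar from the earlier example (E → int | E Op E | (E)) assigns to a token sequence; a count above 1 means the string is ambiguous under that grammar. `count_parse_trees` is an illustrative helper, and the input is assumed to be already split into tokens.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> int | E Op E | (E)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        total = 0
        if j - i == 1 and tokens[i] == "int":              # E -> int
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                   # E -> (E)
        for k in range(i + 1, j - 1):                      # E -> E Op E
            if tokens[k] in OPS:                           # operator at k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("int + int * int".split()))      # 2
print(count_parse_trees("( int + int ) * int".split()))  # 1
```

Adding parentheses removes the second reading, which is why the parenthesized expression has a single parse tree.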