Search in PDF using C#

Searching for specific text in PDF documents is a common task in document management systems, legal review tools, invoice processors, and other enterprise applications. Whether you are trying to locate a keyword, extract certain values, or redact sensitive information, automating PDF search can save a lot of time and effort.

In this blog post, you will learn how to search text in PDF files programmatically using C#. We will walk you through the key features step-by-step with practical C# code examples.

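This article covers the following topics:

  • C# Library to Search in PDF Documents
  • Search Text in PDF Using C#
  • Case-Insensitive and Whole Word Search using C#
  • Searching with Regular Expressions in PDF
  • Search and Extract Text with Position Details
  • Highlighting or Replacing Found Text
  • Search Across All Pages or Specific Pages
  • Advanced Use Case: Search and Redact Sensitive Information
  • Get a Free License
  • Search in PDF: Free Resources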

C# Library to Search in PDF Documents

Aspose.PDF for .NET simplifies searching text in PDF files using C#. It lets you find exact words, match patterns with regular expressions, and highlight or replace matched text with just a few lines of code, alongside its broader features for manipulating PDF documents.

Before diving into PDF text searching, you need to set up your development environment. Follow these steps to get started with Aspose.PDF for .NET:

1. Install Aspose.PDF for .NET.

Download it from the releases page or install it via NuGet. To use NuGet, open your .NET project in Visual Studio and run the following command in the Package Manager Console:

PM> Install-Package Aspose.PDF

This command adds the Aspose.PDF library to your project so you can access its powerful PDF processing features.

2. Import Required Namespaces

At the top of your C# file, add these using directives:

using Aspose.Pdf;
using Aspose.Pdf.Text;

Now you’re ready to start searching text inside your PDF files using Aspose.PDF’s API.

Search Text in PDF Using C#

With Aspose.PDF for .NET, you can easily search for specific words or phrases in a PDF, locate all their instances, and take actions like highlighting them or extracting their details.

Follow these steps to perform a basic text search:

  1. Load the target PDF file using the Document class.
  2. Create a TextFragmentAbsorber to define the search keyword.
  3. Run the absorber across all pages using the Accept() method.
  4. Retrieve all matching text fragments.
  5. Print the number of matches found.
  6. Loop through and display each match with its page number.

The following code example implements these steps.
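The six steps above can be sketched as a small console program. This is a minimal sketch, not a definitive implementation: the file name sample.pdf and the keyword "invoice" are placeholders, and the class name BasicSearch is made up for illustration.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class BasicSearch
{
    static void Main()
    {
        // Step 1: load the PDF ("sample.pdf" is a placeholder path).
        using (Document pdfDocument = new Document("sample.pdf"))
        {
            // Step 2: create an absorber for the keyword to look for.
            TextFragmentAbsorber absorber = new TextFragmentAbsorber("invoice");

            // Step 3: run the absorber across all pages.
            pdfDocument.Pages.Accept(absorber);

            // Step 4: retrieve the matching fragments.
            TextFragmentCollection fragments = absorber.TextFragments;

            // Step 5: print the number of matches.
            Console.WriteLine($"Found {fragments.Count} instance(s) of the keyword.");

            // Step 6: print each match with its page number.
            foreach (TextFragment fragment in fragments)
            {
                Console.WriteLine($"Text: {fragment.Text} | Page: {fragment.Page.Number}");
            }
        }
    }
}
```

The using block disposes the document once the search is finished, which matters when you process many files in a loop.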

Output Example

Found 3 instance(s) of the keyword.
Text: invoice | Page: 1
Text: invoice | Page: 2
Text: invoice | Page: 3

This example demonstrates a simple keyword search that works across all pages in the PDF. You’ll see the matched text along with its page number.

To better understand what’s happening in the code, here’s a quick breakdown of the key classes and methods involved:

  • Document: Represents the entire PDF file. It provides access to pages, content, and structure.
  • TextFragmentAbsorber: Finds all occurrences of a given string or pattern within the PDF. You can also enable features like case-insensitive or regex-based search.
  • Accept(): Applies the absorber to each page. It scans through the document and collects matching fragments.
  • TextFragments: A collection of all the matched text fragments returned by the absorber.
  • TextFragment: Each individual match with details like content, position, and page number.

Case-Insensitive and Whole Word Search using C#

When you search PDF content, you need to control how the system finds matches to ensure accurate results. Sometimes, you want to ignore letter casing (“Invoice” vs. “invoice”), or you want to match full words only—not partial matches within other words.

Aspose.PDF for .NET gives you the tools to do both.

By default, searches are case-sensitive. To ignore letter casing, use TextSearchOptions with IgnoreCase enabled:

A case-insensitive search finds “Invoice”, “invoice”, “INVOICE”, and other case variations.
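One way to get case-insensitive matching is to switch the absorber to regular-expression mode and prefix the pattern with the inline (?i) flag. The sketch below takes that route rather than relying on a dedicated ignore-case setting, so treat it as one possible approach; sample.pdf and the class name are placeholders.

```csharp
using System;
using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class CaseInsensitiveSearch
{
    static void Main()
    {
        string keyword = "invoice";

        // Escape the keyword so it is treated literally, then prepend the
        // inline flag (?i) that turns on case-insensitive regex matching.
        string pattern = "(?i)" + Regex.Escape(keyword);

        // "sample.pdf" is a placeholder path.
        using (Document pdfDocument = new Document("sample.pdf"))
        {
            // Passing true tells the absorber to treat the phrase as a regular expression.
            TextFragmentAbsorber absorber =
                new TextFragmentAbsorber(pattern, new TextSearchOptions(true));

            pdfDocument.Pages.Accept(absorber);

            foreach (TextFragment fragment in absorber.TextFragments)
            {
                Console.WriteLine($"Text: {fragment.Text} | Page: {fragment.Page.Number}");
            }
        }
    }
}
```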

Match Whole Words Only

You can also prevent partial matches. For example, searching for car shouldn’t match care or scar.

This ensures only standalone instances of the word “car” are matched.
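Whole-word matching can be approximated with regex word boundaries. The sketch below assumes regular-expression mode and wraps the keyword in \b anchors; the file name and class name are placeholders.

```csharp
using System;
using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class WholeWordSearch
{
    static void Main()
    {
        string keyword = "car";

        // \b marks a word boundary, so "care" and "scar" do not match.
        string pattern = @"\b" + Regex.Escape(keyword) + @"\b";

        // "sample.pdf" is a placeholder path.
        using (Document pdfDocument = new Document("sample.pdf"))
        {
            // true = interpret the phrase as a regular expression.
            TextFragmentAbsorber absorber =
                new TextFragmentAbsorber(pattern, new TextSearchOptions(true));

            pdfDocument.Pages.Accept(absorber);

            Console.WriteLine($"Whole-word matches: {absorber.TextFragments.Count}");
        }
    }
}
```

To combine both behaviors, prefix the same pattern with (?i).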

Searching with Regular Expressions in PDF

In certain cases, you need to find more than a specific word—you want to match patterns like dates, email addresses, or reference numbers. That’s where regular expressions (regex) come in.

Aspose.PDF for .NET allows you to use regex for advanced text searching across any part of your PDF document.

Example: Find All Dates in a PDF

Let’s say you want to find all dates in the format dd/mm/yyyy:
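A date search along these lines might look like the following sketch. The pattern only checks the shape of the text (two digits, slash, two digits, slash, four digits), not whether the date is valid; sample.pdf and the class name are placeholders.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class DateSearch
{
    static void Main()
    {
        // Two digits, slash, two digits, slash, four digits, e.g. 25/12/2024.
        string datePattern = @"\b\d{2}/\d{2}/\d{4}\b";

        // "sample.pdf" is a placeholder path.
        using (Document pdfDocument = new Document("sample.pdf"))
        {
            // true = interpret the phrase as a regular expression.
            TextFragmentAbsorber absorber =
                new TextFragmentAbsorber(datePattern, new TextSearchOptions(true));

            pdfDocument.Pages.Accept(absorber);

            foreach (TextFragment fragment in absorber.TextFragments)
            {
                Console.WriteLine($"Date: {fragment.Text} | Page: {fragment.Page.Number}");
            }
        }
    }
}
```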

Other Useful Patterns:

  • Emails: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
  • Phone Numbers: \d{3}[-.\s]?\d{3}[-.\s]?\d{4}
  • Invoice Numbers: INV-\d+
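Before running a pattern against a PDF, it can help to try it on a plain string with the standard .NET Regex class. The sketch below does only that and does not touch Aspose.PDF; the sample sentence, the PatternCheck class, and the FirstMatch helper are made up for illustration, and the email pattern uses the character class [A-Za-z] so that a literal | is not accepted in the domain suffix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Returns the first match of the pattern in the text, or null when nothing matches.
    public static string FirstMatch(string pattern, string text)
    {
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Made-up sample sentence for illustration only.
        string sample = "Contact billing@example.com about INV-10423 or call 555-123-4567.";

        // Email address
        Console.WriteLine(FirstMatch(@"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", sample));

        // Ten-digit phone number with optional separators
        Console.WriteLine(FirstMatch(@"\d{3}[-.\s]?\d{3}[-.\s]?\d{4}", sample));

        // Invoice number
        Console.WriteLine(FirstMatch(@"INV-\d+", sample));
    }
}
```

Once a pattern behaves as expected here, the same string can be passed to a TextFragmentAbsorber in regular-expression mode.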

Regex expands your search capabilities far beyond static text, helping you extract structured data from unstructured documents.

Search and Extract Text with Position Details

Sometimes, finding the text isn’t enough—you may need to know where exactly it appears in the PDF. Aspose.PDF lets you extract the page number, coordinates, and formatting details of every match.

This feature is especially useful for building indexes, tagging documents, or creating clickable links.

Example: Get Position of Each Match
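Reading the position and font of each match could be sketched as below. The file name and class name are placeholders, and the coordinates you see depend entirely on your own document.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class PositionSearch
{
    static void Main()
    {
        // "sample.pdf" is a placeholder path.
        using (Document pdfDocument = new Document("sample.pdf"))
        {
            TextFragmentAbsorber absorber = new TextFragmentAbsorber("invoice");
            pdfDocument.Pages.Accept(absorber);

            foreach (TextFragment fragment in absorber.TextFragments)
            {
                Console.WriteLine($"Text: {fragment.Text}");
                Console.WriteLine($"Page: {fragment.Page.Number}");

                // XIndent and YIndent are the fragment's coordinates on the page.
                Console.WriteLine(
                    $"Position - X: {fragment.Position.XIndent}, Y: {fragment.Position.YIndent}");

                // TextState carries the styling of the matched text.
                Console.WriteLine(
                    $"Font: {fragment.TextState.Font.FontName}, Size: {fragment.TextState.FontSize}");

                Console.WriteLine("------------");
            }
        }
    }
}
```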

Sample Output

Text: invoice
Page: 1
Position - X: 33.482, Y: 708.246
Font: Helvetica, Size: 12
------------
Text: invoice
Page: 2
Position - X: 33.482, Y: 708.246
Font: Helvetica, Size: 12
------------
Text: invoice
Page: 3
Position - X: 33.482, Y: 708.246
Font: Helvetica, Size: 12
------------

You now know exactly where the word “invoice” appears, along with how it’s styled. This level of detail opens the door for advanced processing, such as annotations, tooltips, or dynamic highlights.

Highlighting or Replacing Found Text

Once you’ve located specific text in a PDF, you can take it a step further by highlighting it or even replacing it with new content. Aspose.PDF for .NET lets you style or modify the matched text easily using the TextFragment object.

Search and Highlight Text in PDF

You can visually highlight the text by changing its background and font color.

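// "sample.pdf" and "highlighted.pdf" are placeholder file names
Document pdfDocument = new Document("sample.pdf");

TextFragmentAbsorber absorber = new TextFragmentAbsorber("invoice");
pdfDocument.Pages.Accept(absorber);

foreach (TextFragment fragment in absorber.TextFragments)
{
    // Highlight by changing text appearance
    fragment.TextState.BackgroundColor = Color.Yellow;
    fragment.TextState.ForegroundColor = Color.Red;
    fragment.TextState.FontStyle = FontStyles.Bold;
}

// Save to a new file so the original stays untouched
pdfDocument.Save("highlighted.pdf");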

This is useful for reviewing, redlining, or generating annotated reports.

Find and Replace Text

Need to redact or update text in the document? Just replace it directly:

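// Reuses the absorber from the highlight example above
foreach (TextFragment fragment in absorber.TextFragments)
{
    fragment.Text = "REDACTED";
}

// Persist the change ("updated.pdf" is a placeholder file name)
pdfDocument.Save("updated.pdf");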

You can even apply new formatting while replacing:

// Inside the foreach loop, after setting fragment.Text:
fragment.TextState.FontSize = 12;
fragment.TextState.Font = FontRepository.FindFont("Arial");
fragment.TextState.ForegroundColor = Color.Black;

Highlighting and replacing text programmatically allows you to automate many document processing tasks, like cleaning up templates, updating outdated content, or censoring private data.

Search Across All Pages or Specific Pages

By default, Aspose.PDF searches across all pages in a PDF. But sometimes, you might want to limit the search to a specific page or a range of pages—especially when working with large files or when the content is predictable.

Aspose.PDF makes it easy to do both.

Search on All Pages (Default)

If you don’t specify a page, the absorber automatically searches every page.

TextFragmentAbsorber absorber = new TextFragmentAbsorber("invoice");
pdfDocument.Pages.Accept(absorber); // Searches all pages

Search on a Specific Page

You can also search a single page by directly targeting it:

TextFragmentAbsorber absorber = new TextFragmentAbsorber("invoice");

// Search only on page 2 (page numbers start at 1, not 0)
pdfDocument.Pages[2].Accept(absorber);

Search on a Range of Pages

To search a custom range (e.g., pages 2 to 4), just loop through the range:

TextFragmentAbsorber absorber = new TextFragmentAbsorber("invoice");

// Loop through pages 2 to 4, stopping early if the document is shorter
for (int i = 2; i <= 4 && i <= pdfDocument.Pages.Count; i++)
{
    pdfDocument.Pages[i].Accept(absorber);
}

This approach lets you limit the search to the pages you care about, which saves time on large files and improves precision in documents with well-defined sections.

Advanced Use Case: Search and Redact Sensitive Information

In legal, HR, or financial documents, it’s common to hide sensitive content—like names, IDs, or account numbers—before sharing. Aspose.PDF for .NET makes this easy by combining search with redaction features.

You can search for terms and then apply a black overlay using RedactionAnnotation.
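A search-then-redact flow might be sketched as follows. It assumes the name "John Doe" as the search term and placeholder file names, and it collects the match locations before redacting so that changing a page does not interfere with the fragments still to be processed. Verify the output file yourself before sharing anything sensitive.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

class RedactSearch
{
    static void Main()
    {
        // "sample.pdf" and "redacted.pdf" are placeholder file names.
        using (Document pdfDocument = new Document("sample.pdf"))
        {
            TextFragmentAbsorber absorber = new TextFragmentAbsorber("John Doe");
            pdfDocument.Pages.Accept(absorber);

            // Snapshot the page and bounding rectangle of every match first.
            var targets = new List<(Page Page, Aspose.Pdf.Rectangle Area)>();
            foreach (TextFragment fragment in absorber.TextFragments)
            {
                targets.Add((fragment.Page, fragment.Rectangle));
            }

            foreach (var target in targets)
            {
                // Cover the matched area with a black box.
                RedactionAnnotation redaction = new RedactionAnnotation(target.Page, target.Area);
                redaction.FillColor = Aspose.Pdf.Color.Black;
                target.Page.Annotations.Add(redaction);

                // Apply the redaction so the covered content is removed, not just hidden.
                redaction.Redact();
            }

            pdfDocument.Save("redacted.pdf");
            Console.WriteLine($"Redacted {targets.Count} occurrence(s).");
        }
    }
}
```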

What Happens

  • The target text (“John Doe”) is found and covered with a black box.
  • Once the redaction is applied, this isn’t just visual: the covered content is removed from the page, so it cannot be recovered from the saved file.

Get a Free License

Now that you’ve learned how to search, extract, highlight, and redact text in PDFs using Aspose.PDF for .NET, it’s time to put that knowledge into action.

Try it yourself: Download a free temporary license and start building your own smart PDF tools.

Search in PDF: Free Resources

Want to go beyond just searching text in PDFs? Explore the full capabilities of Aspose.PDF for .NET with these free, developer-friendly resources:

  • Developer’s Guide
    Learn how to create, modify, convert, and secure PDF files programmatically.
    Aspose.PDF for .NET Documentation

  • Free Online Tools
    Convert, merge, split, and edit PDF files directly in your browser.
    Aspose Free PDF Tools

  • API Reference
    Learn more about classes, properties, and methods available in Aspose.PDF for .NET to accelerate your development.
    Aspose.PDF API Reference

  • Support Forum
    Ask questions, report issues, and get answers directly from Aspose experts.
    Aspose Support Forum

These resources are free and available to help you get the most out of your PDF development journey.

Conclusion

Searching text in PDF files is a vital feature for many document-based applications—whether you’re extracting data, auditing content, or preparing files for redaction. With Aspose.PDF for .NET, you can easily perform keyword searches, use regular expressions, highlight results, and even redact sensitive information with precision and control. Aspose.PDF offers a developer-friendly API that simplifies complex PDF operations—saving you time while enabling powerful automation.

If you have any questions or need further assistance, please feel free to reach out on our free support forum.
