PDF files are a great way to ensure the document you created keeps its formatting and attributes the way you intended, regardless of the machine on which the file is opened. That said, the same property also makes editing PDF files a bit difficult.
Things become especially problematic when you work with PDF manipulation in code. Luckily, libraries like iTextSharp help developers create, edit, inspect and maintain PDF documents.
In this article, we’re talking about how you can extract specific text by colour using iTextSharp.
Extracting text based on colour
There’s no way of extracting text directly based on its highlight or font colour in iTextSharp. That said, you can use the ExtractText() method to retrieve the formatting details of each piece of text on a page, then compare each piece’s font colour against a reference colour to get what you want.
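A simple Windows Forms script to do so would look like this. Note that the types it uses (PdfLoadedDocument, PdfPageBase and TextData) appear to belong to Syncfusion’s PDF library rather than iTextSharp, so that library needs to be referenced for the code to compile.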
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
// Also import the namespaces of the PDF library that provides
// PdfLoadedDocument, PdfPageBase and TextData.

PdfLoadedDocument pdf;

private void Form1_Load(object sender, EventArgs e)
{
    //Load the PDF document
    pdf = new PdfLoadedDocument(@"link/to/file.pdf");
    //Enter colour name here
    textBox1.Text = "Blue";
}

private void button1_Click(object sender, EventArgs e)
{
    //Convert the colour string into an actual colour value
    Color color = Color.FromName(textBox1.Text);
    //FromName returns an all-zero ARGB value for an unrecognised name
    if (color.ToArgb() == 0)
    {
        MessageBox.Show("Enter valid colour name");
        return;
    }
    StringBuilder text = new StringBuilder();
    for (int i = 0; i < pdf.Pages.Count; i++)
    {
        //Load PDF page
        PdfPageBase page = pdf.Pages[i];
        //Extract the page text along with each fragment's formatting attributes
        List<TextData> textFormat;
        page.ExtractText(out textFormat);
        foreach (TextData data in textFormat)
        {
            //Keep only the fragments whose font colour matches the target
            if (data.FontColor.ToArgb() == color.ToArgb())
            {
                //Append the matching text to the result
                text.Append(data.Text);
            }
        }
    }
    if (text.Length > 0)
        MessageBox.Show(text.ToString());
    else
        MessageBox.Show("The document doesn't have any " + textBox1.Text + " coloured text");
}
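The filtering step in the script above does not depend on any particular PDF library, so it can be tried out on its own. The following is a minimal sketch under that assumption: TextFragment and ColourFilter are hypothetical names invented for illustration, standing in for whatever per-fragment data (text plus font colour) your PDF library returns.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Hypothetical stand-in for the per-fragment data a PDF library returns.
public sealed class TextFragment
{
    public string Text { get; }
    public Color FontColor { get; }

    public TextFragment(string text, Color fontColor)
    {
        Text = text;
        FontColor = fontColor;
    }
}

public static class ColourFilter
{
    // Joins, with single spaces, the text of every fragment whose font
    // colour has exactly the same ARGB value as the target colour.
    public static string ExtractByColour(IEnumerable<TextFragment> fragments, Color target)
    {
        int targetArgb = target.ToArgb();
        return string.Join(" ", fragments
            .Where(f => f.FontColor.ToArgb() == targetArgb)
            .Select(f => f.Text));
    }
}
```

Comparing ToArgb() values rather than the Color structures themselves matters here: a named colour such as Color.Blue and an unnamed colour built with Color.FromArgb(0, 0, 255) are not equal as Color values, but their ARGB values are identical.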
As with most things in coding, there are different ways to achieve the same result using different libraries or approaches. If you’re just starting out with PDF manipulation, this is probably the simplest one to understand.
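One variation worth considering: an exact ARGB comparison only matches text whose colour is precisely the named colour, whereas text that merely looks blue in a PDF may be stored with slightly different channel values. The sketch below is one possible way to loosen the match, not part of the original script. ColourMatch, IsClose and the tolerance parameter are hypothetical names chosen for illustration.

```csharp
using System;
using System.Drawing;

public static class ColourMatch
{
    // Returns true when the red, green and blue channels of the two colours
    // each differ by at most `tolerance` (0 to 255). Alpha is ignored.
    public static bool IsClose(Color a, Color b, int tolerance)
    {
        return Math.Abs(a.R - b.R) <= tolerance
            && Math.Abs(a.G - b.G) <= tolerance
            && Math.Abs(a.B - b.B) <= tolerance;
    }
}
```

To use it, the exact check on the font colour could be swapped for a call such as ColourMatch.IsClose(fontColour, color, 30). The right tolerance depends on the document, so treat 30 as a starting guess rather than a recommended value.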