In this article we will see how to extract images and text from a PDF file and convert it to a booklet using Spire.PDF.
Introduction
In this article we will see how to extract images and text from a PDF file, and then convert the file to a booklet, using Spire.PDF. For those who aren't familiar with Spire.PDF, it is a professional PDF component for creating, writing, editing, handling and reading PDF files within a .NET application, without any external dependencies. Using this .NET PDF library, you can create PDF files from scratch or process existing PDF documents entirely in C# or VB.NET, without installing Adobe Acrobat.
Background
I was working on a project and wanted to extract text from multiple PDF files in order to analyse the contents before exporting them to a database. I searched the internet and found Spire.PDF. It worked well for me and it is very easy to use. The library is not only for extraction, however: it also supports many other features, such as security settings, metadata updates and data import, to name a few. It can also convert text, images and HTML to PDF from C#/VB.NET in high quality.
Prerequisites
In this demo we are using Visual Studio 2015, .NET Framework 4.5.2, and Spire.PDF for .NET (by e-iceblue).
Step 1: Open Visual Studio and select File -> New Project. In the New Project dialog box, select Visual C#, then Windows Forms Application. Enter a project name at the bottom of the dialog box and click OK. Then add a reference to the Spire.PDF DLL to the project.
Step 2: I'm using a single button to extract both the images and the text from the PDF file, but you can use separate buttons if you want.
Using The Code
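The handlers in this section use types from several namespaces. As a rough sketch, these are the using directives they would need at the top of the form's code file. The Spire.Pdf namespace for PdfDocument, PdfPageBase and PdfPageSize is an assumption based on the library's usual layout, so check it against the version you reference.

```csharp
// Assumed using directives for the form's code file (e.g. Form1.cs).
using System;                      // EventArgs, String
using System.Collections.Generic;  // IList<T>, List<T>
using System.Drawing;              // Image
using System.Drawing.Imaging;      // ImageFormat
using System.IO;                   // File
using System.Text;                 // StringBuilder
using System.Windows.Forms;        // OpenFileDialog, SaveFileDialog, MessageBox
using Spire.Pdf;                   // PdfDocument, PdfPageBase, PdfPageSize (assumed namespace)
```

Visual Studio adds most of the System directives to a new Windows Forms project on its own, so usually only the imaging, IO and Spire ones need to be typed in by hand.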
Select button code
private void selectBtn_Click(object sender, EventArgs e)
{
    OpenFileDialog dialog = new OpenFileDialog();
    // Only PDF files can be selected.
    dialog.Filter = "PDF files|*.pdf";
    // Keep the current path if the user cancels the dialog.
    if (dialog.ShowDialog() == DialogResult.OK)
    {
        tB1.Text = dialog.FileName;
    }
}
Extract image & text button code
private void Extract_Click(object sender, EventArgs e)
{
    if (tB1.Text != "")
    {
        SaveFileDialog savefile = new SaveFileDialog();
        savefile.FileName = "TextInPdf.txt";
        savefile.Filter = "Text files|*.txt";
        // Continue only if the user clicked OK.
        if (savefile.ShowDialog() == DialogResult.OK)
        {
            try
            {
                // Create a PDF document and load the selected file.
                PdfDocument doc = new PdfDocument();
                doc.LoadFromFile(tB1.Text);
                StringBuilder buffer = new StringBuilder();
                IList<Image> images = new List<Image>();
                // Collect the text and the images of every page.
                foreach (PdfPageBase page in doc.Pages)
                {
                    buffer.Append(page.ExtractText());
                    foreach (Image image in page.ExtractImages())
                    {
                        images.Add(image);
                    }
                }
                doc.Close();
                // Save the text to the path chosen in the dialog.
                String fileName = savefile.FileName;
                File.WriteAllText(fileName, buffer.ToString());
                // Save each image as a PNG in the same folder as the text file.
                String folder = Path.GetDirectoryName(fileName);
                int index = 0;
                foreach (Image image in images)
                {
                    String imageFileName = Path.Combine(folder, String.Format("Image-{0}.png", index++));
                    image.Save(imageFileName, ImageFormat.Png);
                }
                // Open the text file in the default editor.
                System.Diagnostics.Process.Start(fileName);
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
        }
    }
}
Convert to Booklet button code
private void bkletBtn_Click(object sender, EventArgs e)
{
    if (tB1.Text != "")
    {
        SaveFileDialog savefile = new SaveFileDialog();
        savefile.FileName = "booklet.pdf";
        savefile.Filter = "PDF files|*.pdf";
        // Continue only if the user clicked OK.
        if (savefile.ShowDialog() == DialogResult.OK)
        {
            // Create a PDF document.
            PdfDocument doc = new PdfDocument();
            String srcPdf = tB1.Text;
            // Each booklet sheet is two A4 pages wide.
            float width = PdfPageSize.A4.Width * 2;
            float height = PdfPageSize.A4.Height;
            doc.CreateBooklet(srcPdf, width, height, true);
            // Save the booklet to the path chosen in the dialog.
            doc.SaveToFile(savefile.FileName);
            doc.Close();
            // Open the booklet in the default PDF viewer.
            PDFDocumentViewer(savefile.FileName);
        }
    }
}

private void PDFDocumentViewer(string fileName)
{
    try
    {
        System.Diagnostics.Process.Start(fileName);
    }
    catch
    {
        // Ignore the error if no viewer can be launched.
    }
}
Now let's run the application, select a PDF file, and extract the images and text from it.
Below are the images that were extracted from the PDF document.
Now let's convert the PDF file to a booklet by first selecting the file and then clicking the convert to booklet button in our application.
Conclusion
Spire.PDF is very easy to use and very helpful for .NET developers. The documentation is also simple and self-explanatory. In this article I showed you a simple demonstration, but the library has other interesting features too, such as form filling and file conversion. I'll keep sharing, so stay tuned.
Thank you so much for reading! If you have any complaints or suggestions about the code or the article, please let me know. Don't forget to leave your opinion in the comments section below. ;)