
How to Find & Highlight a Particular Word or Phrase in a Document in Android using Java

Submitted on: 10/21/2015 7:37:53 AM
By: Sherazam  
Level: Intermediate
User Rating: Unrated
Compatibility: Java (JDK 1.3), Java (JDK 1.4), Java (JDK 1.5)
Views: 4252
 

 
This article describes how to programmatically find and highlight a particular word or phrase in a document using Aspose.Words. At first it might seem easy to find the string in the document and change its formatting, but the main difficulty is that, because of formatting, the matched string can be spread over several runs of text. Consider the following example: the phrase “Hello World!” consists of three different runs. Its beginning is italic, its middle is bold, and the last part is regular text. In addition to formatting, any bookmark in the middle of the text splits it into more runs. Aspose.Words represents the example above with the following objects:
Run(Run.Text = “Hello ”, Font.Italic = true)
Run(Run.Text = “World”, Font.Bold = true)
Run(Run.Text = “!”)
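To see why searching each run on its own is not enough, the sketch below models a paragraph as a plain list of run strings. It does not use Aspose.Words: `RunSpanDemo` and its `runsCovering` helper are hypothetical names for illustration only. The sketch concatenates the run texts, finds the phrase in the combined text, and reports which runs the match overlaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: runs are modeled as plain strings, not Aspose.Words nodes.
final class RunSpanDemo {

    // Returns the indices of the runs that overlap the range [start, start + length)
    // of the text obtained by concatenating all runs.
    static List<Integer> runsCovering(List<String> runs, int start, int length) {
        List<Integer> result = new ArrayList<Integer>();
        int end = start + length;
        int runStart = 0;
        for (int i = 0; i < runs.size(); i++) {
            int runEnd = runStart + runs.get(i).length();
            if (runStart < end && runEnd > start) {
                result.add(i);
            }
            runStart = runEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        // "Hello " is italic, "World" is bold, "!" is regular text.
        List<String> runs = Arrays.asList("Hello ", "World", "!");
        String phrase = "Hello World!";

        // No single run contains the phrase...
        for (String run : runs) {
            System.out.println(run.contains(phrase)); // false for every run
        }

        // ...but the concatenated paragraph text does, and the match touches every run.
        String text = String.join("", runs);
        int start = text.indexOf(phrase);
        System.out.println(runsCovering(runs, start, phrase.length())); // [0, 1, 2]
    }
}
```

The same helper shows that a shorter match such as “o W” (offset 4, length 3) overlaps only the first two runs.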
This article provides a solution designed to handle that case: when necessary, it collects the word (or phrase) from several runs, skipping any non-run nodes in between. The sample code opens a document and finds every instance of the text “your document”. A replace handler (an IReplacingCallback implementation) contains the logic applied to each match: the runs are split around the matched text and the resulting runs are highlighted, including matches that have mixed formatting and span multiple runs.
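The split-and-collect logic that the replace handler applies to Aspose.Words nodes can be modeled in plain Java to make the steps easier to follow. This is a minimal sketch, assuming a hypothetical `SimpleRun` class (text plus a highlight flag) as a stand-in for an Aspose.Words `Run`; `HighlightModel`, `splitRun` and `highlight` are likewise illustrative names. The model contains only runs, so it leaves out the skipping of bookmarks and other non-run nodes that the real handler performs.

```java
import java.util.ArrayList;
import java.util.List;

final class HighlightModel {

    // Hypothetical stand-in for an Aspose.Words Run: text plus one formatting flag.
    static final class SimpleRun {
        String text;
        boolean highlighted;

        SimpleRun(String text) {
            this.text = text;
        }
    }

    // Splits runs.get(index) at position; the tail becomes a new run inserted right after it
    // and inherits the original run's formatting.
    static SimpleRun splitRun(List<SimpleRun> runs, int index, int position) {
        SimpleRun run = runs.get(index);
        SimpleRun after = new SimpleRun(run.text.substring(position));
        after.highlighted = run.highlighted;
        run.text = run.text.substring(0, position);
        runs.add(index + 1, after);
        return after;
    }

    // Highlights a match that starts at matchOffset inside runs.get(firstIndex)
    // and is matchLength characters long, splitting runs at both match boundaries.
    static void highlight(List<SimpleRun> runs, int firstIndex, int matchOffset, int matchLength) {
        int index = firstIndex;
        if (matchOffset > 0) {
            splitRun(runs, index, matchOffset); // text before the match stays in its own run
            index++;
        }
        int remaining = matchLength;
        List<SimpleRun> matched = new ArrayList<SimpleRun>();
        // Collect runs that lie entirely inside the match.
        while (remaining > 0 && index < runs.size() && runs.get(index).text.length() <= remaining) {
            matched.add(runs.get(index));
            remaining -= runs.get(index).text.length();
            index++;
        }
        // The last run extends past the match: split it and keep only the matching head.
        if (remaining > 0 && index < runs.size()) {
            splitRun(runs, index, remaining);
            matched.add(runs.get(index));
        }
        for (SimpleRun r : matched) {
            r.highlighted = true;
        }
    }

    public static void main(String[] args) {
        List<SimpleRun> runs = new ArrayList<SimpleRun>();
        runs.add(new SimpleRun("Open your "));
        runs.add(new SimpleRun("docu"));
        runs.add(new SimpleRun("ment now"));

        // "your document" starts at offset 5 of the first run and is 13 characters long.
        highlight(runs, 0, 5, "your document".length());

        // Prints: "Open " false, "your " true, "docu" true, "ment" true, " now" false
        for (SimpleRun r : runs) {
            System.out.println("\"" + r.text + "\" highlighted=" + r.highlighted);
        }
    }
}
```

Splitting first and highlighting afterwards means the formatting change never leaks onto text outside the match, which is the same reason the sample splits runs before calling `setHighlightColor`.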
[Java Code Sample]
 
package FindAndHighlight;

import java.awt.Color;
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.regex.Pattern;

import com.aspose.words.Document;
import com.aspose.words.IReplacingCallback;
import com.aspose.words.Node;
import com.aspose.words.NodeType;
import com.aspose.words.ReplaceAction;
import com.aspose.words.ReplacingArgs;
import com.aspose.words.Run;

class Program
{
    public static void main(String[] args) throws Exception
    {
        // Sample infrastructure: resolve the Data folder relative to the compiled class.
        URI exeDir = Program.class.getResource("").toURI();
        String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

        Document doc = new Document(dataDir + "TestFile.doc");

        // We want the phrase "your document" to be highlighted, regardless of case.
        Pattern regex = Pattern.compile("your document", Pattern.CASE_INSENSITIVE);

        // When a custom replace callback modifies the document, it is recommended to use
        // backward replacement by passing false as the third parameter of replace().
        doc.getRange().replace(regex, new ReplaceEvaluatorFindAndHighlight(), false);

        // Save the output document.
        doc.save(dataDir + "TestFile Out.doc");
    }
}

class ReplaceEvaluatorFindAndHighlight implements IReplacingCallback
{
    /**
     * Called by the Aspose.Words find-and-replace engine for each match.
     * Highlights the matched string, even if it spans multiple runs.
     */
    public int replacing(ReplacingArgs e) throws Exception
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and possibly the only) run can contain text before the match;
        // in that case, split the run so that the match starts at the beginning of a run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        // Stores all runs of the match for highlighting afterwards.
        ArrayList<Run> runs = new ArrayList<Run>();

        // Find all runs that are entirely covered by the match string.
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0) &&
               (currentNode != null) &&
               (currentNode.getText().length() <= remainingLength))
        {
            runs.add((Run) currentNode);
            remainingLength -= currentNode.getText().length();

            // Select the next Run node. We have to loop because there can be
            // other nodes in between, such as BookmarkStart.
            do
            {
                currentNode = currentNode.getNextSibling();
            }
            while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            splitRun((Run) currentNode, remainingLength);
            runs.add((Run) currentNode);
        }

        // Now highlight all runs in the sequence.
        for (Run run : runs)
            run.getFont().setHighlightColor(Color.YELLOW);

        // Tell the replace engine to do nothing, because the callback has already done all the work.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits the text of the specified run into two runs at the given position.
     * Inserts the new run just after the specified run and returns it.
     */
    private static Run splitRun(Run run, int position) throws Exception
    {
        Run afterRun = (Run) run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}
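The sample passes `false` as the third argument of `replace()` so that matches are processed backward. One plausible reason, illustrated here on a plain string rather than on Aspose.Words nodes (this is an analogy, not a description of the library's internals), is that editing from the last match to the first keeps the positions recorded for earlier matches valid. `BackwardEditDemo` and `markMatches` are hypothetical names; the sketch wraps each case-insensitive match of the same pattern in square brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackwardEditDemo {

    // Wraps every match in [ ] by editing from the last match to the first,
    // so the start/end offsets recorded for earlier matches stay valid.
    static String markMatches(String text, Pattern pattern) {
        List<int[]> spans = new ArrayList<int[]>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        StringBuilder sb = new StringBuilder(text);
        for (int i = spans.size() - 1; i >= 0; i--) {
            // Insert the closing bracket first so the span's start offset is unaffected.
            sb.insert(spans.get(i)[1], ']');
            sb.insert(spans.get(i)[0], '[');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern regex = Pattern.compile("your document", Pattern.CASE_INSENSITIVE);
        // Prints: [Your Document] and [your document].
        System.out.println(markMatches("Your Document and your document.", regex));
    }
}
```

Had the edits been applied front to back, each inserted bracket would shift every later offset by one, and the stored spans would point at the wrong characters.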
 
More about Aspose.Words for Android
Aspose.Words for Android is a Java word processing component that enables developers to generate, modify, convert and render Word documents within their Android applications. Aspose.Words supports DOC, DOCX, OOXML, RTF, HTML, XHTML, MHTML, OpenDocument, ODT, PDF, XPS, EPUB and other formats. Other useful features include document creation, content and formatting manipulation, mail merge, reporting, platform independence, performance and scalability, all with a minimal learning curve.

