java.lang.Object
toxi.data.feeds.util.EntityStripper
public class EntityStripper
Strips HTML entities such as &quot; from a string, replacing them with their Unicode equivalents.
Field Summary | |
---|---|
static int | LONGEST_ENTITY - The longest an entity can be is 10, at least in our tables, including the leading & and trailing ;. |
static int | SHORTEST_ENTITY - The shortest an entity can be is 4, at least in our tables, including the leading & and trailing ;. |
static char | UNICODE_NBSP_160_0x0a - Unicode nbsp character, 160 (0xA0). |
Constructor Summary | |
---|---|
EntityStripper() | |
Method Summary | |
---|---|
static char | bareHTMLEntityToChar(java.lang.String bareEntity, char howToTranslateNbsp) - Converts an entity to a single char. |
static java.lang.String | flattenHTML(java.lang.String text, char translateNbspTo) - Strips tags and entities from HTML. |
static java.lang.String | flattenXML(java.lang.String text) - Strips tags and entities from XML. |
static char | possEntityToChar(java.lang.String possBareEntityWithSemicolon) - Checks a number of gauntlet conditions to ensure this is a valid entity. |
static java.lang.String | stripHTMLEntities(java.lang.String text, char translateNbspTo) - Converts HTML to text, converting entities such as &quot; back to " and &lt; back to <. Ordinary text passes through unchanged. |
static java.lang.String | stripHTMLTags(java.lang.String html) - Removes tags from HTML, leaving just the raw text. |
static java.lang.String | stripXMLEntities(java.lang.String text) - Converts XML to text, converting entities such as &quot; back to " and &lt; back to <. Ordinary text passes through unchanged. |
static java.lang.String | stripXMLTags(java.lang.String xml) - Removes tags from XML, leaving just the raw text. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final char UNICODE_NBSP_160_0x0a
public static final int LONGEST_ENTITY
public static final int SHORTEST_ENTITY
Constructor Detail |
---|
public EntityStripper()
Method Detail |
---|
public static char bareHTMLEntityToChar(java.lang.String bareEntity, char howToTranslateNbsp)
bareEntity
- String entity to convert. Must have the leading & and trailing ; stripped; may have the form #x12ff, #123, lt, or nbsp. Works faster if the entity is in lower case.
howToTranslateNbsp
- char you would like &nbsp; translated to, usually ' ' or (char) 160
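The parameter description above can be illustrated with a minimal sketch. This is not the EntityStripper source: the class `BareEntitySketch`, its five-entry table, the exact-match-then-lower-case lookup, the choice to return 0 for an unrecognised entity, and the choice to redirect numeric 160 as well as nbsp are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of bare-entity decoding; not the library's code. */
final class BareEntitySketch {
    private static final int NBSP = 160;

    // Tiny illustrative table (assumption); a real table would cover all HTML entities.
    private static final Map<String, Character> NAMED = Map.of(
            "lt", '<', "gt", '>', "amp", '&', "quot", '"', "nbsp", '\u00a0');

    /** bareEntity has no leading & and no trailing ;. Returns 0 if unrecognised. */
    static char bareEntityToChar(String bareEntity, char howToTranslateNbsp) {
        int code = lookup(bareEntity);
        if (code == NBSP) {
            return howToTranslateNbsp; // caller decides what nbsp becomes
        }
        return (char) code;
    }

    private static int lookup(String bareEntity) {
        if (bareEntity.startsWith("#")) {
            try {
                boolean hex = bareEntity.startsWith("#x") || bareEntity.startsWith("#X");
                int code = hex
                        ? Integer.parseInt(bareEntity.substring(2), 16)
                        : Integer.parseInt(bareEntity.substring(1));
                return (code > 0 && code <= Character.MAX_VALUE) ? code : 0;
            } catch (NumberFormatException e) {
                return 0; // e.g. "#", "#x", "#12z"
            }
        }
        // Exact match first; the lower-case retry is the slower path (assumed behaviour).
        Character c = NAMED.get(bareEntity);
        if (c == null) {
            c = NAMED.get(bareEntity.toLowerCase(Locale.ROOT));
        }
        return c == null ? 0 : c.charValue();
    }
}
```

With this sketch, `BareEntitySketch.bareEntityToChar("#x41", ' ')` yields 'A' and `bareEntityToChar("nbsp", ' ')` yields an ordinary space.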
public static java.lang.String flattenHTML(java.lang.String text, char translateNbspTo)
text
- text to flatten
translateNbspTo
- char you would like &nbsp; translated to, usually ' ' or (char) 160.
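The choice between ' ' and (char) 160 matters downstream, because Java's standard whitespace handling does not treat the non-breaking space as whitespace. The sketch below illustrates this using only core Java calls; `NbspChoiceSketch` and its `words` helper are hypothetical names and are not part of EntityStripper.

```java
/** Hypothetical demo of how (char) 160 behaves in ordinary Java text handling. */
final class NbspChoiceSketch {
    static final char NBSP = (char) 160; // the same char as '\u00a0'

    /** Splits on the default regex \s class, which does not match NBSP. */
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    /** Character.isWhitespace deliberately excludes non-breaking spaces. */
    static boolean isJavaWhitespace(char c) {
        return Character.isWhitespace(c);
    }
}
```

Here `words("a" + NBSP + "b")` returns a single element while `words("a b")` returns two, so translating to ' ' is the more convenient option when the flattened text will later be tokenised or trimmed.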
public static java.lang.String flattenXML(java.lang.String text)
text
- to flatten
public static char possEntityToChar(java.lang.String possBareEntityWithSemicolon)
possBareEntityWithSemicolon
- String that may hold an entity. The leading & must be stripped, but the string may optionally contain text past the ;
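The kind of gauntlet this method describes might look like the sketch below. The specific checks, the class `GauntletSketch`, and the helper `extractBareEntity` (which returns the bare entity text rather than a char) are assumptions for illustration; only the two length limits come from the field summary.

```java
/** Hypothetical sketch of pre-checks on a possible entity; not the library's code. */
final class GauntletSketch {
    static final int SHORTEST_ENTITY = 4;  // counts the & and the ;
    static final int LONGEST_ENTITY = 10;  // counts the & and the ;

    /**
     * Input has the leading & removed and may continue past the ;.
     * Returns the bare entity (no & or ;) if every check passes, else null.
     */
    static String extractBareEntity(String possBareEntityWithSemicolon) {
        String s = possBareEntityWithSemicolon;
        int semi = s.indexOf(';');
        // The bare name is s[0, semi): its length must fit the table limits
        // once the & and ; are discounted. A missing ; gives -1 and fails here.
        if (semi < SHORTEST_ENTITY - 2 || semi > LONGEST_ENTITY - 2) {
            return null;
        }
        for (int i = 0; i < semi; i++) {
            char c = s.charAt(i);
            boolean asciiAlnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            boolean leadingHash = (i == 0 && c == '#'); // numeric forms such as #123
            if (!asciiAlnum && !leadingHash) {
                return null; // spaces or punctuation: not an entity
            }
        }
        return s.substring(0, semi);
    }
}
```

Cheap rejections like these keep stray ampersands in ordinary text (for example "AT&T") from reaching the table lookup.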
public static java.lang.String stripHTMLEntities(java.lang.String text, char translateNbspTo)
text
- raw text to be processed. Must not be null.
translateNbspTo
- char you would like &nbsp; translated to, usually ' ' or (char) 160.
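A scan of this kind can be sketched as follows, under stated assumptions: `StripEntitiesSketch`, its `decode` helper, and the four-entry named table are hypothetical, and anything that does not decode is copied through unchanged. It is meant to show the shape of the technique, not the library's implementation.

```java
import java.util.Map;

/** Hypothetical sketch of entity replacement in running text. */
final class StripEntitiesSketch {
    private static final int LONGEST_ENTITY = 10; // from the field summary

    // Tiny illustrative table (assumption).
    private static final Map<String, Character> NAMED = Map.of(
            "lt", '<', "gt", '>', "amp", '&', "quot", '"');

    static String stripEntities(String text, char translateNbspTo) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '&') {
                int semi = text.indexOf(';', i + 1);
                // Candidate spans i..semi inclusive; skip it if longer than any entity.
                if (semi != -1 && semi - i + 1 <= LONGEST_ENTITY) {
                    int decoded = decode(text.substring(i + 1, semi), translateNbspTo);
                    if (decoded >= 0) {
                        sb.append((char) decoded);
                        i = semi + 1;
                        continue;
                    }
                }
            }
            sb.append(c); // ordinary text, or an & that did not start an entity
            i++;
        }
        return sb.toString();
    }

    /** Returns the decoded char as an int, or -1 if bare is not recognised. */
    private static int decode(String bare, char translateNbspTo) {
        if (bare.equals("nbsp")) {
            return translateNbspTo;
        }
        if (bare.startsWith("#")) {
            try {
                boolean hex = bare.startsWith("#x") || bare.startsWith("#X");
                int code = hex ? Integer.parseInt(bare.substring(2), 16)
                               : Integer.parseInt(bare.substring(1));
                return (code > 0 && code <= Character.MAX_VALUE) ? code : -1;
            } catch (NumberFormatException e) {
                return -1;
            }
        }
        Character named = NAMED.get(bare);
        return named == null ? -1 : named.charValue();
    }
}
```

For example, `StripEntitiesSketch.stripEntities("a &lt; b &amp;&amp; c&nbsp;d", ' ')` returns `a < b && c d`, while `AT&T rocks; really` is returned unchanged because "T rocks" is not a known entity.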
public static java.lang.String stripHTMLTags(java.lang.String html)
html
- input HTML
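As a rough illustration of tag removal, the sketch below drops everything between a < and the next >. It is a deliberately naive stand-in, not the library's algorithm: `StripTagsSketch` is a hypothetical name, and the sketch makes no attempt to handle comments, script or style bodies, or a > inside a quoted attribute value.

```java
/** Hypothetical, naive tag stripper: removes text between < and the next >. */
final class StripTagsSketch {
    static String stripTags(String html) {
        StringBuilder sb = new StringBuilder(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;          // start skipping
            } else if (c == '>' && inTag) {
                inTag = false;         // tag closed, resume copying
            } else if (!inTag) {
                sb.append(c);          // raw text, including a lone >
            }
        }
        return sb.toString();
    }
}
```

For example, `StripTagsSketch.stripTags("<p>Hello <b>world</b></p>")` returns `Hello world`. Entities are left untouched at this stage, which is why a flatten step would also need an entity pass.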
public static java.lang.String stripXMLEntities(java.lang.String text)
text
- raw XML text to be processed. Must not be null.
public static java.lang.String stripXMLTags(java.lang.String xml)
xml
- input XML