Asked Tuesday, February 26, 2013

I need to filter a collection of strings based on a rather complex query. In its raw form it looks like this:



nano* AND (regulat* OR *toxic* OR ((risk OR hazard) AND (exposure OR release)) )


An example of one of the strings to match against:



Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels


So I need to match using AND, OR, and wildcard characters, and I presume I'll need to use a regex in JavaScript.



The looping and filtering generally work, but I'm 100% sure my regex is wrong, because some results are being wrongly omitted. Here it is:



/(nano[a-zA-Z])?(regulat[a-zA-Z]|[a-zA-Z]toxic[a-zA-Z]|((risk|hazard)*(exposure|release)))/i


Any help would be greatly appreciated - I really can't abstract my mind correctly to understand this syntax!



UPDATE:



A few people have pointed out the importance of the order in which the regex is constructed. However, I have no control over the text strings that will be searched, so I need a solution that works regardless of the order in which the terms appear.



UPDATE:



I eventually used a PHP solution because of the deprecation of the Twitter API 1.0. See Pastebin for an example function (I know it's better to paste code here, but there's a lot):



function: http://pastebin.com/MpWSGtHK
usage: http://pastebin.com/pP2AHEvk



Thanks for all the help.


 Answers

A single regex is not the right tool for this, IMO:



/^(?=.*\bnano)(?=(?:.*\bregulat|.*toxic|(?=.*(?:\brisk\b|\bhazard\b))(?=.*(?:\bexposure\b|\brelease\b))))/i.test(subject)


would return true if the string fulfills the criteria you set forth, but I find nested lookaheads quite incomprehensible. If JavaScript supported commented regexes, it would look like this:



^                      # Anchor search to start of string
(?=.*\bnano)           # Assert that the string contains a word that starts with nano
(?=                    # AND assert that the string contains...
  (?:                  # either
    .*\bregulat        # a word starting with regulat
  |                    # OR
    .*toxic            # any word containing toxic
  |                    # OR
    (?=                # assert that the string contains
      .*               # any string
      (?:              # followed by
        \brisk\b       # the word risk
      |                # OR
        \bhazard\b     # the word hazard
      )                # (end of inner OR alternation)
    )                  # (end of first AND condition)
    (?=                # AND assert that the string contains
      .*               # any string
      (?:              # followed by
        \bexposure\b   # the word exposure
      |                # OR
        \brelease\b    # the word release
      )                # (end of inner OR alternation)
    )                  # (end of second AND condition)
  )                    # (end of outer OR alternation)
)                      # (end of lookahead assertion)


Note that the entire regex is composed of lookahead assertions, so the match result itself will always be the empty string.
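
To make the empty-match point concrete, here is a minimal sketch that runs the lookahead-only pattern against the example string from the question. It assumes the bare b tokens in the pattern are word boundaries (\b) that lost their backslashes; the variable names are only illustrative.

```javascript
// Lookahead-only pattern; \b word boundaries are assumed (see lead-in).
const query = /^(?=.*\bnano)(?=(?:.*\bregulat|.*toxic|(?=.*(?:\brisk\b|\bhazard\b))(?=.*(?:\bexposure\b|\brelease\b))))/i;

// Example string taken from the question.
const sample = "Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels";

const result = query.exec(sample);

// Lookaheads consume no characters, so a successful match is the empty string.
const matchedText = result === null ? null : result[0];

// The ^ anchor pins the (empty) match to the start of the string.
const matchIndex = result === null ? -1 : result.index;
```

Because the match carries no text, the pattern is only useful as a yes/no test, for example with test(), and not for extracting the words that matched.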



Instead, you could use several simple regexes:



if (/\bnano/i.test(str) &&
    (
      /\bregulat|toxic/i.test(str) ||
      (
        /\b(?:risk|hazard)\b/i.test(str) &&
        /\b(?:exposure|release)\b/i.test(str)
      )
    )
   ) /* all tests pass */
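
As a rough sketch of how the separate tests could be applied to a whole collection, the condition can be wrapped in a predicate and passed to Array.prototype.filter. The names matchesQuery and titles are hypothetical, and every title except the first (which comes from the question) is invented for illustration.

```javascript
// Hypothetical helper implementing:
// nano* AND (regulat* OR *toxic* OR ((risk OR hazard) AND (exposure OR release)))
function matchesQuery(str) {
  const hasNano = /\bnano/i.test(str);                        // word starting with nano
  const hasRegulatOrToxic = /\bregulat|toxic/i.test(str);     // regulat* or *toxic*
  const hasRiskOrHazard = /\b(?:risk|hazard)\b/i.test(str);   // whole word risk or hazard
  const hasExposureOrRelease = /\b(?:exposure|release)\b/i.test(str); // whole word exposure or release
  return hasNano && (hasRegulatOrToxic || (hasRiskOrHazard && hasExposureOrRelease));
}

const titles = [
  "Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels", // from the question
  "Release of nanoparticles: a hazard assessment", // invented: terms in a different order
  "Ecotoxicity of nanosilver",                     // invented: *toxic* inside a longer word
  "Nanotube synthesis methods",                    // invented: nano* only
  "Risk and exposure in the workplace",            // invented: no nano* term
];

// Keeps the first three titles; the last two fail the query.
const filtered = titles.filter(matchesQuery);
```

Each test scans the whole string independently, so the order in which the terms appear does not matter. Note that the whole-word tests would not match plurals such as "risks"; whether that is desired depends on how strictly the original query is meant to be read.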

Answered Monday, February 25, 2013
wyattkennyc

Total Points: 650
Total Questions: 102
Total Answers: 90

Location: Monaco
Member since Mon, May 23, 2022
2 Years ago
;