wiki

jurchiks
Posts: 6769
Joined: Sat Sep 19, 2009 4:16 pm
Location: Eastern Europe

wiki

Post by jurchiks »

I've been updating a few pages in the wiki over the past couple of days, and I came across a somewhat dumb word filter. I tried putting the word "afford" in the text, but it said "ford found, gtfo!"...
Perhaps the word filter should check for whitespace/punctuation around the word?
Because if it doesn't match "/[\s\.,\-_+!@#\$]+ford[\s\.,\-_+!@#\$]+/iu" or something like that, then it's not the word you're looking for.
http://www.morewords.com/contains/ford/
Shows the words containing "ford" in them.
There are definitely other words that could be mis-filtered, and IMHO they shouldn't be getting in the way like this.
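
To illustrate the point, here is a minimal PHP sketch comparing a bare substring pattern with one that uses `\b` word boundaries. The sample sentences and variable names are made up for illustration, and this is not the wiki's actual filter code.

```php
<?php
// Hypothetical sample inputs, not taken from the wiki.
$samples = [
    'I cannot afford it',      // "ford" appears only inside another word
    'Cheap Ford loans here!',  // "ford" appears as a standalone word
];

$substring = '/ford/iu';      // matches "ford" anywhere, even inside "afford"
$wholeWord = '/\bford\b/iu';  // matches "ford" only as a whole word

$results = [];
foreach ($samples as $text) {
    $results[$text] = [
        'substring' => preg_match($substring, $text) === 1,
        'wholeWord' => preg_match($wholeWord, $text) === 1,
    ];
    printf(
        "%-24s substring=%s wholeWord=%s\n",
        $text,
        var_export($results[$text]['substring'], true),
        var_export($results[$text]['wholeWord'], true)
    );
}
```

With the substring pattern both sentences are flagged; with the word-boundary pattern only the second one is.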
If you have problems, FIRST TRY SOLVING THEM YOURSELF, and if you get errors, TRY TO ANALYZE THEM, and ONLY if you can't help it, THEN ask here.
Otherwise you will never learn anything if all you do is copy-paste!
Discussion breeds innovation.
ThePhoenixBird
L2j Inner Circle
Posts: 1857
Joined: Fri May 27, 2005 5:11 pm

Re: wiki

Post by ThePhoenixBird »

Too many Ford car loan, insurance and other spam ads on the wiki.

You are right, maybe a more complex regex would fix that.
jurchiks
Posts: 6769
Joined: Sat Sep 19, 2009 4:16 pm
Location: Eastern Europe

Re: wiki

Post by jurchiks »

What, are there that many despite registration and captcha?
Zoey76
L2j Inner Circle
Posts: 7005
Joined: Tue Aug 11, 2009 3:36 am

Re: wiki

Post by Zoey76 »

ThePhoenixBird wrote: Too many Ford car loan, insurance and other spam ads on the wiki.

You are right, maybe a more complex regex would fix that.

Image
Powered by Eclipse 4.30 🌌 | Eclipse Temurin 21 ☕ | MariaDB 11.2.2 🗃️ | L2J Server 2.6.3.0 - High Five 🚀

ThePhoenixBird
L2j Inner Circle
Posts: 1857
Joined: Fri May 27, 2005 5:11 pm

Re: wiki

Post by ThePhoenixBird »

jurchiks wrote:What, are there that many despite registration and captcha?
Hell yeah, actually that crazy regex spam word filter is what stops bots from posting junk in the wiki.

Check how the bots bypass the captchas in the user registration log: http://l2jserver.com/wiki/Special:Log/newusers
jurchiks
Posts: 6769
Joined: Sat Sep 19, 2009 4:16 pm
Location: Eastern Europe

Re: wiki

Post by jurchiks »

What's your regex like? Is it something like this?

Code: Select all

$someString = 'audi|mazda|ford';
$regex = "/\b($someString)\b/iu";
$result = preg_match_all($regex, $yourTextHere, $matches);
if ($result)
{
    print_r($matches);
}
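
A slightly more defensive variant of the snippet above, as a sketch: it builds the alternation from an array and escapes each word with `preg_quote()`, so entries containing regex metacharacters (such as the dot in a domain name) are matched literally. The function name `findBlockedWords`, the word list and the sample text are hypothetical.

```php
<?php
// Hypothetical helper: returns the blocked words found as whole words in $text.
function findBlockedWords(array $words, string $text): array
{
    // Escape each word so characters like "." or "-" are treated literally.
    $escaped = array_map(
        function (string $w): string {
            return preg_quote($w, '/');
        },
        $words
    );
    $regex = '/\b(' . implode('|', $escaped) . ')\b/iu';

    // preg_match_all() returns the number of matches, or false on error.
    if (!preg_match_all($regex, $text, $matches)) {
        return [];
    }

    // $matches[1] holds the text captured by the first group.
    return $matches[1];
}

$words = ['audi', 'mazda', 'ford'];
$found = findBlockedWords($words, 'I cannot afford an Audi or a Ford.');
print_r($found);
```

Here "afford" is not reported, because `\b` requires a word boundary on both sides of the match.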
ThePhoenixBird
L2j Inner Circle
Posts: 1857
Joined: Fri May 27, 2005 5:11 pm

Re: wiki

Post by ThePhoenixBird »

Code: Select all

## Spam Regex
$wgSpamRegex =  "/".                        # The "/" is the opening wrapper
                "s-e-x|zoofilia|sexyongpin|grusskarte|geburtstagskarten|animalsex|".
                "job|bureau|employer|jobster|salary|jobs|bangalore|india|employee|blackberry|".
                "freelancing|career|medical|airlines|government|florida|".
                "toyota|mercedes|benz|chevrolet|honda|".
                "diet|snack|carbohydrates|diets|cholesterol|vitamins|".
                "brokers|banks|insurance|fargo|commonwealth|bank|credit|federal|wachovia|".
                "sex-with|dogsex|adultchat|adultlive|camsex|sexcam|livesex|sexchat|footjob|".
                "chatsex|onlinesex|adultporn|adultvideo|adultweb.|hardcoresex|hardcoreporn|".
                "teenporn|xxxporn|lesbiansex|livegirl|livenude|livesex|livevideo|camgirl|pussy|".
                "spycam|voyeursex|casino-online|online-casino|kontaktlinsen|cheapest-phone|".
                "laser-eye|eye-laser|fuelcellmarket|lasikclinic|cragrats|parishilton|".
                "paris-hilton|paris-tape|2large|fuel-dispenser|fueling-dispenser|huojia|".
                "jinxinghj|telematicsone|telematiksone|a-mortgage|diamondabrasives|".
                "reuterbrook|sex-plugin|sex-zone|lazy-stars|eblja|liuhecai|".
                "buy-viagra|-cialis|-levitra|boy-and-girl-kissing|". # These match spammy words
                "dirare\.com|".           # This matches dirare.com a spammer's domain name
                "overflow\s*:\s*auto|".   # This matches against overflow:auto (regardless of whitespace on either side of the colon)
                "height\s*:\s*[0-4]px|".  # This matches against height:0px (most CSS hidden spam) (regardless of whitespace on either side of the colon)
                "\\s*a\s*href|".         # This blocks all href links entirely, forcing wiki syntax
                "display\s*:\s*none".     # This matches against display:none (regardless of whitespace on either side of the colon)
                "/i";                     # The "/" ends the regular expression and the "i" switch which follows makes the test case-insensitive
                                          # The "\s" matches whitespace
                                          # The "*" is a repeater (zero or more times)
                                          # The "\s*" means to look for 0 or more amount of whitespace
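
To see how a pattern of this shape behaves, here is a small sketch using a cut-down copy of it (only a few of the alternatives above). It assumes the string is handed to `preg_match()` unchanged, which the thread does not confirm; the helper `isSpam` and the sample edits are invented for illustration.

```php
<?php
// Cut-down copy of the pattern above; most alternatives are left out.
$spamRegex = "/" .
    "job|insurance|" .
    "dirare\.com|" .
    "height\s*:\s*[0-4]px|" .
    "display\s*:\s*none" .
    "/i";

// Hypothetical helper: true when the text matches the spam pattern.
function isSpam(string $regex, string $text): bool
{
    return preg_match($regex, $text) === 1;
}

// Invented sample edits.
$edits = [
    'Cheap car INSURANCE quotes',                       // plain spam word, any case
    '<div style="display : none">hidden</div>',         // hidden CSS, spaces around the colon
    'Use a cron job to restart the server',             // legitimate text, still contains "job"
    'Set height: 10px on the banner',                   // "10px" is not a single digit 0-4 followed by "px"
    'Fixed a NullPointerException in the quest script', // nothing from the list
];

$verdicts = [];
foreach ($edits as $text) {
    $verdicts[$text] = isSpam($spamRegex, $text);
    printf("%s => %s\n", $text, $verdicts[$text] ? 'blocked' : 'allowed');
}
```

Because the alternatives are not anchored to word boundaries, the third edit is blocked even though it is ordinary wiki text.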
ThePhoenixBird
L2j Inner Circle
Posts: 1857
Joined: Fri May 27, 2005 5:11 pm

Re: wiki

Post by ThePhoenixBird »

The regex is based mostly on the words used to spam our wiki.
jurchiks
Posts: 6769
Joined: Sat Sep 19, 2009 4:16 pm
Location: Eastern Europe

Re: wiki

Post by jurchiks »

1) If you put "sex" and "porn" in there, you can throw out all the entries that contain those words... that shortens the regex by a good amount.
2) job/jobs - only the former is necessary...
3) I'd put just "viagra" instead of "buy-viagra".
4) What is the actual code that parses this pattern? Does it go directly into preg_match*?
And what's with all the "-"?
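
Points 1 and 2 rely on the pattern being matched without word boundaries: an alternative that contains a shorter alternative as a substring can never change the outcome. A rough sketch of pruning such entries follows. `pruneRedundant` is a hypothetical helper, the list is a small sample of words from the pattern above plus the suggested "sex", "porn" and "viagra", and the check is only valid for plain literal words (no regex metacharacters).

```php
<?php
// Hypothetical helper: drops every word that contains another word from the
// list as a substring. Only meaningful for literal, unanchored alternatives.
function pruneRedundant(array $words): array
{
    // Normalise case and remove exact duplicates first.
    $words = array_values(array_unique(array_map('strtolower', $words)));

    $kept = [];
    foreach ($words as $candidate) {
        $redundant = false;
        foreach ($words as $other) {
            // A different word found inside $candidate already covers it.
            if ($other !== $candidate && strpos($candidate, $other) !== false) {
                $redundant = true;
                break;
            }
        }
        if (!$redundant) {
            $kept[] = $candidate;
        }
    }
    return $kept;
}

$sample = ['sex', 'porn', 'animalsex', 'camsex', 'teenporn', 'job', 'jobs', 'jobster', 'viagra'];
$pruned = pruneRedundant($sample);
print_r($pruned);
```

Shorter entries also match more text, so pruning like this trades a shorter pattern for a higher chance of blocking legitimate words.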