I have a list of positive and negative sentiment words, e.g. ['happy', 'sad'].
Now, when processing tweets, I'm collapsing repeated characters (allowing at most two repetitions):
happpppyyy -> happyy
saaad -> saad
The check whether e.g. saad is part of the word list should then return True, because it is similar to sad.
How can I implement this behaviour?
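The question does not show how the repeated characters are collapsed. A minimal sketch, assuming a regex backreference is used to shorten any run of three or more identical characters to exactly two (the helper name `collapse_repeats` is hypothetical):

```python
import re

# Hypothetical helper: (.) captures one character and \1{2,} matches two or
# more further copies of it, so any run of 3+ identical characters is found.
_REPEAT_RE = re.compile(r"(.)\1{2,}")


def collapse_repeats(text):
    """Keep at most two consecutive copies of each character."""
    return _REPEAT_RE.sub(r"\1\1", text)


for raw in ["happpppyyy", "saaad", "glad"]:
    print(raw, "->", collapse_repeats(raw))
```

Runs of exactly two characters (the "pp" in happy) are left untouched, which matches the "allowing 2 repetitions" rule.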
I would build the regular expressions dynamically, turning a word like
happy
into
h+a+p+p+y+
and then passing the list of "happy" words like this:
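```python
import re

# Turn each word into a pattern in which every character may repeat, e.g. happy -> h+a+p+p+y+
re_list = [re.compile("".join("{}+".format(c) for c in x)) for x in ['happy', 'glad']]
```
Then test each word (using any, which returns True as soon as one of the "happy" regexes matches):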
```python
for w in ["haaappy", "saad", "glaad"]:
    print(w, any(re.match(x, w) for x in re_list))
```
Result:
```
haaappy True
saad False
glaad True
```
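One caveat: re.match only anchors the pattern at the start of the string, so h+a+p+p+y+ would also accept a longer word such as happyness. A minimal variant, assuming whole-word matches are wanted, uses re.fullmatch and re.escape (so characters with a special regex meaning are treated literally). It is shown with the ['happy', 'sad'] list from the question, so saad is recognised; the names `build_patterns` and `in_word_list` are hypothetical.

```python
import re


def build_patterns(words):
    """Hypothetical helper: compile one pattern per word, letting each
    character repeat (happy -> h+a+p+p+y+). re.escape keeps characters
    such as '.' or '+' literal."""
    return [re.compile("".join(re.escape(c) + "+" for c in w)) for w in words]


def in_word_list(word, patterns):
    """Return True if the whole word matches one of the patterns.
    fullmatch, unlike match, rejects extra trailing characters."""
    return any(p.fullmatch(word) for p in patterns)


sentiment_patterns = build_patterns(["happy", "sad"])

for w in ["haaappy", "saad", "happyy", "happyness", "sd"]:
    print(w, in_word_list(w, sentiment_patterns))
```

Because every character gets a `+`, each letter of the original word must still appear at least once, so sd is rejected while saad and happyy are accepted.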