Natural Language Toolkit (NLTK)
NLTK is a leading platform for building Python programs to work with human language data.
pip show nltk
pip install nltk
import nltk
nltk.download()
Error:
[nltk_data] Error loading punkt: <urlopen error [Errno 11004]
[nltk_data] getaddrinfo failed>
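(1) The data download from https://raw.githubusercontent.com/nltk/nltk_data/ most likely failed; "getaddrinfo failed" means the host name could not be resolved.
(2) Instead, download the whole repository from https://gitee.com/qwererer2/nltk_data/tree/gh-pages/ (about 653 MB).
(3) Copy everything under packages into a directory that Jupyter can find. The folder must be named nltk_data; create it yourself if it does not exist.
(4) Extract punkt.zip, located under nltk_data\tokenizers, into that same directory.
(5) Run the test code to confirm that the environment is basically working: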
import nltk
sentence = "What a happy day"
tokens = nltk.word_tokenize(sentence)
tokens
# Out: ['What', 'a', 'happy', 'day']
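Step (4) can also be done from Python rather than by hand. The following is only a minimal sketch using the standard library: the helper name unpack_punkt is hypothetical (not part of NLTK), and it assumes step (3) has already placed tokenizers/punkt.zip under the nltk_data folder you pass in.

```python
import zipfile
from pathlib import Path


def unpack_punkt(nltk_data_dir):
    """Extract tokenizers/punkt.zip in place, as described in step (4).

    Hypothetical helper for illustration; not an NLTK function.
    """
    tokenizers_dir = Path(nltk_data_dir) / "tokenizers"
    archive = tokenizers_dir / "punkt.zip"
    if not archive.is_file():
        raise FileNotFoundError(f"punkt.zip not found at {archive}")
    # Extract next to the archive, so the contents land in tokenizers/.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(tokenizers_dir)
    return tokenizers_dir / "punkt"
```

For example, unpack_punkt(Path.home() / "nltk_data") would apply if the data was copied to the home directory; that location is an assumption. The directories NLTK actually searches are listed in nltk.data.path, so the folder from step (3) should be one of them.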
References:
- NLTK official site: http://www.nltk.org/