python单词查找练习题

题目要求

现有一个文件dictionary.txt,要求对数据进行处理,分别将英文单词和中文解释保存到mysql数据库,并且可通过输入关键字进行模糊匹配查询。
dictionary.txt文件格式

1
2
3
4
5
6
abandon   v.抛弃,放弃
abandonment n.放弃
abbreviation n.缩写
abeyance n.缓办,中止
abide v.遵守
ability n.能力

题目分析

  1. 数据库建表,表字段至少包括单词含义。涉及知识点为SQLAlchemy,主要是创建表。
  2. 读取文件并对内容进行处理,然后插入数据库。涉及知识点为文件读写字符串处理SQLAlchemy插入数据。
  3. 通过关键字进行模糊匹配。涉及知识点为SQLAlchemy查询。

    实现

    数据库建表

    createTable.py
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    from sqlalchemy import create_engine, Column, String, Integer
    from sqlalchemy.ext.declarative import declarative_base

    engine = create_engine("mysql+pymysql://root:123456@localhost/test")
    Base = declarative_base()

    class Dictionary(Base):
    __tablename__ = 'dictionary'
    id = Column(Integer, primary_key=True)
    wod = Column(String(50))
    mean = Column(String(50))

    Base.metadata.create_all(engine)

mysql数据库查询

1
2
3
4
5
6
7
8
9
mysql> desc dictionary;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| key | varchar(50) | YES | | NULL | |
| value | varchar(50) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

数据处理

insertData.py

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import codecs
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from fuzzyMatching.createTable import Dictionary

engine = create_engine("mysql+pymysql://root:123456@localhost/test?charset=utf8")
DBsession = sessionmaker(bind=engine)
session = DBsession()

class HandlerData(object):
def __init__(self, dataFile):
self.dataFile = dataFile
def make_data_to_str(self):
with codecs.open(self.dataFile, encoding='utf-8') as file:
for (num, value) in enumerate(file):
line = value.strip().split()
diction = Dictionary(id=num+1, wod=line[0], mean=line[1])
session.add(diction)
session.commit()

handlerData = HandlerData('dictionary.txt')
handlerData.make_data_to_str()
session.close()

mysql数据库查询结果

1
2
3
4
5
6
7
mysql> select count(*) from dictionary;
+----------+
| count(*) |
+----------+
| 7988 |
+----------+
1 row in set (0.04 sec)

数据模糊匹配

fuzzyMatch.py

1
2
3
4
5
6
7
8
9
10
11
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from fuzzyMatching.createTable import Dictionary

engine = create_engine("mysql+pymysql://root:123456@localhost/test?charset=utf8")
DBsession = sessionmaker(bind=engine)
session = DBsession()
word = input("Please input a word:")
zero_result = session.query(Dictionary).filter(Dictionary.wod.like("%{0}%".format(word))).all()
for i in zero_result:
print(i.id, i.wod, i.mean)

输出结果

1
2
3
4
5
6
7
8
Please input a word:test
]1517 contest n.,v.争夺,比赛
1518 contestant n.参赛人
5604 protest v.,n.抗议,反对
5605 protestant n.清教徒,新教徒
7159 test v.,n.测验,试验
7160 testify v.证明,证实
7161 testimony n.证据

错误提示

  1. UnicodeEncodeError: ‘latin-1’ codec can’t encode characters in position 18-22: ordinal not in range(256)
  2. 查询结果汉字乱码

处理方法
在创建engine时需要添加?charset=utf8参数

Recommended Posts