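Python Data Fundamentals and Mining at University

This code comes from the elective course "Python Data Fundamentals and Mining", taken in the first semester of freshman year.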

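How to use the if and else statements

Name = input("Enter your username: ")
Password = input("Enter your password: ")
# Both conditions must hold for the login to succeed
if Name == "admin" and Password == "123":
    print("Login successful")
else:
    print("Login failed: wrong username or password")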

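Using for ... in range() to loop, then break to stop the loop

# Allow at most three login attempts
for i in range(3):
    Name = input("Enter your username: ")
    Password = input("Enter your password: ")
    YanZheng = input("Enter the verification code: ")
    if Name == "admin" and Password == "123" and YanZheng == "321":
        print("Login successful")
        print("Your remaining balance is 1,000,000 yuan")
        break  # stop looping after a successful login
    else:
        print("Login failed: wrong username, password, or verification code")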

How strings are stored: index numbers

name = "abcdef"
print(name[0], name[1])  # a b

What is slicing

`sequence[start:stop:step]`

Using slicing together with index numbers

name = "abcdef"
print(name[:1:2], name[::-2])  # a fdb

string = "Life is short,we need python!"
print(string[8:13:1])  # short

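Built-in string methods: index() checks whether a string contains a substring

# Parameters: sub is the substring to search for; start is where the search begins (default 0); end is where it stops (default: the length of the string)
# Used the same way as find(), but raises ValueError if sub is not in the string
str.index(sub, start=0, end=len(string))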
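The difference between index() and find() only shows up when the substring is missing. A minimal sketch, reusing the sample sentence from these notes; the variable names (pos, missing_find, raised) are illustrative and not from the course material:

```python
string = "Life is short,we need python!"

# index() returns the position of the first match
pos = string.index("short")

# find() returns -1 when the substring is missing
missing_find = string.find("java")

# index() raises ValueError in the same situation
try:
    string.index("java")
    raised = False
except ValueError:
    raised = True

print(pos, missing_find, raised)  # 8 -1 True
```

Reaching for find() avoids the try/except when a missing substring is an expected case.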

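Built-in string methods: replace() swaps an old substring for a new one

# Parameters: old is the substring to be replaced; new is the substring that replaces it; count is optional and limits the number of replacements
str.replace(old, new[, count])

web = "https://yyxy.sisu.edu.cn/"
dep = ["be","dyx","zwx","media","sibm","sis"]
# Build one URL per department by swapping the subdomain
for a in dep:
    url = web.replace("yyxy",a)
    print(url)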

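Built-in string methods: split() cuts a string at a given separator

# Parameters: sep is the separator (default: any run of whitespace); maxsplit is the maximum number of splits (default: no limit)
str.split(sep=None, maxsplit=-1)

string = "Life is short,we need python"
print(string.split("s")[-1])  # hort,we need python

string = "Life is short,we need python"
print(string.split("h")[1].split("w")[0])  # ort,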

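Built-in string methods: capitalize() and title()

# capitalize: uppercases the first character and lowercases the rest
str.capitalize()
# title: uppercases the first letter of every word and lowercases the rest
str.title()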
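A short illustration of the two methods on a mixed-case string; the sample text is made up for this sketch:

```python
text = "life IS short"

# capitalize() uppercases only the first character of the whole string
capitalized = text.capitalize()

# title() uppercases the first letter of each word
titled = text.title()

print(capitalized)  # Life is short
print(titled)       # Life Is Short
```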

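Built-in string methods: startswith() checks whether a string begins with a given substring

# Parameters: prefix is the substring to look for; start is optional and sets where the check begins; end is optional and sets where the check stops
str.startswith(prefix[, start[, end]])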
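A minimal sketch of startswith() with and without the optional start position, using the URL that appears elsewhere in these notes; the variable names are illustrative:

```python
url = "https://yyxy.sisu.edu.cn/"

# Check the very beginning of the string
is_https = url.startswith("https://")

# Start checking at index 8, right after "https://"
host_is_yyxy = url.startswith("yyxy", 8)

# A prefix that does not match gives False rather than an error
is_http = url.startswith("http://")

print(is_https, host_is_yyxy, is_http)  # True True False
```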

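Built-in string methods: endswith() checks whether a string ends with a given substring

# Parameters: suffix can be a string or a tuple of strings; start is the position where the check begins; end is the position where it stops
str.endswith(suffix[, start[, end]])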
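Because suffix may be a tuple, one call can test several endings at once. A small sketch with made-up file names (not from the course material):

```python
files = ["report.txt", "photo.png", "notes.TXT", "data.csv"]

# Passing a tuple tests several suffixes at once;
# lower() makes the comparison case-insensitive
text_like = [f for f in files if f.lower().endswith((".txt", ".csv"))]

print(text_like)  # ['report.txt', 'notes.TXT', 'data.csv']
```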

Built-in string methods: upper() converts lowercase letters to uppercase

str.upper()
mystr = "I love my lover RongHua"
NewMystr = mystr.upper()  # the parentheses are required to call the method
print(NewMystr)

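Built-in string methods: ljust() returns a new left-aligned string padded to a given length

# Parameters: width is the target length of the string; fillchar is the padding character (default: a space)
str.ljust(width[, fillchar])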
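The notes give only the signature, so here is a minimal sketch of what ljust() returns in three cases; the sample string and variable names are assumptions for illustration:

```python
name = "abc"

padded = name.ljust(6, "*")   # pad on the right with "*" up to length 6
spaced = name.ljust(6)        # the default fill character is a space
unchanged = name.ljust(2)     # width shorter than the string: returned as is

print(repr(padded), repr(spaced), repr(unchanged))  # 'abc***' 'abc   ' 'abc'
```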

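Built-in string methods: strip() trims whitespace or specified characters from both ends of a string

# Parameter: chars is the set of characters to remove from the head and tail (default: whitespace)
str.strip([chars])

string = "Life is short,we need python!"
# chars is treated as a set of characters, not as a prefix, so the "i" of "is" is removed too
print(string.strip("Life i"))  # s short,we need python!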

Using the if-elif-else conditional statement:

if condition1:
    pass  # runs when condition 1 holds
elif condition2:
    pass  # runs when condition 2 holds
else:
    pass  # runs when none of the conditions hold

month = int(input("Enter your birth month: "))
day = int(input("Enter your birth day: "))
if (month == 3 and 21 <= day <= 31) or (month == 4 and 1 <= day <= 19):
    print("Your zodiac sign is Aries")
elif (month == 4 and 20 <= day <= 30) or (month == 5 and 1 <= day <= 20):
    print("Your zodiac sign is Taurus")
else:
    print("You have a different zodiac sign")

Using " " or the \t escape character to add spacing between strings

string1 = "Hello"
string2 = "World"
print(string1, " ", string2)
print(string1, "\t", string2)

Accessing and counting elements of a list (aList)

score = [99,88,92,100,66,85,57,79,90,61]
print(score[0:5])  # the first five elements

score = [99,88,92,100,66,85,57,79,90,61]
print(score[2])  # the third element, 92

Adding elements to a list (aList): append, extend, insert, *

score = [99,88,92,100,66,85,57,79,90,61]
score.insert(2, 6)  # insert the value 6 at index 2
print(score)
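The heading lists four ways to grow a list, while the snippet shows only insert. A minimal sketch of all four on a shortened score list (the values are sample data):

```python
score = [99, 88, 92]

score.append(100)        # add one element at the end
score.extend([66, 85])   # add every element of another list
score.insert(0, 57)      # insert 57 at index 0
doubled = [1, 2] * 2     # * repeats the list and returns a new one

print(score)    # [57, 99, 88, 92, 100, 66, 85]
print(doubled)  # [1, 2, 1, 2]
```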

Removing elements from a list (aList): pop(), remove()

score = [99,88,92,100,66,85,57,79,90,61]
score.pop(1)  # remove and return the element at index 1 (88)
print(score)
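pop() works by index and remove() works by value, which is easy to mix up. A small sketch of the difference, with sample values chosen so that 88 appears twice:

```python
score = [99, 88, 92, 88]

popped = score.pop(1)      # removes by index and returns the value
score.remove(88)           # removes the first element equal to 88

try:
    score.remove(1000)     # a missing value raises ValueError
    missing_raised = False
except ValueError:
    missing_raised = True

print(popped, score, missing_raised)  # 88 [99, 92] True
```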

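Removing specified elements with a loop, either outside the list or inside it (a list comprehension)

a = [1,2,1,2,1,2]
# Iterate over a copy (a[:]) so that removing items does not skip elements
for i in a[:]:
    if i == 1:
        a.remove(i)
print(a)

a = [1,2,1,2,1,2]
a = [i for i in a if i != 1]
print(a)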

Using count() to find how many times an element appears in a list

score = [99,88,92,100,66,85,57,79,90,61]
num = score
count = num.count(90)
print(count)  # 1

Using len() to find the total number of elements in a list

score = [99,88,92,100,66,85,57,79,90,61]
num = len(score)
print(num)  # 10

Using sort() to order a list from smallest to largest and print it

score = [99,88,92,100,66,85,57,79,90,61]
score.sort()
print(score)

Using reverse() to reverse the order of a list

score = [99,88,92,100,66,85,57,79,90,61]
score.reverse()
print(score)
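sort() and reverse() change the list in place. When the original order needs to be kept, the built-in sorted() and reversed() return new sequences instead. A minimal sketch with a shortened sample list:

```python
score = [99, 88, 92, 100, 66]

ascending = sorted(score)                 # new list, score is untouched
descending = sorted(score, reverse=True)  # new list, largest first
backwards = list(reversed(score))         # new list in reverse order

result = score.sort()                     # sorts in place and returns None

print(ascending)   # [66, 88, 92, 99, 100]
print(descending)  # [100, 99, 92, 88, 66]
print(backwards)   # [66, 100, 92, 88, 99]
print(result)      # None
```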

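Creating and deleting tuples: the tuple() function and del

aList = [-1,-4,6,7.5,-2.3,9,-11]
tuple(aList)
print(tuple(aList)[1])  # -4

a = [99,88,92,100,66,85,57,79,90,61]
b = tuple(a)
print(b)
del b
# After del, the name b no longer exists, so using it raises NameError
try:
    print(b)
except NameError as err:
    print(err)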

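Sequence unpacking: works for lists and dictionaries as well

# Assign to several variables at once
v_tuple = (False,3.5,'exp')
(x,y,z) = v_tuple
print(x)

# Unpacking a list and a dictionary
a = [1,2,3]
b,c,d = a
print(b,c,d)
s = {'a':1,'b':2,'c':3}
b,c,d = s.items()  # each variable receives one (key, value) pair
print(b,c,d)

# zip() groups the elements at the same position; enumerate() numbers each group
aList = [1,2,3]
bList = [4,5,6]
cList = [7,8,9]
dList = zip(aList,bList,cList)
for index,value in enumerate(dList):
    print(index,':',value)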

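Converting a tuple to a list: list()

a = ('茸华似雪',520,'唯世恋茸',521)
b = list(a)
print(b)

Adding, removing, or changing elements after converting a tuple to a list

a = ('茸华似雪',520,'唯世恋茸',521)
b = list(a)
b[0] = '茸华是小丑'  # lists are mutable, so an element can be reassigned
print(b)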
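The snippet above changes an element; adding and removing work the same way once the tuple has become a list. A minimal sketch with placeholder values, which also converts the result back to a tuple:

```python
a = ("x", 520, "y", 521)

b = list(a)       # tuples are immutable, so work on a list copy
b.append(1314)    # add an element at the end
b.remove(520)     # remove the first element equal to 520
c = tuple(b)      # convert back once the edits are done

print(c)  # ('x', 'y', 521, 1314)
print(a)  # ('x', 520, 'y', 521), the original tuple is unchanged
```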

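Creating and deleting dictionaries: creating a dictionary

a_dict = {'name':'茸华似雪','age':18}
x = {}  # empty dictionary
print(a_dict,x)

Creating and deleting dictionaries: using dict()

d = dict(name = '茸华',age = 18)
x = dict()  # empty dictionary
print(d,x)

keys = ['a','b','c']
values = [1,2,3]
dictionary = dict(zip(keys,values))  # pair each key with the value at the same position
print(dictionary)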

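Creating and deleting dictionaries: using del

# fromkeys() builds a dictionary whose values all default to None
add = dict.fromkeys(['name','age','gender'])
print(add)
del add  # delete the whole dictionary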

Creating and deleting dictionaries: using get() and items()

a_dict = {'name':'茸华似雪','age':18}
print(a_dict.get('na'))    # None, because the key does not exist
print(a_dict.get('name'))  # 茸华似雪

a = {'name':'茸华似雪','age':18}
for item in a.items():
    print(item)  # each item is a (key, value) tuple

a = {'name':'茸华似雪','age':18,'gender':'male'}
for key,value in a.items():
    print(key,value)

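Creating and deleting dictionaries: using del, clear(), pop(), and popitem()

a = {'name':'茸华似雪','age':18}
del a['age']    # delete one key; the key must exist or KeyError is raised
a = {'name':'茸华似雪','age':18}
a.clear()       # remove every item
a = {'name':'茸华似雪','age':18,'b':5}
a.pop('b')      # remove the key 'b' and return its value
a = {'name':'茸华似雪','age':18,'b':5}
a.popitem()     # remove and return the last inserted (key, value) pair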
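The four deletion tools differ mainly in what they return. A minimal sketch that captures those return values; the dictionary contents are placeholders, and the popitem() result assumes Python 3.7 or later, where dictionaries keep insertion order:

```python
a = {"name": "x", "age": 18, "b": 5}

popped = a.pop("b")                # returns the value stored under "b"
fallback = a.pop("missing", None)  # a default avoids KeyError for absent keys
last = a.popitem()                 # returns the most recently inserted pair
del a["name"]                      # del returns nothing

print(popped, fallback, last, a)  # 5 None ('age', 18) {}
```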

Python data types

# Integer: int
# Decimal (floating-point) number: float
# String: str
# Boolean: bool
# List: list
# Dictionary: dict
# Tuple: tuple
# Set: set
# Function: defined with def
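The built-in type() function shows which of these types a value has. A small sketch with one sample value per type; the function greet and the sample values are made up for illustration:

```python
def greet():
    return "hi"

# One sample value for each type listed above
samples = [1, 3.5, "exp", True, [1, 2], {"a": 1}, (1, 2), {1, 2}, greet]

# type(value).__name__ gives the type's name as a string
type_names = [type(value).__name__ for value in samples]

print(type_names)
# ['int', 'float', 'str', 'bool', 'list', 'dict', 'tuple', 'set', 'function']
```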

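Passing values into a function

def add2num(a,b):
    c = a + b
    print(c)
add2num(11,22)

The most common kind of function: one that takes parameters and returns a value

def Anum(num):
    # Add up the integers from 1 to num
    result = 0
    i = 1
    while i <= num:
        result = result + i
        i += 1
    return result
result = Anum(100)
print('The sum of 1 to 100 is:',result)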

Using the jieba word segmentation library

import jieba
text = '我喜欢茸华似雪'
print(text)

jieba's precise mode

words = jieba.cut(text, cut_all=False)
print('Precise mode:', '/'.join(words))

jieba's full mode

words = jieba.cut(text, cut_all=True)
print('Full mode:', '/'.join(words))

jieba's search engine mode

words = jieba.cut_for_search(text)
print('Search mode:', '/'.join(words))

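Reading a local file to segment with jieba

# The 'ansi' encoding name is only available on Windows
with open(r'E:/管理学/debug.log',encoding = 'ansi') as f:
    text = f.read()
print(text[0:20])  # view the first twenty characters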

Custom dictionary entries in jieba

import jieba
text = '南京市长江大桥'
jieba.add_word('南京市长')  # register a custom word
words = jieba.cut(text)
print('After segmentation:','/'.join(words))

Part-of-speech tagging with jieba

import jieba.posseg as pseg
text = '我喜欢茸华似雪'
words = pseg.cut(text)
print(type(words))  # a generator
words = list(words)
print(words)
for word,flag in words:
    print(word,flag)  # the word and its part-of-speech tag

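Keyword extraction with jieba (listed in class as named entity recognition)

import jieba.analyse
text = '我喜欢茸华似雪'
# extract_tags returns the top keywords ranked by TF-IDF weight
keywords = jieba.analyse.extract_tags(text,topK=6)
print(keywords)

Sentiment analysis alongside jieba

import jieba.analyse
from snownlp import SnowNLP
text ="我很喜欢这部电影，它让我感到非常感动。"
keywords = jieba.analyse.extract_tags(text,topK=5)
s = SnowNLP(text)
sentiment = s.sentiments  # score between 0 (negative) and 1 (positive)
print("Keywords:", keywords)
print("Sentiment:", sentiment)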

Extracting idioms with jieba

import jieba
import jieba.posseg as pseg
# Optionally load a custom dictionary: jieba.load_userdict("idiom_dict.txt")
text = "他说了一句顺口溜:“十年树木，百年树人”，意思是培养人才要从长远考虑。"
words = pseg.cut(text)
# Dictionary that stores each idiom and its frequency
idioms = {}
for word, flag in words:
    # A part-of-speech tag of "i" marks an idiom
    if flag == "i":
        # If the idiom is already in the dictionary, add 1 to its frequency
        if word in idioms:
            idioms[word] += 1
        # Otherwise add it to the dictionary with a frequency of 1
        else:
            idioms[word] = 1
for idiom,count in idioms.items():
    print(idiom,count)

Extracting idioms and pronouns with jieba

import jieba
text = "他在考场上表现得很出色，不过他还有很多不足之处，需要继续努力。"
# Lists that store the idioms and pronouns found
idioms = []
pronouns = []
# Collections of known idioms and pronouns
idioms_set = {'半途而废','不可思议','不胫而走','不言而喻','大公无私'}  # the source list is cut off after '大言不'
pronouns_set = ('我','你','他','她','它','我们','你们','他们','她们','它们')
# Segment the text
words = jieba.cut(text)
# Walk through the segmented words and pick out idioms and pronouns
for word in words:
    if len(word) == 4 and word in idioms_set:
        # An idiom: add it to the idiom list
        idioms.append(word)
    elif word in pronouns_set:
        # A pronoun: add it to the pronoun list
        pronouns.append(word)
# Print the results
print('Idioms:',idioms)
print('Pronouns:',pronouns)

In-class word cloud assignment

from wordcloud import WordCloud, ImageColorGenerator  # imports
import jieba
import jieba.analyse
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

with open("C:/Users/69099/Desktop/二十大报告1.txt","r",encoding="utf-8") as file:  # open the file
    txt = file.read()  # read the file

# word_counts was undefined in the original; assumed here to be the keyword weights from extract_tags
seg_list = jieba.analyse.extract_tags(txt, topK=100, withWeight=True)
word_counts = dict(seg_list)
wc_mask = np.array(Image.open("C:/Users/69099/Desktop/f94efe25cfc14c46934a05f046d05e33.png"))
wc = WordCloud(font_path=r"C:\Windows\Fonts\simkai.TTF",  # set properties
               collocations=False,
               background_color="white",
               width=1000,
               height=800,
               max_font_size=100,
               contour_color='Blue',
               mask=wc_mask,       # mask (background) image
               contour_width=2,    # width of the word cloud outline
               max_words=100)
wc.generate_from_frequencies(word_counts)

# Optional: color the words from the mask image with wc.recolor(color_func=image_colors)
image_colors = ImageColorGenerator(wc_mask)

plt.figure(figsize=[10,10])
plt.imshow(wc)
plt.axis("off")
plt.show()


http://ronghuasixue.com/2023/10/19/大学中Pyrhon数据基础与挖掘/

Author

茸华似雪

Published on

October 19, 2023
