原文towardsdatascience.com/removing-spikes-from-raman-spectra-a-step-by-step-guide-with-python-b6fd90e8ea77?sourcecollection_archive---------4-----------------------#2024-11-06查找、移除并用插值值替换尖峰https://medium.com/nicopez?sourcepost_page---byline--b6fd90e8ea77--------------------------------https://towardsdatascience.com/?sourcepost_page---byline--b6fd90e8ea77-------------------------------- 尼古拉斯·可卡博士·发表于Towards Data Science ·7 分钟阅读·2024 年 11 月 6 日–本教程是系列文章的一部分系列文章题为使用 Python 进行拉曼光谱的数据科学该系列在Towards Data Science上发布。它基于在《Analytica Chimica Acta》期刊上发布的这篇文章。通过跟随本教程您将为您的数据分析工具包添加一项宝贵工具——一种已经在公开研究中使用的有效方法用于清理拉曼光谱。https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/7b4c0ebe0c2398f9f9dbcf05ecffd68a.png去除石墨烯拉曼光谱中的尖峰。图片由作者提供。简介尖峰去除是拉曼数据预处理中的一个重要步骤。尖峰是由宇宙射线撞击探测器引起的它们表现为强烈、狭窄的峰值可能会扭曲分析结果。这些能量爆发撞击电荷耦合设备CCD摄像头产生尖锐、高强度的峰值如果不加以修正可能会干扰进一步的处理步骤如归一化、光谱搜索或多元数据分析。因此清除这些伪影是一个优先事项。本教程将介绍一种实用的算法用于去除拉曼光谱中的尖峰。我们将使用 Python逐步演示一种用户友好、可定制的尖峰检测与修正方法确保拉曼数据的准确性和可靠性。图 1 展示了一个含有峰值的石墨烯拉曼光谱示例。石墨烯的卓越物理特性——如其电导率和热导率——使其成为一个备受研究的材料。其拉曼光谱包含反映结构特征的峰值揭示有关掺杂、应变和晶界的信息。因此拉曼光谱技术广泛用于表征石墨烯。然而为了充分利用这一工具必须事先去除峰值。https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/0de7cbb79612e8ba4c036fcb4f2ef06f.png图 1. 含峰值的石墨烯拉曼光谱。生成此图的代码如下所示。图片来自作者。importnumpyasnp# Load data directly into a numpy arraydatanp.loadtxt(spiked_spectrum.asc,delimiter,,skiprows1)# Extract Raman shift from the first column (index)ramanshiftdata[:,0]# Extract intensity from the second column (index 1in Python)intensitydata[:,1]# Plot the dataimportmatplotlib.pyplotasplt figplt.figure(figsize(5,3))plt.plot(ramanshift,intensity)plt.xlabel(Raman shift (cm$^{-1}$))plt.ylabel(Intensity (a.u.))plt.show()峰值移除算法这里提出的峰值移除算法包含四个主要步骤1. 峰值寻找2. 峰值检测3. 峰值标记4. 光谱校正让我们来看一下包含 Python 代码片段的不同步骤1. 峰值寻找首先算法通过检查局部最大值设置最小突出度阈值来识别显著的峰值。添加突出度阈值有助于排除由噪声生成的小峰值因为我们并不打算修正所有噪声。请参见下图进行对比。fromscipy.signalimportfind_peaks# Find the peaks in the spectrum (with and without prominence threshold)peaks_wo_p,_find_peaks(intensity)# Peaks found without a prominence thresholdpeaks_w_p,_find_peaks(intensity,prominence20)# Peaks found without a prominence thresholdfig,axplt.subplots(1,2,figsize(10,3))ax[0].plot(ramanshift,intensity,zorder0,labelRaw spectrum)ax[0].scatter(ramanshift[peaks_wo_p],intensity[peaks_wo_p],marker.,colorred,labelFound peaks)ax[1].plot(ramanshift,intensity,zorder0,labelRaw spectrum)ax[1].scatter(ramanshift[peaks_w_p],intensity[peaks_w_p],marker.,colorred,labelFound peaks)plt.show()https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/df05fdd0e60900a17df9f01be6273e5a.png第 1 步在光谱中找到峰值。图片来自作者。2. 峰值检测接下来根据峰值的特征窄宽度进行标记。这一点可能有助于大型光谱数据集的自动化。如果我们知道光谱中存在的拉曼带的宽度我们可以选择低于该值的阈值。例如在我们的系统分辨率下我们不预期出现宽度小于 10 cm-1 的石墨烯拉曼带。fromscipy.signalimportpeak_widths widthspeak_widths(intensity,peaks_w_p)[0]fig,axplt.subplots(figsize(5,3))ax.plot(ramanshift,intensity,zorder0,labelRaw spectrum)ax2ax.twinx()ax2.scatter(ramanshift[peaks_w_p],widths,marker,colorred,labelPeak widths)plt.show()https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/2ac54ea631976e354b792ba9f176db79.png第 2 步寻找峰值的宽度。图片来自作者。3. 峰值标记接下来任何受峰值影响的数据点都会被标记标记范围是根据峰值的突出程度计算的从而有效地隔离损坏的像素。换句话说我们选择必须修正的窗口。# Lets set the parameters:width_param_rel0.8width_threshold10# Estimation of the width of the narrowest Raman band# Calculation of the range where the spectral points are asumed to be corruptedwidths_ext_apeak_widths(intensity,peaks_w_p,rel_heightwidth_param_rel)[2]widths_ext_bpeak_widths(intensity,peaks_w_p,rel_heightwidth_param_rel)[3]# Create a vector where spikes will be flag: no spike 0, spike 1.spikesnp.zeros(len(intensity))# Flagging the area previously defined if the peak is considered a spike (width below width_threshold)fora,width,ext_a,ext_binzip(range(len(widths)),widths,widths_ext_a,widths_ext_b):ifwidthwidth_threshold:spikes[int(ext_a)-1:int(ext_b)2]1figplt.figure(figsize(5,3))plt.plot(ramanshift,intensity,zorder0,labelRaw spectrum)a1plt.scatter(ramanshift[int(widths_ext_a[a])-1:int(widths_ext_b[a])1],intensity[int(widths_ext_a[a])-1:int(widths_ext_b[a])1],colorred,labelcorrupted points)plt.axvline(xramanshift[int(widths_ext_a[a])-1],linestyle--,colorred)plt.axvline(xramanshift[int(widths_ext_b[a])1],linestyle--,colorred)plt.show()https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/6ca5b8f53884b5222fac8b105f5c4608.png第 3 步标记损坏的箱体。图片来自作者。4. 光谱校正最后这些点通过插值方法与邻近值进行修正保持光谱的完整性以便进行后续分析。fromscipyimportinterpolate# Lets set the parameter:moving_average_window10intensity_outintensity.copy()# Interpolation of corrupted pointsfori,spikeinenumerate(spikes):ifspike!0:# If we have an spike in position iwindownp.arange(i-moving_average_window,imoving_average_window1)# we select 2 ma 1 points around our spikewindow_exclude_spikeswindow[spikes[window]0]# From such interval, we choose the ones which are not spikesinterpolatorinterpolate.interp1d(window_exclude_spikes,intensity[window_exclude_spikes],kindlinear)# We use the not corrupted points around the spike to calculate the interpolationintensity_out[i]interpolator(i)# The corrupted point is exchanged by the interpolated value.figplt.figure(figsize(5,3))plt.plot(ramanshift,intensity,zorder0,colorred,labelRaw spectrum)plt.plot(ramanshift,intensity_out,zorder0,labelCorrected spectrum)plt.show()https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/1a41d905031018d2af7adab04d555510.png第 4 步光谱校正。图片来自作者。Python 中的完整峰值移除功能所有这些代码片段可以汇总成一个单独的函数。该函数被设计为根据特定的数据需求进行定制具有调整突出度和宽度的参数importnumpyasnpfromscipy.signalimportfind_peaks,peak_widths,peak_prominencesfromscipyimportinterpolatedefspike_removal(y,width_threshold,prominence_thresholdNone,moving_average_window10,width_param_rel0.8,interp_typelinear): Detects and replaces spikes in the input spectrum with interpolated values. Algorithm first published by N. Coca-Lopez in Analytica Chimica Acta. https://doi.org/10.1016/j.aca.2024.342312 Parameters: y (numpy.ndarray): Input spectrum intensity. width_threshold (float): Threshold for peak width. prominence_threshold (float): Threshold for peak prominence. moving_average_window (int): Number of points in moving average window. width_param_rel (float): Relative height parameter for peak width. tipo: type of interpolation (linear, quadratic, cubic) Returns: numpy.ndarray: Signal with spikes replaced by interpolated values. # First, we find all peaks showing a prominence above prominence_threshold on the spectrapeaks,_find_peaks(y,prominenceprominence_threshold)# Create a vector where spikes will be flag: no spike 0, spike 1.spikesnp.zeros(len(y))# Calculation of the widths of the found peakswidthspeak_widths(y,peaks)[0]# Calculation of the range where the spectral points are asumed to be corruptedwidths_ext_apeak_widths(y,peaks,rel_heightwidth_param_rel)[2]widths_ext_bpeak_widths(y,peaks,rel_heightwidth_param_rel)[3]# Flagging the area previously defined if the peak is considered a spike (width below width_threshold)fora,width,ext_a,ext_binzip(range(len(widths)),widths,widths_ext_a,widths_ext_b):ifwidthwidth_threshold:spikes[int(ext_a)-1:int(ext_b)2]1y_outy.copy()# Interpolation of corrupted pointsfori,spikeinenumerate(spikes):ifspike!0:# If we have an spike in position iwindownp.arange(i-moving_average_window,imoving_average_window1)# we select 2 ma 1 points around our spikewindow_exclude_spikeswindow[spikes[window]0]# From such interval, we choose the ones which are not spikesinterpolatorinterpolate.interp1d(window_exclude_spikes,y[window_exclude_spikes],kindinterp_type)# We use the not corrupted points around the spike to calculate the interpolationy_out[i]interpolator(i)# The corrupted point is exchanged by the interpolated value.returny_out然后带有该算法的函数可以应用于带有峰值的石墨烯光谱如下所示intensity_despikedspike_removal(intensity,width_threshold3,prominence_threshold20,moving_average_window10,width_param_rel0.8,interp_typelinear)fig,axplt.subplots(1,2,figsize(2*5,3))ax[0].plot(ramanshift,intensity,labelspike,colorred,linewidth0.9)ax[0].plot(ramanshift,intensity_despiked)ax[1].plot(ramanshift,intensity_despiked)plt.show()https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/f9d4520f321b03ec94650f47c718108d.png拉曼光谱中峰值移除的示例。图片来自作者。使用这种去除尖峰的方法你可以确保你的拉曼光谱干净可靠最小化伪影同时不丢失重要的光谱细节。该方法非常适合自动化尤其是在已知预期的最小峰宽的情况下使其非常适合大规模光谱数据集和高通量分析。希望你喜欢这个教程。如果有任何问题或者想分享你自己的拉曼数据挑战欢迎在评论中留言——我很想听听这个算法如何帮助你的项目准备好尝试一下吗你可以在这里下载 Jupyter Notebook。如果你觉得这对你有帮助请记得引用原始工作这对我会有很大帮助