Using Python Scripts from a C# Client (Including Plots and Images)

Demonstrates how to run Python scripts from C#

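Introduction

This article introduces a class, PythonRunner, that lets you run Python scripts from a C# client. A script can produce textual output as well as images, which are converted to C# Image objects. In this way, the PythonRunner class not only gives C# applications access to the world of data science, machine learning, and artificial intelligence, but also makes Python's exhaustive libraries for charting, plotting, and data visualization (for example matplotlib and seaborn) available to C#.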

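Background

General Considerations

I have been a C# developer for more than a decade, and what can I say: over the years I have fallen deeply in love with the language. It gives me great architectural flexibility, it has large community support, and there is an abundance of third-party tools (free and commercial, much of it open source) for almost every conceivable use case. C# is a general-purpose language and the first choice for business application development.

In the last few months, I started learning Python for data science, machine learning, artificial intelligence, and data visualization, mainly because I think this skill will advance my career as a freelance software developer. I soon realized that Python is great for the tasks above (far better than C#), but that it is quite unsuitable for developing and maintaining large business applications. So the question of C# vs. Python (a widely discussed topic on the internet) completely misses the point. C# is for developers who build business-scale applications; Python is for data scientists who specialize in data science, machine learning, and data visualization. The two jobs do not have much in common.

The task is not for C# developers to additionally become data scientists or machine learning specialists (or the other way round). Being an expert in both domains would simply be too much. This is, IMHO, the main reason why components like Microsoft's ML.NET or the SciSharp STACK will not become widely used. The average C# developer will not become a data scientist, and data scientists will not learn C#. Why would they? They already have a great programming language that suits their needs very well, with a large ecosystem of scientific third-party libraries.

With these considerations in mind, I started looking for an easier and more "natural" way to bring the Python world and the C# world together. Here is one possible solution...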

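Sample App

Before we dive into the details, a short preliminary remark: I wrote this sample app for the sole purpose of demonstrating C#/Python integration, using only some mildly sophisticated Python code. I did not care much about whether the ML code itself is useful, so please be forgiving in this respect.

That said, let me briefly describe the sample app. Basically, it:

  1. gives you a list of stocks from which you can select between 6 and 30,
  2. draws a summary line chart of the (normalized) monthly stock prices, and
  3. performs a so-called "k-means clustering analysis" based on the price movements and shows the results in a treeview.

Let's quickly walk through the three parts of the application one by one...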

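Stock Selection

The DataGrid on the left side of the application window presents the list of available stocks to choose from. You need to select at least six items before you can take further action (the maximum number of selected stocks is 30). You can filter the list using the controls at the top, and you can sort it by clicking a column header. The Check Random Sample button randomly selects 18 stocks from the list.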

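Adjusting Other Parameters

Besides selecting stocks, you can adjust other parameters of the analysis: the date range of the analysis and the number of clusters (a metaparameter of the k-means analysis). This number cannot be greater than the number of selected stocks.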

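Analysis Results

Once you have finished selecting stocks and adjusting parameters, you can press the Analyze button in the lower right corner of the window. This will (asynchronously) call the Python scripts that perform the steps described above (drawing the chart and performing the k-means clustering analysis). On return, it will process and display the scripts' output.

The middle part of the window is a chart with the prices of the selected stocks, normalized so that the price on the start date is set to zero and every later price is shown as its percentage change from that starting point. The image that results from running the script is wrapped in a ZoomBox control to improve accessibility and user experience.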
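The normalization used for the chart can be illustrated with a few lines of plain Python. This is only a sketch of the idea (start price mapped to zero, later prices expressed as percentage change); the function name is made up for illustration, and the actual chart.py script may compute this differently, for example with pandas.

```python
def normalize_to_percent_change(prices):
    """Rescale a price series so the first value maps to 0 and every later
    value is its percentage change relative to that first price."""
    if not prices:
        return []
    first = prices[0]
    if first == 0:
        raise ValueError("the starting price must be non-zero")
    return [100.0 * (price - first) / first for price in prices]


# Illustrative prices only, not taken from the sample database.
print(normalize_to_percent_change([50.0, 55.0, 45.0]))  # prints [0.0, 10.0, -10.0]
```

Because every series starts at zero, stocks with very different absolute prices can share one axis.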

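On the far right of the window, a tree shows the processed results of the clustering analysis. It groups (clusters) the stocks by their relative price movements (in other words: the more closely two stocks move together, the more likely they are to be in the same cluster). The tree also serves as the color legend for the chart.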

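Points of Interest

Principal Structure of the Code

In general, the project consists of:

  • the C# files,
  • the Python scripts in the scripts subfolder (chart.py, kmeans.py, and common.py), and
  • a SQLite database (stockdata.sqlite) that is accessed by both the C# code and the Python scripts.

Other things to note:

  • On the C# side, the database is accessed using EF6 and the recipe from this CodeProject article.
  • Some of the WPF UI controls come from the Extended WPF Toolkit™.
  • Of course, a Python environment with all the required packages must be installed on the target system. The respective path is configured via the app.config file.
  • The C# part of the application uses WPF and follows the MVVM pattern. In accordance with the three-part overall structure of the application's main window, there are three viewmodels (StockListViewModel, ChartViewModel, and TreeViewViewModel) that are orchestrated by a fourth one (MainViewModel).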

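C# Side

The PythonRunner Class

The central component for running Python scripts is the PythonRunner class. It is basically a wrapper around the Process class, specialized for Python. It supports textual output as well as image output, both synchronously and asynchronously. Here is the public interface of this class, together with code comments that explain the details: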

/// <summary>
/// A specialized runner for python scripts. Supports textual output
/// as well as image output, both synchronously and asynchronously.
/// </summary>
/// <remarks>
    /// You can think of <see cref="PythonRunner" /> instances as <see cref="Process" />
    /// instances that were specialized for Python scripts.
/// </remarks>
/// <seealso cref="Process" />
public class PythonRunner
{
    /// <summary>
    /// Instantiates a new <see cref="PythonRunner" /> instance.
    /// </summary>
    /// <param name="interpreter">
    /// Full path to the Python interpreter ('python.exe').
    /// </param>
    /// <param name="timeout">
    /// The script timeout in msec. Defaults to 10000 (10 sec).
    /// </param>
    /// <exception cref="ArgumentNullException">
    /// Argument <paramref name="interpreter" /> is null.
    /// </exception>
    /// <exception cref="FileNotFoundException">
    /// Argument <paramref name="interpreter" /> is an invalid path.
    /// </exception>
    /// <seealso cref="Interpreter" />
    /// <seealso cref="Timeout" />
	public PythonRunner(string interpreter, int timeout = 10000) { ... }

	/// <summary>
	/// Occurs when a python process is started.
	/// </summary>
	/// <seealso cref="PyRunnerStartedEventArgs" />
	public event EventHandler<PyRunnerStartedEventArgs> Started;

	/// <summary>
	/// Occurs when a python process has exited.
	/// </summary>
	/// <seealso cref="PyRunnerExitedEventArgs" />
	public event EventHandler<PyRunnerExitedEventArgs> Exited;

    /// <summary>
    /// The Python interpreter ('python.exe') that is used by this instance.
    /// </summary>
    public string Interpreter { get; }

    /// <summary>
    /// The timeout for the underlying <see cref="Process" /> component in msec.
    /// </summary>
    /// <remarks>
    /// See <see cref="Process.WaitForExit(int)" /> for details about this value.
    /// </remarks>
    public int Timeout { get; set; }

    /// <summary>
    /// Executes a Python script and returns the text that it prints to the console.
    /// </summary>
    /// <param name="script">Full path to the script to execute.</param>
    /// <param name="arguments">Arguments that were passed to the script.</param>
    /// <returns>The text output of the script.</returns>
    /// <exception cref="PythonRunnerException">
    /// Thrown if error text was outputted by the script (this normally happens
    /// if an exception was raised by the script). <br />
    /// -- or -- <br />
    /// An unexpected error happened during script execution. In this case, the
    /// <see cref="Exception.InnerException" /> property contains the original
    /// <see cref="Exception" />.
    /// </exception>
    /// <exception cref="ArgumentNullException">
    /// Argument <paramref name="script" /> is null.
    /// </exception>
    /// <exception cref="FileNotFoundException">
    /// Argument <paramref name="script" /> is an invalid path.
    /// </exception>
    /// <remarks>
    /// Output to the error stream can also come from warnings, that are frequently
    /// outputted by various python package components. These warnings would result
    /// in an exception, therefore they must be switched off within the script by
    /// including the following statement: <c>warnings.simplefilter("ignore")</c>.
    /// </remarks>
    public string Execute(string script, params object[] arguments) { ... }

	/// <summary>
	/// Runs the <see cref="Execute"/> method asynchronously. 
	/// </summary>
	/// <returns>
	/// An awaitable task, with the text output of the script as 
    /// <see cref="Task{TResult}.Result"/>.
	/// </returns>
	/// <seealso cref="Execute"/>
    public Task<string> ExecuteAsync(string script, params object[] arguments) { ... }

	/// <summary>
	/// Executes a Python script and returns the resulting image 
    /// (mostly a chart that was produced
	/// by a Python package like e.g. <see href="https://matplotlib.org/">matplotlib</see> or
	/// <see href="https://seaborn.pydata.org/">seaborn</see>).
	/// </summary>
	/// <param name="script">Full path to the script to execute.</param>
	/// <param name="arguments">Arguments that were passed to the script.</param>
	/// <returns>The <see cref="Bitmap"/> that the script creates.</returns>
	/// <exception cref="PythonRunnerException">
	/// Thrown if error text was outputted by the script (this normally happens
	/// if an exception was raised by the script). <br/>
	/// -- or -- <br/>
	/// An unexpected error happened during script execution. In this case, the
	/// <see cref="Exception.InnerException"/> property contains the original
	/// <see cref="Exception"/>.
	/// </exception>
	/// <exception cref="ArgumentNullException">
	/// Argument <paramref name="script"/> is null.
	/// </exception>
	/// <exception cref="FileNotFoundException">
	/// Argument <paramref name="script"/> is an invalid path.
	/// </exception>
	/// <remarks>
	/// <para>
	/// In a 'normal' case, a Python script that creates a chart would show this chart
	/// with the help of Python's own backend, like this.
	/// <example>
	/// import matplotlib.pyplot as plt
	/// ...
	/// plt.show()
	/// </example>
	/// For the script to be used within the context of this <see cref="PythonRunner"/>,
	/// it should instead convert the image to a base64-encoded string and print this string
	/// to the console. The following code snippet shows a Python method (<c>print_figure</c>)
	/// that does this:
	/// <example>
	/// import io, sys, base64
	/// 
	/// def print_figure(fig):
	/// 	buf = io.BytesIO()
	/// 	fig.savefig(buf, format='png')
	/// 	print(base64.b64encode(buf.getbuffer()))
	///
	/// import matplotlib.pyplot as plt
	/// ...
	/// print_figure(plt.gcf()) # the gcf() method retrieves the current figure
	/// </example>
	/// </para><para>
	/// Output to the error stream can also come from warnings, that are frequently
	/// outputted by various python package components. These warnings would result
	/// in an exception, therefore they must be switched off within the script by
	/// including the following statement: <c>warnings.simplefilter("ignore")</c>.
	/// </para>
	/// </remarks>
    public Bitmap GetImage(string script, params object[] arguments) { ... }

 	/// <summary>
	/// Runs the <see cref="GetImage"/> method asynchronously. 
	/// </summary>
	/// <returns>
	/// An awaitable task, with the <see cref="Bitmap"/> that the script
	/// creates as <see cref="Task{TResult}.Result"/>.
	/// </returns>
	/// <seealso cref="GetImage"/>
    public Task<Bitmap> GetImageAsync(string script, params object[] arguments) { ... }
}
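The interface above only documents the contract; the implementation is elided. As a rough illustration of that contract (run an interpreter on a script, collect stdout, and treat any stderr output as an error), here is a minimal Python analog built on the standard subprocess module. It is a sketch, not the author's C# code: the names run_script and ScriptError are hypothetical, and the real class is built on the .NET Process component.

```python
import os
import subprocess
import sys
import tempfile


class ScriptError(RuntimeError):
    """Raised when the script wrote anything to stderr (hypothetical name)."""


def run_script(interpreter, script, *arguments, timeout_ms=10000):
    """Run `script` with `interpreter` and return what it printed to stdout.

    Any text on stderr is treated as a failure, mirroring the documented
    behaviour. subprocess.TimeoutExpired is raised if the timeout elapses.
    """
    command = [interpreter, script] + [str(argument) for argument in arguments]
    completed = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_ms / 1000.0,  # subprocess expects seconds
    )
    if completed.stderr:
        raise ScriptError(completed.stderr)
    return completed.stdout


# Demo with a throwaway script that echoes its two arguments.
with tempfile.TemporaryDirectory() as folder:
    script_path = os.path.join(folder, "echo.py")
    with open(script_path, "w") as handle:
        handle.write("import sys\nprint(sys.argv[1], sys.argv[2])\n")
    print(run_script(sys.executable, script_path, "AYR", 5))
```

All arguments are converted to strings before they reach the command line, which is why the receiving script has to convert numeric values back itself.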

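Retrieving Stock Data

As already mentioned, the sample app uses a SQLite database as its data store (which is also accessed by the Python side, see below). For this purpose, Entity Framework is used, together with the recipe from this CodeProject article. The stock data are then put into a ListCollectionView, which supports filtering and sorting: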

private void LoadStocks()
{
	var ctx = new SQLiteDatabaseContext(_mainVm.DbPath);

	var itemList = ctx.Stocks.ToList().Select(s => new StockItem(s)).ToList();
	_stocks = new ObservableCollection<StockItem>(itemList);
	_collectionView = new ListCollectionView(_stocks);

	// Initially sort the list by stock names
	ICollectionView view = CollectionViewSource.GetDefaultView(_collectionView);
	view.SortDescriptions.Add(new SortDescription("Name", ListSortDirection.Ascending));
}

Getting Text Output

Here, the PythonRunner calls a script that produces textual output. The KMeansClusteringScript property points to the script to execute:

/// <summary>
/// Calls the python script to retrieve a textual list that 
/// will subsequently be used for building the treeview.
/// </summary>
/// <returns>The text output of the script, or an empty string on failure.</returns>
private async Task<string> RunKMeans()
{
	TreeViewText = Processing;
	Items.Clear();

	try
	{
		string output = await _mainVm.PythonRunner.ExecuteAsync(
			KMeansClusteringScript,
			_mainVm.DbPath,
			_mainVm.TickerList,
			_mainVm.NumClusters,
			_mainVm.StartDate.ToString("yyyy-MM-dd"),
			_mainVm.EndDate.ToString("yyyy-MM-dd"));

		return output;
	}
	catch (Exception e)
	{
		TreeViewText = e.ToString();
		return string.Empty;
	}
}

And here is some sample output produced by the script:

0 AYR 0,0,255
0 PCCWY 0,100,0
0 HSNGY 128,128,128
0 CRHKY 165,42,42
0 IBN 128,128,0
1 SRNN 199,21,133
...
4 PNBK 139,0,0
5 BOTJ 255,165,0
5 SPPJY 47,79,79

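The first column is the cluster number from the k-means analysis, the second column is the ticker symbol of the respective stock, and the third column holds the RGB values of the color that was used to draw this stock's line in the chart.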
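To make the three-column format concrete, the following sketch parses such lines and groups the tickers by cluster. In the sample app this step happens on the C# side when the treeview is built; the Python version here is purely illustrative, and both function names are made up.

```python
def parse_cluster_line(line):
    """Split one output line into (cluster number, ticker, (r, g, b))."""
    cluster, ticker, rgb = line.split()
    red, green, blue = (int(part) for part in rgb.split(","))
    return int(cluster), ticker, (red, green, blue)


def group_by_cluster(output):
    """Group (ticker, color) pairs by cluster number, skipping blank lines."""
    clusters = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        cluster, ticker, color = parse_cluster_line(line)
        clusters.setdefault(cluster, []).append((ticker, color))
    return clusters


# Three lines taken from the sample output above.
sample = "0 AYR 0,0,255\n0 PCCWY 0,100,0\n1 SRNN 199,21,133\n"
print(group_by_cluster(sample))
```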

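Getting an Image

This is the method that uses the viewmodel's PythonRunner instance to asynchronously call the required Python script (whose path is stored in the DrawSummaryLineChartScript property) together with the required script arguments. The result is then, as soon as it is available, converted into a "WPF-friendly" form: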

/// <summary>
/// Calls the python script to draw the chart of the selected stocks.
/// </summary>
/// <returns>True on success.</returns>
internal async Task<bool> DrawChart()
{
	SummaryChartText = Processing;
	SummaryChart = null;

	try
	{
		var bitmap = await _mainVm.PythonRunner.GetImageAsync(
			DrawSummaryLineChartScript,
			_mainVm.DbPath,
			_mainVm.TickerList,
			_mainVm.StartDate.ToString("yyyy-MM-dd"),
			_mainVm.EndDate.ToString("yyyy-MM-dd"));

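		// GetHbitmap() allocates a GDI handle that must be released explicitly.
		IntPtr hBitmap = bitmap.GetHbitmap();
		try
		{
			SummaryChart = Imaging.CreateBitmapSourceFromHBitmap(
				hBitmap,
				IntPtr.Zero,
				Int32Rect.Empty,
				BitmapSizeOptions.FromEmptyOptions());
		}
		finally
		{
			// DeleteObject is the gdi32.dll function, declared via P/Invoke:
			// [DllImport("gdi32.dll")] static extern bool DeleteObject(IntPtr hObject);
			DeleteObject(hBitmap);
		}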

		return true;
	}
	catch (Exception e)
	{
		SummaryChartText = e.ToString();
		return false;
	}
}

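Python Side

Suppressing Warnings

One important thing to notice is that the PythonRunner class throws an exception as soon as the called script writes to stderr. This is the case when the Python code raises an error for some reason, and in that case it is desirable to re-throw the error. But the script may also write to stderr when some component issues a harmless warning, for example when something will be deprecated soon, when something is initialized twice, or for any other minor issue. In that case, we do not want to break execution but simply ignore the warning. The statement in the snippet below does exactly this: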

import warnings

...

# Suppress all kinds of warnings (this would lead to an exception on the client side).
warnings.simplefilter("ignore")
...
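The effect of that statement can be observed by running a tiny script in a child interpreter with and without the filter and comparing what arrives on stderr. The sketch below uses only the standard library; the helper name run_child and the warning text are made up for illustration.

```python
import subprocess
import sys


def run_child(suppress):
    """Run a tiny script in a child interpreter; return (stdout, stderr)."""
    code = "import warnings\n"
    if suppress:
        code += "warnings.simplefilter('ignore')\n"
    # A plain warnings.warn() call issues a UserWarning.
    code += "warnings.warn('minor issue')\nprint('done')\n"
    completed = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True)
    return completed.stdout, completed.stderr


stdout, stderr = run_child(suppress=False)
print("UserWarning" in stderr)  # the warning text lands on stderr

stdout, stderr = run_child(suppress=True)
print("UserWarning" in stderr)  # nothing warning-related is written
```

In both runs the script itself finishes normally and prints its regular output; only the stderr content differs, which is exactly what decides whether the caller sees an exception.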

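Parsing Command Line Arguments

As we have seen, the C# (client) side calls the script with a variable number of positional arguments. The arguments are passed to the script via the command line. This means that the script must "understand" these arguments and parse them accordingly. The command line arguments given to a Python script are accessible via sys.argv, a list of strings. The code snippet below comes from the kmeans.py script and demonstrates how to do this: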

import sys

...

# parse command line arguments
db_path = sys.argv[1]
ticker_list = sys.argv[2]
clusters = int(sys.argv[3])
start_date = sys.argv[4]
end_date = sys.argv[5]
...
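A slightly more defensive variant of the same parsing step is sketched below: it checks the argument count and bundles the values in a named tuple. This is not part of kmeans.py; the names ScriptArguments and parse_arguments are hypothetical, the dates are illustrative, and the quoted, comma-separated ticker format is inferred from how the scripts in this article use the ticker list.

```python
from typing import List, NamedTuple


class ScriptArguments(NamedTuple):
    db_path: str
    tickers: List[str]
    clusters: int
    start_date: str
    end_date: str


def parse_arguments(argv):
    """Validate and convert the five positional arguments of the script."""
    if len(argv) != 6:  # argv[0] is the script name itself
        raise SystemExit("usage: kmeans.py DB TICKERS CLUSTERS START END")
    db_path, ticker_list, clusters, start_date, end_date = argv[1:]
    # Assumed format: "'AYR','IBN'" (quoted symbols separated by commas).
    tickers = [ticker.strip("'") for ticker in ticker_list.split(",")]
    return ScriptArguments(db_path, tickers, int(clusters), start_date, end_date)


args = parse_arguments(
    ["kmeans.py", "stockdata.sqlite", "'AYR','IBN'", "2", "2019-01-01", "2019-12-31"])
print(args.tickers, args.clusters)  # prints ['AYR', 'IBN'] 2
```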

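Retrieving Stock Data

The Python scripts use the same SQLite database as the C# code. This is achieved by storing the database path as an application setting in app.config on the C# side and then passing it as an argument to the called Python script. Above, we have seen how this is done on the calling side as well as in the command line argument parsing of the Python script. Now here is the Python helper function that builds a SQL statement from the arguments and loads the required data into a list of dataframes (using the sqlalchemy Python package):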

from sqlalchemy import create_engine
import pandas as pd

def load_stock_data(db, tickers, start_date, end_date):
    """
    Loads the stock data for the specified ticker symbols, and for the specified date range.
    :param db: Full path to database with stock data.
    :param tickers: A list with ticker symbols.
    :param start_date: The start date.
    :param end_date: The end date.
    :return: A list of time-indexed dataframe, one for each ticker, ordered by date.
    """

    SQL = "SELECT * FROM Quotes WHERE TICKER IN ({}) AND Date >= '{}' AND Date <= '{}'"\
          .format(tickers, start_date, end_date)

    engine = create_engine('sqlite:///' + db)

    df_all = pd.read_sql(SQL, engine, index_col='Date', parse_dates=['Date'])
    df_all = df_all.round(2)

    result = []

    for ticker in tickers.split(","):
        df_ticker = df_all.query("Ticker == " + ticker)
        result.append(df_ticker)

    return result
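The helper above interpolates its arguments directly into the SQL text. Where the inputs are not fully trusted, the same query can be written with bound parameters. The sketch below uses the standard sqlite3 module instead of sqlalchemy and pandas so that it runs on its own; the Quotes table and the Ticker and Date columns follow the snippet above, while the Close column, the sample rows, and the function name load_quotes are assumptions made for the demo.

```python
import sqlite3


def load_quotes(connection, tickers, start_date, end_date):
    """Return Quotes rows for the given tickers and inclusive date range."""
    placeholders = ",".join("?" for _ in tickers)  # one "?" per ticker
    sql = (
        "SELECT Ticker, Date, Close FROM Quotes "
        f"WHERE Ticker IN ({placeholders}) AND Date >= ? AND Date <= ? "
        "ORDER BY Ticker, Date"
    )
    return connection.execute(sql, [*tickers, start_date, end_date]).fetchall()


# Tiny in-memory stand-in for stockdata.sqlite (schema and rows are made up).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Quotes (Ticker TEXT, Date TEXT, Close REAL)")
connection.executemany(
    "INSERT INTO Quotes VALUES (?, ?, ?)",
    [("AYR", "2019-01-31", 10.0), ("AYR", "2019-02-28", 11.0),
     ("IBN", "2019-01-31", 9.5), ("IBN", "2018-12-31", 9.0)],
)
print(load_quotes(connection, ["AYR", "IBN"], "2019-01-01", "2019-12-31"))
```

With bound parameters the ticker symbols are passed as a plain list, so they no longer need to arrive pre-quoted.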

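Text Output

For a Python script, producing text output that can be consumed from the C# side just means: printing to the console as usual. The calling PythonRunner class will take care of everything else. Here is the snippet from kmeans.py that produces the text seen above: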

# Create a DataFrame aligning labels and companies.
df = pd.DataFrame({'ticker': tickers}, index=labels)
df.sort_index(inplace=True)

# Make a real python list.
ticker_list = list(ticker_list.replace("'", "").split(','))

# Output the clusters together with the used colors
for cluster, row in df.iterrows():

	ticker = row['ticker']
	index = ticker_list.index(ticker)
	rgb = get_rgb(common.COLOR_MAP[index])

	print(cluster, ticker, rgb)
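The get_rgb helper and the contents of common.COLOR_MAP are not shown in the article. As a purely hypothetical stand-in, the sketch below converts a hex color such as '#0000ff' into the comma-separated form that appears in the third output column; the real helper may work from color names or use matplotlib's color utilities instead.

```python
def rgb_string(hex_color):
    """Convert '#rrggbb' to the 'r,g,b' form seen in the script output."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a color like '#0000ff'")
    channels = [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    return ",".join(str(channel) for channel in channels)


print(0, "AYR", rgb_string("#0000ff"))  # prints: 0 AYR 0,0,255
```

Keeping the color free of spaces means each output line still splits into exactly three whitespace-separated fields.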

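Image Output

Image output is not very different from text output: first, the script creates the desired figure as usual. Then, instead of calling the show() method to display the image using Python's own backend, it converts the image to a base64 string and prints this string to the console. You can use this helper function: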

import io, sys, base64

def print_figure(fig):
	"""
	Converts a figure (as created e.g. with matplotlib or seaborn) to a png image and this 
	png subsequently to a base64-string, then prints the resulting string to the console.
	"""
	
	buf = io.BytesIO()
	fig.savefig(buf, format='png')
	print(base64.b64encode(buf.getbuffer()))

In your main script, you can then call the helper function like this (the gcf() function simply gets the current figure):

import matplotlib.pyplot as plt
...
# do stuff
...
print_figure(plt.gcf())

On the C# client side, this little helper class, which is used by PythonRunner, then converts the string back into an image (a Bitmap, to be precise):

/// <summary>
/// Helper class for converting a base64 string (as printed by
/// python script) to a <see cref="Bitmap" /> image.
/// </summary>
internal static class PythonBase64ImageConverter
{
	/// <summary>
	/// Converts a base64 string (as printed by python script) to a <see cref="Bitmap" /> image.
	/// </summary>
	public static Bitmap FromPythonBase64String(string pythonBase64String)
	{
		// Remove the first two chars and the last one.
		// First one is 'b' (python format sign), others are quote signs.
		string base64String = pythonBase64String.Substring(2, pythonBase64String.Length - 3);

		// Convert the now raw base64 string to a byte array.
		byte[] imageBytes = Convert.FromBase64String(base64String);

		// Wrap the bytes in a stream; the constructor already exposes them
		// from position 0, so no additional Write call is needed.
		var memoryStream = new MemoryStream(imageBytes, 0, imageBytes.Length);

		// Create bitmap from stream.
		return (Bitmap)Image.FromStream(memoryStream, true);
	}
}
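The reason the converter drops the first two characters and the last one becomes clear when the round trip is replayed in Python: printing a bytes object shows its b'...' representation, not the bare base64 text. The sketch below reproduces the same cut; the function name strip_bytes_wrapper is made up, and the 8-byte PNG signature merely stands in for real image data.

```python
import base64


def strip_bytes_wrapper(printed):
    """Remove the b'...' wrapper that print() shows for a bytes object."""
    if not (printed.startswith("b'") and printed.endswith("'")):
        raise ValueError("expected text of the form b'...'")
    return printed[2:-1]  # same cut as Substring(2, Length - 3) in C#


payload = b"\x89PNG\r\n\x1a\n"  # PNG signature as stand-in image data

printed = str(base64.b64encode(payload))  # the text that print() would emit
restored = base64.b64decode(strip_bytes_wrapper(printed))
print(restored == payload)  # prints True

# Decoding to text on the Python side would avoid the wrapper altogether.
plain = base64.b64encode(payload).decode("ascii")
print(plain == strip_bytes_wrapper(printed))  # prints True
```

If a script printed the decoded text instead, the C# side would have to skip the Substring step, so both ends need to agree on one convention.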
