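An analysis of the infinite loop that cin.fail can cause in an example from C++ Primer

C++ Primer contains the following example (page 248 in the Chinese translation of the fourth edition):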

int ival;
while (cin >> ival, !cin.eof())
{
	if (cin.bad())
		throw runtime_error("IO stream corrupted");
	if(cin.fail())
	{
		cerr << "bad data, try again";
		cin.clear(istream::failbit);

		continue;
	}
}

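The program is meant to keep reading from cin until it reaches end-of-file or hits an unrecoverable read error. The example is flawed, however: once execution enters the if(cin.fail()) branch, it is stuck in an infinite loop. There are two separate problems, discussed in turn below.

Problem 1: what cin.clear actually does

C++ IO streams have the notion of a condition state, which records whether a given IO object (stream) is usable or which kind of error it has run into. There are four state constants, each with a matching query function:

State	Query function	Meaning
badbit	s.bad()	The stream is corrupted; not recoverable
eofbit	s.eof()	The stream has reached end-of-file
failbit	s.fail()	An IO operation failed (s.fail() also returns true when badbit is set)
goodbit	s.good()	No error bit is set; the stream is usable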
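
The table can be checked with a small sketch. It uses std::istringstream in place of cin so the input is fixed, and the helper name state_after_read is made up for this illustration; it reports which query functions return true after one attempt to extract an int.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: try to read one int from `text` and list which
// state query functions return true afterwards.
std::string state_after_read(const std::string& text)
{
    std::istringstream in(text);
    int value = 0;
    in >> value;

    std::string flags;
    if (in.good()) flags += "good ";
    if (in.eof())  flags += "eof ";
    if (in.fail()) flags += "fail ";  // true when failbit or badbit is set
    if (in.bad())  flags += "bad ";
    return flags;
}
```

Calling state_after_read("abc") gives "fail ", while state_after_read("") gives "eof fail ", because hitting end-of-file before any value is read sets both bits.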

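The library also provides several functions for inspecting and modifying the stream state:

Function	Effect
s.clear()	Sets the state of stream s to goodbit
s.clear(flag)	Sets the state of stream s to flag, overwriting the previous state; flag has type ios_base::iostate
s.setstate(flag)	Adds the condition flag to stream s, keeping the bits that are already set
s.rdstate()	Returns the current state of stream s as an ios_base::iostate

The member function that deserves special attention here is clear. First, the description from cplusplus.com: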

public member function
<ios> <iostream>
std::ios::clear
void clear (iostate state = goodbit);
Set error state flags
Sets a new value for the stream's internal error state flags.
The current value of the flags is overwritten: All bits are replaced by those in state; If state is goodbit (which is zero) all error flags are cleared.
In the case that no stream buffer is associated with the stream when this function is called, the badbit flag is automatically set (no matter the value for that bit passed in argument state).
Note that changing the state may throw an exception, depending on the latest settings passed to member exceptions.
The current state can be obtained with member function rdstate.

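The Chinese edition of C++ Primer explains s.clear() as "resets all condition states of the stream to valid" and s.clear(flag) as "sets the specified condition state in stream s to valid". The first explanation is accurate, but the second is misleading, and the book's description of clear is rather vague overall. In fact the function does not quite do what its name suggests. As the English description above shows, it sets the stream's internal error state, and its default argument is goodbit. In other words, s.clear() is equivalent to s.clear(goodbit) and sets the state of s to goodbit, while s.clear(flag) sets the state of s to flag.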
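
A minimal sketch of the difference between clear(flag) and setstate(flag), assuming a std::istringstream that already has eofbit set. The two helper names are invented for this illustration.

```cpp
#include <ios>
#include <sstream>

// Hypothetical helper: state left by clear(flag) on a stream that had eofbit set.
std::ios_base::iostate state_after_clear(std::ios_base::iostate flag)
{
    std::istringstream in("");
    in.setstate(std::ios_base::eofbit);
    in.clear(flag);      // overwrites the whole state with `flag`
    return in.rdstate();
}

// Hypothetical helper: state left by setstate(flag) on the same kind of stream.
std::ios_base::iostate state_after_setstate(std::ios_base::iostate flag)
{
    std::istringstream in("");
    in.setstate(std::ios_base::eofbit);
    in.setstate(flag);   // adds `flag` to the bits that are already set
    return in.rdstate();
}
```

With failbit as the argument, the first helper returns exactly failbit (eofbit is gone), and the second returns eofbit | failbit. Neither call "clears failbit".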

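This reveals the first error in the program above: cin.clear(istream::failbit);. When the stream hits an error, the program intends to warn the user and then restore the stream state. But cin.clear(istream::failbit); does not restore the state; it sets the stream to failbit, which is an error state, so the next iteration enters the error branch again. The call should therefore be changed to cin.clear() or cin.clear(istream::goodbit). Now for the second problem.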

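Problem 2: the buffer

Every IO object manages a buffer. Data goes into the buffer first, and for output it is written to the real output device or file only when the buffer is flushed. A buffer is flushed in the following situations:

The program finishes normally: all output buffers are emptied as part of returning from main.
At some indeterminate point the buffer becomes full, so it is flushed before the next value is written.
A manipulator flushes it explicitly: (1) endl writes a newline and flushes the stream; (2) flush flushes the stream without adding any characters. A third manipulator, ends, inserts a null character ('\0') into the buffer; unlike endl, the standard does not require it to flush.
The unitbuf manipulator sets the stream's internal state so that the buffer is flushed after every output operation.
An output stream is tied to an input stream, in which case the output buffer is flushed whenever the input stream is read.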
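
To make the manipulators concrete, here is a small sketch of what each one adds to the output. It writes to a std::ostringstream so the result can be inspected; flushing itself has no visible effect on a string stream, so only the inserted characters show up. The function name is made up for this illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: collect what endl, flush and ends put into the stream.
std::string written_by_manipulators()
{
    std::ostringstream out;
    out << "a" << std::endl;   // inserts '\n', then flushes
    out << "b" << std::flush;  // flushes, inserts nothing
    out << "c" << std::ends;   // inserts '\0'
    return out.str();
}
```

The returned string is five characters long: 'a', '\n', 'b', 'c', '\0'.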

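When we type invalid input (in this example, anything that is not an int), the stream state is set to failbit and execution enters the if(cin.fail()) branch. Even if clear resets the error state, the invalid input is still sitting in the buffer; after continue, cin reads it again and fails again, so the loop never ends.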
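
The leftover input can be observed directly. The sketch below assumes a std::istringstream standing in for cin and uses an invented helper name; it attempts to read an int, resets the state, and then reads whatever token comes next.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: after a failed int extraction and clear(),
// return the next whitespace-delimited token still in the stream.
std::string token_left_after_failed_read(const std::string& text)
{
    std::istringstream in(text);
    int value = 0;
    in >> value;   // fails on non-numeric input and sets failbit
    in.clear();    // resets the state but consumes no characters

    std::string leftover;
    in >> leftover;
    return leftover;
}
```

For the input "abc 42" the helper returns "abc": the token that made the extraction fail is still the next thing in the stream.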

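So after clear we also have to empty the buffer of the bad input. As in C programs, we can read the characters in the buffer and throw them away. C++ IO provides two member functions for this: sync and ignore.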

sync:

int sync();
Synchronizes the stream's input buffer with the underlying input source by calling the sync member of the associated stream buffer. What happens to characters that have been buffered but not yet read is left to the implementation, so they are not guaranteed to be discarded. The function returns 0 on success, and -1 if the stream has no associated stream buffer or if the synchronization fails (in which case badbit is set).

ignore:

public member function
<istream> <iostream>
std::istream::ignore
istream& ignore (streamsize n = 1, int delim = EOF);
Extract and discard characters
Extracts characters from the input sequence and discards them, until either n characters have been extracted, or one compares equal to delim.
The function also stops extracting characters if the end-of-file is reached. If this is reached prematurely (before either extracting n characters or finding delim), the function sets the eofbit flag.
Internally, the function accesses the input sequence by first constructing a sentry object (with noskipws set to true). Then (if good), it extracts characters from its associated stream buffer object as if calling its member functions sbumpc or sgetc, and finally destroys the sentry object before returning.

The description of ignore above comes from cplusplus.com. Here are two more concise explanations:

cin.ignore discards characters, up to the number specified, or until the delimiter is reached (if included). If you call it with no arguments, it discards one character from the input buffer.
For example, cin.ignore (80, '\n') would ignore either 80 characters, or as many as it finds until it hits a newline.
cin.sync discards all unread characters from the input buffer. However, it is not guaranteed to do so in each implementation. Therefore, ignore is a better choice if you want consistency.
cin.sync() would just clear out what's left. The only use I can think of for sync() that can't be done with ignore is a replacement for system ("PAUSE");:
cin.sync(); //discard unread characters (0 if none)
cin.get(); //wait for input
With cin.ignore() and cin.get(), this could be a bit of a mixture:
cin.ignore (std::numeric_limits<std::streamsize>::max(),'\n'); //wait for newline
//cin.get()
If there was a newline left over, just putting ignore will seem to skip it. However, putting both will wait for two inputs if there is no newline. Discarding anything that's not read solves that problem, but again, isn't consistent.

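As these explanations show, each function has its advantages and drawbacks. We cannot know the content or length of the bad input in advance, so choosing the arguments for ignore looks like a problem at first, although passing std::numeric_limits<std::streamsize>::max() together with '\n' discards the rest of the current line whatever its length. sync looks like the simpler choice, but its behavior may differ between platform implementations, so it is not portable. Which one to use depends on the situation.

One more point to watch: both sync and ignore must be called after the stream state has been restored to goodbit, otherwise neither has any effect. See the implementation of the functions for details.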
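
The ordering requirement can be demonstrated with a short sketch, again using std::istringstream instead of cin and a helper name invented for this illustration. It performs a failing read and then recovers in one of the two possible orders.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: after a failed read, recover with clear() and ignore()
// in the chosen order, then return the next int, or -1 if the read fails again.
int next_int_after_recovery(const std::string& text, bool clear_first)
{
    std::istringstream in(text);
    int value = 0;
    in >> value;  // expected to fail on the first line of `text`

    const std::streamsize all = std::numeric_limits<std::streamsize>::max();
    if (clear_first) {
        in.clear();
        in.ignore(all, '\n');  // stream is good: discards the bad line
    } else {
        in.ignore(all, '\n');  // stream is not good: nothing is discarded
        in.clear();
    }

    if (in >> value)
        return value;
    return -1;
}
```

For the input "abc\n42\n", clearing first yields 42, while calling ignore first yields -1 because the bad line is still in the stream when the state is reset.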

Putting this together, the corrected program is:

#include <iostream>
#include <stdexcept>

using namespace std;

int main()
{
	int ival;
	while (cin >> ival, !cin.eof())
	{
		if (cin.bad())
			throw runtime_error("IO stream corrupted");
		if(cin.fail())
		{
			cerr << "bad data, try again" << endl;

			cin.clear(istream::goodbit); // or cin.clear();
			cin.sync();	 // or cin.ignore();
			continue;
		}
	}

	return 0;
}

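Open issue: compiled with MinGW (GCC 5.1.0) on Windows, the code above works with either cin.sync() or cin.ignore(). Compiled on Ubuntu 14.04 (GCC 4.8.2), however, the cin.sync() version still loops forever. The sync implementation appears to be the same in both, and the cause is not yet clear.

The example discussed here is small, but it touches on some of the more important parts of C++ IO.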
