[ee122] What constitutes a "spurious token"?
dank at eecs.berkeley.edu
Mon Sep 24 13:53:14 PDT 2007
Richard Schmidt wrote:
> The specs say that the version numbers for HTTP can be more than one
> digit long, and the space between the CRLF can be zero or greater, so
> I don't know when a character stops being attached to the version
> number and becomes a 'spurious token'.
A character stops 'being attached' to the HTTP Version token once it no
longer matches what constitutes an HTTP Version token, as specified by
the BNF grammar. That sentence may seem obvious, but it is the answer to
your question. Continue reading below for further explanation.
> I suppose that you mean it's "floating" as in separated from the
> version number by at least one whitespace character.
> GET /index.html HTTP/1.0aCRLF --> Invalid Version Number error
> GET /index.html HTTP/1.0 aCRLF --> Spurious Token error
> HTTP-Version = "HTTP" "/" +DIGIT "." +DIGIT
> Request-Line = Method +Space Request-URI +Space HTTP-Version *Space CRLF
> Spurious text appearing between the version token and the end of the
> line. --> ERROR -- Spurious token before CRLF.
> So I guess for something to be a token, it must be separated by
> whitespace from another 'token'.
Something continues to be part of HTTP-Version as long as it matches the
BNF grammar. For example, let's match against the following string:
It would match "HTTP/321441.32132121" and stop matching at the 'a',
since it is not a digit, and we were matching one or more digits at that
location. Instead of an 'a' at that location, it could have been a
space, a CRLF, or anything besides a digit, and that would have caused
the match to stop.
After you have read in the "HTTP/" <one or more digits> "." you are now
looking for one or more digits (greedily). So you consume characters
that are digits, until you encounter something that is not a digit. Once
you encounter a non-digit, your match of HTTP-Version has come to an end.
Assuming you did consume at least one digit, the match was successful.
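
To make the greedy matching concrete, here is a minimal sketch in Python.
The function name match_http_version and the sample inputs are my own,
chosen for illustration; they are not part of the project spec. It walks
the string by hand, following HTTP-Version = "HTTP" "/" +DIGIT "." +DIGIT,
and returns the index where the match stopped.

```python
def match_http_version(s, pos=0):
    """Return the index just past an HTTP-Version token, or None if no match.

    Hypothetical helper: follows  "HTTP" "/" +DIGIT "." +DIGIT  by hand.
    """
    prefix = "HTTP/"
    if not s.startswith(prefix, pos):
        return None
    i = pos + len(prefix)

    def digits(j):
        # Greedily consume ASCII digits; require at least one (+DIGIT).
        k = j
        while k < len(s) and s[k] in "0123456789":
            k += 1
        return k if k > j else None

    i = digits(i)                      # major version
    if i is None or i >= len(s) or s[i] != ".":
        return None
    return digits(i + 1)               # minor version; None if no digit follows


end = match_http_version("HTTP/321441.32132121a")
# The match stops at the first non-digit after the minor version.
print(end, repr("HTTP/321441.32132121a"[end]))
```

Whatever sits at the returned index (an 'a', a space, a CR) is not part of
the version token; it is simply the first thing after it.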
Now the BNF grammar says you may encounter zero or more characters of
Space class (which I believe include spaces and horizontal tabs),
followed by CRLF. If you matched HTTP Version, but ran into something
after it that wasn't in the grammar (before finding CRLF), that's a
spurious token before your CRLF. So by reading the grammar, you see that
after a successful match of the HTTP Version token, the only thing
allowed before the CRLF is zero or more characters of type Space.
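
As a rough sketch of that last step in Python: match the version, skip any
Space characters, and see whether CRLF comes next. The name classify_tail
and the returned strings are hypothetical, not the project's required error
messages. I am assuming Space means space or horizontal tab, and that the
input begins at the version token and still contains its terminating CRLF.

```python
import re

# HTTP-Version = "HTTP" "/" +DIGIT "." +DIGIT
VERSION = re.compile(r"HTTP/[0-9]+\.[0-9]+")


def classify_tail(text):
    """Classify what follows the HTTP-Version token (hypothetical helper)."""
    m = VERSION.match(text)
    if m is None:
        return "invalid version"
    rest = text[m.end():]
    # *Space: zero or more spaces or horizontal tabs (assumed Space class).
    rest = rest.lstrip(" \t")
    if rest.startswith("\r\n"):
        return "ok"
    # Anything else between the version token and CRLF is not in the grammar.
    return "spurious token"


for line in ["HTTP/1.0\r\n", "HTTP/1.0 \t\r\n", "HTTP/1.0 a\r\n", "HTTP/1.0a\r\n"]:
    print(repr(line), "->", classify_tail(line))
```

Note that under this reading "HTTP/1.0a" followed by CRLF is also reported
as a spurious token, because the version match succeeds on "HTTP/1.0" and
the 'a' is then the first thing that is neither Space nor CRLF.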