[ee122] What constitutes a "spurious token"?

Daniel Killebrew dank at eecs.berkeley.edu
Mon Sep 24 13:53:14 PDT 2007



Richard Schmidt wrote:
> The specs say that the version numbers for HTTP can be more than one 
> digit long, and the space between the CRLF can be zero or greater, so 
> I don't know when a character stops being attached to the version 
> number and becomes a 'spurious token'.
>  
A character stops 'being attached' to the HTTP Version token once it no 
longer matches what constitutes an HTTP Version token, as specified by 
the BNF grammar. That sentence may seem obvious, but it is the answer to 
your question. Continue reading below for further explanation.
> I suppose that you mean it's "floating" as in separated from the 
> version number by at least one whitespace character.
>  
> so:
>  
> GET /index.html HTTP/1.0aCRLF --> Invalid Version Number error
>  
> but:
>  
> GET /index.html HTTP/1.0 aCRLF --> Spurious Token error
>  
>
> ____
>
> HTTP-Version = "HTTP" "/" +DIGIT "." +DIGIT
>
> Request-Line = Method +Space Request-URI +Space HTTP-Version *Space CRLF
>
> Spurious text appearing between the version token and the end of the 
> line. --> ERROR -- Spurious token before CRLF.
>
> ____
>
> So I guess for something to be a token, it must be separated by 
> whitespace from another 'token'.
>
Not exactly.
>
> Rick
>
Something continues to be part of HTTP-Version as long as it matches the 
BNF grammar. For example, let's match against the following string: 
"HTTP/321441.32132121a"
It would match "HTTP/321441.32132121" and stop matching at the 'a', 
since 'a' is not a digit, and we were matching one or more digits at that 
location. Instead of an 'a' at that location, it could have been a 
space, a CRLF, or anything besides a digit, and that would have caused 
the match to stop.

After you have read in the "HTTP/" <one or more digits> "." you are now 
looking for one or more digits (greedily). So you consume characters 
that are digits until you encounter something that is not a digit. Once 
you encounter a non-digit, your match of HTTP-Version has come to an end. 
Assuming you did consume at least one digit, the match was successful.
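
To make the greedy matching concrete, here is a minimal sketch in Python. 
It is only an illustration of the matching rule described above, not the 
required implementation; the function names (match_http_version, 
skip_digits) are made up for this example.

```python
DIGITS = "0123456789"


def skip_digits(text, pos):
    """Greedily consume digits starting at pos; return the new position."""
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return pos


def match_http_version(text, start=0):
    """Match HTTP-Version = "HTTP" "/" +DIGIT "." +DIGIT at text[start:].

    Returns the index just past the token, or None if there is no match.
    (Hypothetical helper for illustration only.)
    """
    prefix = "HTTP/"
    if not text.startswith(prefix, start):
        return None
    pos = start + len(prefix)

    # One or more digits for the major version.
    after_major = skip_digits(text, pos)
    if after_major == pos:
        return None  # no digit consumed

    # A literal "." must follow.
    if after_major >= len(text) or text[after_major] != ".":
        return None
    pos = after_major + 1

    # One or more digits for the minor version; stop at the first non-digit.
    after_minor = skip_digits(text, pos)
    if after_minor == pos:
        return None  # no digit consumed
    return after_minor


end = match_http_version("HTTP/321441.32132121a")
print(end)  # index of the 'a', where the match stopped
```

For "HTTP/321441.32132121a" the match ends at index 20, which is the 
position of the 'a'. Whatever sits at that index is no longer part of 
HTTP-Version and has to be judged against the rest of the Request-Line 
rule.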

Now the BNF grammar says you may encounter zero or more characters of 
Space class (which I believe includes spaces and horizontal tabs), 
followed by CRLF. If you matched HTTP-Version, but ran into something 
after it that wasn't in the grammar (before finding CRLF), that's a 
spurious token before your CRLF. So by reading the grammar, you see that 
after a successful match of the HTTP-Version token, the only thing 
allowed before the CRLF is zero or more characters of type Space.
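
The check on what follows the version token could be sketched like this 
in Python. Assumptions: the Space class is exactly space and horizontal 
tab (as I believe above), version_end is the index just past a 
successfully matched HTTP-Version, and the name classify_tail and its 
return strings are invented for this example.

```python
SPACE_CHARS = " \t"  # assumption: Space class = space and horizontal tab


def classify_tail(line, version_end):
    """Classify what follows HTTP-Version in a request line.

    version_end is the index just past the matched HTTP-Version token.
    (Hypothetical helper for illustration only.)
    """
    pos = version_end

    # *Space: consume zero or more Space characters.
    while pos < len(line) and line[pos] in SPACE_CHARS:
        pos += 1

    # The grammar now requires CRLF at this exact position.
    if line.startswith("\r\n", pos):
        return "ok"

    # Something else sits between the version token and a later CRLF.
    if line.find("\r\n", pos) != -1:
        return "spurious token before CRLF"

    return "missing CRLF"


# "GET /index.html HTTP/1.0" is 24 characters, so the version ends at 24.
for line in ("GET /index.html HTTP/1.0 a\r\n",
             "GET /index.html HTTP/1.0a\r\n",
             "GET /index.html HTTP/1.0 \t\r\n"):
    print(repr(line), "->", classify_tail(line, 24))
```

Under this reading, both "HTTP/1.0 aCRLF" and "HTTP/1.0aCRLF" come out as 
a spurious token: in each case HTTP-Version matched successfully as 
"HTTP/1.0", and the 'a' is something the grammar does not allow before 
the CRLF. Whitespace between the version and the 'a' makes no difference.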


Daniel



