A common problem when writing a compiler for C-like languages is scanning
C strings, because of their escape codes.
Instead of declaring them as ordinary tokens in lex, most people write
an ad hoc solution for this kind of symbol. One day I found a lex solution
for this problem, along with many other interesting declarations.
The solution may look like this (note that not all possibilities are considered,
but it is useful):
alpha [a-zA-Z_]
alnum [a-zA-Z_0-9]
oct [0-7]
dec [0-9]
hex [0-9a-fA-F]
sign [+-]?
exp ([eE]{sign}{dec}+)
L [lL]
X [xX]
%%
"#".*\n ECHO;
"/*"([^*]|"*"+[^/*])*"*"+"/"    printf("C comment\t%s\n", yytext);
"{"[^}]*"}"                     printf("Pascal comment\t%s\n", yytext);
"(*"([^*]|"*"+[^*)])*"*"+")"    printf("Pascal comment\t%s\n", yytext);
"//".*$                         printf("Objective C comment\t%s\n", yytext);
\"([^"\\\n]|\\.|\\\n)*\"        printf("C string\t%s\n", yytext);
\`([^`\n]|\`\`)+\`              printf("Pascal `string`\t%s\n", yytext);
0{oct}+                         printf("C int base 8\t%s\n", yytext);
0{oct}+{L}                      printf("C long base 8\t%s\n", yytext);
0{X}{hex}+                      printf("C int base 16\t%s\n", yytext);
0{X}{hex}+{L}                   printf("C long base 16\t%s\n", yytext);
{dec}+                          printf("C int\t\t%s\n", yytext);
{dec}+{L}                       printf("C long\t\t%s\n", yytext);
{dec}+"."{dec}*{exp}? |
{dec}*"."{dec}+{exp}? |
{dec}+{exp}                     printf("C double\t%s\n", yytext);
\'[^'\\\n]\' |
\'\\[^0-7\n]\' |
\'\\([0-7]{1,3})\'              printf("C char\t\t%s\n", yytext);
{alpha}{alnum}*                 printf("C name\t\t%s\n", yytext);
. |
\n ;
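
To make the C string rule easier to read, here is a minimal C sketch of what
the pattern \"([^"\\\n]|\\.|\\\n)*\" accepts, written as a hand-coded scanner.
The function name match_c_string is hypothetical and is not part of the lex
file; it only illustrates the rule, assuming the input is a NUL-terminated
buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper (an illustration, not part of the lex file).
 * Returns the length of the C string literal starting at s[0],
 * including both quotes, or 0 if no complete literal starts there.
 * It mirrors the lex pattern  \"([^"\\\n]|\\.|\\\n)*\"
 */
size_t match_c_string(const char *s)
{
    size_t i = 1;

    if (s[0] != '"')
        return 0;                 /* must begin with an opening quote */

    while (s[i] != '\0') {
        if (s[i] == '"')
            return i + 1;         /* closing quote: literal is complete */
        if (s[i] == '\n')
            return 0;             /* unescaped newline is not allowed */
        if (s[i] == '\\') {
            if (s[i + 1] == '\0')
                return 0;         /* backslash at end of input */
            i += 2;               /* \. or \<newline>: skip both characters */
        } else {
            i += 1;               /* ordinary character: [^"\\\n] */
        }
    }
    return 0;                     /* input ended before the closing quote */
}
```

For example, calling match_c_string on the text "abc" rest returns 5, the
length of the literal including both quotes, while an unterminated literal
returns 0. In the lex version the same work is done by the generated
automaton, so no hand-written loop is needed.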