Here is the code snippet I am using:
    StringWriter writer = new StringWriter();
    CSVWriter csvWriter = new CSVWriter(writer);
    String[] originalValues = new String[2];
    originalValues[0] = "t\\est";
    originalValues[1] = "t\\est";
    System.out.println("original values: " + originalValues[0] + "," + originalValues[1]);
    csvWriter.writeNext(originalValues);
    csvWriter.close();
    CSVReader csvReader = new CSVReader(new StringReader(writer.toString()));
    String[] resultingValues = csvReader.readNext();
    System.out.println("resulting values: " + resultingValues[0] + "," + resultingValues[1]);
The output of the above snippet is:
    original values: t\est,t\est
    resulting values: test,test
The backslash ('\') character is gone after the conversion!
From a basic analysis, I figured out that this happens because CSVReader
uses the backslash ('\') as its default escape character, whereas CSVWriter
uses the double quote ('"') as its default escape character.
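To illustrate the suspected cause, here is a minimal sketch of an escape-aware reader. It is a simplified model and not opencsv's actual parsing code: the class EscapeSketch and the method unescape are hypothetical names, and the model assumes that the reader drops the escape character and keeps the character that follows it.

```java
class EscapeSketch {
    // Simplified model (an assumption, not opencsv's real implementation):
    // the escape character is dropped and the next character is kept as-is.
    static String unescape(String raw, char escape) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == escape) {
                // Skip the escape character; keep the following one, if any.
                if (i + 1 < raw.length()) {
                    out.append(raw.charAt(++i));
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String field = "t\\est"; // the actual characters are t\est
        // Backslash treated as the escape character: the backslash is lost.
        System.out.println(unescape(field, '\\')); // prints test
        // Escape character '\0' never occurs in the field: nothing is lost.
        System.out.println(unescape(field, '\0')); // prints t\est
    }
}
```

With the backslash as the escape character, the model reproduces the "test" result shown in the output above.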
What is the reason behind this inconsistency in the default behavior?
To fix the above problem, I managed to find the following 2 solutions:
1) Overwriting the default escape character of CSVReader with the null character:
    CSVParser csvParser = new CSVParserBuilder().withEscapeChar('\0').build();
    CSVReader csvReader = new CSVReaderBuilder(new StringReader(writer.toString())).withCSVParser(csvParser).build();
2) Using RFC4180Parser, which strictly follows the RFC 4180 standard:
    RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
    CSVReader csvReader = new CSVReaderBuilder(new StringReader(writer.toString())).withCSVParser(rfc4180Parser).build();
Can using either of the above approaches cause side effects on other characters?
Also, why is RFC4180Parser not the default parser? Is it to maintain
backward compatibility, since RFC4180Parser
was introduced in later versions?
I think we are looking at 2 types of escaping here.
1) Escaping a double quote in CSV:
    test,"monitor 24"", samsung"
    test,"monitor 24\", samsung" // linux style
Since we have a comma in the second field, the field has to be surrounded by double quotes. Double quotes inside the field then have to be escaped, with either "" or \".
2) \ as a general escape character, for example \t (tab) or \n (newline).
And since 'e' is not in the list of characters to escape, the \ is ignored and removed.
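The two quoting styles from point 1 can be sketched as follows. This is only an illustration with hypothetical names (QuoteStyles, quoteDoubled, quoteBackslash), not code taken from opencsv, and it does not handle backslashes that are already present in the value.

```java
class QuoteStyles {
    // Style 1: surround with double quotes and double each embedded quote.
    static String quoteDoubled(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Style 2: surround with double quotes and put a backslash
    // in front of each embedded quote.
    static String quoteBackslash(String value) {
        return "\"" + value.replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        String field = "monitor 24\", samsung"; // actual characters: monitor 24", samsung
        System.out.println("test," + quoteDoubled(field));   // test,"monitor 24"", samsung"
        System.out.println("test," + quoteBackslash(field)); // test,"monitor 24\", samsung"
    }
}
```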
So if you write "t\\\\est" in the Java source, the file will contain t\\est (an escaped backslash), which will show as t\est after reading. Or writing "\\test" would, by the same logic, show a tab followed by "est" after reading.
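If the backslash should survive a reader that treats it as an escape character, one option is to double it before writing, as described above. A minimal sketch, where BackslashDoubling and doubleBackslashes are hypothetical helper names and not part of opencsv:

```java
class BackslashDoubling {
    // Replace every backslash with two backslashes, so that a reader which
    // treats the backslash as an escape character yields a single one again.
    static String doubleBackslashes(String value) {
        return value.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String original = "t\\est"; // actual characters: t\est
        System.out.println(doubleBackslashes(original)); // prints t\\est
    }
}
```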
To keep the \ after reading, you do indeed have to tell the parser somehow to ignore these sequences. But the current behaviour doesn't look inconsistent to me: both are treating \ as an escape character.